Analytics/Wikistats/DumpReports/Future per report

Work in progress

March 2016: this page provides an overview of existing Wikistats reports which focus on Wikimedia content and content contributors (better known to insiders as Wikistats' dump-based reports), and seeks your input on which reports are most valuable to you. With your help, the WMF Analytics Team can determine which reports should be migrated or replaced first, which later, and which not at all.

Please add your signature (three tildes, if logged in) to those reports you want to remain in some form in a new setup.

Site map per project
For each Wikimedia project (Wikibooks, Wiktionary, Wikinews, Wikipedia, Wikiquote, Wikisource, Wikiversity, Wikivoyage, Other Projects) there is a site-map page listing all languages. For each language it presents some links to other stats content, plus a set of basic metrics. This core set of metrics can be sorted by almost any column.


 * Keep
 * 1) Erik Zachte (talk)


 * Drop

Note: The page also lists project-wide and Wikimedia-wide links to other content.



Reports per wiki
For more than 800 Wikimedia wikis there is a dedicated page with monthly counts on content and content creators. Arguably for many wikis some of these metrics are vital to assess the health of the editing community for that particular wiki. But the presentation is overcrowded, static, and somewhat disorganized. Broadly speaking these tables fall into two categories: 1) focus on content 2) focus on contributors, with the first table on the page (also the oldest) a hybrid between these two categories.

Main monthly trends, and quarterly rankings
Oldest Wikistats report, with several presentation layers. As noted above, it is a hybrid between content metrics and contributor metrics:
 * Year over year (YoY) for recent months
 * Absolute values for every month (or every first month of the quarter)
 * Rankings within this project (e.g. Wikipedia), with tiny wikis filtered

Note: some metrics in this first section are not up to date for large Wikipedias, as the run time of data collection became an obstacle.

 * Keep
 * 1) Erik Zachte (talk) Oldest Wikistats table. Some metrics are very often referred to, others just occasionally. The table is overly complex. I suggest redesign, strip down to essentials, or make it dynamic, where the user can specify which metrics to present, and in which way.
 * Drop
 * 1) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). I don't think this report itself is very useful. Some of the individual components are very useful (users with >5 and >100 edits, article count, and new articles per day) but they can be easily represented elsewhere.

Breakdown of editors by activity level per month

 * Keep
 * 1) Erik Zachte (talk) Combine with chart version (Summary Report, see below).


 * Drop

Breakdown of editors by activity level for all time

 * Keep
 * 1) Erik Zachte (talk) Occasionally very useful, to show how relatively few people do most editing.
 * Drop
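
As an illustration of what an activity-level breakdown shows, here is a minimal Python sketch. It is not the Wikistats implementation: the function name `activity_breakdown`, the input format (one all-time edit count per editor) and the thresholds 5 and 100 are assumptions chosen for the example, and "at least N edits" is an assumed reading of each level.

```python
from typing import Dict, Iterable, List, Tuple


def activity_breakdown(
    edit_counts: Iterable[int],
    thresholds: Tuple[int, ...] = (5, 100),  # illustrative levels, not the official ones
) -> Dict[int, Tuple[int, float]]:
    """For each threshold, return (number of editors with at least that many
    edits, share of all edits made by those editors)."""
    counts: List[int] = list(edit_counts)
    total = sum(counts)
    result: Dict[int, Tuple[int, float]] = {}
    for threshold in thresholds:
        qualifying = [c for c in counts if c >= threshold]
        # Guard against a wiki with no edits at all.
        share = sum(qualifying) / total if total else 0.0
        result[threshold] = (len(qualifying), share)
    return result


if __name__ == "__main__":
    # Hypothetical all-time edit counts for six editors.
    for level, (editors, share) in activity_breakdown([1, 1, 2, 5, 10, 200]).items():
        print(f"{level}+ edits: {editors} editors, {share:.1%} of all edits")
```

With made-up counts like these, a single editor at the highest level accounts for most of the edits, which is the kind of skew this report makes visible.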

Most prolific contributors
Separate tables currently show the top recently active editors, recently absent editors, bots, and anons (~ IP addresses). The last one is out of order.




 * Keep
 * 1) Erik Zachte (talk) Keep, but rebuild as dynamic report where user can choose to show active/sleeping users and bots in one table.


 * Drop

Breakdown of articles by size

 * Keep
 * Drop
 * 1) Erik Zachte (talk)
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). Agreed, not useful.

Article count per namespace

 * Keep
 * 1) Erik Zachte (talk)
 * Drop

Most edited articles
Currently out of order.
 * Revive
 * 1) Erik Zachte (talk)

Articles with most contributors (aka ZeitGeist)



 * Keep
 * 1) Erik Zachte (talk) Best section on this page to bring some 'color' to the wiki. Note how this is not about most edited articles, but articles which have most contributors.
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). When people doing communications ask for "most edited articles", this is almost always what they want.
 * Drop

Total active editors for all Wikimedia wikis combined
Totals are deduplicated (each person counts only once). Plus Month over Month, Year over Year, Percentage of maximum value ever.
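
The columns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Wikistats code: the helper names `deduplicated_total` and `trend_columns` are hypothetical, editors are assumed to be identifiable by one global user name across wikis, and MoM and YoY are assumed to be percentage changes against the previous month and the same month one year earlier.

```python
from typing import Dict, List, Optional, Set


def deduplicated_total(active_by_wiki: Dict[str, Set[str]]) -> int:
    """Count each editor once, even if active on several wikis.
    Assumes the same user name on two wikis is the same person."""
    return len(set().union(*active_by_wiki.values()))


def pct_change(now: float, before: float) -> Optional[float]:
    """Percentage change from `before` to `now`; None if undefined."""
    if before == 0:
        return None
    return 100.0 * (now - before) / before


def trend_columns(monthly_totals: List[int]) -> Dict[str, Optional[float]]:
    """Derive MoM, YoY and percentage-of-maximum for the latest month.
    `monthly_totals` is ordered oldest first, one value per month."""
    current = monthly_totals[-1]
    peak = max(monthly_totals)
    return {
        "mom": pct_change(current, monthly_totals[-2]) if len(monthly_totals) >= 2 else None,
        "yoy": pct_change(current, monthly_totals[-13]) if len(monthly_totals) >= 13 else None,
        "pct_of_max": 100.0 * current / peak if peak else None,
    }


if __name__ == "__main__":
    # Hypothetical data: editor "B" is active on both wikis but counted once.
    print(deduplicated_total({"enwiki": {"A", "B"}, "dewiki": {"B", "C"}}))
    print(trend_columns([100] + [200] * 10 + [120, 110]))
```

Deduplicating with a set union before counting is what keeps the combined total lower than the sum of the per-wiki totals.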


 * Keep
 * 1) Erik Zachte (talk) Often referred to in discussions.


 * Drop

Summary reports
Some key metrics (with MoM and YoY), but mostly charts.

Scope:
 * A set of metrics, for one wiki (e.g. Commons)
 * A set of metrics, for all wikis in one project combined (e.g. Wikivoyage, see first table)
 * One metric, across all projects (e.g. Active Wikis Per Project)




 * Keep
 * 1) Erik Zachte (talk) (I would love to see a mobile version)


 * Drop

Bar charts per wiki
These metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These charts with one bar per month have become too unwieldy, and span several screens, even on a large monitor.




 * Keep
 * Drop


 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work.

Comparisons per project
Again, these metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These tables in particular are unwieldy, and slow to download and display (there is a JavaScript macro behind every cell, to optimize download size). The monthly granularity is too fine, and the number of columns is too large (280+ for Wikipedia).

Yet the cell coloring can help to quickly spot anomalies. Note that different reports use different cell coloring schemes, without a legend (this is a bug; there was a legend long ago).




 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work. Showing only a selection of columns could also help. There are predefined selections for languages spoken in one continent (e.g. Africa), but the selected languages still show global stats, which is somewhat confusing.

Bot activity per project
For each project there are two reports on bot activity, one about article edits, one about article creations.


 * Keep
 * 1) Erik Zachte (talk) very useful to monitor bot activity per wiki.


 * Drop

Special reports for Wikibooks (activity and content)
Rankings (by size, edits, authors and chapter counts), Tables of Contents, Chapter Sizes. Not updated since 2011, but worth remembering.



Erik Zachte (talk) Revive? I don't know. If only people knew about these reports. But it is a medium size project in its own right, with lots of information.


 * Revive
 * Drop

Animation on growth per Wikimedia wiki
Generate new input for this animation from time to time




 * Keep
 * 1) Erik Zachte (talk) Used in keynotes
 * Drop

Animated geographic breakdowns of edits per day
This animation shows the geographic spread of a full day of edits to any Wikipedia. Several representations of the data can be shown. See blog post. Currently it uses a fixed dataset with edits from July 29, 2011.

In the animation, switch to any of the 5 modes by pressing 1, 2, 3, 4 or 5.


 * Keep
 * 1) Erik Zachte (talk) Preferably upgrade, so that it uses yesterday's edits, drawn from Hadoop.
 * Drop

State of the wiki, current values for many metrics across one project



 * Keep
 * Drop
 * 1) Erik Zachte (talk) Too unwieldy in its current form. A more dynamic reporting tool (querying a Wikistats database) would bring focus. Also, I copied a few more columns to Sitemap page which is sortable. Preview new version here.

Recent trends within one project



 * Keep
 * Drop
 * 1) Erik Zachte (talk) Too much, too late (= useful in early years). Now too unwieldy in its current form.

Category trees
Static reports. Very outdated (2009). Very unwieldy. Better tools available now.


 * Revive
 * Drop
 * 1) Erik Zachte (talk)

Wikis ranked by creation date, colored by growth rate
The first table lists Wikipedias by creation month, colored by growth rate in contributors. A second table lists them again, colored by growth rate in articles.


 * Keep
 * 1) Erik Zachte (talk) Keep in some form, although the visualization could be much more awesome.
 * Drop