Analytics/Wikistats/DumpReports/Future per report

Work in progress

March 2016: this page provides an overview of existing Wikistats reports that focus on Wikimedia content and content contributors (better known to insiders as Wikistats' dump-based reports), and seeks your input on which reports are most valuable to you. With your help, the WMF Analytics Team can determine which reports should be migrated or replaced first, which later, and which not at all.

Please add your signature (three tildes, if logged in) to the reports you want to remain in some form in a new setup.

Reports per wiki
For more than 800 Wikimedia wikis there is a dedicated page with monthly counts on content and content creators. Arguably for many wikis some of these metrics are vital to assess the health of the editing community for that particular wiki. But the presentation is overcrowded, static, and somewhat disorganized. Broadly speaking these tables fall into two categories: 1) focus on content 2) focus on contributors, with the first table on the page (also the oldest) a hybrid between these two categories.

Main monthly trends, and quarterly rankings
The oldest Wikistats report, with several presentation layers and, as noted above, a hybrid of content metrics and contributor metrics:
 * Year over year (YoY) for recent months
 * Absolute values for every month (or every first month of the quarter)
 * Rankings within this project (e.g. Wikipedia), with tiny wikis filtered
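
As a rough illustration of the year-over-year layer, here is a minimal Python sketch. The function name, the "YYYY-MM" dictionary layout, and the sample counts are hypothetical and only serve the example; Wikistats itself may compute or round these figures differently.

```python
def yoy_change(monthly, month):
    """Percentage change versus the same month one year earlier.

    `monthly` maps "YYYY-MM" strings to counts (hypothetical layout).
    Returns None when the earlier month is missing or zero.
    """
    year, mon = month.split("-")
    previous = f"{int(year) - 1:04d}-{mon}"
    base = monthly.get(previous)
    if not base:
        return None
    return 100.0 * (monthly[month] - base) / base


# Made-up counts, for illustration only.
active_editors = {"2015-02": 400, "2016-02": 450}
print(yoy_change(active_editors, "2016-02"))
```

A month-over-month figure could be derived the same way by comparing against the preceding month instead of the same month a year earlier.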

Note: some metrics in this first section are not up to date for large Wikipedias, because the run time of data collection became an obstacle.

 * Keep
 * 1) Erik Zachte (talk) Oldest Wikistats table. Some metrics are very often referred to, others just occasionally. The table is overly complex. I suggest redesign, strip down to essentials, or make it dynamic, where the user can specify which metrics to present, and in which way.
 * Drop
 * 1) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). I don't think this report itself is very useful. Some of the individual components are very useful (users with >5 and >100 edits, article count, and new articles per day) but they can be easily represented elsewhere.

Breakdown of editors by activity level per month

 * Keep
 * 1) Erik Zachte (talk) Combine with chart version (Summary Report, see below).

Breakdown of editors by activity level for all time

 * Keep
 * 1) Erik Zachte (talk) Occasionally very useful, to show how relatively few people do most editing.

Most prolific contributors
Separate tables currently show the top recently active editors, recently absent editors, bots, and anons (~ IP addresses); the last one is out of order.




 * Keep
 * 1) Erik Zachte (talk) Keep, but rebuild as dynamic report where user can choose to show active/sleeping users and bots in one table.

Breakdown of articles by size

 * Keep
 * Drop
 * 1) Erik Zachte (talk)
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). Agreed, not useful.

Article count per namespace

 * Keep
 * 1) Erik Zachte (talk)

Most edited articles
Currently out of order.
 * Revive
 * 1) Erik Zachte (talk)

Articles with most contributors (aka ZeitGeist)

 * Keep
 * 1) Erik Zachte (talk) Best section on this page to bring some 'color' to the wiki. Note how this is not about most edited articles, but articles which have most contributors.
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). When people doing communications ask for "most edited articles", this is almost always what they want.
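
The distinction drawn above between "most edited" and "most contributors" can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name and the (article, editor) pair layout are hypothetical and do not describe how Wikistats processes the dumps.

```python
from collections import Counter


def rank_articles(revisions):
    """Return two rankings: by number of edits, and by distinct contributors.

    `revisions` is an iterable of (article_title, editor_name) pairs,
    one pair per edit (hypothetical input layout).
    """
    edits = Counter()
    contributors = {}
    for title, editor in revisions:
        edits[title] += 1
        contributors.setdefault(title, set()).add(editor)
    by_edits = [title for title, _ in edits.most_common()]
    by_contributors = sorted(
        contributors, key=lambda t: len(contributors[t]), reverse=True
    )
    return by_edits, by_contributors


# Made-up data: article "A" has more edits, article "B" has more editors.
sample = [("A", "x"), ("A", "x"), ("A", "x"), ("B", "x"), ("B", "y")]
by_edits, by_contributors = rank_articles(sample)
```

With this sample, "A" leads the edit ranking while "B" leads the contributor ranking, which is the difference the ZeitGeist report is meant to surface.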

Summary reports
Some key metrics (with MoM and YoY), but mostly charts.

Scope:
 * A set of metrics, for one wiki (e.g. Commons)
 * A set of metrics, for all wikis in one project combined (e.g. Wikivoyage, see first table)
 * One metric, across all projects (e.g. Active Wikis Per Project)




 * Keep
 * 1) Erik Zachte (talk) (I would love to see a mobile version)

Bar charts per wiki
These metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These charts with one bar per month have become too unwieldy, and span several screens, even on a large monitor.




 * Keep
 * Drop
 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work.
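
To make the two compaction options concrete (quarterly samples versus quarterly averages), here is a minimal Python sketch. The function names, the "YYYY-MM" key format, and the counts are assumptions for illustration, not part of Wikistats.

```python
def quarterly_samples(monthly):
    """Keep only the first month of each quarter (Jan/Apr/Jul/Oct)."""
    first_months = {"01", "04", "07", "10"}
    return {m: v for m, v in sorted(monthly.items()) if m[-2:] in first_months}


def quarterly_averages(monthly):
    """Average the available months within each calendar quarter."""
    buckets = {}
    for month, value in monthly.items():
        year, mon = month.split("-")
        quarter = f"{year}-Q{(int(mon) - 1) // 3 + 1}"
        buckets.setdefault(quarter, []).append(value)
    return {q: sum(v) / len(v) for q, v in sorted(buckets.items())}


# Made-up monthly counts, for illustration only.
counts = {"2016-01": 10, "2016-02": 20, "2016-03": 30, "2016-04": 40}
samples = quarterly_samples(counts)
averages = quarterly_averages(counts)
```

Sampling keeps the charts directly comparable to the existing monthly values, while averaging smooths out single-month spikes; either would cut the number of bars to a third.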

Comparisons per project
Again, these metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These tables in particular are unwieldy, and slow to download and display (there is a JavaScript macro behind every cell, to optimize download size). The monthly granularity is too fine, and the number of columns is too large (280+ for Wikipedia).

Yet the cell coloring can help to quickly spot anomalies. Note that different reports use different cell coloring schemes, without a legend (a bug; there was one long ago).




 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work. Showing only a selection of columns could also help (there are predefined selections for languages spoken in one continent, e.g. Africa, but the selected languages still show global stats, which is somewhat confusing).

Bot activity per project
For each project there are two reports on bot activity, one about article edits, one about article creations.


 * Keep
 * 1) Erik Zachte (talk) very useful to monitor bot activity per wiki.

State of the wiki, current values for many metrics across one project



 * Keep
 * Drop
 * 1) Erik Zachte (talk) Too unwieldy in its current form. A more dynamic reporting tool (querying a Wikistats database) would bring focus.

Recent trends within one project



 * Keep
 * Drop
 * 1) Erik Zachte (talk) Too much, too late (= useful in early years). Now too unwieldy in its current form.

Category trees
(no image) Static reports. Very outdated (2009). Very unwieldy. Better tools available now.
 * Drop
 * 1) Erik Zachte (talk)