Analytics/Wikistats/DumpReports/Future per report

Work in progress

March 2016: this page provides an overview of existing Wikistats reports which focus on Wikimedia content and content contributors (better known to insiders as Wikistats' dump based reports), and seeks your input on which reports are most valuable to you. With your help the WMF Analytics Team can determine which reports should be migrated/replaced first, which later, or not at all.

Please add your signature (three tildes, if logged in) to the reports you want to keep in some form in a new setup.

Site map per project
For each Wikimedia project (Wikibooks, Wiktionary, Wikinews, Wikipedia, Wikiquote, Wikisource, Wikiversity, Wikivoyage, Other Projects) there is a site-map page listing all languages. For each language it presents links to other stats content, plus a set of basic metrics. This core set of metrics can be sorted by almost any column.


 * Keep
 * 1) Erik Zachte (talk)
 * 2) Liridon (talk)
 * 3) Titodutta (talk) 20:43, 3 May 2016 (UTC)
 * 4) Quite useful, some replacement can be found for Wikipedia but not for other projects — NickK (talk)
 * 5) Useful. Ijon (talk) 00:24, 4 May 2016 (UTC)


 * Drop

Note: The page also lists project-wide and Wikimedia-wide links to other content.



Reports per wiki
For more than 800 Wikimedia wikis there is a dedicated page with monthly counts on content and content creators. Arguably, for many wikis some of these metrics are vital for assessing the health of that wiki's editing community. But the presentation is overcrowded, static, and somewhat disorganized. Broadly speaking, these tables fall into two categories: (1) those focused on content and (2) those focused on contributors. The first table on the page (also the oldest) is a hybrid of the two.

Main monthly trends, and quarterly rankings
The oldest Wikistats report, with several presentation layers; as noted above, it is a hybrid of content and content contributors:
 * Year over year (YoY) for recent months
 * Absolute values for every month (or every first month of the quarter)
 * Rankings within this project (e.g. Wikipedia), with tiny wikis filtered

Note: some metrics in this first section are not up to date for large Wikipedias, because the run time of the data collection became an obstacle.

 * Keep
 * 1) Erik Zachte (talk) Oldest Wikistats table. Some metrics are referred to very often, others only occasionally. The table is overly complex. I suggest a redesign: strip it down to the essentials, or make it dynamic, so the user can specify which metrics to present, and in which way.
 * 2) Liridon (talk)
 * 3) Sander.v.Ginkel (talk) 21:08, 3 May 2016 (UTC) Most important basic stats, like average new pages per day. Redesign is suggested.
 * 4) Redesign, not drop. Almost all of its metrics are useful in themselves, and the counts of users with >5 and >100 edits are especially valuable and not available elsewhere — NickK (talk)
 * 5) Crucially important. While it includes things I don't think anyone should care about these days (database size?), it must not be killed before there is demonstrably a stable, citable, alternative source for this data. Ijon (talk) 00:30, 4 May 2016 (UTC)
 * Drop
 * 1) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). I don't think this report itself is very useful. Some of the individual components are very useful (users with >5 and >100 edits, article count, and new articles per day) but they can easily be represented elsewhere.
 * 2) Denny (talk) 21:20, 3 May 2016 (UTC)

Breakdown of editors by activity level per month

 * Keep
 * 1) Erik Zachte (talk) Combine with chart version (Summary Report, see below).
 * 2) Very useful source of information, can be combined with charts — NickK (talk) 22:06, 3 May 2016 (UTC)
 * 3) useful. In particular, giving a bit more of a breakdown than the standard >5 and >100 is useful to assess the sizes of the various editing cohorts in a community. Ijon (talk) 00:31, 4 May 2016 (UTC)
 * Drop

Breakdown of editors by activity level for all time

 * Keep
 * 1) Erik Zachte (talk) Occasionally very useful, to show how relatively few people do most editing.
 * 2) Sander.v.Ginkel (talk) 21:11, 3 May 2016 (UTC) As per Erik Zachte. Perhaps created articles could also be added.
 * 3) Denny (talk) 21:20, 3 May 2016 (UTC)
 * 4) Very simple table but very informative — NickK (talk)
 * 5) very useful to get a sense of community make-up. Ijon (talk) 00:32, 4 May 2016 (UTC)
 * Drop

Most prolific contributors
Separate tables currently show the top recently active editors, recently absent editors, bots, and anonymous editors (IP addresses); the last one is currently out of order.




 * Keep
 * 1) Erik Zachte (talk) Keep, but rebuild as a dynamic report where the user can choose to show active users, sleeping users, and bots in one table.
 * 2) Liridon (talk)
 * 3) A very popular report and a good source of information (some wikis have own bots updating similar reports but most do not). Making it dynamic (per Erik) would be a plus — NickK (talk) 22:09, 3 May 2016 (UTC)
 * 4) Important. Frequently used at WMF. Ijon (talk) 00:33, 4 May 2016 (UTC)


 * Drop

Breakdown of articles by size

 * Keep
 * 1) It can make sense to represent this as a pie chart: say, X% of articles of a given wiki are between 2^N and 2^(N+1) bytes. Not very useful in current state, but keep if can be revived — NickK (talk) 22:12, 3 May 2016 (UTC)
 * Drop
 * 1) Erik Zachte (talk)
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). Agreed, not useful.
 * 3) Sander.v.Ginkel (talk) 21:10, 3 May 2016 (UTC)
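NickK's suggestion of grouping articles into power-of-two size buckets (between 2^N and 2^(N+1) bytes) could be computed roughly as in the minimal Python sketch below. It assumes the article sizes in bytes are already available as a list of integers; the function names are hypothetical and not part of Wikistats.

```python
from collections import Counter
from typing import Dict, Iterable


def size_bucket(size_bytes: int) -> int:
    """Return N such that 2**N <= size_bytes < 2**(N + 1); size must be > 0."""
    # For a positive int, bit_length() - 1 is the exponent of the highest set bit.
    return size_bytes.bit_length() - 1


def bucket_shares(sizes: Iterable[int]) -> Dict[int, float]:
    """Percentage of articles in each power-of-two size bucket (empty pages skipped)."""
    counts = Counter(size_bucket(s) for s in sizes if s > 0)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}
```

For example, `bucket_shares([600, 700, 1500, 3000])` gives 50% in bucket 9 (512 to 1023 bytes) and 25% each in buckets 10 and 11; shares like these could feed the pie chart NickK describes.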

Article count per namespace

 * Keep
 * 1) Erik Zachte (talk)
 * 2) Discovered this report for the first time and found it very useful. I have not seen page counts for other namespaces or the number of binaries per file extension anywhere else — NickK (talk)
 * 3) useful for learning about community development. Ijon (talk) 00:35, 4 May 2016 (UTC)
 * Drop
 * 1) Liridon (talk)

Most edited articles
Currently out of order.
 * Revive
 * 1) Erik Zachte (talk)
 * 2) Yes please — NickK (talk) 22:16, 3 May 2016 (UTC)
 * 3) useful. Ijon (talk) 00:35, 4 May 2016 (UTC)
 * Discard

Articles with most contributors (aka ZeitGeist)



 * Keep
 * 1) Erik Zachte (talk) Best section on this page to bring some 'color' to the wiki. Note how this is not about most edited articles, but articles which have most contributors.
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). When people doing communications ask for "most edited articles", this is almost always what they want.
 * 3) Very interesting section, widely used for communications about most edited articles. In addition, it is a very valuable source of information, as the articles there usually represent current events, popular topics or lamest edit wars, so analysing the most edited articles from several years ago is very interesting — NickK (talk)
 * 4) Interesting, and historically valuable. Ijon (talk) 00:36, 4 May 2016 (UTC)


 * Drop

Total active editors for all Wikimedia wikis combined
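Totals are deduplicated: each person is counted only once, however many wikis they edited. The report also shows month over month (MoM) and year over year (YoY) change, and each value as a percentage of the maximum value ever reached.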
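To illustrate how a deduplicated total and its MoM, YoY and percentage-of-maximum columns relate, here is a minimal Python sketch. It assumes per-wiki sets of active user names per month are already available, and that one user name identifies the same person on every wiki; the input shape and all function names are hypothetical, not how Wikistats actually stores or computes this data.

```python
from typing import Dict, Optional, Set

# Hypothetical input shape: month ("YYYY-MM") -> wiki name -> set of user
# names active on that wiki in that month. Assumes a user name identifies
# the same person on every wiki (illustrative assumption only).
Activity = Dict[str, Dict[str, Set[str]]]


def deduplicated_totals(activity: Activity) -> Dict[str, int]:
    """Count each person once per month, however many wikis they edited."""
    totals: Dict[str, int] = {}
    for month, per_wiki in activity.items():
        everyone: Set[str] = set()
        for editors in per_wiki.values():
            everyone |= editors  # set union drops cross-wiki duplicates
        totals[month] = len(everyone)
    return totals


def shift_month(month: str, delta: int) -> str:
    """Return the "YYYY-MM" key that lies `delta` months away."""
    year, mon = map(int, month.split("-"))
    index = year * 12 + (mon - 1) + delta
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def pct_change(new: int, old: Optional[int]) -> Optional[float]:
    """Percentage change, or None if there is no usable earlier value."""
    if not old:
        return None
    return 100.0 * (new - old) / old


def trend_row(totals: Dict[str, int], month: str) -> Dict[str, Optional[float]]:
    """Total, month over month, year over year, and percentage of maximum ever."""
    value = totals[month]
    peak = max(totals.values())
    return {
        "total": float(value),
        "mom_pct": pct_change(value, totals.get(shift_month(month, -1))),
        "yoy_pct": pct_change(value, totals.get(shift_month(month, -12))),
        "pct_of_max": 100.0 * value / peak if peak else None,
    }
```

With two months of made-up data where user "A" edits both enwiki and commonswiki in March, "A" still counts once, so the March total is 4 rather than 5. YoY is `None` here simply because no March 2015 value exists in the sample.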


 * Keep
 * 1) Erik Zachte (talk) Often referred to in discussions.
 * 2) Denny (talk) 21:20, 3 May 2016 (UTC)
 * 3) per Erik — NickK (talk)
 * 4) yes.  Part of WMF's core "report card" metrics. Ijon (talk) 00:36, 4 May 2016 (UTC)
 * Drop

Summary reports
Some key metrics (with MoM and YoY), but mostly charts.

Scope:
 * A set of metrics, for one wiki (e.g. Commons)
 * A set of metrics, for all wikis in one project combined (e.g. Wikivoyage, see first table)
 * One metric, across all projects (e.g. Active Wikis Per Project)




 * Keep
 * 1) Erik Zachte (talk) (I would love to see a mobile version)
 * 2) Very informative and easy to understand (I particularly love to use the provided example as an illustration of the impact of Wiki Loves on Wikimedia Commons) — NickK (talk)
 * 3) super useful.  Probably the feature I use most, day to day.  Fastest way to get a quick (but surprisingly robust) idea about a particular community. Ijon (talk) 00:37, 4 May 2016 (UTC)
 * Drop

Bar charts per wiki
These metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These charts with one bar per month have become too unwieldy, and span several screens, even on a large monitor.




 * Keep
 * 1) Redesign into something nicer, like Commons charts — NickK (talk) 22:33, 3 May 2016 (UTC)
 * Drop


 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work.
 * 2) Ijon (talk) 23:45, 3 May 2016 (UTC) I've never found these useful, given the data is available in the table

Comparisons per project
Again, these metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These tables in particular are unwieldy and slow to download and display (there is a JavaScript macro behind every cell, to reduce download size). The monthly granularity is too fine, and the number of columns is too large (280+ for Wikipedia).

Yet the cell coloring can help to spot anomalies quickly. Note that different reports use different cell coloring schemes, without a legend (a bug; there was one long ago).




 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work. Showing only a selection of columns could also help. There are predefined selections for languages spoken on one continent (e.g. Africa), but the selected languages still show global stats, which is somewhat confusing.
 * 2) Denny (talk) 21:21, 3 May 2016 (UTC)
 * 3) I sometimes use those: they are useful even though they are big. Showing only a selection of columns can be a solution (either split into several pages with something like 20 wikis per page or user-defined selections). Alternatively, we can keep online just a few first rows and make the entire table available as CSV or similar — NickK (talk) 22:36, 3 May 2016 (UTC)

Bot activity per project
For each project there are two reports on bot activity, one about article edits, one about article creations.


 * Keep
 * 1) Erik Zachte (talk) very useful to monitor bot activity per wiki.
 * 2) I am not aware of any other tool to measure bot activity — NickK (talk)
 * 3) useful. Keep, per NickK Ijon (talk) 23:47, 3 May 2016 (UTC)
 * Drop

Edits and reverts per wiki
Tables and charts on edit activity (registered users, anonymous users, bots). Also breakdown by type of edit/revert and type of reverted editor. Also most often reverted editors, most frequently reverting editors.


 * Keep
 * 1) Erik Zachte (talk) particularly useful to monitor long term edit trends
 * 2) Did not know that it existed, very interesting information — NickK (talk) 23:21, 3 May 2016 (UTC)
 * 3) Useful, though indeed, under-utilized because under-discovered. Ijon (talk) 23:52, 3 May 2016 (UTC)


 * Drop

Special reports for Wikibooks (activity and content)
Rankings (by size, edits, authors and chapter counts), Tables of Content, Chapter Sizes. Not updated since 2011, but worth remembering.



Erik Zachte (talk) Revive? I don't know. If only people knew about these reports. But it is a medium-sized project in its own right, with lots of information.


 * Revive
 * Drop
 * 1) not useful enough to be worth the time Ijon (talk) 00:00, 4 May 2016 (UTC)

Animation on growth per Wikimedia wiki
Generate new input for this animation from time to time




 * Keep
 * 1) Erik Zachte (talk) Used in keynotes
 * 2) Good visualisation, useful as an illustration of growth of different projects — NickK (talk) 23:24, 3 May 2016 (UTC)
 * Drop
 * 1) I would like the successor of Wikistats to focus on making available data that is hard/impossible for volunteer developers to produce.  Visualizations can always be created on top of available data, so are far less important, in my opinion, to prioritize for a WMF effort.  Since I understand this exercise (this opinion-poll) to be input for WMF prioritization, I'd hate to lose an actual data source because folks found this visualization of existing data fun (I think it's fun too). Ijon (talk) 00:21, 4 May 2016 (UTC)

State of the wiki, current values for many metrics across one project



 * Keep
 * 1) A good set of statistics that can be usable, but this table really needs to be sortable. Keep if can be converted into something sortable or more dynamic, drop otherwise — NickK (talk) 23:27, 3 May 2016 (UTC)
 * Drop
 * 1) Erik Zachte (talk) Too unwieldy in its current form. A more dynamic reporting tool (querying a Wikistats database) would bring focus. Also, I copied a few more columns to the Sitemap page, which is sortable. Preview the new version here.

Recent trends within one project



 * Keep
 * 1) Visualisation of such trends is useful. It would be better to keep several key metrics (new articles per month, new editors etc.) and have a comparison (and/or ranking) for main wikis. The current format is unfortunately far from easy to understand — NickK (talk) 23:32, 3 May 2016 (UTC)
 * Drop
 * 1) Erik Zachte (talk) Too much, too late (it was useful in the early years). Now too unwieldy in its current form.
 * 2) Not as useful, these days. Ijon (talk) 00:22, 4 May 2016 (UTC)

Category trees
Static reports. Very outdated (2009). Very unwieldy. Better tools available now.


 * Revive
 * Drop
 * 1) Erik Zachte (talk)
 * 2) Obsolete. HTML is a very bad format for visualising trees — NickK (talk) 23:33, 3 May 2016 (UTC)
 * 3) Obsolete. Ijon (talk) 00:22, 4 May 2016 (UTC)

Wikis ranked by creation date, colored by growth rate
The first table lists Wikipedias by creation month, colored by growth rate in contributors; a second table repeats the list, colored by growth rate in articles.


 * Keep
 * 1) Erik Zachte (talk) Keep in some form, although the visualization could be much more awesome.
 * Drop
 * 1) I do not see the point. It seems very close to this, but interactive. Perhaps merge, if some features are still not available in this animation — NickK (talk)
 * 2) not useful enough. Ijon (talk) 00:23, 4 May 2016 (UTC)