Analytics/Wikistats/DumpReports/Future per report

Work in progress

March 2016: this page provides an overview of existing Wikistats reports that focus on Wikimedia content and content contributors (better known to insiders as Wikistats' dump-based reports), and seeks your input on which reports are most valuable to you. With your help, the WMF Analytics Team can determine which reports should be migrated or replaced first, which later, and which not at all.

Please add your signature (three tildes, if logged in) to the reports you want to keep in some form in a new setup.

Site map per project
For each Wikimedia project (Wikibooks, Wiktionary, Wikinews, Wikipedia, Wikiquote, Wikisource, Wikiversity, Wikivoyage, Other Projects) there is a site-map page listing all languages. For each language it presents some links to other stats content, plus a set of basic metrics. This core set of metrics can be sorted by almost any column.


 * Keep
 * 1) Erik Zachte (talk)
 * 2) Liridon (talk)
 * 3) Titodutta (talk) 20:43, 3 May 2016 (UTC)
 * 4) Quite useful, some replacement can be found for Wikipedia but not for other projects — NickK (talk)
 * 5) Useful. Ijon (talk) 00:24, 4 May 2016 (UTC)
 * 6) Like to see my progress, also in comparison with others. --Hekaheka (talk) 07:43, 4 May 2016 (UTC)
 * 7) Anthere (talk) 07:56, 4 May 2016 (UTC)


 * Drop

Note: The page also lists project-wide and Wikimedia-wide links to other content.



Reports per wiki
For more than 800 Wikimedia wikis there is a dedicated page with monthly counts on content and content creators. Arguably for many wikis some of these metrics are vital to assess the health of the editing community for that particular wiki. But the presentation is overcrowded, static, and somewhat disorganized. Broadly speaking these tables fall into two categories: 1) focus on content 2) focus on contributors, with the first table on the page (also the oldest) a hybrid between these two categories.

Main monthly trends, and quarterly rankings
Oldest Wikistats report, with several presentation layers and, as noted above, a hybrid of content metrics and contributor metrics
 * Year over year (YoY) for recent months
 * Absolute values for every month (or every first month of the quarter)
 * Rankings within this project (e.g. Wikipedia), with tiny wikis filtered

Note: some metrics in this first section are not up to date for large Wikipedias, as the run time of data collection became an obstacle.


 * Keep
 * 1) Erik Zachte (talk) Oldest Wikistats table. Some metrics are very often referred to, others just occasionally. The table is overly complex. I suggest redesign, strip down to essentials, or make it dynamic, where the user can specify which metrics to present, and in which way.
 * 2) Liridon (talk)
 * 3) Sander.v.Ginkel (talk) 21:08, 3 May 2016 (UTC) Most important basic stats, like average new pages per day. Redesign is suggested.
 * 4) Redesign, not drop. Almost all metrics in itself are useful, and users with >5 and >100 are especially valuable and not available elsewhere — NickK (talk)
 * 5) crucially important.  While it includes things I don't think anyone should care about these days (database size?), it must not be killed before there is demonstrably a stable, citable, alternative source for this data. Ijon (talk) 00:30, 4 May 2016 (UTC)
 * 6) Per Ijon. Ed [talk] [en:majestic titan] 01:46, 4 May 2016 (UTC)
 * 7) MFriedman (talk) 07:45, 4 May 2016 (UTC) Per Erik Zachte.
 * 8) Effeietsanders (talk) 07:46, 4 May 2016 (UTC) These data are very helpful to quickly get a grasp of a project, and give helpful information that can be used in PR and conversations with external partners. They care less about the whole, and much more about the individual projects and how those developed.
 * 9) Anthere (talk) 07:55, 4 May 2016 (UTC)
 * Drop
 * 1) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). I don't think this report itself is very useful. Some of the individual components are very useful (users with >5 and >100 edits, article count, and new articles per day) but they can be easily represented elsewhere.
 * 2) Denny (talk) 21:20, 3 May 2016 (UTC)


 * Other
 * 1) If I had to choose, I'd prefer to keep the comparisons which display this information by metric, with more context. But are the reports really the main cost? I'd think most work would go into calculating these metrics (i.e. in the WikiCounts scripts). Nemo 06:37, 4 May 2016 (UTC)

Breakdown of editors by activity level per month

 * Keep
 * 1) Erik Zachte (talk) Combine with chart version (Summary Report, see below).
 * 2) Very useful source of information, can be combined with charts — NickK (talk) 22:06, 3 May 2016 (UTC)
 * 3) useful. In particular, giving a bit more of a breakdown than the standard >5 and >100 is useful to assess the sizes of the various editing cohorts in a community. Ijon (talk) 00:31, 4 May 2016 (UTC)
 * 4) Anthere (talk) 07:55, 4 May 2016 (UTC)
 * Drop
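
As an illustration of what this breakdown computes, here is a minimal sketch in Python. It is not the Wikistats implementation: the input format (one `(month, editor)` pair per edit) and the function name are assumptions, and of the thresholds only >5 and >100 are mentioned on this page; 25 is added purely as an example.

```python
from collections import Counter

# Hypothetical thresholds: only >5 and >100 are named on this page, 25 is illustrative.
THRESHOLDS = [5, 25, 100]

def editors_by_activity_level(edits, thresholds=THRESHOLDS):
    """edits: iterable of (month, editor) pairs, one pair per edit (assumed format).

    Returns {month: {threshold: number of editors with more than `threshold` edits}}.
    """
    per_editor = Counter(edits)  # (month, editor) -> number of edits that month
    result = {}
    for (month, _editor), n in per_editor.items():
        row = result.setdefault(month, {t: 0 for t in thresholds})
        for t in thresholds:
            if n > t:
                row[t] += 1
    return result

# Editor A makes 6 edits in March and 101 in April; editor B makes 2 edits in March.
sample = [("2016-03", "A")] * 6 + [("2016-03", "B")] * 2 + [("2016-04", "A")] * 101
levels = editors_by_activity_level(sample)
```

Each editor is counted at every level they exceed, so the columns are cumulative rather than disjoint bands.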

Breakdown of editors by activity level for all time

 * Keep
 * 1) Erik Zachte (talk) Occasionally very useful, to show how relatively few people do most editing.
 * 2) Sander.v.Ginkel (talk) 21:11, 3 May 2016 (UTC) As by Erik Zachte. Maybe also created articles can be added.
 * 3) Denny (talk) 21:20, 3 May 2016 (UTC)
 * 4) Very simple table but very informative — NickK (talk)
 * 5) very useful to get a sense of community make-up. Ijon (talk) 00:32, 4 May 2016 (UTC)
 * Drop

Most prolific contributors
Separate tables currently show the top recently active editors, recently absent editors, bots, and anons (~ IP addresses); the last one is out of order.




 * Keep
 * 1) Erik Zachte (talk) Keep, but rebuild as dynamic report where user can choose to show active/sleeping users and bots in one table.
 * 2) Liridon (talk)
 * 3) A very popular report and a good source of information (some wikis have own bots updating similar reports but most do not). Making it dynamic (per Erik) would be a plus — NickK (talk) 22:09, 3 May 2016 (UTC)
 * 4) Important. Frequently used at WMF. Ijon (talk) 00:33, 4 May 2016 (UTC)
 * 5) Simple, but extremely important to get in touch with a community. I always teach people to use this in cross-wiki communication, to find the appropriate person (of course the numbers alone don't tell, but this table gives a good starting point with all the nice links and summary of data). Nemo 06:34, 4 May 2016 (UTC)
 * 6) --Hekaheka (talk) 07:47, 4 May 2016 (UTC)
 * 7) Anthere (talk) 07:54, 4 May 2016 (UTC)


 * Drop

Breakdown of articles by size

 * Keep
 * 1) It can make sense to represent this as a pie chart: say, X% of articles of a given wiki are between 2^N and 2^(N+1) bytes. Not very useful in current state, but keep if can be revived — NickK (talk) 22:12, 3 May 2016 (UTC)
 * Drop
 * 1) Erik Zachte (talk)
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). Agreed, not useful.
 * 3) Sander.v.Ginkel (talk) 21:10, 3 May 2016 (UTC)
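
The power-of-two breakdown suggested under Keep could be computed roughly as follows. This is a sketch only; the function name and the plain list of article sizes in bytes are assumptions, not part of any existing Wikistats script.

```python
from collections import Counter

def size_bucket_shares(article_sizes):
    """Return {N: percentage of articles whose size lies in [2^N, 2^(N+1)) bytes}."""
    # Empty pages have no power-of-two bucket, so they are left out here.
    sizes = [s for s in article_sizes if s > 0]
    # For a positive integer s, s.bit_length() - 1 equals floor(log2(s)).
    buckets = Counter(s.bit_length() - 1 for s in sizes)
    total = len(sizes)
    return {n: 100.0 * count / total for n, count in sorted(buckets.items())}

shares = size_bucket_shares([300, 500, 1024, 2047, 5000])
# Two articles fall in 256..511 bytes, two in 1024..2047, one in 4096..8191.
```

The resulting percentages could feed a pie chart directly, one slice per bucket.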

Article count per namespace

 * Keep
 * 1) Erik Zachte (talk)
 * 2) Discovered this report for the first time and found very useful. I have not seen page count in other namespaces and number of binaries per extension anywhere else — NickK (talk)
 * 3) useful for learning about community development. Ijon (talk) 00:35, 4 May 2016 (UTC)
 * Drop
 * 1) Liridon (talk)

Most edited articles
Currently out of order.
 * Revive
 * 1) Erik Zachte (talk)
 * 2) Yes please — NickK (talk) 22:16, 3 May 2016 (UTC)
 * 3) useful. Ijon (talk) 00:35, 4 May 2016 (UTC)
 * 4) Ed [talk] [en:majestic titan] 01:47, 4 May 2016 (UTC)


 * Discard
 * 1) Useless trivia. Nemo 06:31, 4 May 2016 (UTC)

Articles with most contributors (aka ZeitGeist)



 * Keep
 * 1) Erik Zachte (talk) Best section on this page to bring some 'color' to the wiki. Note how this is not about most edited articles, but articles which have most contributors.
 * 2) Neil P. Quinn-WMF (talk) 23:39, 31 March 2016 (UTC). When people doing communications ask for "most edited articles", this is almost always what they want.
 * 3) Very interesting section, widely used for communications about most edited articles. In addition it is a very valuable source of information, as articles there usually represent current events, popular topics or lamest edit wars, thus analysing most edited articles several years ago is very interesting — NickK (talk)
 * 4) Interesting, and historically valuable. Ijon (talk) 00:36, 4 May 2016 (UTC)
 * 5) Ed [talk] [en:majestic titan] 01:47, 4 May 2016 (UTC)
 * 6) Particularly useful also to find interesting/important discussions on backstage wikis. Nemo 06:32, 4 May 2016 (UTC)
 * 7) Anthere (talk) 07:41, 4 May 2016 (UTC)
 * Drop
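
The difference between "most edited" and "most contributors", noted in the first Keep comment, can be shown with a small sketch. The input format (one `(article, editor)` pair per edit) and the function name are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def rank_articles(revisions, top=3):
    """revisions: iterable of (article, editor) pairs, one pair per edit (assumed format).

    Returns two rankings: by number of edits, and by number of distinct contributors.
    """
    edit_counts = Counter()
    contributors = defaultdict(set)
    for article, editor in revisions:
        edit_counts[article] += 1
        contributors[article].add(editor)
    by_edits = [article for article, _ in edit_counts.most_common(top)]
    by_contributors = sorted(
        contributors, key=lambda a: len(contributors[a]), reverse=True
    )[:top]
    return by_edits, by_contributors

# An edit war: 10 edits by 2 people. A current event: 5 edits by 5 people.
revisions = [("Edit war", "A"), ("Edit war", "B")] * 5
revisions += [("Current event", editor) for editor in "ABCDE"]
by_edits, by_contributors = rank_articles(revisions, top=1)
```

In this example the edit war tops the ranking by edits, while the current event tops the ranking by contributors.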

Total active editors for all Wikimedia wikis combined
Totals are deduplicated (each person counts only once). Plus Month over Month, Year over Year, Percentage of maximum value ever.


 * Keep
 * 1) Erik Zachte (talk) Often referred to in discussions.
 * 2) Denny (talk) 21:20, 3 May 2016 (UTC)
 * 3) per Erik — NickK (talk)
 * 4) yes.  Part of WMF's core "report card" metrics. Ijon (talk) 00:36, 4 May 2016 (UTC)
 * 5) Anthere (talk) 07:53, 4 May 2016 (UTC)
 * Drop
 * Other
 * 1) Useful, but lower in priority than the comparison tables (which also contain this information). The work, in both cases, is mostly in analysis of the deduplication method, though; not as much in the specific way the numbers are presented. Nemo 06:31, 4 May 2016 (UTC)
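
For readers unfamiliar with the columns in this report, the sketch below shows one way the deduplicated total and the derived values (Month over Month, Year over Year, percentage of maximum) could be computed. It is not the actual Wikistats deduplication method: it assumes each editor has one identity that is the same on every wiki, and all function names and data shapes are hypothetical.

```python
def deduplicated_totals(active_by_wiki):
    """active_by_wiki: {month: {wiki: set of editor identities}} (assumed shape).

    Returns {month: number of distinct editors across all wikis}.
    """
    return {
        month: len(set().union(*wikis.values()))
        for month, wikis in active_by_wiki.items()
    }

def trend_columns(totals, month):
    """totals: {"YYYY-MM": count}. Returns MoM %, YoY % and % of the maximum ever."""
    year, mon = map(int, month.split("-"))
    prev_month = f"{year - 1}-12" if mon == 1 else f"{year}-{mon - 1:02d}"
    prev_year = f"{year - 1}-{mon:02d}"

    def change(old_key):
        old = totals.get(old_key)
        # No percentage when the earlier month is missing or zero.
        return None if not old else 100.0 * (totals[month] - old) / old

    peak = max(totals.values())
    return {
        "mom": change(prev_month),
        "yoy": change(prev_year),
        "pct_of_max": 100.0 * totals[month] / peak if peak else None,
    }

# Editor B is active on two wikis but counts only once.
active = {"2016-03": {"enwiki": {"A", "B"}, "dewiki": {"B", "C"}}}
dedup = deduplicated_totals(active)

totals = {"2015-03": 80, "2016-02": 100, "2016-03": 90}
columns = trend_columns(totals, "2016-03")
```

As noted in the comment above, the hard part in practice is deciding what counts as the same person across wikis, which this sketch simply assumes away.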

Summary reports
Some key metrics (with MoM and YoY), but mostly charts.

Scope:
 * A set of metrics, for one wiki (e.g. Commons)
 * A set of metrics, for all wikis in one project combined (e.g. Wikivoyage, see first table)
 * One metric, across all projects (e.g. Active Wikis Per Project)




 * Keep
 * 1) Erik Zachte (talk) (I would love to see a mobile version)
 * 2) Very informative and easy to understand (I particularly love to use the provided example as an illustration of the impact of Wiki Loves on Wikimedia Commons) — NickK (talk)
 * 3) super useful.  Probably the feature I use most, day to day.  Fastest way to get a quick (but surprisingly robust) idea about a particular community. Ijon (talk) 00:37, 4 May 2016 (UTC)
 * 4) Anthere (talk) 07:43, 4 May 2016 (UTC)
 * Drop
 * 1) I've never used these. If I have to choose, I prefer the plots with all wikis (those which were disabled). Nemo 06:29, 4 May 2016 (UTC)

Bar charts per wiki
These metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These charts with one bar per month have become too unwieldy, and span several screens, even on a large monitor.




 * Keep
 * 1) Redesign into something nicer, like Commons charts — NickK (talk) 22:33, 3 May 2016 (UTC)
 * 2) Important to get a sense of the trend of each metric. Nemo 06:08, 4 May 2016 (UTC)
 * Drop


 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work.
 * 2) Ijon (talk) 23:45, 3 May 2016 (UTC) I've never found these useful, given the data is available in the table
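
The quarterly samples or averages proposed above as a more compact alternative could be derived from the monthly series along these lines. This is a sketch with hypothetical function names, assuming months are keyed as "YYYY-MM".

```python
def quarterly_samples(monthly):
    """Keep only the first month of each quarter (Jan/Apr/Jul/Oct)."""
    return {m: v for m, v in monthly.items() if int(m[5:7]) in (1, 4, 7, 10)}

def quarterly_averages(monthly):
    """Average the available months of each quarter, keyed like '2016-Q1'."""
    groups = {}
    for m, v in monthly.items():
        key = f"{m[:4]}-Q{(int(m[5:7]) - 1) // 3 + 1}"
        groups.setdefault(key, []).append(v)
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}

monthly = {"2016-01": 10, "2016-02": 20, "2016-03": 30, "2016-04": 40}
samples = quarterly_samples(monthly)
averages = quarterly_averages(monthly)
```

Sampling keeps real monthly values but can miss a spike in February or March; averaging smooths such spikes instead of dropping them.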

Comparisons per project
Again, these metrics correspond 1:1 to the columns in the first table above (the hybrid table): Main monthly trends, and quarterly rankings.

These tables in particular are unwieldy, and slow to download and display (there is a JavaScript macro behind every cell, to optimize download size). The monthly granularity is too fine. The number of columns is too large (280+ for Wikipedia).

Yet the cell coloring can help to quickly spot anomalies. BTW, different reports use different cell coloring schemes, without a legend (a bug; there was one long ago).



 * Drop

 * 1) Erik Zachte (talk) Either drop, or make more compact. Quarterly (Jan/Apr/Jul/Oct) or half-yearly samples (or averages) could still work. Showing only a selection of columns could also help (there are predefined selections for languages spoken in one continent, e.g. Africa, but the languages selected still show global stats, which is somewhat confusing).

 * Keep

 * 1) I sometimes use those: they are useful even though they are big. Showing only a selection of columns can be a solution (either split into several pages with something like 20 wikis per page or user-defined selections). Alternatively, we can keep online just a few first rows and make the entire table available as CSV or similar — NickK (talk) 22:36, 3 May 2016 (UTC)
 * 2) The most useful pages of WikiStats. In the last decade, I've always used these tables to quickly grasp what's going on in each project in all languages, in a way which is impossible with aggregated data: the discoveries have often helped me find some nice activity/experiment on some wiki, or some situation in need of help from the crosswiki community, and nice discussions or best practices have ensued. The tabular format is also surprisingly effective, thanks to the colours which allow quick skimming. Admittedly, most people only care about their own courtyard and couldn't care less about comparing their own hyper-perfect project to the others; by the way, see m:Category:Cross-project comparisons. Nemo 06:13, 4 May 2016 (UTC)

 * Other

 * Denny (talk) 21:21, 3 May 2016 (UTC)
 * Why are these being discussed all together? Do you expect that in whatever other system is coming it will be as easy to fix one as to fix all of them? Nemo 06:13, 4 May 2016 (UTC)

Bot activity per project
For each project there are two reports on bot activity, one about article edits, one about article creations.


 * Keep
 * 1) Erik Zachte (talk) very useful to monitor bot activity per wiki.
 * 2) I am not aware of any other tool to measure bot activity — NickK (talk)
 * 3) useful. Keep, per NickK Ijon (talk) 23:47, 3 May 2016 (UTC)
 * Drop

Edits and reverts per wiki
Tables and charts on edit activity (registered users, anonymous users, bots). Also breakdown by type of edit/revert and type of reverted editor. Also most often reverted editors, most frequently reverting editors.


 * Keep
 * 1) Erik Zachte (talk) particularly useful to monitor long term edit trends
 * 2) Did not know that it existed, very interesting information — NickK (talk) 23:21, 3 May 2016 (UTC)
 * 3) Useful, though indeed, under-utilized because under-discovered. Ijon (talk) 23:52, 3 May 2016 (UTC)
 * 4) This information is necessary to Wikimedia survival: we can't let communities navigate in the dark without having such information. See also FlaggedRevs, Limits to configuration changes, m:Research:The sudden decline of Italian Wikipedia for some cases of usage in discussions. Nemo 06:19, 4 May 2016 (UTC)
 * 5) Anthere (talk) 07:50, 4 May 2016 (UTC)
 * Drop

Special reports for Wikibooks (activity and content)
Rankings (by size, edits, authors and chapter counts), Tables of Content, Chapter Sizes. Not updated since 2011, but worth remembering.



Erik Zachte (talk) Revive? I don't know. If only people knew about these reports. But it is a medium size project in its own right, with lots of information.


 * Revive
 * Drop
 * 1) not useful enough to be worth the time Ijon (talk) 00:00, 4 May 2016 (UTC)

Animation on growth per Wikimedia wiki
Generate new input for this animation from time to time




 * Keep
 * 1) Erik Zachte (talk) Used in keynotes
 * 2) Good visualisation, useful as an illustration of growth of different projects — NickK (talk) 23:24, 3 May 2016 (UTC)
 * 3) Anthere (talk) 07:51, 4 May 2016 (UTC)
 * Drop
 * 1) I would like the successor of Wikistats to focus on making available data that is hard/impossible for volunteer developers to produce.  Visualizations can always be created on top of available data, so are far less important, in my opinion, to prioritize for a WMF effort.  Since I understand this exercise (this opinion-poll) to be input for WMF prioritization, I'd hate to lose an actual data source because folks found this visualization of existing data fun (I think it's fun too). Ijon (talk) 00:21, 4 May 2016 (UTC)
 * Other
 * 1) It's equally important to calculate the data and to make it readable, which is why WikiStats didn't just publish CSV files but also tables, charts, plots etc. I don't know how useful this is, although I made some nice discoveries with it a couple times and perhaps I showed it at some event. Nemo 06:21, 4 May 2016 (UTC)

State of the wiki, current values for many metrics across one project



 * Keep
 * 1) A good set of statistics that can be usable, but this table really needs to be sortable. Keep if can be converted into something sortable or more dynamic, drop otherwise — NickK (talk) 23:27, 3 May 2016 (UTC)
 * Drop
 * 1) Erik Zachte (talk) Too unwieldy in its current form. A more dynamic reporting tool (querying a Wikistats database) would bring focus. Also, I copied a few more columns to Sitemap page which is sortable. Preview new version here.
 * Other
 * 1) Cannot live without this if the comparison tables are killed, but I've always used the comparison tables more for this sort of thing. A few times these saved my day though. Nemo 06:22, 4 May 2016 (UTC)
 * 2) Maintain it in a reworked format Anthere (talk) 07:52, 4 May 2016 (UTC)

Recent trends within one project



 * Keep
 * 1) Visualisation of such trends is useful. It would be better to keep several key metrics (new articles per month, new editors etc.) and have a comparison (and/or ranking) for main wikis. The current format is unfortunately far from easy to understand — NickK (talk) 23:32, 3 May 2016 (UTC)
 * Drop
 * 1) Erik Zachte (talk) Too much, too late (= useful in early years). Now too unwieldy in its current form.
 * 2) Not as useful, these days. Ijon (talk) 00:22, 4 May 2016 (UTC)
 * Other
 * 1) Same as with charts. Nemo 06:28, 4 May 2016 (UTC)

Category trees
Static reports. Very outdated (2009). Very unwieldy. Better tools available now.


 * Revive
 * Drop
 * 1) Erik Zachte (talk)
 * 2) Obsolete. HTML is a very bad format for visualising trees — NickK (talk) 23:33, 3 May 2016 (UTC)
 * 3) Obsolete. Ijon (talk) 00:22, 4 May 2016 (UTC)
 * Other
 * 1) Erik's perl scripts are still the best we have for analysis of a category tree... we regularly get requests for pageviews, for instance. This has been for a long time the only way to find circular categories in a project, it's been unusable for a long while though. The successor should probably use the same code to get the data and then apply some nicer browsing. Nemo 06:27, 4 May 2016 (UTC)

Wikis ranked by creation date, colored by growth rate
Table lists Wikipedias by creation month, and colors by growth rate in contributors, then again in second table by growth rate in articles.


 * Keep
 * 1) Erik Zachte (talk) Keep in some form, although the visualization could be much more awesome.
 * Drop
 * 1) I did not get the point.  seems to be very close to this but interactive. Perhaps merge if some features are still not available in this animation — NickK (talk)
 * 2) not useful enough. Ijon (talk) 00:23, 4 May 2016 (UTC)
 * Other
 * 1) I don't use this, personally; it's mainly a research tool though. This sort of information is useful to have in some compact form and it used to exist nowhere else when this report was created, but since then much of these details for many wikis (at least the biggest ones) has been copied elsewhere. Nemo 06:27, 4 May 2016 (UTC)