Product Analytics/Comparison datasets



We maintain a wiki comparison dataset (formerly called a wiki segmentation dataset) that gives a simple snapshot comparison of our wikis. By contrast, Wikistats 2 is meant to show simple trends within individual wikis or wiki groups. Code, Google spreadsheet.


  • We plan to update this dataset roughly once a year. When we do, we should keep the old versions accessible (e.g. as additional tabs in the Google spreadsheet) so people can compare versions if they wish.


The New Readers project created a country comparison dataset in 2016. We could adopt this as a product.

We would need to provide a bidirectional wiki-country mapping, possibly offering a choice of mapping types (e.g. the most commonly accessed language in each country, or all languages accounting for more than 10% of a country's pageviews).
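A minimal sketch of the two mapping types mentioned above, assuming a hypothetical input of pageview counts keyed by (country, wiki) pairs and treating each wiki as one language edition. The function names, the input shape, and the sample figures are illustrative assumptions, not an existing pipeline or real pageview data.

```python
from collections import defaultdict

# Hypothetical input shape: pageview counts keyed by (country code, wiki).
# Each wiki stands in for one language edition. Figures are illustrative only.
SAMPLE_PAGEVIEWS: dict[tuple[str, str], int] = {
    ("IN", "enwiki"): 700, ("IN", "hiwiki"): 200,
    ("IN", "tawiki"): 60, ("IN", "bnwiki"): 40,
    ("CH", "dewiki"): 500, ("CH", "frwiki"): 300,
    ("CH", "enwiki"): 150, ("CH", "itwiki"): 50,
}


def country_to_wikis(pageviews, mode="share", threshold=0.10):
    """Map each country to a set of wikis.

    mode="top":   only the single most-viewed wiki in the country.
    mode="share": every wiki with more than `threshold` of the country's views.
    """
    # Group view counts by country.
    by_country = defaultdict(lambda: defaultdict(int))
    for (country, wiki), views in pageviews.items():
        by_country[country][wiki] += views

    mapping = {}
    for country, wikis in by_country.items():
        if mode == "top":
            # Ties go to whichever wiki was seen first in the input.
            mapping[country] = {max(wikis, key=wikis.get)}
        elif mode == "share":
            total = sum(wikis.values())
            mapping[country] = {
                wiki for wiki, views in wikis.items()
                if total > 0 and views / total > threshold
            }
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return mapping


def wiki_to_countries(mapping):
    """Invert a country -> wikis mapping into wiki -> countries."""
    inverse = defaultdict(set)
    for country, wikis in mapping.items():
        for wiki in wikis:
            inverse[wiki].add(country)
    return dict(inverse)


if __name__ == "__main__":
    shares = country_to_wikis(SAMPLE_PAGEVIEWS, mode="share")
    print(shares)
    print(wiki_to_countries(shares))
```

Deriving the wiki-to-country direction by inverting the country-to-wiki mapping keeps both directions consistent for whichever mapping type is chosen.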


Ethnologue is the best data source for language-territory information. The Wikipedia Cultural Diversity Observatory created a language-territories mapping, partly using Ethnologue data.