Analytics/Data Releases/status

Last update on: 2013-08-monthly


Diederik is working on a data publication policy for new dataset releases, in consultation with others. He is also in contact with an external party about mirroring the XML dumps.


Investigating setting up a mirror with a third party.


We have not introduced any new data sources as part of our analytics platform in April.


No news.


We delivered many analyses in June, including one of an Arabic cohort using UMAPI v1. Erik Zachte provided an analysis of Commons uploaders, and we provided the Wikipedia Zero team with a number of datasets to help them track adoption of the Wikipedia Zero project across the globe. We supported the VisualEditor and Editor Engagement teams with experimental design, data modeling and data analysis for two controlled experiments: a test of the impact of notifications and a first test of the impact of VisualEditor on new contributors. The tests were carried out in June and the reports are being updated with the results of the analysis. We started using the EE-dashboard instance on Labs to host dashboards related to editor engagement projects that were previously hosted on the Toolserver (see the metrics and features dashboards for the English Wikipedia). Last, we worked with the Features engineering team to expand MediaWiki's instrumentation and collect data on cluster-wide user preference changes and edit-related events to support VisualEditor analysis.


  • Erik Zachte published data and longitudinal analyses of edit and revert trends for Wikimedia projects (read the announcement). We provided data and ad-hoc analysis for the presentation "A State of Decline? The State of Wikimedia Communities as of July 2013" at the July 2013 Monthly Metrics Meeting.
  • We published the analysis of a controlled experiment that we ran in June to test the impact of notifications on new contributors, as well as the analysis of a pre-release A/B test of VisualEditor on the English Wikipedia. We performed an extensive audit of the quality of the data collected during and after the VE test, taking into account browser limitations and known bugs, and posted an update on the state of the analysis. To ensure the replicability of the analysis, we released via our open data repository the complete dataset of the sample of newly registered users who participated in the split test.
  • We released real-time dashboards on edit activity, new account registrations and reverts for the 10 Wikipedias on which VE has been rolled out (en, de, es, fr, he, it, nl, pl, ru, sv).


In August, we attended WikiSym and Wikimania. Dario Taraborelli gave a keynote address on actionable Wikipedia research at WikiSym, where several other Wikipedia research papers were presented. At Wikimania, we hosted two sessions focused on Wikimedia data and analytics tools. We also worked with Platform engineering this month on analyzing and visualizing HTTPS failure rates by country, in preparation for the switch to HTTPS as a default. We released new dashboards for the launch of notifications on 5 other Wikipedias and continued to provide ad-hoc support to teams in Editor Engagement. Last, we continued screening and interviewing candidates for an open research analyst position.