Analytics/Data Quality/status


Last update on: 2014-03 (monthly report)


  • The Analytics team discovered that we were incorrectly tagging traffic as Wikipedia Zero, which led to overcounting W0 traffic by approx. 10%. A joint effort by the Wikipedia Zero, Ops and Analytics teams uncovered the root cause, and a fix was applied.


We created dashboards for several Wikipedia Zero partners (Orange Madagascar, Banglalink, Umniah Jordan), and identified and fixed Wikipedia Zero data issues in collaboration with the Zero team.


We identified issues with over-counting of page views and deployed a fix in November. Data from July onward were restated.


The team continues to spend a large amount of time on data quality. The primary effort in December went into isolating and fixing an error in Wikistats that significantly inflated page views from July to December. The error was patched in early December and the statistics were recalculated. There were also issues with Wikipedia Zero traffic, and an outage caused by a single point of failure in the legacy infrastructure.


Review of 2013 traffic trends by the Wikimedia Analytics Team.

The team spent an intense month analyzing data to explain the page view issues identified in December. The team's report was shared at the February metrics meeting.


We've fixed the following production issues:

  • Resolved the missing sampled-1000 tsv file for 2014-02-06 on stat1002;
  • Wikipedia Zero team investigated a ~30% increase in the number of lines in the zero tsv files between 20140218 and 20140220;
  • Wikipedia Zero team investigated a slight drop in zero requests around 2014-02-08;
  • Prepared data on ULSFO cache performance for an Ops blog post.


We fixed a number of issues around data quality in Wikistats, Wikipedia Zero and Wikimetrics.