Analytics/Data Quality/status

Last update on: 2014-03-monthly

2013-09-monthly
* The Analytics team discovered that we were incorrectly tagging traffic as Wikipedia Zero, which led to overcounting W0 traffic by approximately 10%. A joint effort by the Wikipedia Zero, Ops and Analytics teams uncovered the root cause, and a fix was applied.

2013-10-monthly
We created dashboards for several Wikipedia Zero partners (Orange Madagascar, Banglalink, Umniah Jordan), and identified and fixed Wikipedia Zero data issues in collaboration with the Zero team.

2013-11-monthly
We identified issues with over-counting page views, and deployed a fix in November. Data from July onward were restated.

2013-12-monthly
The team continues to spend a large amount of time on data quality. The primary effort in December was isolating and fixing an error in Wikistats that inflated page views from July to December by a significant amount. The error was patched in early December and the statistics were recalculated. There were also issues with Wikipedia Zero traffic and an outage caused by a single point of failure in the legacy infrastructure.

2014-01-monthly
The team has spent an intense month analyzing data to explain the page view issues identified in December. The team's report was shared at the February metrics meeting.

2014-02-monthly
We've fixed the following production issues:
 * Resolved the missing sampled-1000 tsv file for 2014-02-06 on stat1002;
 * The Wikipedia Zero team investigated a ~30% increase in the number of lines in the zero tsv files between 20140218 and 20140220;
 * The Wikipedia Zero team investigated a slight drop in zero requests around 2014-02-08;
 * Prepared data on ULSFO cache performance for an Ops blog post.

2014-03-monthly
We fixed a number of issues around data quality in Wikistats, Wikipedia Zero and Wikimetrics.