Analytics/Visualization, Reporting & Applications/status

Last update on: 2013-08


Limn released on GitHub!


The Analytics team published the source code for Limn to GitHub this month.


Phabricator will be used to track issues for the GitHub-hosted Limn.


We started working on the migration from dygraphs to d3.js, and improving documentation to ease the on-boarding of our new front-end engineer.


We are finishing the first steps of the conversion to d3 and starting to migrate more dashboards to Limn.


Initial prototype of d3-based graphing is done, which now needs to be integrated into Limn. The team has been training other groups to use Limn to create custom dashboards.


The migration to d3 rendering was completed. The design for the new Edit UI is complete. We are porting over the monthly metrics meeting dashboard to the new Limn and are aiming to show it off in December.


David Schoonover and Dan Andreescu are working on a major rework of Limn, using Knockout.js and d3.js. The team hopes to have this ready to present the metrics for the December 6 metrics meeting at the Wikimedia Foundation.


A major rework of Limn to use d3.js and Knockout.js is complete and will be used for the next ReportCard. Dan Andreescu and David Schoonover are working on graph editing and geospatial data visualization.


The team made performance improvements and added new visualizations. Bar, line, and geo plots can now be built ad hoc from arbitrary data. Evan Rosen's Grantmaking and Programs dashboard was migrated to this new version of Limn. Current work includes a collaboration with the E3 team to provide visualizations, and development of a MediaWiki extension that will allow creation and editing of graphs.


Highlights in the past month include:

  • Stacked charts
  • Debianization & Puppetization
  • New E3 and Grantmaking dashboards
  • Ad-hoc visualization of datasources


We published our monthly report card. As part of Wikimedia's ongoing mobile initiative, we also helped develop analytics that would support ongoing delivery and planning of mobile functionality:

  • We've started to analyze mobile site pageviews by device class, in order to determine how we will invest in building applications and sites that support various device formats.
  • We've also started to perform session analysis of mobile site visits, in order to help us understand user behavior when using the mobile sites, which will inform decisions about ongoing development efforts. At present, this data is only for internal consumption by the Mobile team.
  • A new overall mobile pageviews report is now available. Its accuracy has improved thanks to changes in how the MobileFrontend extension requests wiki articles (changes which also improved performance).
  • More information about how we're calculating mobile pageviews is available in our documentation.

We also introduced new dashboards for our Editor Engagement team that will help them monitor usage of the new Notifications system. Finally, we've added pageview stats for the Hungarian and Ukrainian Wikivoyages.


For the mobile team, we started collecting pageview counts for both official and non-official Wikipedia apps. We changed our Kafka import configuration so that the raw webrequest folders are directly queryable using Hive. The decision was made to re-platform the UMAPI codebase; we spent quite some time specifying user stories and had productive discussions about the architecture during the Amsterdam Hackathon. On the development side, the 'page count' metric was introduced. We adapted Ori Livneh's MediaWiki-Vagrant VM to also support UMAPI in combination with test data. This will make it much easier to debug issues and open development up to community members. We also fixed numerous stability bugs.
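The Hive setup mentioned above can be illustrated with a small sketch. All paths, table names, and the partitioning scheme below are hypothetical, not the team's actual configuration; the sketch only shows the general idea of exposing date-partitioned directories written by a Kafka importer to Hive as an external table, so the raw files become queryable in place.

```python
# Sketch: generating Hive DDL that maps Kafka-imported webrequest
# directories onto a partitioned external table. All names and paths
# are illustrative assumptions, not the production configuration.

def webrequest_ddl(base_dir="/wmf/data/raw/webrequest"):
    """Build a CREATE EXTERNAL TABLE statement over raw import dirs."""
    return "\n".join([
        "CREATE EXTERNAL TABLE IF NOT EXISTS webrequest_raw (",
        "  line STRING",  # one raw log line per row; parsing happens at query time
        ")",
        "PARTITIONED BY (year INT, month INT, day INT, hour INT)",
        f"LOCATION '{base_dir}';",
    ])

def add_partition_ddl(year, month, day, hour,
                      base_dir="/wmf/data/raw/webrequest"):
    """Register one hourly import directory as a Hive partition."""
    return (
        "ALTER TABLE webrequest_raw ADD IF NOT EXISTS PARTITION "
        f"(year={year}, month={month}, day={day}, hour={hour}) "
        f"LOCATION '{base_dir}/{year:04d}/{month:02d}/{day:02d}/{hour:02d}';"
    )

print(webrequest_ddl())
print(add_partition_ddl(2013, 8, 1, 0))
```

Because the table is EXTERNAL, Hive only records the directory locations; the importer keeps writing files as before and each new hourly directory just needs a partition registered.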


This month, we completed the end-user documentation of UserMetrics (v1). We rebranded UserMetrics as Wikimetrics, and we will gradually start using that as the new name when referring to UserMetrics v2 or the UserMetrics replatforming. We focused on laying the foundation of Wikimetrics: a new database design, a new job queue design, and lots of unit tests. In addition, we started porting over some of the features of UserMetrics v1 (such as the 'namespace edits' metric and UI components), added user roles (so users can only see their own metrics), and added authentication using OAuth. Last, we fixed some minor issues in UserMetrics v1, among which the handling of user names containing commas, single quotes, and double quotes.


Wikimetrics: We successfully launched the initial version of Wikimetrics. This version supports cohort upload and two metrics: 1) bytes added and 2) namespace edits. We are working on adding support for time series and aggregators. In the coming sprints we will focus on adding new metrics.

Wikipedia Zero: Dashboards have been moved off of Hadoop for the time being and are now being populated again. We have identified some issues with log rotation that are causing gaps in the graphs, and we will look into these problems. We have also been working on a technical handoff as Evan Rosen leaves the Foundation.

Limn: No development news.

Wikistats: No development news.


In close collaboration with Dario, Jaime and Jessie, we have worked on new features for Wikimetrics. In particular, we are adding new metrics such as survival and pages created, plus aggregation of metrics, metadata in the CSV output, and a support page, and we now have more than 90% test coverage of the codebase. In preparation for the reinstallation of the Hadoop cluster, we moved all Wikipedia Zero jobs off the cluster. We took this opportunity to add additional monitoring to the creation of Wikipedia Zero dashboards. We worked with Wikipedia Zero to identify a problem with geolocation of requests that has created large jumps in total traffic. We spent quite some time creating a more robust process for updating and monitoring one of our dashboards. This dashboard is used by various internal stakeholders and receives its information from different datastreams using different scripts. We have been working on running these scripts under the general-purpose stats user, adding additional monitoring to prevent stale data, and puppetizing some of the jobs.