Analytics
Howdy from the Wikimedia Analytics team! We're currently working on building a Data Services Platform (codenamed Kraken) to power the next generation of intelligence and analytics, as well as a new and wondrous Reportcard.
If you're enthusiastic about analytics, you might want to meet the team, or subscribe to the Analytics Mailing List (don't worry, it's low-traffic). If you want to see Kraken in action, have a look at the Hive tutorial.
Team
- Dan Andreescu
- Andrew Otto
- Dave Schoonover
- Diederik van Liere
- Erik Zachte
- Kraig Parkinson
- Stefan Petrea
Planning
Stakeholder Corner
- Product Codes: application identifiers required to get data into the cluster.
- Metrics Definitions: canonical definitions of metrics.
- Dreams: features, metrics/queries, and visualizations people would like to see someday. Don't worry about whether these are reasonable, well-scoped, or filed in the right place; just add anything you think would be useful.
Projects
Kraken
- "Kraken" is the codename for the cluster and software that power the Data Services Platform.
- Project Info
- Product Codes
- Cluster Dataflow Diagram
- Latest status:
2013-05-monthly: We continued our efforts to increase monitoring coverage of the different webrequest dataflows. On the udp2log side, we added monitoring per DC/server role. As in every month, we worked on improving the robustness and security of the analytics-related servers that we run: we moved the multicast relay from Oxygen to Gadolinium, upgraded Oxygen to Ubuntu Precise, and moved all the Limn-based dashboards from the Kripke labs instance to the Limn0 labs instance. Continuous integration for webstatscollector, wikistats and udp-filters now works. The puppet module for Hadoop has been merged into the Operations repository; this is a big step forward in moving Kraken from beta to production status. Magnus Edenhill demonstrated varnishkafka based on Kafka 0.8; on a local machine varnishkafka was able to process 140k msgs/s, and we are planning to do production testing in mid-June. Last, we separated the Kraken machines from the other production servers by installing network ACLs.
Limn
- Limn is a drop-in GUI toolkit for building visualizations. It powers the WMF Monthly Reportcard.
- Project Info
- Source: https://github.com/wikimedia/limn
- Issues: https://github.com/wikimedia/limn/issues
- Latest status:
2013-05-monthly: For the mobile team, we started collecting pageview counts for both official and non-official Wikipedia apps. We changed our Kafka import configuration so that the raw webrequest folders are directly queryable using Hive. The decision was made to re-platform the UMAPI codebase; we spent quite some time specifying user stories and had productive discussions about the architecture during the Amsterdam Hackathon. On the development side, the 'page count' metric was introduced. We adapted Ori Livneh's MediaWiki Vagrant VM to also support UMAPI in combination with test data. This will make it much easier to debug issues and open development up to community members. We also fixed numerous stability bugs.
Reportcard
- The WMF Monthly Reportcard is a visual dashboard of high-level metrics on the health and success of the various Wikimedia projects.
- Project Info
- Site: http://reportcard.wmflabs.org/
- Latest status:
2013-02-monthly: Reportcard was updated.
Wikistats
- Project Info
- Latest status:
Logging Infrastructure
- Project Info
- Components:
- Resources:
- Latest status:
2013-02-monthly: It was a quiet month for the logging infrastructure; things were running fine. We have been working on a patch to fix bug 45178, which we will try to deploy in March.
Data Releases
- Project Info
- Page View Analytics
- Latest status:
2013-05-28: No news.
See Also
Research & Notes
Management
- Using Mingle to track improvements: https://mingle.corp.wikimedia.org/projects/analytics/wiki/Engaging_in_Continuous_Improvement