Mission
The Analytics team empowers and supports data-informed decision making across the Foundation and the Community. The team is composed of two groups: Development, and Research and Data.
Contact
To reach the team, please email firstname.lastname@example.org (a publicly archived mailing list). For sensitive requests, please email Toby Negrin (email@example.com) for Development requests or Dario Taraborelli (firstname.lastname@example.org) for Research and Data requests.
Team
Toby Negrin, Director
Development: builds the infrastructure, tools and datasets that enable the organization and the community to easily access, process and act on our data in a way that is consistent with our values.
Research and Data: supports the organization in making research-informed decisions, in better understanding our editor community and projects, and in determining the impact of new programs and products that the Foundation is designing.
Projects
|Project||Description||Project lead & team||Status|
|Kraken||A robust, distributed computing and data services platform built on top of Hadoop.||Diederik van Liere, Andrew Otto, Dan Andreescu||
We continue to make progress on the Hadoop/Kafka roll-out. We have encountered cross-data-center latency issues with Varnish-Kafka, which we are currently debugging. We are also testing the Kafka-tee component, which provides backwards compatibility for udp2log subscribers. Finally, we are finishing a report for the Mobile team on browser breakdowns, using Kafka-provided data on Hadoop.
|Limn||A GUI for constructing beautiful visualizations without the need for programming skills.||Dan Andreescu, Diederik van Liere||
We've rolled out some minor changes that make creating dashboards easier and more intuitive.
|Wikimetrics||Cohort analysis of Wikimedia editors||Dan Andreescu, Stefan Petrea, Diederik van Liere||
Work continues on making Wikimetrics a more flexible, general-purpose tool. This month we completed a Vagrant deployment environment that will make it easier for the community to work on Wikimetrics. We have also made progress on the scheduler, on reporting enhancements and on a deployment issue.
|Kafka||Logging infrastructure for Analytics||
We've increased the throughput on Kafka from 6K requests per second (RPS) to 50K RPS to test stability under higher loads.
|Data Quality||Data quality issues that we discover and resolve.||Andrew Otto, Diederik van Liere, Christian Aistleitner||
We've fixed the following production issues:
|Research and Data||
This month, we welcomed Leila Zia as the newest addition to the team. Leila joins the Foundation as a research scientist after completing a PhD in management science and engineering at Stanford University. Her work will initially focus on modeling editor lifecycles to better understand what affects their survival and retention.
We hosted the first public Research and Data showcase, a monthly presentation of research conducted by the team and other researchers in the organization. This month, we presented two studies: one on Wikipedia article creation trends and one on the measurement of mobile browsing sessions. The showcase is hosted at the Wikimedia Foundation and live-streamed on YouTube on the third Wednesday of every month at 11:30 a.m. Pacific Time.
We attended the 17th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW '14) in Baltimore. Research on Wikipedia and wiki-based collaboration has been a major focus of CSCW in the past, and this year three Wikipedia research papers were presented. We hosted a session to discuss collaboration opportunities for researchers interested in tackling problems of strategic importance for Wikimedia (a detailed CSCW '14 report will follow on wiki-research-l).
We started creating public documentation for the data sources and tools the team uses for research and data analysis, and began porting documentation previously hosted on internal wikis (for example: analytics/geolocation).
We continued to provide ad-hoc support to various teams at the Foundation and worked closely with the Growth and Mobile teams to prepare and review results for their respective quarterly reviews.