Mission
The Analytics team empowers and supports data-informed decision making across the Foundation and the Community. The team is composed of two groups: Development, and Research and Data.
Contact
To reach the team, please email firstname.lastname@example.org (public archived mailing list). For sensitive requests, please email Toby Negrin (email@example.com) for Development requests or Dario Taraborelli (firstname.lastname@example.org) for Research and Data.
Team
Toby Negrin, Director
Development: builds the infrastructure, tools and datasets that enable the organization and the community to easily access, process and act on our data in a way that is consistent with our values.
Research and Data: supports the organization in making research-informed decisions, to better understand our editor community and projects, and to determine the impact of new programs and products that the Foundation is designing.
Projects
|Project||Description||Project lead & team||Status|
|Kraken||A robust, distributed computing and data services platform built on top of Hadoop.||Diederik van Liere, Andrew Otto, Dan Andreescu||
We reached a milestone in our ability to deploy Java applications at the Foundation this month when we stood up an Archiva build artifact repository. This enables us to consistently deploy Java libraries and applications and will be used in Hadoop and Search initially.
The first Analytics use case for this system will be Camus, LinkedIn's open source application for loading Kafka data into Hadoop. Once this is running in production, we'll be able to regularly load log data from our servers into Hadoop for processing and analysis.
|Limn||Limn is a GUI for constructing beautiful visualizations without the need for programming skills.||Dan Andreescu, Diederik van Liere||
No work this month.
|Wikimetrics||Cohort analysis of Wikimedia editors||Dan Andreescu, Diederik van Liere||
We did some significant architectural work on Wikimetrics this month to prepare it for its role as our recurrent report scheduling and generation system. The first use case for this system will be the Editor Engagement Vital Signs project, which will provide daily updates on key metrics around participation.
|Kafka||Logging infrastructure for Analytics||
We continue to investigate network issues between our data centers that are causing occasional delivery issues. As noted above, we are currently deploying Camus, our software for transferring data between Kafka and Hadoop.
|Data Quality||Data Quality issues that we discover and resolve.||Andrew Otto, Diederik van Liere, Christian Aistleitner||
We fixed a number of issues around data quality in Wikistats, Wikipedia Zero and Wikimetrics.
|Research and Data||
This month we concluded the first stage of work on metrics standardization. We created an overview of the project with a timeline and a list of milestones and deliverables. We also gave an update on metrics standardization during the March session of the Research and Data monthly showcase. The showcase also hosted a presentation by Aaron Halfaker on his research on the impact of quality control mechanisms on the growth of Wikipedia.
We submitted 8 session proposals for Wikimania '14, authored or co-authored by members of the research team.
We completed the handover of Fundraising analytics tools and knowledge transfer in preparation for a new full-time research position that we will be opening shortly to support the Fundraising team.
We continued to provide support to teams in our focus areas (Growth and Mobile) with an analysis of the impact of the rollout of the new onboarding workflows across multiple wikis, an analysis of mobile browsing sessions, and ongoing analysis of mobile user acquisition tests. We also supported the Ops team in measuring the impact of the deployment of the ULSFO cluster, which provides caching for West USA and East Asia.