Analytics/Reportcard/status

Last update on: 2012-05-monthly

2011-04-30
Erik Zachte, Nimish Gautam and Erik Möller are investigating visualization toolkits to use in the report card (a monthly report of key metrics to measure community health). Additionally, they are streamlining and modularizing the report creation process.

2011-06-02
Erik Zachte, Nimish Gautam, Erik Möller, Mani Pande and Asher Feldman laid down the requirements and groundwork for the next version of the Report Card. Erik Zachte's scripts will be modified to enter the data into a database that can then be accessed through a dedicated API to automatically generate the report card and other charts using a visualization framework. The API will also be publicly available for third parties to access the data.

2011-06-14
Doing:
 * Finalizing DB definition, creating DB with skeleton data
 * Finalizing visualization library selection based on comprehensive assessment

Done:
 * Oriented new team member Asher Feldman; completed hand-over of Account Creation Project tech support
 * Developed API response object definition
 * Defined core metrics for first dashboard
 * Assessed state of available gender stats via user prefs
 * Built out "metric adding" functionality

2011-06-30
Erik Zachte and Nimish Gautam started a development sprint and worked on the back-end infrastructure, supported by Asher Feldman and Sam Reed. The information stored in a database is accessed via a new MediaWiki extension ("MetricsReporting", see in SVN), and the visualization part uses jqPlot. The team hopes to demonstrate a prototype for the next report card in early July.

2011-08-01
The team started their second sprint in July, with the goal of incorporating key metrics into the Report card, such as editors by geography, page views (both mobile and non-mobile) and gender breakdown of editors. Nimish Gautam worked on the infrastructure and analytics for editors by geography. Sam Reed implemented a generic CSV importer, and looked at how to use the Google API to automatically draw data about offline usage into the Report card from Google Spreadsheets.

2011-08-31
Nimish Gautam and Sam Reed worked on allowing content from CSV files and from Google Spreadsheets into the dashboard. Nimish also mined data to identify editors by geography, and worked on a page views tab, using the WURFL library to estimate mobile page views and device capabilities.

2011-09-30
Erik Zachte fixed bugs in Wikistats and continued to automate the process of statistics generation. He also started to publish summaries for all Wikimedia wikis, using the India report card as a model. Nimish Gautam continued to work on improved mobile logs analysis, general data quality issues, and integration of targets into the dashboard.

2011-10-31
Erik Zachte discovered inconsistencies in the report card numbers, which were investigated and attributed to packet loss of up to 25%. This has now been fixed; further steps to prevent similar issues include adding monitoring to the process, and versioning the configuration files.

2011-11-30
Erik Zachte added new visualizations showing the geographical distribution of page views and mobile page views. Asher Feldman tweaked kernel parameters and Tim Starling made changes to the logging script (udp2log) to fix the packet loss issue on our logging servers.

2011-12-31
The reportcard 2.0 was moved to the Labs environment, and its source code centralized. The back-end and front-end code of stats.grok.se was rewritten and is being deployed to Labs as well. A renewed effort is expected as new employees come on board in January.

2012-01-31
The new key/value storage approach has been approved by Rob Lanphier. Andrew Otto and Diederik van Liere have started working on a data pipeline framework. All reportcard-related code can be found in git.

2012-02-29
<section begin=2012-02-29 />Erik Zachte generated a world map with per country coloring of Wikipedia usage.<section end=2012-02-29 />

2012-02-21
<section begin=2012-02-21/>The full team has been working for one week in a Scrum-like way with daily check-ins. We have chosen the dygraphs JavaScript visualization library and are fully focused on getting a working front-end prototype.<section end=2012-02-21/>

2012-03-15
<section begin=2012-03-30/>
 * Demoed the first iteration at the beginning of March
 * Fine-tuning the UI
 * Preparing for the March metrics meeting
<section end=2012-03-30/>

2012-03-31
<section begin=2012-03-31/>The analytics team is fine-tuning the interface of the new Report card. The test site in Labs is currently unavailable. The team is working towards showcasing a first report card prototype by April 6th, the date of the next metrics meeting for the Wikimedia Foundation. This prototype will replicate the readers and page views metrics. For the April 6th meeting, the team will also make a serious attempt at getting editor data up and running, and at adding the ability to add and signal benchmarks.<section end=2012-03-31/>

2012-04-10
<section begin=2012-04-10/>The beta of the new Reportcard dashboard is up on Labs.<section end=2012-04-10/>

2012-04-monthly
<section begin=2012-04-monthly/>David Schoonover and Fabian Kaelin continued work on the new Report card, whose prototype is available on Wikimedia Labs. Andrew Otto has been working with the Operations team to puppetize existing services, and to add a third server (Oxygen) to run filters; we are in the process of migrating bayes to stat1. Andre Engels, Erik Zachte and Diederik van Liere have worked on new mobile reports that will be integrated into the new report card.<section end=2012-04-monthly/>

2012-05-monthly
<section begin="2012-05-monthly"/> There were three main goals for work on the Reportcard this month.

First and foremost, Fabian Kaelin and Erik Zachte updated the datasets to include April's data. The whole team contributed to improving the graphs' appearance.

Second, Dave Schoonover knocked out the high-priority requests from presenters who use the Reportcard in their presentations:
 * You'll now see "callouts" next to each graph highlighting the current month's value, and the change over last month/year.
 * The front page has been streamlined, and now loads only the "core" graphs; tabs at the top let you access the others.
 * Projects and Languages now have consistent coloring across graphs.

Finally, the team has been working hard behind the scenes to make the framework behind the Reportcard, named "Limn", a best-of-breed project for general use. While not ready for public consumption, we implemented a GUI for selecting and manipulating datasets, and began work to support multiple visualization types. We now have multiple staging environments, including both test and dev targets. We hope to be in a place to open-source the framework next month. <section end="2012-05-monthly"/>