Analytics/Reportcard/status

Last update on: 2011-12-31

2011-04-30
Erik Zachte, Nimish Gautam and Erik Möller are investigating visualization toolkits to use in the report card (a monthly report of key metrics to measure community health). Additionally, they are streamlining and modularizing the report creation process.

2011-06-02
Erik Zachte, Nimish Gautam, Erik Möller, Mani Pande and Asher Feldman laid down the requirements and groundwork for the next version of the Report Card. Erik Zachte's scripts will be modified to enter the data into a database that can then be accessed through a dedicated API to automatically generate the report card and other charts using a visualization framework. The API will also be publicly available for third parties to access the data.

2011-06-14
Doing:
 * Finalizing DB definition, creating DB with skeleton data
 * Finalizing visualization library selection based on comprehensive assessment

Done:
 * Oriented new team member Asher Feldman; completed hand-over of Account Creation Project tech support
 * Developed API response object definition
 * Defined core metrics for first dashboard
 * Assessed state of available gender stats via user prefs
 * Built out "metric adding" functionality

2011-06-30
Erik Zachte and Nimish Gautam started a development sprint and worked on the back-end infrastructure, supported by Asher Feldman and Sam Reed. The information stored in a database is accessed via a new MediaWiki extension ("MetricsReporting", see in SVN), and the visualization part uses jqPlot. The team hopes to demonstrate a prototype for the next report card in early July.

2011-08-01
The team started their second sprint in July. Its goal was to incorporate key metrics into the Report card, such as editors by geography, page views (both mobile and non-mobile) and gender breakdown of editors. Nimish Gautam worked on the infrastructure and analytics for editors by geography. Sam Reed implemented a generic CSV importer, and looked at how to use the Google API to automatically draw data about offline usage into the Report card from Google Spreadsheets.
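
For illustration only, a generic CSV importer of the kind mentioned above could look roughly like the following sketch. It is not the MetricsReporting code: the table name `metrics`, the column names `date`, `metric` and `value`, the use of SQLite, and the sample numbers are all assumptions made for the example.

```python
import csv
import io
import sqlite3


def import_metric_csv(conn, csv_text):
    """Load (date, metric, value) rows from CSV text into a metrics table.

    The schema is hypothetical; re-importing the same date/metric pair
    overwrites the earlier value instead of duplicating it.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "date TEXT, metric TEXT, value REAL, "
        "PRIMARY KEY (date, metric))"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["date"], r["metric"], float(r["value"])) for r in reader]
    conn.executemany("INSERT OR REPLACE INTO metrics VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up sample data, not real report card numbers.
sample = (
    "date,metric,value\n"
    "2011-07-01,page_views_mobile,100\n"
    "2011-07-01,page_views_non_mobile,900\n"
)
conn = sqlite3.connect(":memory:")
imported = import_metric_csv(conn, sample)
```

Keying rows on date and metric name keeps the importer generic: a new metric only needs a new CSV file, not a schema change.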

2011-08-31
Nimish Gautam and Sam Reed worked on allowing content from CSV files and from Google Spreadsheets into the dashboard. Nimish also mined data to identify editors by geography, and worked on a page views tab, using the WURFL library to estimate mobile page views and device capabilities.

2011-09-30
Erik Zachte fixed bugs in Wikistats and continued to automate the process of statistics generation. He also started to publish summaries for all Wikimedia wikis, using the India report card as a model. Nimish Gautam continued to work on improved mobile logs analysis, general data quality issues, and integration of targets into the dashboard.

2011-10-31
Erik Zachte discovered inconsistencies in the report card numbers, which were investigated and attributed to packet loss of up to 25%. This has now been fixed; further steps to prevent similar issues include adding monitoring to the process, and versioning the configuration files.
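
To make the effect of packet loss on the numbers concrete, here is a minimal sketch of how a loss rate could be estimated and how an observed count relates to the true one. It assumes each received log line carries a consecutive sequence number from the sender; that assumption, the function names and the example values are illustrative and not taken from the actual logging setup.

```python
def estimate_loss(sequence_numbers):
    """Estimate the fraction of lost packets from received sequence numbers.

    Assumes the sender numbers packets consecutively, so any gap between
    the lowest and highest number seen is a lost packet.
    """
    seen = set(sequence_numbers)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return (expected - len(seen)) / expected


def correct_count(observed, loss):
    """Scale an observed count up to compensate for a known loss fraction."""
    if not 0 <= loss < 1:
        raise ValueError("loss must be in [0, 1)")
    return observed / (1 - loss)


# Hypothetical values: one packet out of five missing is a 20% loss.
loss = estimate_loss([1, 2, 3, 5])
# At 25% loss, 750 observed page views correspond to about 1000 real ones.
corrected = correct_count(750, 0.25)
```

At a loss rate of 25%, reported counts understate the real traffic by a quarter, which is large enough to show up as inconsistencies between months.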

2011-11-30
Erik Zachte added new visualizations showing the geographical distribution of page views and mobile page views. Asher Feldman tweaked kernel parameters and Tim Starling made changes to the logging script (udp2log) to fix the packet loss issue on our logging servers.

2011-12-31
The reportcard 2.0 was moved to the Labs environment, and its source code centralized. The back-end and front-end code of stats.grok.se was rewritten and is being deployed to Labs as well. A renewed effort is expected as new employees come on board in January.

2012-01-15
The new key/value storage approach has been approved by Robla. Andrew and Diederik have started working on a data pipeline framework; see gerrit.wikimedia.org/analytics/reportcard. All reportcard-related code can be found in git there as well.