User Metrics

UserMetrics is the name of a platform developed by the Wikimedia Editor Engagement Experimentation (E3) team to measure user activity based on a set of standardized metrics. Using this platform, a set of key metrics can be selected and applied to a cohort of users to measure their overall productivity. The platform is designed for extensibility (creating new metrics, modifying metric parameters) and to support various types of cohort analysis and program evaluation in a user-friendly way. It accepts requests via a RESTful API and returns responses in JSON format.

0.1.4-dev
To be released on March 25th 2013.


 * User Session Management
   * flask-login
 * Implementation of request caching outside the runtime
   * start with pickle objects
 * Client for programmatically accessing API
   * https://github.com/rfaulkner/umapi_client
 * More robust testing.
   * added some testing
 * Interface to load cohorts
 * Request notifications listener
 * Update request_manager module to handle larger responses on the queue
 * Response Handler that builds responses for caching as they finish

0.1.3-dev
Released on March 11th 2013.


 * Expanded multi-project support
 * pure JSON responses
 * expose all parameters in JSON responses
 * Deal with undefined metric values in a standard way
 * Make all metrics relative to a user event (e.g. registration), as with metrics like survival and threshold. This can still be improved.
 * Refactor API source - break up functionality for easier maintenance and extensibility.
 * Manage requests via a separate job manager: the user_metrics.api.engine.request_manager module.
 * Generate unique hashes from Request objects via the build_key_signature method in user_metrics.api.engine.data.
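
The idea of deriving a unique hash from a request can be sketched as follows. This is a minimal illustration only, not the actual build_key_signature implementation: the function name request_signature, its arguments, and the choice of SHA-1 over a canonical JSON encoding are all assumptions made for the example.

```python
import hashlib
import json


def request_signature(cohort, metric, params):
    """Return a stable hex digest for a request (hypothetical helper).

    The request is serialised to JSON with sorted keys so that two
    requests with the same parameters in a different order hash equally.
    """
    canonical = json.dumps(
        {"cohort": cohort, "metric": metric, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Parameter order does not affect the signature.
sig_a = request_signature("test_cohort", "bytes_added", {"aggregator": "mean", "start": "20130101"})
sig_b = request_signature("test_cohort", "bytes_added", {"start": "20130101", "aggregator": "mean"})
```

A signature of this kind could serve as a cache key, so that a repeated request maps to an already computed response.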

0.1.2-dev
Released on February 26th 2013.


 * Cross-project support: cohorts and requests outside of enwiki are supported
 * Additional aggregators based on numpy methods (mean, median, max, min)
 * New script "run_ssh_tunnels" that sets up multiple connections to database hosts on different local ports
 * Redefined configuration settings to more easily allow for defining new connections and throttling max threads

0.1.1-dev
Released on January 30th 2013, codename "UMAPI".


 * Initial release
 * Raw, aggregate, and time-series processing of English Wikipedia user cohorts over various metrics
 * Metrics: http://meta.wikimedia.org/wiki/Research:Metrics#User_metrics
 * API implementation in Flask that exposes these requests via HTTP URLs
 * API also exposes single user requests

Future Work

 * Redefine API request entry points:
   * run/cohort/..
   * run/set/..
   * run/user/..
 * Build filters that allow new cohorts to be generated from requests
 * OAuth/OpenID integration. This is essential for programmatic access.
 * HTTPS

Rationale
The metrics API is a project that aims to extract user metrics in a predictable, consistent, and reproducible way. The code base can be found here and is also pushed to a Gerrit project. This project will soon be replicated to the Wikimedia Github account and is currently deployed on the Wikimedia stats cluster. Documentation for the source is also hosted on the stats cluster here. The API covers Wikipedia projects; the current focus is English Wikipedia, and coverage is being expanded.



Documentation

 * Source code documentation
 * Formal metric definition and background research

UserMetrics API
API requests are encoded in HTTP GET request URLs, and responses are returned as formatted JSON. The elements of a request include the following:


 * User cohort name (a cohort is simply a set of user IDs in Wikipedia)
 * Metric handle (friendly name of the metric to be computed)
 * Request parameters:
   * Aggregator handle: the variable aggregator in the query string, set to the friendly name of the aggregator.
   * Time series flag: the presence of the variable time_series in the query string.
   * Metric parameters: any metric-specific parameters specified in the query string.
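
The request elements listed above can be assembled into a GET URL roughly as follows. This is a sketch under stated assumptions: the host metrics.example.org, the cohort/metric path layout, the helper name build_request_url, and the sample cohort, metric, and start values are all hypothetical. Only the aggregator and time_series query variables come from the description above.

```python
from urllib.parse import quote, urlencode


def build_request_url(base_url, cohort, metric, aggregator=None,
                      time_series=False, **metric_params):
    """Build a GET URL for a cohort/metric request (hypothetical layout)."""
    # Assumed path layout: <base_url>/<cohort>/<metric>
    path = "/".join(quote(part, safe="") for part in (cohort, metric))

    query = {}
    if aggregator is not None:
        query["aggregator"] = aggregator
    if time_series:
        # Only the presence of the variable matters, so the value is empty.
        query["time_series"] = ""
    # Any remaining keyword arguments are passed as metric-specific parameters.
    query.update(metric_params)

    url = f"{base_url.rstrip('/')}/{path}"
    if query:
        url += "?" + urlencode(query)
    return url


url = build_request_url(
    "https://metrics.example.org/cohorts",  # hypothetical host and prefix
    "test_cohort",
    "bytes_added",
    aggregator="mean",
    time_series=True,
    start="20130101",  # illustrative metric parameter
)
```

The resulting URL could then be fetched with any HTTP client and the response body parsed as JSON.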