User Metrics

UserMetrics is the name of a platform developed by the Wikimedia Editor Engagement Experimentation (E3) team to measure user activity based on a set of standardized metrics. Using this platform, a set of key metrics can be selected and applied to a cohort of users to measure their overall productivity. The platform is designed for extensibility (creating new metrics, modifying metric parameters) and to support various types of cohort analysis and program evaluation in a user-friendly way. It accepts requests via a RESTful API and returns responses in JSON format.

0.1.4-dev
To be released on March 25th 2013.


 * User Session Management
   * flask-login
 * Implementation of request caching outside the runtime
   * start with pickle objects
 * Client for programmatically accessing API
   * https://github.com/rfaulkner/umapi_client
 * More robust testing.
   * added some testing
 * Interface to load cohorts
 * Request notifications listener
 * Update request_manager module to handle larger responses on the queue
 * Response Handler that builds responses for caching as they finish

0.1.3-dev
Released on March 11th 2013.


 * Expanded multi-project support
 * pure JSON responses
 * expose all parameters in JSON responses
 * Deal with undefined metric values in a standard way
 * Make all metrics relative to a user event (e.g. registration), as with metrics like survival and threshold. This can be better.
 * Refactor API source - break up functionality for easier maintenance and extensibility.
 * Manage requests via a separate job manager: the user_metrics.api.engine.request_manager module.
 * Generate unique hashes from Request objects via the build_key_signature method in user_metrics.api.engine.data.

0.1.2-dev
Released on February 26th 2013.


 * Cross project support, cohorts and requests outside of enwiki are supported
 * Additional aggregators based on numpy methods (mean, median, max, min)
 * New script "run_ssh_tunnels" that sets up multiple connections to database hosts on different local ports
 * Redefined configuration settings to more easily allow for defining new connections and throttling max threads

0.1.1-dev
Released on January 30th 2013, codename "UMAPI".


 * Initial release
 * Raw, aggregate, and time-series processing of English Wikipedia user cohorts over various metrics
 * Metrics: http://meta.wikimedia.org/wiki/Research:Metrics#User_metrics
 * API implementation in Flask that exposes these requests via HTTP URLs
 * API also exposes single user requests

Future Work

 * Redefine API request entry points:
   * run/cohort/..
   * run/set/..
   * run/user/..
 * Build filters that allow new cohorts to be generated from requests
 * OAuth/OpenID integration. This is essential for programmatic access.
 * HTTPS

Rationale
The metrics API is a project that aims to extract the above metrics in a predictable, consistent, and reproducible way. The code base can be found here and is also pushed to a Gerrit project. The project is currently deployed on the Wikimedia stats cluster and will soon be replicated to the Wikimedia GitHub account. Documentation for the source is also hosted on the stats cluster here. The API covers Wikipedia projects; the current focus is English Wikipedia, and coverage is being expanded.



Documentation

 * Source code documentation
 * Formal metric definition and background research

Introduction
stub

How to find a Cohort
stub

How to make a new Cohort
stub

Exhaustive List of Metrics
stub

Exhaustive List of Aggregators
stub

Request Types
Below is an example of each of the request types in the API.

Aggregate
URL:

metrics.wikimedia.org/cohorts/e3_pef1_control/revert_rate?aggregator=average

Params:


 * aggregator - The aggregator applied to all of the raw data
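
As a minimal sketch, the aggregate request above can be assembled programmatically. The host and path layout are copied from the example URL; the "http" scheme and the helper name aggregate_url are assumptions made for illustration and are not part of the UserMetrics code base.

```python
from urllib.parse import urlencode

# Host and path prefix taken from the example URL; the scheme is assumed.
BASE_URL = "http://metrics.wikimedia.org/cohorts"


def aggregate_url(cohort, metric, aggregator):
    """Build the URL for an aggregate request (hypothetical helper)."""
    query = urlencode({"aggregator": aggregator})
    return f"{BASE_URL}/{cohort}/{metric}?{query}"


url = aggregate_url("e3_pef1_control", "revert_rate", "average")
print(url)
```

The resulting string can be passed to any HTTP client; the response is expected to be JSON, as described in the introduction.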

Time series
Story:

I would like to examine the cohort "e3_pef1_control" over a time period beginning at 2012-09-01 and running through to 2013-01-01 at 10-day intervals.

URL:

metrics.wikimedia.org/cohorts/e3_pef1_control/bytes_added?aggregator=sum&time_series&interval=240&start=20120901&end=20130101&group=input

Params:

Breaking this down, I have the following switches set:


 * start - start date timestamp (can be "YYYYMMDD" or "YYYYMMDDHHmmSS")
 * end - end date timestamp (can be "YYYYMMDD" or "YYYYMMDDHHmmSS")
 * interval - Length in hours of each time slice
 * time_series - This parameter must simply be present to initiate a time series request
 * aggregator - The aggregator applied to each time slice of data
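
The interval arithmetic in this example can be checked with a short sketch: 10 days corresponds to interval=240 hours, and both accepted timestamp formats can be parsed with the standard library. The helper names and the treatment of the final, shorter slice are assumptions; the source does not state how the service handles slice boundaries.

```python
from datetime import datetime, timedelta


def parse_timestamp(ts):
    """Parse "YYYYMMDD" or "YYYYMMDDHHmmSS" into a datetime."""
    fmt = "%Y%m%d" if len(ts) == 8 else "%Y%m%d%H%M%S"
    return datetime.strptime(ts, fmt)


def time_slices(start, end, interval_hours):
    """Return (slice_start, slice_end) pairs covering start up to end.

    Assumption: a final partial slice is clipped to the end timestamp.
    """
    step = timedelta(hours=interval_hours)
    cursor = parse_timestamp(start)
    stop = parse_timestamp(end)
    slices = []
    while cursor < stop:
        slices.append((cursor, min(cursor + step, stop)))
        cursor += step
    return slices


interval = 10 * 24  # 10 days expressed in hours, as the interval parameter expects
slices = time_slices("20120901", "20130101", interval)
print(interval, len(slices))
```

Under these assumptions the period from 2012-09-01 to 2013-01-01 spans 122 days, so it splits into twelve full 10-day slices plus one shorter closing slice.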

Single User Endpoint
stub

Cohort Operations
stub

UserMetrics API
API requests are encoded in HTTP GET request URLs, and responses are returned as formatted JSON. The elements of a request include the following:


 * User cohort name (a cohort is simply a set of user IDs in Wikipedia)
 * Metric handle (friendly name of the metric to be computed)
 * Request parameters:
   * Aggregator handle. The variable aggregator in the query string, set to the friendly name of the aggregator.
   * Time series flag. Presence of the variable time_series in the query string.
   * Metric parameters. Any metric-specific parameters specified in the query string.
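
To illustrate how these elements map onto a URL, the sketch below splits a request URL into cohort, metric, aggregator, time series flag, and remaining parameters. It assumes the /cohorts/<cohort>/<metric> path layout seen in the example URLs on this page; parse_request is a hypothetical helper, not the API's own parser, and treating every leftover query variable as a metric parameter is a simplification.

```python
from urllib.parse import urlsplit, parse_qs


def parse_request(url):
    """Split a request URL into its elements (hypothetical helper)."""
    # The example URLs omit the scheme, so add one before parsing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    # Assumed path layout: /cohorts/<cohort>/<metric>
    _, cohort, metric = parts.path.strip("/").split("/")
    # keep_blank_values retains bare flags such as "time_series".
    query = parse_qs(parts.query, keep_blank_values=True)
    params = {name: values[-1] for name, values in query.items()}
    aggregator = params.pop("aggregator", None)
    time_series = params.pop("time_series", None) is not None
    return {
        "cohort": cohort,
        "metric": metric,
        "aggregator": aggregator,
        "time_series": time_series,
        "other_params": params,  # simplification: everything left over
    }


request = parse_request(
    "metrics.wikimedia.org/cohorts/e3_pef1_control/bytes_added"
    "?aggregator=sum&time_series&interval=240&start=20120901&end=20130101"
)
print(request)
```

Because time_series carries no value, only its presence is checked, which matches the flag semantics described in the list above.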