User Metrics

UserMetrics is a platform developed by the Wikimedia Editor Engagement Experimentation (E3) team to measure user activity with a set of standardized metrics. Key metrics can be selected and applied to a cohort of users to measure their overall productivity. The platform is designed to be extensible and to support cohort analysis and project evaluation in a user-friendly way. It accepts requests via an API and returns responses in JSON format.

Rationale
The metrics API aims to provide a predictable, consistent, and reproducible way to extract these standardized metrics. The code base can be found here and is also pushed to a Gerrit project. It will soon be replicated to the Wikimedia GitHub account, and it is currently deployed on the Wikimedia stats cluster, which also hosts the documentation for the source. The API covers Wikipedia projects; the current focus is English Wikipedia, and coverage is being expanded.



Documentation
Source code docs can be found at http://stat1.wikimedia.org/rfaulk/pydocs/_build/.

Formal metric definitions can be found at https://meta.wikimedia.org/wiki/Research:Metrics.

UserMetrics API
API requests are encoded as HTTP GET request URLs, and responses are returned as formatted JSON. A request consists of the following elements:


 * User cohort name (a cohort is simply a set of user IDs in Wikipedia)
 * Metric handle (friendly name of the metric to be computed)
 * Request parameters:
    * Aggregator handle. Set the query string variable aggregator to the friendly name of the aggregator.
    * Time series flag. Enabled when the variable time_series is present in the query string.
    * Metric parameters. Any metric-specific parameters, also specified in the query string.
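To make the request elements concrete, the following is a minimal sketch of how such a URL could be split back into its parts. It is not the UserMetrics implementation: the function name parse_request, the example.org host, and the assumption that the cohort and metric are the last two path segments are all hypothetical. Only the aggregator and time_series variable names come from the description above.

```python
from urllib.parse import urlsplit, parse_qs


def parse_request(url):
    """Split a request URL into the elements listed above.

    Assumption (not confirmed by the source): the path ends in
    .../<cohort>/<metric>.
    """
    parts = urlsplit(url)
    cohort, metric = parts.path.rstrip("/").split("/")[-2:]

    # keep_blank_values=True keeps valueless flags such as "time_series".
    params = parse_qs(parts.query, keep_blank_values=True)

    # The aggregator handle is optional; None means no aggregation requested.
    aggregator = params.pop("aggregator", [None])[0]

    # The time series flag is signalled by presence alone, not by its value.
    time_series = "time_series" in params
    params.pop("time_series", None)

    # Whatever remains is treated as metric-specific parameters.
    metric_params = {name: values[0] for name, values in params.items()}

    return {
        "cohort": cohort,
        "metric": metric,
        "aggregator": aggregator,
        "time_series": time_series,
        "metric_params": metric_params,
    }


# Hypothetical host, cohort, and metric names, used only for illustration.
request = parse_request(
    "http://example.org/cohorts/my_cohort/my_metric"
    "?aggregator=average&time_series&t=100"
)
```

Note that query string values arrive as strings, so a real handler would still need to convert metric parameters to the types each metric expects.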

The following is an example of a formatted HTTP request URL:

The above request computes the threshold metric on e3_pef2_control, the control group from the post edit feedback experiment. The "average" aggregator is applied as a time series over 24-hour intervals, and the threshold parameter t is set to 100 hours.
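A request like the one just described could be assembled programmatically along the following lines. This is a sketch under stated assumptions, not the documented URL format: the base URL, the /<cohort>/<metric> path layout, the helper name build_request_url, and the interval parameter name are hypothetical. The cohort name, the metric handle, and the aggregator, time_series, and t variables are taken from the text.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; the real deployment address is not given here.
BASE_URL = "http://example.org/cohorts"


def build_request_url(cohort, metric, aggregator=None, time_series=False,
                      **metric_params):
    """Build a GET URL from a cohort, a metric handle, and request parameters."""
    # Assumed path layout: <base>/<cohort>/<metric>
    path = f"{BASE_URL}/{quote(cohort, safe='')}/{quote(metric, safe='')}"

    query = []
    if aggregator is not None:
        query.append(urlencode({"aggregator": aggregator}))
    if time_series:
        # The flag is signalled by presence alone, so it carries no value.
        query.append("time_series")
    if metric_params:
        # Sorted so the same parameters always produce the same URL.
        query.append(urlencode(sorted(metric_params.items())))

    return path + ("?" + "&".join(query) if query else "")


url = build_request_url(
    "e3_pef2_control",
    "threshold",
    aggregator="average",
    time_series=True,
    interval=24,  # hypothetical name for the 24-hour interval parameter
    t=100,        # threshold parameter from the text
)
print(url)
```

Sorting the metric parameters keeps the generated URL stable across calls, which helps when requests need to be reproduced or cached.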