UserMetrics/Guide

About
UserMetrics is a platform developed by the Wikimedia Foundation to measure user activity based on a set of standardized metrics. Using this platform, a set of key metrics can be selected and applied to a cohort of users to measure their overall productivity. The platform is designed for extensibility (creating new metrics, modifying metric parameters) and to support various types of cohort analysis and program evaluation in a user-friendly way. It accepts requests via a RESTful API and returns responses in JSON format.
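As an illustration of the request/response pattern, here is a minimal Python sketch of building a request URL and parsing a JSON response. The URL path layout and the response fields shown are assumptions for the example, not taken from the API specification.

```python
import json

# Hypothetical example: the endpoint path layout and response fields below
# are illustrative assumptions, not the documented UserMetrics API schema.
BASE_URL = "http://metrics.wikimedia.org"

def build_request_url(cohort, metric):
    """Build a request URL for a cohort/metric pair (path layout assumed)."""
    return f"{BASE_URL}/cohorts/{cohort}/{metric}"

# A response might map user IDs to metric values (sample data, made up).
sample_response = (
    '{"cohort": "e3_experimental_group", "metric": "bytes_added",'
    ' "data": {"1001": 2048, "1002": 512}}'
)

parsed = json.loads(sample_response)
total_bytes = sum(parsed["data"].values())  # aggregate over the cohort
```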

As of May 2013, the UserMetrics API is used internally at the Wikimedia Foundation by the Editor Engagement, Global Education and Grantmaking programs. The scope of the project is being extended to include external customers, researchers, and community members. If you are interested in using the UserMetrics API, please contact [mailto:usermetrics@wikimedia.org usermetrics@wikimedia.org].

Project home page and API home
The project home page and the UserMetrics API home page are:

Project home page: http://mediawiki.org/wiki/UserMetrics

The project home page is the main developer hub for the project, hosting updates and resources for developers and end users.

UserMetrics API home page: http://metrics.wikimedia.org

The UserMetrics API home page is the public URL of the API. Access to the API is currently restricted to internal users and early testers. To obtain credentials, please contact us at [mailto:usermetrics@wikimedia.org usermetrics@wikimedia.org].

Additional information: code repository and bug reports
If you would like additional information about the project, please see:

Code repository: https://github.com/wikimedia/user_metrics

Bug reports: Bugs and feature suggestions should be reported via Bugzilla: https://bugzilla.wikimedia.org/buglist.cgi?component=User%20Metrics&product=Analytics

Contact us
To reach us and/or obtain help, please write to: [mailto:usermetrics@wikimedia.org usermetrics@wikimedia.org].

To receive updates about any UserMetrics service or data quality issues, please join the umapi-alerts list.

Rationale
The UserMetrics API grew out of a need to study data collected via user tagging (see User Tagging below), which is used to identify groups of users so that they can be studied collectively: all subjects of an experiment, for example, or all users who created accounts at an outreach event. The API permits us to easily and efficiently generate reports that provide information about how groups of users behave, for example, how quickly a group of users became productive editors, or how likely a group is to remain active over time.

A second important aim of the project is to develop a standardized set of metrics (see Metrics Standardization below) that permits everyone in the organization to have the same understanding of what we mean when we say “an editor has been retained” or “an editor is active.” Because the UserMetrics API uses a standardized set of metrics, the reports generated by the system can be used together—either within a project (to compare an experimental group to a control group, for example) or across the organization (to give an overall sense of the productivity of various efforts). The system is designed to be both flexible and extensible, so that existing metrics can be customized as needed, and new metrics can be added over time.

A third aim of the project is to provide an intuitive workflow (see Workflow below) that can be used by any internal team to access and analyze the data required to evaluate an initiative. Metrics can be easily retrieved from the UserMetrics API home page, or via a client that can automatically generate reports based on the metrics of interest.

User Tagging
The UserMetrics API is designed to leverage the information assigned and stored via userTags, which permit us to permanently associate an arbitrary set of metadata (e.g., “subject of e3 experiment”) with a registered user of a specific project (e.g., "enwiki"). Tags are associated with a userId at the time of account creation, or at the time a user undergoes a specific treatment or participates in a given initiative, and are stored in a repository where they can be accessed by the UserMetrics API and used to generate cohorts. Once a tag has been assigned, it cannot be removed or changed.

UserTags can represent any number of user attributes. Tags can identify users as experimental subjects, or users who have created accounts in response to either calls to action or outreach events, or users who are part of a specific program (e.g., Global Education). UserTags do not reflect data that is already captured by MediaWiki, nor any data that conflicts with our privacy policy.

For more information about userTagging, please see: http://www.mediawiki.org/wiki/Usertagging.

Metrics Standardization
Standardized metrics can be applied to any Wikimedia project to help evaluate the impact of initiatives in an unambiguous and consistent way. The set of metrics used by the UserMetrics API can be used at the project level to measure the success of an experimental treatment or outreach initiative, or at the organizational level to compare the impact of projects across the organization. In each case, the qualities of interest (user retention or user contribution in quality, quantity, and type) are clearly defined and measured consistently, so that all users can see what the numbers mean.

For more background information, please see: http://meta.wikimedia.org/wiki/Research:Metrics

Workflow
The UserMetrics API is designed to streamline the process of obtaining and analyzing the data needed to evaluate projects and initiatives. Any authorized UserMetrics user can access the API to generate reports, subject to the permissions associated with her account. Broadly, the workflow can be described as follows:


1. Define cohorts. Cohorts can be defined by specifying custom lists of usernames/userIds, or by selecting and combining existing userTags. Examples of cohorts: users in an E3 experimental group, students enrolled in a Global Education class, VisualEditor adopters, new users registered on mobile devices.
2. Measure the quality, productivity, or retention of these cohorts via a standard set of metrics, such as:
   * revert rate: the proportion of a user's edits reverted within 24 hours of registration
   * threshold: reached if a user makes 1 edit to the main namespace within 24 hours of registration
   * blocks: the number of times a user is blocked within 24 hours of registration
   * ... or other metrics
3. Compare cohorts against each other or against a baseline.
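The steps above can be sketched in Python. The cohort data, user IDs, and the revert-rate helper below are illustrative assumptions for the example, not the API's actual implementation.

```python
# Sketch of the workflow: define cohorts, compute a standard metric
# for each, and compare. All names and data here are made up.

def revert_rate(edits):
    """Proportion of a user's edits that were reverted;
    None (undefined) if the user has no edits yet."""
    if not edits:
        return None
    return sum(1 for e in edits if e["reverted"]) / len(edits)

# Step 1: define two cohorts as per-user edit histories keyed by userId.
experimental = {101: [{"reverted": False}, {"reverted": True}],
                102: [{"reverted": False}]}
control = {201: [{"reverted": True}, {"reverted": True}]}

# Step 2: measure the metric over each cohort.
def cohort_revert_rate(cohort):
    """Mean revert rate over a cohort, skipping undefined (None) values."""
    rates = [r for r in (revert_rate(e) for e in cohort.values())
             if r is not None]
    return sum(rates) / len(rates)

# Step 3: compare the cohorts against each other.
difference = cohort_revert_rate(control) - cohort_revert_rate(experimental)
```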

Cohort
A cohort is a set of users sharing one or more properties or attributes: the time of account creation, for example, or participation in an outreach event or experimental group.

The UserMetricsAPI generates cohorts based on userTag information. At its most basic, a cohort can be identified by a single userTag (e.g., “e3_experimental_group”). Cohorts can also be generated from a combination of multiple tags (“e3_experimental_group” and “e3_control_group”). Tags are combined using Boolean operators to reflect either the union or intersection of the groups. For more information, see multiple-tag cohorts.
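The Boolean combination of tags can be illustrated with Python set operations. The tag names and user IDs below are made up for the example.

```python
# Illustrative sketch: combining userTags with Boolean operators to form
# a cohort. Each tag maps to the set of userIds carrying it (sample data).
tag_members = {
    "e3_experimental_group": {101, 102, 103},
    "e3_control_group": {103, 104},
}

# Union: users carrying either tag.
combined = tag_members["e3_experimental_group"] | tag_members["e3_control_group"]

# Intersection: users carrying both tags.
overlap = tag_members["e3_experimental_group"] & tag_members["e3_control_group"]
```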

Metric
Metrics are well-defined values or sets of values that can be computed for any user registered in Wikimedia projects, and are typically used in aggregate to compare different user groups (i.e., cohorts) against each other. The metrics computed by the UserMetrics API help us understand user activity and behavior: from the quality, quantity, and type of user contribution, to how well our editors are retained. For example, we could look at the value of the “bytes_added” metric to see how many bytes of content a student has added to a given wiki in the last week, but if we are interested in evaluating the success of her class, we would more likely look at the number of bytes added by the entire class (i.e., the “enwiki_editing_class” cohort). In this case, the bytes_added metric is used to help determine whether the class is successful. We could look at additional metrics to provide a fuller picture: the revert rate of student edits, for example, or the survival rate of users in the student cohort. We can’t directly measure the class’s “success,” but we can measure a number of more concrete quantities that help us determine it and compare it with other classes or similar initiatives.

All metrics are standardized and clearly defined so that we can easily understand what their values mean and consistently use the same standards to evaluate the efficacy of programs and initiatives over time. Note that metrics are dependent on the context in which they are measured and therefore only make sense in those contexts. An editor with a high revert rate could be a vandal, or an advanced user removing vandalized text. In the case of our class of new enwiki users, a high revert rate more likely indicates vandalism.

The value of a metric returned for each user may be defined (e.g., “true” to indicate that a user reached a threshold of 1 edit in her first 24 hours of account activity) or undefined, which would be the case if a user has not yet been active for a full 24 hours and we do not yet know whether she will reach the threshold. Defined values may be of different types: Boolean (a true or false value indicating whether a threshold has been reached, for example), integer (e.g., edit count), or float (e.g., the proportion of reverted edits to total edits). The value of a metric may change over time. As a user makes additional edits, for example, the size of her contribution changes, and the value of metrics such as ‘bytes_added’ will change accordingly. However, once the time over which a given metric is defined has elapsed (e.g., the first week after registration), the metric should always return the same value.
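The defined/undefined semantics can be sketched as follows. This function and its window logic are an illustrative assumption, not the API's actual implementation.

```python
from datetime import datetime, timedelta
from typing import Optional

def threshold(registered_at: datetime,
              first_edit_at: Optional[datetime],
              now: datetime) -> Optional[bool]:
    """Sketch (not the API's implementation) of a metric value that is
    Boolean once defined, and None while still undefined."""
    window_end = registered_at + timedelta(hours=24)
    if first_edit_at is not None and first_edit_at <= window_end:
        return True      # reached the threshold within the first 24 hours
    if now < window_end:
        return None      # window still open: the value is not yet defined
    return False         # window closed with no qualifying edit
```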

The set of metrics supported by the UserMetricsAPI is in no way exhaustive. The system has been designed to be easily extensible, so that new metrics can be added and parameterized in different ways.

The backend
The backend of the UserMetrics API consists of two main pieces: the UserTag repository, which contains information about each userTag and the users to which it has been applied, and the UserMetrics engine, which receives metric request URLs and returns metric data as JSON objects.

The UserTag repository
The UserTag repository contains information about each user tag and the users to which it has been applied. In addition, the repository stores the name of the team using the tag, as well as the name of the cohort owner (the person responsible for maintaining cohort membership and keeping cohort information up-to-date). The repository consists of four MySQL tables (usertags_meta, ut_tags, api_user, and api_group). Because these tables may contain sensitive information, they are not publicly accessible. If you need access to specific information stored in the repository, please e-mail [mailto:usermetrics@wikimedia.org usermetrics@wikimedia.org].

The definition of each usertag, as well as other relevant metadata, is stored in the usertags_meta table.

Note that each tag has a unique ID (e.g., 64) and a human-readable name (e.g., e3_ob4a). All tags are project-specific (e.g., ‘enwiki’). Currently, each usertag is applied to only one project; in the future, tags may be applied to several projects, in which case the value of the ‘utm_project’ field would contain the names of all relevant projects. The usertags_meta table also contains a description of each tag and numerical codes representing the owner and the team using the tag. Owner and team names can be looked up in the api_user and api_group tables, respectively. The ‘utm_touched’ column represents the date the tag was most recently applied to a user. This information—the IDs of users associated with each tag—is stored in the ut_tags table. Finally, the ‘utm_enabled’ value indicates whether or not the tag is current. A value of ‘1’ indicates that the tag is relevant and should be included in the UserMetrics API application (where it can be selected from a menu and used to define a cohort). A value of ‘0’ indicates that the tag is no longer relevant and should be archived. All tags—both current and archived—remain in the usertags_meta table.

The ut_tags table contains the name of the project (e.g., ‘enwiki’ or ‘arwiki’) and the userIds of the users associated with each tag. Tags are identified by the unique numeric identifier defined in the usertags_meta table, where further information about each tag can be found.
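The relationship between the two tables can be sketched in Python. The field names follow the description above; the sample rows and values are made up, and the lookup is an illustrative join, not the repository's actual query logic.

```python
# Illustrative sketch of the repository layout: one usertags_meta row per
# tag, and one ut_tags row per (tag, project, user). Sample data is made up.
usertags_meta = [
    {"utm_id": 64, "utm_name": "e3_ob4a", "utm_project": "enwiki",
     "utm_enabled": 1},
]
ut_tags = [
    {"utm_id": 64, "project": "enwiki", "user_id": 1001},
    {"utm_id": 64, "project": "enwiki", "user_id": 1002},
]

def members(tag_name):
    """Resolve an enabled tag name to the userIds carrying it
    (a simple join between the two tables)."""
    ids = {m["utm_id"] for m in usertags_meta
           if m["utm_name"] == tag_name and m["utm_enabled"] == 1}
    return {row["user_id"] for row in ut_tags if row["utm_id"] in ids}
```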

The api_user and api_group tables identify the groups and individuals using the repository. Each group (e.g., ‘e3’ or ‘mobile’) is assigned a unique numerical identifier.