Analytics/Wikimetrics/Help

About
Wikimetrics is a web application developed by the Wikimedia Foundation to measure user activity based on a set of standardized metrics. Using this system, a set of key metrics can be selected and applied to a cohort of users to measure their overall productivity. The system is designed for extensibility (creating new metrics, modifying metric parameters) and to support various types of cohort analysis and program evaluation in a user-friendly way. Reports are returned in both JSON and CSV format.

As of September 2013, Wikimetrics is used internally at the Wikimedia Foundation and by external customers, researchers, and community members. If you are interested in using Wikimetrics, you can try out the application here: http://metrics.wmflabs.org/.

Project and application home page
The project home page and Wikimetrics home page are here:

Project home page: https://www.mediawiki.org/wiki/Analytics/Wikimetrics

The project home page is the main developer hub for the project, hosting updates and resources for developers and end users.

Wikimetrics application home page: http://metrics.wmflabs.org/

The Wikimetrics home page is the public URL of the application. To use the application, log in using one of the supported services (e.g., a Google account). See Accessing Wikimetrics for more information.

Contributing to the project: code repository and bug reports
If you would like to contribute to the project, please see:

Code repository: http://git.wikimedia.org/summary/analytics%2Fwikimetrics.git (internal Git/Gerrit repository)
 * https://github.com/wikimedia/analytics-wikimetrics (mirror on github)

Bug reports: Bugs and feature suggestions should be reported via Bugzilla.

https://bugzilla.wikimedia.org/buglist.cgi?component=Wikimetrics&product=Analytics

Contact us
To reach us and/or obtain help, please join our mailing list.

Wikimetrics mailing list: https://lists.wikimedia.org/mailman/listinfo/wikimetrics

Rationale
The Wikimetrics application grew out of a need to study data collected via user tagging, which is used to identify groups of users (i.e., ‘cohorts’) so that they can be studied collectively: all subjects of an experiment, for example, or all users who created accounts at an outreach event. The Wikimetrics application lets us easily and efficiently generate reports about how a group of users behaves as a whole, such as how quickly a group of users became productive editors, or how likely the group is to remain active over time.

A second important aim of the project is to develop a standardized set of metrics that permits everyone in the organization to have the same understanding of what we mean when we say “an editor has been retained” or “an editor is active.” Because the Wikimetrics application uses a standardized set of metrics, the reports generated by the system can be used together—either within a project (to compare an experimental group to a control group, for example) or across the organization (to give an overall sense of the productivity of various efforts). The system is designed to be both flexible and extensible, so that existing metrics can be customized as needed, and new metrics can be added over time.

A third aim of the project is to provide an intuitive workflow that can be used by any internal team to access and analyze the data required to evaluate an initiative. Metrics can be easily retrieved via the application’s user-friendly form interface.

User Tagging
The Wikimetrics application is designed to leverage the information assigned and stored via user tags, which permit us to permanently associate an arbitrary set of metadata (e.g., “subject of e3 experiment”) with a registered user of a specific project (e.g., “enwiki”). Tags are associated with a userId at the time of account creation, or at the time a user undergoes a specific treatment or participates in a given initiative, and are stored in a repository where the Wikimetrics application can access them and use them to generate cohorts. Once a tag has been assigned, it cannot be removed or changed.
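The tag repository itself is not documented on this page, so the following is only a minimal Python sketch of the properties described above, under stated assumptions: the `UserTag` record and `TagStore` class are hypothetical, not the real UserTag repository schema. Each record ties one tag to a user ID on one project, records are frozen, and the store offers no way to delete or edit them.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class UserTag:
    """One tag assignment; frozen so its fields cannot change after creation."""
    user_id: int
    project: str  # e.g. "enwiki"
    tag: str      # e.g. "e3_experimental_group"


class TagStore:
    """Hypothetical append-only store: tags can be added and read, never removed."""

    def __init__(self) -> None:
        self._tags: list[UserTag] = []

    def assign(self, user_id: int, project: str, tag: str) -> UserTag:
        record = UserTag(user_id, project, tag)
        if record not in self._tags:  # assigning the same tag twice is a no-op
            self._tags.append(record)
        return record

    def tags_for(self, user_id: int, project: str) -> list[str]:
        """All tags attached to one user on one project."""
        return [t.tag for t in self._tags
                if t.user_id == user_id and t.project == project]

    def __iter__(self) -> Iterator[UserTag]:
        # Iterate over a snapshot so callers cannot mutate the internal list.
        return iter(tuple(self._tags))
```

Attempting `record.tag = "other"` on a stored record raises `dataclasses.FrozenInstanceError`, mirroring the rule that a tag cannot be changed once assigned.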

User tags can represent any number of user attributes. Tags can identify users as experimental subjects, or users who have created accounts in response to either calls to action or outreach events, or users who are part of a specific program (e.g., Global Education). User tags do not reflect any data that conflicts with our privacy policy.

For more information about user tagging, please see: http://www.mediawiki.org/wiki/Usertagging.

Metrics Standardization
Standardized metrics can be applied to any Wikimedia project to help evaluate the impact of initiatives in an unambiguous and consistent way. The set of metrics used by the Wikimetrics application can be used at the project level to measure the success of an experimental treatment or outreach initiative, or at the organizational level to compare the impact of projects across the organization. In each case, the qualities of interest (user retention, or user contribution in terms of quality, quantity, and type) are clearly defined and measured consistently, so that everyone can see what the numbers mean.

For more background information, please see: http://meta.wikimedia.org/wiki/Research:Metrics

Workflow
The Wikimetrics application is designed to streamline the process of obtaining and analyzing the data needed to evaluate projects and initiatives. Any authorized Wikimetrics user can use the system to generate reports, using cohorts she or he created. Broadly, the workflow can be described as follows:


1) Define cohorts. Cohorts can be defined by specifying custom lists of usernames/userIds. Examples of cohorts:
 * Users in the E3 experimental group
 * Students enrolled in a Global Education class
 * VisualEditor adopters
 * New users registered on mobile devices
2) Measure the quality, productivity, or retention of these cohorts via a standard set of metrics:
 * revert rate: proportion of reverted edits within 24 hours of registration
 * threshold: reached if a user makes 1 edit to the main namespace within 24 hours of registration
 * blocks: number of times the user was blocked within 24 hours of registration
 * … or other metrics
3) Compare cohorts against each other or against a baseline.
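The measure and compare steps above can be illustrated for the revert rate metric with a short Python sketch. It is hypothetical, not the implementation Wikimetrics uses: edits are assumed to arrive as `(timestamp, was_reverted)` pairs, and “within 24 hours of registration” is read as referring to when the edit was made.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

WINDOW = timedelta(hours=24)  # assumed measurement window after registration


def revert_rate(registered: datetime,
                edits: list[tuple[datetime, bool]]) -> Optional[float]:
    """Share of edits made within 24h of registration that were reverted.

    Each edit is (timestamp, was_reverted). Returns None when the user made
    no edits in the window, because the proportion is then undefined.
    """
    in_window = [reverted for ts, reverted in edits
                 if registered <= ts < registered + WINDOW]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)  # True counts as 1, False as 0


def cohort_mean(values: list[Optional[float]]) -> Optional[float]:
    """Average over users with a defined value; None if no user qualifies."""
    defined = [v for v in values if v is not None]
    return mean(defined) if defined else None
```

To compare an experimental cohort with a control cohort, one would compute `revert_rate` for every member of each cohort and set the two `cohort_mean` results side by side. Users with no edits in the window get `None` and are left out of the average rather than counted as 0.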

Cohort
A cohort is a set of users sharing one or more properties or attributes, such as the time of account creation, or participation in an outreach event or experimental group. The users in a cohort can belong to the same wiki project or to different projects (enwiki, arwiki, etc.). Examples of useful cohorts might be Wikipedia editors who participated in an outreach event, Wikimedia Commons users who are also active on other wikis, or users who underwent a particular treatment.

The Wikimetrics application generates cohorts based on user tag information. Each cohort is identified by a single user tag (e.g., “e3_experimental_group”). At this time, all cohorts are private; if you create a cohort by uploading a list of users via the application, for example, only you will have access to that information.
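The two rules just stated, that a cohort is identified by a single user tag and that it is private to its creator, can be sketched in Python as follows. The `Cohort` class, the `cohort_from_tag` function, and the `(project, user_id, tag)` row format are hypothetical, chosen only for illustration; members are stored as `(project, user_id)` pairs because a cohort may span several wikis.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    name: str   # the single tag identifying the cohort, e.g. "e3_experimental_group"
    owner: str  # the Wikimetrics user who created it
    members: frozenset = field(default_factory=frozenset)  # (project, user_id) pairs

    def members_for(self, requester: str) -> frozenset:
        """All cohorts are private: only the owner may read the member list."""
        if requester != self.owner:
            raise PermissionError(f"cohort {self.name!r} is private")
        return self.members


def cohort_from_tag(tag: str, owner: str,
                    rows: list[tuple[str, int, str]]) -> Cohort:
    """Build a cohort from (project, user_id, tag) rows that carry `tag`."""
    members = frozenset((project, user_id)
                        for project, user_id, t in rows if t == tag)
    return Cohort(name=tag, owner=owner, members=members)
```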

Metric
Metrics are well-defined values or sets of values that can be computed for any user registered in Wikimedia projects, and are typically used in aggregate to compare different user groups (i.e., cohorts) against each other. The metrics computed by the Wikimetrics application help us understand user activity and behavior, from the quality, quantity, and type of user contributions to how well our editors are retained. For example, we could look at the value of the “bytes_added” metric to see how many bytes of content a student has added to a given wiki in the last week, but if we are interested in evaluating the success of her class, we would more likely look at the number of bytes added by the entire class (i.e., the “enwiki_editing_class” cohort). In this case, the bytes_added metric is used to help determine whether the class is successful. We could look at additional metrics to provide a fuller picture: the revert rate of student edits, for example, or the survival rate of users in the student cohort. We can’t directly measure the class’s “success,” but we can measure a number of more concrete quantities that help us assess it and compare it with other classes or similar initiatives.
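The bytes_added example can be made concrete with a small Python sketch that computes the metric for each student and then totals it for the cohort. The `Revision` record and the function names are hypothetical, and counting only positive size changes is an assumption made here; the actual Wikimetrics definition of bytes_added may differ.

```python
from datetime import datetime
from typing import NamedTuple


class Revision(NamedTuple):
    user: str
    timestamp: datetime
    byte_delta: int  # page size after the edit minus page size before it


def bytes_added(revisions: list[Revision], user: str,
                start: datetime, end: datetime) -> int:
    """Sum of positive size changes made by `user` in [start, end)."""
    return sum(r.byte_delta for r in revisions
               if r.user == user and start <= r.timestamp < end
               and r.byte_delta > 0)


def cohort_bytes_added(revisions: list[Revision], cohort: list[str],
                       start: datetime, end: datetime
                       ) -> tuple[dict[str, int], int]:
    """Per-user values for the cohort plus their total."""
    per_user = {u: bytes_added(revisions, u, start, end) for u in cohort}
    return per_user, sum(per_user.values())
```

Keeping the per-user values alongside the total makes it possible to look at individual students and at the class as a whole from the same computation.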

All metrics are standardized and clearly defined so that we can easily understand what their values mean and consistently use the same standards to evaluate the efficacy of programs and initiatives over time. Note that metrics depend on the context in which they are measured and only make sense within that context. An editor with a high revert rate could be a vandal, or an advanced user removing vandalized text; for our class of new enwiki users, a high revert rate more likely indicates vandalism. The value of a metric returned for each user may be defined (e.g., “true” to indicate that a user reached a threshold of 1 edit in her first 24 hours of account activity) or undefined, which is the case if the user has not yet been registered for a full 24 hours and we do not yet know whether she will reach the threshold. Defined values may be of different types: Boolean (a true or false value indicating whether a threshold has been reached, for example), integer (e.g., edit count), or float (e.g., proportion of reverted edits to total edits).
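A threshold metric that can return an undefined value might look like the following Python sketch, where `None` stands for “undefined”. The function name and its parameters are hypothetical; the defaults (1 edit, 24 hours) follow the example above, and the sketch assumes the caller passes only the timestamps of main-namespace edits.

```python
from datetime import datetime, timedelta
from typing import Optional


def threshold(registered: datetime, main_ns_edit_times: list[datetime],
              now: datetime, n_edits: int = 1,
              window: timedelta = timedelta(hours=24)) -> Optional[bool]:
    """True/False once the outcome is known; None while it is undefined."""
    end = registered + window
    count = sum(1 for ts in main_ns_edit_times if registered <= ts < end)
    if count >= n_edits:
        return True   # reached; later activity cannot undo this
    if now < end:
        return None   # window still open and not yet reached: undefined
    return False      # window closed without reaching the threshold
```

Once the threshold is reached the result is `True` even while the window is still open, since later activity cannot change that outcome; `False` is only returned after the window has closed.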

The value of a metric may change over time. As a user makes additional edits, for example, the size of his contribution changes, and the values of metrics such as ‘bytes_added’ change accordingly. However, once the time period over which a given metric is defined has elapsed (e.g., the first week after registration), the metric should always return the same value. The set of metrics supported by the Wikimetrics application is in no way exhaustive. The system has been designed to be easily extensible, so that new metrics can be added and parameterized in different ways. New metrics are easy to implement if you develop in Python; if you don't, but can show that a new type of measurement would be useful, either the analytics team or community members are likely to help you implement it. To contribute code, please have a look at our repository.
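One way to get the stability and extensibility described above is to give each metric a fixed, parameterized time window and to register metric classes by name. The Python sketch below assumes a hypothetical `Metric` base class, `register` decorator, and `edit_count` metric; it is not the Wikimetrics plugin interface.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta

# Hypothetical registry mapping metric names to metric classes.
METRICS: dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that makes a metric selectable by its name."""
    METRICS[cls.name] = cls
    return cls


class Metric(ABC):
    name = "metric"

    def __init__(self, window: timedelta = timedelta(days=7)) -> None:
        # Parameter: how long after registration the metric looks.
        self.window = window

    @abstractmethod
    def compute(self, registered: datetime, edit_times: list[datetime]):
        """Return this metric's value for a single user."""


@register
class EditCount(Metric):
    name = "edit_count"

    def compute(self, registered: datetime, edit_times: list[datetime]) -> int:
        end = registered + self.window
        # Only edits inside the fixed window are counted, so the value
        # stops changing once the window has elapsed.
        return sum(1 for ts in edit_times if registered <= ts < end)
```

Because `EditCount` only counts edits inside its window, an edit made 30 days after registration does not change a 7-day value, and passing `window=timedelta(days=2)` parameterizes the same metric differently. Adding a metric amounts to writing another decorated subclass.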

Reports
The Wikimetrics application returns information in the form of a report. A report contains the values of a selected metric for a specified cohort, as well as the settings used to generate the data. For example, a report might contain the number of new pages created by each member of a cohort over a two-week period. The names of the cohort and the metric, as well as the start and end dates of the time interval, are included with the retrieved information.

Reports are available as JSON or CSV files. They are kept for thirty days after generation and can be accessed from the reports page (you must be logged in). If you would like to keep a report for longer than thirty days, please download and save it.
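A report bundles per-user metric values with the settings that produced them. The Python sketch below serializes such a report to JSON and CSV using only the standard library and computes the thirty-day expiry date; the field names and the CSV column layout are assumptions, not the actual Wikimetrics export format.

```python
import csv
import io
import json
from datetime import date, timedelta

RETENTION = timedelta(days=30)  # reports are kept for thirty days
SETTINGS = ("cohort", "metric", "start_date", "end_date")


def build_report(cohort: str, metric: str, start: date, end: date,
                 values: dict[str, int]) -> dict:
    """Bundle per-user values with the settings used to produce them."""
    return {"cohort": cohort, "metric": metric,
            "start_date": start.isoformat(), "end_date": end.isoformat(),
            "results": values}


def to_json(report: dict) -> str:
    return json.dumps(report, indent=2)


def to_csv(report: dict) -> str:
    """One row per user, repeating the settings on every row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["user", *SETTINGS, "value"])
    for user, value in report["results"].items():
        writer.writerow([user, *(report[k] for k in SETTINGS), value])
    return buf.getvalue()


def expires_on(generated: date) -> date:
    """Last day the report is available; download it before then."""
    return generated + RETENTION
```

Repeating the cohort, metric, and dates on every CSV row keeps each row self-describing, so the file still makes sense after being combined with other reports.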

Technical overview
To come.

The UserTag repository
To come.

The Wikimetrics application
To come.