Usertagging

What is usertagging
Usertagging is the equivalent of revtagging for user accounts: the ability to associate an arbitrary set of metadata (or tags) with a user_id at the time of account creation, for analytics purposes. Every account registration is stored in MediaWiki with a unique user_id in the user table; usertagging allows us to attach supplementary, schema-free data to that user_id.

What is it for
By default, MediaWiki stores a set of data about registered users in the user table. This data is needed to operate the website. Some of it (for example, the user registration date and edit count) is also used for analytics purposes, to study editor activity and retention. With several projects and editor engagement experiments generating new account registrations, we need to store a richer set of user metadata to be able to assess the impact of these projects. Only data that complies with the Wikimedia Foundation's privacy policy will be stored.

The focus of usertagging is primarily on storing metadata about account registrations, but it can in principle be extended to include supplementary data (such as participation in an experiment). The following are examples of user metadata that are not currently included in the user table and that we may need to capture as part of analytics for product experimentation and community outreach activities:


 * user 3456 signed up via the Article Feedback "signup" call-to-action
 * user 6789 was bucketed as a participant in experimental condition X of experiment Y
 * user 2468 created an account via the Global Education portal
 * user 1357 registered as part of an outreach event

What usertagging is not
Usertagging will not include user data that is already captured by MediaWiki (this needs to be expanded), nor any data that conflicts with our privacy policy.

Why do we need this
There are two primary use cases for usertagging:
 * pull from the database the user_ids that correspond to specific campaigns or projects;
 * analyze the long-term effects of specific campaigns or experimental treatments on editor engagement or retention.
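The first use case can be sketched in a few lines. This is a minimal in-memory sketch under stated assumptions, not the actual storage design: it assumes each tag is a record holding a user_id, a campaign, an optional subcampaign, and a free-form dictionary of additional data, mirroring the Campaign / Subcampaign / Additional data fields listed for each project below. The `UserTag` class, the `user_ids_for_campaign` function, the dictionary keys and user id 9012 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class UserTag:
    """Hypothetical usertag record: one piece of campaign metadata for one user_id."""

    user_id: int
    campaign: str
    subcampaign: Optional[str] = None
    # Schema-free extra data (e.g. an AFT post id); its keys vary per campaign.
    data: Dict[str, Any] = field(default_factory=dict)


def user_ids_for_campaign(
    tags: List[UserTag], campaign: str, subcampaign: Optional[str] = None
) -> List[int]:
    """Return the sorted, de-duplicated user_ids tagged with `campaign`,
    optionally restricted to a single `subcampaign`."""
    return sorted(
        {
            t.user_id
            for t in tags
            if t.campaign == campaign
            and (subcampaign is None or t.subcampaign == subcampaign)
        }
    )


# Illustrative tags only: user ids 3456, 2468 and 1357 come from the examples
# above; 9012 and all additional-data values are made up.
tags = [
    UserTag(3456, "AFT", "Signup-CTA", {"aft_post_id": 101}),
    UserTag(9012, "AFT", "Signup-CTA", {"aft_post_id": 202}),
    UserTag(2468, "GlobalEd"),
    UserTag(1357, "Outreach", data={"event": "editathon"}),
    UserTag(1357, "Outreach", data={"event": "second editathon"}),
]

print(user_ids_for_campaign(tags, "AFT"))  # [3456, 9012]
print(user_ids_for_campaign(tags, "Outreach"))  # [1357]
```

In practice this filter would presumably be a query over a database table rather than a Python loop; the sketch only illustrates the shape of a tag and the lookup by campaign. A user tagged twice for the same campaign is returned once.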

If the monthly editor metrics show a sudden increase in the number of active editors, we want to be able to determine whether a specific project, initiative or campaign is associated with a group of particularly active editors. Being able to identify cohorts of users by treatment will also allow us to compare them against a set of engagement metrics. This data could tell us, for example, that while group A and group B are indistinguishable in aggregate edit volume, they differ significantly in quality of work as measured by aggregate revert rates.
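A minimal sketch of the group A / group B comparison, assuming that each tagged user has already been joined to two counts: their number of edits and how many of those edits were later reverted. Here "aggregate revert rate" is taken to mean total reverted edits divided by total edits in the cohort, which is one possible definition among several. The `cohort_metrics` function and the numbers below are hypothetical.

```python
from collections import defaultdict


def cohort_metrics(rows):
    """Aggregate (cohort, edits, reverted) rows into per-cohort totals.

    Returns {cohort: (total_edits, revert_rate)}, where revert_rate is
    total reverted edits / total edits for that cohort (0.0 if no edits).
    """
    edits = defaultdict(int)
    reverted = defaultdict(int)
    for cohort, n_edits, n_reverted in rows:
        edits[cohort] += n_edits
        reverted[cohort] += n_reverted
    return {
        c: (edits[c], reverted[c] / edits[c] if edits[c] else 0.0)
        for c in edits
    }


# Made-up per-user rows: (cohort, edits, reverted edits).
rows = [
    ("A", 120, 6),
    ("A", 80, 4),
    ("B", 150, 45),
    ("B", 50, 15),
]
for cohort, (total, rate) in sorted(cohort_metrics(rows).items()):
    print(f"group {cohort}: {total} edits, revert rate {rate:.1%}")
# group A: 200 edits, revert rate 5.0%
# group B: 200 edits, revert rate 30.0%
```

Both groups have the same edit volume, yet group B's revert rate is six times group A's: the kind of difference that aggregate edit counts alone would hide.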

E3 experiments

 * Motivation: permanently associate user_ids with data about treatments they undergo
 * Required data:
   * Campaign: Experiment
   * Subcampaign: Bucket
 * Refs: E3

Calls to action

 * Motivation: tag users who registered as a result of a call to action (CTA), such as those used by Article Feedback
 * Required data:
   * Campaign: AFT
   * Subcampaign: Signup-CTA
   * Additional data:
     * AFT post id
     * AFT page
 * Refs: Article feedback CTAs

Global education

 * Motivation: measure the productivity of students participating in the Global Education Program by discipline, college, course, etc.
 * Required data:
   * Campaign: GlobalEd
   * Additional data: tbd
 * Refs: http://outreach.wikimedia.org/wiki/Wikipedia_Education_Program

Outreach

 * Motivation: measure activity of participants in outreach events such as editathons
 * Required data:
   * Campaign: Outreach
   * Additional data: tbd
 * Refs: http://outreach.wikimedia.org

Editor growth and contribution program

 * Motivation: permanently associate user_ids with data about their editing entry points

(Not sure if this is what is meant here.)
 * Required data:
   * Campaign: Contribution
   * Subcampaign: Project_Phase
   * Additional data:
     * Registration date
     * Referral URLs
 * Refs: Editor Growth and Contribution Program