Wikimedia Product/Analytics Infrastructure/Experiment Platform

This effectively describes the Sampling Controller. When a client queries the configuration API for sampling rates, it should send some minimally identifying information about itself (and the user) and receive a determination of whether it is enrolled in an online experiment (A/B test). On the backend, Product/Program Managers and Analysts define a target population to sample and the target platforms. Additionally, the experiment design may require that an editor receive the same treatment on multiple platforms (any device they're logged in on).

The following are possible use cases:


 * Cross-platform intervention on a cohort of newly registered editors who have made fewer than 3 edits in the first month since creating their account
 * "Advertising" existence of Hindi & Marathi Wikipedias to Hindi & Marathi speakers (based on system/browser language settings) in India who are browsing English Wikipedia
 * Assessing how a change to how Chinese characters are displayed in a mobile app affects reading length/depth

To that end, we should consider having clients include the following information when requesting configuration settings, in addition to the "platform" & "client ID" that the client would send anyway:


 * USER: If logged in: {wiki, user ID} or username
 * REGION: Geographic region (at state level?)
 * LANGS: Ordered languages (top N?)
 * STREAMS (maybe?): List of streams the client can send events to (that Stream Manager has registered at initialization)
 * CONFIG (maybe?): Configuration hash for versioning
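As a sketch, a client request carrying the fields above might look like the following. The function and field names (`user`, `region`, `langs`, `streams`, `config`) are illustrative assumptions, not a settled schema:

```python
# Illustrative payload a client might send to the configuration API.
# Field names are assumptions, not a settled schema.

def build_config_request(platform, client_id, user=None, region=None,
                         langs=None, streams=None, config_hash=None):
    """Assemble the settings-request payload; optional fields are omitted
    when the client has nothing to report (e.g. logged-out users)."""
    payload = {"platform": platform, "client_id": client_id}
    if user is not None:
        payload["user"] = user           # {wiki, user ID} or username
    if region is not None:
        payload["region"] = region       # geographic region (state level?)
    if langs is not None:
        payload["langs"] = langs[:3]     # top-N ordered languages (N=3 here)
    if streams is not None:
        payload["streams"] = streams     # streams registered with Stream Manager
    if config_hash is not None:
        payload["config"] = config_hash  # configuration hash for versioning
    return payload

request = build_config_request(
    platform="android",
    client_id="b0caf18d-93b6-4a47-b6c7-92f5d3a2e7c9",
    user={"wiki": "hiwiki", "user_id": 12345},
    region="IN-MH",
    langs=["hi", "mr", "en"],
    streams=["stream1", "stream4"],
)
```

Keeping the optional fields truly optional matters for the logged-out case, where only "platform" and "client ID" are available anyway.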

The backend uses this data to check whether the client is/should be enrolled in any experiments. Potential ideas for responses:


 * A single sampling rate (expressed as a probability between 0 and 1) which is used for every stream in STREAMS
 * A list of per-stream sampling rates (expressed as probabilities), one per stream in STREAMS
 * Optionally, the sampling controller responds with the group assignment. If the user is already enrolled in, say, a cross-platform experiment, the tag is recalled from the database. If the user is not yet in an experiment but is on a platform with an active experiment, the sampling controller performs a random enrollment roll; if that roll enrolls the client, it follows up with a random group-assignment roll and saves the {client_id, experiment_tag} pair in the database. If enrolled, the backend responds with the following experiment information:
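A minimal sketch of that enrollment flow, assuming uniform random rolls and using an in-memory dict as a stand-in for the enrollment database (names are hypothetical):

```python
import random

# Hypothetical stand-in for the enrollment database:
# maps client_id -> experiment tag ("<experiment>/<group>").
enrollments = {}

def assign(client_id, experiment, enrollment_rate, groups, rng=random):
    """Return the client's experiment tag, enrolling it if necessary.

    enrollment_rate: probability of entering the experiment at all.
    groups: candidate group names, assigned uniformly once enrolled.
    Returns None if the client is not (and does not become) enrolled.
    """
    # Already-enrolled clients get their stored tag back (cross-platform
    # consistency: same client, same treatment on every device).
    if client_id in enrollments:
        return enrollments[client_id]
    # Enrollment roll.
    if rng.random() >= enrollment_rate:
        return None
    # Group-assignment roll, then persist the {client_id, experiment_tag} pair.
    tag = f"{experiment}/{rng.choice(groups)}"
    enrollments[client_id] = tag
    return tag
```

In production the rolls would presumably be seeded or hashed from the client ID so that repeated requests are deterministic, and the store would be a real database rather than a dict; this only illustrates the recall-then-roll control flow described above.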

{
  "tag": "growth/some_experiment/some_group",
  "start_dt": "2019-05-01T00:00:01Z",
  "end_dt": "2019-05-14T23:59:59Z",
  "streams": ["stream1", "stream4"]
}

This is remembered by the client (to the extent possible), especially the experiment's expiration date. Affected streams are tagged while the experiment is active, and the UI/UX is configured appropriately for the duration of the experiment. Once the end_dt is reached, the relevant UI/UX should return to the default configuration and any events sent to the affected streams cease to be tagged as belonging to an experimental group.
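On the client, honoring the expiration could reduce to a check like the following before tagging each event. This is a sketch, assuming the response shape in the example above; the function name is hypothetical:

```python
from datetime import datetime, timezone

def experiment_tag_for(event_stream, experiment, now=None):
    """Return the tag to attach to an event on `event_stream`, or None if
    the experiment is absent, expired, or does not cover that stream."""
    if experiment is None or event_stream not in experiment["streams"]:
        return None
    now = now or datetime.now(timezone.utc)
    start = datetime.fromisoformat(experiment["start_dt"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(experiment["end_dt"].replace("Z", "+00:00"))
    # Outside [start_dt, end_dt] the client reverts to default behavior
    # and stops tagging events as belonging to an experimental group.
    return experiment["tag"] if start <= now <= end else None
```

A real Stream Manager integration would call this (or its equivalent) at event-send time rather than trusting state cached at enrollment, so that expiry takes effect even if the client never re-contacts the configuration API.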