Wikimedia Product/Analytics Infrastructure/Experiment Platform

= Wishlist =

 * Access to metadata about ongoing experiments across the Foundation, to avoid conflicts
 * Access to previous experiments and their outcomes (including decisions made based on the results), for setting baselines and expectations about possible changes
 * A global queue of experiments to avoid clashes, i.e. users receiving multiple interventions/treatments at once, which distorts the results
 * Experiment targeting (specific users satisfying conditions/criteria)
 ** E.g. users who have registered with a verified email address
 * Power analysis tools for determining sample sizes (see the sample-size sketch after this list)
 * Automatic determination of experiment duration, based on data about new user registrations and the numbers of existing registered editors, anonymous editors, and readers or active users (in the mobile-apps sense)
 * Predictive models for determining who is most likely to benefit from an intervention, to inform targeting (e.g. detecting that a new editor is on a “likely to drop out (stop editing)” trajectory and then putting them into the treatment group); see the propensity-model sketch after this list
 * Defining QA metrics and tools for implementing them (e.g. no sudden biases in sampling, such as IE users not being represented)
 * Server-side sampling based on, for example, editing activity, and then recording whether someone is in an experiment
 * Different randomization strategies, e.g. unequal weights across groups, such as a treatment group larger than the control group (see the bucketing sketch after this list)
 * Bandit optimization (see the Thompson-sampling sketch after this list)
 * Bayesian optimization
 * Dashboards for monitoring test results
 * Automated report generation; it is then up to the data/product analyst to interpret the results for the product manager
 * Library of success metrics, their definitions, and implementations
 * Teams have KPIs, so when they deploy experiments they can specify which KPI is expected to be impacted
 * POTENTIALLY we may want to use third-party tools like Optimizely
 * Require an experiment design document or intake form that includes questions on measurement and targeting/sampling
 * Cohorts of users (by week, by wiki)
 * Built-in multilevel modeling for cross-wiki, cross-cohort experiments (see the mixed-model sketch after this list)
 * Sequential testing (cf. the New Stats Engine whitepaper from Optimizely; see also the mSPRT sketch after this list), which covers:
 ** why you won’t need to set a sample size in advance
 ** how Stats Engine enables you to confidently check experiments as often as you want
 ** how you can test as many variations and goals as you want without worrying about hidden sources of error
 * Exporting a dataset of group assignments
 ** Also IMPORTING a dataset of group assignments
 * Feed output of one experiment into another experiment???
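
A few of the items above lend themselves to quick sketches. For power analysis and experiment duration, the snippet below uses statsmodels to solve for the per-group sample size needed to detect a given standardized effect, then converts that into a run time. The effect size, alpha, power, and registrations-per-day figures are placeholders, not real Wikimedia numbers.

<syntaxhighlight lang="python">
# A minimal power-analysis sketch (assumed inputs, not real Wikimedia figures).
import math

from statsmodels.stats.power import NormalIndPower

# Smallest standardized effect (Cohen's d) we would care to detect.
effect_size = 0.05
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,          # false-positive rate
    power=0.8,           # 1 - false-negative rate
    ratio=1.0,           # control and treatment the same size
    alternative="two-sided",
)

# Turn the sample size into a duration, given an assumed enrollment rate.
new_registrations_per_day = 2000  # hypothetical figure
days = math.ceil(2 * n_per_group / new_registrations_per_day)
print(f"{math.ceil(n_per_group)} users per group, ~{days} days to enroll")
</syntaxhighlight>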
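
The predictive-targeting item could be prototyped as an ordinary propensity classifier: train on historical editor features, score current editors, and put those above a dropout-risk threshold into the treatment pool. Everything below (features, labels, threshold, data) is simulated and hypothetical.

<syntaxhighlight lang="python">
# A hypothetical dropout-propensity sketch with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in features per editor (e.g. edits in first week, days since last
# edit, talk-page messages received). Label: 1 = stopped editing soon after.
X_train = rng.normal(size=(500, 3))
y_train = rng.integers(0, 2, size=500)

model = LogisticRegression().fit(X_train, y_train)

# Score current editors; high-risk ones become candidates for treatment.
X_current = rng.normal(size=(10, 3))
dropout_risk = model.predict_proba(X_current)[:, 1]
treatment_pool = np.where(dropout_risk > 0.7)[0]  # threshold is arbitrary
print(treatment_pool)
</syntaxhighlight>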
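
For server-side sampling and unequal randomization weights, one standard approach is deterministic hash-based bucketing: hashing a stable user ID together with the experiment name yields a reproducible assignment with whatever group weights we choose. The experiment name, weights, and ID scheme below are invented.

<syntaxhighlight lang="python">
# Deterministic hash-based bucketing with unequal group weights (a sketch).
import hashlib

def assign_group(experiment: str, user_id: str, groups: dict[str, float]) -> str:
    """Map a user to a group, reproducibly, with probabilities from `groups`.

    Weights must sum to 1. The same (experiment, user_id) pair always lands
    in the same group, so the assignment can be recomputed server-side.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest[:15], 16) / 16**15  # uniform in [0, 1)
    cumulative = 0.0
    for group, weight in groups.items():
        cumulative += weight
        if point < cumulative:
            return group
    return group  # guard against floating-point rounding

# Hypothetical experiment: treatment group twice the size of control.
print(assign_group("signup-banner-test", "user:12345",
                   {"control": 1 / 3, "treatment": 2 / 3}))
</syntaxhighlight>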
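
Bandit optimization usually means adaptively shifting traffic toward better-performing variants rather than fixing group sizes up front. Thompson sampling over a binary success metric (e.g. whether a new user makes an edit) is the textbook version; the sketch below uses Beta posteriors and invented counts.

<syntaxhighlight lang="python">
# Thompson sampling over a binary success metric (illustrative only).
import random

# Per-variant (successes, failures) observed so far -- invented numbers.
observed = {"control": (120, 880), "banner_a": (150, 850), "banner_b": (90, 910)}

def choose_variant() -> str:
    """Sample a success rate from each variant's Beta posterior; pick the max."""
    draws = {
        name: random.betavariate(1 + s, 1 + f)  # Beta(1, 1) prior
        for name, (s, f) in observed.items()
    }
    return max(draws, key=draws.get)

# Each incoming user gets the variant that wins this round's posterior draw,
# so traffic drifts toward the variant most likely to be best.
print(choose_variant())
</syntaxhighlight>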
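
Built-in multilevel modeling could start as a thin wrapper over an off-the-shelf mixed-effects fit: a fixed treatment effect with random intercepts per wiki (and, with more data, per cohort). The statsmodels call below runs on simulated data; the column names are assumptions.

<syntaxhighlight lang="python">
# Random-intercepts-per-wiki sketch with statsmodels MixedLM (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 600
df = pd.DataFrame({
    "wiki": rng.choice(["enwiki", "dewiki", "frwiki"], size=n),
    "treatment": rng.integers(0, 2, size=n),
})
wiki_effect = df["wiki"].map({"enwiki": 0.0, "dewiki": 0.3, "frwiki": -0.2})
df["metric"] = 1.0 + 0.5 * df["treatment"] + wiki_effect + rng.normal(size=n)

# Fixed treatment effect, random intercept for each wiki.
result = smf.mixedlm("metric ~ treatment", df, groups=df["wiki"]).fit()
print(result.summary())
</syntaxhighlight>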
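
The Stats Engine whitepaper’s claims (no fixed sample size, peek whenever you want) rest on always-valid inference, typically a mixture sequential probability ratio test (mSPRT). For normally distributed observations with known variance σ² and a N(0, τ²) mixing distribution, the mixture likelihood ratio has a closed form; the monitoring loop below is a sketch with made-up parameters and simulated data.

<syntaxhighlight lang="python">
# A minimal mSPRT sketch for H0: mean = 0, known sigma (made-up parameters).
import math
import random

def mixture_lr(n: int, mean: float, sigma: float, tau: float) -> float:
    """Closed-form mixture likelihood ratio after n observations."""
    v = sigma**2
    return math.sqrt(v / (v + n * tau**2)) * math.exp(
        (n**2 * tau**2 * mean**2) / (2 * v * (v + n * tau**2))
    )

alpha, sigma, tau = 0.05, 1.0, 1.0
total, p_value = 0.0, 1.0
random.seed(1)
for n in range(1, 10_000):
    total += random.gauss(0.2, sigma)  # simulated stream with a true effect
    # Always-valid p-value: monotone, so peeking at every step is safe.
    p_value = min(p_value, 1.0 / mixture_lr(n, total / n, sigma, tau))
    if p_value <= alpha:
        print(f"Rejected H0 after {n} observations (p = {p_value:.4f})")
        break
</syntaxhighlight>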

= Next up =

Interview with analysts to see what more could be included in this list.