Moderator Tools/Automoderator/Measurement plan

This is a summary of the Automoderator measurement plan, outlining how we will evaluate whether the project meets its goals and what impact it is having on Wikimedia projects.

The page is divided into three hypotheses about how Automoderator will be impactful. Each hypothesis has a few top-level key results (KRs), the main numbers we're interested in, followed by a table detailing our current research questions and their associated evaluation methods or metrics. This document is not fixed or final and will evolve as we learn more. We may find that some questions are not feasible to answer with the available data, or identify new questions further down the line.

We would really like your feedback on the project talk page: does this capture the main data points you think we should track? Is anything missing, or do you have ideas we could incorporate?

QN = Quantitative measure (data)

QL = Qualitative measure (e.g. surveys, unstructured feedback)

Hypothesis #1
Automoderator will extend the reach of patrollers by reducing their overall workload in reviewing and reverting recent changes, enabling them to spend more time on other activities.

Top-level KRs:

 * Automoderator has a baseline accuracy of 90%.
 * Moderator editing activity increases by 10% in non-patrolling workflows (e.g. content contributions or other moderation processes).