Moderator Tools/Automoderator/Measurement plan

This is a summary of the current draft of the Automoderator measurement plan, outlining how we will evaluate whether the project meets its goals and what impact it is having on Wikimedia projects.

The page is divided into three hypotheses we have about Automoderator. Each hypothesis has a couple of top-level KRs (key results, the main numbers we're interested in), followed by a table detailing our current research questions and their associated evaluation methods or metrics. The research questions are informed both by our internal discussions on the project and by conversations we have had with editors (e.g. here on MediaWiki).

This document is not fixed or final and will evolve as we learn more. We can't guarantee that this page will stay up to date following the initial community discussions we have about it, but we will endeavour to at least detail changes in project updates. We may find that some questions are not feasible to answer with the available data, or we might identify new questions further down the line.

We really want to know what you think about this on the project talk page: does this capture the main data points you think we should track? Is anything missing, or do you have ideas we could incorporate? What does 'success' look like for you in this project?

QN = Quantitative measure (data)

QL = Qualitative measure (e.g. surveys, unstructured feedback)

Hypothesis #1
Automoderator will extend the reach of patrollers by reducing their overall workload in reviewing and reverting recent changes, enabling them to spend more time on other activities.

Top level KRs:

 * 1) Automoderator has a baseline accuracy of 90% (an illustrative sketch of how this could be computed follows this list).
 * 2) Moderator editing activity increases by 10% in non-patrolling workflows (e.g. content contributions or other moderation processes).
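The plan does not yet specify how 'baseline accuracy' will be measured. As a non-authoritative sketch, the snippet below shows one common approach: sample Automoderator's reverts, have human reviewers label each as correct or not, and take the share judged correct. The RevertRecord structure and was_correct label are invented for this illustration, not a real schema.

```python
# Illustrative sketch only: one possible way to estimate Automoderator's
# baseline accuracy from a manually labelled sample of its reverts. The
# measurement plan does not define this method; names here are hypothetical.

from dataclasses import dataclass

@dataclass
class RevertRecord:
    revert_id: int
    was_correct: bool  # human reviewer's judgement of the Automoderator revert

def baseline_accuracy(sample: list[RevertRecord]) -> float:
    """Share of sampled Automoderator reverts judged correct by reviewers."""
    if not sample:
        raise ValueError("need at least one labelled revert")
    return sum(r.was_correct for r in sample) / len(sample)

# Example: 93 of 100 sampled reverts judged correct -> 93%, above the 90% KR.
sample = [RevertRecord(i, was_correct=(i >= 7)) for i in range(100)]
print(f"Baseline accuracy: {baseline_accuracy(sample):.0%}")
```

In practice, manual labelling would likely be complemented by signals such as how often Automoderator's reverts are themselves undone or reported as false positives.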

Hypothesis #2
Communities are enthusiastic about using and engaging with Automoderator because they trust that it is effective in countering vandalism.

Top level KRs:
 * 1) Automoderator is enabled on two Wikimedia projects by the end of FY23/24.
 * 2) 5% of patrollers engage with Automoderator tools and processes on projects where it is enabled.

Hypothesis #3
When good faith edits are reverted by Automoderator, the affected editors are able to report false positives, and the reverts are not detrimental to those editors' journeys, because it is clear that Automoderator is an automated tool which is not making a judgement about them individually.

Note: As editors' experiences and journeys vary widely by platform and device, the following metrics should, where relevant, be split by platform and device.

Top level KRs:

 * 1) 70% of users who start filing a false positive report are able to successfully post their report.
 * 2) 90% of false positive reports receive a response or action from another editor (an illustrative calculation of both KRs follows this list).
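Neither KR's computation is pinned down yet. As a non-authoritative sketch, the snippet below shows how both rates could be derived from a hypothetical event log; the event names (report_started, report_posted, report_responded) are invented for this example and do not correspond to any real instrumentation.

```python
# Illustrative sketch only: deriving the Hypothesis #3 KRs from a hypothetical
# event log of (user_id, event_name) pairs. Event names are assumptions.

from collections import Counter

def false_positive_report_rates(
    events: list[tuple[int, str]],
) -> tuple[float, float]:
    """Return (completion_rate, response_rate) for false positive reports.

    completion_rate: share of users who started a report and also posted one
                     (the 70% KR).
    response_rate:   share of posted reports that received a response or
                     action (the 90% KR).
    """
    started = {user for user, event in events if event == "report_started"}
    posted = {user for user, event in events if event == "report_posted"}
    counts = Counter(event for _, event in events)

    completion_rate = len(started & posted) / len(started) if started else 0.0
    response_rate = (
        counts["report_responded"] / counts["report_posted"]
        if counts["report_posted"]
        else 0.0
    )
    return completion_rate, response_rate

# Example: one user completes the funnel and gets a response; another abandons.
events = [
    (1, "report_started"), (1, "report_posted"), (1, "report_responded"),
    (2, "report_started"),
]
completion, response = false_positive_report_rates(events)
print(f"Completion: {completion:.0%}, response: {response:.0%}")  # 50%, 100%
```

Per the note above, in practice these rates would be computed separately per platform and device rather than as single aggregate numbers.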

Guardrails
In addition to this goal-focused measurement plan, we are also planning to define 'guardrails': metrics that we will monitor to ensure we're avoiding negative impacts from Automoderator. For example: do fewer new editors stick around because Automoderator reverts are frustrating? Do patrollers become complacent because they place too much trust in Automoderator? These guardrails have not yet been documented, but we'll share them here once they have been.

If you have thoughts about what could go wrong with this project, and data points we could monitor to detect these scenarios, please let us know.