Moderator Tools/Automoderator/Measurement plan

This is a summary of the current draft of the Automoderator measurement plan, outlining how we will evaluate whether the project is successful in meeting its goals and understand what impact it is having on Wikimedia projects.

The page is divided into three hypotheses we have about Automoderator. Each hypothesis has one or two top-level data points (the most important numbers we're interested in) followed by a table detailing our current research questions and the evaluation methods or metrics we'll use to test them. The research questions are informed both by our internal discussions on the project and by conversations we have had with editors (e.g. here on MediaWiki).

This document is not fixed or final and will change as we learn more. Unfortunately we can't guarantee that this page will stay up to date following the initial community discussions we have about it. We may find that some questions are not feasible to answer with the available data, or might identify new questions we have further down the line. We aim to share any major changes in project updates.

We really want to know what you think about this plan on the project talk page - does this capture the main data points you think we should track? Is anything missing or do you have ideas we could incorporate? What data would help you decide whether this project was successful?

QN = Quantitative measure (data)

QL = Qualitative measure (e.g. surveys, unstructured feedback)

Hypothesis #1

Hypothesis: Automoderator will extend the reach of patrollers by reducing their overall workload in reviewing and reverting recent changes, and effectively enabling them to spend more time on other activities.

Top level data:

Automoderator has a baseline accuracy of 90%.
Moderator editing activity increases by 10% in non-patrolling workflows (e.g. content contributions or other moderation processes).

Research questions and evaluation methods
Research questions	Evaluation method/metric(s)	Notes
Is Automoderator effective in countering vandalism on wikis? What is the efficiency of Automoderator in countering vandalism on wikis? To what extent does Automoderator minimize readers' exposure to vandalised content?	[QN] While the thresholds for success can vary based on the community, the team would consider the following as successes: (1) Automoderator reverts X% of all actual vandalism; (2) Automoderator has a baseline accuracy of 90% when reverting vandalism.	We don't yet know what a reasonable level of coverage is for Automoderator, so we will define X as we progress with the project. Each community will be able to customise the accuracy and coverage level for their community, so 90% would be a baseline figure applying to the most permissive option available.
	[QN] How long does vandalism stay in articles before being reverted, and how many readers see that vandalism? Metrics: average time for vandalism to be reverted; pageviews received by vandalised pages before revert.	Pageview data is not currently available on a per-revision basis, but this is something we can start collecting (T346350).
Does Automoderator reduce the workload of human patrollers in countering vandalism?	[QN] Proportion of edits reverted by Automoderator, human patrollers, and tool-assisted human patrollers within 1 hr, 8 hrs, 24 hrs, and 48 hrs after an edit takes place.	'Tool-assisted human patrollers' means patrollers using tools like Huggle and SWViewer.
	[QN/QL] Does the volume of various content moderation backlogs reduce? Candidates: new pages to be patrolled; RC patrol / FlaggedRevisions; others?	Here we are hypothesising that patrollers might spend their additional time in other venues. We may need to start with some qualitative research here to understand which backlogs we can/should monitor.
Does Automoderator help patrollers spend their time on other activities of their interest? Is there any significant shift in distribution between areas of activity on wiki by patrollers post-Automoderator?	[QN] Distribution of contributions/actions (pre and post deployment) by patrollers across a tentative list of contributions. Edits: content namespace (content contributions; non-content contributions such as categories and template tagging); non-content namespaces (talk page activity, including village pump; other edits); % of edits that are reverts among an editor’s contributions; average diff ‘size’ of edits (content namespaces). Log actions: RC patrol (positive review edits as well as reverts; only on some wikis); new page patrol (if applicable). The patrollers of the pilot wikis will be surveyed to (a) identify the areas of contribution that they are currently engaged in and (b) understand what patrollers would like to do if the overall load of recent changes patrol were reduced. This will be used for comparison with the insights we get from data later on.	There is a wide range of possible ways to look at this, so we may need to speak to patrollers to understand which activities to consider.
	[QL] Perception of patrollers in how they are contributing to the wiki post-deployment. Qualitative changes in workflows compared to pre-Automoderator deployment. As in - are they actually doing non-patroller work or simply more specialized patroller work that Automoderator can’t handle?
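
As an illustration of the workload metric in the table above (share of reverts by performer within 1, 8, 24 and 48 hours of an edit), here is a minimal Python sketch. It is not the team's implementation: the input format and the performer labels (`automoderator`, `human`, `tool_assisted`) are assumptions made for the example, and it measures shares among reverted edits only.

```python
from collections import Counter
from datetime import datetime, timedelta

# Time windows (in hours) after the original edit, as listed in the table above.
WINDOWS_HOURS = (1, 8, 24, 48)


def revert_share_by_performer(reverts, windows_hours=WINDOWS_HOURS):
    """For each window, return the share of reverts made by each performer type.

    `reverts` is an iterable of (edit_time, revert_time, performer) tuples.
    The performer labels are hypothetical, chosen for this sketch.
    """
    reverts = list(reverts)  # allow several passes, even over a generator
    shares = {}
    for hours in windows_hours:
        limit = timedelta(hours=hours)
        counts = Counter(
            performer
            for edit_time, revert_time, performer in reverts
            if revert_time - edit_time <= limit
        )
        total = sum(counts.values())
        shares[hours] = {p: n / total for p, n in counts.items()} if total else {}
    return shares


# Made-up sample data, for illustration only.
t0 = datetime(2024, 1, 1, 12, 0)
sample = [
    (t0, t0 + timedelta(minutes=5), "automoderator"),
    (t0, t0 + timedelta(minutes=30), "automoderator"),
    (t0, t0 + timedelta(hours=3), "human"),
    (t0, t0 + timedelta(hours=30), "tool_assisted"),
]
shares = revert_share_by_performer(sample)
```

With this sample, all reverts within the first hour come from Automoderator, while the 48-hour window is split between all three performer types.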

Hypothesis #2

Hypothesis: Communities are enthusiastic to use and engage with Automoderator because they trust that it is effective in countering vandalism.

Top level data:

Automoderator is enabled on two Wikimedia projects by the end of FY23/24 (June 2024).
5% of patrollers engage with Automoderator tools and processes on projects where it is enabled.

Research questions and evaluation methods
Research questions	Evaluation method/metric(s)	Notes
Are communities enthusiastic to use Automoderator?	[QL] Sentiment towards Automoderator specifically and/or automated moderation tools broadly, both among administrators and non-administrator editors. [QL] Presence of custom documentation for Automoderator (e.g. guidance or guidelines on use) [QL] Uptake of Automoderator by specialized counter-vandalism groups (especially crosswiki ones) - stewards, global sysops, SWMT [QN] String (TranslateWiki) and documentation (MediaWiki) translation activity.
Are communities enthusiastic to use Automoderator?	[QN] Do communities enable Automoderator, and keep it enabled? If so, for how long? What percentage of the time is Automoderator enabled on a wiki? If it is turned off, how long does it take before it is turned back on? Is there any change in discussion activity (for example, on talk pages) to adjust the threshold during the turned-off period? If Automoderator is turned off, why? We could add an intervention where we ask why and prompt for a response.
Are communities actively engaging with Automoderator because they believe it is an important part of their workflows?	Note: may change based on the final design/form Automoderator takes [QN] What proportion of false positive reports have been reviewed, and what proportion are yet to be reviewed?
	Note: may change based on the final design/form Automoderator will take [QN] What is the usage of model exploration/visualisation tools? Number of unique users that accessed the tool Average time spent per session
	Note: may be expanded based on the final design/form Automoderator will take [QN] How often is Automoderator’s configuration adjusted? And by how many different administrators?	This may only be relevant when Automoderator is initially enabled and configured. After this we may not expect high activity levels.
Are communities able to understand the impact of Automoderator on the health of their community?	[QL] UX testing of Automoderator configuration page and dashboards (if relevant)	On our first pilot wikis we may need to simply have a JSON or similar page, before Community Configuration is ready to provide a better front-end experience.
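
The "percentage of time Automoderator is enabled" question above could be answered from a log of enable/disable events. The following Python sketch shows one way to do that; the toggle log format is hypothetical, and the tool is assumed to be disabled before the first recorded toggle.

```python
from datetime import datetime


def enabled_fraction(toggles, start, end):
    """Fraction of the period [start, end] during which the tool was enabled.

    `toggles` is a chronologically sorted list of (timestamp, enabled) pairs.
    This log format is an assumption made for the sketch.
    """
    enabled_seconds = 0.0
    state, since = False, start
    for ts, enabled in toggles:
        ts = min(max(ts, start), end)  # clamp to the observation window
        if state:
            enabled_seconds += (ts - since).total_seconds()
        state, since = enabled, ts
    if state:
        # Still enabled at the end of the observation window.
        enabled_seconds += (end - since).total_seconds()
    return enabled_seconds / (end - start).total_seconds()


# Made-up example: enabled 2-5 January and again from 7 January onwards.
toggles = [
    (datetime(2024, 1, 2), True),
    (datetime(2024, 1, 5), False),
    (datetime(2024, 1, 7), True),
]
fraction = enabled_fraction(toggles, datetime(2024, 1, 1), datetime(2024, 1, 11))
```

In this example the tool is enabled for 7 of the 10 observed days. The same toggle log could also be used to measure how long each turned-off period lasts.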

Hypothesis #3

Hypothesis: When good faith edits are reverted by Automoderator, the editors in question are able to report false positives, and the revert actions are not detrimental to the editors’ journey, because it is clear that Automoderator is an automated tool which is not making a judgement about them individually.

Note: As editors’ experiences and journeys widely vary based on device, the following metrics where relevant should be split by platform and device.

Top level data:

90% of false positive reports receive a response or action from another editor.

Research questions and evaluation methods
Research questions	Evaluation method/metric(s)	Notes
Are good faith editors aware of the reverts made by Automoderator and able to report if they believe it is a false positive?	[QL/QN] What is the perception of good faith newcomers when their edit has been reverted by Automoderator? Are they aware of what Automoderator is? Are they aware that their edit has been reverted? Are they aware of the reporting workflow? Have they been successful in filing a report?	This may be a survey, interviews, or using QuickSurveys.
Are users who intend to submit a false positive report able to successfully submit one?	[QN] What proportion of users who started the report filing process completed it? Where does drop-off take place for users who weren’t able to complete the process? What’s a “baseline” for false positive report frequency based on existing anti-vandalism bots? Who files these reports - the person whose edit was reverted, someone with patroller rights, or someone else? [QL] UX testing of the false positive reporting stream.
What is the effect of Automoderator on new editors’ contribution journey? Is it detrimental or not?	[QN] A/B experiment: Automoderator will randomly choose between taking and not taking a revert action on a newcomer (details to be defined). The treatment group will be newcomers on whom Automoderator takes a revert action, and the control group will be newcomers on whom Automoderator should have taken a revert action (based on the revert risk score) but, as part of the experiment, did not, and whose edits were later acted on by human moderators. [QL] QuickSurveys or a similar short survey tool may be feasible. Do editors whose edits are reverted by Automoderator understand what vandalism is? Do they agree with its assessment of their edit? What impact does this have on their editing motivations?	Retention and surveying new editors is hard, but we have a lot of experience with this at the Wikimedia Foundation in the Growth team. We will be meeting with them to learn more about the options we have for evaluating this research question.
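
To make the report completion and drop-off questions in the table above more concrete, here is a hedged Python sketch of a simple funnel calculation. The step names are purely hypothetical, since the reporting flow has not been designed yet, and the input format is an assumption.

```python
from collections import Counter

# Hypothetical ordered steps of a false positive report flow (not a real design).
STEPS = ("opened_form", "entered_reason", "submitted")


def report_funnel(sessions, steps=STEPS):
    """Return (completion_rate, drop_off) for a set of report sessions.

    `sessions` maps a session id to the set of steps that session reached.
    `drop_off` counts, per step, the sessions that reached that step
    but not the following one.
    """
    reached = Counter()
    for seen in sessions.values():
        for step in steps:
            if step not in seen:
                break  # a session cannot reach later steps after dropping off
            reached[step] += 1
    started = reached[steps[0]]
    completion_rate = reached[steps[-1]] / started if started else 0.0
    drop_off = {
        step: reached[step] - reached[nxt]
        for step, nxt in zip(steps, steps[1:])
    }
    return completion_rate, drop_off


# Made-up sessions, for illustration only.
sessions = {
    "a": {"opened_form", "entered_reason", "submitted"},
    "b": {"opened_form"},
    "c": {"opened_form", "entered_reason"},
    "d": {"opened_form", "entered_reason", "submitted"},
}
completion_rate, drop_off = report_funnel(sessions)
```

Here half of the sessions that opened the form ended in a submitted report, with one session lost at each of the two earlier steps.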

Guardrails

In addition to this goal-focused measurement plan, we are also planning to define 'guardrails' - metrics that we will monitor to ensure we're avoiding negative impacts of Automoderator. For example, do fewer new editors stick around because Automoderator reverts are frustrating, or do patrollers become too complacent because they put too much trust in Automoderator? These guardrails have not yet been documented, but we'll share them here when they have been.

If you have thoughts about what could go wrong with this project, and data points we could be monitoring to verify these scenarios, please let us know.

Pilot phase metrics

While the measurement plan can be helpful to understand and evaluate the impact of the project in the long term, we have identified some metrics to focus on for the pilot phase. The goal of these is to provide an overview of Automoderator's activity to the team and the community, and to monitor that nothing abnormal is happening. If you have suggestions for any other metrics that we should be tracking during the pilot phase, please leave a message on the talk page.

Indicator for	Metric(s)	Dimensions
Volume	Number of edits being reverted by Automoderator (absolute & percentage of all reverts)	Anonymous users, newcomers^[1], non-newcomers^[2]
Accuracy (False positives)	Percentage of Automoderator's reverts reverted back	Anonymous users, newcomers^[1], non-newcomers^[2]
Accuracy (False negatives)	Proportion of reverts not performed by Automoderator while it is turned on	-
Efficiency	Average time taken for Automoderator to revert an edit	-
-	Average time taken for Automoderator's reverts to be reverted back	-
Guardrail	Post deployment, proportion of edits reverted by performer	Automoderator, humans, and tool-assisted humans (if applicable)
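
As a rough sketch of how the false positive indicator in the table above could be split by the listed dimensions, the Python below classifies editors and computes the percentage of Automoderator's reverts that were reverted back. The newcomer thresholds follow the definition in the Notes section of this page; the function names and input format are assumptions made for the example.

```python
from collections import defaultdict


def user_group(is_anonymous, edit_count, account_age_days):
    """Classify an editor into one of the pilot-phase dimensions.

    Newcomer: fewer than 50 edits and an account less than 30 days old,
    as defined in the Notes section of this page.
    """
    if is_anonymous:
        return "anonymous"
    if edit_count < 50 and account_age_days < 30:
        return "newcomer"
    return "non-newcomer"


def reverted_back_rate_by_group(automod_reverts):
    """Percentage of Automoderator reverts that were reverted back, per group.

    `automod_reverts` is an iterable of (group, was_reverted_back) pairs
    (a hypothetical input format).
    """
    totals, undone = defaultdict(int), defaultdict(int)
    for group, was_reverted_back in automod_reverts:
        totals[group] += 1
        undone[group] += bool(was_reverted_back)
    return {group: 100.0 * undone[group] / totals[group] for group in totals}


# Made-up rows, for illustration only.
rows = [
    (user_group(True, 0, 0), False),
    (user_group(True, 0, 0), True),
    (user_group(False, 10, 5), False),
    (user_group(False, 10, 5), False),
    (user_group(False, 500, 400), True),
]
rates = reverted_back_rate_by_group(rows)
```

Note that a revert being reverted back is only a proxy for a false positive, which is why the table labels the metric as an indicator rather than a direct measurement.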

Notes

[1] Users who have made fewer than 50 edits and whose account is less than 30 days old.
[2] All registered users apart from newcomers.
