Moderator Tools/Automoderator

The Moderator Tools team is currently working on a project to build an automoderation tool for Wikimedia projects. It will allow moderators to configure automatic prevention or reversion of bad edits, based on the score from a machine learning model. In simpler terms, we are building software that performs a similar function to anti-vandalism bots such as ClueBot NG, SeroBOT, Dexbot and Salebot, and making it available to all language communities. A MediaWiki extension, Extension:AutoModerator, is now under development.

Our hypothesis is that if we enable communities to automatically prevent and revert obvious vandalism, moderators will have more time to spend on other activities.

We will research and explore this idea for the rest of 2023, and expect to be able to start engineering work at the beginning of the 2024 calendar year.

Latest update (February 2024): Mockups have been posted for the initial version of the landing and configuration pages. Ideas and suggestions are welcome!

Previous updates

  • February 2024: We have made the initial results of our testing process available.
  • October 2023: We are looking for input and feedback on our measurement plan, to decide what data we should use to evaluate the success of this project, and have made testing data available to collect input on Automoderator's decision-making.
  • August 2023: We recently presented this project, along with other moderation-focused projects, at Wikimania. See the session recording.

Motivation

Wikimania presentation (13:50)

A significant number of edits are made to Wikimedia projects which should unambiguously be undone, reverting a page back to its previous state. Patrollers and administrators have to spend a lot of time manually reviewing and reverting these edits, which contributes to a feeling on many larger wikis that there is an overwhelming amount of work requiring attention compared to the number of active moderators. We would like to reduce these backlogs and thereby free up moderator time to work on other tasks.

Indonesian Wikipedia community call (11:50)

Many online community websites, including Reddit, Twitch, and Discord, provide 'automoderation' functionality, whereby community moderators can set up a mix of specific and algorithmic automated moderation actions. On Wikipedia, AbuseFilter provides specific, rules-based functionality, but can be frustrating when moderators have to, for example, painstakingly define a regular expression for every spelling variation of a swear word. It is also complicated and easy to break, causing many communities to avoid using it. At least a dozen communities have anti-vandalism bots, but these are community maintained, requiring local technical expertise and usually having opaque configurations. These bots are also largely based on the ORES damaging model, which has not been trained in a long time and has limited language support.


Goals

  • Reduce moderation backlogs by preventing bad edits from entering patroller queues.
  • Give moderators confidence that automoderation is reliable and is not producing significant false positives.
  • Ensure that editors caught in a false positive have clear avenues to flag the error / have their edit reinstated.
  • Are there any other considerations we should take into account?

Design research

PDF of the Automoderator system design principles.
Desk research for the Automoderator project

We undertook a comprehensive design research process to establish a strong foundation for Automoderator's configuration tool. At the core of our approach is the formulation of essential design principles for shaping an intuitive and user-friendly configuration interface.

We reviewed existing technologies and best practices, a process known as desk research. This gave us valuable insights into current trends, potential pitfalls, and successful models within the realm of automated content moderation. We prioritized understanding the ethical implications of human-machine learning interaction, and focused on responsible design practices to ensure a positive and understandable user experience. We honed in on design principles that prioritize transparency, user empowerment, and ethical considerations.

Model

This project will leverage the new revert risk models developed by the Wikimedia Foundation Research team. There are two versions of this model:

  1. A multilingual model, with support for 47 languages.
  2. A language-agnostic model.

These models can calculate a score for every revision denoting the likelihood that the edit should be reverted. We envision providing communities with a way to set a threshold for this score, above which edits would be automatically prevented or reverted.
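
As a rough illustration of how a score could be retrieved and compared against a community-configured threshold, here is a minimal Python sketch. It assumes the Lift Wing inference endpoint, payload, and response shape shown in the comments; those details are illustrative assumptions rather than a confirmed part of this proposal, and should be checked against the Machine Learning team's documentation.

```python
import requests

# Assumed Lift Wing endpoint for the language-agnostic revert risk model;
# the exact URL, payload, and response fields are illustrative only.
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-language-agnostic:predict"
)

def get_revert_risk_score(rev_id: int, lang: str) -> float:
    """Return the model's estimated probability that the revision should be reverted."""
    response = requests.post(LIFTWING_URL, json={"rev_id": rev_id, "lang": lang})
    response.raise_for_status()
    # Assumed response shape: a probability that the edit should be reverted.
    return response.json()["output"]["probabilities"]["true"]

# A community-configured threshold: scores above it would trigger automoderation.
THRESHOLD = 0.97
score = get_revert_risk_score(rev_id=1234567890, lang="id")
if score > THRESHOLD:
    print(f"Score {score:.3f} exceeds {THRESHOLD}: candidate for automatic revert")
```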

The models currently only support Wikipedia, but could be trained on other Wikimedia projects. Additionally they are currently only trained on the main (article) namespace. Once deployed, we could re-train the model on an ongoing basis as false positives are reported by the community.

Before moving forward with this project we would like to provide opportunities for testing out the model against recent edits, so that patrollers can understand how accurate the model is and whether they feel confident using it in the way we're proposing.

  • Do you have any feedback about these models?
  • What percentage of false positive reverts would be the maximum you or your community would accept?

Potential solution

Diagram demonstrating the Automoderator software decision process
An illustrative sketch of what the community configuration interface could look like for this software.

We are envisioning a tool which could be configured by a community's moderators to automatically prevent or revert edits. Reverting edits is the more likely scenario - preventing an edit requires high performance so as not to impact edit save times. Additionally, it provides less oversight of what edits are being prevented, which may not be desirable, especially with respect to false positives. Moderators should be able to configure whether the tool is active or not, have options for how strict the model should be, determine the localised username and edit summary used, and more.
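
The decision process described above could, very roughly, look like the sketch below. The configuration fields and action names are hypothetical placeholders for discussion, not the actual design of Extension:AutoModerator.

```python
from dataclasses import dataclass

@dataclass
class AutomoderatorConfig:
    # Hypothetical configuration fields, for illustration only.
    enabled: bool = False
    threshold: float = 0.97   # revert risk score above which Automoderator acts
    action: str = "revert"    # "revert" is the likely default; "prevent" would need very low latency

def decide_action(score: float, config: AutomoderatorConfig) -> str:
    """Return the action Automoderator would take for a single revision."""
    if not config.enabled:
        return "none"
    if score < config.threshold:
        return "none"
    return config.action  # revert (or, potentially, prevent) the edit

# Example: a revision scored at 0.99 with Automoderator switched on
print(decide_action(0.99, AutomoderatorConfig(enabled=True)))  # -> "revert"
```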

Example of what Automoderator will look like reverting an edit.

Lower thresholds would mean more edits get reverted but with a higher false positive rate, while a higher threshold would revert fewer edits, but with higher confidence.

While the exact form of this project is still being explored, the following are some feature ideas we are considering, beyond the basics of preventing or reverting edits which meet a revert risk threshold.

Testing

If communities have options for how strict they want the automoderator to be, we need to provide a way to test those thresholds in advance. This could look like AbuseFilter’s testing functionality, whereby recent edits can be checked against the tool to understand which edits would have been reverted at a particular threshold.
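
One possible shape for such a dry-run feature is sketched below, assuming we already have a sample of recent revisions with their model scores and a record of whether patrollers actually reverted them. The data structure and numbers are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ScoredEdit:
    rev_id: int
    score: float        # revert risk score from the model
    was_reverted: bool  # whether patrollers actually reverted this edit

def dry_run(edits: list[ScoredEdit], threshold: float) -> dict:
    """Report what Automoderator would have done at a given threshold."""
    flagged = [e for e in edits if e.score >= threshold]
    false_positives = [e for e in flagged if not e.was_reverted]
    return {
        "threshold": threshold,
        "would_revert": len(flagged),
        "false_positives": len(false_positives),
    }

# Sweeping a few candidate thresholds over a (tiny, made-up) sample shows the
# trade-off: lower thresholds catch more edits but include more false positives.
sample = [
    ScoredEdit(1001, 0.99, True),
    ScoredEdit(1002, 0.95, False),
    ScoredEdit(1003, 0.40, False),
]
for threshold in (0.90, 0.95, 0.99):
    print(dry_run(sample, threshold))
```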

  • How important is this kind of testing functionality to you? Is there any testing functionality you would find particularly useful?

Community configuration

A core aspect of this project will be to give moderators clear configuration options for setting up the automoderator and customising it to their community’s needs. Rather than simply reverting all edits meeting a threshold, we could, for example, provide filters for not operating on editors with certain user groups, or avoiding certain pages.
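
For example, an exemption check along these lines could run before any score is even considered; the group and page lists here are hypothetical placeholders, since the real configuration options are still being discussed.

```python
# Hypothetical exemption settings, for illustration only.
EXEMPT_USER_GROUPS = {"bot", "sysop", "autopatrolled"}
EXEMPT_PAGES = {"Main Page"}

def is_exempt(user_groups: set[str], page_title: str) -> bool:
    """Return True if Automoderator should leave this edit alone."""
    return bool(user_groups & EXEMPT_USER_GROUPS) or page_title in EXEMPT_PAGES

print(is_exempt({"bot"}, "Jakarta"))     # True: the editor is in an exempt group
print(is_exempt({"user"}, "Main Page"))  # True: the page is exempt
print(is_exempt({"user"}, "Jakarta"))    # False: eligible for automoderation
```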

  • What configuration options do you think you would need before using this software?

False positive reporting

Machine learning models aren't perfect, and so we should expect that there will be a non-zero number of false positive reverts. There are at least two things we need to consider here: the process for a user flagging that their edit was falsely reverted so it can be reinstated, and providing a mechanism for communities to provide feedback to the model over time so that it can be re-trained.

The model is more sensitive to edits from new and unregistered users, as this is where most vandalism comes from. We don't want this tool to negatively impact the experience of good faith new users, so we need to create clear pathways for new users to understand that their edit has been reverted, and be able to reinstate it. This needs to be balanced with not providing easy routes for vandals to undo the tool's work, however.

Although these models have been trained on a large amount of data, false positive reporting by editors can provide a valuable dataset for ongoing re-training of the model. We need to figure out how experienced editors could feed false positive data back to the model so that it can improve over time.
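
As a loose sketch of what collecting that feedback might involve, confirmed false positives could be appended to a labelled dataset that is later folded into re-training. The report format and field names below are purely illustrative, not a committed design.

```python
import csv
from datetime import datetime, timezone

def record_false_positive(rev_id: int, wiki: str, reporter: str,
                          path: str = "false_positives.csv") -> None:
    """Append a reviewer-confirmed false positive to a local dataset (illustrative format)."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([
            rev_id, wiki, reporter,
            "false_positive",                       # label for future re-training
            datetime.now(timezone.utc).isoformat()  # when the report was confirmed
        ])

record_false_positive(1234567890, "idwiki", "ExampleAdmin")
```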

  • How could we provide clear information and actions for editors on the receiving end of a false positive, in a way which isn’t abused by vandals?
  • What feedback do you have about false positives?

Mockups

Our current plans for Automoderator have two UI components:

  1. A landing page with information about Automoderator, a way to appeal the bot’s decisions, and a link to configure the bot.
  2. The configuration page, which will be generated by Community Configuration. In the MVP, admins will be able to turn Automoderator on or off, configure its threshold (i.e. how it should behave), and customize its default edit summary and username. We anticipate that we'll add more configuration options over time in response to feedback. Once the page is saved, if the user has turned Automoderator on, it will start running immediately.
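
To make the MVP options concrete, a saved configuration might capture something like the following. The field names and values are hypothetical; the real schema will be defined through Community Configuration.

```python
# Hypothetical shape of an Automoderator configuration saved via Community
# Configuration; the actual schema and field names may differ.
automoderator_config = {
    "enabled": True,                  # turn Automoderator on or off
    "threshold": 0.97,                # revert risk score above which edits are reverted
    "username": "AutoModerator",      # localised account name used for reverts
    "edit_summary": "Automatically reverted a likely bad edit",  # default summary
}
```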

Other open questions

  • If your community uses a volunteer-maintained anti-vandalism bot, what has your experience of that bot been? How would you feel if it stopped operating?
  • Do you think your community would use this tool? How would it integrate with your other workflows and tools?
  • What else should we take into account that we haven't documented above?