Talk:Moderator Tools/Automoderator

Concern re. False Positives
Hi! Thanks for your work. As prompted on the project page, I will be providing answers to some of your questions. For context, I am mostly active on the English Wikipedia, and I am not an admin. However, I have dealt with my fair share of vandalism, and have seen what ClueBot NG can do.

Problem: My biggest concern with this kind of tool is how false positive reporting is handled. New user retention is incredibly important. Many reverts by ClueBot NG are of edits made by relatively new users. The notice that ClueBot NG provides is already a little intimidating, and reporting false positives requires interface literacy and confidence that new users often don't have.

Suggestion: I think it's important that the revert notice placed on talk pages is reasonably friendly, and that reporting false positives is very easy. I imagine a little box that says "click here to review your edit and report a false positive" that opens a pop-up containing a very simple diff viewer and a clearly labelled option like "I think this edit was constructive and the revert was a false positive", or something along those lines. Reporting false positives needs to be almost ridiculously easy to ensure that affected newcomers actually do so and don't get discouraged from editing. Of course, this needs to be balanced against the workload of false positive review. On that, I have two comments:
 * From what I have seen, true vandals are not particularly concerned with not being blocked. They rarely ask to have blocks reviewed, for example. They make their disruptive edits until they're blocked, and then that's it. The people who actually engage with User Warnings, for example, are mostly confused or overwhelmed, but rarely editing truly in bad faith. I don't think that there would be significant abuse of false positive reporting. The vast majority of vandals don't care enough to engage with the anti-vandalism tools and warnings in any meaningful way. However, I don't know what the current situation with ClueBot NG is; the share of false positive reports that are abusive would be an interesting fact to consider here, but I just don't know it.
 * This applies more so to wikis that don't have automated anti-vandalism yet: even if 100% of users whose edits were reverted reported false positives, reviewing those reports would not be significantly more work than manually reviewing the edits in question in the first place.

That's my two cents. Also, if there is a large volume of discussion and anyone wants to restructure the talk page here later on, feel free to move this around however you need. Actualcpscm (talk) 11:44, 1 June 2023 (UTC)


 * One thing I've learned working on abuse filters is that vandals, presented with an obligatory "describe in a few words how you improved this article" prompt, rarely have much to say. They either leave or enter some nonsense as their edit summary. The edit summary is an extremely valuable signal, and it's too bad it's only optional. Most vandals (mostly IPs) that I see are school kids; they use the wiki to chat, write silly things about their schoolmates, and comment on how something they're learning is stupid. My fear is that they'll use the false positive reporting button as just another toy, and until they're blocked (or at least slowed down by the system) they'll have a lot of fun with it, at our expense. I do feel sorry for the few constructive (IP) users who are unable to save their edits (we help them publish when we see those edits in the edit filter logs), but I feel much more sorry for experienced users who have burned out chasing vandals instead of working on the stuff they liked.
 * I do agree that notices should be less intimidating, and should probably depend on the severity of the offense (swear words & garbage vs. something more sensible). p o nor (talk) 16:25, 1 June 2023 (UTC)
 * That's actually a really cool idea about the different notices. Maybe this could be implemented in accordance with the certainty that the edit was vandalism; if .95 < x < .98, then it's a nice warning, and if x > .98, it's more firm, or something like that. I'm making these numbers up, but you get the idea. I really like this.
 * Regarding edit summaries, my experience hasn't been quite so clear-cut. A lot of the slightly clever vandals, those who aren't caught by ClueBot NG immediately, leave very plausible edit summaries ("grammar", "fixed spelling mistake", "added references", etc.). I guess the clever vandals aren't the ones this new system would be targeting, though. It has been mentioned that its accuracy would likely be lower than ClueBot NG's.
 * It's possible that the report button would be abused in the way that you describe, I just don't think it's very likely. I really wonder what the current situation with ClueBot NG's false positive reporting is. That might allow us to base our intuitions on some actual facts :) Actualcpscm (talk) 16:56, 1 June 2023 (UTC)
 * @Actualcpscm Thanks for taking the time to share your thoughts! I completely agree that false positives are something we need to think a lot about, to make sure that they have a minimal effect on good faith contributors. Some open questions on my mind include:
 * What notice should the user receive that their edit has been reverted - if any? We could imagine a notification, a talk page message, some other kind of UI popup, or something else entirely.
 * Should that notification provide an easy way to reinstate the edit, or contain a reporting mechanism so that an experienced editor can review and reinstate the edit if appropriate?
 * If other editors need to review false positives, how could we make that process engaging, so that it isn't abandoned?
 * In terms of how this works for ClueBot NG and other bots - I agree it would be useful to learn more about this. CBNG has a dedicated false positive review tool, but it's not clear to me whether anyone is actually reviewing these. I'm putting this on my TODO list for research for this project! Samwalton9 (WMF) (talk) 10:38, 2 June 2023 (UTC)
 * Samwalton9 (WMF) thanks for your questions!
 * I think talk page messages are pretty good, mostly because they're easy to spot and to handle. The big red notification on the bell is good UI design; it's clear even to people unfamiliar with Wikipedia. It might be nice to have some explanation of what user talk pages are, since a new editor might not be aware of them and might just click on the notification.
 * I don't think there should be any direct way to reinstate the edit, that would invite abuse very openly. A reporting mechanism would be much better, imo.
 * I would suggest an interface similar to AntiVandal, Huggle, or WikiLoop DoubleCheck. It's basically the same activity ("evaluate this diff"), just with a slightly different context. Ideally, this should run in-browser (unlike Huggle) and be accessible to a large group of editors (e.g. extended-confirmed). If it's a very fast diff browser like Huggle, I think reports should be reviewed by at least two editors to ensure fairness. However, I'm not sure that recruiting editors into this would be successful.
 * What you bring up about CBNG's false positive review tool is concerning. I always assumed that the false positive reports get reviewed by someone, but it doesn't necessarily look like they do. The interface you linked to does not even provide the opportunity to do so directly, so I do wonder what is going on here. I will ask around about that. Actualcpscm (talk) 10:50, 2 June 2023 (UTC)
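
As a rough illustration of the tiered-notice idea raised earlier in this thread (a friendlier message for less certain reverts, a firmer one for near-certain vandalism), here is a minimal Python sketch. The cut-offs are the made-up numbers from the comment above, and the function name and notice labels are hypothetical; none of this reflects how Automoderator or ClueBot NG actually work.

```python
from typing import Optional

# Hypothetical cut-offs, taken from the made-up numbers in the discussion.
REVERT_THRESHOLD = 0.95  # above this score, the edit is reverted at all
FIRM_THRESHOLD = 0.98    # above this score, the notice is more firm


def choose_notice(score: float) -> Optional[str]:
    """Return the notice tone for a reverted edit, or None if it is not reverted.

    `score` is assumed to be the model's certainty (0 to 1) that the edit
    is vandalism.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > FIRM_THRESHOLD:
        return "firm"
    if score > REVERT_THRESHOLD:
        return "friendly"
    return None
```

Keeping the two thresholds as separate named values would let each wiki tune the tone of its notices independently of the decision to revert.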

Model should include checkuser data
There's obviously a whole slew of privacy and security issues which would need to be addressed, but having the ability to include checkuser data in the model would be immensely powerful. There are certain patterns which are easily recognized as being indicative of disruptive behavior, such as rapidly changing IP address ranges, use of IP ranges which geolocate to many different countries, use of IP addresses which are known to host certain kinds of proxies, use of user agent strings which are clearly bogus, etc. These are obviously not definitive indicators, but are common patterns that checkusers look for and should be included in the training data. RoySmith (talk) 12:28, 1 June 2023 (UTC)


 * Hi! Diego from WMF Research here. Both models already consider user information such as account creation time, number of previous edits, and the user groups that the editor belongs to. All of this is based only on public data, retrieved directly from the MediaWiki API. Diego (WMF) (talk) 14:39, 1 June 2023 (UTC)

General feedback

 * This is a wonderful idea; relying on individual community bots run by one or two users for such a central task has always worried me. ClueBot NG does a good job, but it's certainly aged and there are surely new algorithms and ideas that can perform even better. ToBeFree (talk) 17:28, 1 June 2023 (UTC)
 * @ToBeFree Thanks for the positive comment! I agree about the reliance on individual technical volunteers to maintain important tools. In terms of ClueBot NG, there's a good chance that this tool actually wouldn't be better than ClueBot in terms of accuracy, given that ClueBot has been trained on English Wikipedia specifically and has been learning for many many years. It still might be an improvement in terms of features, so maybe there's room for both tools to run alongside each other, or perhaps we can build functionality that ClueBot could also utilise. We'll have to investigate further once we get into community testing of the model. Samwalton9 (WMF) (talk) 10:41, 2 June 2023 (UTC)

Incorporating edit histories
RoySmith suggested just above that the model should include checkuser data in its evaluation, which made me think that some other data external to the edit itself might be relevant. For example, if an account has 3 or 4 edits that were all identified as vandalism, and they make another edit, that is probably going to be vandalism too. Evaluating edit histories, especially checking for the percentage of edits identified as vandalism, might be something to consider for the model. I am assuming that there will be a user whitelist (like with ClueBot NG iirc) so that the model does not have to check the super extensive histories of long-term editors every time. Maybe ClueBot NG already incorporates this kind of data? I don't think so, though. Actualcpscm (talk) 17:33, 1 June 2023 (UTC)


 * @Actualcpscm This is a great point. A very simple implementation of this tool could simply be 'revert any edit above a threshold', but there's lots of contextual clues we can use to make better guesses about whether an edit should be reverted or not. The counter-example to the one you described would be something like 'an editor reverts their own edit'. By itself that might look like, for example, removal of content, but if it's their own edit we probably shouldn't be reverting that. If you have any other ideas like this I'd love to hear them, they're the kind of thing we might otherwise only start noticing once we look closer at individual reverts the tool proposes. Samwalton9 (WMF) (talk) 10:53, 2 June 2023 (UTC)
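
To make the two contextual clues from this thread concrete (never reverting an editor's own self-revert, and taking into account the share of an account's earlier edits that were identified as vandalism), here is a small Python sketch. It is only an illustration under stated assumptions: the `Edit` fields, the default threshold, and the `history_weight` adjustment are all hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    score: float          # assumed model score (0 to 1) that the edit should be reverted
    is_self_revert: bool  # the editor is undoing their own earlier edit
    prior_edits: int      # number of earlier edits by this account
    prior_flagged: int    # how many of those were identified as vandalism


def should_revert(edit: Edit, threshold: float = 0.95,
                  history_weight: float = 0.05) -> bool:
    """Decide whether to revert, starting from 'revert anything above a threshold'."""
    # Contextual clue 1: an editor undoing their own edit is left alone,
    # even if the change looks like content removal in isolation.
    if edit.is_self_revert:
        return False

    # Contextual clue 2: the larger the share of earlier edits flagged as
    # vandalism, the lower the bar for reverting the next one.
    flagged_share = edit.prior_flagged / edit.prior_edits if edit.prior_edits else 0.0
    effective_threshold = threshold - history_weight * flagged_share

    return edit.score > effective_threshold
```

For example, `should_revert(Edit(0.93, False, 4, 4))` returns `True` because all four earlier edits were flagged, while the same score from an account with a clean history stays below the default threshold.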

Revert vs. prevent or throttle
You say: "Reverting edits is the more likely scenario - preventing an edit [may] impact edit save times."

I don't know what changes then: if you're not preventing bad edits in real time, those edits will (have to) be seen and checked by RC patrollers. The biggest burden is not one bad-faith user making one bad edit, but one user making a quick series of bad edits (example, avoided the too-many-consonants AF), on one or more pages. I had to ask some global patrollers to stop reverting edits while the vandal was still on the wiki, because that only triggers more vandalism (today's example, see indiv. page edit histories). Unless you can block or slow down vandals in those 20 minutes, you'd better leave them alone.

The worst vandalism, which in most cases comes from IPs or very new editors (2-3 hours max), can be handled by abuse filters, either by disallowing edits or by blocking users. Abuse filters are not perfect, and smarter vandals learn how to evade them, but they work in real time. I assume Automoderator is trained only on edits that got past the existing filters, so it's also unlikely that it'll ever replace them.

How about you don't touch any users with clean history and 66+ edits, revert bad edits of registered users with not so many edits whenever there's time, but prevent or throttle (really) bad edits by IPs and very new registered users in real time? I think that'd be most helpful! p o nor (talk) 21:29, 1 June 2023 (UTC)


 * @Ponor Thanks for sharing your thoughts! I agree that preventing edits may be a more ideal situation; the problem is that these checks (like AbuseFilter) would need to happen every time someone clicks 'Save' on an edit, and would block anything else from happening for that user until all the checks have finished. Filters already take up quite a lot of that time, and running the edit through a model is likely to take even more, in addition to other technical issues to consider. There's some discussion on this at Community Wishlist Survey 2022/Admins and patrollers/Expose ORES scores in AbuseFilter and the associated Phabricator ticket. This is something I'm still looking into, however (T299436).
 * You raise a helpful point about vandals continually reverting, in a way that wouldn't make this tool helpful, and I'd like to think more about how we could avoid those issues.
 * I do agree that we'll want to build in features/configuration for avoiding users with, for example, more than a specific number of total edits. That said, the model is already primarily flagging edits by unregistered and brand new users more than experienced editors, so this may not be necessary.
 * Samwalton9 (WMF) (talk) 11:11, 2 June 2023 (UTC)
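
The three-tier policy proposed at the top of this section could be sketched roughly as follows in Python. The 66-edit cut-off comes from the proposal itself; everything else (the score thresholds, the 3-hour definition of "very new", the `Action` names, and the fallback for cases the proposal does not cover) is an assumption made purely for illustration.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"              # leave the edit alone
    REVERT_LATER = "revert_later"  # revert whenever there's time
    THROTTLE = "throttle"          # prevent or slow down in real time


def choose_action(score: float, is_registered: bool, edit_count: int,
                  account_age_hours: float, clean_history: bool,
                  bad: float = 0.95, really_bad: float = 0.98) -> Action:
    """Pick an action for one edit; all thresholds are hypothetical."""
    # Tier 1: users with a clean history and 66+ edits are never touched.
    if is_registered and clean_history and edit_count >= 66:
        return Action.IGNORE

    # Edits that do not score as bad are left alone for everyone.
    if score <= bad:
        return Action.IGNORE

    # Tier 3: really bad edits by IPs or very new accounts are handled in
    # real time. "Very new" is assumed to mean under 3 hours old.
    very_new = (not is_registered) or account_age_hours < 3
    if very_new and score > really_bad:
        return Action.THROTTLE

    # Tier 2, plus an assumed fallback for the remaining bad edits the
    # proposal does not mention: revert whenever there's time.
    return Action.REVERT_LATER
```

Only the last tier needs to run at save time, so the real-time cost discussed in the reply above would apply only to edits by IPs and very new accounts.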