Talk:Moderator Tools/Automoderator

Concern re. False Positives
Hi! Thanks for your work. As prompted on the project page, I will be providing answers to some of your questions. For context, I am mostly active on the English Wikipedia, and I am not an admin. However, I have dealt with my fair share of vandalism, and have seen what ClueBot NG can do.

Problem: My biggest concern with this kind of tool is how false positive reporting is handled. New user retention is incredibly important. Many reverts by ClueBot NG are of edits made by relatively new users. The notice that ClueBot NG provides is already a little intimidating, and reporting false positives requires interface literacy and confidence that new users often don't have.

Suggestion: I think it's important that the revert notice placed on talk pages is reasonably friendly, and that reporting false positives is very easy. I imagine a little box that says "click here to review your edit and report a false positive" that opens a pop-up containing a very simple diff viewer and a clearly labelled option like "I think this edit was constructive and the revert was a false positive", or something along those lines. Reporting false positives needs to be almost ridiculously easy to ensure that affected newcomers actually do so and don't get discouraged from editing. Of course, this needs to be balanced against the workload of false positive review. On that, I have two comments:
 * From what I have seen, true vandals are not particularly concerned with not being blocked. They rarely ask to have blocks reviewed, for example. They make their disruptive edits until they're blocked, and then that's it. The people who actually engage with User Warnings, for example, are mostly confused or overwhelmed, but rarely editing truely in bad faith. I don't think that there would be significant abuse of false positive reporting. The vast majority of vandals don't care enough to engage with the anti-vandalism tools and warnings in any meaningful way. However, I don't know what the current situation with ClueBot NG is; the share of false positive reports that are abusive would be an interesting fact to consider here, but I just don't know it.
 * This applies more so to Wikis that don't have automated anti-vandalism yet: Even if 100% of users whose edits were reverted reported false positives, reviewing those will not be significantly more workload than manually reviewing the edits in question in the first place.

That's my two cents. Also, if there is a large volume of discussion and anyone wants to restructure the talk page here later on, feel free to move this around however you need. Actualcpscm (talk) 11:44, 1 June 2023 (UTC)


 * One thing I've learned working on abuse filters is that vandals, presented with obligatory "describe in a few words how you improved this article" rarely have much to say. They either leave or enter some nonsense as their edit summary. Edit summary is an extremely valuable signal, and it's too bad it's only optional. Most vandals (mostly IPs) that I see are school kids; they use wiki to chat, write silly things about their schoolmates, make comments how something they're learning is stupid; my fear is that they'll use the false positive reporting button as just another toy, and until they're blocked (or at least slowed down by the system) they'll make much fun with it and of us. I do feel sorry for a few constructive (IP) users who are unable to save their edits (we help them publish when we see those edits in edit filter logs), but I feel much more sorry for experienced users who'd burned out chasing vandals instead of working on the stuff they liked.
 * I do agree that notices should be less intimidating, and should probably depend on the severity of the offense (swear words & garbage vs. something more sensical). p o nor     (talk) 16:25, 1 June 2023 (UTC)
 * That's actually a really cool idea about the different notices. Maybe this could be implemented in accordance with the certainty that the edit was vandalism; if .95 < x < .98, then it's a nice warning, and if x > .98, it's more firm, or something like that. I'm making these numbers up, but you get the idea. I really like this.
 * Regarding edit summaries, my experience hasn't been quite so clear-cut. A lot of the slightly clever vandals, those who aren't caught by ClueBot NG immediately, leave very plausible edit summaries ("grammar", "fixed spelling mistake", "added references"), etc. I guess the clever vandals aren't the ones they would be targetting with this new system, though. It has been mentioned that its accuracy would likely be lower than ClueBot NG.
 * It's possible that the report button would be abused in the way that you describe, I just don't think it's very likely. I really wonder what the current situation with ClueBot NG's false positive reporting is. That might allow us to base our intuitions on some actual facts :) Actualcpscm (talk) 16:56, 1 June 2023 (UTC)
 * @Actualcpscm Thanks for taking the time to share your thoughts! I completely agree that false positives are something we need to think a lot about, to make sure that they have a minimal effect on good faith contributors. Some open questions on my mind include:
 * What notice should the user receive that their edit has been reverted - if any? We could imagine a notification, a talk page message, some other kind of UI popup, or something else entirely.
 * Should that notification provide an easy way to reinstate the edit, or contain a reporting mechanism so that an experienced editor can review and reinstate the edit if appropriate?
 * If other editors need to review false positives, how could we make that process engaging, so that it isn't abandoned?
 * In terms of how this works for ClueBot NG and other bots - I agree it would be useful to learn more about this. CBNG has a dedicated false positive review tool, but it's not clear to me whether anyone is actually reviewing these. I'm putting this on my TODO list for research for this project! Samwalton9 (WMF) (talk) 10:38, 2 June 2023 (UTC)
 * Samwalton9 (WMF) thanks for your questions!
 * I think talk page messages are pretty good, mostly because they're easy to spot and to handle. The big red notification on the bell is good UI design, it's clear even to people unfamiliar with Wikipedia. It might be nice to have some explanation about what user talk pages are, since a new editor might not be aware of them and just click on the notification.
 * I don't think there should be any direct way to reinstate the edit, that would invite abuse very openly. A reporting mechanism would be much better, imo.
 * I would suggest an interface similar to AntiVandal, Huggle, or WikiLoop DoubleCheck. It's basically the same activity ("evaluate this diff"), just with a slightly different context. Ideally, this should run in-browser (unlike Huggle) and be accessible to a large group of editors (e.g. extended-confirmed). If it's a very fast diff browser like Huggle, I think reports should be reviewed by at least two editors to ensure fairness. However, I'm not sure that recruiting editors into this would be successful.
 * What you bring up about CBNG's false positive review tool is concerning. I always assumed that the false positive reports get reviewed by someone, but it doesn't necessarily look like they do. The interface you linked to does not even provide the opportunity to do so directly, so I do wonder what is going on here. I will ask around about that. Actualcpscm (talk) 10:50, 2 June 2023 (UTC)

Model should include checkuser data
There's obviously a whole slew of privacy and security issues which would need to be addressed, but having the ability to include checkuser data in the model would be immensely powerful. There are certain patterns which are easily recognized as being indicative of disruptive behavior, such as rapidly changing IP address ranges, use of IP ranges which geolocate to many different countries, use of IP addresses which are known to host certain kinds of proxies, use of user agent strings which are clearly bogus, etc. These are obviously not definitive indicators, but are common patterns that checkusers look for and should be included in the training data. RoySmith (talk) 12:28, 1 June 2023 (UTC)


 * Hi! Here Diego from WMF Research. Both models are already considering user information such as account creation time, number of previous edits and the user groups that the editor belongs to. All this just based on public data, retrieved directly from the MediaWiki API. Diego (WMF) (talk) 14:39, 1 June 2023 (UTC)

General feedback

 * This is a wonderful idea; relying on individual community bots run by one or two users for such a central task has always worried me. ClueBot NG does a good job, but it's certainly aged and there are surely new algorithms and ideas that can perform even better. ToBeFree (talk) 17:28, 1 June 2023 (UTC)
 * @ToBeFree Thanks for the positive comment! I agree about the reliance on individual technical volunteers to maintain important tools. In terms of ClueBot NG, there's a good chance that this tool actually wouldn't be better than ClueBot in terms of accuracy, given that ClueBot has been trained on English Wikipedia specifically and has been learning for many many years. It still might be an improvement in terms of features, so maybe there's room for both tools to run alongside each other, or perhaps we can build functionality that ClueBot could also utilise. We'll have to investigate further once we get into community testing of the model. Samwalton9 (WMF) (talk) 10:41, 2 June 2023 (UTC)

Incorporating edit histories
RoySmith suggested just above that the model should include checkuser data in its evaluation, which made me think that some other data external to the edit itself might be relevant. For example, if an account has 3 or 4 edits that were all identified as vandalism, and they make another edit, that is probably going to be vandalism too. Evaluating edit histories, especially checking for the percentage of edits identified as vandalism, might be something to consider for the model. I am assuming that there will be a user whitelist (like with ClueBot NG iirc) so that the model does not have to check the super extensive histories of long-term editors every time. Maybe ClueBot NG already incorporates this kind of data? I don't think so, though. Actualcpscm (talk) 17:33, 1 June 2023 (UTC)


 * @Actualcpscm This is a great point. A very simple implementation of this tool could simply be 'revert any edit above a threshold', but there's lots of contextual clues we can use to make better guesses about whether an edit should be reverted or not. The counter-example to the one you described would be something like 'an editor reverts their own edit'. By itself that might look like, for example, removal of content, but if it's their own edit we probably shouldn't be reverting that. If you have any other ideas like this I'd love to hear them, they're the kind of thing we might otherwise only start noticing once we look closer at individual reverts the tool proposes. Samwalton9 (WMF) (talk) 10:53, 2 June 2023 (UTC)
 * Hi again Samwalton9 (WMF). I'm pinging you because I assume you have a lot going on, but if you prefer that I don't, just let me know.
 * Outside checkuser data, edit histories, and administrative logs, there isn't much data on most editors (and that's a good thing). A closely related consideration would be the user logs, specifically block logs and filter logs. If an IP has been temp-blocked 3 times and tagged as belonging to a school district, a spike in suspicious activity will most likely not be made up of constructive editing. That seems like a reasonable factor to incorporate into the model.
 * If I had to come up with something more, the time of day in the editor's timezone might be relevant. Maybe vandalism is more prevalent in the evenings (think alcohol consumption) or at night? This hypothesis is a stretch, I think the only way to figure this out would be testing if allowing the model access to such data makes it more accurate. I don't know if there actuall is a ToD - vandalism correlation, I'm just hypothesizing.
 * I don't know exactly how you will create this tool. If you're planning on creating an artifical neural network (like CBNG uses right now), it might be worth trying some wacky stuff. From my very limited knowledge, it's not unusual to just throw all available data at an ANN and see how it behaves. That's what I was imagining above, too; the data we mentioned, like edit histories, is fed into the model and considered alongside the specific edit being evaluated. The decision then just depends on the score provided by the ANN.
 * However, if you're expecting these to be manually written checks, I would forget time of day and filter logs. For a human coder, that would be way too much effort for a minute accuracy boost, if anything.
 * On a related note: I'm not sure if CBNG does this outside of the editor whitelist, but Huggle records user scores. Spitballing again: edits that just barely pass the check cause the user to be assigned a suspicion score, which is incorporated into the model. For the first edit a user makes, this wouldn't make any difference (since their suspicion score would be 0), but if they make 5 edits that scrape by by the skin of their teeth, maybe there's something fishy going on ---> The higher a user's suspicion score, the lower the threshold of certainty needed to revert / warn (or, in the ANN, the higher the likelihood that this is in fact vandalism).
 * Also, if we're in an ideal world, the model could consider the topic of the article where the edit was made in relation to the edit's contents. If an edit to an article about racial slurs includes a racial slur, that's less indicative of vandalism than the same slur being added to a small BLP or science-related article. On the other hand, I'm sure that some article categories generally have a bigger problem with vandalism than others. More data for the ANN! Actualcpscm (talk) 11:48, 2 June 2023 (UTC)

Revert vs. prevent or throttle
You say: "Reverting edits is the more likely scenario - preventing an edit [may] impact edit save times."

I don't know what changes then: if you're not preventing bad edits in real time, those edits will (have to) be seen and checked by RC patrollers. The biggest burden is not one bad-faith user doing one bad edit, but one user doing a quick series of bad edits (example, avoided the too-many-consonants AF), on one or more pages. I had to ask some global patrollers to stop reverting edits while the vandal is on the wiki because that only triggers more vandalism (today's example, see indiv. page edit histories). Unless you can block or slow down vandals in those 20 minutes, you better leave them alone.

The worst vandalisms, which are in most cases by IP or very new editors (2-3 hours max), can be handled by abuse filters, either by disallowing edits or blocking users. Abuse filters are not perfect, smarter vandals learn how to evade them, but they work in real time. I assume Automoderator is trained only on edits that went past the existing filters, so it's also unlikely that it'll ever replace them.

How about you don't touch any users with clean history and 66+ edits, revert bad edits of registered users with not so many edits whenever there's time, but prevent or throttle (really) bad edits by IPs and very new registered users in real time? I think that'd be most helpful! p o nor    (talk) 21:29, 1 June 2023 (UTC)


 * @Ponor Thanks for sharing your thoughts! I agree that preventing edits may be a more ideal situation, the problem is that these checks (like AbuseFilter) would need to happen every time someone clicks 'Save' on an edit, and prevent anything else from happening for that user before all the checks have happened. Filters already take quite a lot of that time, and running the edit through a model is likely to take even more time, in addition to other technical issues to consider. There's some discussion on this at Community Wishlist Survey 2022/Admins and patrollers/Expose ORES scores in AbuseFilter and the associated Phabricator ticket. This is something I'm still looking into, however (T299436).
 * You raise a helpful point about vandals continually reverting, in a way that wouldn't make this tool helpful, and I'd like to think more about how we could avoid those issues.
 * I do agree that we'll want to build in features/configuration for avoiding users with, for example, more than a specific number of total edits. That said, the model is already primarily flagging edits by unregistered and brand new users more than experienced editors, so this may not be necessary.
 * Samwalton9 (WMF) (talk) 11:11, 2 June 2023 (UTC)
 * Thanks, @Sam. Is there any estimate of the time needed for a check? I'm thinking that these models are way more optimized than AbuseFilters, of which there are tens or hundreds running in sequence, indiscriminately. Automoderator could run on some 20% of all edits (~IP users edits rate from the main page here); you'll make those editors wait for a fraction of a second, but if it's helping experienced users moderate the content you'll save many seconds of their time. Your link cluebot FP tells me you should focus solely on most blatant vandalisms, users who make many consecutive unconstructive edits that cannot be caught by AF even in principle. Ever since we started more aggressive filtering I've been checking AF logs on a daily basis, and I don't think anyone else does. Add one more log (false positives + falsely claimed false positives), there will be ~no one to watch it: at some point you get so overwhelmed with stupid work that you start caring more about your time than the time of those bypassers. I know this sounds rough, but it's just the way it is. p o nor     (talk) 11:54, 2 June 2023 (UTC)
 * @Ponor I've asked another WMF team to look into how much time this would take in T299436, so hopefully we'll know before we make a final decision on this. I agree with the concern about very few people checking false positive logs - creating another new venue to review edits isn't ideal for the reasons you describe. We'll have to think about this in more detail. Samwalton9 (WMF) (talk) 12:13, 2 June 2023 (UTC)