Talk:Moderator Tools/Automoderator

Concern re. False Positives
Hi! Thanks for your work. As prompted on the project page, I will be providing answers to some of your questions. For context, I am mostly active on the English Wikipedia, and I am not an admin. However, I have dealt with my fair share of vandalism, and have seen what ClueBot NG can do.

Problem: My biggest concern with this kind of tool is how false positive reporting is handled. New user retention is incredibly important. Many reverts by ClueBot NG are of edits made by relatively new users. The notice that ClueBot NG provides is already a little intimidating, and reporting false positives requires interface literacy and confidence that new users often don't have.

Suggestion: I think it's important that the revert notice placed on talk pages is reasonably friendly, and that reporting false positives is very easy. I imagine a little box that says "click here to review your edit and report a false positive" that opens a pop-up containing a very simple diff viewer and a clearly labelled option like "I think this edit was constructive and the revert was a false positive", or something along those lines. Reporting false positives needs to be almost ridiculously easy to ensure that affected newcomers actually do so and don't get discouraged from editing. Of course, this needs to be balanced against the workload of false positive review. On that, I have two comments:
 * From what I have seen, true vandals are not particularly concerned with not being blocked. They rarely ask to have blocks reviewed, for example. They make their disruptive edits until they're blocked, and then that's it. The people who actually engage with User Warnings, for example, are mostly confused or overwhelmed, but rarely editing truely in bad faith. I don't think that there would be significant abuse of false positive reporting. The vast majority of vandals don't care enough to engage with the anti-vandalism tools and warnings in any meaningful way. However, I don't know what the current situation with ClueBot NG is; the share of false positive reports that are abusive would be an interesting fact to consider here, but I just don't know it.
 * This applies more so to Wikis that don't have automated anti-vandalism yet: Even if 100% of users whose edits were reverted reported false positives, reviewing those will not be significantly more workload than manually reviewing the edits in question in the first place.

That's my two cents. Also, if there is a large volume of discussion and anyone wants to restructure the talk page here later on, feel free to move this around however you need. Actualcpscm (talk) 11:44, 1 June 2023 (UTC)


 * One thing I've learned working on abuse filters is that vandals, presented with obligatory "describe in a few words how you improved this article" rarely have much to say. They either leave or enter some nonsense as their edit summary. Edit summary is an extremely valuable signal, and it's too bad it's only optional. Most vandals (mostly IPs) that I see are school kids; they use wiki to chat, write silly things about their schoolmates, make comments how something they're learning is stupid; my fear is that they'll use the false positive reporting button as just another toy, and until they're blocked (or at least slowed down by the system) they'll make much fun with it and of us. I do feel sorry for a few constructive (IP) users who are unable to save their edits (we help them publish when we see those edits in edit filter logs), but I feel much more sorry for experienced users who'd burned out chasing vandals instead of working on the stuff they liked.
 * I do agree that notices should be less intimidating, and should probably depend on the severity of the offense (swear words & garbage vs. something more sensical). p o nor     (talk) 16:25, 1 June 2023 (UTC)
 * That's actually a really cool idea about the different notices. Maybe this could be implemented in accordance with the certainty that the edit was vandalism; if .95 < x < .98, then it's a nice warning, and if x > .98, it's more firm, or something like that. I'm making these numbers up, but you get the idea. I really like this.
 * Regarding edit summaries, my experience hasn't been quite so clear-cut. A lot of the slightly clever vandals, those who aren't caught by ClueBot NG immediately, leave very plausible edit summaries ("grammar", "fixed spelling mistake", "added references"), etc. I guess the clever vandals aren't the ones they would be targetting with this new system, though. It has been mentioned that its accuracy would likely be lower than ClueBot NG.
 * It's possible that the report button would be abused in the way that you describe, I just don't think it's very likely. I really wonder what the current situation with ClueBot NG's false positive reporting is. That might allow us to base our intuitions on some actual facts :) Actualcpscm (talk) 16:56, 1 June 2023 (UTC)

Model should include checkuser data
There's obviously a whole slew of privacy and security issues which would need to be addressed, but having the ability to include checkuser data in the model would be immensely powerful. There are certain patterns which are easily recognized as being indicative of disruptive behavior, such as rapidly changing IP address ranges, use of IP ranges which geolocate to many different countries, use of IP addresses which are known to host certain kinds of proxies, use of user agent strings which are clearly bogus, etc. These are obviously not definitive indicators, but are common patterns that checkusers look for and should be included in the training data. RoySmith (talk) 12:28, 1 June 2023 (UTC)


 * Hi! Here Diego from WMF Research. Both models are already considering user information such as account creation time, number of previous edits and the user groups that the editor belongs to. All this just based on public data, retrieved directly from the MediaWiki API. Diego (WMF) (talk) 14:39, 1 June 2023 (UTC)