Talk:Moderator Tools/Automoderator

Concern re. False Positives
Hi! Thanks for your work. As prompted on the project page, I will be providing answers to some of your questions. For context, I am mostly active on the English Wikipedia, and I am not an admin. However, I have dealt with my fair share of vandalism, and have seen what ClueBot NG can do.

Problem: My biggest concern with this kind of tool is how false positive reporting is handled. New user retention is incredibly important. Many reverts by ClueBot NG are of edits made by relatively new users. The notice that ClueBot NG provides is already a little intimidating, and reporting false positives requires interface literacy and confidence that new users often don't have.

Suggestion: I think it's important that the revert notice placed on talk pages is reasonably friendly, and that reporting false positives is very easy. I imagine a little box that says "click here to review your edit and report a false positive" that opens a pop-up containing a very simple diff viewer and a clearly labelled option like "I think this edit was constructive and the revert was a false positive", or something along those lines. Reporting false positives needs to be almost ridiculously easy to ensure that affected newcomers actually do so and don't get discouraged from editing. Of course, this needs to be balanced against the workload of false positive review. On that, I have two comments:
 * From what I have seen, true vandals are not particularly concerned with not being blocked. They rarely ask to have blocks reviewed, for example. They make their disruptive edits until they're blocked, and then that's it. The people who actually engage with User Warnings, for example, are mostly confused or overwhelmed, but rarely editing truly in bad faith. I don't think that there would be significant abuse of false positive reporting. The vast majority of vandals don't care enough to engage with the anti-vandalism tools and warnings in any meaningful way. However, I don't know what the current situation with ClueBot NG is; the share of false positive reports that are abusive would be an interesting fact to consider here, but I just don't know it.
 * This applies more so to Wikis that don't have automated anti-vandalism yet: Even if 100% of users whose edits were reverted reported false positives, reviewing those will not be significantly more workload than manually reviewing the edits in question in the first place.

That's my two cents. Also, if there is a large volume of discussion and anyone wants to restructure the talk page here later on, feel free to move this around however you need. Actualcpscm (talk) 11:44, 1 June 2023 (UTC)


 * One thing I've learned working on abuse filters is that vandals, presented with an obligatory "describe in a few words how you improved this article", rarely have much to say. They either leave or enter some nonsense as their edit summary. The edit summary is an extremely valuable signal, and it's too bad it's only optional. Most vandals (mostly IPs) that I see are school kids; they use the wiki to chat, write silly things about their schoolmates, make comments about how something they're learning is stupid; my fear is that they'll use the false positive reporting button as just another toy, and until they're blocked (or at least slowed down by the system) they'll have much fun with it and make fun of us. I do feel sorry for a few constructive (IP) users who are unable to save their edits (we help them publish when we see those edits in edit filter logs), but I feel much more sorry for experienced users who've burned out chasing vandals instead of working on the stuff they liked.
 * I do agree that notices should be less intimidating, and should probably depend on the severity of the offense (swear words & garbage vs. something more sensical). p o nor     (talk) 16:25, 1 June 2023 (UTC)
 * That's actually a really cool idea about the different notices. Maybe this could be implemented in accordance with the certainty that the edit was vandalism; if .95 < x < .98, then it's a nice warning, and if x > .98, it's more firm, or something like that. I'm making these numbers up, but you get the idea. I really like this.
 * Regarding edit summaries, my experience hasn't been quite so clear-cut. A lot of the slightly clever vandals, those who aren't caught by ClueBot NG immediately, leave very plausible edit summaries ("grammar", "fixed spelling mistake", "added references", etc.). I guess the clever vandals aren't the ones they would be targeting with this new system, though. It has been mentioned that its accuracy would likely be lower than ClueBot NG's.
 * It's possible that the report button would be abused in the way that you describe, I just don't think it's very likely. I really wonder what the current situation with ClueBot NG's false positive reporting is. That might allow us to base our intuitions on some actual facts :) Actualcpscm (talk) 16:56, 1 June 2023 (UTC)
 * Re: abusing the report button – I can provide some observations from patrolling the Spanish Wikipedia's edit filter false positive reports. Whenever a user hits a disallow filter, they are provided with the option to file a report, which creates a new post on this page. Believe it or not, some vandals will report being caught by a vandalism filter; out of the 100 or so reports currently on the page, about 15 are garden-variety vandalism. (A few users try disguising it by saying "I was making improvements to the page, don't know why it is being filtered", but most of the reports are just as nonsensical as the original edits caught by the filter.) Anecdotal, but I hope it helps. :) -- FlyingAce (talk) 17:23, 5 June 2023 (UTC)
 * Thanks for this useful information @FlyingAce! Do you think that all false positives are reviewed, or do some languish without being looked at? Samwalton9 (talk) 17:47, 5 June 2023 (UTC)
 * @Samwalton9: More than a few of them do languish... I did a quick count and about 10 unanswered reports are more than 3 months old (a couple of them are from last year), about 40 are 1-2 months old and 20 or so were submitted within the last month (the remaining ~30 reports have been answered and will be archived soon). To be fair, this may not be representative of every project; the unanswered reports involve private filters, and eswiki does not have the edit filter helper permission, so these can only be answered by sysops or edit filter managers (and to be honest, admin noticeboards in general have been pretty backlogged lately).
 * I'm guessing that regular users would not be as limited to handle false positive reports from this new tool, since reverted edits are visible in the history; even if this were limited to, say, rollbackers, that would still be a lot more users than just sysops or EFMs. FlyingAce (talk) 18:35, 5 June 2023 (UTC)
 * Fyi: I checked the list of CBNG false positive reports, and some of the unreviewed ones are as old as 2013. It says here that these reports are supposed to be reviewed by admins, but I haven't been able to find an on-wiki page with this backlog. Actualcpscm (talk) 09:04, 6 June 2023 (UTC)
 * @Actualcpscm Yeah I'm hoping to find out more about this - the interface makes it very hard to understand how many reports are being actioned. I actually can't even log in to that tool, the OAuth flow seems to be broken for me. Can you? Samwalton9 (WMF) (talk) 11:50, 6 June 2023 (UTC)
 * There's another tool linked at User:ClueBot NG#Dataset Review Interface which is currently broken. Samwalton9 (WMF) (talk) 11:52, 6 June 2023 (UTC)
 * Samwalton9 (WMF) Same on my end, just an OAuth loop.
 * Quick summary of my experiences:
 * The link provided here, which is http://review.cluebot.cluenet.org/ (and already labelled as broken), gives me an NXDOMAIN error. The domain registration doesn't expire until January 2024, though.
 * There are two links at the reporting tutorial here. The one labelled "review interface" (https://cluebotng-review.toolforge.org/) just gets stuck loading with no content. The only way I've been able to access anything is by following the link intended for submitting a report (https://cluebotng.toolforge.org/) and navigating to the list, but as you mentioned, logging in doesn't work. Actualcpscm (talk) 12:29, 6 June 2023 (UTC)
 * @Actualcpscm Thanks for taking the time to share your thoughts! I completely agree that false positives are something we need to think a lot about, to make sure that they have a minimal effect on good faith contributors. Some open questions on my mind include:
 * What notice should the user receive that their edit has been reverted - if any? We could imagine a notification, a talk page message, some other kind of UI popup, or something else entirely.
 * Should that notification provide an easy way to reinstate the edit, or contain a reporting mechanism so that an experienced editor can review and reinstate the edit if appropriate?
 * If other editors need to review false positives, how could we make that process engaging, so that it isn't abandoned?
 * In terms of how this works for ClueBot NG and other bots - I agree it would be useful to learn more about this. CBNG has a dedicated false positive review tool, but it's not clear to me whether anyone is actually reviewing these. I'm putting this on my TODO list for research for this project! Samwalton9 (WMF) (talk) 10:38, 2 June 2023 (UTC)
 * Samwalton9 (WMF) thanks for your questions!
 * I think talk page messages are pretty good, mostly because they're easy to spot and to handle. The big red notification on the bell is good UI design, it's clear even to people unfamiliar with Wikipedia. It might be nice to have some explanation about what user talk pages are, since a new editor might not be aware of them and just click on the notification.
 * I don't think there should be any direct way to reinstate the edit, that would invite abuse very openly. A reporting mechanism would be much better, imo.
 * I would suggest an interface similar to AntiVandal, Huggle, or WikiLoop DoubleCheck. It's basically the same activity ("evaluate this diff"), just with a slightly different context. Ideally, this should run in-browser (unlike Huggle) and be accessible to a large group of editors (e.g. extended-confirmed). If it's a very fast diff browser like Huggle, I think reports should be reviewed by at least two editors to ensure fairness. However, I'm not sure that recruiting editors into this would be successful.
 * What you bring up about CBNG's false positive review tool is concerning. I always assumed that the false positive reports get reviewed by someone, but it doesn't necessarily look like they do. The interface you linked to does not even provide the opportunity to do so directly, so I do wonder what is going on here. I will ask around about that. Actualcpscm (talk) 10:50, 2 June 2023 (UTC)
 * My two cents re: CluebotNG. I'm a bit surprised people portray ClueBotNG as intimidating. When reverting, it says to the user:
 * Hello and welcome
 * One of your recent edits has been undone
 * Mistakes are rare, but acknowledges "it does happen"
 * Follow some reporting steps, and you can "make the edit again"
 * Maybe it's just that I'm an old school admin and we were never this nice to newbies. I have been impressed with the work of CluebotNG over the years. It is quite accurate, and I think just knowing there is an all-seeing eye in a bot tends to deter a lot of vandals.
 * That said, it may be useful for folks who chime in here to list any tools or processes they use in the role of moderation. It may be useful to the WMF team and I know I would find it personally interesting. I'll start.
 * Moderation user story for User:Fuzheado
 * When I'm in serious patrolling mode, I'll use Huggle. The keyboard shortcuts are intuitive and quick, and I can make it through dozens of pages per minute. I can rollback and warn someone (like so) with one tap of the "Q" key on the keyboard.
 * If I'm assuming good faith, I'll tap the "Y" key and it will allow me to type in a nicer, less harsh, personalized message such as "remove unsourced content." (like so)
 * Huggle is smart enough to read what warnings have been given already, so it can escalate with harsher language on the user's talk page if it's, say, the 3rd or 4th infraction. That's also what makes Huggle so useful – it's collaborative, and aware of other editors and admins patrolling and moderating. I think any tool that is being developed today needs to have the same type of mindset.
 * As an admin, I tend not to do my blocking of users or protection of pages from Huggle, however. Its options are not that advanced. Instead, I will open the problematic page or user page with Huggle's "O" key, and use Twinkle's more powerful menu options. Twinkle provides full access to all the blocking templates (found at en:Template:Uw-block/doc/Block_templates) as well as automating a lot of user talk page messaging, page protections, and other housekeeping functions (example of a Twinkle block). So I think this is also a lesson that could be learned: you don't need to reinvent the wheel, and if you can hook into existing tools and scripts, that should be seen as a win.
 * - Fuzheado (talk) 20:57, 3 June 2023 (UTC)
 * I didn't mean "intimidating" as in rude or dismissive, but more as in technologically challenging, particularly the part about the log entry. It looks harmless to seasoned editors, but putting myself in the shoes of someone completely new to the entire platform, I can imagine that I'd be intimidated. Actualcpscm (talk) 21:12, 3 June 2023 (UTC)
 * @Fuzheado Thanks for your comments! I agree that, where possible, we should integrate with existing workflows. That might be challenging with regards to warning templates since they're different from one wiki to the next, but I can think of some ways we might be able to solve that issue. In terms of integration with Huggle/patrolling, one idea that crossed my mind was the following: if the model takes a little time (let's say a second) to check an edit and decide whether it's going to take an action or not, we could flag the edit in a way that tools like Huggle could avoid showing it to patrollers momentarily, inserting it into their queue only after the model has decided not to revert it. This could reduce the workload for patrollers who, as I understand it, at the moment will review and 'undo' an edit even if a bot like ClueBot NG already undid it. Could you see that being beneficial? Samwalton9 (WMF) (talk) 13:58, 5 June 2023 (UTC)
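
The tiered-notice idea raised earlier in this thread (a friendlier message for lower-confidence reverts, a firmer one for higher-confidence ones) could be sketched roughly as follows. This is only an illustration, not how Automoderator or ClueBot NG works: the thresholds 0.95 and 0.98 are the made-up numbers from the comment above, and `choose_notice` and the tier labels are hypothetical names.

```python
from typing import Optional

# Made-up thresholds from the discussion; real values would need testing per wiki.
REVERT_THRESHOLD = 0.95
FIRM_THRESHOLD = 0.98


def choose_notice(score: float) -> Optional[str]:
    """Map a vandalism score in [0, 1] to a notice tier (hypothetical labels)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > FIRM_THRESHOLD:
        return "firm"      # very likely vandalism: firmer wording
    if score > REVERT_THRESHOLD:
        return "friendly"  # reverted, but with a gentle, welcoming message
    return None            # below the revert threshold: no revert, no notice


print(choose_notice(0.96))  # friendly
print(choose_notice(0.99))  # firm
```

A score of exactly 0.98 falls into the friendly tier here; the original suggestion left that boundary unspecified.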

Model should include checkuser data
There's obviously a whole slew of privacy and security issues which would need to be addressed, but having the ability to include checkuser data in the model would be immensely powerful. There are certain patterns which are easily recognized as being indicative of disruptive behavior, such as rapidly changing IP address ranges, use of IP ranges which geolocate to many different countries, use of IP addresses which are known to host certain kinds of proxies, use of user agent strings which are clearly bogus, etc. These are obviously not definitive indicators, but are common patterns that checkusers look for and should be included in the training data. RoySmith (talk) 12:28, 1 June 2023 (UTC)


 * Hi! Here Diego from WMF Research. Both models are already considering user information such as account creation time, number of previous edits and the user groups that the editor belongs to. All this just based on public data, retrieved directly from the MediaWiki API. Diego (WMF) (talk) 14:39, 1 June 2023 (UTC)

General feedback

 * This is a wonderful idea; relying on individual community bots run by one or two users for such a central task has always worried me. ClueBot NG does a good job, but it's certainly aged and there are surely new algorithms and ideas that can perform even better. ToBeFree (talk) 17:28, 1 June 2023 (UTC)
 * @ToBeFree Thanks for the positive comment! I agree about the reliance on individual technical volunteers to maintain important tools. In terms of ClueBot NG, there's a good chance that this tool actually wouldn't be better than ClueBot in terms of accuracy, given that ClueBot has been trained on English Wikipedia specifically and has been learning for many many years. It still might be an improvement in terms of features, so maybe there's room for both tools to run alongside each other, or perhaps we can build functionality that ClueBot could also utilise. We'll have to investigate further once we get into community testing of the model. Samwalton9 (WMF) (talk) 10:41, 2 June 2023 (UTC)

Incorporating edit histories
RoySmith suggested just above that the model should include checkuser data in its evaluation, which made me think that some other data external to the edit itself might be relevant. For example, if an account has 3 or 4 edits that were all identified as vandalism, and they make another edit, that is probably going to be vandalism too. Evaluating edit histories, especially checking for the percentage of edits identified as vandalism, might be something to consider for the model. I am assuming that there will be a user whitelist (like with ClueBot NG iirc) so that the model does not have to check the super extensive histories of long-term editors every time. Maybe ClueBot NG already incorporates this kind of data? I don't think so, though. Actualcpscm (talk) 17:33, 1 June 2023 (UTC)


 * @Actualcpscm This is a great point. A very simple implementation of this tool could simply be 'revert any edit above a threshold', but there's lots of contextual clues we can use to make better guesses about whether an edit should be reverted or not. The counter-example to the one you described would be something like 'an editor reverts their own edit'. By itself that might look like, for example, removal of content, but if it's their own edit we probably shouldn't be reverting that. If you have any other ideas like this I'd love to hear them, they're the kind of thing we might otherwise only start noticing once we look closer at individual reverts the tool proposes. Samwalton9 (WMF) (talk) 10:53, 2 June 2023 (UTC)
 * Hi again Samwalton9 (WMF). I'm pinging you because I assume you have a lot going on, but if you prefer that I don't, just let me know.
 * Outside checkuser data, edit histories, and administrative logs, there isn't much data on most editors (and that's a good thing). A closely related consideration would be the user logs, specifically block logs and filter logs. If an IP has been temp-blocked 3 times and tagged as belonging to a school district, a spike in suspicious activity will most likely not be made up of constructive editing. That seems like a reasonable factor to incorporate into the model.
 * If I had to come up with something more, the time of day in the editor's timezone might be relevant. Maybe vandalism is more prevalent in the evenings (think alcohol consumption) or at night? This hypothesis is a stretch; I think the only way to figure this out would be testing whether allowing the model access to such data makes it more accurate. I don't know if there actually is a ToD - vandalism correlation, I'm just hypothesizing.
 * I don't know exactly how you will create this tool. If you're planning on creating an artificial neural network (like CBNG uses right now), it might be worth trying some wacky stuff. From my very limited knowledge, it's not unusual to just throw all available data at an ANN and see how it behaves. That's what I was imagining above, too; the data we mentioned, like edit histories, is fed into the model and considered alongside the specific edit being evaluated. The decision then just depends on the score provided by the ANN.
 * However, if you're expecting these to be manually written checks, I would forget time of day and filter logs. For a human coder, that would be way too much effort for a minute accuracy boost, if anything.
 * On a related note: I'm not sure if CBNG does this outside of the editor whitelist, but Huggle records user scores. Spitballing again: edits that just barely pass the check cause the user to be assigned a suspicion score, which is incorporated into the model. For the first edit a user makes, this wouldn't make any difference (since their suspicion score would be 0), but if they make 5 edits that scrape by by the skin of their teeth, maybe there's something fishy going on ---> The higher a user's suspicion score, the lower the threshold of certainty needed to revert / warn (or, in the ANN, the higher the likelihood that this is in fact vandalism).
 * Also, if we're in an ideal world, the model could consider the topic of the article where the edit was made in relation to the edit's contents. If an edit to an article about racial slurs includes a racial slur, that's less indicative of vandalism than the same slur being added to a small BLP or science-related article. On the other hand, I'm sure that some article categories generally have a bigger problem with vandalism than others. More data for the ANN! Actualcpscm (talk) 11:48, 2 June 2023 (UTC)
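
The "suspicion score" idea above (edits that only barely pass the check lower the threshold for that user's later edits) can be sketched like this. It is a toy illustration under stated assumptions: all constants are invented, `SuspicionTracker` is a hypothetical name, and neither Huggle nor ClueBot NG is claimed to work this way.

```python
from collections import defaultdict

# All values below are assumptions chosen only for illustration.
BASE_THRESHOLD = 0.95    # score above which an edit is reverted
NEAR_MISS_MARGIN = 0.10  # how far below the threshold counts as "barely passed"
STEP = 0.02              # threshold reduction per recorded near miss
MIN_THRESHOLD = 0.80     # the threshold is never lowered past this floor


class SuspicionTracker:
    """Track near misses per user and lower that user's revert threshold."""

    def __init__(self):
        self.near_misses = defaultdict(int)

    def threshold_for(self, user):
        lowered = BASE_THRESHOLD - STEP * self.near_misses[user]
        return max(MIN_THRESHOLD, lowered)

    def should_revert(self, user, score):
        threshold = self.threshold_for(user)
        if score > threshold:
            return True
        if score > threshold - NEAR_MISS_MARGIN:
            # The edit scraped by: remember it for this user's next edit.
            self.near_misses[user] += 1
        return False


tracker = SuspicionTracker()
print(tracker.should_revert("ExampleUser", 0.94))  # first edit scrapes by
print(tracker.should_revert("ExampleUser", 0.94))  # same score, now reverted
```

For a user's first edit the tracker changes nothing, since no near misses have been recorded yet.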

Revert vs. prevent or throttle
You say: "Reverting edits is the more likely scenario - preventing an edit [may] impact edit save times."

I don't know what changes then: if you're not preventing bad edits in real time, those edits will (have to) be seen and checked by RC patrollers. The biggest burden is not one bad-faith user doing one bad edit, but one user doing a quick series of bad edits (example, avoided the too-many-consonants AF), on one or more pages. I had to ask some global patrollers to stop reverting edits while the vandal is on the wiki because that only triggers more vandalism (today's example, see indiv. page edit histories). Unless you can block or slow down vandals in those 20 minutes, you better leave them alone.

The worst vandalisms, which are in most cases by IP or very new editors (2-3 hours max), can be handled by abuse filters, either by disallowing edits or blocking users. Abuse filters are not perfect, smarter vandals learn how to evade them, but they work in real time. I assume Automoderator is trained only on edits that went past the existing filters, so it's also unlikely that it'll ever replace them.

How about you don't touch any users with clean history and 66+ edits, revert bad edits of registered users with not so many edits whenever there's time, but prevent or throttle (really) bad edits by IPs and very new registered users in real time? I think that'd be most helpful! p o nor    (talk) 21:29, 1 June 2023 (UTC)
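
The three-way split proposed above could be written down as a small decision function. This is a sketch of the proposal only: the 66-edit figure and the "2-3 hours" cutoff come from this comment, while `Editor`, `choose_action`, the action labels, and the assumption that some earlier step has already judged the edit to be bad are all hypothetical.

```python
from dataclasses import dataclass

TRUSTED_EDIT_COUNT = 66      # "clean history and 66+ edits"
VERY_NEW_ACCOUNT_HOURS = 3   # "very new editors (2-3 hours max)"


@dataclass
class Editor:
    is_ip: bool
    edit_count: int
    account_age_hours: float
    clean_history: bool


def choose_action(editor: Editor, edit_is_bad: bool) -> str:
    """Return a hypothetical action label for one edit."""
    if not edit_is_bad:
        return "allow"
    if editor.clean_history and editor.edit_count >= TRUSTED_EDIT_COUNT:
        return "ignore"  # established editors are left to human review
    if editor.is_ip or editor.account_age_hours <= VERY_NEW_ACCOUNT_HOURS:
        return "prevent_or_throttle"  # act in real time
    return "revert_later"  # other registered users: revert when there is time


print(choose_action(Editor(True, 0, 0.0, False), True))
```

Registered users with many edits but a history that is not clean fall into the "revert_later" branch here; the proposal does not say how they should be handled, so that is an assumption.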


 * @Ponor Thanks for sharing your thoughts! I agree that preventing edits may be a more ideal situation, the problem is that these checks (like AbuseFilter) would need to happen every time someone clicks 'Save' on an edit, and prevent anything else from happening for that user before all the checks have happened. Filters already take quite a lot of that time, and running the edit through a model is likely to take even more time, in addition to other technical issues to consider. There's some discussion on this at Community Wishlist Survey 2022/Admins and patrollers/Expose ORES scores in AbuseFilter and the associated Phabricator ticket. This is something I'm still looking into, however (T299436).
 * You raise a helpful point about vandals continually reverting, in a way that wouldn't make this tool helpful, and I'd like to think more about how we could avoid those issues.
 * I do agree that we'll want to build in features/configuration for avoiding users with, for example, more than a specific number of total edits. That said, the model is already primarily flagging edits by unregistered and brand new users more than experienced editors, so this may not be necessary.
 * Samwalton9 (WMF) (talk) 11:11, 2 June 2023 (UTC)
 * Thanks, @Sam. Is there any estimate of the time needed for a check? I'm thinking that these models are way more optimized than AbuseFilters, of which there are tens or hundreds running in sequence, indiscriminately. Automoderator could run on some 20% of all edits (~IP users edits rate from the main page here); you'll make those editors wait for a fraction of a second, but if it's helping experienced users moderate the content you'll save many seconds of their time. Your link cluebot FP tells me you should focus solely on most blatant vandalisms, users who make many consecutive unconstructive edits that cannot be caught by AF even in principle. Ever since we started more aggressive filtering I've been checking AF logs on a daily basis, and I don't think anyone else does. Add one more log (false positives + falsely claimed false positives), there will be ~no one to watch it: at some point you get so overwhelmed with stupid work that you start caring more about your time than the time of those bypassers. I know this sounds rough, but it's just the way it is. p o nor     (talk) 11:54, 2 June 2023 (UTC)
 * @Ponor I've asked another WMF team to look into how much time this would take in T299436, so hopefully we'll know before we make a final decision on this. I agree with the concern about very few people checking false positive logs - creating another new venue to review edits isn't ideal for the reasons you describe. We'll have to think about this in more detail. Samwalton9 (WMF) (talk) 12:13, 2 June 2023 (UTC)

Shutoff Switch
Would there be an emergency shutoff switch if it malfunctioned? Blitzfan51 (talk) 20:52, 5 June 2023 (UTC)


 * @Blitzfan51 Yes, absolutely. It seems to me that one of the most basic features should be that users (probably administrators) can turn the tool on or off at any time. Samwalton9 (WMF) (talk) 11:43, 6 June 2023 (UTC)

Interconnectedness with other similar existing mechanisms
How will the tool interconnect/interact with other existing mechanism such as abuse filters, ORES and maybe even blocks or past edits waiting for review?

Are there any really future-future plans to create an easier review infrastructure with integrated AI?

I'm an admin from SqWiki and we've been overwhelmed with edit reviewing. Every edit by new users needs to be reviewed in SqWiki but we really lack active reviewers. There are currently edits waiting for more than 900 days for review. We've been using edit filters to weed out vandalism and that has slowed down the number increase a bit. We've also created user scripts that make it easier to revert changes and subsequently we make heavy use of ORES. I've asked around to revamp the page review extension but it has been orphaned for many years now. I've also asked around for anti-vandalism bots (EsWiki apparently also had one not mentioned in the original wish) but none has been able to help because of language problems. Because of all this I'm really excited about the Automoderator project but I wonder how it will interact with the other features I mentioned above. I was also wondering if we could hope for an overall easier review infrastructure where the aforementioned features get merged into one, more or less, with the AI bot at its core. (Huggle, Twinkle, SWViewer are also part of the said infrastructure.)

As for the blocks/past edits thing... Personally I'd hope that the tool was smart enough to include some micro-judgement rules like "increase the likelihood of this edit being vandalism if it is coming from a user that has been blocked in the last 3 months" or "lower the likelihood if it is coming from a user whose edits have been accepted a lot lately". This already happens with the CX extension. Ideally for SqWiki it would also look at edits waiting for review and act on them, reverting what it suspects to be blatant vandalism and helping us in clearing that never-ending list, but I don't know if this could be in the scope of the project or not. - Klein Muçi (talk) 12:57, 8 June 2023 (UTC)
 * @Klein Muçi Thanks for all the questions!
 * How will the tool interconnect/interact with other existing mechanism such as abuse filters, ORES and maybe even blocks or past edits waiting for review?
 * I have some ideas here but to be honest it's all speculative at this point. We could imagine this tool taking inputs from other sources, like abuse filters, to adjust how likely it is to revert an edit, but I think these kinds of integrations would be something for us to look at quite a ways into the future - there are a lot of more basic features we would need to build first. @Diego (WMF) does the model already consider whether the user has previous blocks on their account, when scoring an edit?
 * When you say "edits waiting for review" do you mean via Flagged Revisions? That's an integration I hadn't thought about yet, so we'll have to look into how that would work - I agree that ideally it should act on these edits too. Thanks for bringing this up.
 * Are there any really future-future plans to create an easier review infrastructure with integrated AI?
 * I think if we're going to have a false positive review interface it really needs to be as engaging and easy to use as possible, so that we reduce the number of reports going ignored. In terms of integrated AI, I'm not sure, but it's an interesting idea! Samwalton9 (WMF) (talk) 10:15, 9 June 2023 (UTC)
 * Samwalton9, thank you for the detailed answer! Yes, I suppose I mean exactly that extension. To be precise, I mean the pages that appear in Special:PendingChanges (for SqWiki) which I believe are coming from that extension. - Klein Muçi (talk) 10:58, 9 June 2023 (UTC)
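
The "micro-judgement rules" mentioned at the top of this section (raise the score for recently blocked users, lower it for users whose recent edits were mostly accepted) might look something like the sketch below. Everything here is an assumption for illustration: `adjust_score`, the adjustment sizes, and the 90% acceptance cutoff are invented, and the model may already take some of this into account on its own.

```python
def adjust_score(base_score, blocked_recently, recent_accept_rate,
                 block_penalty=0.10, trust_bonus=0.10, trust_cutoff=0.9):
    """Nudge a model score in [0, 1] using simple contextual rules.

    blocked_recently: True if the user was blocked in the last 3 months.
    recent_accept_rate: fraction of the user's recent edits that were accepted.
    """
    score = base_score
    if blocked_recently:
        score += block_penalty  # recently blocked: treat the edit as more suspect
    if recent_accept_rate >= trust_cutoff:
        score -= trust_bonus    # mostly accepted edits lately: less suspect
    return min(1.0, max(0.0, score))  # keep the result inside [0, 1]


print(adjust_score(0.5, blocked_recently=True, recent_accept_rate=0.2))
print(adjust_score(0.5, blocked_recently=False, recent_accept_rate=0.95))
```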

Easy interface or multiple sets of configurations to choose from
I suspect the tool will have a lot of variables to configure so you can micro-manage it. This can make the interface overwhelming, especially in the beginning when you're basically testing out what practical effect would each change have on the community. In this scenario I hope that we either get a before-tested default configuration that we can choose from and be sure that it will work with at least 80% guaranteed efficiency or have some ready-made sets of configurations that serve for different purposes (maybe with the ability to add new ones). - Klein Muçi (talk) 13:03, 8 June 2023 (UTC)


 * @Klein Muçi I completely agree. When we were exploring this project one potential avenue we were thinking about was integration with AbuseFilter, but we know that many communities already find that tool too confusing to use. We'd like for this to have a simpler interface, with good default settings, so that communities don't need to spend a lot of effort learning how to configure the tool. Samwalton9 (WMF) (talk) 10:16, 9 June 2023 (UTC)

Protecting, deleting and blocking
A simple question related to the "unified review infrastructure" mentioned above:

Will the tool be able to change page protections' levels, delete new vandal-content pages or even block users? - Klein Muçi (talk) 13:16, 8 June 2023 (UTC)
 * @Klein Muçi So far we've just been thinking about reverting edits, but I know that anti-vandalism bots like PSS 9 also do things like protect pages and block users for short periods of time. I think that's something we'll need to evaluate as the project progresses. Is this a set of features that you would want? Samwalton9 (WMF) (talk) 10:29, 9 June 2023 (UTC)
 * Samwalton9, I wouldn't be against them and it would probably be a good thing to have as a last line of defense if "things get really crazy". But I'm an admin from a small wiki: We rarely get any "bot attacks" which would require bot-responses to fight back. This would be more helpful on big wikis I suppose but on the other hand big wikis usually don't enjoy delegating their autonomy to automatic tools so if the tool could do protection/blocking it would have to be highly configurable. As for page deletion... I don't know if others would agree easily (I would) but maybe it can flag such pages if deletion is deemed too extreme. — Klein Muçi (talk) 11:12, 9 June 2023 (UTC)

Responses and Thoughts
Training: The overview mentions "we train", which I assume means the developers train. I'd like you to consider a mechanism where admins on the given wiki could train. Allow us to provide pages with a history that should be considered for training. The pages would have either edits that were reverted or edits that were reverted and then restored. Either is appropriate for training. Maybe even a simple checkbox on the page history that could be checked for AI training, and unchecked or checked differently to tell the tool that it screwed up on this one.

Threshold: I don't think I know how to respond until we can see sample results. I'm not sure of the range of possible confidence values, and the results are likely specific to each wiki and how the tool is trained. The obvious answer is that it needs to be something configurable. Particularly desirable would be a user interface where, as you drag the slider, you can see what edits would have been blocked / reverted and what edits would have been ignored.
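As an illustration of the slider idea, here is a minimal sketch. It assumes each edit carries a single confidence score between 0 and 1, where higher means "more likely to need reverting"; the real score range is not known from this discussion, and `ScoredEdit`, `preview_threshold` and the sample revision IDs are hypothetical names and data made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ScoredEdit:
    rev_id: int    # hypothetical revision ID
    score: float   # ASSUMPTION: model confidence in [0, 1], higher = more likely bad


def preview_threshold(edits, threshold):
    """Split edits into those that would be reverted and those left alone."""
    would_revert = [e for e in edits if e.score >= threshold]
    would_ignore = [e for e in edits if e.score < threshold]
    return would_revert, would_ignore


# Made-up sample data, not real model output.
SAMPLE = [
    ScoredEdit(101, 0.98),
    ScoredEdit(102, 0.75),
    ScoredEdit(103, 0.40),
    ScoredEdit(104, 0.05),
]

if __name__ == "__main__":
    # Moving the "slider" from 0.5 to 0.9 shrinks the set of reverted edits.
    for threshold in (0.5, 0.9):
        reverted, ignored = preview_threshold(SAMPLE, threshold)
        print(threshold, [e.rev_id for e in reverted], [e.rev_id for e in ignored])
```

A preview interface could call something like `preview_threshold` on recent edits each time the slider moves, so that the community sees the practical effect of a setting before enabling it.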

Testing: Testing would be helpful, but being able to immediately apply tests toward training would be more useful.

Configuration: Yes, being able to configure it to ignore user groups, or users with some configurable number of edits, or certain page prefixes (allowing for addressing both the main page and/or subpages on wikis that use subpages). Anonymous edits and users with very few edits or only very recent edits should generate a higher (or lower, depending on how this works) score. I'd also like to be able to configure the rollback message(s), and perhaps a user message that is added to the user's talk page telling them to not immediately revert but to contact someone for assistance, etc. We will want a way to add users to a group that would be bypassed, so consider adding a UserGroupRights group to address this.
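To make the exemption options above concrete, here is a rough sketch of how such a configuration might be checked before an edit is scored at all. Everything in it is an assumption for illustration: `SkipConfig`, `is_exempt`, the group names, the edit-count cutoff and the page prefix are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class SkipConfig:
    # ASSUMPTION: all defaults below are placeholders a wiki would set itself.
    exempt_groups: set = field(default_factory=lambda: {"sysop", "bot"})
    min_edit_count: int = 50                 # users at or above this are skipped
    exempt_page_prefixes: tuple = ("User:",)  # pages (and subpages) to ignore


def is_exempt(config, user_groups, edit_count, page_title):
    """Return True if the edit should bypass automated review entirely."""
    if config.exempt_groups & set(user_groups):
        return True
    if edit_count >= config.min_edit_count:
        return True
    # str.startswith accepts a tuple, so this covers every configured prefix.
    if page_title.startswith(config.exempt_page_prefixes):
        return True
    return False


if __name__ == "__main__":
    config = SkipConfig()
    # An anonymous editor has no groups and no edit count, so is not exempt.
    print(is_exempt(config, [], 0, "Physics/Lesson 1"))
    # A member of an exempt group is always skipped.
    print(is_exempt(config, ["sysop"], 3, "Physics/Lesson 1"))
```

A dedicated bypass group, as suggested above, would then just be one more entry in `exempt_groups`.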

False Positives: I'm not really concerned. As long as we can add appropriate messaging and tell users they need to take a (configurable) timeout before proceeding, I think this will be fine. Any legitimate user on Wikiversity is going to be willing to wait a period of time (somewhere between eight and 24 hours is my initial thought) to have their issue resolved.

We would absolutely use this. Wikiversity has a very high undo / rollback rate compared to legitimate edits, and my personal observation is that there will be a clear distinction on which edits are which, at least initially. You have to also recognize that training the AI will also eventually train the abusers, so this will be an escalating challenge as they try to figure out how to bypass the tool.

What you haven't mentioned in the overview, and I think only somewhat mentioned above, is that this needs to tie into some type of automated throttling and/or short-term blocking. I know you mentioned performance, but stopping the edits is the only thing that will make some abusers go away. If they are able to leave a reverted message, they've still won in their view. And, as you know, they link to that history and share it widely. We need to go further than just automated rollback to truly have an impact that protects the wiki and reduces administrative effort.

Thanks for your consideration.

Dave Braunschweig (talk) 02:50, 9 June 2023 (UTC)


 * @Dave Braunschweig Thank you for taking the time to provide all of this really valuable input!
 * Training: Absolutely! We think we can set up a system where administrators could flag false positives, which are then fed back into the model to re-train it. This will mean that the model can improve over time based on direct feedback.
 * Threshold: We're planning to share a tool which will allow editors to explore how the model works at various thresholds so that you can let us know whether it seems good enough!
 * Configuration: I agree with all of this. I'd like to know a little more about the idea of a user right though - can you elaborate on the situations in which you see a new user right exemption being useful?
 * I also just want to note that the models aren't currently trained on Wikiversity, but I think that's something we can do based on community interest! Samwalton9 (WMF) (talk) 10:33, 9 June 2023 (UTC)

Deferring changes
A maintained autonomous AI agent dealing with vandalism that doesn't need sleep and doesn't go to a job would definitely be useful and welcomed by my community (a mid-size to large wiki without automated anti-vandalism tools).

Regarding concerns about false positives, I see this as an opportunity to bring back a few old ideas. Some background first:


 * German Wikipedia and some others use Flagged Revisions. If you don't have an account or have a fresh one, your edits need to be reviewed first.
 * English Wikipedia (and maybe some others) uses a softer version called pending changes. The community decides on which pages edits (by IPs and newbies) need to be reviewed first.
 * The former has obvious advantages and disadvantages. As long as most vandalism comes from IPs, none of them can harm the project's credibility. On the other hand, the backlog of changes for review can be very large, and I have heard of wikis where it took months or years to review some changes (and for that reason, the feature was disabled there). Also, it harms the reputation of the project as an open one.
 * Whereas English Wikipedia claims: As of July 2021, edits are rarely unreviewed for more than a day or two and the backlog is frequently empty. Still, only a very small portion of pages (~3,700 out of 6,666,000+) is protected in such a way.

Now to the point: There is also an old proposal that changes which are identified as potentially problematic (e.g., by AbuseFilter) will be deferred and only published if approved. Unfortunately, although some engineering effort was put into it (T118696), the project has never been completed and deployed.

I always thought of that as a good compromise between obviously malicious edits going live immediately on one hand and a large backlog of changes waiting for review for months on the other hand. You can mitigate the former with an abuse filter set to disallow or with an AI bot, but we know the disadvantages.

So the new automoderator doesn't really have to be an active bot. It could just defer suspicious changes for review based on its learned model or abuse filters. (Or it may do both.) To my knowledge, this is how the "automod" feature is understood on some online community websites.

--Matěj Suchánek (talk) 11:42, 9 June 2023 (UTC)