Manual:Edit throttling

This article proposes a security mechanism to deal with buggy or malicious bots by limiting the number of edits allowed from the same user or IP address within a certain time period.

Rationale
Wikis in general, including wikis running the MediaWiki engine, depend on human intervention to maintain editorial integrity. If one contributor, through malice or ignorance, makes unwanted edits, other contributors can simply revert or correct the edits.

This strategy, however, depends on the community being able to keep up with the pace of unwanted edits. A buggy or malicious bot can make edits much faster than a person can by hand, and may be able to outpace even a large community of editors trying to clean up its mistakes.

One tool in the MediaWiki software to deal with unruly bots is a user ban -- disallowing edits from the IP address the bot software is running from. Another possibility is edit throttling -- disallowing more than a certain number of edits from any one user or IP address during a particular amount of time.

Hostile bots could, of course, keep their edit rate below the throttling rate. However, this would be acceptable: at that reduced rate, the community can at least keep up with the unwanted edits or respond with other measures.

Design
The general idea is to restrict edits from any given user or IP address to X edits every Y seconds. It's important for an individual MediaWiki installation to tune the variables X and Y so that they block hostile or buggy bots without blocking perfectly well-meaning human editors.

In addition, it's worth considering how big the wiki is and how active the community is -- that is, how much damage a hostile bot can do at X edits per Y seconds before someone notices and uses another mechanism to stop it. Lastly, the suggested delay for friendly, well-behaved bots should be set so that they do not exceed this rate (or vice versa).
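
As a rough illustration, the throttle could be switched on and tuned from LocalSettings.php along the following lines. Only $wgEditThrottling appears elsewhere in this proposal; the other two variable names are placeholders invented for this example, not existing MediaWiki settings.

    <?php
    # Hypothetical throttle configuration. Only $wgEditThrottling is named in
    # this proposal; $wgEditThrottleX and $wgEditThrottleY are placeholders.
    $wgEditThrottling = true;   # enable the throttle check at save time
    $wgEditThrottleX  = 20;     # X: allow at most this many edits...
    $wgEditThrottleY  = 600;    # Y: ...within this many seconds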

Some example values of X and Y:

 * X = 1, Y = 10: no more than 1 edit in 10 seconds. Note that a human could conceivably make two edits in 10 seconds and get a warning; X = 1 is probably never going to be a good choice.
 * X = 20, Y = 600: no more than 20 edits every 10 minutes. It is much less likely that a human being could keep up this rapid pace of editing.
 * X = 100, Y = 3600: few human editors would do more than 100 edits per hour. However, for some sites, 100 edits could cause pretty impressive damage.
 * 100 edits by one user can be reverted fairly easily through their user contributions page, so it shouldn't be that bad.
 * Sure, it isn't any more difficult to roll back, but if your wiki only has 50 pages this is clearly too high a setting.

(More sophisticated restrictions could be implemented in the future -- for example, having different levels of X and Y for anonymous and logged-in users, since it's always a fun time to discriminate against anonymous users when we can. Another possibility is varying X and Y per user, to gradually "dampen" the rate of allowed edits when high edit rates are detected.)
 * I like that; it's a good idea. Make sure the levels are such that it never bothers users.
 * How about something other than a simple X per Y model? The goal is to distinguish bots from humans in the fewest number of edits without false positives. Humans might manage a high edits-per-second ratio for a short period, but this falls off quickly with time, so we want to flag both more than 4 edits per 10 seconds (since a human might do 4 edits in 10 seconds, especially with browser tabs) and more than 10 edits per 100 seconds (since a human can't keep that pace up for this long). In other words, we need a finite set of these flags, because no single value will work: if we dispensed with the 4 per 10, every bot would get to deface at least 9 pages before being caught, while if we dispensed with the 10 per 100, a bot could indefinitely make 3 edits every 10 seconds. The best solution, of course, would be to gather data on actual human edit rates. (A sketch of such a tiered check follows this list.)
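
As a sketch of how such a tiered check might look, here is a small plain-PHP function that takes the timestamps of a user's recent edits plus a list of (X, Y) thresholds. The function name, the argument layout, and the use of UNIX timestamps are all assumptions made for this example; nothing here is existing MediaWiki code.

    <?php
    # Sketch only: shouldThrottle() is not an existing MediaWiki function.
    # $editTimestamps holds the UNIX timestamps of the user's previous edits;
    # $thresholds is a list of ( X, Y ) pairs meaning "no more than X edits
    # in any Y seconds".
    function shouldThrottle( array $editTimestamps, array $thresholds, $now ) {
        foreach ( $thresholds as $pair ) {
            list( $x, $y ) = $pair;
            # Count previous edits made within the last $y seconds.
            $recent = 0;
            foreach ( $editTimestamps as $ts ) {
                if ( $now - $ts < $y ) {
                    $recent++;
                }
            }
            # If $x previous edits already fall inside the window, the current
            # edit would exceed "X per Y" -- flag it.
            if ( $recent >= $x ) {
                return true;
            }
        }
        return false;
    }

    # Example: 5 edits in the last 8 seconds trips the 4-per-10 threshold even
    # though the 10-per-100 threshold has not been reached yet.
    $previous = array( time() - 1, time() - 2, time() - 4, time() - 6, time() - 8 );
    var_dump( shouldThrottle( $previous, array( array( 4, 10 ), array( 10, 100 ) ), time() ) ); # bool(true)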

In any event, assuming X and Y are set, the MediaWiki software would do an additional check at save time (in Article::updateArticle). If the $wgEditThrottling variable is true, the software would query the X most recent records ("LIMIT X") created by the current user from both the cur and the old table, sorted by date, descending. (Yes, this means that up to 2X records could be returned -- yet again another unavoidable cur vs. old pain in the ass.) The returned records would then have to be combined and sorted in memory. If the Xth most recent record is less than Y seconds old, an error page is displayed and the edit is not saved. (A rough sketch of this check follows the comment below.)
 * Or how about displaying a captcha, and letting the user save the edit if they get it right? It slows a vandal down a bit and blocks robots, but still lets honest contributors do their work with only a minor annoyance in the big scheme of things. (Honest contributors might open 100 tabs, edit them, and then save them all one after the other... I've done that when fixing language links before, not 100 but maybe 30-40.)
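
Here is a rough sketch of the throttling check itself in plain PHP, assuming the two per-table queries ("ORDER BY timestamp DESC LIMIT X" against cur and old) have already been run and their timestamps handed over as arrays of UNIX seconds. The function and variable names below are placeholders for this example, not existing MediaWiki code.

    <?php
    # Sketch only: editIsThrottled() is not an existing MediaWiki function.
    # $curTimes and $oldTimes hold the timestamps returned by the two
    # "ORDER BY timestamp DESC LIMIT X" queries against cur and old.
    function editIsThrottled( array $curTimes, array $oldTimes, $x, $y, $now ) {
        # Combine the (at most 2X) rows and re-sort newest-first in memory.
        $all = array_merge( $curTimes, $oldTimes );
        rsort( $all );

        # Fewer than X previous edits on record: nothing to throttle.
        if ( count( $all ) < $x ) {
            return false;
        }

        # If the Xth most recent edit is less than Y seconds old, the user has
        # already made X edits within the last Y seconds, so refuse this save.
        return ( $now - $all[$x - 1] ) < $y;
    }

    # When this returns true, the save path would show the error page instead
    # of writing the edit to the database.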

(Note that this could be extremely annoying for a human user who's made some important changes to a page. However, the software would have to be gravely misconfigured for the user to have had time to make significant changes and still get their edit throttled. One possibility is to present the user's wiki markup on the error page, so they can at least save it off somewhere. Probably this isn't necessary, though, since the whole goal is to never have humans see this error message in the first place.)
 * Copy and paste: you can make big changes real quick.

Advantages

 * Limits damage from hostile or buggy bots
 * Relatively invisible to human editors

Disadvantages

 * Misconfigured installations could block perfectly good edits by humans
 * Many users sharing the same IP address (for example, behind a proxy or NAT) could trigger throttling
 * High temptation for hard-security kooks to misuse this feature to block edits by humans