Manual:Edit throttling

This article proposes a security mechanism to deal with buggy or malicious bots by limiting how many edits the same user or IP address can make within a certain time period.

Rationale
Wikis in general, including wikis running the MediaWiki engine, depend on human intervention to maintain editorial integrity. If one contributor, through malice or ignorance, makes unwanted edits, other contributors can simply revert or correct the edits.

This strategy, however, depends on the community being able to keep up with the pace of unwanted edits. A buggy or malicious bot can make edits much faster than a person can by hand, and may be able to outpace even a large community of editors trying to clean up its mistakes.

One tool in the MediaWiki software to deal with unruly bots is a user ban -- disallowing edits from the IP address the bot software is running from. Another possibility is edit throttling -- disallowing more than a certain number of edits from any one user or IP address during a particular amount of time.

Hostile bots could, of course, keep their edit rate below the throttling rate. That would be acceptable, however: at that pace the community can at least keep up and respond with other measures to handle the unwanted edits.

Design
The general idea is to restrict edits from any given user or IP address to X edits every Y seconds. It's important for an individual MediaWiki installation to tune the variables X and Y to block hostile or buggy bots without blocking perfectly well-meaning human editors.

In addition, it's probably worth considering how big the wiki is, and how active the community is -- how much damage can be done if a hostile bot does X edits in Y seconds, before someone notices and uses another mechanism to stop the bot. Lastly, the suggested delay for friendly, well-behaved bots should be set to not exceed this rate (or vice versa).

Some example values of X and Y:


 * X = 1, Y = 10: no more than 1 edit in 10 seconds. Note that a human could conceivably do two edits in 10 seconds, and get a warning. Probably X = 1 is never going to be a good choice.
 * X = 20, Y = 600: no more than 20 edits every 10 minutes. A human being is unlikely to keep up this rapid pace of editing.
 * X = 100, Y = 3600: few human editors would do more than 100 edits per hour. However, for some sites, 100 edits could cause pretty impressive damage.
 * 100 edits by one user can easily be reverted through their user contributions page, so it shouldn't be that bad.
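
To compare the example values above, it can help to convert each (X, Y) pair into the sustained number of edits per hour it would permit. The following is only a rough sketch in Python; the function name `sustained_edits_per_hour` is hypothetical, and the figure ignores short bursts at the edges of a window.

```python
def sustained_edits_per_hour(x, y_seconds):
    """Long-run edits per hour permitted by a limit of x edits per y_seconds."""
    return x * 3600 / y_seconds


# The three example settings listed above.
for x, y in [(1, 10), (20, 600), (100, 3600)]:
    rate = sustained_edits_per_hour(x, y)
    print(f"X={x}, Y={y}: up to {rate:.0f} edits/hour")
```

Note that the strictest-looking setting (X = 1, Y = 10) actually permits the highest sustained rate, while being the most likely to get in the way of a human editor.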

(More sophisticated restrictions could be implemented in the future -- for example, having different levels of X and Y for anonymous and logged-in users, since it's always a fun time to discriminate against anonymous users when we can. Another is varying X and Y by user to gradually "dampen" the rate of allowed edits if high edit rates are detected.)
 * I like that. It's a good idea. Make sure the levels are such that it never bothers users.

In any event, assuming X and Y are set, the MediaWiki software would do an additional check at save time (in Article::updateArticle). If the $wgEditThrottling variable is true, the software would query the X most recent records ("LIMIT X") created by the current user from each of the cur and old tables, sorted by date in descending order. (Yes, this means that up to 2X records could be returned -- yet another unavoidable cur vs. old pain in the ass.) The returned records would have to be combined and sorted in memory. If the Xth most recent record is less than Y seconds old, an error page is displayed and the edit is not saved.
 * Or how about displaying a captcha, and letting the user save the edit if they get it right? It slows the vandal down a bit and blocks robots, but still lets honest contributors do their work with only a minor annoyance in the big scheme of things. (Honest contributors might open 100 tabs, edit them, and then save them all one after the other. I've done that when fixing language links before; not 100, but maybe 30-40.)
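
The save-time check described above could look roughly like the following. This is a minimal sketch in Python rather than MediaWiki's own PHP, and it is not actual MediaWiki code: the function name `is_throttled` and its parameters are hypothetical, and plain lists of timestamps stand in for the rows that would be queried from the cur and old tables.

```python
import heapq
from datetime import datetime, timedelta


def is_throttled(cur_times, old_times, now, x, y_seconds):
    """Return True if an edit attempted at `now` should be refused.

    cur_times / old_times: timestamps of this user's edits, standing in
    for rows from the cur and old tables (hypothetical inputs).
    """
    # Mimic "LIMIT X" on each table: keep at most the X newest rows of each.
    newest_cur = heapq.nlargest(x, cur_times)
    newest_old = heapq.nlargest(x, old_times)

    # Combine the (at most 2X) rows and sort them in memory, newest first.
    combined = sorted(newest_cur + newest_old, reverse=True)

    # Fewer than X previous edits: the limit cannot have been reached.
    if len(combined) < x:
        return False

    # Throttle if the Xth most recent edit is less than Y seconds old.
    xth_most_recent = combined[x - 1]
    return now - xth_most_recent < timedelta(seconds=y_seconds)


# Usage: X = 3 edits per Y = 60 seconds.
now = datetime(2004, 2, 6, 12, 0, 0)
cur = [now - timedelta(seconds=10)]
old = [now - timedelta(seconds=20), now - timedelta(seconds=30)]
print(is_throttled(cur, old, now, x=3, y_seconds=60))
```

In this usage example the user already has three edits within the last minute, so a fourth would be refused. The captcha suggestion would replace the refusal with a challenge at the same point.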

(Note that this could be extremely annoying for a human user who's made some important changes to a page. However, the software would have to be gravely misconfigured for the user to have had time to make significant changes and still get their edit throttled. One possibility is to present the user's wiki markup on the error page, so they can at least save it off somewhere. Probably this isn't necessary, though, since the whole goal is to never have humans see this error message in the first place.)
 * With copy and paste, you can make big changes really quickly.

Advantages

 * Limits damage from hostile or buggy bots
 * Relatively invisible to human editors

Disadvantages

 * Misconfigured installations could block perfectly good edits by humans
 * Many users sharing the same IP address could kick in throttling
 * High temptation for hard-security kooks who want to block edits by humans to misuse this feature

Discussion
Moved from the article core by Evan

Rather than an explicit limit, why not give each user/IP a leaky bucket, initialized to allow 10 edits when the first edit from that user happens? You could also have several buckets, e.g. one to limit edits per hour to 100 and another to limit edits per minute to 10. Pakaran 18:06, 6 Feb 2004 (UTC)

For those unfamiliar with the concept, the idea is that each edit requires a "token." Tokens not yet used are stored in a "bucket" (in practice an integer variable), which "overflows" and wastes tokens when it reaches a specific value; tokens are added every N seconds. This is a common way to control excessive load in routers and such. We could have one bucket for the short term -- for example a limit of 12 edits in 30 seconds, with one token added every 5 seconds and a limit of 6 in the bucket (someone check my math) -- and another bucket for a long-term limit, e.g. per hour. Pakaran 18:10, 6 Feb 2004 (UTC)
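
A minimal Python sketch of the bucket idea described above, for illustration only. The class `TokenBucket` and the function `allow_edit` are hypothetical names, time is passed in as plain seconds, and the long-term bucket's settings (100 tokens, one added every 36 seconds) are an assumed way of expressing "100 per hour".

```python
class TokenBucket:
    """An integer token counter that refills over time up to a capacity."""

    def __init__(self, capacity, refill_seconds, now=0):
        self.capacity = capacity              # tokens beyond this are wasted
        self.refill_seconds = refill_seconds  # one token added per interval
        self.tokens = capacity                # start full
        self.last_refill = now

    def refill(self, now):
        # Add one token for each whole interval elapsed since the last refill.
        added = int((now - self.last_refill) // self.refill_seconds)
        if added > 0:
            self.tokens = min(self.capacity, self.tokens + added)
            self.last_refill += added * self.refill_seconds


def allow_edit(buckets, now):
    """Allow an edit only if every bucket has a token; then take one from each."""
    for bucket in buckets:
        bucket.refill(now)
    if all(bucket.tokens >= 1 for bucket in buckets):
        for bucket in buckets:
            bucket.tokens -= 1
        return True
    return False


# Short-term bucket from the comment above, plus an assumed hourly bucket.
short_term = TokenBucket(capacity=6, refill_seconds=5)
long_term = TokenBucket(capacity=100, refill_seconds=36)

# A bot attempting one edit per second from t=0 to t=30 inclusive.
allowed = sum(allow_edit([short_term, long_term], t) for t in range(31))
print(allowed)
```

Starting from a full bucket of 6, with one token added every 5 seconds, this simulated bot gets 12 edits through in that stretch, which agrees with the figure in the comment above.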

See SurgeProtector. Known technique.


 * Uh... that page seems to be about all kinds of protection against "surges", not particularly for edit floods, and different techniques for doing that. Ward's wiki seems to have some kind of edit flooding protection based on percentage of recent changes, which is pretty interesting. --Evan 03:40, 7 Feb 2004 (UTC)