Admin tools development

Rationale
There are a number of powerful anti-disruption and protective tools that are provided for our communities on the Wikimedia clusters. These include local tools for each project's sysops or other higher-graded users, and cross-wiki tools co-ordinated on metawiki that are either for all metawiki sysops, or for stewards.

These tools are crucial for the community to fight spam, disruptive behaviour and large-scale vandalism, and to deal with private information and other, less common situations. It is important that the Wikimedia Foundation works on these tools so that the communities we support are not overloaded with work and can maintain the quality of the projects.

Sadly, the resources of the Wikimedia Foundation are limited, and we do not wish for this to interfere with our timely delivery of appropriate tools. Thus, we hope that a member of the community will work as the volunteer Product Manager for these admin tools, liaising between the different users of these tools and the developers, shaping what the products should look like and prioritising areas of particular concern.

Roadmap
Still a little hazy. Our top priorities for now (in descending order) are:


 * 1) Global AbuseFilter rules
 * 2) Global AbuseFilter variables/throttling
 * 3) CentralAuth mass account locking

See below for more details on what these items mean.

List of in-scope extensions and core features

 * Local project-specific tools
 * AbuseFilter
 * Revision suppression and Oversight
 * ConfirmEdit
 * CheckUser
 * BadImageList
 * MassDelete
 * File upload cleanness checking (mostly relevant to Commons).


 * Note: local blocking tools are, at least for now, out of scope.


 * Metawiki-based tools
 * GlobalAbuseFilter (not currently extant - see below)
 * GlobalBlocking
 * CentralAuth account locking, renaming, merging
 * TorBlock
 * AntiBot (currently broken?)
 * SpamBlacklist
 * TitleBlacklist

Existing major problems, and possible solutions

 * The number one cross-wiki issue is dealing with spammers and other bot-driven disruptive editing. This is especially a problem on our smaller wikis, where the spam can easily swamp the local community. Beyond the spam itself, noticing it once it has occurred can take some time, during which our readers are poorly served, and cleaning it up can be arduous.
 * To try to prevent the spam from occurring, we are looking to provide a version of the AbuseFilter extension, based on metawiki, that would apply its event rules globally on all Wikimedia wikis at once, as directed by Stewards. This functionality is currently implemented as an experiment in the existing codebase, but it has never been used, is untested, and is likely to be rusty.
 * To deal with multiple accounts being created - or multiple edits being made - very rapidly across a number of wikis, we would like to extend AbuseFilter to allow per-IP throttles on events ('only 5 edits a minute' and similar) to apply across all of our wikis globally, rather than be implemented on a per-wiki basis.
 * One recurring Wikimedia Foundation issue is the performance of the systems that we provide, and in particular the scalability of existing systems to work as the load grows.
 * In general, the Foundation would like to re-engineer the ad hoc collection of tools into a smaller number of more powerful tools (ideally one), based on the existing AbuseFilter extension. This would take some time and is not the top priority, but is our overall intended direction.
 * A particular piece of work in simplifying and speeding up our systems would be to move some of the lists of banned article titles, account names, used images and external links from blocks of wikitext that need to be interpreted by their own extensions into the database.
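
The cross-wiki, per-IP throttle described above ('only 5 edits a minute' applied globally) could work roughly as follows. This is a minimal Python sketch and not AbuseFilter code: the class name `GlobalIpThrottle`, its parameters and the in-memory store are all assumptions made for illustration, and a real deployment would need storage shared across the cluster.

```python
from collections import defaultdict, deque


class GlobalIpThrottle:
    """Hypothetical sliding-window throttle shared by all wikis."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        # Maps an IP address to the timestamps of its recently allowed events.
        self._events = defaultdict(deque)

    def allow(self, ip, wiki, now):
        # `wiki` is deliberately NOT part of the key: counting per IP only
        # is what makes the limit global rather than per-wiki.
        timestamps = self._events[ip]
        # Forget events that have fallen out of the time window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_events:
            return False
        timestamps.append(now)
        return True


throttle = GlobalIpThrottle(max_events=5, window_seconds=60)
for second, wiki in enumerate(["enwiki", "plwiki", "dewiki", "frwiki", "commonswiki"]):
    throttle.allow("192.0.2.1", wiki, now=second)
# A sixth event inside the same minute is refused, whichever wiki it targets.
print(throttle.allow("192.0.2.1", "metawiki", now=5))
```

Keying on the IP alone means that five edits spread over five wikis use up the same budget as five edits on one wiki, which is the behaviour a per-wiki throttle cannot provide.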

Other problems or possible pieces of work

 * There is an opportunity (via AbuseFilter) to use detection of particular UserAgent strings used by spammers to prevent spam more effectively.
 * If I've understood correctly, the Bad Behavior extension does exactly this; it blocks access to the wiki completely if the User Agent string matches a known spambot one. I have no idea how effective that is, though, since User Agents aren't that difficult to alter/modify/forge... --Jack Phoenix (Contact) 21:59, 31 July 2012 (UTC)


 * Allow suppression using the RevisionDeletion tool of multiple revisions of a deleted page (existing functionality in action=history which is needed in Special:Undelete).


 * The current CAPTCHA system is awkward for a number of users (there are internationalisation concerns - see 5309), and isn't very effective in fighting against spammers given the state of the art in image character recognition and the large number of Mechanical Turk-like systems. We should analyse our logs and consider either somehow replacing it with something that would work, or just disabling it entirely and relying on our other tools that actually work.
 * As much as I dislike admitting it, this is indeed true. Blurry word CAPTCHAs appear to be relatively easy for spambots to crack and Asirra, while cute, is apparently difficult for humans to understand (and as Chris mentions, there are the privacy concerns and it'd be an ugly third-party dependency, which we do not want). --Jack Phoenix (Contact) 21:59, 31 July 2012 (UTC)
 * It seems like there is consensus that captchas are not as effective as people would like (both annoying editors, and not preventing spam). However, there does not seem to be consensus on the best solution to fix this. Some ideas and insights in this email thread: Captcha for non-English speakers II. CSteipp (talk) 21:18, 2 August 2012 (UTC)


 * Allow local sysops to temporarily vary the IP throttles for particular IPs and IP ranges (e.g. so that an event with a lot of expected newbies would not hit the "too many account creations from your IP" issue). Currently this can only be solved by a user with shell access, or by local sysops creating the accounts individually. The code hooks for this functionality already exist, but no extension has yet been written.


 * The e-mail black list (which prevents account registration or sending e-mail using matching addresses) is currently split between a file that only users with shell access can read and an entirely public list on metawiki. A middle-ground tool which could only be accessed by a small number of users would be a significant improvement for privacy purposes.


 * Better filtering on potential exploits in file uploads


 * Selective blocking, so that a user cannot edit particular pages, or pages in a Category, under a LocalBlock (feels like a local political decision - not for this?)
 * It is a political thing, but also it sounds very useful and in line with the mission of WMF; we want (almost) everyone to be able to edit, and even if they have problems with certain types of articles, it shouldn't mean that we need to completely block them. Blocking them from "problematic" pages would still allow them to find something else equally interesting via Special:Random or whatnot and still contribute constructively to the encyclopedia. --Jack Phoenix (Contact) 21:59, 31 July 2012 (UTC)


 * Auto-warn user that they've reverted 3 times


 * Global AbuseFilter separated resources


 * CentralAuth mass account blocking - "search with check boxes" (from Special:Log/newaccounts or whatever?). Global view into account creation(?)
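
The temporary throttle variation for particular IPs and IP ranges suggested in this list could be modelled as a set of expiring exemptions. The following Python sketch is only an illustration under stated assumptions: `ThrottleExemptions` and its methods are hypothetical names, timestamps are plain numbers, and nothing here reflects the existing MediaWiki hooks.

```python
import ipaddress


class ThrottleExemptions:
    """Hypothetical store of time-limited throttle exemptions for IPs and ranges."""

    def __init__(self):
        # Each entry is (network, expires_at); a single IP is stored as a /32 or /128.
        self._entries = []

    def add(self, cidr, expires_at):
        self._entries.append((ipaddress.ip_network(cidr), expires_at))

    def is_exempt(self, ip, now):
        address = ipaddress.ip_address(ip)
        # Drop exemptions whose expiry time has been reached.
        self._entries = [(net, exp) for net, exp in self._entries if exp > now]
        return any(address in net for net, _ in self._entries)


exemptions = ThrottleExemptions()
# A sysop exempts the venue's range for the duration of an editing event.
exemptions.add("192.0.2.0/24", expires_at=1000)
print(exemptions.is_exempt("192.0.2.77", now=500))
```

Giving every exemption an expiry time keeps the variation temporary by construction, so nobody has to remember to remove it after the event.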

Thoughts and lessons learned from the Phalanx extension
Phalanx is an integrated anti-spam extension originally written for and by Wikia, but nowadays also used by ShoutWiki. It integrates a bunch of anti-spam extensions (BadWords, FilterWords, regexBlock, SpamBlacklist, spamRegex, TextRegex and TitleBlacklist) into one easy-to-use extension.

Pros:
 * easy to use
 * pretty effective
 * plenty of different filters
 * adding a new filter is relatively easy
 * ability to block something on a per-language basis (not sure how stable this is, we at ShoutWiki usually block everything for "all languages")

Cons:
 * unlike with AbuseFilter rules, not everyone can view Phalanx logs and rules (then again, what would be the point of an anti-spam extension if spammers could just easily see what's blocked?)
 * IP blocking interface is a bit unstable, thanks to the recent rewrite (in MediaWiki 1.18) of the Block class and the related interfaces
 * expiry time dropdown sucks and should be more flexible, like core's MediaWiki:Ipboptions

If we were to use Phalanx on Wikimedia sites...:


 * it would need to be updated (the ShoutWiki fork is at r25850, while SVN HEAD on Wikia's SVN right now is r57491)
 * the hacks specific to a certain wiki farm setup would need to be removed and replaced with something generic/more flexible
 * we'd need to set up a git repo for it and I would need to learn to use git ;)
 * shared DB stuff; Phalanx would need a new global or two (like how AbuseFilter or CentralAuth do things) to define the database where Phalanx DB tables will be stored
 * this is assuming that we use it as a global solution. While it definitely makes sense, there are some important questions, too:
 * who would be allowed to access Phalanx and change the rules? Stewards, probably, but I imagine that there'd be some complaints about turning the stewards into decision-makers instead of neutral observers...
 * would this create complicated bureaucracy about the management? I.e. something spammed on enwiki is a legitimate phrase on plwiki and Polish editors are upset that legitimate edits are being blocked.
 * solution: new user group (like how we have global editinterface, rollback, sysop, etc. groups)
 * moving BadImageList and whatnot to Phalanx should be entirely possible; it's a different, more political, question if that's wanted (on the other hand, transparency is important, but in anti-spam/anti-vandalism work, transparency can easily be used against you...)

Lessons learned:


 * Phalanx is handy!
 * it's easy to use and regexes are surprisingly easy to learn (the basic stuff anyway)
 * but like everything else, it's not perfect; some spambots will always slip through, so it can't totally eliminate the human factor in anti-spam work
 * the statistics interface (Special:PhalanxStats/Filter-id-goes-here) allows seeing who triggered the filter when (and where) and thus a human can make the decision to block a spambot account even before it has submitted any spam
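
To make the ideas above concrete (regex filters, optional per-language scope, and a log of who triggered a filter, when and where), here is a small Python sketch. It is not Phalanx code: the `Filter` class, the `check_text` function and the example patterns are all invented for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Filter:
    """Hypothetical regex filter; `language=None` means "all languages"."""
    filter_id: int
    pattern: str
    language: str = None
    hits: list = field(default_factory=list)  # (user, wiki, timestamp) tuples

    def applies_to(self, language):
        return self.language is None or self.language == language


def check_text(filters, text, user, wiki, language, timestamp):
    """Return the ids of matching filters and record each hit for later review."""
    matched = []
    for f in filters:
        if f.applies_to(language) and re.search(f.pattern, text, re.IGNORECASE):
            f.hits.append((user, wiki, timestamp))
            matched.append(f.filter_id)
    return matched


filters = [
    Filter(1, r"cheap\s+pills"),                # applies everywhere
    Filter(2, r"free\s+bonus", language="en"),  # scoped to English only
]
print(check_text(filters, "Buy CHEAP pills now", "Spambot1", "plwiki", "pl", 100))
print(filters[0].hits)
```

The per-filter hit log plays the role of the statistics interface: a human can review who triggered a filter and decide whether to block the account. Scoping a filter to one language is one way to avoid blocking a phrase that is spam on one wiki but legitimate on another.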

Documents

 * User requirements:
 * Specifications:
 * Software design document:
 * AbuseFilter Priorities / Design
 * Test plan:
 * Documentation plan:
 * User interface design docs:
 * Schedule:
 * Task management:
 * Release management plan:
 * Communications plan: