Admin tools development/Phalanx

Thoughts and lessons learned from the Phalanx extension
Phalanx is an integrated anti-spam extension originally written for and by Wikia, but nowadays also used by ShoutWiki. It combines the functionality of several anti-spam extensions (BadWords, FilterWords, regexBlock, SpamBlacklist, spamRegex, TextRegex and TitleBlacklist) into one easy-to-use extension.

Pros:
 * easy to use
 * pretty effective
 * plenty of different filters
 * adding a new filter is relatively easy
 * ability to block something on a per-language basis (not sure how stable this is; we at ShoutWiki usually block everything for "all languages")

Cons:
 * unlike with AbuseFilter rules, not everyone can view Phalanx logs and rules (then again, what would be the point of an anti-spam extension if spammers could just easily see what's blocked?)
 * the IP blocking interface is a bit unstable due to the recent rewrite of the Block class and related interfaces in MediaWiki 1.18
    * I hope that was just a silly typo of mine in the code and nothing more serious, but I guess only time will tell. --Jack Phoenix (Contact) 23:50, 9 August 2012 (UTC)
 * the expiry time dropdown is too rigid; it should be configurable, like core's MediaWiki:Ipboptions
    * this shouldn't be very hard to fix: just copy and paste the relevant code from core classes (SpecialBlock, Block, etc.) --Jack Phoenix (Contact) 23:50, 9 August 2012 (UTC)

If we were to use Phalanx on Wikimedia sites:

 * it would need to be updated (the ShoutWiki fork is at r25850, while HEAD on Wikia's SVN is currently r57491)
    * updated to r57731 on 9 August 2012 --Jack Phoenix (Contact) 23:50, 9 August 2012 (UTC)
 * hacks specific to a particular wiki farm setup would need to be removed and replaced with something more generic and flexible
 * we'd need to set up a git repo for it, and I would need to learn to use git ;)
    * first part is done; second not so much, but MWJames gave me some helpful tips and tricks --Jack Phoenix (Contact) 23:50, 9 August 2012 (UTC)
    * code committed to git on 20:31, 10 August 2012 (UTC). --Jack Phoenix (Contact) 20:31, 10 August 2012 (UTC)
 * shared database: Phalanx would need a new configuration global or two (as AbuseFilter and CentralAuth have) to define the database where the Phalanx tables are stored
    * this assumes that we use it as a global solution. While that definitely makes sense, it raises some important questions:
       * who would be allowed to access Phalanx and change the rules? Stewards, probably, but I imagine there would be some complaints about turning the stewards into decision-makers instead of neutral observers...
       * would managing it create complicated bureaucracy? For example, something spammed on enwiki may be a legitimate phrase on plwiki, and Polish editors would be upset that legitimate edits are being blocked.
          * for now (current codebase as of 20:59, 10 August 2012 (UTC)), there is the option to block by language, but I'm not sure how effective it would be. For cases like this, an option to "block this [phrase/e-mail address/user(name)/etc.] for all languages except pl" would be useful (hat tip to Isarra). --Jack Phoenix (Contact) 20:59, 10 August 2012 (UTC)
       * possible solution to the access question: a new global user group (like the existing global editinterface, rollback, sysop, etc. groups)
 * moving BadImageList and similar lists into Phalanx should be entirely possible; whether that is wanted is a different, more political question (transparency is important, but in anti-spam/anti-vandalism work it can easily be used against you...)
    * well, BadImageList itself is a bit of a special case because it allows whitelisting on the same page. We could, however, move the actual image blacklist into Phalanx and create a new whitelist MediaWiki: page... --Jack Phoenix (Contact) 23:50, 9 August 2012 (UTC)
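The proposed "all languages except pl" option could work roughly like this. This is a minimal Python sketch under stated assumptions, not Phalanx's code (Phalanx is PHP, and the exception option is only a proposal above): the `Rule` class, its `languages` and `except_languages` fields, and the `rule_applies` and `is_blocked` functions are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical Phalanx-style rule; field names are illustrative only."""
    pattern: str
    # None means "all languages"; otherwise only these language codes.
    languages: Optional[FrozenSet[str]] = None
    # Hypothetical exception list: language codes exempt from the rule.
    except_languages: FrozenSet[str] = field(default_factory=frozenset)


def rule_applies(rule: Rule, wiki_lang: str) -> bool:
    """Return True if the rule is in force on a wiki with this language code."""
    if wiki_lang in rule.except_languages:
        return False
    return rule.languages is None or wiki_lang in rule.languages


def is_blocked(rule: Rule, text: str, wiki_lang: str) -> bool:
    """Block only when the rule applies to this language AND the text matches."""
    if not rule_applies(rule, wiki_lang):
        return False
    return re.search(rule.pattern, text, re.IGNORECASE) is not None


if __name__ == "__main__":
    # Block a phrase on every language except Polish.
    rule = Rule(pattern=r"\bcheap watches\b", except_languages=frozenset({"pl"}))
    print(is_blocked(rule, "Buy cheap watches here", "en"))  # True
    print(is_blocked(rule, "Buy cheap watches here", "pl"))  # False
```

The exception check runs first, so an excluded language stays exempt even if it also appears in `languages`.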

Lessons learned:


 * Phalanx is handy!
 * it's easy to use and regexes are surprisingly easy to learn (the basic stuff anyway)
 * but like everything else, it's not perfect; some spambots will always slip through, so it can't totally eliminate the human factor in anti-spam work
 * the statistics interface (Special:PhalanxStats/Filter-id-goes-here) shows who triggered a filter, when, and where, so a human can decide to block a spambot account before it has successfully posted any spam to any wiki
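The statistics workflow in the last item can be sketched in Python. This is not Phalanx's actual schema or code: `FilterHit`, `hits_for_filter`, `accounts_to_review` and the "seen on at least two wikis" threshold are hypothetical, chosen only to show how a log of blocked attempts lets a human spot a likely spambot before any of its spam is saved.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set


@dataclass(frozen=True)
class FilterHit:
    """One blocked attempt (hypothetical fields: which filter, who, where, when)."""
    filter_id: int
    user: str
    wiki: str
    timestamp: datetime


def hits_for_filter(log: List[FilterHit], filter_id: int) -> List[FilterHit]:
    """All hits for one filter, oldest first: who triggered it, when and where."""
    return sorted((h for h in log if h.filter_id == filter_id), key=lambda h: h.timestamp)


def accounts_to_review(log: List[FilterHit], filter_id: int, min_wikis: int = 2) -> Dict[str, List[str]]:
    """Users who tripped the filter on at least `min_wikis` different wikis.

    Every logged hit is a blocked attempt, so these accounts may not have
    saved any spam yet; a human still makes the final call on blocking.
    """
    wikis_by_user: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits_for_filter(log, filter_id):
        wikis_by_user[hit.user].add(hit.wiki)
    return {user: sorted(wikis) for user, wikis in wikis_by_user.items() if len(wikis) >= min_wikis}


if __name__ == "__main__":
    # Made-up sample log for illustration only.
    log = [
        FilterHit(42, "SpamBot123", "foowiki", datetime(2012, 8, 10, 12, 0)),
        FilterHit(42, "SpamBot123", "barwiki", datetime(2012, 8, 10, 12, 5)),
        FilterHit(42, "HonestUser", "foowiki", datetime(2012, 8, 10, 13, 0)),
        FilterHit(7, "SpamBot123", "bazwiki", datetime(2012, 8, 10, 14, 0)),
    ]
    print(accounts_to_review(log, 42))  # {'SpamBot123': ['barwiki', 'foowiki']}
```

The threshold only narrows the list; as noted above, some spambots will still slip through, so the decision stays with a human.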