Extension:Check Spambots

Description
Spambot Search Tool is an automated script that may be configured to query the following databases:


 * 1) fSpamlist - fspamlist.com
 * 2) StopForumSpam - stopforumspam.com
 * 3) Sorbs - sorbs.net
 * 4) Spamhaus - spamhaus.org
 * 5) SpamCop - spamcop.net
 * 6) ProjectHoneyPot - projecthoneypot.org
 * 7) Bot Scout - botscout.com
 * 8) DroneBL - dronebl.org
 * 9) AHBL - ahbl.org
 * 10) Undisposable - undisposable.net
 * 11) Tor Project - torproject.org

Most of these databases list known spambot or open proxy IP addresses. A few also list e-mail addresses or user names. Undisposable identifies bugmenot users and free e-mail addresses intended for one-time use; the Tor Project list identifies proxies that are part of the Tor network. ProjectHoneyPot (if enabled) requires an API key from projecthoneypot.org, and Bot Scout (if enabled) is limited to a small number of queries per day unless a key is obtained from botscout.com.
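
Several of the sources above (Sorbs, Spamhaus, SpamCop and DroneBL, for example) are DNS-based blacklists. The usual convention for querying one is to reverse the octets of the IPv4 address, append the list's DNS zone and look the resulting name up; an answer means the address is listed, and a "no such host" error means it is not. The sketch below shows that convention in Python purely for illustration (the extension itself is PHP). The function names and the zone dnsbl.example.org are placeholders, not part of the Spambot Search Tool.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers for this address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The name did not resolve, so the address is treated as not listed.
        return False


if __name__ == "__main__":
    # Hypothetical zone name; substitute the zone published by the list you use.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

Only dnsbl_query_name is pure; is_listed performs a live DNS query, so its result depends on the resolver and on the list being consulted.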

There are two versions of the original Check Spammers script. The first is a standalone program, as deployed on http://temerc.com/Check_Spammers. The second, check_spammers_plain.php, is a simplified version of the script that can be used for forums, blogs, guestbooks or other web forms that allow users to comment or post. It returns true or false depending on whether the user is listed in the databases.
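
As a rough illustration of how a true/false check of this kind might gate a web form, the Python sketch below rejects a comment when a checker reports the poster as listed. handle_comment and the is_spambot callable are hypothetical stand-ins; they are not functions from check_spammers_plain.php, whose actual interface should be taken from the package itself.

```python
from typing import Callable


def handle_comment(name: str, email: str, ip: str, text: str,
                   is_spambot: Callable[[str, str, str], bool]) -> str:
    """Accept or reject a submitted comment based on a boolean blacklist check."""
    if is_spambot(name, email, ip):
        return "rejected"
    # A real handler would store the comment text here.
    return "accepted"


if __name__ == "__main__":
    # Stand-in checker that flags a single documentation-range address.
    def fake_check(name: str, email: str, ip: str) -> bool:
        return ip == "203.0.113.7"

    print(handle_comment("alice", "alice@example.org", "198.51.100.4", "hi", fake_check))
    print(handle_comment("bot", "bot@example.org", "203.0.113.7", "spam", fake_check))
```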

The CheckSpambots.php script is a wrapper that allows Spambot Search Tool to be called as a MediaWiki extension. It is not part of the original package and is currently neither distributed nor supported by the original "check spammers" author.

Installation

 * 1) Download the source files (as listed below) to your "...extensions/CheckSpambots/" directory
 * 2) Edit config.php to indicate which servers you wish to check for spam blacklist information; add any API keys (if applicable)
 * 3) Edit your wiki's LocalSettings.php to load the extension's CheckSpambots.php file
 * 4) Set $wgEnableSorbs = false; as that check is no longer needed (MediaWiki versions 1.4.1 - 1.15.x). This step is unnecessary in versions 1.16.x and later, where the option was removed.

CheckSpambots.php
CheckSpambots.php is a MediaWiki-specific wrapper function.

Spambot Search Tool
The remainder of the code for this extension is available from the original author's site.

The files from the spambotsearchtool package which are required to deploy the script on MediaWiki are:
 * check_spammers_plain.php
 * config.php
 * en.php
 * functions.php

Most of the other files in the package provide a stand-alone web-based user interface and are not needed to deploy the Spambot Search Tool as a wiki extension.

The check_spammers_plain.php script needs to be v0.39 (29/09/2009) or later. One minor patch was applied to disable any direct output to the screen unless a spambot is detected. The lines to disable appear after all of the individual checks are complete; they were removed in this example because direct output to the browser breaks the display formatting used by MediaWiki, which is a cosmetic issue.

Limitations
This extension has no internal means of caching previous results (although, for DNS BL servers only, the local domain name server normally does this already). Nor does it support downloadable lists of known spammers and open proxies (such as those offered by stopforumspam); an external lookup obtains the same information but may add a slight delay to the time taken to save a wiki edit.
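
One way to avoid repeating external lookups is to remember recent answers for a limited time. The following Python sketch illustrates that idea only and is not part of the extension; CachedLookup, its ttl_seconds parameter and the wrapped lookup callable are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class CachedLookup:
    """Wrap a slow blacklist lookup and reuse answers until they expire."""

    def __init__(self, lookup: Callable[[str], bool],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def __call__(self, ip: str) -> bool:
        now = self._clock()
        hit = self._cache.get(ip)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the external query
        result = self._lookup(ip)
        self._cache[ip] = (now, result)
        return result
```

A short expiry keeps the cache from holding on to stale listings for long, at the cost of more frequent external queries; the one-hour default here is arbitrary.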

There is currently no 'whitelist' capability (some time could be saved by having the code *not* check edits by sysops or known, established users) and little or no provision for reporting new spambots back to the external blacklist maintainers as they appear on-wiki. A whitelist could be added trivially: for instance, if you already have a 'skipcaptcha' permission to control ConfirmEdit and want to re-use it, add a test for that permission at the beginning of check_edit so that users who hold it are not checked.
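
The logic of such a whitelist is sketched here in Python for illustration only; it is not the PHP that belongs in check_edit. The names should_check and edit_allowed, and the representation of a user's rights as a set of strings, are assumptions made for the example.

```python
from typing import Callable, Set


def should_check(user_rights: Set[str], exempt_right: str = "skipcaptcha") -> bool:
    """Return False for users who hold the exempting permission."""
    return exempt_right not in user_rights


def edit_allowed(user_rights: Set[str], ip: str,
                 is_spambot: Callable[[str], bool]) -> bool:
    """Return True if the edit may proceed."""
    if not should_check(user_rights):
        return True  # whitelisted: no external lookup is made
    return not is_spambot(ip)
```

Testing the permission first means trusted users never trigger an external lookup, which is where the time saving comes from.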

There is no check on the body of the text being edited; this extension determines whether the posting IP is a known bot, but does not check added external links to see if they contain known spam URLs. That latter task is done using extensions like SpamBlacklist (which can be extended to block links to blacklisted domains in article text).

There is also a risk of false positives, depending on the blacklist sources chosen. Many lists are intended primarily to target other forms of net.abuse (such as spam e-mail) and users on the same local net as a PC compromised by spammers may find themselves unexpectedly blocked from editing if these lists are used as a check on spambots.

This extension is of no use against spambots that are not yet listed in the external blacklist databases. Title Blacklist, SpamRegex, Bad Behavior, AbuseFilter, ConfirmEdit or ReCAPTCHA therefore remain necessary for handling problem cases such as the supposed "new" anon-IP user who wishes only to create pages packed with external links to half of *.ru (or repeatedly break the Unicode on your existing pages if your audience is *.ru) or discuss «h3rb@l v1agra» ad infinitum under questionable, often-deleted page names such as ".../", ".../index.php", "Forum talk:..." and "Category talk:.../".