Anti-spam features

From MediaWiki.org
See also Manual:Combating spam and Manual:Combating vandalism, where many of the same issues are addressed.

MediaWiki and its extensions provide a number of features to reduce the problem of wiki spam.

Note that many of these features are not activated by default. If you run a MediaWiki installation on your own server or host, you are the only one who can make the necessary configuration changes! By all means ask your users to help watch out for wiki spam (and do so yourself), but these days spam can easily overwhelm small wiki communities, so it helps to raise the bar a little. Note, however, that none of these solutions can be considered completely spam-proof. Always check Recent changes (Special:RecentChanges) periodically!

Common tools used to combat wiki spam typically fall into these categories:

  • Requiring login and/or a CAPTCHA on certain operations, such as edits, adding external links, or new user creation
  • Blocking edits from known blacklisted IP addresses or IPs running open proxies
  • Blocking edits which add specific unwanted keywords or external links
  • Blocking specific username and page title patterns commonly used by spambots
  • Blocking edits by new or anonymous users to specific often-targeted pages
  • Whitelisting known-good editors (such as admins, regular contributors) while placing restrictions on new or anonymous users
  • Cleanup scripts or bulk deletion (Nuke) of existing posts from recently banned spambots

Normally, a combination of methods is used to keep the number of spam edits to a minimum while limiting disruption to legitimate users of the site.


Individual page protection

Frequently spammed pages can be protected from editing by new and anonymous users with semi-protection. Spambots often hit the same page repeatedly. On wikis that don't require registration to edit, most abusive edits come from anonymous sources. Restricting edits on these specific pages to established users can therefore prevent deleted spam dump pages from being re-created.

Spambots will frequently target poorly-monitored pages such as talk pages of categories and files. Any page which has been deleted multiple times is a good candidate for page protection.

Content-banning blacklists

$wgSpamRegex

The variable '$wgSpamRegex', when set in LocalSettings.php, prevents users from saving edits that match a given regular expression. Typically it is used to exclude URLs (or parts of URLs) that you do not want users to link to. Users are shown an explanatory message indicating which part of their edit text is not allowed. Extension:SpamRegex allows this variable to be edited on-wiki.

$wgSpamRegex = "/online-casino|buy-viagra|adipex|phentermine|adult-website\.com|display:none|overflow:\s*auto;\s*height:\s*[0-4]px;/i";

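This blocks any mention of 'online-casino', 'buy-viagra', 'adipex', 'phentermine' or 'adult-website.com'. It also blocks two CSS tricks that spammers use to hide links from readers: display:none, and overflow:auto combined with a height of 0 to 4 pixels. The '/i' at the end makes the match case-insensitive.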

This is only a simple example. See $wgSpamRegex documentation for more detail.
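Before deploying a pattern like this, it can help to try it against sample text. The Python sketch below reuses the example pattern without PHP's /.../ delimiters, with re.IGNORECASE standing in for the /i flag. It assumes that Python's re module treats this particular pattern the same way PHP's PCRE functions do. That holds for the simple constructs used here, but it is worth re-checking for more advanced patterns. The function name is hypothetical.

```python
import re

# The example $wgSpamRegex pattern, minus PHP's /.../i delimiters;
# re.IGNORECASE plays the role of the trailing /i flag.
SPAM_PATTERN = re.compile(
    r"online-casino|buy-viagra|adipex|phentermine|adult-website\.com"
    r"|display:none|overflow:\s*auto;\s*height:\s*[0-4]px;",
    re.IGNORECASE,
)


def blocked_fragment(edit_text):
    """Return the first fragment of edit_text that the pattern matches, or None."""
    match = SPAM_PATTERN.search(edit_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "Visit our Online-Casino today!",
        '<div style="overflow: auto; height: 1px;">hidden links</div>',
        "A normal edit about casinos in history.",
    ]
    for text in samples:
        print(repr(blocked_fragment(text)))
```

The third sample passes: only the listed fragments are blocked, not related words such as "casinos".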

Longer spam blacklists

The above approach will become too cumbersome if you attempt to block more than a handful of spammy URLs. A better approach is to have a long blacklist identifying many known spamming URLs. The SpamBlacklist extension allows such a list to be constructed on-wiki with the assistance of privileged users, and allows the use of lists retrieved from external sources (by default, it uses the extensive m:Spam blacklist).
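The core idea can be sketched as follows. Blacklist entries are regular-expression fragments, and an edit is rejected if it adds a link matching any of them. This is a simplified Python illustration, not the extension's code. The entries, helper names and URL-matching regex are all hypothetical. Checking only the links an edit adds, rather than every link on the page, is an assumption of this sketch.

```python
import re

# Hypothetical blacklist entries: one regex fragment per line, "#" starts a comment.
BLACKLIST_LINES = [
    "# example entries only",
    r"\bcheap-pills\.example\b",
    r"\bcasino-[a-z]+\.example\b",
]

# Rough URL matcher for wikitext (an assumption; MediaWiki's own link parsing differs).
URL_RE = re.compile(r"https?://[^\s\]<>\"']+", re.IGNORECASE)


def build_blacklist(lines):
    """Combine non-empty, non-comment fragments into one case-insensitive regex."""
    fragments = [s for s in (line.strip() for line in lines) if s and not s.startswith("#")]
    if not fragments:
        return None  # an empty alternation would match everything
    return re.compile("|".join(f"(?:{frag})" for frag in fragments), re.IGNORECASE)


def blocked_links(old_text, new_text, blacklist):
    """Return blacklisted URLs that appear in new_text but not in old_text."""
    if blacklist is None:
        return []
    added = set(URL_RE.findall(new_text)) - set(URL_RE.findall(old_text))
    return sorted(url for url in added if blacklist.search(url))
```

For example, if a page already contains a blacklisted link and an edit adds a second one, only the newly added link is reported.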

CAPTCHAs

A CAPTCHA is a system that tries to distinguish humans from automated systems by asking the user to solve a task that is difficult for machines. The ConfirmEdit extension provides several different mechanisms for validation, and allows you to customize when it is presented.

The most robust CAPTCHAs available today are reCAPTCHA (one of the options of ConfirmEdit) and the Asirra CAPTCHA, which asks the user to distinguish cats from dogs (currently supplied by Extension:Asirra). The QuestyCaptcha option of ConfirmEdit is effective today, but it is a weak CAPTCHA that may not remain effective in the future.

CAPTCHAs have some disadvantages in terms of accessibility and inconvenience to your real human users. For this reason, it is recommended not to use them on every edit, but only on account creation and on anonymous edits that insert links (these are the default settings for ConfirmEdit, used by Wikimedia Foundation projects). Also, a CAPTCHA will not completely spam-proof your wiki; according to Wikipedia, "Spammers pay about $0.80 to $1.20 for each 1,000 solved CAPTCHAs to companies employing human solvers in Bangladesh, China, India, and many other developing nations." It should therefore be combined with other mechanisms.

Note that CAPTCHAs may block users who are blind or visually impaired (reCAPTCHA includes an audio CAPTCHA for such cases). Consider providing an alternative means for affected users to create accounts and contribute, which is a legal requirement in some jurisdictions.

Abuse filter

Extension:AbuseFilter allows privileged users to create rules that target the specific type of spam your wiki is receiving, and to automatically prevent the action and/or block the user. It can examine many properties of the edit, such as the username, the user's account age, the text added and the links added. It is most effective when you have one or more skilled administrators willing to help you fight spam. The abuse filter can be effective even against human-assisted spammers, but it requires continual maintenance to respond to new types of attacks.
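AbuseFilter rules are written in the extension's own rule language. The Python sketch below only illustrates the kind of logic a filter expresses: one rule evaluated over the edit properties mentioned above. The Edit fields, thresholds and rule name are hypothetical. They do not correspond to AbuseFilter's variable names or syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    """Hypothetical summary of an edit's properties."""
    username: str
    user_age_days: float
    edit_count: int
    added_text: str
    added_links: list = field(default_factory=list)


def new_account_link_dump(edit):
    """Illustrative rule: a very new account adding several external links."""
    return (
        edit.user_age_days < 1
        and edit.edit_count < 5
        and len(edit.added_links) >= 3
    )


# Rule name -> predicate; a real filter would also define the action (warn, disallow, block).
RULES = {"new account adding many links": new_account_link_dump}


def matching_rules(edit):
    """Return the names of every rule the edit trips."""
    return [name for name, rule in RULES.items() if rule(edit)]
```

Keeping each rule a small named predicate mirrors how filters are usually maintained: one narrow rule per observed spam pattern, adjusted as attacks change.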

Blocking spammer IPs

DNSBL

You can set MediaWiki to check each editing IP address against one or more DNSBLs (DNS-based blacklists), which requires no maintenance but slightly increases edit latency. For example, you can add this line to your LocalSettings.php to block many open proxies and known forum spammers:

$wgEnableDnsBlacklist = true;
$wgDnsBlacklistUrls = array( 'zen.spamhaus.org', 'dnsbl.tornevall.org' );

For details of these DNSBLs, see Spamhaus: Zen and dnsbl.tornevall.org. For a list of DNSBLs, see Comparison of DNS blacklists. See also Manual:$wgEnableDnsBlacklist, Manual:$wgDnsBlacklistUrls.
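A DNSBL lookup is an ordinary DNS query. The IPv4 address's octets are reversed and the blacklist's zone is appended; conventionally, getting any A record back means the address is listed. The Python sketch below illustrates that convention for IPv4 only. The function names are hypothetical, and the sketch treats any lookup failure as "not listed", i.e. it fails open.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name queried for an IPv4 address (raises ValueError for non-IPv4 input)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """True if the DNSBL zone returns an A record for the address (requires network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: treated as not listed
    return True
```

For example, dnsbl_query_name("195.230.18.188", "zen.spamhaus.org") gives "188.18.230.195.zen.spamhaus.org". The extra DNS round trip per edit is the small latency cost mentioned above.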

Bad Behavior and Project HoneyPot

Bad Behavior is an anti-link spam tool that is available as a MediaWiki extension (see Bad Behavior on MediaWiki).

For maximum effectiveness, it should be combined with an http:BL API Key, which you can get by signing up for Project Honey Pot, a distributed spam tracking project. To join Project HoneyPot you will need to add a publicly accessible file to your webserver, then use the following extension code in your LocalSettings.php (or an included PHP file) to embed a link to it in every page:

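// Add a link to the Project Honey Pot page at the end of every parsed page.
// $wgHoneyPotPath is a custom setting, not a core MediaWiki variable.
$wgHooks['ParserAfterTidy'][] = 'fnInsertLinksToHoneyPot';

function fnInsertLinksToHoneyPot( &$parser, &$text ) {
    global $wgHoneyPotPath;
    // The link has no visible text, so human readers won't notice it, but crawlers that follow every link may
    $text .= "<a href=\"$wgHoneyPotPath\"><!-- hijacker --></a>";
    return true; // allow other ParserAfterTidy handlers to run
}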

Set $wgHoneyPotPath to the path of the honeypot page in your LocalSettings.php (e.g. "/ciralix.php"). You may change the form of the link above to any of the alternatives suggested by Project HoneyPot.

Once you're signed up, choose Services→HTTP Blacklist to get an http:BL API Key, and put your key in Bad Behavior's settings.ini.

$wgProxyList

Warning: This particular technique will substantially increase page load time and server load if the IP list is large. Use with caution.

You can set the variable $wgProxyList to a list of IPs to ban. This can be populated periodically from an external source using a cron script such as the following:

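#!/bin/bash
# Rebuild bannedips.php from the Stop Forum Spam list of banned IPs.
set -e
cd /your/web/root
wget -O bannedips.zip http://www.stopforumspam.com/downloads/bannedips.zip
unzip -o bannedips.zip
# Write to a temporary file and then rename it, so MediaWiki never loads a half-written list.
# The CSV holds comma-separated IPs; emit one quoted array entry per IP.
{
  echo "<?php"
  echo "\$wgProxyList = array("
  tr -d '\r' < bannedips.csv | tr ',' '\n' | sed -e 's/[[:space:]]//g' -e '/^$/d' -e "s/.*/'&',/"
  echo ");"
} > bannedips.new.php
mv bannedips.new.php bannedips.php
rm -f bannedips.csv bannedips.zip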

Then add the following line to your LocalSettings.php:

require_once("$IP/bannedips.php");

If you do this and you use APC for caching, you may need to increase apc.shm_size in your php.ini to accommodate such a large list.
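If you would rather validate the downloaded list before it becomes PHP code, the conversion step can be done in Python instead of sed. This sketch makes the same assumption as the shell script, namely that bannedips.csv is a comma-separated list of IP addresses. The function name is hypothetical.

```python
import ipaddress


def proxy_list_php(csv_text):
    """Render a comma-separated IP list as a PHP file that sets $wgProxyList.

    Entries that do not parse as IP addresses are skipped, so a truncated or
    corrupted download cannot inject arbitrary text into the PHP file.
    """
    ips = []
    for entry in csv_text.replace("\n", ",").split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            ips.append(str(ipaddress.ip_address(entry)))
        except ValueError:
            continue  # not an IP address; skip it
    lines = ["<?php", "$wgProxyList = array("]
    lines += [f"\t'{ip}'," for ip in ips]
    lines.append(");")
    return "\n".join(lines) + "\n"
```

A cron job could write the result to bannedips.new.php and then rename that file over bannedips.php, so MediaWiki never reads a half-written list.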

$wgBlockOpenProxies

By setting $wgBlockOpenProxies to true in your LocalSettings.php, MediaWiki will automatically scan each editing IP for open HTTP proxies. Such scans may be interpreted as hostile by some system administrators, and so this measure is not recommended.

rel=nofollow link attribute

MediaWiki adds the rel=nofollow attribute to external links by default (this is configurable; see Manual:$wgNoFollowLinks for details). This tells search engines not to follow external links added by users, which makes spam links much less valuable. Note that this does not prevent spam: spammers generally don't notice the difference and will abuse your wiki anyway, but they benefit much less from it. On the other hand, rel=nofollow also means that useful sites your users choose to link to will not get any boost in search rankings.

By default, it is applied to all external links, plus log and history pages (see NoIndexHistory). Applying it to all external links is a rather heavy-handed anti-spam tactic, which you may decide not to use (by switching off the rel=nofollow option); see Nofollow for a debate about this. It is still a good installation default, because lazy administrators who aren't thinking about spam problems will tend to leave it enabled.

Apache configuration changes

In addition to changing your MediaWiki configuration, if you run MediaWiki on Apache, you can change your Apache web server configuration to help stop spam. These settings are generally placed either in your virtual host configuration file or in a file called .htaccess in the same directory as LocalSettings.php. On a shared web host, the host must enable AllowOverride for you to use an .htaccess file.

Filtering by user agent

When you block a spammer on your wiki, search your site's access log by IP to determine what user agent string that IP supplied. For example:

grep '^195\.230\.18\.188 ' /var/log/apache2/access.log

The access log location for your virtual host is generally set using the CustomLog directive. Once you find the accesses, you'll see some lines like this:

195.230.18.188 - - [16/Apr/2012:16:50:44 +0000] "POST /index.php?title=FlemmingCoakley601&action=submit HTTP/1.1" 200 24093 "-" ""

The user agent is the last quoted string on the line, in this case an empty string. Some spammers will use user agent strings used by real browsers, while others will use malformed or blank user agent strings. If they are in the latter category, you can block them by adding this to your .htaccess file (adapted from this page):

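# Flag requests whose User-Agent matches the pattern (keep the quotes if the pattern contains spaces)
SetEnvIf User-Agent "^regular expression matching user agent string goes here$" spammer=yes

Order allow,deny
allow from all
deny from env=spammer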

This will return a 403 Forbidden error to any IP connecting with a user agent matching the specified regular expression. Take care to escape all necessary regexp characters in the user agent string such as . ( ) - with backslashes (\). To match blank user agents, just use "^$".

Even if the spammer's user agent string is used by real browsers, if it is old or rarely encountered, you can use rewrite rules to redirect users to an error page, advising them to upgrade their browser:

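# Enable mod_rewrite for this directory if it is not already on
RewriteEngine On
# Send one outdated user agent string to the upgrade notice (but never rewrite the notice itself)
RewriteCond %{HTTP_USER_AGENT} "Mozilla/5\.0 \(Windows; U; Windows NT 5\.1; en\-US; rv:1\.9\.0\.14\) Gecko/2009082707 Firefox/3\.0\.14 \(\.NET CLR 3\.5\.30729\)"
RewriteCond %{REQUEST_URI} !^/forbidden/pleaseupgrade.html
RewriteRule ^(.*)$ /forbidden/pleaseupgrade.html [L]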
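When you check many blocked IPs, the access-log search described above can be scripted. This Python sketch assumes Apache's Combined Log Format, as in the sample line, and assumes no escaped quotes inside the user agent. The function name is hypothetical.

```python
import re

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def user_agents_for_ip(log_lines, ip):
    """Collect the distinct user-agent strings sent by one IP, in first-seen order."""
    seen = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group("ip") == ip and m.group("agent") not in seen:
            seen.append(m.group("agent"))
    return seen
```

An empty string in the result corresponds to a blank user agent, which the "^$" pattern above would block.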

Preventing blocked spammers from consuming resources

A persistent spammer, or one with a broken script, may keep trying to spam your wiki after being blocked, needlessly consuming resources. By adding a deny from directive such as the following to your .htaccess file, you can stop them from loading pages at all; they receive a 403 Forbidden error instead:

Order allow,deny
allow from all
deny from 195.230.18.188

Preventing adding links by untrusted users

Extension:NotEvil prevents anyone except the users on an on-wiki list of trusted users from adding links. This solution is particularly appropriate for wikis where legitimate users rarely add external links. Some users have found it useful to change the regular expression '/http:\//' to a broader one that also catches links with the protocol omitted.

Note that this extension is very basic: it is somewhat difficult to install and needs further development (for example, it should use user groups instead of an on-wiki user list).

Lock down (lazy solution)

You can disallow editing by anonymous users, forcing them to create an account with a username and sign in prior to editing. As a last resort, spam can be nearly eliminated by creating a "gated community" in which new users cannot create a new account and must request one from you.

People often naively suggest lock-down as the best solution to wiki spam. It does reduce spam, but it is a poor solution and a Lazy Solution, because it massively inconveniences real users. Having to choose a username and password is a big turn-off for many people. The wiki way is to be freely and openly editable. This "soft security" approach is one of the key strengths of the wiki concept. Are you going to let the spammers spoil that?

...if so, you can easily lock down your MediaWiki installation by adding the following to your LocalSettings.php:

#Force people to register before they are allowed to edit
$wgGroupPermissions['*']['edit'] = false;
#Don't show anonymous visitors their IP address in the page header
$wgShowIPinHeader = false;

Note that this only reduces spam. Spambots routinely target MediaWiki installations with automated registrations, so this setting will result in many bogus user accounts in the database, usually with names that follow a recognizable pattern. Spammers may also create large numbers of sleeper accounts, which stay inactive at first and are used for spam later. You should combine this with other measures, such as CAPTCHAs on user registration (see above) and/or blocking spammer IPs.

Some spammers don't supply e-mail addresses, or supply invalid e-mail addresses. To deal with these, you can require e-mail validation before editing with Manual:$wgEmailConfirmToEdit:

$wgEmailConfirmToEdit = true;

As a last resort, spam can be almost entirely eliminated by creating a "gated community" where new users can't even register without asking you to set up an account for them. To do this, add the following to your LocalSettings.php:

#Disallow creating accounts
$wgGroupPermissions['*']['createaccount'] = false;

You can then visit Special:UserLogin while signed in to create new accounts. See Manual:User rights and Manual:Preventing access for more information.

Cleaning up spam

After dealing with the source of spam, you still need to clean up existing spam. If you leave existing spam in place, anti-spam features may interfere with people trying to make legitimate edits to the affected pages.

If the problem is limited to a few pages, it can be cleaned up by hand using normal administrative functions.

If the problem is limited to a small number of IPs or users, Extension:Nuke can systematically remove all their contributions.

If spam is widespread and comes from many users, you may find the SpamBlacklist cleanup script (cleanup.php) useful. After you update the spam blacklist, this script automatically goes back and removes matching spam from your wiki. It scans the entire wiki and, wherever spam is found, reverts the page to its latest spam-free revision.

Procedure:

  1. Copy cleanup.php to the extensions/SpamBlacklist folder.
  2. Log in to your server's shell (for example, over SSH with PuTTY).
  3. Navigate to the extensions/SpamBlacklist subdirectory.
  4. Type "ls" (or "dir") to confirm that cleanup.php is in the directory.
  5. Type "php cleanup.php" to run the script.
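The core step the script performs, finding the newest spam-free revision of a page, can be sketched in Python as below. This is an illustration with hypothetical names, not the code of cleanup.php. It assumes the page history is available as a list of texts ordered from oldest to newest.

```python
import re


def revision_to_restore(revisions, spam_pattern):
    """Return the index of the newest revision whose text does not match spam_pattern.

    revisions: page texts ordered oldest to newest.
    If the result is the last index, the page is already clean and needs no revert.
    Returns None when every revision matches; such a page needs deleting or
    cleaning by hand instead.
    """
    for index in range(len(revisions) - 1, -1, -1):
        if not spam_pattern.search(revisions[index]):
            return index
    return None
```

For example, with the pattern re.compile(r"casino-[a-z]+\.example") and a history whose last two revisions added such a link, the function returns the index of the third-newest revision.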

Other ideas

This page lists features that are currently included or available as patches. The discussion page has many other ideas for anti-spam features that could be added to MediaWiki or are under development.

There is also a 'Spam Filter' project dedicated to building more effective spam filtering for MediaWiki.

See also

Extensions
  • AbuseFilter — allows edit prevention and blocking based on a variety of criteria
  • AntiBot — a simple framework for spambot checks and trigger payloads.
  • Asirra — CAPTCHA based on distinguishing cats and dogs
  • Check Spambots — queries online databases and DNSRBLs to detect known spam vectors
  • CheckUser — allows, among other things, the checking of the underlying IP addresses of account spammers to block them. Allows mass-blocking of spammers from similar locations.
  • ConfirmEdit — adds various types of CAPTCHAs to your wiki
  • NotEvil — allows blocking adding of links except by trusted users
  • Nuke — removes all contributions by a user or IP
  • QuestyCaptcha — CAPTCHA based on answering questions
  • SimpleAntiSpam — adds an invisible input field into the edit view and checks if the box was filled; if it was, the extension disallows the edit. Won't affect human users in any way.
  • SpamBlacklist — prevents edits containing spam domains, list is editable on-wiki by privileged users
  • SpamRegex — allows basic blocking of edits containing spam domains with a single regex
  • Category:Spam management extensions — category exhaustively listing spam management extensions
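The hidden-field technique that SimpleAntiSpam uses can be sketched in Python as follows. The field name and functions are hypothetical; the extension's actual markup and field name may differ.

```python
HONEYPOT_FIELD = "hp_website"  # hypothetical name for the hidden input


def hidden_field_html(name=HONEYPOT_FIELD):
    """Markup for a text input that human visitors never see (hidden with CSS)."""
    return (
        f'<div style="display:none"><label for="{name}">Leave this empty</label>'
        f'<input type="text" id="{name}" name="{name}" value=""></div>'
    )


def is_probable_bot(form_fields, name=HONEYPOT_FIELD):
    """Flag the submission if the hidden field came back non-empty.

    A missing field is treated like an empty one, matching the simple
    "was the box filled?" check.
    """
    return bool(form_fields.get(name, "").strip())
```

Humans never see the box, so they leave it empty. Bots that fill in every field they find give themselves away.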

