Anti-spam features


 * See also Manual:Combating spam and Manual:Combating vandalism, where many of the same issues are addressed.

MediaWiki provides the following features to reduce the problem of wiki spam.

Note that many of these features are not activated by default. If you run a MediaWiki installation on your own server or host, you are the only one who can make the necessary configuration changes. By all means ask your users to help watch out for wiki spam (and do so yourself), but spam can easily overwhelm small wiki communities, so it helps to raise the bar a little. Note, however, that none of these solutions is completely spam-proof. Check 'recent changes' (Special:Recentchanges) periodically!

$wgSpamRegex
To prevent a spammer from saving wiki edits with problematic content, use the variable '$wgSpamRegex' (in very old versions of MediaWiki, the setting was called '$wgSpamBlacklist'). Set the variable in LocalSettings.php, which overrides the value in DefaultSettings.php. Set it to a regular expression that matches any URLs (or parts of URLs) which you do not want users to link to. You can also match any other bad content which you wish to ban. When an edit is rejected, the user is shown an explanatory message indicating which part of the edit text is not allowed.

Place a line like this somewhere in your LocalSettings.php file. This prevents any mention of 'online-casino', 'buy-viagra', 'adipex' or 'phentermine'. The '/i' at the end makes the match case-insensitive.
 * Simple example $wgSpamRegex setting:
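
A setting matching the description in this section might look like the following. This is an illustrative sketch, not a definitive list: the exact pattern is an assumption reconstructed from the terms named here, and the CSS part assumes spam hidden with 'display: none' or inside a tiny fixed-height overflow box.

```php
# Illustrative pattern only; adapt the alternatives to the spam you actually see.
# Single quotes keep the backslashes intact for the regular expression engine.
$wgSpamRegex = '/online-casino|buy-viagra|adipex|phentermine|adult-website\.com|display\s*:\s*none|overflow\s*:\s*auto;\s*height\s*:\s*[0-4]px;/i';
```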

The example also prevents any reference to 'adult-website.com'. Clearly this kind of setting provides an easy way to get rid of a particular spammer if they keep coming back to your wiki.

Finally, the example also blocks certain CSS style attributes which have been used to hide spam in many attacks. Unfortunately there are many workarounds a spammer can use, but for the time being this will get them off your back.

This is only a simple example. See $wgSpamRegex documentation for more detail.

Longer spam blacklists
The above approach will become too cumbersome if you attempt to block more than a handful of spammy URLs. A better approach is to have a long blacklist identifying many known spamming URLs, in a more readable format (not a single regular expression). To achieve this, you will need to use the SpamBlacklist extension. With this, you can allow some of your users to edit the blacklist on a wiki page, and you can fetch updates from external sources.

Spam cleanup script
Blacklisting spam words or spammer domain names prevents future spam, but doesn't get rid of existing spam. In fact, if you allow existing spam to remain, the blacklist may interfere with people attempting to make legitimate edits to the affected pages. It's important that you clean up as well as adding to the blacklist. You can do this by hand or, if you have a widespread spam situation, you may find the spam cleanup script (cleanup.php) useful. This script goes back and removes matching spam on your wiki after you make an update to the spam blacklist. It does this by scanning the entire wiki and, where spam is found, reverting to the latest spam-free revision.

Procedure:
 * 1) Copy cleanup.php to the extensions/SpamBlacklist folder.
 * 2) Log in to your server's command line (for example over SSH, using PuTTY).
 * 3) Navigate to the extensions/SpamBlacklist subdirectory.
 * 4) Type "ls" (or "dir") to confirm that cleanup.php is in the directory.
 * 5) Type "php cleanup.php" to run the script.

CAPTCHAs
A CAPTCHA is a system that tries to distinguish humans from automated systems by asking the user to solve a task that is difficult for machines. The ConfirmEdit extension provides several different validation mechanisms, and allows you to customize when a CAPTCHA is presented.

The most robust CAPTCHAs available today are ReCaptcha (one of the options of ConfirmEdit) and the Asirra CAPTCHA, which asks the user to distinguish cats and dogs (currently supplied by Extension:Asirra).

CAPTCHAs have some disadvantages in terms of accessibility and inconvenience to your real human users; for this reason it is recommended not to use them on every edit, but only on account creation and anonymous edits that insert links (these are the default settings for ConfirmEdit). CAPTCHAs also will not completely spam-proof your wiki; according to Wikipedia, "Spammers pay about $0.80 to $1.20 for each 1,000 solved CAPTCHAs to companies employing human solvers in Bangladesh, China, India, and many other developing nations." For this reason they should be combined with other mechanisms.
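
As a sketch of that recommendation, the triggers can be set explicitly in LocalSettings.php. The snippet assumes ConfirmEdit is installed under extensions/ConfirmEdit and is loaded through its older require_once entry point (newer MediaWiki releases load extensions differently); the trigger values shown simply restate the behaviour described above.

```php
# Assumption: ConfirmEdit lives in extensions/ConfirmEdit and uses the older entry point.
require_once "$IP/extensions/ConfirmEdit/ConfirmEdit.php";

$wgCaptchaTriggers['edit']          = false; # no CAPTCHA on ordinary edits
$wgCaptchaTriggers['create']        = false; # none on page creation
$wgCaptchaTriggers['addurl']        = true;  # CAPTCHA when an edit adds new external links
$wgCaptchaTriggers['createaccount'] = true;  # CAPTCHA on account creation
```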

Abuse filter
Extension:AbuseFilter allows privileged users to create rules that target the specific type of spam your wiki is receiving, and that automatically prevent the action and/or block the user. It can examine many properties of the edit, such as the username, the age of the user's account, text added, links added, and so on. It is most effective when you have one or more skilled administrators who are willing to help fight spam. The abuse filter can be effective even against human-assisted spammers, but requires continual maintenance to respond to new types of attacks.

DNSBL
You can set MediaWiki to check each editing IP address against one or more DNSBLs (DNS-based blacklists), which requires no maintenance but slightly increases edit latency. For example, you can add this line to your LocalSettings.php to block many open proxies and known forum spammers:
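
A minimal sketch of such a configuration follows, assuming the two lists referenced in this section. The zone names are assumptions; check each provider's documentation and usage terms before relying on them.

```php
# Turn on DNSBL checks for editing IP addresses.
$wgEnableDnsBlacklist = true;
# Assumed zone names for the two lists referenced in this section.
$wgDnsBlacklistUrls = array( 'zen.spamhaus.org', 'dnsbl.tornevall.org' );
```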

For details of these DNSBLs, see Spamhaus: Zen and dnsbl.tornevall.org. For a list of DNSBLs, see Comparison of DNS blacklists. See also Manual:$wgEnableDnsBlacklist, Manual:$wgDnsBlacklistUrls.

$wgProxyList
You can set the variable $wgProxyList to a list of IPs to ban. This can be populated periodically from an external source using a cron script such as the following:
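
One way to do this, sketched under assumptions, is a small PHP command-line script run from cron that converts a list of IP addresses into a PHP file defining $wgProxyList. The file paths are hypothetical, and the script assumes the list has already been downloaded as plain text with one IP address per line.

```php
<?php
// Hypothetical paths; adjust to your own download location and wiki root.
$source = '/path/to/downloaded-ip-list.txt';
$target = '/path/to/wiki/bannedips.php';

$lines = file( $source, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
if ( $lines === false ) {
    fwrite( STDERR, "Could not read $source\n" );
    exit( 1 );
}

// Keep only syntactically valid IP addresses, without duplicates.
$ips = array();
foreach ( $lines as $line ) {
    $ip = trim( $line );
    if ( filter_var( $ip, FILTER_VALIDATE_IP ) !== false ) {
        $ips[$ip] = true;
    }
}

// Write to a temporary file first, then rename it into place,
// so the wiki never loads a half-written list.
$code = "<?php\n\$wgProxyList = " . var_export( array_keys( $ips ), true ) . ";\n";
$tmp = $target . '.tmp';
if ( file_put_contents( $tmp, $code ) === false || !rename( $tmp, $target ) ) {
    fwrite( STDERR, "Could not write $target\n" );
    exit( 1 );
}
```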

You then set in your LocalSettings.php:
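
Assuming the cron job writes a file named bannedips.php (a hypothetical name) into the wiki's root directory, and that this file assigns $wgProxyList, the line could be:

```php
# bannedips.php is a hypothetical file name; it is expected to set $wgProxyList.
require_once "$IP/bannedips.php";
```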

If you do this and you use APC for caching, you may need to increase apc.shm_size in your php.ini to accommodate such a large list.

$wgBlockOpenProxies
If you set $wgBlockOpenProxies to true in your LocalSettings.php, MediaWiki will automatically scan each editing IP for open HTTP proxies. Such scans may be interpreted as hostile by some system administrators, so this measure is not recommended.

rel=nofollow link attribute
MediaWiki adds the rel=nofollow attribute to external links by default (this is configurable; see Manual:$wgNoFollowLinks for details). This tells search engines not to follow external links added by users, which makes spammy links much less valuable. Note that this does not prevent spam: spammers generally don't notice the difference and will abuse your wiki anyway, but they benefit much less from it.

By default, it is put on all external links, plus log and history pages. See NoIndexHistory. Note that putting it on all external links is a rather heavy-handed anti-spam tactic, which you may decide not to use (by switching off the rel=nofollow option). See Nofollow for a debate about this. It is good to have this as the installation default, though: administrators who are not thinking about spam problems will tend to have this option enabled.
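
If you do decide to switch the option off, a one-line sketch for LocalSettings.php, using the $wgNoFollowLinks setting referenced above, would be:

```php
# Stop adding rel=nofollow to external links (the default is true).
$wgNoFollowLinks = false;
```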

Lock down (lazy solution)
You can disallow editing by anonymous users, forcing them to create an account with a username and sign in prior to editing. As a last resort, spam can be nearly eliminated by creating a "gated community" in which new users cannot create a new account and must request one from you.

People often naively suggest lock-down as the best solution to wiki spam. It does reduce spam, but it is a poor and lazy solution, because it massively inconveniences real users. Having to choose a username and password is a big turn-off for many people. The wiki way is to be freely and openly editable. This "soft security" approach is one of the key strengths of the wiki concept. Are you going to let the spammers spoil that?

...if so, you can easily lock down your MediaWiki installation as follows:

Add the following to your LocalSettings.php:
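
A minimal sketch of the lock-down setting, using MediaWiki's $wgGroupPermissions array (the '*' group stands for all visitors, including anonymous ones):

```php
# Disable anonymous editing; visitors must create an account and log in first.
$wgGroupPermissions['*']['edit'] = false;
```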

Note that this only reduces spam. MediaWiki installations are routinely targeted by spam bots that register accounts automatically, so this setting will result in many bogus user accounts in the database, usually with names that follow some recognizable pattern. Spammers may also create large numbers of sleeper accounts, which sit idle at first and are used for spam later. You should combine this with other measures, such as CAPTCHAs (see above) on user registration and/or blocking spammer IPs.

As a last resort, spam can be almost entirely eliminated by creating a "gated community" where new users can't even register without asking you to set up an account for them. To do this, add the following to your LocalSettings.php:
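
A minimal sketch of the gated-community setup, again using $wgGroupPermissions. It assumes the default rights of the sysop group, which still allow administrators to create accounts for others:

```php
# Disable anonymous editing.
$wgGroupPermissions['*']['edit'] = false;
# Prevent self-registration; accounts must be created by an administrator.
$wgGroupPermissions['*']['createaccount'] = false;
```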

You can then visit Special:UserLogin while signed in to create new accounts. See Manual:User rights and Manual:Preventing access for more information.

Other ideas
This page lists features which are currently included, or available as patches, but on the discussion page you will find many other ideas for anti-spam features which could be added to MediaWiki, or which are under development.

There is also now a 'Spam Filter' project, dedicated to the task of building more effective spam filtering for MediaWiki.