Talk:Anti-spam features

Inappropriate usage
Graffiti networks (site links here) use open wikis to store blocks of content as part of a peer-to-peer network. The data is kept for a long time on orphaned pages, so the chances of you finding it are small. The content is stored as base64-encoded blobs under a normal-looking subject.

Lockdown
It would be nice if the paragraph on lockdown would mention that you need to add the configuration settings AFTER the line:

require_once( "includes/DefaultSettings.php" );

I would fix it myself, but the page is protected. Not very wikilike. Especially after reading the rant against closing the wiki in the lockdown paragraph. Some people do have a perfectly good reason to implement a closed wiki ...

Sherman.boyd 23:35, 4 December 2006 (UTC)


 * Hi Sherman.
 * My feeling on that last point is, yes there are reasons you might want to implement a closed wiki, but the reason should not be wiki spam. It's lazy to lock down a wiki as a response to spam attacks before you've tried some of the better solutions listed above, and I think it's important to stress that.
 * Regarding the point about adding the configuration settings after the line: you'll find many references to setting $wg global variables throughout the various manual pages. If we gave details of exactly how to modify settings in all of these places, the instructions would all be cluttered. We should however link to the LocalSettings.php page where all such details are given.
 * Regarding the fact that this page is currently 'protected'... I wrote a lot of this page. As such I'm generally happy with it in its current state, and I'm secretly quite happy that it's protected :-) ...but I suppose I should stand by my principles and agree with you. The page shouldn't really be protected. It is unwikilike, and in contradiction with the 'lockdown' information as you say. Also I'm hoping some developers will come along and link to some code for more powerful anti-spam features. Not sure why the page was protected really. Wasn't me. I don't have the power to protect or unprotect a page myself. -- Halz 13:57, 24 January 2007 (UTC)

blacklist running here?
I improved the example regexp to have terms which are less likely to lead to false positives e.g. 'buy-viagra' instead of 'viagra'. The reason I didn't write it like that in the first place was because it was actually matching the blacklist running here, when I made the edit. Not sure why I am allowed to write this now. Is the blacklist switched off? -- Halz 00:18, 6 June 2006 (UTC)

Regex Case-insensitive
'''Q. How does one make the $wgSpamBlacklist regex entirely case-insensitive? I don't want to have to do 'viagra|Viagra' and all the other possibilities! Is there a list of regexs that work in this variable/mediawiki/php somewhere? --anon'''
 * Aha! Found the answer myself: adding '/i' at the end makes it case-insensitive. There should still be a list of PHP regexes linked to at the point of the example.
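A quick way to see what the trailing i modifier buys you is to try the pattern outside the wiki first. The sketch below uses Python's re module purely as a test bench (MediaWiki itself evaluates the pattern with PHP's regex functions); re.IGNORECASE plays the role of '/i', and the blacklist terms and the helper name is_blacklisted are made up for illustration.

```python
import re

# Stand-in for a PHP pattern such as "/buy-viagra|cheap-pills/i".
# The terms are illustrative only, not taken from any real blacklist.
BLACKLIST = re.compile(r"buy-viagra|cheap-pills", re.IGNORECASE)


def is_blacklisted(text: str) -> bool:
    """Return True if any blacklisted term occurs in text, ignoring case."""
    return BLACKLIST.search(text) is not None


if __name__ == "__main__":
    for sample in ("http://BUY-Viagra.example/", "http://example.org/recipes"):
        print(sample, is_blacklisted(sample))
```

With the flag set there is no need to spell out 'viagra|Viagra' and every other capitalisation by hand.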

CSS Hidden Spam
Could the SpamBlacklist be extended to allow blocking of any kind of string, not limited to being part of an URL? (example: a popular spam bot always starts its content with <div id=wiki) -- anon


 * See CSSHiddenSpam. This has now become a very widespread problem across small MediaWiki installations around the web, although it's probably only one or two spammers employing this technique. On Peter Forret's blog he's suggested $wgSpamRegex="/<div/";  # ban DIV tags


 * This is not ideal since it will prevent legitimate use of <DIV> tags.


 * I have now tried this out on a new mediawiki installation, and it works! But I think in older MediaWiki versions this variable was called $wgSpamBlacklist, and maybe was only tested against external links.
 * I had some legitimate use of div tags with style attributes on my wiki, so I used this one instead: $wgSpamRegex = "/overflow:\s*auto;\s*height:\s*[0-4]px;/i";  # ban certain hiding CSS attributes. This is less powerful (the spammer can easily come up with an alternative workaround). I suppose another alternative (halfway) would be to ban the STYLE attribute. -- Halz 16:52, 15 December 2005 (UTC)


 * I finally got a MediaWiki setup to try out, so I have been looking at the antispam features. I have not tested this regex out well, but it seems better than the ones above since it covers most ways spam can be hidden with CSS. Blocking divs only may work for now, but spammers could start using plenty of other tags (b, p, span, etc). $wgSpamRegex = "/\<.*style.*?(display|position|overflow|visibility|height):.*?>/i"; This checks for style and only blocks those rules that would be good for hiding spam. Color could also be added, but it can be useful and is not as good at hiding text. -- JoeChongq  25 December 2005
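Since the last pattern is described as not well tested, here is a small harness for trying it against sample wikitext before putting it into $wgSpamRegex. It is only a sketch: Python's re module stands in for PHP's regex engine, the '/i' modifier becomes re.IGNORECASE, and the name looks_hidden and the sample strings are assumptions made up for this example.

```python
import re

# JoeChongq's pattern with the PHP delimiters removed; /i -> re.IGNORECASE.
HIDING_CSS = re.compile(
    r"<.*style.*?(display|position|overflow|visibility|height):.*?>",
    re.IGNORECASE,
)


def looks_hidden(wikitext: str) -> bool:
    """Return True if the text would be caught by the hiding-CSS pattern."""
    return HIDING_CSS.search(wikitext) is not None


if __name__ == "__main__":
    samples = [
        '<div style="overflow:auto; height:1px;">pills</div>',  # typical hidden spam
        '<span style="display:none">pills</span>',              # other tag, still caught
        '<div style="color:red">hello</div>',                   # harmless styling
        '<b>CSS style</b> tip: set height: 2em <br>',           # false positive
    ]
    for s in samples:
        print(looks_hidden(s), s)
```

The last sample shows a limitation: the pattern is not tied to a single tag, so ordinary text that happens to contain "<", the word "style", one of the listed properties and a later ">" on the same line is also rejected.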

Spammers registering
I'm having a small problem. So far nothing major's happened, but I don't like it.

I am requiring that only logged-in users can edit. I am now noticing that there is some sort of script that is automatically generating random usernames. It was 1500 when I first found out. I went into the database and deleted them all. Now, (3 hours later) another 200 have been created. Is there any way to prevent this? Is it possible to determine the IP address of the script that's doing this and block it?

So, far there's been no spam, but I'd really like to keep it that way. Thanks for any ideas! --Werdna 21:15, 26 May 2005 (UTC)


 * So, do I understand this right: someone is creating random usernames with a script? Maybe they're testing a malicious bot they're developing? Hmm... I don't know. Have you tried asking on the MediaWiki IRC channel? --Aerik 22:48, 27 May 2005 (UTC)

I would like to confirm what Werdna said. I am also getting random alphanumeric subscribers, though nowhere near as many, probably 3 or 4 per day. I am also deleting them from the database. I have not yet figured out how to reset the number of actual members in the database. Silkrooster 05:22, 11 January 2006 (UTC)

There is a free [Anti-Spam patch] for MediaWiki 1.5.7 which fixes this problem. It controls users' registration and records IP numbers for later banning of offending users/domains. The zipfile includes instructions and extended info on what the patch exactly does.

I get the same problem and it makes maintaining a publicly visible wiki a very time-consuming task. Spammers register on my wiki and mess up my pages. The way I deal with them is to block them and revert the pages they edited (this takes some time). I use my wiki as a convenient publishing platform for myself and a few other people. I already locked down editing to registered users but that is not helping. I'm willing to try the above-mentioned Anti-Spam patch but I do not see how that would really fix my problem. Spammers could still register and spam, and I would still have to revert the pages and block the spammers after the fact. What would help me a lot is a further lockdown of the registration/edit process to have administrators approve new user accounts before they are allowed to edit. For small-scale wikis such as mine, this would probably be the best possible solution. Illustir 13:27, 25 April 2006 (UTC)


 * ConfirmEdit extension. --brion 02:42, 26 April 2006 (UTC)

Dimensions of protection
Here are some ideas from a discussion with Brion Vibber. A particular focus for this is the application of very adjustable and scalable controls for lesser used wikis (where a lot of damage could be done before someone notices). Here's a summary from Brion:--Aerik 22:58, 27 May 2005 (UTC)


 * There are several dimensions of protection to engage in:


 * Making it easy to undo attacks when they are discovered: Generally the wiki system works reasonably well for this. Revert tools and such help a little extra.


 * Making it easy to discover an attack in progress: Recent Changes and watchlists are part of this; we also have for instance IRC channels that carry change notifications for many of our wikis to make it easy to keep an eye on things. "Recent Changes Patrol" was intended to help with this but seems unwieldy.


 * I think what could really help here is automated flagging of 'suspicious' actions which can produce a real-time notification: for instance an alert sent to a central IRC channel.


 * Prevention of really bad stuff. The wiki allows only limited control; for instance no matter how hard you try you can't add an ActiveX control to the main page which formats people's hard drives. (Well, unless we have a bug. :)


 * We can extend this in limited ways to e.g. the URL blacklist for spam blocking, but outright blocking of edits is hard to do without false positives; even the simple URL blacklist is problematic at times. Generally we leave 'value judgements' to human-based soft security, as part of the wiki philosophy.


 * Flagging things for human review, and automatically slowing down rather than stopping are much friendlier in the face of false positives.


 * Captchas are probably useful as a speed bump, but are potentially problematic as a default because they're a) annoying and b) hard to make accessible for people with limited software/hardware or physical abilities.


 * Simply adding a delay before accepting edits to enforce an edits-per-minute limit has been suggested as a speed brake for spambots and vandalbots: legitimate users may not even notice anything different ;) while a problem vandal will be slowed to reasonable levels while humans organize a response.


 * (Consider also multiple-IP attacks such as a vandalbot going through multiple proxies with a different identity on each. This is not a hypothetical, but something that does happen from time to time and should be handled if possible.)


 * We want as much as possible to stay out of the way of legitimate human interaction. What is often useful is to apply the soft rules a little differently against established editors than against new accounts (say, Spambot78434). You don't want to make a hard cutoff as this can cause problems, but it's probably ok to be less strict about edits-per-minute on a longtime established editor than on a newly registered possible spambot.


 * A dynamic filter combined with a blacklist and local auto-banning might work. A one-click button could revert all bot edits, delete all pages newly created by bots, and undo to the last version of an edit marked as legitimate (or not in the banned list). The bottom doesn't do the deleting part though.
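The edits-per-minute speed brake described in the list above, with a looser limit for established editors than for new accounts, can be sketched roughly as follows. This is not MediaWiki code: the class name EditRateLimiter, the limits, and the "established after N accepted edits" rule are all assumptions chosen for illustration, and the sketch uses a simple threshold where the summary above argues for something softer than a hard cutoff.

```python
from collections import defaultdict, deque


class EditRateLimiter:
    """Sliding-window edits-per-minute limit, looser for established accounts."""

    def __init__(self, new_limit=3, established_limit=20, window=60.0,
                 established_after_edits=50):
        self.new_limit = new_limit
        self.established_limit = established_limit
        self.window = window  # seconds
        self.established_after_edits = established_after_edits
        self._recent = defaultdict(deque)  # user -> timestamps of recent edits
        self._total = defaultdict(int)     # user -> accepted edits so far

    def allow(self, user, now):
        """Return True if an edit by user at time now is within the limit."""
        recent = self._recent[user]
        # Forget edits that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        established = self._total[user] >= self.established_after_edits
        limit = self.established_limit if established else self.new_limit
        if len(recent) >= limit:
            return False  # caller decides: delay, flag for review, or reject
        recent.append(now)
        self._total[user] += 1
        return True


if __name__ == "__main__":
    limiter = EditRateLimiter(new_limit=2)
    print([limiter.allow("Spambot78434", t) for t in (0.0, 1.0, 2.0)])
```

Returning False does not have to mean rejecting the edit; in the spirit of the discussion it could just as well trigger a delay or flag the edit for human review.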

Protecting Small Wikis
Even though I help run it, at first I was not for having a blacklist at chongqed.org. I have never liked blacklists because of the problems they can cause and the pain it often is to get off if listed by mistake. But we had the data and our more passive attempts were not doing enough to slow down spam. The spam problem is so bad that anything less than a blacklist or locking down the wiki isn't going to help much. Many (or most) don't care that WikiMedia has nofollow tags on links. Few take the time to learn about the sites or wiki engine they are spamming.

On a large wiki like Wikipedia you have several 1000s of people potentially cleaning spam. But you also have to think about the small wikis (which it's obvious you are). Flagging an edit for human review is good for large wikis, but for a small one it's only slightly better than what you have now. Many smaller wikis are so low-traffic that it's a good bet that any post not by the Admin and a small group of friends is from a spammer. I am not saying that MediaWiki should use our blacklist or not, but it would be nice to give people the option of using our list or one of the other distributed lists maintained on some of the larger wikis using the same format.

-- Joe 10:33, 9 Jun 2005 (UTC)


 * Yeah, there are quite a few small wikis out there which are harbouring wiki spam. Another interesting thing for MediaWiki developers to take note of: MediaWiki Default Pages Spam is a widespread problem. Basically people will install MediaWiki, and use it to collaborate on just one or two pages, with hardly any users. Along come the spammers and create the initial version of all these default pages, which the users then don't notice (because they are only really looking at the main page). Of course to some extent we don't need to care about this any more, because these same lazy administrators will not switch off 'nofollow' in their configurations, and so the spammers will not benefit.


 * That's really the main thing, that the spammers don't benefit. If the spammers gain no Google rank, then it's only that one wiki which is being damaged, but if the spammers gain Google rank whenever they swing by these fresh MediaWiki installations, then this will encourage them, and then we are all losing out. -- Halz - 30th Aug 2005

Ideas under development
--Aerik 16:49, 28 May 2005 (UTC)
 * Brion is working on an edit limiter (prevents too many edits being made too quickly by one user)
 * I'm working on an edit checker with finely tuneable settings to prevent or flag edits having certain characteristics (such as more than x additional external URLs, for example).

Actual features only please
This stuff is moved off the main Anti-spam Features page to here, because we need that page to be focussed on what features the software actually has (or patches which are publicly available).

People tend to chip in with loads of other spam prevention ideas which aren't actually available. Such discussions can be seen all over the web. Different wiki development communities re-hash the same pros/cons discussions. I'm not saying they're a complete waste of time, just that we should keep them off the Anti-spam Features page, and for more general/conceptual ideas, preferably discuss them in a more centralised location, on the wiki dedicated to the topic of wiki spam: http://wiki.chongqed.org  -- Halz 15:18, 29 November 2005 (UTC)

'Bad behaviour' extension
This extension (or hack) does a variety of checks comparing http requests to profiles of known spambots: http://www.ioerror.us/software/bad-behavior/

Whitelist & rel=nofollow
One alternative to always using rel=nofollow is to approve links: if a link is approved then it will not have a nofollow tag. So if someone edits a page and inserts external links, you would have to approve these links before they could show up without the nofollow tag. PmWiki uses URL Approvals to do this.

Using Karma to prevent spam
If the wiki demands login to edit, then you can use Karma to prevent spam. Karma can be earned by confirming your email address, or by getting edits approved/patrolled by a user with better karma than yourself. This would also mean that people without any karma would have to have their edits checked before they can do anything more, so they get one or two free edits and then they need to be approved.

registration-only editing
In not-so-open wikis, where it doesn't matter if users are required to log in before editing, it's a bit harder, but still easy, to edit content. If that is still too easy for spambots, there are some ways to make it harder:


 * use non-computer-readable graphics to prevent registration of spam bots
 * in a wiki with registration-only editing, a user could be requested to read some characters from a non-computer-readable auto-generated image file containing random text.
 * This seems to be called captcha: Captcha


 * registration mail confirmation
 * in a wiki with registration-only editing, a user could be required to confirm registration on a website with some random number sent to him by email. makes it hard for bots to get through.


 * administrator registration check
 * If you like a wiki that's real hard to get into, you'll only allow an administrator to create new accounts.
 * You could register only people you know in person, or who confirm with a phone call, that they are somehow living persons.


 * gpg-key registration check
 * Let a user sign some random text with his gpg-key, and only let him register when he's in your web of trust. Could even be automated.
 * The same with CaCert or other cryptographic signature with similar features.


 * If we could provide the necessary steps to achieve any of these 'lock-down' states (e.g. is it a simple config option in MediaWiki?) then this would be good information to have on the Anti-spam Features page. -- Halz 15:18, 29 November 2005 (UTC)

Limiting External Links
For small wikis without loads of users maintaining spam blacklists, it makes sense to simply disallow external links for all users except the ones with sysop privileges. There's a little MediaWiki extension that does just that here:

undo changes of specific users
To ease the cleanup process, it would be cool to have a script handy, that helps you undo all changes from a specific user, when you find him spamming. By the way - this is not the same as the possibility to revert single changes on single pages.
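The logic such a script would need can be sketched independently of the database. The example below works on a plain in-memory structure rather than MediaWiki's real tables; the function name plan_rollback, the data layout and the sample pages are all hypothetical. For every page whose latest revision is by the spammer, it picks the most recent text written by someone else, or marks the page for deletion if the spammer is the only author.

```python
def plan_rollback(pages, spammer):
    """Work out how to undo a user's changes.

    pages maps a page title to its revisions, oldest first, each revision
    being an (author, text) tuple. The result maps each affected title to
    the text to restore, or to None if the spammer wrote every revision
    (so the page is a candidate for deletion).
    """
    plan = {}
    for title, revisions in pages.items():
        if not revisions or revisions[-1][0] != spammer:
            continue  # latest revision is someone else's; leave the page alone
        restored = None
        for author, text in reversed(revisions):
            if author != spammer:
                restored = text  # newest revision not written by the spammer
                break
        plan[title] = restored
    return plan


if __name__ == "__main__":
    wiki = {
        "Main Page": [("Alice", "Welcome"), ("Spambot", "pills"), ("Spambot", "more pills")],
        "Cheap pills": [("Spambot", "spam")],
        "Help": [("Spambot", "x"), ("Bob", "Help text")],
    }
    print(plan_rollback(wiki, "Spambot"))
```

Pages where somebody else has edited after the spammer are skipped on purpose, since blindly restoring an older version there would throw away the later legitimate edit.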

changelog emails
In addition to RSS feeds, an email could be sent to people willing to watch specific pages, or the whole wiki, whenever changes are made to them.

Hidden form fields
Add one or two hidden fields to your forms like this:

<div style="display:none"><input type="text" name="_url" /></div>

Or:

<div style="height:0px"><input type="text" name="_url" /></div>

Or:

<noscript><input type="text" name="_url" /></noscript>

The spambot may think that it has to fill out the input field and bang, it gets locked out. You surely need to be a little more fancy than these easy-to-beat examples. :) A simple regex will beat this protection... Roland
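On the server side, the check that goes with such a hidden field is tiny: a human never sees the field, so any submission that arrives with it filled in is suspect. A minimal sketch, assuming the submitted form is available as a dictionary and using the _url field name from the examples above (the function name is_probable_bot is made up):

```python
HONEYPOT_FIELD = "_url"  # name of the hidden input from the examples above


def is_probable_bot(form):
    """Return True if the hidden honeypot field was filled in.

    form is the submitted form data as a dict of field name -> value.
    A real visitor leaves the hidden field empty or does not send it at all.
    """
    return bool(form.get(HONEYPOT_FIELD, "").strip())


if __name__ == "__main__":
    print(is_probable_bot({"wpTextbox1": "Fixed a typo", "_url": ""}))
    print(is_probable_bot({"wpTextbox1": "pills", "_url": "http://spam.example/"}))
```

As noted above, this only stops bots that fill in every field they find; one that skips hidden inputs passes straight through.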

Matching entire url vs just domain
Today I found out that ipowerweb.com was compromised and each of their domains had a redirector script in place. What is needed is URL pattern matching so that you can block based on the entire URL rather than just the domain portion. This is an arms race: a hacker compromises a server and installs a redirect script on every virtual domain, then a spammer posts links to the websites in question on wikis.

Example urls all had the following characteristic

/html/?drug name or slogan
/Img/?drug name or slogan
/Images/?drug name or slogan
/aboutus/?drug name or slogan

So a quick interim fix could be to block the addition by non-sysops of URLs containing /?, or to match on the pattern /html/?.

If viagra is anywhere in the url then block and so on....
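Matching on the whole URL rather than the domain could look something like the sketch below. It is an illustration only: the directory names come from the example URLs above, while the function name should_block_url, the keyword list and the sample hostnames are assumptions.

```python
from urllib.parse import urlsplit

# Directories seen in the redirector URLs listed above (compared in lower case).
REDIRECTOR_DIRS = {"html", "img", "images", "aboutus"}
# Words that block a URL wherever they appear in it (illustrative list).
BLOCKED_WORDS = ("viagra",)


def should_block_url(url):
    """Block on the path and query of a URL, not just on its domain."""
    lowered = url.lower()
    if any(word in lowered for word in BLOCKED_WORDS):
        return True
    parts = urlsplit(url)
    directory = parts.path.strip("/").lower()
    # Matches the shape /html/?slogan: a known directory, a trailing slash,
    # then a query string directly after it.
    return (directory in REDIRECTOR_DIRS
            and parts.path.endswith("/")
            and parts.query != "")


if __name__ == "__main__":
    for u in ("http://victim.example/html/?cheap-pills",
              "http://victim.example/html/index.php?id=3",
              "http://example.org/wiki/Main_Page"):
        print(should_block_url(u), u)
```

The same host is blocked for the redirector path but left alone for its ordinary pages, which is the point of looking beyond the domain.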

Silent Blocking
Bob: Suggestion: I think a really important feature of spam blocking is silence. When a spammer's edit is rejected, the spammer should not see a warning page informing them that they have been blocked; instead they should see a page that deceives them into thinking their edit succeeded. Naturally this cannot safely be applied to all forms of spam blocking, since sometimes false positives are a problem, but for any filter with a very low false-positive rate, a deceptive success page can really improve the effectiveness of the filter. The spammer is unlikely to realize their spam has failed, and therefore they do not escalate to the next level of spamminess (proxies, text obfuscation, throwaway accounts, etc...) I have been using this method (implemented in a $wgFilterCallback) on a small wiki, and I am really pleased with it.
 * But then some spambots may "think" that they have found a page where the attack succeeded, and pass this on to other robots. So the amount of spam might increase...
 * As well, that'd be a bad idea for humans. -- Mike.lifeguard | @meta 19:05, 25 January 2009 (UTC)

Patch to control users' registration?
User:Cleoni wrote: ''There is an [Anti-Spam patch] for MediaWiki 1.5.7 freely available for download. It controls users' registration and records IP numbers for later banning of offending users/domains. Includes instructions and extended info.''

Sounds good, but that link 'http://www.elearninglab.it/?page_id=25' doesn't work (now takes us to something irrelevant) :-( -- Halz 11:08, 24 April 2006 (UTC)

disabling external links completely?
I had this option enabled at some point, but it seems to have been lost during one of the updates. At some point, there was a single line of code I had in my LocalSettings.php that would NOT allow any post to go through if it had even ONE external link. My wiki does not require the use of external links, so a pretty bulletproof way to prevent spam, in my case, is to simply eliminate the possibility of posting anything that contains a URL.

Anyone know how to do this?
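The lost line is not known, so the following is only a guess at the idea, not a reconstruction: $wgSpamRegex rejects edits whose text matches the pattern, so a pattern that matches any URL scheme would refuse every edit containing an external link. The sketch below tests such a pattern using Python's re module as a stand-in for PHP; the pattern, the name contains_external_link and the samples are assumptions.

```python
import re

# Candidate pattern; the PHP form would be something like "/(https?|ftp):\/\//i".
EXTERNAL_LINK = re.compile(r"(?:https?|ftp)://", re.IGNORECASE)


def contains_external_link(text):
    """Return True if the text contains anything that looks like a URL."""
    return EXTERNAL_LINK.search(text) is not None


if __name__ == "__main__":
    print(contains_external_link("See [[Main Page]] for details."))
    print(contains_external_link("Buy now at [HTTP://spam.example/ cheap pills]"))
```

A pattern like this does not catch protocol-relative links (those starting with //), and it also stops sysops from adding links, so it only suits wikis that truly never need them.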

Blocking MediaWiki Spam HOWTO
Please see http://wiki.evernex.com/index.php?title=Blocking_Spam_in_Mediawiki for a HOWTO on blocking spam in MediaWiki.

I keep it updated fairly often and run a fairly active small wiki on which I’ve successfully been able to go back to an open edit (no account required) policy. Successful spam attacks in the last 6 months: 0.

--Neurophyre, 07:23, 11 October 2006 (UTC)

I have created a similar article on my Wiki showing how I managed to stop spam: http://www.cookipedia.co.uk/recipes_wiki/How_to_stop_Spam_on_a_MediaWiki

--CookipediaChef (talk) 05:44, 19 February 2012 (UTC)

limit edit to an intranet (like a university)
I haven't tested this yet, because our admin is suspicious about this hack. The idea behind this is to limit anonymous edits to the local net of our university. I hope this is useful to you and please tell here. -- mik

suggestion for LocalSettings.php:

using $wgSpamRegex to block many links in a row
Most wikispam I've encountered has taken the form  keyword keyword keyword2 keyword2  etc. So, what about using $wgSpamRegex to block many links in a row? I'm thinking something like...

$wgSpamRegex = '/(\[http:\/\/[a-z0-9\.\/\%_-]+(\s+[a-z0-9\.-]+)+\]\s+){10}/i';

or

$wgSpamRegex = '/(\[http:\/\/[^\s]+\s+[^\[]+\s*?){10}/i';

Comments? (Handy PHP regex tester...)

--Alxndr 03:18, 26 November 2006 (UTC)
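Both patterns can be tried against sample wikitext before going live. The harness below is a sketch that uses Python's re module in place of PHP's regex engine, with '/i' translated to re.IGNORECASE and the redundant backslashes dropped; the constant names and the sample spam text are made up.

```python
import re

# The two proposed patterns, minus PHP delimiters and unneeded escapes.
STRICT = re.compile(r"(\[http://[a-z0-9./%_-]+(\s+[a-z0-9.-]+)+\]\s+){10}", re.IGNORECASE)
LOOSE = re.compile(r"(\[http://[^\s]+\s+[^\[]+\s*?){10}", re.IGNORECASE)


def is_link_run(text):
    """Return (strict, loose): whether each pattern sees ten links in a row."""
    return STRICT.search(text) is not None, LOOSE.search(text) is not None


if __name__ == "__main__":
    link = "[http://spam.example/p1 cheap pills] "
    print(is_link_run(link * 10))  # ten consecutive links
    print(is_link_run(link * 3))   # a handful of links, as in a normal article
    print(is_link_run(link.replace("http:", "https:") * 10))
```

The third sample shows that both patterns only look for http:// and let a run of https:// links through; changing http to https? in either pattern would cover both.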

Another CAPTCHA plugin for MediaWiki
reCAPTCHA has a MediaWiki plugin. It supports normal CAPTCHAs, and audio CAPTCHAs for the visually impaired.
 * ReCAPTCHA
 * http://recaptcha.net/plugins/mediawiki/


 * The reCaptcha plugin only works with PHP 5 and, I'm assuming, MediaWiki 1.8. I've gotten it to work with MediaWiki 1.6.12 and PHP 4.x download. 69.139.238.225 05:23, 13 April 2009 (UTC)

Picture CAPTCHAs
A vBulletin hack has this new type of CAPTCHA that differs from the regular text identification CAPTCHA and is far more powerful. It displays several simple pictures like a duck, a fire truck, a book, and a keyboard and it asks you to identify one of them. Bots have been programmed to read the text CAPTCHAs, but so far can't get past this new type. We installed it on our forum and saw spam stop immediately. I run a small wiki, the Youth Rights Network, and I'd love to see a system like this created to control new registrations. It is easy for humans and impossible (so far) for bots.

I saw a similar one from some university group. It shows you four pictures, then asks you what they all have in common, and there's a list of like 2 dozen keywords that could be the right answer. One picture might be a photograph of someone's basement, with junk all over, including a shelf with some empty wine bottles. Another might be a cartoon where one of the characters says the phrase 'bottle blonde'. Four pics like that, each with many distractions. Human finds a word they all have in common - Bottle - and picks it from the list of answers. Computer is still trying to understand the picture of the basement and all the objects in view.

how to implement: get a bunch of pics, cartoons, go google image search. For each pic, make a list of keywords. For each captcha, the server chooses a keyword and digs up 4 pics that have that keyword. Make sure you have enough!
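The implementation outline above can be sketched in a few lines. Everything here is hypothetical (the function name make_challenge, the picture file names and the keyword lists); the point is only to show the selection step: pick a keyword that enough pictures share, show four of those pictures, and offer the keyword among distractors that the shown pictures do not all have in common.

```python
import random


def make_challenge(pictures, num_pics=4, num_choices=24, rng=random):
    """Build one picture CAPTCHA.

    pictures maps a file name to its set of keywords. Returns
    (shown_pictures, answer_choices, correct_answer).
    """
    by_keyword = {}
    for name, keywords in pictures.items():
        for kw in keywords:
            by_keyword.setdefault(kw, []).append(name)
    # Only keywords attached to enough pictures can be used as an answer.
    usable = sorted(kw for kw, names in by_keyword.items() if len(names) >= num_pics)
    if not usable:
        raise ValueError("not enough pictures share a keyword")
    answer = rng.choice(usable)
    shown = rng.sample(sorted(by_keyword[answer]), num_pics)
    # Distractors must not also be common to every shown picture,
    # otherwise the challenge would have two right answers.
    common = set.intersection(*(set(pictures[n]) for n in shown))
    distractors = sorted(set(by_keyword) - common)
    choices = rng.sample(distractors, min(num_choices - 1, len(distractors)))
    choices.append(answer)
    rng.shuffle(choices)
    return shown, choices, answer


if __name__ == "__main__":
    pics = {
        "basement.jpg": {"bottle", "shelf", "junk"},
        "cartoon.png": {"bottle", "blonde"},
        "beach.jpg": {"bottle", "sand"},
        "bar.jpg": {"bottle", "glass"},
        "duck.jpg": {"duck", "pond"},
    }
    print(make_challenge(pics))
```

With a pool this small every challenge has the same answer, which is why the advice to make sure you have enough pictures matters.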

external links only from admins
hey every one, i've a massive spambot attack on my wikimedia.

it looks always like this:
 * (Upload log); 14:06 . . LeroyStevenson (Talk | contribs) uploaded "File:Hcg Drops 1947.jpg"
 * (User creation log); 14:06 . . LeroyStevenson (Talk | contribs) New user account
 * (diff) (hist) . . N Has to be Cech his form never drops 38‎; 14:06 . . (+3,924) . . 173.208.40.36 (Talk) (Created page with 'Image:Hcg_Drops_1947.jpg HCG diet drops assure extreme weight loss. HCG -- Human Chorionic Gonadotropin -- diet drops generate up part of some weight loss program pr…')

always with different ips, and links. there is always only one link included. i have a recaptcha. but this is no problem for the bot.

i tried some link blacklist options. now it nearly isnt possible to add a link (also for normal user).

so i thought about only admins are able to include links. is there some way? the wiki: o-wiki.net

best regards, moritz

It's quite annoying but they are quite predictable. I won't share here how I successfully block them (they'll probably read it), but this is how I think their bot kind of works: I guess they seem to have people (paid or tricked into it) entering the answers to the captchas. Then they can use a bot to create the account using the answer to the captcha. Sometimes they specify an email address, usually a @gmail with lots of dots in the name (something you could also use to block them). If the account creation fails they retry with another IP address. If account creation was successful they try to create a spam article (or if creation of the account fails an X amount of times they try to use an existing unblocked spam account). If it fails they retry creating the article with another account (could be an older spam account that wasn't deleted). Spellcoder 21:36, 3 June 2011 (UTC)

I agree that for small web sites where external links aren't even necessary (and I'm sure there are many of these sites), the easiest means of preventing spam is to block all links and then list the specific links that are allowed, or to allow only admins or specified users to create links. There's a NotEvil extension that's supposed to do the latter, but it doesn't seem to work very well. It's supposed to allow specified users to create external links. This is such an easy solution that it's puzzling that there are all these other very lengthy and maintenance-heavy work-arounds, yet not one simply allows only admins or specified users to add external links or blocks all except admin-listed links. This reminds me of a story about a rescue attempt where instead of taking 10 minutes to take the patient down the stairs in a gurney, the decision was to take him out a window - which was complicated, dangerous, and took more than an hour.

Spam Cleanup Script
I downloaded the script mentioned at Anti-spam features. But there are no instructions with the script, or in this article, about how to run it.

What directory should it go into? Do I need to edit it to include my database and password, or does it find that info in LocalSettings.php? How should it be run and how often? Are any other files required?

Like many others, my wiki was insufficiently protected and while I've implemented several of these techniques today, there is a lot of spam I'd like to clean out. --

Yoga Outfits

random or editable Special:UserLogin link
just an idea (maybe hard to implement): if  could be renamed, bots would have troubles to generate useraccounts automatically, as they always access   directly.

this would not create problems for the user, as the user simply clicks on the "Log in / create account" link.

-- Nikiwaibel 11:23, 15 February 2012 (UTC)
 * That would only work for a short time, it would be trivial to load the main page and parse it for the link to the registration page. PhilHibbs (talk) 22:56, 8 May 2012 (UTC)


 * Barriers with trivial workarounds can still be very effective, especially on smaller wikis. Most spambot operators are not going to fix their link for a single wiki. They took long enough to adapt even to MediaWiki markup.
 * It would be a bit of a hassle for operators of good bots such as Pywikipediabot, but the tradeoff is probably worth it. --Chriswaterguy (talk) 02:59, 9 May 2012 (UTC)

Recommendations
I thank Dcoetzee for his recent work on this page. I do, however, think that this diff and this diff remove a little bit of information that is pretty useful:
 * Recommendation of QuestyCaptcha specifically; admins universally recommend it far more than reCaptcha
 * a guide to using reCaptcha with MediaWiki (is it obsolete or otherwise non-useful?)
 * The link to Extension:SpamRegex
Thoughts? Sumana Harihareswara, Wikimedia Foundation Volunteer Development Coordinator (talk) 16:04, 11 April 2012 (UTC)
 * Hi Sumana, sorry for the slow response. ReCAPTCHA should be used through Extension:ConfirmEdit now, so yes that page is obsolete. QuestyCaptcha is, frankly, a terrible CAPTCHA - far inferior to readily available alternatives like ReCAPTCHA and Asirra which are based on peer-reviewed, published results. The problem with QuestyCAPTCHA is that it's easy to quickly construct a database of all the question/responses used by a site by probing the registration system in a fully automated manner, and it's only a matter of time before spammers figure this out. (It is probably true that some spammers have automated attacks on ReCAPTCHA, because it's so widely deployed, but such attacks are much more resource intensive, and I believe Asirra is still unbroken). I've re-added the link to Extension:SpamRegex, although my opinion is that if you're adding an extension anyway, you probably actually want Extension:SpamBlacklist. Dcoetzee (talk) 03:30, 16 April 2012 (UTC)