Extension talk:SpamBlacklist/From Meta-Wiki

From MediaWiki.org
The following discussion has been transferred from Meta-Wiki.
Any user names refer to users of that site, who are not necessarily users of MediaWiki.org (even if they share the same username).

Unlogged editors of this page

The following users (in chronological order) contributed content to the accompanying page. For technical reasons, their edits are not logged in the edit history.

User: Contributions
Nealmcb: created page summarizing the extension and download procedure.
Rick DeNatale: added information about using the chongqed.org blacklist.
Morbus Iff: created a patch for MediaWiki 1.5 compatibility.
Emj: categorized.
Halz: added information about MediaWiki-default blacklisting features, updated information about chongqed.org blacklist.
JoeChongq: minor corrections, removed 1.5 compatibility patch.
Angela: added information about Wikia's blacklist.
Sysy: reorganized.
Neurophyre: linked to documentation on MediaWiki anti-spam features.
Pathoschild: reformatted and reorganized.

In addition, the following users made minor changes or corrections: anonymous editors, Jwalling, Lcarsdata, Naconkantari, Silsor, Spacebirdy, and Thomas Klein.

{admin} Pathoschild 00:12, 25 January 2007 (UTC)

Not working

I put the SpamBlacklist PHP files in /extensions/ and added the following:

require_once( "$IP/extensions/SpamBlacklist.php" );
$wgSpamBlacklistFiles = array(
	"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
	"http://blacklist.chongqed.org/mediawiki/", // chongqed's list
	"DB: codecode_wiki Blacklist", // local list
);

But it doesn't work: I tested adding links from those lists to my wiki, and it still lets me save them. Can anyone help?

Instead, only do this:

require_once( "$IP/extensions/SpamBlacklist.php" );

And everything will work. Apparently that array is broken somehow. Nobody can give a proper explanation of how to make it work. -- Sy / (talk) 23:23, 16 July 2006 (UTC)

Still not working for me...

This is frustrating. For me, the blacklist doesn't work, array or no array. All I have is this line in my LocalSettings.php:
require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );
Should it be in $IP/extensions instead of the $IP/extensions/SpamBlacklist directory? Hmm... --Sam Odio 15:31, 29 September 2006 (UTC)
Nope... moved everything out of the SpamBlacklist directory and still no go. All the files are in the extensions directory and I have the following at the bottom of LocalSettings.php:
require_once( "$IP/extensions/SpamBlacklist.php" );
Ideas anyone? Is there any way to troubleshoot whether the extension is even installed? --Sam Odio 02:51, 30 September 2006 (UTC)

Yeah I'm having the same problem! It's a great extension...if I can get it to work. Ed Jan 29, 2008
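On the question of whether the extension is even installed: one way to check from outside is to ask the wiki which extensions it has loaded. This is a minimal sketch, assuming a MediaWiki version whose api.php supports `action=query&meta=siteinfo&siprop=extensions` (a much newer release than the ones discussed on this page); the API URL you pass in is your own and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def extension_names(siteinfo_json: str) -> set[str]:
    """Return the extension names listed in a siteinfo API response."""
    data = json.loads(siteinfo_json)
    extensions = data.get("query", {}).get("extensions", [])
    return {ext.get("name", "") for ext in extensions}


def has_extension(api_url: str, name: str = "SpamBlacklist") -> bool:
    """Query api.php (assumed to support siprop=extensions) for loaded extensions."""
    params = urlencode({
        "action": "query",
        "meta": "siteinfo",
        "siprop": "extensions",
        "format": "json",
    })
    with urlopen(f"{api_url}?{params}") as resp:
        return name in extension_names(resp.read().decode("utf-8"))
```

Usage would be something like `has_extension("https://wiki.example.org/w/api.php")`, where the URL is a placeholder for your own wiki.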


This extension and this documentation were written by Tim Starling and are ambiguously licensed.

QUOTE: Please note that all contributions to Meta are considered to be released under the GNU Free Documentation License (see Meta:Copyrights for details). If you don't want your writing to be edited mercilessly and redistributed at will, then don't submit it here.

-- Sy / (talk) 23:21, 16 July 2006 (UTC)

Source from a local file

If you are hosting it locally, then you need to change:

"$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list

to something like:

"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1" // Wikimedia's list

Why? $IP is already set as a variable.

This doesn't work anyway. -- Sy / (talk) 23:32, 16 July 2006 (UTC)

Still does not work

If I follow the documentation line by line, SpamBlacklist still does not work. I have installed plenty of extensions on my site successfully; something isn't right here.

-- 20:18, 1 February 2006 (UTC)

If you are trying to use $wgSpamBlacklistFiles to define which blacklists to use, you can leave that part out. The extension automatically downloads the list from here. If I remember right, it did not seem to work right away; I guess it needed time to download the blacklist. I have no idea, I just know it suddenly worked. After I knew it worked at least with that one, I was able to get it to use a local copy of the chongqed.org blacklist, but that list is too large for my server and causes the extension to stop working. When defining another blacklist such as chongqed.org's, I think you also have to list the MediaWiki blacklist (as in the example) if you want to continue using it. I have yet to figure out how to use the DB blacklist option. My suggestion is to start simple without defining $wgSpamBlacklistFiles; if you get stuck after that, at least you have the default blacklist protecting your wiki.

--JoeChongq 01:33, 3 February 2006 (UTC)

The reason that $wgSpamBlacklistFiles doesn't work straight away is that you probably need to change the local file reference to a URL, i.e. change:

"$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list

to something like:

"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1" // Wikimedia's list

Having done that it has worked like a dream for me. -- 11:42, 11 March 2006 (UTC)

That does not work for me. It parses my database blacklist but fails to use the URL exactly as you pasted above (and from the documentation article). I know it's not even loading it properly because when I use cleanup.php, the regex size is only 53 bytes -- my database blacklist is about this size. -- 19:46, 22 March 2006 (UTC)
Tried that; it did work, but Wikimedia's list does sometimes blacklist sites for political reasons (wikipedia-watch dot org, kapitalism dot net and the like). Wikia's list is worse still: at least one site listed there appears solely because it's a direct competitor to the Wikia-hosted Nonsensopedia. So instead of linking directly to these lists, it is best to copy the list to a page on one of your own sites, remove anything that's been blacklisted for purely commercial or political reasons, and protect the page.
SpamBlacklist is a useful and valuable tool but, like many powerful tools, it does need to be used with some basic level of precaution. -- 20:18, 13 January 2007 (UTC)
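The copy-and-prune approach described above can be sketched in Python. This assumes the usual blacklist page format (one regex fragment per line, with `#` starting a comment); the set of entries to drop is whatever you decide to remove, and the function name is hypothetical.

```python
def prune_blacklist(text: str, unwanted: set[str]) -> str:
    """Drop blacklist lines whose regex fragment is in `unwanted`.

    Comments (after '#') and blank lines are kept, so the pruned copy
    stays readable when pasted onto a protected local page.
    """
    kept = []
    for line in text.splitlines():
        # The fragment is whatever precedes a '#' comment, minus whitespace
        fragment = line.split("#", 1)[0].strip()
        if fragment and fragment in unwanted:
            continue  # skip an entry you chose not to inherit
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Matching on the exact fragment text, rather than on a domain name, keeps the function from accidentally removing unrelated entries that merely share a substring.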

Solution to local spam file

OK - I found a workaround for hosting a local file.

I added this to LocalSettings.php:

$wgSpamBlacklistFiles = array(
"http://myserver.com/somewhere/spamlist.php", // Local Spamlist
"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
);


Here, myserver.com stands for the hard-coded server name in the URL of the spamlist.php file.

I had to leave out all comments in the file and keep it to just a list of URLs (without http:), but it works.

I've had to put this file outside our wiki directory structure (i.e. not under the normal wiki), because otherwise our wiki tries to make a new page called spamlist.php, but it all works.

I couldn't get the DB part to work, but this works for us. Of course you'll need admin access to the server, or at least write access to the directory where you put the file.

The Puppeteer.
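Producing a file like the one described above (no comments, one entry per line, no URL scheme) can be automated. A minimal sketch, assuming the input is a text list where `#` starts a comment and entries may carry an `http://` or `https://` prefix; the function name is hypothetical.

```python
import re

# Matches a leading "http://" or "https://" (case-insensitive)
SCHEME = re.compile(r"^https?://", re.IGNORECASE)


def to_plain_list(text: str) -> str:
    """Strip comments, blank lines and URL schemes, leaving one entry per line."""
    entries = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue  # comment-only or blank line
        entries.append(SCHEME.sub("", entry))
    return "\n".join(entries) + "\n"
```

The result can then be written to the file your web server serves, for example with `open("spamlist.php", "w").write(to_plain_list(source_text))`.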

Source from a DB page

I ran into an odd situation. Using a spam page in my database worked fine, until the second time I wanted to save that page. Then I got an error, because the page I was saving contained an unacceptable link that is defined on that very same spam page. Is there a way to prevent the extension from checking its own SpamBlacklist page? See: http://www.kgv.nl/wiki/index.php?title=KGV:SpamBlacklist. Ruud

I removed the Wiki spam lines and now it works (for me). Ruud
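The exemption Ruud asks about amounts to skipping the check when the page being saved is the blacklist page itself. A minimal sketch of that logic, not the extension's actual code; the exempt-title set, the function name, and treating each blacklist line as a regex matched against the link's host are all assumptions.

```python
import re
from urllib.parse import urlsplit

# Hypothetical: titles of local blacklist pages that should never be filtered
EXEMPT_TITLES = {"KGV:SpamBlacklist"}


def blocked_links(title: str, links: list[str], fragments: list[str]) -> list[str]:
    """Return links whose host matches a blacklist fragment, unless the page is exempt."""
    if title in EXEMPT_TITLES:
        return []  # editing the blacklist itself: allow every link
    patterns = [re.compile(f, re.IGNORECASE) for f in fragments]
    hits = []
    for url in links:
        host = urlsplit(url).hostname or ""
        if any(p.search(host) for p in patterns):
            hits.append(url)
    return hits
```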

Merged discussion

The following discussion is merged from m:SpamBlacklist extension.

load_lists bug





regular expression too large

If you use chongqed's blacklist, you probably get the following warning when saving pages:

Warning: preg_match(): Compilation failed: regular expression too large at offset 0 in ...

SpamBlacklist uses preg_match, which has a limited pattern size. For more information and a patch to get around this, see bug #3632.

When using this patch, the blacklist converter needs to be adjusted. It needs to remove any trace of regular expressions. Here's a Python script which downloads and converts the blacklist:


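# Python 3: download chongqed's blacklist and strip the regex syntax from it
from urllib.request import urlopen
import sys

blacklist = urlopen('http://blacklist.chongqed.org/').read().decode('utf-8')
# Remove the shared "http(s)://(subdomain.)" prefix, then any leftover backslashes
blacklist = blacklist.replace(r'https?:\/\/([^\/]*\.)?', '').replace('\\', '')

badsigns = r'|\*$^[]()'

for b in badsigns:
    if b in blacklist:
        print('black list contains regex')
        sys.exit(1)  # don't write a list that still contains regex syntax

with open('spam_blacklist', 'w') as f:
    f.write(blacklist)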

Even with this script I am still getting the same "regular expression too large at offset 0" warning. But if I reduce the blacklist to about 1/5 of its size, the warning goes away and the blacklist (what is left of it) works fine. It may have to do with my server; maybe the memory I can use is limited. Any ideas? For those of you it does work for, you don't need to use a conversion script anymore (hopefully): we now have a MediaWiki version of our blacklist. The only difference between what this script produces and ours is that the periods are escaped in ours. The extension appears to have no trouble with the escaped periods, though; let us know if it would be better without them. -- joe(at)chongqed.org 04:31, 16 January 2006 (UTC)

I had this issue too, but Brion fixed it with the latest version of the extension today. See here for more details. --Tderouin 20:46, 18 September 2006 (UTC)
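The general idea behind working around the "regular expression too large" limit is to split one huge alternation into several smaller regexes and test each in turn. The sketch below illustrates that idea in Python; it is not the patch from bug #3632 or Brion's fix, and the 4096-character batch limit is an arbitrary assumption (the real limit depends on the PCRE build).

```python
import re


def build_batches(fragments: list[str], max_len: int = 4096) -> list[re.Pattern]:
    """Group fragments so each joined alternation is at most max_len characters."""
    batches, current, length = [], [], 0
    for frag in fragments:
        extra = len(frag) + (1 if current else 0)  # +1 for the "|" separator
        if current and length + extra > max_len:
            batches.append(current)  # close the full batch, start a new one
            current, length = [], 0
            extra = len(frag)
        current.append(frag)
        length += extra
    if current:
        batches.append(current)
    return [re.compile("(?:" + "|".join(b) + ")", re.IGNORECASE) for b in batches]


def is_blacklisted(text: str, batches: list[re.Pattern]) -> bool:
    """True if any batch matches; no single regex exceeds the size limit."""
    return any(b.search(text) for b in batches)
```

Smaller batches mean more `search` calls per edit, so the batch size trades compile-size safety against matching speed.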


For my wiki, external URLs are rarely used; when they are, it is to one of three or four sites. Is it possible to set this up so it follows a whitelist syntax instead of a blacklist syntax (or both)? Ideally, I'd like to say "any URLs are bad, except any containing these sites:"

[1] says: Add a local whitelist, editable by admins at MediaWiki:Spam-whitelist -- Sy / (talk)
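A strict "reject every URL except a few sites" policy is not what SpamBlacklist does (its whitelist only exempts URLs that the blacklist would otherwise block), but the check itself is simple. A hypothetical sketch in Python, with placeholder domain names and function names:

```python
from urllib.parse import urlsplit

# Placeholder domains: the only sites external links may point to
ALLOWED = {"example.org", "example.net"}


def is_allowed(url: str, allowed: set[str] = ALLOWED) -> bool:
    """Accept a URL only if its host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def disallowed_links(links: list[str]) -> list[str]:
    """Return the links an edit would be rejected for under a whitelist-only policy."""
    return [u for u in links if not is_allowed(u)]
```

Comparing against `"." + d` rather than a bare suffix keeps a look-alike host such as `badexample.org` from passing as `example.org`.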

captcha to prevent spam?

How about a CAPTCHA extension that requires verification for every edit?

That would be another extension. It exists. -- Sy / (talk) 23:16, 16 July 2006 (UTC)

Using mcc.php

I can't find any information or a README on how to use mcc.php to delete the spam_blacklist_regex key. Any ideas?

Looking at the mcc.php code I've noticed there's a command for "delete" but I have no idea how to use it. I've gotten this far:

#php4 mcc.php
> delete spam_blacklist_regex 
MemCached error

Thanks --SamOdio 16:19, 22 January 2006 (UTC)
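If mcc.php keeps failing, the key can also be deleted by speaking memcached's text protocol directly: the server answers `delete <key>` with `DELETED` or `NOT_FOUND`. A minimal sketch, assuming memcached listens on 127.0.0.1:11211; MediaWiki may prefix its cache keys (for example with the database name), so the exact key to pass is an assumption you should verify. The function names are hypothetical.

```python
import socket


def delete_command(key: str) -> bytes:
    """Build a memcached text-protocol delete command."""
    if not key or len(key) > 250 or any(c.isspace() for c in key):
        raise ValueError("invalid memcached key")
    return f"delete {key}\r\n".encode("ascii")


def parse_delete_reply(reply: bytes) -> bool:
    """True if deleted, False if the key did not exist; raise on anything else."""
    line = reply.strip()
    if line == b"DELETED":
        return True
    if line == b"NOT_FOUND":
        return False
    raise RuntimeError(f"unexpected memcached reply: {line!r}")


def delete_key(key: str, host: str = "127.0.0.1", port: int = 11211) -> bool:
    """Send one delete command and report whether the key was removed."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(delete_command(key))
        return parse_delete_reply(sock.recv(1024))
```

If `delete_key` raises a connection error, memcached is not reachable at that address, which would be worth checking before looking at mcc.php itself.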

I tested the extension with a spam URL and it worked, although it threw warnings:

Warning: gethttp(): Failed opening 'HttpFunctions.php' for inclusion (include_path='.:/home/karavsh/public_html/wiki:/home/karavsh/public_html/wiki/includes:/home/karavsh/public_html/wiki/languages') in /home/karavsh/public_html/wiki/extensions/SpamBlacklist/SpamBlacklist_body.php on line 196

Cron script?

The article says "The extension includes a cron script that can check for automatic updates from a shared blacklist." Where is it? --CygnusTM 18:03, 9 March 2006 (UTC)

That must be the cleanup.php file. It can only be run from the command line and needs the LocalSettings.php in your source directory. I can't confirm if it actually did anything but the following command ran ok from my wiki directory: php extensions/SpamBlacklist/cleanup.php -- 10:25, 11 March 2006 (UTC)

the cleanup.php script works for me as well. -- Sy / (talk) 23:35, 16 July 2006 (UTC)

cleanup.php works, but running it as a cron job might be dangerous as it can blank pages. -- 17:08, 8 September 2006 (UTC)

Error: SpamBlacklist messes up Page Edit feature

Using a fresh install of MediaWiki 1.5.7, after adding the appropriate lines to LocalSettings.php and making sure the syntax and directories are correct, I get an error any time I try to save an edit to a page. I have replaced my domain with www.mysite.com:

An error occurred while loading http://www.mysite.com/index.php?title=Library&action=submit:

Connection to host www.mysite.com is broken.


I cannot download the .php files with Opera or Firefox; the site takes forever to load.

It works for me. Try viewing the files and then manually saving them. -- Sy / (talk) 23:36, 16 July 2006 (UTC)

Auto blocking spammers

Is it possible to set this script to automatically block a spammer's user account or IP, indefinitely or for, say, 2 hours, after SpamBlacklist detects a spam URL? A spammer could try to spam, be denied, get annoyed, and do some vandalism. Thanks -- 21:05, 22 April 2006 (UTC)

Not with this extension at this time. =/ -- Sy / (talk) 23:24, 16 July 2006 (UTC)
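Since the extension does not do this, the sketch below only illustrates the bookkeeping such a feature would need: a hypothetical in-memory table of temporary blocks keyed by user name or IP, with the two-hour duration suggested above. A real implementation would presumably use MediaWiki's own blocking mechanism instead; the class and method names are made up for illustration.

```python
import time


class TempBlocks:
    """Hypothetical in-memory table of temporary blocks after spam hits."""

    def __init__(self, duration: float = 2 * 60 * 60, clock=time.time):
        self.duration = duration  # block length in seconds (two hours by default)
        self.clock = clock  # injectable clock, handy for testing
        self.until: dict[str, float] = {}

    def record_spam_hit(self, who: str) -> None:
        """Block a user name or IP address for `duration` seconds from now."""
        self.until[who] = self.clock() + self.duration

    def is_blocked(self, who: str) -> bool:
        """True while the block is active; expired entries are cleaned up."""
        expiry = self.until.get(who)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.until[who]
            return False
        return True
```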


I've rewritten this page and have gutted this talk page in an attempt to make the lives of future users a little easier. -- Sy / (talk) 23:47, 16 July 2006 (UTC)

The SpamBlacklist extension destroys user edits

The extension will destroy user edits when it comes across a spam link. Scenario:

  1. Edit a page.
  2. Add some text.
  3. Save.
  4. See warning message.
  5. Scratch head, since I did not add any links.
  6. Go back.
  7. Observe that all my changes have been destroyed by my browser. Yes, some contemporary browsers still do this.
  8. Throw hands up in disgust.
  9. Leave wiki without contributing.

Step 4 should display a before/after just like an edit collision, and give an opportunity for the user to edit their text. -- Sy / (talk) 23:52, 16 July 2006 (UTC)

Erase it for me ...

Currently, the spam blacklist extension asks the next user who edits a page to remove an "unwanted" link that is already there. Is that really what you want?

I'd like to ask whether it's possible to change this so that only new links are stopped, or, more simply, to remove the existing link before running the extension. --Smily 19:00, 20 August 2006 (UTC) Sorry for my English, it's not my native language.
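Blocking only links that the current edit introduces, while leaving links already on the page alone, can be sketched as a set difference between the links before and after the edit. This is a hypothetical illustration, not the extension's code; the simple URL regex is an assumption and will miss some link forms.

```python
import re

# Rough external-link matcher (assumption: good enough for illustration)
URL_RE = re.compile(r'https?://[^\s<>"\[\]|]+', re.IGNORECASE)


def added_links(old_text: str, new_text: str) -> set[str]:
    """Links present in the new revision but not in the old one."""
    return set(URL_RE.findall(new_text)) - set(URL_RE.findall(old_text))


def links_to_reject(old_text: str, new_text: str, blacklist: list[str]) -> list[str]:
    """Check only newly added links against blacklist regex fragments."""
    patterns = [re.compile(f, re.IGNORECASE) for f in blacklist]
    return sorted(u for u in added_links(old_text, new_text)
                  if any(p.search(u) for p in patterns))
```

With this approach, an editor fixing a typo on a page that already contains a blacklisted link is not asked to clean up someone else's spam.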

SpamBlacklist shouldn't protect the blacklist page

It's really annoying that the spam blacklist talk page is protected by the filter! There should be an exception for that, since it prevents people from discussing items which are already in the list without obfuscating the links...

List the *&^#$% links it doesn't like

I'm just trying to remove the {{sprotected}} tag from a page and I'm getting the SpamFilter message. Thing is, this page has about 100 URLs since it is well referenced. I found and removed two links to tinyurl, which weren't spam but did need to be changed. It still won't save. WHY doesn't it list the offending URLs on the page in the SpamFilter message? It's asinine to make me try to figure out what it doesn't like. --Doug Bell talkcontrib 18:19, 4 December 2006 (UTC)
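Reporting which links triggered the filter only requires remembering which URL matched which entry. A minimal sketch, assuming blacklist entries are regex fragments matched against each URL; the function names are hypothetical and this is not how the extension builds its message.

```python
import re


def offending_links(links: list[str], fragments: list[str]) -> list[tuple[str, str]]:
    """Pair each blocked URL with the first blacklist fragment it matched."""
    compiled = [(f, re.compile(f, re.IGNORECASE)) for f in fragments]
    report = []
    for url in links:
        for frag, pattern in compiled:
            if pattern.search(url):
                report.append((url, frag))
                break  # one matching entry is enough to explain the rejection
    return report


def format_report(report: list[tuple[str, str]]) -> str:
    """Render the matches as lines a warning message could show to the editor."""
    return "\n".join(f"{url} (matched {frag})" for url, frag in report)
```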

Not working anymore in PHP 5.2

PHP Warning: preg_match() [<a href='function.preg-match'>function.preg-match</a>]: Compilation failed: repeated subpattern is too long at offset 20022 in /var/www/extensions/SpamBlacklist/SpamBlacklist_body.php on line 210

There is a change to the PCRE library that limits the size of a subpattern (MAX_PATTERN_SIZE).

Not working

I get "Fatal error: Call to undefined function: getexternallinks() in /server/public_html/wiki/extensions/SpamBlacklist/SpamBlacklist_body.php on line 194" when I make an edit. 20:20, 4 May 2007 (UTC) Edit: never mind, it was because my wiki wasn't up to date. 23:35, 8 May 2007 (UTC)

Getting errors

Does anyone know why I would be getting these errors? "Cannot modify header information - headers already sent by (output started at htdocs/sbi/wiki/extenstion/spamblacklist/spamblacklist.php:47) in htdocs/sbi/wiki/includes/outputpage.php on line 576" and "Warning: preg_match() [function.preg-match]: Compilation failed: repeated subpattern is too long at offset 20020 in htdocs/sbi/wiki/extensions/SpamBlacklist/SpamBlacklist_body.php on line 210" -- 00:12, 12 May 2007 (UTC)