Extension talk:SpamBlacklist/From MediaWiki

Unlogged editors of this page
The following users (in chronological order) contributed content to the accompanying page. For technical reasons, their edits are not logged in the edit history.

In addition, the following users made minor changes or corrections: anonymous editors, Jwalling, Lcarsdata, Naconkantari, Silsor, Spacebirdy, and Thomas Klein.

— {admin} Pathoschild 00:12, 25 January 2007 (UTC)

Not working
I put the SpamBlacklist PHP files in /extensions/ and added this:

require_once( "$IP/extensions/SpamBlacklist.php" );
$wgSpamBlacklistFiles = array(
	"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
	"http://blacklist.chongqed.org/mediawiki/", // chongqed's list
	"DB: codecode_wiki Blacklist", // local list
);

But it doesn't work. I tested adding those links to my wiki and it lets me. Can anyone help?

Yes, only do this:

require_once( "$IP/extensions/SpamBlacklist.php" );

And everything will work. Apparently that array is broken somehow. Nobody can give a proper explanation of how to make it work. -- Sy / (talk) 23:23, 16 July 2006 (UTC)

Still not working for me...

 * This is frustrating. For me, the black list doesn't work - array or no array.  All I have is this line in my LocalSettings.php:

require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );
 * Should it be in $IP/extensions instead of the $IP/extensions/SpamBlacklist directory? Hmm... --Sam Odio 15:31, 29 September 2006 (UTC)
 * Nope... moved everything out of the SpamBlacklist directory and still no go. All the files are in the extensions directory and I have the following at the bottom of LocalSettings.php:

require_once( "$IP/extensions/SpamBlacklist.php" );
 * Ideas anyone? Is there any way to troubleshoot whether the extension is even installed?  --Sam Odio 02:51, 30 September 2006 (UTC)

Yeah I'm having the same problem! It's a great extension...if I can get it to work. Ed Jan 29, 2008

Copyright

 * This extension and this documentation was written by Tim Starling and is ambiguously licensed.

QUOTE: Please note that all contributions to Meta are considered to be released under the GNU Free Documentation License (see Copyrights for details). If you don't want your writing to be edited mercilessly and redistributed at will, then don't submit it here.

-- Sy / (talk) 23:21, 16 July 2006 (UTC)

Source from a local file

 * /If I specify more than one file -- Other people are having similar issues.

if you are hosting it locally then you need to change:

"$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list

to something like:

"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1" // Wikimedia's list

Why? $IP is already set as a variable.

This doesn't work anyway. -- Sy / (talk) 23:32, 16 July 2006 (UTC)

Still does not work
If I follow the documentation line by line, SpamBlacklist still does not work. I have installed plenty of extensions on my site successfully, so something isn't right here.
 * --168.98.32.51 20:18, 1 February 2006 (UTC)

If you are trying to use wgSpamBlacklistFiles to define which blacklists to use, you can leave that part out. The extension automatically downloads the list from here. If I remember right, it did not seem to work right away. I guess it needed time to download the blacklist. I have no idea, I just know it suddenly worked. After I knew it worked at least with that one, I was able to get it to use a local copy of the chongqed.org blacklist, but it is too large for my server and causes the extension to not work. When defining another blacklist such as chongqed.org's I think you also have to list the mediawiki blacklist (as in the example) if you want to continue using it. I have yet to figure out how to use the DB blacklist option. My suggestion is just to start simple without defining the wgSpamBlacklistFiles and if you get stuck after that, at least you have the default blacklist protecting your wiki.
 * --JoeChongq 01:33, 3 February 2006 (UTC)

The reason that wgSpamBlacklistFiles doesn't work straight away is that you probably need to change the local file reference to a URL, i.e :

"$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list

to something like:

"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1" // Wikimedia's list

Having done that it has worked like a dream for me. -- 80.176.79.189 11:42, 11 March 2006 (UTC)


 * That does not work for me. It parses my database blacklist but fails to use the URL exactly as you pasted above (and from the documentation article).  I know it's not even loading it properly because when I use cleanup.php, the regex size is only 53 bytes -- my database blacklist is about this size. --71.195.183.211 19:46, 22 March 2006 (UTC)


 * Tried that, it did work but taking Wikimedia's list does sometimes blacklist sites for political reasons (wikipedia-watch dot org, kapitalism dot net and the like). Wikia's list is worse still - at least one site listed there appears solely because it's a direct competitor to the Wikia-hosted Nonsensopedia. As such, instead of linking directly to these, it is best to copy the list to a protected page on one of your own sites, remove anything that's been blacklisted for purely commercial or political reasons, and protect the page.


 * SpamBlacklist is a useful and valuable tool but, like many powerful tools, it does need to be used with some basic level of precaution. --66.102.74.160 20:18, 13 January 2007 (UTC)
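The curation step suggested above (copy the list, then remove entries you do not want to enforce) could be scripted before pasting the result into your protected page. This is only a minimal sketch in Python: the function name `curate_blacklist` is hypothetical, and it assumes the list format where each non-blank line is a regex fragment and everything after a "#" is a comment.

```python
def curate_blacklist(raw_text, unwanted_entries):
    """Return raw_text without the lines whose pattern is in unwanted_entries.

    Assumption: one regex fragment per line, '#' starts a comment.
    """
    unwanted = {entry.strip() for entry in unwanted_entries}
    kept = []
    for line in raw_text.splitlines():
        # The pattern is whatever precedes an optional '#' comment.
        pattern = line.split('#', 1)[0].strip()
        if pattern and pattern in unwanted:
            continue  # drop an entry you have decided not to enforce
        kept.append(line)  # comment-only and blank lines are kept as-is
    return '\n'.join(kept)
```

You would pass in the raw text of the downloaded list plus the exact fragments you disagree with, and review the output by hand before protecting the page.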

Solution to local spam file
OK - I found a workaround for hosting a local file.

I added this to LocalSettings.php:

$wgSpamBlacklistFiles = array(
	"http://myserver.com/somewhere/spamlist.php", // Local Spamlist
	"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
);

Where myserver.com is a hard coded URL to the spamlist.php file.

I had to leave out all comments in the file and use just a list of URLs (no http:), but it works.

I've had to put this file outside of our wiki directory structure (ie not under the normal wiki), because otherwise our wiki tries to make a new page called spamlist.php, but it all works.

I couldn't get the DB part to work, but this works for us. Of course you'll need admin access to the server, or at least write access to the directory where you put the file.

The Puppeteer.

Source from a DB page
I ran into an odd situation. Using a spam page in my database worked fine, until the second time I wanted to save the page. Then I got an error, because I was saving a page that contained an unacceptable link which is defined on that very same spam page. Is there a way to prevent the extension from checking its own SpamBlackListPage? See: http://www.kgv.nl/wiki/index.php?title=KGV:SpamBlacklist. Ruud
 * I removed the Wiki spam lines and now it works (for me). Ruud

Merged discussion
The following discussion is merged from SpamBlacklist extension.

load_lists bug
change http://meta.wikimedia.org/wiki/Spam_blacklist?action=raw to http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw

regular expression too large
If you use chongqed's blacklist, you probably get the following warning when saving pages:

Warning: preg_match: Compilation failed: regular expression too large at offset 0 in ...

SpamBlacklist uses preg_match, which has a limited pattern size. For more information and a patch to get around this, see bug #3632.

When using this patch, the blacklist converter needs to be adjusted. It needs to remove any trace of regular expressions. Here's a Python script which downloads and converts the blacklist:

#!/usr/bin/python
# Download chongqed's blacklist and strip the regex syntax from it.
from urllib import urlopen
from sys import exit

blacklist = urlopen('http://blacklist.chongqed.org/').read()
# Remove the leading URL pattern and all backslash escapes.
blacklist = blacklist.replace('https?:\/\/([^\/]*\.)?', '').replace('\\', '')
# Abort if any regex metacharacter is left over.
badsigns = r'|\*$^[]'
for b in badsigns:
    if blacklist.find(b) >= 0:
        print 'black list contains regex'
        exit(1)
open('spam_blacklist', 'w').write(blacklist)
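The script above is written for Python 2. On Python 3, `urlopen` lives in `urllib.request` and `print` is a function, so the same idea might look like the sketch below. The function names `strip_regex` and `download_and_convert` are made up for illustration, and the UTF-8 decoding is an assumption about the list's encoding.

```python
from urllib.request import urlopen

# Prefix that each entry is assumed to start with, as in the original script.
URL_PREFIX = r'https?:\/\/([^\/]*\.)?'
# Characters that would indicate leftover regex syntax.
REGEX_CHARS = r'|\*$^[]'


def strip_regex(blacklist):
    """Remove the URL prefix and backslash escapes; fail if regex syntax remains."""
    plain = blacklist.replace(URL_PREFIX, '').replace('\\', '')
    leftovers = sorted(set(ch for ch in REGEX_CHARS if ch in plain))
    if leftovers:
        raise ValueError('blacklist contains regex characters: %s' % ''.join(leftovers))
    return plain


def download_and_convert(url, out_path='spam_blacklist'):
    """Fetch the list from url, convert it, and write it to out_path."""
    text = urlopen(url).read().decode('utf-8')  # encoding is an assumption
    with open(out_path, 'w', encoding='utf-8') as handle:
        handle.write(strip_regex(text))
```

Keeping the conversion in its own function means it can be checked on a few sample lines without touching the network.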

Even with this script I am still getting the same "regular expression too large at offset 0". But if I reduce the size of the blacklist to about 1/5 its size the warning goes away and the blacklist (what is left of it) works fine. It may have to do with my server, maybe the memory I can use is limited. Any ideas? For those of you that it does work for, you don't need to use a conversion script anymore (hopefully). We now have a MediaWiki version of our blacklist. The only difference between what this script produces and ours is the periods are escaped on ours. The extension appears to have no trouble with the escaped periods though, let us know if it would be better without. -- joe(at)chongqed.org 04:31, 16 January 2006 (UTC)


 * I had this issue too, but Brion fixed it with the latest version of the extension today. See here for more details. --Tderouin 20:46, 18 September 2006 (UTC)
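Since cutting the list down made the warning go away, one general way around a pattern-size limit is to split the fragments into several smaller patterns and test each in turn. The sketch below illustrates that idea in Python; it is not taken from the extension, and whether the fixed version works this way is not confirmed here. The names `build_batches` and `matches_any` and the 4096-character default are assumptions.

```python
import re


def build_batches(fragments, max_length=4096):
    """Join fragments with '|' into several compiled patterns of bounded length."""
    batches, current, size = [], [], 0
    for frag in fragments:
        # The +1 accounts for the '|' separator.
        if current and size + len(frag) + 1 > max_length:
            batches.append(re.compile('|'.join(current)))
            current, size = [], 0
        current.append(frag)
        size += len(frag) + 1
    if current:
        batches.append(re.compile('|'.join(current)))
    return batches


def matches_any(batches, text):
    """True if any of the batch patterns matches somewhere in text."""
    return any(batch.search(text) for batch in batches)
```

The trade-off is several regex searches per save instead of one, in exchange for never handing the regex engine a single oversized pattern.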

Whitelist?
For my wiki, external URLs are rarely used; when they are, it is to one of three or four sites. Is it possible to set this up so it follows a whitelist syntax instead of a blacklist syntax (or both)? Ideally, I'd like to say "any URLs are bad, except any containing these sites:"

says : Add a local whitelist, editable by admins at MediaWiki:Spam-whitelist -- Sy / (talk)
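The policy asked for in the question ("any URLs are bad, except any containing these sites") can be sketched as follows. This is a Python illustration of the logic only, not how the extension or MediaWiki:Spam-whitelist is implemented; `rejected_urls`, `host_allowed` and the simple URL pattern are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern for external links in page text (an assumption).
URL_RE = re.compile(r'https?://[^\s\]<>"]+', re.IGNORECASE)


def host_allowed(host, allowed_hosts):
    """True if host equals an allowed host or is a subdomain of one."""
    host = host.lower()
    return any(host == a or host.endswith('.' + a) for a in allowed_hosts)


def rejected_urls(text, allowed_hosts):
    """Return every URL in text whose host is not on the allow list."""
    allowed = [a.lower() for a in allowed_hosts]
    rejected = []
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname or ''
        if not host_allowed(host, allowed):
            rejected.append(url)
    return rejected
```

An edit would then be refused whenever `rejected_urls` returns a non-empty list, which is the inverse of the blacklist approach.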

captcha to prevent spam?
How about a captcha extension with a verification for every edit?

That would be another extension. It exists. -- Sy / (talk) 23:16, 16 July 2006 (UTC)

Using mcc.php
I can't find any information / or a readme on how to use mcc.php to delete the spam_blacklist_regex key. Any ideas?

Looking at the mcc.php code I've noticed there's a command for "delete" but I have no idea how to use it. I've gotten this far:

# php4 mcc.php
> delete spam_blacklist_regex
MemCached error
>

Thanks --SamOdio 16:19, 22 January 2006 (UTC)

Tested the extension w/ a spam url and it worked, although threw warnings
Warning: gethttp: Failed opening 'HttpFunctions.php' for inclusion (include_path='.:/home/karavsh/public_html/wiki:/home/karavsh/public_html/wiki/includes:/home/karavsh/public_html/wiki/languages') in /home/karavsh/public_html/wiki/extensions/SpamBlacklist/SpamBlacklist_body.php on line 196

Cron script?
The article says "The extension includes a cron script that can check for automatic updates from a shared blacklist." Where is it? --CygnusTM 18:03, 9 March 2006 (UTC)

That must be the cleanup.php file. It can only be run from the command line and needs the LocalSettings.php in your source directory. I can't confirm if it actually did anything but the following command ran ok from my wiki directory: php extensions/SpamBlacklist/cleanup.php  --  80.176.79.189 10:25, 11 March 2006 (UTC)

the cleanup.php script works for me as well. -- Sy / (talk) 23:35, 16 July 2006 (UTC)


 * cleanup.php works, but running it as a cron job might be dangerous as it can blank pages. --71.232.101.231 17:08, 8 September 2006 (UTC)

Error: Spamblacklist messes up Page Edit feature
Using a fresh install of Mediawiki 1.5.7, after adding the appropriate lines to LocalSettings.php and making sure the syntax and directories are correct, I get an error anytime I try to save an edit to a page. I have replaced my domain with www.mysite.com:

An error occurred while loading http://www.mysite.com/index.php?title=Library&action=submit:

Connection to host www.mysite.com is broken.

download
I cannot download the .php files via the Opera or Firefox browser. The site takes forever to load.

It works for me. Try viewing the files and then manually saving them. -- Sy / (talk) 23:36, 16 July 2006 (UTC)

Auto blocking spammers
Is it possible to set this script to autoblock spammers' user accounts or IPs, indefinitely or for, say, 2 hours, after SpamBlacklist detects a spam URL? A spammer could try to spam, be denied, get annoyed and do some vandalism. Thanks :--86.138.109.70 21:05, 22 April 2006 (UTC)

Not with this extension at this time. =/ -- Sy / (talk) 23:24, 16 July 2006 (UTC)

rewrite
I've rewritten this page and have gutted this talk page in an attempt to make the lives of future users a little easier. -- Sy / (talk) 23:47, 16 July 2006 (UTC)

The SpamBlacklist extension destroys user edits
The extension will destroy user edits when it comes across a spam link. Scenario:


 * 1) Edit a page.
 * 2) Add some text.
 * 3) Save.
 * 4) See warning message.
 * 5) Scratch head, since I did not add any links.
 * 6) Go back.
 * 7) Observe that all my changes have been destroyed by my browser.  Yes, some contemporary browsers still do this.
 * 8) Throw hands up in disgust.
 * 9) Leave wiki without contributing.

Step 4 should display a before/after just like an edit collision, and give an opportunity for the user to edit their text. -- Sy / (talk) 23:52, 16 July 2006 (UTC)

Erase it for me ...
For now, the spam blacklist extension asks the next user to erase the "unwanted" link. Is that really what you want to do?

I'm here to ask if it's possible to change this and block only new links, or more simply to erase the existing link before running the extension. --Smily 19:00, 20 August 2006 (UTC) Sorry for my English, it's not my native language.

SpamBlacklist shouldn't protect the blacklist page
It's really annoying that the spam blacklist talk page is protected by the filter! There should be an exception for that, since it prevents people from discussing items which are already in the list without obfuscating the links...

UserNames Whitelist needed
All small Wikis have a small group of usernames who edit the Wiki frequently. Filtering their edits is a waste of resources and a potential annoyance to frequent, established editors of the Wiki. Can a whitelist be somehow created which takes in "safe" names from a protected Mediawiki page and lets them bypass this security? I'm a sysop at a Wiki and got a spam block message, imagine that. So this whitelist is crucial in my opinion. People can request to be added to the whitelist as well.--Matt57 03:23, 28 October 2006 (UTC)

List the *&^#$% links it doesn't like
I'm just trying to remove the sprotected tag from a page and I'm getting the SpamFilter message. Thing is, this page has about 100 URLs since it is well referenced. I found and removed two links to tinyurl, which weren't spam but did need to be changed. It still won't save. WHY doesn't it list the offending URLs on the page in the SpamFilter message? It is asinine to make me try and figure out what it doesn't like. —Doug Bell talk•contrib 18:19, 4 December 2006 (UTC)
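What is being asked for here amounts to collecting every URL on the page that matches the blacklist and showing that list to the editor. A rough Python sketch of that idea follows; it is not the extension's code, and the way the fragments are wrapped into one pattern is an assumption made for illustration, as are the names `offending_urls` and `URL_RE`.

```python
import re

# Deliberately simple pattern for external links in page text (an assumption).
URL_RE = re.compile(r'https?://[^\s\]<>"]+', re.IGNORECASE)


def offending_urls(text, fragments):
    """Return the URLs in text that match any blacklist fragment."""
    # Assumed wrapper: scheme, optional subdomain characters, then a fragment.
    blacklist = re.compile(
        r'https?://[a-z0-9\-.]*(' + '|'.join(fragments) + ')',
        re.IGNORECASE,
    )
    return [url for url in URL_RE.findall(text) if blacklist.search(url)]
```

With such a list in hand, the rejection message could name the exact links instead of leaving the editor to hunt through a hundred references.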

Not working anymore in PHP 5.2
PHP Warning: preg_match [function.preg-match]: Compilation failed: repeated subpattern is too long at offset 20022 in /var/www/extensions/SpamBlacklist/SpamBlacklist_body.php on line 210

There is a change to the pcre library which limits (MAX_PATTERN_SIZE) the size of the subpattern.

Not working
I get "Fatal error: Call to undefined function: getexternallinks in /server/public_html/wiki/extensions/SpamBlacklist/SpamBlacklist_body.php on line 194" when I make an edit. 65.6.74.112 20:20, 4 May 2007 (UTC) Edit: n/m it was because my wiki wasn't up to date. 68.222.30.14 23:35, 8 May 2007 (UTC)

Getting errors
Does anyone know why I would be getting these errors?

Cannot modify header information - headers already sent by (output started at htdocs/sbi/wiki/extenstion/spamblacklist/spamblacklist.php:47) in htdocs/sbi/wiki/includes/outputpage.php on line 576

and

Warning: preg_match [function.preg-match]: Compilation failed: repeated subpattern is too long at offset 20020 in htdocs/sbi/wiki/extensions/SpamBlacklist/SpamBlacklist_body.php on line 210

--72.228.4.13 00:12, 12 May 2007 (UTC)

Conflicting with ConfirmEdit Extension
Hi, it seems this extension is conflicting with the ConfirmEdit Extension (link). On creation of a new page (where a captcha is thrown) i get the following error:

Warning: Missing argument 4 for wfspamblacklistvalidate in /home/www/web125/html/rockinchina/wiki/extensions/SpamBlackList/SpamBlacklist.php on line 67


 * MediaWiki: 1.6.8, PHP: 4.4.6

Cross posting to: Extension_talk:ConfirmEdit -- Matsch 20:06, 23 August 2007 (UTC)


 * I too am seeing precisely the same error, but we don't have the ConfirmEdit extension installed.
 * MediaWiki: 1.6.9, PHP: 4.3.10-22 209.198.95.98 18:07, 27 August 2007 (UTC)

Chinese Spam
I've found that [www.fanhistory.com/index.php/MediaWiki:Spam-blacklist adding .cn] means fewer spam postings from Chinese wikispammers. Are there any other ways to minimize China-based spam written in Chinese characters, other than trying to add those character strings? And does blocking on those strings even work, or does MediaWiki read them in a way that stops them from matching? --76.214.233.199 14:59, 2 September 2007 (UTC)


 * sorry, i had to unlink your link because of spam protection. -- seth 23:36, 19 April 2008 (UTC)

Line Error Message
So I load the pages into my extension directory and at the top of my wiki it spits out:

Warning: Call-time pass-by-reference has been deprecated; If you would like to pass it by reference, modify the declaration of [runtime function name]. If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. in /home/.lutece/mariainc/disapedia.com/extensions/SpamBlacklist/SpamBlacklist.php on line 103

Warning: Call-time pass-by-reference has been deprecated; If you would like to pass it by reference, modify the declaration of [runtime function name]. If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. in /home/.lutece/mariainc/disapedia.com/extensions/SpamBlacklist/SpamBlacklist.php on line 112

Warning: Call-time pass-by-reference has been deprecated; If you would like to pass it by reference, modify the declaration of [runtime function name]. If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. in /home/.lutece/mariainc/disapedia.com/extensions/SpamBlacklist/SpamBlacklist.php on line 112

How did I screw up? 24.5.195.51 23:14, 23 November 2007 (UTC)
 * I had the same problem running Mediawiki 1.10.1 until I installed using the 1.10 branch. --Tosfos 04:17, 31 December 2007 (UTC)

Losing content
I'm wondering if anybody knows how to customize this extension so that you don't lose all your edits if you unknowingly put a link that is on the spam list. Right now it simply spits you out to the main page. It would be great if this extension simply reloaded the editing page (with all the edits) and put the warning message at the top of the editing page. Similar to the ConfirmEdit extension. Is this possible? Edward (March 28, 2008)

more detailed manual and suggestions
hi!

is there a more detailed description than Extension:SpamBlacklist? i want to find answers for the following questions:
 * 1) where does the regexp  try to match? in the wikisource-code? in html-source? or are there preparsers which grep all urls? for example, if preparsing was used, it wouldn't be necessary to avoid patterns like ".*"
 * 2) does (?:foo|bar) vs. (foo|bar) affect speed, when S-modifier is set? (without the S-modifier i guess that a non-capturing pattern would be faster).
 * 3) why does the header of those blacklists say "Every non-blank line is a regex fragment which will only match hosts inside URLs", while that isn't true? it does not only match hosts, it matches the path, too.

tia! -- seth 22:48, 17 April 2008 (UTC)

meanwhile, i checked out the source code. some things seem to be strange:
 * 4. actually  is not right, cause "!" is not the delimiter character. so \! wouldn't be necessary to match a "!".
 * 5. wouldn't it be better and faster to use just  instead of  ?

apart from that i have some feature requests:
 * 6. as there exists a log-file for all changes on the spamblacklist like meta:Spam_blacklist/Log it would be a nice feature, if MediaWiki:Spamprotectiontext would mention the reason noted in that log-file. is that possible?
 * 7. the blacklist-users should have the opportunity to choose whether a regexp should be blocked on articles only or on articles _and_ their talk-pages. this could be done for example by a special parameter or a separated blacklist.
 * 8. a new entry in a blacklist should not cause a spamprotection intervention on existing links, but only when someone tries to put a new (forbidden) link to a page. (i guess, this could be technically solved by simply counting the numbers of occurrences of forbidden urls before and after editing of a page: if diff!=0 then block)

-- seth 23:36, 19 April 2008 (UTC)
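Suggestion 8 (compare the forbidden links before and after the edit) could look roughly like the Python sketch below. It is a variation on the "if diff!=0 then block" idea: it only reports links whose count went up, so an edit that removes or leaves existing forbidden links alone would pass. The function name `added_forbidden_links` is hypothetical, and `blacklist` is assumed to be an already compiled pattern.

```python
from collections import Counter


def added_forbidden_links(old_text, new_text, blacklist):
    """Return forbidden matches that occur more often in new_text than in old_text."""
    before = Counter(m.group(0) for m in blacklist.finditer(old_text))
    after = Counter(m.group(0) for m in blacklist.finditer(new_text))
    # Counter subtraction keeps only the positive differences.
    return sorted((after - before).elements())
```

A save would be refused only when this returns a non-empty list, so a newly blacklisted link already present on a page would not stop unrelated edits.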