Manual talk:$wgSpamRegex

blocks all href links entirely

 * Large Example shows on article:

"\< \s*a\s*href|". # This blocks all href links entirely, forcing wiki syntax

in Source:

"\<\s*a\s*href|". # This blocks all href links entirely, forcing wiki syntax

So is this a parser issue? The first version will not work because an unescaped "/" ends the regex early, since "/" is the delimiter. It fails with the error "Unknown modifier 'p'".

--Martin
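For illustration, the "Unknown modifier" error can be reproduced whenever the delimiter character appears unescaped inside the pattern. The following is a minimal sketch in plain PHP, run outside MediaWiki; the helper name isValidSpamRegex and the sample pattern containing "</p>" are made up for this example and are not taken from the article.

```php
<?php
// Hypothetical helper: returns true if PHP can compile the pattern.
// preg_match() returns false (and emits a warning) for a broken pattern.
function isValidSpamRegex( string $pattern ): bool {
    return @preg_match( $pattern, '' ) !== false;
}

// The "/" inside "</p>" ends the regex early, so "p>/" is read as modifiers.
$broken = '/</p>/';

// Escaping the inner "/" keeps it inside the pattern.
$escaped = '/<\/p>/';

// Picking a delimiter that does not occur in the pattern also works.
$otherDelimiter = '#</p>#';

var_dump( isValidSpamRegex( $broken ) );         // bool(false)
var_dump( isValidSpamRegex( $escaped ) );        // bool(true)
var_dump( isValidSpamRegex( $otherDelimiter ) ); // bool(true)
```

Running a candidate pattern through a check like this before saving it to LocalSettings.php may catch delimiter mistakes early.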


 * Are there other categories which this could/should go into? ex. security or spam protection?

Sy Ali 17:48, 19 April 2006 (UTC)


 * On my MediaWiki, using the "Large Example," spam is getting through the regex for "overflow" by dropping the closing semicolon. So I deleted the semicolon from the regex, and that seems to be working (for now). It might be useful to others to remove it from the example, since it's not necessary. I can't do it myself; I tried, but the spam protection used here won't let me save. Latrippi 02:28, 22 July 2006 (UTC)

blocking lots of links
Most wikispam I've encountered has taken the form  etc. So, what about using this to block many links in a row? I'm thinking something like...

$wgSpamRegex = '/(\[http:\/\/[a-z0-9\.\/\%_-]+(\s+[a-z0-9\.-]+)+\]\s+){10}/i';

or

$wgSpamRegex = '/(\[http:\/\/[^\s]+\s+[^\[]+\s*?){10}/i';

Comments? (Handy PHP regex tester...)

--Alxndr 03:18, 26 November 2006 (UTC)
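As a quick check of the idea, the second pattern can be tried against sample text in plain PHP. This is only a sketch: the sample strings and the spam.example host are invented, and the run happens outside MediaWiki.

```php
<?php
// The second pattern proposed above, copied unchanged.
$manyLinks = '/(\[http:\/\/[^\s]+\s+[^\[]+\s*?){10}/i';

// Made-up sample text for illustration only.
$oneLink   = '[http://spam.example/a cheap pills] ';
$tenLinks  = str_repeat( $oneLink, 10 );
$nineLinks = str_repeat( $oneLink, 9 );
$normal    = 'See [http://www.example.org the docs] for details.';

var_dump( preg_match( $manyLinks, $tenLinks ) );  // int(1)
var_dump( preg_match( $manyLinks, $nineLinks ) ); // int(0)
var_dump( preg_match( $manyLinks, $normal ) );    // int(0)
```

Note that both proposed patterns spell out "http://" literally, so bracketed links starting with "https://" are not counted.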

How does one stop 'MediaWiki:Spamprotectiontext' from telling spammers which words just got banned, so that they cannot simply reword their spam to get past it?

I'd love to know.

--Quatermass 20:43, 9 May 2007 (UTC)
 * You can change that message in Special:Allmessages Jonathan3 18:09, 8 September 2007 (UTC)


 * I read that you can delete the "$1" on MediaWiki:Spamprotectionmatch in order to achieve that. w:User:JanCK 10:52, 18 November 2007 (UTC)

log
Is there a log that shows how often my MediaWiki denies edits? w:User:JanCK 00:56, 18 November 2007 (UTC)

$wgSpamRegex is not working in my wiki
Maybe someone can help me. I have configured the variable $wgSpamRegex as described in Manual:$wgSpamRegex, but when I test the filter with spam words, nothing happens. Is there something else I need to do? The MediaWiki version is 1.13.0. Thx! --88.65.198.156 18:24, 5 October 2008 (UTC)
 * You can try Extension:SpamRegex.  i Alex  18:35, 5 October 2008 (UTC)
 * Thx - now it's working, but the only problem is that I get a PHP warning whenever the SpamRegex filter triggers. Here is the HTML output:

Warning: preg_match [function.preg-match]: Delimiter must not be alphanumeric or backslash in /../htdocs/includes/EditPage.php on line 747

What can I do against this output? Thx again!
 * Is there nobody who has the same problem? --82.113.113.161 15:24, 13 March 2009 (UTC)
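One way to get this exact warning, offered only as a guess because the configuration was not posted, is a pattern stored without delimiters. The sketch below reproduces that situation in plain PHP; the helper name patternCompiles and the word "spamword" are placeholders.

```php
<?php
// Hypothetical helper: true if preg_match() accepts the pattern.
function patternCompiles( string $pattern ): bool {
    return @preg_match( $pattern, '' ) !== false;
}

// Starts with a letter, so PHP complains about the delimiter.
$withoutDelimiters = 'spamword';

// Wrapped in "/" with the case-insensitive "i" flag.
$withDelimiters = '/spamword/i';

var_dump( patternCompiles( $withoutDelimiters ) ); // bool(false)
var_dump( patternCompiles( $withDelimiters ) );    // bool(true)
```

If the stored pattern turns out to lack delimiters, wrapping it as in the second variable should make the warning go away.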

blocking by number of links
I have tried to add a limit for number of links to 15 as mentioned in the article, but am still able to add articles with more than 15 links. This is my regex in its entirety:

$wgSpamRegex = "/".                     # The "/" is the opening wrapper
    "s-e-x|zoofilia|sexyongpin|grusskarte|geburtstagskarten|animalsex|".
    "sex-with|dogsex|adultchat|adultlive|camsex|sexcam|livesex|sexchat|".
    "chatsex|onlinesex|adultporn|adultvideo|adultweb.|hardcoresex|hardcoreporn|".
    "teenporn|xxxporn|lesbiansex|livegirl|livenude|livesex|livevideo|camgirl|".
    "spycam|voyeursex|casino-online|online-casino|kontaktlinsen|cheapest-phone|".
    "laser-eye|eye-laser|fuelcellmarket|lasikclinic|cragrats|parishilton|".
    "paris-hilton|paris-tape|fuel-dispenser|fueling-dispenser|".
    "jinxinghj|telematicsone|telematiksone|a-mortgage|diamondabrasives|".
    "reuterbrook|sex-plugin|sex-zone|lazy-stars|eblja|liuhecai|".
    "buy-viagra|-cialis|-levitra|boy-and-girl-kissing|". # These match spammy words
    "dirare\.com|".           # This matches dirare.com a spammer's domain name
    "overflow\s*:\s*auto|".   # This matches against overflow:auto (regardless of whitespace on either side of the colon)
    "height\s*:\s*[0-4]px|".  # This matches against height:0px (most CSS hidden spam) (regardless of whitespace on either side of the colon)
    "(http:.*){16}|".         # ***** Limit total number of external links allowed per page / to 15 DOESN'T WORK!
    "display\s*:\s*none".     # This matches against display:none (regardless of whitespace on either side of the colon)
    "/i";                     # The "/" ends the regular expression and the "i" switch which follows makes the test case-insensitive

It does block the other expressions, but I can still save articles with more than 15 links! I don't see what I'm doing wrong, Please help...


 * MediaWiki: 1.11.0
 * PHP: 5.2.6 (cgi-fcgi)
 * MySQL: 5.0.45-community-log

Thanks, Nathanael Bar-Aur L. 17:22, 7 October 2008 (UTC)


 * PHP 5.2.x introduced pcre.backtrack_limit with default 100000 (less than 100K). I think that is too low and trips up the regex. See stronk7 at moodle dot org's 13-Sep-2007 comment (Find '13-Sep-2007') at http://us.php.net/manual/en/ref.pcre.php. Try adding the following line to LocalSettings.php:

ini_set( 'pcre.backtrack_limit', '8M' );


 * I don't know what 'pcre.backtrack_limit' value is appropriate. 8M works for me and is lifted from paragraph 4 of Wikipedia's Perl Compatible Regular Expressions article intro. Someone who knows more please adjust that and comment. --Rogerhc 17:44, 9 November 2010 (UTC)
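When the backtrack limit is hit, preg_match() returns false rather than 0, which calling code may treat the same as "no match". A small sketch for telling the two apart while testing a pattern by hand follows; the helper name describeSpamCheck is hypothetical and this is plain PHP, not MediaWiki code.

```php
<?php
// Hypothetical helper: distinguishes "no match" from "the regex did not run to completion".
function describeSpamCheck( string $regex, string $text ): string {
    $result = preg_match( $regex, $text );
    if ( $result === false ) {
        // preg_last_error() reports why the match was aborted.
        return preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
            ? 'aborted: backtrack limit reached'
            : 'failed: other error';
    }
    return $result === 1 ? 'match' : 'no match';
}

// Show the limit currently in effect, then try one of the CSS patterns.
echo 'pcre.backtrack_limit = ', ini_get( 'pcre.backtrack_limit' ), "\n";
echo describeSpamCheck( '/display\s*:\s*none/i', 'style="display: none"' ), "\n";
```

Feeding a long test page and the full regex into such a helper should show whether the limit is really what lets the extra links through.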


 * It works for me only with (.|\n)*? <-- this part crosses line ends (\n) and is ungreedy (*?). With this it works for me up to {129} on a long page with 200 repetitions of "http://xxxxx " on it. With {130} or higher the server gave this error message: "503 Service Unavailable - The server is temporarily busy, try again later!". Try this:

$wgSpamRegex = "/(http:(.|\n)*?){101}/";


 * --Rogerhc 05:03, 10 November 2010 (UTC)
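The difference described above comes from "." not matching a newline by default, so (http:.*){16} can only count links that sit on the same line. The sketch below shows the effect in plain PHP with a threshold of 4 instead of 16 to keep it small; the sample page is invented, and the "s" modifier is used here in place of (.|\n), which should behave the same way.

```php
<?php
// Made-up sample: four links, one per line.
$page = "http://a.example\nhttp://b.example\nhttp://c.example\nhttp://d.example\n";

// "." does not match "\n" by default, so the links are never seen together.
$dotStopsAtNewline = '/(http:.*){4}/';

// The "s" modifier lets "." match "\n"; "*?" makes the repetition lazy.
$dotAllLazy = '/(?:http:.*?){4}/s';

var_dump( preg_match( $dotStopsAtNewline, $page ) ); // int(0)
var_dump( preg_match( $dotAllLazy, $page ) );        // int(1)

// Only two links: below the threshold, so no match.
var_dump( preg_match( $dotAllLazy, "http://a.example\nhttp://b.example\n" ) ); // int(0)
```

Patterns of this shape can backtrack heavily on pages that stay just under the threshold, which may be related to the 503 errors reported at high repetition counts.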

Not working for me
I simply put the following line in my settings:

$wgSpamRegex = "/suyash jain/i";

but it is not working

Any help..

Profanity
Hey, anyone got any regex profanity checks out there?
 * You can just search Google for "Profanity word list". That will give you a number of lists with a few hundred to more than 1000 words. For example, I found text files with one entry per line. Depending on the list you find and on what you are using the wiki for, many of the words on it may or may not be problematic.
 * Once you have a list, which suits your needs, it is trivial: Just replace the line breaks with a pipe sign and you have the string for your regular expression. --87.123.6.102 14:44, 12 February 2016 (UTC)
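The replacement step described above can be sketched in PHP as follows. The helper name buildWordListRegex and the placeholder words are assumptions for illustration; preg_quote() is added on top of the plain "replace line breaks with pipes" recipe so that dots or slashes in a word are not read as regex syntax.

```php
<?php
// Hypothetical helper: turns a newline-separated word list into one regex.
function buildWordListRegex( string $wordList ): string {
    // Split on line breaks, trim each entry, and drop empty lines.
    $words = array_filter( array_map( 'trim', preg_split( '/\R/', $wordList ) ), 'strlen' );
    if ( !$words ) {
        // An empty list would produce "//i", which matches every edit.
        throw new InvalidArgumentException( 'Word list is empty.' );
    }
    // Escape regex metacharacters and the "/" delimiter in every word.
    $escaped = array_map( static fn ( $w ) => preg_quote( $w, '/' ), array_unique( $words ) );
    return '/' . implode( '|', $escaped ) . '/i';
}

// Placeholder words; a real list would come from a file.
$regex = buildWordListRegex( "badword\nworse-word\nv.i.a.g.r.a\n" );

var_dump( preg_match( $regex, 'Buy V.I.A.G.R.A now' ) );  // int(1)
var_dump( preg_match( $regex, 'A harmless sentence.' ) ); // int(0)
```

The resulting string could then be assigned to $wgSpamRegex. Because the words match anywhere, short entries will also hit inside longer harmless words.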

Example blocks legitimate CSS
For example, if I were to type, "overflo:auto; height:" [with "overflow" instead of "overflo", "w" deleted by User:Rogerhc to get this through MediaWiki's current spam filter] I would not be allowed to save this page. Rocket000 08:05, 19 August 2009 (UTC)
 * True and noted. And I had to change your comment to get it past MediaWiki's current spam filter. However, legitimate wiki edits probably don't need that particular CSS, and disallowing it helps stop spam. So it is useful on most wikis. --Rogerhc 18:08, 9 November 2010 (UTC)

Blocking all external links, working version:
$wgSpamRegex = "/^http:|^\

Nathanael Bar-Aur L. 22:03, 25 September 2009 (UTC)


 * I'm doing similar here:

$SpamRegexArray[] = "http";
$SpamRegexArray[] = "https";
$SpamRegexArray[] = "ftp";

if ( count( $SpamRegexArray ) ) {
    $wgSpamRegex = "/" . implode( "|", array_unique( $SpamRegexArray ) ) . "/";
}
//unset ($wgSpamRegex);


 * Works perfectly: users cannot save any pages containing Wiki-URLs. I have had this working for a few years and get (nearly) no spam. (Some spammers tag pages with random strings to probe wikis for spam filters; those are not blocked by my simple filter.)
 * If someone needs to save a URL, he might use something like instead of Some Foo Bar . Users get a notice about this when they hit the spam filter. Users might understand what to do; bots and spammers won't. It's like a Turing test. --Rabe (talk) 21:26, 17 February 2012 (UTC)

I use Rabe's example, except that I changed $SpamRegexArray[]="ftp"; to $SpamRegexArray[]="www"; and set the system messages "MediaWiki:Spamprotectiontext" and "MediaWiki:Spamprotectionmatch" to "The page you wanted to save was blocked by the spam filter." That way no one sees the reason why something was blocked. I hope this helps some wikis get less spam. --Feder (talk) 12:51, 26 April 2012 (UTC)

working on this page
I was working on this page, grammar, spelling etc, and moved this section to the talk page:

I think the last sentence sums it up: "In many cases it's a waste of time". Most spam is from bots. This section seems more like a wishful polemic than instructions on how to use $wgSpamRegex. Errectstapler 04:03, 17 July 2011 (UTC)

User group exemptions
Is there a way to allow members of the sysop group, for example, to bypass any restrictions? 83.170.106.45 22:55, 6 May 2012 (UTC)
 * Not with $wgSpamRegex. It even affects users of the sysop and bureaucrats user groups. Use Extension:AbuseFilter to be able to set up rules, which also allow you to filter by group! I have added that to the page now. --87.123.6.102 15:00, 12 February 2016 (UTC)

Banned words evading the SpamRegex somehow
I have been suffering vast amounts of Chinese spam despite banning the key words (a series of brand names the spammers are using).

The spam is put in a User page and the banned words appear in a big heading ( = Gucci bags =, for example) and not in the text (where they put random text). Just putting it in a heading seems to evade the block. It is odd spam; it contains no links, as those are hindered by using CAPTCHA, so how it helps their SEO I do not know.

When I log in (as a normal user) and try to include a heading in a normal article with a banned word, I am blocked as I should be, but somehow the spammers are breaking through.

Is there an exemption in $wgSpamRegex for User pages? Can I reconfigure it so as to close it?

Hogweard (talk) 11:18, 17 July 2012 (UTC)

Using "Add Topic" tab bypasses this filter, i.e. http://www.mediawiki.org/w/index.php?title=Manual_talk:$wgSpamRegex
If you use the "Add Topic" tab to add a new section to a page, and put an external link as the "Subject/headline", it will bypass whatever filter you have set in $wgSpamRegex.--Chibbie (talk) 13:48, 17 June 2013 (UTC)


 * RESOLVED: You can set $wgSummarySpamRegex to the same as $wgSpamRegex, and that will filter the "Add Topic" subject as well.--Chibbie (talk) 20:55, 18 June 2013 (UTC)
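In LocalSettings.php, the fix described above might look like the sketch below. It assumes the single-string form of $wgSpamRegex used throughout this page, and the pattern itself is only a placeholder.

```php
<?php
// LocalSettings.php fragment; the pattern is a placeholder, not a recommendation.
$wgSpamRegex = '/https?:\/\//i';

// Reuse the same pattern for edit summaries and new-section headlines.
$wgSummarySpamRegex = $wgSpamRegex;
```

Assigning one variable to the other keeps the two filters in sync when the pattern is edited later.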

Help with PHP to wildcard all top level domain
Hello, in the example on the page: "domainname\.cn|". Can I wildcard the domain name so that I can block all top level domains? For example "anydomain\.cn|". --MAHR88 (talk) 21:31, 7 December 2016 (UTC)
 * Self solved by experiment: just leave the domain name out, no special syntax required: "\.fr|\.xn|\.vn|\.pl|". --MAHR88 (talk) 18:18, 13 December 2016 (UTC)
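One caveat with bare alternatives like these is that "\.fr" also matches the start of longer labels such as ".friends". The sketch below compares that with a stricter variant in plain PHP; the stricter pattern, the sample URLs, and the shortened list of suffixes are assumptions made for illustration.

```php
<?php
// Bare alternatives as in the reply above (shortened, without the trailing "|").
$loose = '/\.fr|\.vn|\.pl/i';

// Sketch of a stricter variant: the suffix must not be followed by another
// hostname character, so ".fr" inside ".free" or ".friends" is ignored.
$strict = '/\.(?:fr|vn|pl)(?![a-z0-9-])/i';

var_dump( preg_match( $loose,  'http://www.friends.example.com/' ) ); // int(1)
var_dump( preg_match( $strict, 'http://www.friends.example.com/' ) ); // int(0)
var_dump( preg_match( $strict, 'http://shop.example.fr/page' ) );     // int(1)
```

Even the stricter form is not limited to domain names: it still matches any text ending in one of the suffixes, for example a file name like "script.pl".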