Talk:Spam Filter

Great ideas but
A Google search for 'Mediawiki spam protection' brings people right to this page, but the question they actually want answered, 'how the **** do I stop these **** ruining my wiki?', isn't directly addressed here. Perhaps a concise intro pointing wiki admins to the information they're probably looking for would benefit everyone greatly.

More to consider
Some other ideas related to what we're discussing (empowering small wikis to respond quickly):
 * Peer to Peer blacklists or banlists, like http://usemod.com/cgi-bin/mb.pl?PeerToPeerBanList
 * A list that learns to blacklist URLs - what I have in mind is something less complex than existing Bayesian filters - a filter that adds a URL to a blacklist after X repeat posts of that URL within Y amount of time.

I posted this to the wikitech mailing list:

''The idea is that on a normal wiki it is very unlikely for a URL to be referenced in edits more than X times in 24 hours (I'll postulate twice, just for fun), but this is very common behaviour for a spammer. Therefore, if one could ascertain (do some tests) that some large percentage (say 99.9%) of valid (non-spam) URLs posted via edits occur fewer than X times in 24 hours, you could put in a filter that automatically adds URLs exceeding that limit to a blacklist, inconveniencing only a very small percentage of users.''

--Aerik 05:26, 22 January 2006 (UTC)
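As a rough illustration of the repeat-URL idea above, here is a minimal Python sketch. It is not an existing MediaWiki feature. The class name `RepeatUrlBlacklist`, the method `record_edit`, and the defaults (X = 2 posts, Y = 24 hours) are assumptions made up for the example, and the URL pattern is deliberately crude.

```python
import re
from collections import defaultdict, deque

# Crude URL matcher, good enough for a sketch (assumption, not a full URL parser).
URL_RE = re.compile(r"https?://\S+")


class RepeatUrlBlacklist:
    """Hypothetical filter: blacklist a URL posted more than X times within Y seconds."""

    def __init__(self, max_posts=2, window_seconds=24 * 3600):
        self.max_posts = max_posts            # X: posts tolerated inside the window
        self.window_seconds = window_seconds  # Y: length of the sliding window
        self.seen = defaultdict(deque)        # URL -> timestamps of recent edits
        self.blacklist = set()

    def record_edit(self, text, now):
        """Record the URLs in one edit; return the URLs that are (now) blacklisted."""
        blocked = set()
        for url in set(URL_RE.findall(text)):
            if url in self.blacklist:
                blocked.add(url)
                continue
            times = self.seen[url]
            times.append(now)
            # Forget postings that have fallen out of the window.
            while times and now - times[0] > self.window_seconds:
                times.popleft()
            if len(times) > self.max_posts:
                self.blacklist.add(url)
                blocked.add(url)
        return blocked


spam_filter = RepeatUrlBlacklist(max_posts=2, window_seconds=24 * 3600)
first = spam_filter.record_edit("see http://spam.example/x", now=0)
second = spam_filter.record_edit("see http://spam.example/x", now=10)
third = spam_filter.record_edit("see http://spam.example/x", now=20)
```

With these settings the first two edits pass and the third gets the URL blacklisted. Whether X = 2 is a sensible limit would have to be checked against real edit histories, as the mailing-list post suggests.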

Perhaps the new Spam cleanup script should also be mentioned here? It's now in use on all non-large Wikimedia sites as well as all Wikicities. Angela 05:51, 26 January 2006 (UTC)
 * I added a link to that article under wiki resources. --jwalling 01:48, 1 February 2006 (UTC)

The Wikicities page mentions that pages containing spam are reverted to the last revision that doesn't contain any spam. I am sure some wikis in the Wikipedia family are less well monitored than others. Especially with CSS Hidden Spam, spam could easily have survived many legitimate edits. Isn't there a good chance of losing good info? I know it will still be in the history, but if the spam had not already been removed from those pages, chances are good they aren't popular pages/sites that will be rebuilt. I do think it is good that the spam is being cleaned out, though. Are there any public stats on what/how much has been cleaned? --JoeChongq 09:45, 27 January 2006 (UTC)
 * A spam cleanup log would help a reviewer to monitor for 'lost' content. --jwalling 01:45, 1 February 2006 (UTC)

 * A log would definitely be useful. At the moment, there's no public log and no stats. It does cause problems with reverting of real content, especially when users want to make good faith links to sites that we blacklist (funpic.de for example). However, if there's an active community, they can easily revert that, and if there's not, it's worth the risk to keep the wiki spam free. The script never deletes a page, so everything it does is revertible by any user. Angela 09:56, 26 February 2006 (UTC)

Was this implemented?
A sysop on the Hebrew (he) wiki reported being blocked by a spam filter. Is this the reason? He tried to add (or rather, edited an article containing) http://www.ynet.co.il, a major Israeli online newspaper which shouldn't be blocked... Felagund 10:01, 26 January 2006 (UTC)
 * The article on Spam Filter is in the discussion phase so it can't be responsible for any blocking problems. Most domain blocking problems are reported at Talk:Spam_blacklist. --jwalling 01:54, 1 February 2006 (UTC)

Bayesian Filtering
A poster above mentioned Bayesian filtering... Mozilla Thunderbird has excellent Bayesian spam filtering that 'learns' the difference between real email and spam based on what is already in your inbox. I see no technical reason why a similar algorithm could not be employed to at least challenge spam-like posts with a CAPTCHA. There are countless known good posts to train from, and I'm sure, countless known bad ones too.
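To make the suggestion a bit more concrete, here is a minimal naive Bayes sketch in Python. It is only an illustration under assumptions: it is not Thunderbird's algorithm and not an existing MediaWiki extension. The names `NaiveBayesEditFilter`, `train`, `spam_probability` and `needs_captcha`, the 0.9 threshold, and the tiny training set are all invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesEditFilter:
    """Hypothetical edit filter: naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Learn from one known-good ('ham') or known-bad ('spam') edit."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior, then one smoothed likelihood term per token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_scores["ham"] - log_scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))

    def needs_captcha(self, text, threshold=0.9):
        """Challenge the edit instead of rejecting it outright."""
        return self.spam_probability(text) >= threshold


edit_filter = NaiveBayesEditFilter()
edit_filter.train("cheap pills buy cheap casino online", "spam")
edit_filter.train("buy casino pills now", "spam")
edit_filter.train("the history section needs a citation", "ham")
edit_filter.train("fixed a typo in the history section", "ham")

spam_like = edit_filter.spam_probability("buy cheap pills")
normal = edit_filter.spam_probability("history section typo")
```

A real deployment would train on many reverted spam edits and many accepted edits, and would probably treat URLs as tokens too. Challenging with a CAPTCHA rather than blocking keeps the cost of a false positive low.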

Great project discussion
I know many of these ideas have yet to be implemented, but this is a huge problem, and MediaWiki should develop the best solutions around to deal with it. 68.163.251.199 22:31, 5 April 2006 (UTC)

Overflow auto blocked
Got a simple problem:   is being caught by the spam filter (replace auhto with auto). Apparently the spam filter catches "overflow auto height". Switching places (height at the beginning followed by overflow) is safe. Just wanted to let you all know. -- ReyBrujo 02:29, 19 February 2007 (UTC)
 * Yes, this is bug #8829. --.anaconda 02:33, 19 February 2007 (UTC)