Extension talk:StopForumSpam

setup
It's not stated, but is setting $wgSFSIPListLocation actually required, or can it simply rely on the API directly? --92.239.152.76 09:45, 1 November 2017 (UTC)

Global blocks
Very interesting! How does this extension interact with Extension:GlobalBlocking? Can data also be submitted upon global blocks? Can one use the extension in "write only" mode, i.e. only to submit data but not to use it? If yes, it could be used by stewards/global sysops on Meta-Wiki to contribute to the list even if we do not initially use it on Wikimedia projects. --Nemo 09:42, 16 December 2013 (UTC)
 * The main issue is that stopforumspam requires any data to be submitted to have an email, username, and IP address (see bottom of ). So it won't work with GlobalBlocking since there would be no username or email address. Write only mode would be possible, it just requires a config option to disable AbuseFilter integration. Legoktm (talk) 17:34, 16 December 2013 (UTC)
 * Thanks. Hm, I don't get it. Can't GlobalBlocking use some generic email/username/key configured with a global? On Meta, the email could be some service address or the stewards' OTRS queue. --Nemo 17:54, 16 December 2013 (UTC)
 * No, we don't want people to think "stewards@wikimedia.org" is a spam email. I also found, which clearly states that you need a username+email+IP. Legoktm (talk) 21:50, 16 December 2013 (UTC)
 * What do you mean "a spam email"? That forum post clarifies that the address is just one where it's possible to get answers from; setting this to some shared email for the wiki admins/whatever looks to me like something many wikis would like to do. Anyway, it's not so important. --Nemo 08:11, 17 December 2013 (UTC)
 * The (username, IP address, email address) tuple is stored in their spam database as a "spammer" (e.g. there was a spammer that registered as username using IP address and typed in and confirmed they owned email address -- forums typically require you to specify and confirm an email address before allowing registration). Submitting an email address for the wiki as that datapoint has a threefold effect (one positive, two negative). I think we just need to determine if the positive outweighs both negatives, and my hunch says that it does:
 * By submitting any email at all, the data point actually gets added to their spam database, making the submission mode much more reliable on MediaWiki sites. (If you couldn't tell, this one is the positive).
 * That email address will be marked as spam for anyone querying StopForumSpam for data. For this reason, you should use some throwaway email address instead of an actual one you send official emails from or register for accounts with.
 * The admins over at StopForumSpam might not understand the wiki culture about not requiring emails, and see this "fake" data and revoke API keys or wildcard block the entire domain due to numerous submissions under the same email address. This could likely be solved with open communication.
 * -- Skiz zerz  15:55, 17 December 2013 (UTC)

Performance
The StopForumSpam blacklist is very long; there are reports of wikis being severely slowed down (to the point of becoming unusable) when enabling such huge blacklists, because bots produce a gazillion hits and the cost of each of them is too high. It would be interesting to have some guesstimate of the performance effects/CPU costs of this extension. --Nemo 09:42, 16 December 2013 (UTC)
 * The IP blacklist is loaded into shared memory (e.g. memcached, redis, APC, etc.) and checked against, so the cost of checking a single bot is the cost of fetching an item out of that cache (the IP blacklist will not function if $wgMainCacheType is set to CACHE_NONE). Updating that cache is another story entirely, as it can be quite slow, which is why we made a maintenance script to do it. By default, whenever it detects the cache is out of date, it will load a DeferredUpdate to run at the end of the current request, which in my testing on commodity hardware takes around 5-10 seconds (should be less on faster hardware, and legoktm reported that it took around 50 seconds on his test VM). You can disable this behavior, however, and only update the blacklist via the updateBlacklist.php maintenance script. It won't run any faster, but at least it moves that work out of the web request; running that script via cron is the recommended setup. Another performance impact I can foresee is when the 'sfs-confidence' variable is used in an AbuseFilter, as this will cause an API hit to the StopForumSpam website. However, the result of that is also cached for a period of one day, so the same bot doing multiple hits on pages with that AbuseFilter variable should only trigger some slowness/extra strain once. -- Skiz zerz  17:16, 16 December 2013 (UTC)
 * So I guess there might still be some use for Extension:BlockOpenProxies, for those wikis that are on shared hosts that don't have caching. Leucosticte (talk) 10:47, 19 March 2014 (UTC)
 * Shared hosts would still be able to use CACHE_DB though, in the worst case scenario. Legoktm (talk) 17:26, 19 March 2014 (UTC)
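
To make the caching behaviour described in this thread concrete, here is a minimal sketch of a one-day cache in front of a slow lookup. It is not the extension's code: the class `ConfidenceCache`, its `fetch` callback (a stand-in for the remote StopForumSpam query) and the injectable `clock` are all hypothetical names, and a plain in-process dictionary stands in for memcached, redis, APC or CACHE_DB.

```python
import time
from typing import Callable, Dict, Tuple

ONE_DAY = 24 * 60 * 60  # seconds; the thread states results are cached for one day


class ConfidenceCache:
    """Hypothetical in-process stand-in for the shared cache (not the extension's code)."""

    def __init__(
        self,
        fetch: Callable[[str], float],
        ttl: float = ONE_DAY,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch  # assumed slow remote lookup, called only on a miss
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, float]] = {}  # ip -> (expires_at, score)

    def get(self, ip: str) -> float:
        now = self._clock()
        hit = self._store.get(ip)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: no remote call is made
        score = self._fetch(ip)  # cache miss or expired entry: one slow call
        self._store[ip] = (now + self._ttl, score)
        return score
```

With a cache like this, a bot hitting the same filter repeatedly from one IP pays for the slow lookup once per day; every other request only pays for a dictionary (or, in the real setup, cache backend) fetch.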

Anonymous spammers
This extension doesn't report IPs of anonymous spammers, does it? Alex Mashin (talk) 16:44, 14 September 2015 (UTC)
 * No, it needs to be a registered user and have an email/username set. Anzacdoer 2 (talk) 15:48, 14 August 2017 (UTC)

Current relevance
A few things have changed since the StopForumSpam extension was first released in 2013 which may change its effectiveness in combating spam:
 * VPS servers from reputable providers are now significantly cheaper - Linode offers a 1GB RAM server for $5 a month and Vultr offers a 512MB RAM server for $2.50 a month as of August 2017. You can easily obtain a new IPv4 address by spinning up a new instance and destroying the old one afterwards. Since they charge by the hour, it's trivial in terms of cost to spin up a VPS for a few hours for spamming and then delete the instance before IP blacklists have a chance to catch on.
 * Internet of Things (IoT) botnets are a real threat now (e.g. Mirai, Linux/IRCTelnet, and BASHLITE) due to poorly secured consumer goods. The threat is only going to increase as more and more people buy Internet connected refrigerators, toasters, home security cameras, etc. Since these devices are attacking from an innocent device owner's network, the IP is going to be a regular consumer IP not previously known for spamming. Given that the same IPs are being used by legitimate users to register on other websites, the chance of false positives is likely to increase. The dynamic nature of many residential IPs is also probably going to add to the problem of hitting innocent users and having too many IP addresses to blacklist.
 * IPv6 is gaining wider acceptance and block assignments aren't standardized. Hosting companies typically assign a /64 per server or per account, but DigitalOcean assigns a smaller block so a /64 block would hit innocent users too. There are also simply too many individual IPv6 addresses to reasonably blacklist.

Note this isn't really about the extension itself, but rather the service the extension uses (stopforumspam.com) - an IP blacklist for combating spam appears to be increasingly ill-suited for the future due to the explosion of Internet connected devices and IP addresses out there.

Perhaps a more reasonable approach would be a domain blacklist? Domains (usually) cost money to register and we can easily identify if it's a free domain (.tk and other Freenom domains) or a newly registered domain. URL forwarders and free subdomains (e.g. *.wordpress.com) can also be blocked preemptively to prevent bypassing the blacklist, and cheap/commonly abused gTLDs can be subject to stricter requirements (e.g. XYZ, PARTY, SCIENCE). Anzacdoer 2 (talk) 16:12, 14 August 2017 (UTC)
 * Have you been encountering spammers slipping through the StopForumSpam extension blocklist? Legoktm (talk) 03:37, 15 August 2017 (UTC)
 * A few, and the number seems to be slowly ticking up unfortunately. It got my attention because SFS has typically been one of the more effective ways of stopping spam, but I noticed the number of spammers not on the blacklist has been steadily increasing for about a year and a half or so now. I ended up restricting new users from being able to submit external links, which has helped dramatically, but it's quite restrictive. Do you think a MediaWiki-specific IP blacklist would be more effective, or is there a huge overlap between forum/comment spam and MediaWiki spam? Anzacdoer 2 (talk) 16:04, 15 August 2017 (UTC)
 * Hmm, that's a bummer. SFS does provide IPv6 blocklists, but we'd have to update the code to support that (filed as T173399). I've thought about having a central MediaWiki server that people can send spam IPs to and then we distribute blocklists (aka the MW version of SFS), but I'm not sure we'd ever be able to reach the same level of reporting. Another idea I've had is to use the Wikimedia global block list as a source for other MW instances to use. That would probably be easier from an implementation point of view. Legoktm (talk) 06:39, 16 August 2017 (UTC)
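
As an illustration of the /64 point raised in this thread, the following sketch uses Python's standard `ipaddress` module to show how many addresses a single /64 covers and how two addresses can be tested for membership in the same block. The helper names `covering_block` and `same_block` are made up for this example, the addresses come from the 2001:db8::/32 documentation range, and nothing here reflects how the extension or StopForumSpam handles IPv6.

```python
import ipaddress


def covering_block(address: str, prefix_len: int = 64) -> ipaddress.IPv6Network:
    """Return the IPv6 network of the given prefix length containing `address`."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.IPv6Network(f"{address}/{prefix_len}", strict=False)


def same_block(a: str, b: str, prefix_len: int = 64) -> bool:
    """True if both addresses fall inside the same block of `prefix_len` bits."""
    return covering_block(a, prefix_len) == covering_block(b, prefix_len)


if __name__ == "__main__":
    block = covering_block("2001:db8:0:1::abcd")
    print(block, "covers", block.num_addresses, "addresses")
    print(same_block("2001:db8::1", "2001:db8::ffff"))
```

Listing individual IPv6 addresses cannot keep up with a spammer who controls a whole /64, while blocking the /64 is only safe when the provider does not place several customers inside it, which is the trade-off described above.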

php extensions/StopForumSpam/updateBlacklist.php does not work.
I downloaded the IP blacklist as described on the wiki; the beginning looks like:

1.0.132.45

1.0.132.71

1.0.134.19

1.0.135.30

...

almost 470k lines

I put it on my server and ran "php extensions/StopForumSpam/updateBlacklist.php", and it hangs forever. I started investigating, added a little logging to the code, and saw that the variable $ip only ever contains the "." character, so every iteration ends up at the line "continue; // discard invalid lines". What am I doing wrong? I see that fgetcsv is used, which reads CSV, and my blacklist from http://www.stopforumspam.com/ is not CSV. So what should I do? Convert it to CSV? Or did I download the wrong file? Нирваньчик (talk) 20:25, 9 January 2018 (UTC)
 * The problem was that I had set $wgSFSIPListLocation to the wrong path. The script hangs forever in this case. Нирваньчик (talk) 20:22, 23 January 2018 (UTC)
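
Since a wrong path makes the script hang rather than fail, a small pre-flight check before running it can save some debugging. The sketch below is a hypothetical helper, not part of the extension: `check_ip_list` and its `sample` parameter are made-up names, and it only assumes that each non-empty line starts with an IP address, optionally followed by comma-separated fields.

```python
import ipaddress
import os
from typing import Tuple


def check_ip_list(path: str, sample: int = 100) -> Tuple[bool, str]:
    """Check that `path` exists and that its first lines start with an IP address."""
    if not os.path.isfile(path):
        return False, f"no such file: {path}"
    checked = 0
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Accept a bare address or the first field of a CSV row.
            first = line.split(",")[0].strip().strip('"')
            try:
                ipaddress.ip_address(first)
            except ValueError:
                return False, f"line does not start with an IP address: {line!r}"
            checked += 1
            if checked >= sample:
                break  # a sample is enough; no need to read the whole file
    if checked == 0:
        return False, "file is empty"
    return True, f"{checked} line(s) look like IP addresses"
```

Calling this with the same path that is configured in $wgSFSIPListLocation should catch both the wrong-path case from this thread and a download that is not an IP list at all.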

Extension status
Is there any way to get some information about this extension's working status: how many IPs are in the blacklist, whether the blacklist has been updated from the file or needs updating (because the file changed recently), and how many actions (edits/registrations) have been prevented by the blacklist? I installed it and I can't tell whether it's working at all. I already know about Special:Version, but that only shows that the extension is installed; that is not enough, we want more. Нирваньчик (talk) 20:41, 9 January 2018 (UTC)
 * +1 68.179.178.199 18:08, 25 February 2020 (UTC)
 * I've added a Logging section to the extension page which tells you how to configure a log file which records this information. There are currently no plans to make this logging data available on-wiki. However, you are welcome to open a feature request on Phabricator to add an on-wiki interface for viewing general stats or specific logging details. -- Skiz zerz  20:38, 25 February 2020 (UTC)

more verbosity when updateBlacklist.php
It would be nice if updateBlacklist.php had a --verbose flag, with which it would show how many lines have been read (every 5 seconds) and a time estimate (how much time is left). The length of an IP is almost constant and the file size can be read quickly, so it's easy to estimate how many lines the file contains. Нирваньчик (talk) 20:47, 9 January 2018 (UTC)
 * I don't need this feature anymore since the script runs in 10 seconds. Нирваньчик (talk) 20:23, 23 January 2018 (UTC)

maintenance/updateBlacklist.php renamed to maintenance/updateDenyList.php
Why was updateBlacklist.php renamed to updateDenyList.php? This should at least be documented here on the extension page.

I have cron jobs that run daily to download fresh lists and update them. My task broke after upgrading MediaWiki from 1.35 to 1.36, until I figured out that the script had been renamed...
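
One way to keep a cron job working across such a rename is to look for whichever script name exists before invoking it. The sketch below is only an illustration: `find_update_script` and `CANDIDATES` are hypothetical names, and the candidate paths are simply the ones mentioned on this talk page, tried newest first.

```python
import os
from typing import Optional, Sequence

# Script locations mentioned on this page, newest name first.
CANDIDATES = (
    "maintenance/updateDenyList.php",
    "maintenance/updateBlacklist.php",
    "updateBlacklist.php",
)


def find_update_script(
    extension_dir: str, candidates: Sequence[str] = CANDIDATES
) -> Optional[str]:
    """Return the first candidate script that exists under `extension_dir`, or None."""
    for rel in candidates:
        full = os.path.join(extension_dir, rel)
        if os.path.isfile(full):
            return full
    return None
```

A wrapper run from cron could call `find_update_script("extensions/StopForumSpam")`, pass the result to `php`, and exit with an error when `None` is returned, so that a future rename fails loudly instead of silently skipping the update.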

wgSFSIPThreshold relevant?
I posted this on the German discussion page, but it makes more sense here. To avoid double-posting, here's the link: https://www.mediawiki.org/wiki/Topic:Wkbitfkgozm6olxk