Extension talk:StopForumSpam

From mediawiki.org

setup

It's not stated, but is setting $wgSFSIPListLocation actually required, or can it simply rely on the API directly? --92.239.152.76 09:45, 1 November 2017 (UTC)

Global blocks

Very interesting! How does this extension interact with Extension:GlobalBlocking? Can data also be submitted upon global blocks? Can one use the extension in "write only" mode, i.e. only to submit data but not to use it? If so, it could be used by stewards/global sysops on Meta-Wiki to contribute to the list even though we would initially not use it on Wikimedia projects. --Nemo 09:42, 16 December 2013 (UTC)

The main issue is that stopforumspam requires any submitted data to include an email, username, and IP address (see bottom of [1]). So it won't work with GlobalBlocking since there would be no username or email address. Write-only mode would be possible; it just requires a config option to disable AbuseFilter integration. Legoktm (talk) 17:34, 16 December 2013 (UTC)
Thanks. Hm, I don't get it. Can't GlobalBlocking use some generic email/username/key configured with a global? On Meta, the email could be some service address or the stewards' OTRS queue. --Nemo 17:54, 16 December 2013 (UTC)
No, we don't want people to think "stewards@wikimedia.org" is a spam email. I also found [2], which clearly states that you need a username+email+IP. Legoktm (talk) 21:50, 16 December 2013 (UTC)
What do you mean "a spam email"? That forum post clarifies that the address is just one where it's possible to get answers from; setting this to some shared email for the wiki admins/whatever looks to me like something many wikis would like to do. Anyway, it's not so important. --Nemo 08:11, 17 December 2013 (UTC)
The (username, IP address, email address) tuple is stored in their spam database as a "spammer" (e.g. there was a spammer that registered as <username> using <IP address> and typed in and confirmed they owned <email address> -- forums typically require you to specify and confirm an email address before allowing registration). Submitting an email address for the wiki as that datapoint has a threefold effect (one positive, two negative). I think we just need to determine if the positive outweighs both negatives, and my hunch says that it does:
  • By submitting any email at all, the data point actually gets added to their spam database, making the submission mode much more reliable on MediaWiki sites. (If you couldn't tell, this one is the positive).
  • That email address will be marked as spam for anyone querying StopForumSpam for data. For this reason, you should use some throwaway email address instead of an actual one you send official emails from or register for accounts with.
  • The admins over at StopForumSpam might not understand the wiki culture about not requiring emails, and see this "fake" data and revoke API keys or wildcard block the entire domain due to numerous submissions under the same email address. This could likely be solved with open communication.
--Skizzerz 15:55, 17 December 2013 (UTC)

Performance

The StopForumSpam blacklist is very long. There are reports of wikis being severely slowed down (to the point of becoming unusable) when enabling such huge blacklists, because bots produce a gazillion hits and the cost of each of them was too high. It would be interesting to have some guesstimate of the performance effects/CPU costs of this extension. --Nemo 09:42, 16 December 2013 (UTC)

The IP blacklist is loaded into shared memory (e.g. memcached, redis, APC, etc.) and checked against, so the cost of checking a single bot is the cost of fetching an item from that cache (the IP blacklist will not function if $wgMainCacheType is set to CACHE_NONE). Updating that cache is another story entirely, as it can be quite slow, which is why we made a maintenance script to do it. By default, whenever it detects the cache is out of date, it will load a DeferredUpdate to run at the end of the current request, which in my testing on commodity hardware takes around 5-10 seconds (should be less on faster hardware, and legoktm reported that it took around 50 seconds on his test VM). You can disable this behavior, however, and only update the blacklist via the updateBlacklist.php maintenance script -- it won't run any faster, but at least it moves that work out of the web request -- running that script via cron is the recommended setup. Another performance impact I can foresee is when the 'sfs-confidence' variable is used in an AbuseFilter, as this will cause an API hit to the StopForumSpam website. However, the result of that is also cached for a period of one day, so the same bot doing multiple hits on pages with that AbuseFilter variable should only trigger some slowness/extra strain once. --Skizzerz 17:16, 16 December 2013 (UTC)
So I guess there might still be some use for Extension:BlockOpenProxies, for those wikis that are on shared hosts that don't have caching. Leucosticte (talk) 10:47, 19 March 2014 (UTC)
Shared hosts would still be able to use CACHE_DB though, in the worst case scenario. Legoktm (talk) 17:26, 19 March 2014 (UTC)
In the current version (February 2024, Git#21501f2), which uses a wanCache and a srvCache, this extension crashed our server after introducing a list with 560k IPs. The issue is not really the size of the cache (8.4 MiB), but the size of the Wikimedia\IPSet object (1.15 GiB). Currently I consider that this extension cannot be used with more than ≈ 50k IPs (and it costs request time and memory).
I guess a performance improvement would be to use on-disk data (without loading the full list into memory; it could also be multiple lists stored in cache), a single copy in srvCache (I'm not sure if that is possible), or an external service in client-server mode. ~ Seb35 [^_^] 12:43, 6 February 2024 (UTC)
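
To make the on-disk idea above a bit more concrete, here is a minimal Python sketch. It is not how the extension works and is not tied to Wikimedia\IPSet; the file layout (sorted 4-byte big-endian IPv4 records) and the function names write_sorted_ipv4_file and ip_in_file are assumptions made up for illustration. Each lookup binary-searches the file and reads only a handful of records, so the full list never has to be held in memory.

```python
import ipaddress
import os
import struct


def write_sorted_ipv4_file(ips, path):
    """Write IPv4 addresses as sorted, de-duplicated 4-byte big-endian records.

    Hypothetical file format, chosen only for this sketch.
    Returns the number of records written.
    """
    values = sorted({int(ipaddress.IPv4Address(ip)) for ip in ips})
    with open(path, "wb") as fh:
        for value in values:
            fh.write(struct.pack(">I", value))
    return len(values)


def ip_in_file(ip, path):
    """Binary-search the record file on disk for one IPv4 address.

    Only O(log n) four-byte records are read per lookup.
    """
    target = int(ipaddress.IPv4Address(ip))
    with open(path, "rb") as fh:
        lo, hi = 0, os.path.getsize(path) // 4
        while lo < hi:
            mid = (lo + hi) // 2
            fh.seek(mid * 4)
            (value,) = struct.unpack(">I", fh.read(4))
            if value == target:
                return True
            if value < target:
                lo = mid + 1
            else:
                hi = mid
    return False
```

With this layout a 560k-entry IPv4 list would take about 2.2 MB on disk (560,000 × 4 bytes). IPv6 entries and CIDR ranges are deliberately left out of the sketch.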

Anonymous spammers

This extension doesn't report IPs of anonymous spammers, does it?
Alex Mashin (talk) 16:44, 14 September 2015 (UTC)

No, it needs to be a registered user with an email/username set. Anzacdoer 2 (talk) 15:48, 14 August 2017 (UTC)

Current relevance

A few things have changed since the StopForumSpam extension was first released in 2013 which may change its effectiveness in combating spam:

  • VPS servers from reputable providers are now significantly cheaper - Linode offers a 1GB RAM server for $5 a month and Vultr offers a 512MB RAM server for $2.50 a month as of August 2017. You can easily obtain a new IPv4 address by spinning up a new instance and destroying the old one afterwards. Since the charge is by the hour, it's trivial in terms of cost to spin up a VPS for a few hours for spamming and then delete the instance before IP blacklists have a chance to catch on.
  • Internet of Things (IoT) botnets are a real threat now (e.g. Mirai, Linux/IRCTelnet, and BASHLITE) due to poorly secured consumer goods. The threat is only going to increase as more and more people buy Internet connected refrigerators, toasters, home security cameras, etc. Since these devices are attacking from an innocent device owner's network, the IP is going to be a regular consumer IP not previously known for spamming. Given that the same IPs are being used by legitimate users to register on other websites, the chance of false positives is likely to increase. The dynamic nature of many residential IPs is also probably going to add to the problem of hitting innocent users and having too many IP addresses to blacklist.
  • IPv6 is gaining wider acceptance and block assignments aren't standardized. Hosting companies typically assign a /64 per server or per account, but DigitalOcean assigns a smaller block so a /64 block would hit innocent users too. There are also simply too many individual IPv6 addresses to reasonably blacklist.

Note this isn't really about the extension itself, but rather the service the extension uses (stopforumspam.com) - an IP blacklist for combating spam appears to be increasingly ill-suited for the future due to the explosion of Internet connected devices and IP addresses out there.

Perhaps a more reasonable approach would be a domain blacklist? Domains (usually) cost money to register and we can easily identify if it's a free domain (.tk and other Freenom domains) or a newly registered domain. URL forwarders and free subdomains (e.g. *.wordpress.com) can also be blocked preemptively to prevent bypassing the blacklist, and cheap/commonly abused gTLDs can be subject to stricter requirements (e.g. XYZ, PARTY, SCIENCE). Anzacdoer 2 (talk) 16:12, 14 August 2017 (UTC)

Have you been encountering spammers slipping through the StopForumSpam extension blocklist? Legoktm (talk) 03:37, 15 August 2017 (UTC)
A few, and the number seems to be slowly ticking up unfortunately. It got my attention because SFS has typically been one of the more effective ways of stopping spam, but I noticed the number of spammers not on the blacklist has been steadily increasing for about a year and a half or so now. I ended up restricting new users from being able to submit external links, which has helped dramatically but is quite restrictive. Do you think a MediaWiki specific IP blacklist would be more effective, or is there a huge overlap between forum/comment spam and MediaWiki spam? Anzacdoer 2 (talk) 16:04, 15 August 2017 (UTC)
Hmm, that's a bummer. SFS does provide IPv6 blocklists, but we'd have to update the code to support that (filed as phab:T173399). I've thought about having a central MediaWiki server that people can send spam IPs to and then we distribute blocklists (aka the MW version of SFS), but I'm not sure we'd ever be able to reach the same level of reporting. Another idea I've had is to use the Wikimedia global block list as a source for other MW instances to use. That would probably be easier from an implementation point of view. Legoktm (talk) 06:39, 16 August 2017 (UTC)
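
On the IPv6 point raised in this thread (too many individual addresses, and allocations that are often but not always a /64), one common way to keep a list manageable is to store the enclosing network rather than each address. The Python sketch below only illustrates that idea; block_key is a hypothetical helper, the /64 default is an assumption, and nothing here reflects how the extension or StopForumSpam handle IPv6.

```python
import ipaddress


def block_key(ip, v6_prefix=64):
    """Return the string a denylist could store for this address.

    IPv4 addresses are kept as-is. IPv6 addresses are widened to the
    enclosing network of length v6_prefix (assumed default: /64).
    """
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return str(addr)
    # strict=False lets us pass a host address and get its network back.
    return str(ipaddress.ip_network(f"{addr}/{v6_prefix}", strict=False))
```

The prefix length is a parameter precisely because of the caveat above: where a provider hands out blocks smaller than a /64, widening to /64 would also catch unrelated users.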

php extensions/StopForumSpam/updateBlacklist.php does not work.

I downloaded the IP blacklist as described on the wiki. The beginning looks like:

1.0.132.45
1.0.132.71
1.0.134.19
1.0.135.30
...

(almost 470k lines)

I put it on my server, ran "php extensions/StopForumSpam/updateBlacklist.php", and it hangs forever. I started to investigate, added a little logging to the code, and saw that the variable $ip only ever contains the "." symbol, so every line ends up at "continue; // discard invalid lines". What am I doing wrong? I see fgetcsv(), which reads CSV, and my blacklist from http://www.stopforumspam.com/ is not CSV. So what should I do? Convert it to CSV? Or did I download the wrong file? Нирваньчик (talk) 20:25, 9 January 2018 (UTC)

The problem was that I had set $wgSFSIPListLocation to the wrong path. The script hangs forever in this case. Нирваньчик (talk) 20:22, 23 January 2018 (UTC)
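
For anyone hitting the same thing, a quick standalone check of the list file before running the maintenance script can rule out a wrong path or a wrong file. The Python sketch below is independent of the extension's code; check_ip_list is a hypothetical helper that fails immediately on a missing path and then counts how many lines parse as IP addresses.

```python
import ipaddress
import os


def check_ip_list(path, sample=5):
    """Fail fast on a bad path, then count valid and invalid lines.

    Returns (valid_count, invalid_count, first_few_invalid_fields).
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"IP list not found: {path}")
    valid, invalid, bad_examples = 0, 0, []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # Take the first comma-separated field so that plain
            # one-IP-per-line files and CSV-style files both work.
            field = line.split(",")[0].strip().strip('"')
            if not field:
                continue  # skip blank lines
            try:
                ipaddress.ip_address(field)
                valid += 1
            except ValueError:
                invalid += 1
                if len(bad_examples) < sample:
                    bad_examples.append(field)
    return valid, invalid, bad_examples
```

A result with zero valid lines, or an immediate FileNotFoundError, points at the path or the file rather than at the script.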

Extension status

Is there any way to get some information about this extension's working status: how many IPs are in the blacklist, whether the blacklist has been updated from the file or needs updating (file changed recently), and how many actions (edits/registrations) were prevented using the blacklist? I installed it and I can't tell whether it's working at all. I already know about Special:Version, but it only shows that the extension is running, which isn't enough; we want more. Нирваньчик (talk) 20:41, 9 January 2018 (UTC)

+1 68.179.178.199 18:08, 25 February 2020 (UTC)
I've added a Logging section to the extension page which tells you how to configure a log file which records this information. There are currently no plans to make this logging data available on-wiki. However, you are welcome to open a feature request on Phabricator to add an on-wiki interface for viewing general stats or specific logging details. --Skizzerz 20:38, 25 February 2020 (UTC)

more verbosity when updateBlacklist.php

It would be nice if updateBlacklist.php had a --verbose flag that showed how many lines have been read (every 5 seconds) and a time estimate (how much time is left). IP length is almost constant and the file size can be read quickly, so it's easy to estimate how many lines the file contains. Нирваньчик (talk) 20:47, 9 January 2018 (UTC)

I don't need this feature anymore since the script runs in 10 seconds. Нирваньчик (talk) 20:23, 23 January 2018 (UTC)
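
The estimate described above (file size divided by a near-constant line length) is easy to sketch. This Python snippet is illustrative only; estimate_line_count is a hypothetical helper and the 64 KiB sample size is an arbitrary assumption. It reads a small sample from the start of the file, measures the average line length there, and scales up to the whole file.

```python
import os


def estimate_line_count(path, sample_bytes=64 * 1024):
    """Estimate the number of lines from the file size and a leading sample."""
    size = os.path.getsize(path)
    if size == 0:
        return 0
    with open(path, "rb") as fh:
        sample = fh.read(sample_bytes)
    lines_in_sample = sample.count(b"\n")
    if lines_in_sample == 0:
        return 1  # no newline in the sample: treat it as a single line
    avg_line_length = len(sample) / lines_in_sample
    return round(size / avg_line_length)
```

A progress display could then report lines read so far against this estimate without scanning the whole file first.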

maintenance/updateBlacklist.php renamed to maintenance/updateDenyList.php

Why was updateBlacklist.php renamed to updateDenyList.php? This should at least be documented here on the extension page.

I have cron jobs that run daily to download fresh lists and update them. My task broke after upgrading MediaWiki from 1_35 to 1_36, until I figured out that the script had been renamed... -- Preceding unsigned comment added by Lwangaman (talk | contribs) 09:53, 3 October 2021 (UTC)
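
One way to make a cron wrapper survive a rename like this is to look for either script name before running it. The Python sketch below is only an illustration: find_update_script is a hypothetical helper, and the two relative paths are taken from this section's heading rather than checked against any particular release.

```python
import os

# Relative paths as named in this thread; newest name first.
CANDIDATE_SCRIPTS = (
    "maintenance/updateDenyList.php",   # name after the rename
    "maintenance/updateBlacklist.php",  # name before the rename
)


def find_update_script(extension_dir):
    """Return the first update script that exists, or None if neither does."""
    for relative_path in CANDIDATE_SCRIPTS:
        path = os.path.join(extension_dir, relative_path)
        if os.path.isfile(path):
            return path
    return None
```

A wrapper could pass the returned path to php and exit with an error when the result is None, so a future rename shows up as a clear failure instead of a silently stale list.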

wgSFSIPThreshold relevant?

I posted this on the German discussion page, but it makes more sense here. To avoid double posting, here's the link: https://www.mediawiki.org/wiki/Topic:Wkbitfkgozm6olxk -- Preceding unsigned comment added by MrThorstenM (talk | contribs) 13:49, 14 November 2021 (UTC)


link to a troubleshooting resource?

If this is the wrong place, I apologize in advance. I have installed this on my wiki following the directions on the page and it causes

Internal error [Y9yBEILL3dLYYYCw9XkQZQAAwxg] 2023-02-03 03:35:44: Fatal exception of type "Error"

On every page (the [error number] appears random)

Is there something (other than setting it in LocalSettings.php) that needs to be done for $wgMainCacheType = CACHE_DB; to work? Searching for this problem doesn't give any helpful results. -- Preceding unsigned comment added by Devon9342 (talk | contribs) 04:32, 3 February 2023 (UTC)

Me too. 165.0.131.172 12:15, 18 April 2023 (UTC)
Same issue, when running my PHP update script I also get
Error: Interface 'MediaWiki\Extension\AbuseFilter\Hooks\AbuseFilterBuilderHook' not found
 Backtrace:
 from /var/www/mediawiki/extensions/StopForumSpam/includes/Hooks.php(40)
LittleWhole (talk) 23:05, 11 June 2023 (UTC)

gzdecode(): data error in DenyListManager.php

Anyone else getting this issue? I tried setting $wgSFSIPListLocation to https://huckjones.strawberryforum.org/w/listed_ip_30_all.txt but I am somehow getting this message. Blakegripling ph (talk) 02:31, 13 May 2023 (UTC)
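
PHP's gzdecode() reports a data error when the bytes it is given are not valid gzip data, so one thing worth checking (an assumption, not a confirmed diagnosis for this report) is whether the file at that location is gzip-compressed or plain text. The Python sketch below shows the check; looks_like_gzip and read_list_bytes are hypothetical helpers and are unrelated to DenyListManager.php.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def read_list_bytes(data):
    """Decompress gzip data; return anything else unchanged."""
    if looks_like_gzip(data):
        return gzip.decompress(data)
    return data
```

Running looks_like_gzip on the first bytes of the downloaded file shows which of the two cases applies.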

Combined ipv4 and ipv6 lists

Is it possible to combine the listed_ip_30.zip and listed_ip_30_ipv6.zip lists? ~~ Forza ~~ (talk) 18:11, 29 October 2023 (UTC)
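
Mechanically, merging the two lists is straightforward once both archives are extracted to text. Whether the extension accepts a combined file is not established anywhere on this page, so treat the Python sketch below as an illustration of the merge step only. merge_ip_lists is a hypothetical helper, and it assumes one address per line in each input.

```python
import ipaddress


def merge_ip_lists(*texts):
    """Merge one-address-per-line texts into a sorted, de-duplicated list.

    Lines that do not parse as an IP address are skipped.
    IPv4 entries sort before IPv6 entries, each in numeric order.
    """
    seen = set()
    for text in texts:
        for line in text.splitlines():
            entry = line.strip()
            if not entry:
                continue
            try:
                seen.add(ipaddress.ip_address(entry))
            except ValueError:
                continue  # not an IP address; ignore the line
    ordered = sorted(seen, key=lambda addr: (addr.version, int(addr)))
    return [str(addr) for addr in ordered]
```

Writing the result out with one entry per line gives a single file containing both address families.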