Extension talk:StopForumSpam

Global blocks
Very interesting! How does this extension interact with Extension:GlobalBlocking? Can data also be submitted upon global blocks? Can one use the extension in "write only" mode, i.e. only to submit data but not to use it? If yes, it could be used by stewards/global sysops on Meta-Wiki to contribute to the list even though we would not initially use it on Wikimedia projects. --Nemo 09:42, 16 December 2013 (UTC)
 * The main issue is that stopforumspam requires any submitted data to include an email, username, and IP address (see bottom of ). So it won't work with GlobalBlocking since there would be no username or email address. Write-only mode would be possible; it just requires a config option to disable AbuseFilter integration. Legoktm (talk) 17:34, 16 December 2013 (UTC)
 * Thanks. Hm, I don't get it. Can't GlobalBlocking use some generic email/username/key configured with a global? On Meta, the email could be some service address or the stewards' OTRS queue. --Nemo 17:54, 16 December 2013 (UTC)
 * No, we don't want people to think "stewards@wikimedia.org" is a spam email. I also found, which clearly states that you need a username+email+IP. Legoktm (talk) 21:50, 16 December 2013 (UTC)
 * What do you mean "a spam email"? That forum post clarifies that the address is just one where it's possible to get answers from; setting this to some shared email for the wiki admins/whatever looks to me like something many wikis would like to do. Anyway, it's not so important. --Nemo 08:11, 17 December 2013 (UTC)
 * The (username, IP address, email address) tuple is stored in their spam database as a "spammer" (e.g. there was a spammer that registered as username using IP address and typed in and confirmed they owned email address -- forums typically require you to specify and confirm an email address before allowing registration). Submitting an email address for the wiki as that datapoint has a threefold effect (one positive, two negative). I think we just need to determine if the positive outweighs both negatives, and my hunch says that it does:
 * By submitting any email at all, the data point actually gets added to their spam database, making the submission mode much more reliable on MediaWiki sites. (If you couldn't tell, this one is the positive).
 * That email address will be marked as spam for anyone querying StopForumSpam for data. For this reason, you should use some throwaway email address instead of an actual one you send official emails from or register for accounts with.
 * The admins over at StopForumSpam might not understand the wiki culture about not requiring emails, and see this "fake" data and revoke API keys or wildcard block the entire domain due to numerous submissions under the same email address. This could likely be solved with open communication.
 * -- Skiz zerz  15:55, 17 December 2013 (UTC)
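To make the constraint discussed above concrete, here is a minimal sketch of the "all three fields or nothing" rule with an optional wiki-wide throwaway address. It is not the extension's code: `Report`, `build_report` and `fallback_email` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class Report(NamedTuple):
    """The (username, IP address, email address) tuple a submission needs."""
    username: str
    ip: str
    email: str


def build_report(username: Optional[str], ip: Optional[str],
                 email: Optional[str],
                 fallback_email: Optional[str] = None) -> Optional[Report]:
    """Return a complete report, or None if any field is still missing.

    fallback_email is a hypothetical throwaway address configured by the
    wiki; it is used only when the account has no email of its own.
    """
    email = email or fallback_email
    if not (username and ip and email):
        # Incomplete tuple, e.g. a global block on a bare IP: nothing to submit.
        return None
    return Report(username, ip, email)
```

With no fallback configured, an account without an email yields nothing to submit; with a throwaway fallback, the same account yields a complete tuple. A bare IP with no username is never submittable, which is the GlobalBlocking case.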

Performance
The StopForumSpam blacklist is very long, and there are reports of wikis being severely slowed down (to the point of becoming unusable) when enabling such huge blacklists, because bots produce a gazillion hits and the cost of each of them is too high. It would be interesting to have some guesstimate of the performance effects/CPU costs of this extension. --Nemo 09:42, 16 December 2013 (UTC)
 * The IP blacklist is loaded into shared memory (e.g. memcached, redis, APC, etc.) and checked against, so the cost of checking a single bot is the cost of fetching an item out of that cache (the IP blacklist will not function if $wgMainCacheType is set to CACHE_NONE). Updating that cache is another story entirely, as it can be quite slow, which is why we made a maintenance script to do it. By default, whenever it detects the cache is out of date, it will load a DeferredUpdate to run at the end of the current request, which in my testing on commodity hardware takes around 5-10 seconds (should be less on faster hardware, and legoktm reported that it took around 50 seconds on his test VM). You can disable this behavior, however, and only update the blacklist via the updateBlacklist.php maintenance script -- it won't run any faster, but at least it moves that work out of the web request -- running that script via cron is the recommended setup. Another performance impact I can foresee is when the 'sfs-confidence' variable is used in an AbuseFilter, as this will cause an API hit to the StopForumSpam website. However, the result of that is also cached for a period of one day, so the same bot doing multiple hits on pages with that AbuseFilter variable should only trigger some slowness/extra strain once. -- Skiz zerz  17:16, 16 December 2013 (UTC)
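For readers trying to picture the cost model described above, here is a rough sketch in Python rather than the extension's PHP. It assumes an in-process set standing in for the shared-memory blacklist and a dict standing in for the one-day confidence cache; `SpamChecker`, `fetch_confidence` and the other names are hypothetical, not the extension's API.

```python
import time

ONE_DAY = 24 * 60 * 60  # lifetime of a cached confidence result, in seconds


class SpamChecker:
    """Toy model of the two lookups described above (all names hypothetical)."""

    def __init__(self, fetch_confidence, clock=time.time):
        self.blacklist = set()       # stands in for the shared-memory IP blacklist
        self.confidence_cache = {}   # ip -> (expiry timestamp, confidence)
        self.fetch_confidence = fetch_confidence  # stands in for the remote API hit
        self.clock = clock

    def load_blacklist(self, ips):
        # The slow step; the recommended setup runs it from cron, not a web request.
        self.blacklist = set(ips)

    def is_blacklisted(self, ip):
        # Per-request cost is one membership test against the cached set.
        return ip in self.blacklist

    def confidence(self, ip):
        now = self.clock()
        cached = self.confidence_cache.get(ip)
        if cached is not None and cached[0] > now:
            return cached[1]         # still fresh: no remote call
        value = self.fetch_confidence(ip)
        self.confidence_cache[ip] = (now + ONE_DAY, value)
        return value
```

In this model a bot hitting many pages costs one remote lookup per day plus one set lookup per request, which matches the behaviour described in the reply.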