Extension talk:SmiteSpam

Feedback and suggestions are welcome! - Polybuildr (talk)

Possible to do automatic scheduled smiting?
I've been using this regularly for a few months now—after the initial scan, I never get false positives and I'd like to be bold and daring and just have the extension run on a daily basis and smite everything. Other than setting up something with some sort of browser automation, is there any way to make this happen? --DAH

I love it, though some suggestions ...
A great one, I am currently smiting heaps of stuff. --[[kgh]] (talk) 21:28, 20 October 2015 (UTC)
 * 1) It will be nice if the "SmiteSpam" special page will be accessible via the "AdminLinks" special page provided by the Admin Links extension.
 * 2) Currently it is only possible to do a select all for all pages shown on the "SmiteSpam" special. It will be nice if this is possible to do it on a user level, e.g. select all edits by user "Dagobyte Dominator"


 * Ah, I see, usually there is only one spam page per spammer, so 2 is not as useful as it looked at the beginning. --[[kgh]] (talk) 21:31, 20 October 2015 (UTC)


 * I'm very glad you found the extension useful, [[kgh]]. :D Thanks a lot for your feedback. I'll make it show up in the Admin Links soon. The second feature is also reasonably easy to implement. If it'll help I'll add it in. -- Polybuildr (talk) 11:40, 21 October 2015 (UTC)


 * Thanks for your reply. After sleeping on it I do not think it is necessary at all to add the second feature. Admittedly it took me a bit to gain experience with its behaviour when using it. I should add a more detailed description soon to the extension's page. --[[kgh]] (talk) 11:46, 21 October 2015 (UTC)


 * ✅ Links should now show up in AdminLinks interface. Tracked it as and resolved in 8eb8d80e02b. Hope it works, [[kgh]]! -- Polybuildr (talk) 18:42, 23 October 2015 (UTC)


 * Great, just installed the new version and it works perfectly! Perhaps you could start versioning and tagging your extension to keep track of small or big changes to your extension. --[[kgh]] (talk) 21:35, 24 October 2015 (UTC)

Another suggestion. Perhaps it would be nice to have the "Smite Spam" button also show up at the bottom of the special page. After one has worked her or his way through the suggested pages you have to return to the top of the page to get things going. That's a lot of scrolling and I already have reduced the respective setting to 100 to avoid gigantic pages. Cheers --[[kgh]] (talk) 21:38, 24 October 2015 (UTC)


 * Yes, that certainly makes sense. Will do soon. :) -- Polybuildr (talk) 15:36, 26 October 2015 (UTC)

Re: "Currently it is only possible to do a select all for all pages shown on the "SmiteSpam" special. It will be nice if this is possible to do it on a user level, e.g. select all edits by user 'Dagobyte Dominator'"

There is a checkbox like that currently in the extension, maybe. If it's present, it will be wedged between the user's name and the "Trust" button - directly above the list of that user's posts. Unfortunately, that checkbox completely goes away if the user has been banned, leaving just a (user banned) comment. 204.237.89.128 02:11, 27 April 2023 (UTC)

recentchanges table
Does this extension rely on RC, or revision? A user on #mediawiki some weeks ago said the extension didn't find/report/suggest for deletion any of their (rather stale) spam pages. --Nemo 14:52, 16 September 2016 (UTC)


 * I'm not sure why that would happen. The extension selects pages by looking at the  table and  retrieves the current content by calling , which should be the content of the latest revision. The pages are ordered by page_id though, and pages created by "trusted users" will not show up in the list either so maybe that has something to do with the problem. Of course, it could just be that the extension fails to flag those pages as spam. Is there a way we can get further details from the user? - Polybuildr (talk) 20:21, 16 September 2016 (UTC)


 * The extension starts scanning by page_id 1 and going upwards. This is infeasible on large wikis with lots of pages. I think it should be checking newest pages first. Also the default setting for wgDisplayPageSize should be smaller to show results quicker if you don't have a lot of spam. Nikerabbit (talk) 06:37, 31 August 2018 (UTC)


 * Good point, don't know why I didn't consider this at the time. The first will just require a small change to https://github.com/wikimedia/mediawiki-extensions-SmiteSpam/blob/a48115b48d29/includes/SmiteSpamAnalyzer.php#L60 in case someone else plans to do it, not sure when I'll be able to do it myself. Polybuildr (talk) 11:16, 31 August 2018 (UTC)


 * https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SmiteSpam/+/456601 Nikerabbit (talk) 11:36, 31 August 2018 (UTC)


 * Thank you! Polybuildr (talk) 12:10, 31 August 2018 (UTC)
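
For anyone picking this up later, the newest-first idea in this thread can be sketched independently of the extension's PHP. This is only a minimal Python illustration, not the code from the Gerrit change linked above; `newest_first_batches` and its parameters are hypothetical names. It walks page IDs downward from the highest ID in fixed-size batches, so recently created pages are examined first.

```python
from typing import Iterator, List


def newest_first_batches(max_page_id: int, batch_size: int) -> Iterator[List[int]]:
    """Yield page-ID batches from the highest ID down to 1.

    Hypothetical helper: a real wiki has gaps in page_id (deleted pages),
    so a database query ordered by page_id descending would replace this range.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    upper = max_page_id
    while upper >= 1:
        lower = max(upper - batch_size + 1, 1)
        # IDs inside a batch are also listed newest first.
        yield list(range(upper, lower - 1, -1))
        upper = lower - 1
```

With a small batch size, results for a wiki with little spam would show up after the first batch instead of after a full scan from page_id 1.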

Why only 250 at a time?
I have 35k spams on my wiki. I'd like to just add my trusted users and smite everything else on one click. Clicking through 250 at a time is a pain. Wwwhatsup (talk) 17:00, 26 September 2018 (UTC)


 * Hmm, yes, 250 is too low for your case. You can configure that number using the configuration variables. More details at https://www.mediawiki.org/wiki/Extension:SmiteSpam#Parameters Polybuildr (talk) 20:18, 26 September 2018 (UTC)


 * Aha! A RTFM situation. Just upped it to 5k :) Non MR users might appreciate the option right there on the page to reload with an increased page count. Wwwhatsup (talk) 01:56, 29 September 2018 (UTC)


 * I'd be cautious. If an ongoing spambot attack has left tens of thousands of "bad" pages on your wiki, doing this 250 at a time (the default) isn't practical. Unfortunately, if you set this to 1000 or 5000 or something reasonable, the script will be silently killed by PHP as soon as it reaches the execution time limit specified in php.ini - with the icon still spinning aimlessly in the browser and no indication to the user that the process has died. I'd like to see some way for the user to specify how many pages to return, without having to edit files like extension.json (in the individual extension's folder) or php.ini (which affects the entire server). I don't think there is one yet. 204.237.89.128 01:16, 27 April 2023 (UTC)

Smite spam page not working
I can't get this extension to work. When I go to Special:SmiteSpam, under "Possible spam pages" it only shows a loading animation that never goes away. No pages are ever listed.

Could the size of my wiki be the issue? I just left the default configuration of the extension, so I believe only 500 pages are examined each time.

Edit: I just uploaded the newest version but Special:Version shows the installed version as 0.1 (should be 0.3). Also, when I activate the extension using  my entire Wiki goes blank. If I use  it seems to work (until I go to Special:SmiteSpam). --Lost Student (talk) 16:04, 25 July 2020 (UTC)


 * Try tweaking the configuration variables. Also enable error reporting so that you can actually see what is causing the blank pages. Nikerabbit (talk) 14:22, 30 July 2020 (UTC)
 * Yes, and also - you should download the latest version of SmiteSpam. Don't use ExtensionDistributor (which I'm guessing is what you used) to download this extension. Yaron Koren (talk) 16:10, 30 July 2020 (UTC)
 * Blame me for trying to use standard templates and thus inadvertently suggesting using ExtensionDistributor. I've reverted the problematic parts of that edit. * Pppery * it has begun 02:51, 2 August 2020 (UTC)

Smite spam page not working
It keeps on loading but does not detect any spam pages. It worked fine before, but now it is not working. I'm using MediaWiki 1.34.0 and the latest version, SmiteSpam 0.3. Ramu ummadishetty (talk) 08:11, 13 November 2020 (UTC)


 * Can you look in the console for your browser (if you know how to do that) and see if there are any JavaScript errors on that page? Yaron Koren (talk) 16:32, 13 November 2020 (UTC)
 * Thank you @Yaron Koren for responding. I don't see any JavaScript errors on that page. Ramu ummadishetty (talk) 19:32, 13 November 2020 (UTC)

Infinite loading
On no.everybodywiki.com/Special:SmiteSpam

the spinner loads indefinitely

Version of the extension 0.3 (6a601df)

MW version 1.37.3

No related JS errors seen in the console.

Thx WikiMaster (EverybodyWiki) (talk) 09:32, 12 July 2022 (UTC)


 * Sorry about that! It turns out that SmiteSpam was not working with MediaWiki versions 1.37 or higher. I just checked in what I think is a fix for this - if you get the latest code for this extension, it should hopefully work. Yaron Koren (talk) 15:27, 12 July 2022 (UTC)
 * Thanks a lot! Could you please cherry-pick your edit for branches 1_37 and 1_38? Seb35 has already initialised the cherry-pick. WikiMaster (EverybodyWiki) (talk) 15:12, 21 July 2022 (UTC)
 * I approved the patches, so I think it's done now. (I'm personally against the use of "REL" branches for non-WMF extensions, but some people like them.) Yaron Koren (talk) 16:47, 21 July 2022 (UTC)

It looks like this is still happening. On MediaWiki 1.35 I sometimes see a condition where going to Special:SmiteSpam just leaves the icon spinning forever. In Apache's error.log I see this:
 * [Thu Apr 27 00:25:35.953131 2023] [php7:error] [pid 9574] [client 204.237.89.128:40424] PHP Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/wiki/includes/libs/rdbms/database/Database.php on line 2814, referer: wiki.example.org/wiki/Special:SmiteSpam

Unfortunately, it does nothing to inform the user what has happened. The icon just keeps spinning, as if it were trying to fool the user into thinking something is still happening when the task has already died. Changing the maximum number of pages (configured in the ./extensions/SmiteSpam/extension.json file) might help, but there's no way from the user's browser to detect that the error has occurred or to adjust the number of pages to compensate. It would be preferable, as soon as the time taken nears the php.ini execution limit, to just return what you already have and exit normally instead of having the script silently killed by PHP.

Apparently there is a way to register a shutdown function in PHP at www.php.net/manual/en/function.register-shutdown-function.php which could be used to notify the user that they are requesting too many pages at once. Unfortunately, the question of how many pages is "too many" tends to vary based on the contents of the wiki and the speed of the processor. It might be 500, it might be 5000, but the limit seems to become more of a problem later - once much of the spam has already been nuked, the script needs more time to search to find however many pieces of spam the configuration file has requested be returned.

There's some discussion at stackoverflow.com/questions/6861033/how-to-catch-the-fatal-error-maximum-execution-time-of-30-seconds-exceeded-in-p which suggests retrieving the maximum execution time from PHP, reading the time at the beginning of script execution, then clock-watching periodically so that the script can bail out gracefully with whatever results it has as soon as the time looks to be getting close to the system-configured limit.
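
The clock-watching approach described above can be sketched as follows. This is a Python illustration of the general technique only, not the extension's code: `process_with_deadline`, `safety_margin` and the injectable `clock` are hypothetical names. In PHP the limit itself would presumably come from `ini_get('max_execution_time')`; here it is simply passed in as `time_limit`.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_with_deadline(
    items: Iterable[T],
    work: Callable[[T], R],
    time_limit: float,
    safety_margin: float = 0.2,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[R], bool]:
    """Process items until the time budget is nearly used up.

    Returns (results, finished). `finished` is False when the loop stopped
    early, so the caller can tell the user that the list is partial.
    """
    start = clock()
    # Keep a fraction of the limit in reserve for building the response.
    budget = time_limit * (1.0 - safety_margin)
    results: List[R] = []
    for item in items:
        # Check the clock before each unit of work, not after the limit is hit.
        if clock() - start >= budget:
            return results, False
        results.append(work(item))
    return results, True
```

The `finished` flag is the important part: it lets the server return whatever it has found so far together with an explicit "stopped early" marker, instead of being killed with nothing returned.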

I'd like to be able to have some means of specifying parameters when invoking this from the browser, maybe example.wiki/index.php/Special:SmiteSpam?pages=1000&bot=1 because it's a bit unrealistic to require changing an extension.json item on the server every time this error turns up in Apache's log. 204.237.89.128 01:06, 27 April 2023 (UTC)

Trust User button also deletes selected pages/bans selected users
I haven't tested this beyond the initial experience (so there's a chance something else is going on), but I chose select all to start and was planning on deselecting non-spam entries. When I clicked the "Trust User" button for a user, after trusting that user, it started deleting all the selected pages and banning the selected users, several of which were not spam. Easy enough to work around by just trusting users before selecting pages, but would be nice if it didn't do that. Trekopep (talk) 16:46, 2 September 2022 (UTC)


 * I've seen this too. I usually abruptly force the browser to reload the page or close the tab in order to interrupt the run, then I have to go to Special:RecentChanges, find what it's banned or deleted (and it usually bans the users first) and manually fix the mess by unblocking the individual false-positive users. Not nice. 204.237.89.128 01:11, 27 April 2023 (UTC)

Add sysops and long-established users to "trusted" list by default?
I see that initially the "trusted" list is blank. That can lead to a few extra false-positives which could have been averted. It's entirely possible for a sysop to block themselves or block another admin - not too serious if admins are able to unban themselves, but an annoyance if other senior users are being blocked in error.

As a related issue: pages in the "MediaWiki" namespace, such as MediaWiki:Spam-blacklist or MediaWiki:Sidebar, really shouldn't be being analysed here to determine whether they're spambot droppings. They're part of the user interface, only sysops are able to edit that namespace and deleting the page would affect how the entire wiki functions - usually by setting some piece of the interface back to MediaWiki defaults. 204.237.89.128 01:20, 27 April 2023 (UTC)
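
A rough sketch of the filtering suggested here, in Python rather than the extension's PHP. `Candidate`, `filter_candidates` and the field names are hypothetical and not taken from SmiteSpam; the only fixed fact used is that the MediaWiki namespace has index 8. The list of trusted users (for example all sysops) is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List

NS_MEDIAWIKI = 8  # index of the interface-message namespace in MediaWiki


@dataclass(frozen=True)
class Candidate:
    title: str
    namespace: int
    creator: str


def filter_candidates(
    pages: Iterable[Candidate],
    trusted_users: AbstractSet[str],
    excluded_namespaces: AbstractSet[int] = frozenset({NS_MEDIAWIKI}),
) -> List[Candidate]:
    """Drop pages that should never be offered for smiting."""
    return [
        page
        for page in pages
        # Skip interface pages and anything created by a trusted account.
        if page.namespace not in excluded_namespaces
        and page.creator not in trusted_users
    ]
```

Seeding `trusted_users` with the sysop group before the first scan would avoid offering to block an admin at all.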

Is there an option to hide the flood of Special:RecentChanges entries?
I believe, for an ordinary page deletion, there was some way for an admin to set the 'bot' flag to hide a long series of deletions from Special:RecentChanges - I think it used a bot=1 parameter like www.example.wiki/index.php?title=Some_spammy_page&action=delete&bot=1

There needs to be something similar in this extension. Cleaning up a particularly bad spambot infestation often involves thousands of page deletions and blocking hundreds of bulk-created accounts. That pretty much crowds out anything else in Special:RecentChanges while this is running. 204.237.89.128 01:25, 27 April 2023 (UTC)


 * I'm not aware of any way to retroactively mark an edit as bot-generated - or, for that matter, any other way to flag edits so that they no longer show up in Special:RecentChanges. If you find one, though, it would be good to know about. Yaron Koren (talk) 14:59, 27 April 2023 (UTC)


 * Maybe what I've asked is unclear. There are three types of bot scripts:
 * a) Good bots which are part of the project, like Pywikibot. They carry out repetitive tasks like removing images from articles if the image was deleted from Commons or changing category tags on every article if a category has been renamed. These tasks are mundane but repetitive, so a flag is set in the RecentChanges table to indicate 'b' (user is a helpful bot). Those edits don't appear unless the user clicks the "show bots" link above the list of recent changes.
 * b) Crawlers and spiders of varying levels of utility. Some are legit (if they're feeding search engines). Some are less desirable - but all of them read without changing the wiki. Googlebot and Bingbot would be the obvious examples. They are run by external entities, but their IP address and user agent will clearly identify them as bots and show who owns them. They usually at least attempt to obey robots.txt so as not to place too much load on the servers.
 * c) Outright malicious bots, such as spambots. They deface pages by spamming links to external sites, they create spurious new account registrations in bulk, they refuse to indicate that they are indeed bots and not legit human users.
 * The 'bot' flag in Special:RecentChanges and the associated database table is for bots in category (a). There is a way for a sysop to set this flag on the deletion of one individual page, so that these don't clutter recentchanges unless the user clicks "show bots" to make them visible.
 * SmiteSpam deletes thousands of spam pages and blocks hundreds if not thousands of automatically-created spurious accounts. If those changes had the 'bot' flag (b) in Special:RecentChanges, that would avoid the page being flooded with thousands of entries unless the user expressly asks to "show bots". 204.237.89.128 22:54, 27 April 2023 (UTC)
 * I think I understand all of that, but again, I don't know how to retroactively change the "bot" flag. Yaron Koren (talk) 21:15, 28 April 2023 (UTC)

Pages with multiple editors?
The handling of pages which have been edited by multiple users leaves much to be desired. Here's how it can go wrong:


 * User:AGoodPerson creates a page or template with valid, useful content
 * User:EvilSpambot666 then replaces the entire body text with a page of spamlinks to herbal-v1agra.example.com
 * Special:SmiteSpam helpfully tells the administrator that there is a whole page of spammy v1agra links, and graciously points out that that page was originally created by User:AGoodPerson. It then provides two handy checkboxes, one to block User:AGoodPerson from the wiki forever for being the original creator of this horrible, spammy page and another to delete the page and all of its revisions. Just to be extra helpful, the extension displays the current content of the page (alas, with no indication of the existence of prior revisions, multiple editors or anything else).
 * The administrator, relying on this handy extension, goes off and does whatever bastard operators from hell normally do. Aren't computers great?

The extension is doing what its original creator programmed it to do. I'm just not sure that this is what they intended to see happen? 204.237.89.128 01:37, 27 April 2023 (UTC)


 * That's true, yes. My experience with spammers is that they tend not to modify "good" pages, but simply create their own pages - with "Main Page" being the big exception. Maybe some spammers now work differently, though - in which case the handling should definitely change. Yaron Koren (talk) 15:02, 27 April 2023 (UTC)


 * I think that there are other places in MediaWiki (such as an admin deleting one individual page) where a one-line message does appear to inform the user that the article has (n) multiple revisions or has been edited by multiple users - I don't remember the exact wording. It doesn't stop anyone from going ahead with the deletion, it just gives enough info that the user knows that there are other prior revisions. The user is then free to decide if those might be worth checking before trashing the whole mess. 204.237.89.128 22:44, 27 April 2023 (UTC)
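
The check discussed in this thread could look roughly like the sketch below. It is Python, not the extension's PHP, and `Revision`, `summarize_authorship` and `multi_editor_notice` are hypothetical names; the wording of the notice is made up and is not MediaWiki's actual message. The revision list is assumed to be ordered oldest first.

```python
from dataclasses import dataclass
from typing import List, NamedTuple


class Revision(NamedTuple):
    rev_id: int
    user: str


@dataclass(frozen=True)
class AuthorshipSummary:
    creator: str
    latest_editor: str
    revision_count: int
    distinct_editors: int


def summarize_authorship(revisions: List[Revision]) -> AuthorshipSummary:
    """Summarize who created a page and who wrote its current text."""
    if not revisions:
        raise ValueError("a page needs at least one revision")
    return AuthorshipSummary(
        creator=revisions[0].user,
        latest_editor=revisions[-1].user,
        revision_count=len(revisions),
        distinct_editors=len({rev.user for rev in revisions}),
    )


def multi_editor_notice(summary: AuthorshipSummary) -> str:
    """Return a one-line warning, or an empty string for single-author pages."""
    if summary.distinct_editors <= 1:
        return ""
    return (
        f"This page has {summary.revision_count} revisions by "
        f"{summary.distinct_editors} users; the current text was last edited "
        f"by {summary.latest_editor}, not necessarily by {summary.creator}."
    )
```

For the example above, the summary would name User:EvilSpambot666 as the latest editor, which is the account the "block" checkbox arguably ought to point at.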

Error checking?
I'm seeing a few cases where the server returns an error. The browser debug window (press F12 and - if it doesn't appear immediately - keep repeatedly pressing it, then go to the "network" tab) looks like:

GET example.wiki/api.php?action=query&meta=tokens&format=json returns:
 * {"batchcomplete":"","query":{"tokens":{"csrftoken":"d88e757788d1dd9a6ccafedf7ec209e86449f621+\\"}}}

(so far, so good) then:

GET example.wiki/api.php?action=smitespamanalyze&format=json&offset=0&limit=250 returns:
 * {"error":{"code":"internal_api_error_MWException","info":"[af601560de4ab0bf1a42d076] Caught exception of type MWException","errorclass":"MWException"}}

(uh, oh). There's no indication to the user that the request (or a subsequent request for the next 250 pages, if the script keeps firing these off to the server for some reason - until one fails) has already returned, and has ended with an error. The icon just keeps spinning forever as if something were still happening, when the operation has failed for reasons unknown and will likely fail again if retried with the same pagecount.

In some cases, the browser will go into a seemingly-endless loop requesting the next n pages, the next n pages after that... ad infinitum. In some cases, the server will return an HTTP 500 or some similar code - possibly by hitting PHP's execution time limit. Sometimes, the server will return a JSON error string, but the browser will just keep spinning instead of displaying any indication that the operation has failed.

No idea what caused the original error, but telling the user that it's still processing isn't helpful. 204.237.89.128 04:45, 27 April 2023 (UTC)


 * Could you add  to your LocalSettings.php? That would lead to a more helpful error message, if/when you see this problem again. Yaron Koren (talk) 15:03, 27 April 2023 (UTC)

In one example:
 * GET example.org/api.php?action=smitespamanalyze&format=json&offset=0&limit=1000 returns valid data
 * GET example.org/api.php?action=smitespamanalyze&format=json&offset=1000&limit=1000 then fails by returning this:

No idea whether the issue is with the extension code or with my user/actor tables being inconsistent in some way. I also can't say that every instance in which the browser shows the icon still spinning to infer that processing is still taking place when the server has already returned an abnormal termination is happening in every context in which something fails. Every time. Even if the first request successfully returns data and a subsequent API call to request more pages fails, that little circle keeps spinning instead of the browser-side JS either reporting that an error has occurred or just displaying whatever valid data did come in before the error.

It doesn't matter whether it's the server repeatedly returning the JSON encoding of an empty array, it doesn't matter if it's an HTTP 500 or the PHP script being killed off outright for exceeding the php.ini execution time limit, it doesn't matter if it's the server returning an invalid result or a JSON-encoded error message like the one above: if something goes wrong, the user is never informed of the error and the browser sits endlessly spinning in order to mislead the user that processing is still taking place. At that point, there is no termination condition - unless every API call succeeds, the user can wait forever watching that little spinning circle icon and dream (in vain) that actual processing is somehow continuing and will complete some day.

There needs to be some error handling on the client side to inform the user about what's actually happening - and that never occurs with the extension (which is otherwise a useful tool).
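
To make the request concrete, here is a minimal sketch of the decision the client would need to make after each API reply. It is written in Python for brevity, although the real client code is browser-side JavaScript, and `classify_reply` is a hypothetical name. The only response shapes assumed are the ones quoted in this thread: an HTTP error status, a JSON object with an `error` member, and an empty JSON array. Anything else is treated as opaque valid data.

```python
import json
from typing import Any, Optional, Tuple


def classify_reply(status: int, body: str) -> Tuple[str, Optional[str]]:
    """Return ("ok" | "done" | "error", message) for one API reply."""
    if status != 200:
        # e.g. HTTP 500 after PHP hit its execution time limit
        return "error", f"HTTP {status}"
    try:
        data: Any = json.loads(body)
    except json.JSONDecodeError:
        return "error", "response was not valid JSON"
    if isinstance(data, dict) and "error" in data:
        err = data["error"] if isinstance(data["error"], dict) else {}
        return "error", f'{err.get("code", "unknown")}: {err.get("info", "")}'
    if not data:
        # Empty array or object: nothing more to fetch, so stop requesting.
        return "done", None
    return "ok", None
```

Only "ok" should trigger a request for the next batch. "done" and "error" are both termination conditions, and on "error" the spinner would be replaced by the message while any results already received stay on screen.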

In some cases, an API call fails where a similar call requesting a smaller number of pages might have succeeded. Unfortunately, there's no way to change the number of pages requested (short of editing extensions/SmiteSpam/extension.json on the server) and no way to see what's going on short of hitting  or whatever launches the in-browser debugger. 204.237.89.128 22:28, 27 April 2023 (UTC)

May a banned user be trusted?
An odd case:

A normal user has [username] [ ], checkbox to "block", pushbutton to "Trust" above the list of their posts.


 * Conversely, a "banned" user has just [username] and (blocked) - no checkbox, no "Trust" button. It's possible that a user is partially or temporarily blocked, but still needs to be added to Special:SmiteSpamTrustedUsers to indicate they are not robots and should not have their valid posts being suggested for deletion by this extension.

For instance, a ban or block like:


 * This user is currently partially blocked. The latest block log entry is provided below for reference: 18:12, 4 July 1776 MaryBarra (talk contribs block) changed block settings for BillFord (talk contribs) preventing edits on the page Cadillac_(motorcar) with an expiration time of 06:05, 30 July 2099 (Please stop editing this article to say "buy a new Mustang GT instead") (unblock | change block) View full log

probably doesn't indicate an actual spambot. The only way to get him off the list so that he's not being checked as a possible bot would be to go to Special:SmiteSpamTrustedUsers and add him manually, as there's no "Trust" button to press to indicate he is not a robot. 204.237.89.128 04:45, 27 April 2023 (UTC)

IP blacklists?
There are a few lists of spambot IP's in various formats - some are downloadable lists (like cleantalk.org/blacklists - which is non-free), some are RBL (real-time blacklist) aka DNSBL (domain name server which accepts blacklist enquiries), some are lists of IP's blocked locally or with an extension like Extension:GlobalBlocking on the same or another wiki (some may want to import Wikipedia's list). Usually these lists are used when deciding whether to accept new posts or new connection attempts from some particular IP (and a search for "RBL blacklist" finds hundreds of options, mostly for spam e-mail blocking but occasionally for forum, wiki or blog comment spam). Would it be practical to import one of these lists of spammy IPv4's and check it against the users who have already posted (which Extension:ConfirmEdit seems to keep for thirty days by default)? So basically:


 * Spambot posts to your wiki yesterday. Their IP is logged by ConfirmEdit.
 * The same spambot gets caught posting spam to a dozen other sites today. It's blacklisted by stopforumspam, cleantalk or whomever.
 * The SmiteSpam extension runs on your wiki tomorrow, finds posts from an IP which is now on the "grit list" for spamming and asks if that IP's posts on your system are spam which should be nuked or auto-generated accounts which should be blocked.
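
The last step of the list above amounts to a set lookup, which can be sketched as follows. This is Python using only the standard `ipaddress` module; `flag_blacklisted_users` is a hypothetical name, and both inputs (the user-to-IP mapping from the wiki's logs and the imported blacklist) are assumed to be available already. Blacklist entries may be single addresses or CIDR ranges.

```python
import ipaddress
from typing import Dict, Iterable, List


def flag_blacklisted_users(
    logged_ips: Dict[str, str], blacklist: Iterable[str]
) -> List[str]:
    """Return the users whose logged IP falls inside any blacklisted entry."""
    # strict=False also accepts entries such as "203.0.113.5/24".
    networks = [ipaddress.ip_network(entry, strict=False) for entry in blacklist]
    flagged = []
    for user, ip in logged_ips.items():
        address = ipaddress.ip_address(ip)
        if any(address in network for network in networks):
            flagged.append(user)
    return sorted(flagged)
```

The flagged users would then only be suggested to the admin for review, as with the existing spam checks, since a shared or reassigned IP can easily produce false positives.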

Just a suggestion. I learned of SmiteSpam recently from workingwithmediawiki.com/book/chapter14.html and it has been a very valuable tool, bugs and all, because I'm trying to run damage control after a major spambot attack which left tens of thousands of bad pages across multiple projects. It likely wouldn't have been easy or feasible to clean up this mess from the browser without this fine tool. Thanks. 204.237.89.128 06:09, 27 April 2023 (UTC)


 * ConfirmEdit? Shouldn't that be Extension:CheckUser? 66.102.87.40 16:38, 27 April 2023 (UTC)