Extension talk:AbuseFilter/Rules format

19 previous topics. Previous discussion was archived at Extension talk:AbuseFilter/Rules format/Archive 1 on 2016-10-24.

How to detect "br /" ?

Erik Baas (talkcontribs)

This filter:

added_lines irlike ".*\<(\/br|br\/|br&#32;\/)\>.*"

detects "/br" and "br/", but not "br /"! Replacing "&#32;" with a literal space character doesn't work either. What am I doing wrong?

Daimona Eaytoy (talkcontribs)

Entities should not be encoded, so replacing "&#32;" with a literal space is the right thing to do. You can test the modified regex on Special:AbuseFilter/tools and you'll see that it does match "br /". Also, regexps are not anchored in AbuseFilter, meaning you can omit the ".*" at the beginning and end of the expression. You could rewrite the above as added_lines irlike "<(\/br|br *\/)>", and this should work as far as I can see.

BDavis (WMF) (talkcontribs)

You could try using \s* as the PCRE expression for "zero or more consecutive whitespace characters". That could look something like .*<\/?br\w*\/?>.*. That regex could be explained as "anything, followed by '<', optionally followed by '/', followed by 'br', followed by zero or more consecutive whitespace characters, optionally followed by '/', followed by '>', followed by anything".

Erik Baas (talkcontribs)

Yes, thank you!

Od1n (talkcontribs)

Minor mistake in BDavis's code: it should be .*<\/?br\s*\/?>.* (with \s* rather than \w*). Also, as already noted by Daimona Eaytoy, you may omit the leading and trailing .*; they are useless and only make the regexp slower (because of backtracking).
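
For anyone who wants to check the difference between the two patterns outside the wiki, here is a minimal sketch in Python. It assumes Python's re module behaves like AbuseFilter's PCRE for these simple constructs (which should hold for \s, \w and ?), and the names BR_TAG, BR_TAG_TYPO and matches are made up for this illustration.

```python
import re

# Corrected pattern from this thread, without the redundant leading/trailing ".*".
# re.IGNORECASE stands in for the case-insensitivity of irlike.
BR_TAG = re.compile(r"<\/?br\s*\/?>", re.IGNORECASE)

# The earlier variant used \w* (word characters) where \s* (whitespace) was meant.
BR_TAG_TYPO = re.compile(r"<\/?br\w*\/?>", re.IGNORECASE)


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text (unanchored search)."""
    return pattern.search(text) is not None


samples = ["<br />", "<br/>", "</br>", "<BR  />", "<break>"]

for sample in samples:
    print(f"{sample!r}: corrected={matches(BR_TAG, sample)}, typo={matches(BR_TAG_TYPO, sample)}")
```

The \w* variant misses "<br />" because a space is not a word character, and it also matches unrelated tags such as "<break>".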

oauth_consumer

Ponor (talkcontribs)

Can someone please add the oauth_consumer variable to the table and list (some of) its values? It was announced in January 2022: "A oauth_consumer variable has been added to enable identifying changes made by specific tools." [42]

Why was "minor_edit" removed and obsoleted?

Equinox (talkcontribs)

An explanation would be nice. Also what are we supposed to write instead, to get the same effect?

Matěj Suchánek (talkcontribs)
Daimona Eaytoy (talkcontribs)

Exactly. Moreover, I think (but I'm unsure where I may have heard this) it's also because there's little benefit in having minor_edit: it wouldn't mean much in many typical use cases for filters.

Ciencia Al Poder (talkcontribs)

Too bad it was requested in phab:T28636 and phab:T19674. However, I understand why it was removed, since it affects stashed edits.

The minor flag is not really meaningful, and if you want to enforce the minor edit flag (or the lack of it), you should implement your own validation with a hook.

Check an uploaded file text

Andriy.v (talkcontribs)

I want to check whether an uploaded file, with dimensions given by the variables file_width and file_height, contains a specific piece of wikitext. According to the note on the main page, "Variables related to the page edit, including summary, new_wikitext and several others, are now available for action='upload'", so I suppose I can use new_wikitext for this purpose, but it does not work. Can someone help me with this issue?

Sakretsu (talkcontribs)

It looks like the new_wikitext variable isn't available when examining past uploads. Try creating your filter, it should work.

Daimona Eaytoy (talkcontribs)

new_wikitext should indeed work. Edit-related variables are not available when using /test because that data is saved separately in the recentchanges table, and AbuseFilter doesn't query it at the moment.

Andriy.v (talkcontribs)

Yes, I've already seen that. Thank you both for confirming. Thanks a lot :)!

Summary by Clump

Subpages not included in the filter.

Clump (talkcontribs)

I was wondering how this edit got through filter 57, but I am guessing that reverting an edit is not one of the editing actions that the abuse filter tracks--is that the case? If so, is it possible to include revert?

Pppery (talkcontribs)

It has nothing to do with reverts. The edit didn't trigger the filter because the filter only applies to root user pages, not user subpages.

Clump (talkcontribs)

Ah, thank you---I missed the "/"! :)

"is_proxy" instruction

Andriy.v (talkcontribs)

Can someone explain to me how to use the "is_proxy" instruction? The description says that it returns "true" (boolean), but then it lists an integer data type. Which is correct?

Matma Rex (talkcontribs)

I've never actually used it, but that probably means that it's 1 when the edit was performed through a proxy, and 0 otherwise.

Andriy.v (talkcontribs)

Thank you @Matma Rex: is this instruction actually supported in Wikimedia projects?

Daimona Eaytoy (talkcontribs)

Correct: apparently, AutoProxyBlock is using integers rather than booleans. I've updated the docs.

And no, AutoProxyBlock is not installed on Wikimedia projects.

Add effective user age

Il Gatto Obeso (talkcontribs)

I (and probably several wiki admins) would like a "user age since first edit" variable to be added. When will it be possible?

DannyS712 (talkcontribs)

Please file a task on Phabricator to request a new feature like this, though it would likely be expensive to implement (user age can be easily read from the `user` table, age since first edit less so).

Il Gatto Obeso (talkcontribs)

Alternatively, since nobody has needed to implement this so far, the same work could be done by a MediaWiki extension that adds the above feature to AbuseFilter (if it can be extended at all) and creates a new metadata table. I also thought about Extension:Editcount or something similar, or about using MediaWiki's own cache (if it even exists) to store that info, but I currently have no clear idea of a possible implementation.

MarioSuperstar77 (talkcontribs)

I'm trying to figure out a way to prevent DDoSers from editing the same page a million times on my wiki (this can be done with bots, mind you), but none of the functions here seem to work.

The closest to what I want is timestamp, which checks the current time; however, it is the current time of the wiki, not the time of the last edit on any given article. I did not realize this immediately, so I wrote timestamp <= (timestamp+10), thinking this would stop the spambots from editing and causing issues on my wiki, but it flagged all the edits made by other users.

I eventually opted to prevent editing on my wiki unless users register an email address, until I get this sorted out.

What I need does not seem to exist on this list, so it would be great to have a function like last_edit to check when the last edit was made on an article.

Before you tell me to just "ban them", keep in mind spammers can just register another account and evade IP bans through VPNs, just help me fix my issue instead of gossiping.

Zzuuzz (talkcontribs)

You can probably use an AbuseFilter throttle for this. Simply check for any edit to a specific page or group of pages, perhaps for new or non-confirmed users only, and then set the throttle limit to, say, one edit per day, with a disallow action. In other words, let the throttle handle the time tracking. Depending on your problem you could also group the throttle by page or by user.

Ciencia Al Poder (talkcontribs)

Use Manual:$wgRateLimits for that.

AbuseFilter would need a timestamp of the last (current) edit on that page, which doesn't exist as a variable, but you may request it if not requested already.

Daimona Eaytoy (talkcontribs)

As mentioned by Zzuuzz above, the canonical solution here is using throttling. You'll want your filter pattern to just check the specs of the current edit, e.g. !("confirmed" in user_groups), and then configure the throttle action to allow say one edit on the same page every x minutes, and add "disallow" or "block" as additional action. Also, note that timestamp <= (timestamp+10) is a tautology, regardless of the language we're talking about.

You can also propose the implementation of a new variable, but I'd like to see concrete use cases before implementing new variables.

Finally, I do think that using $wgRateLimits as a safety net is really a good idea, regardless of spammers and filters.

HTH!
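
To make the two points above concrete, here is a rough Python sketch. It is only an illustration, not how AbuseFilter implements throttling internally: naive_condition and the PageThrottle class are hypothetical names, and the sliding-window counting (including the choice to count blocked attempts too) is an assumption made for the example.

```python
from collections import defaultdict, deque


def naive_condition(timestamp):
    # Mirrors `timestamp <= (timestamp + 10)`: both sides use the same value,
    # so this is true for every edit, which is why all edits were flagged.
    return timestamp <= timestamp + 10


class PageThrottle:
    """Hypothetical sliding-window throttle: at most `limit` hits per `period` seconds per key."""

    def __init__(self, limit, period):
        self.limit = limit
        self.period = period
        self.hits = defaultdict(deque)  # key (e.g. page title) -> timestamps of recent hits

    def should_block(self, key, now):
        window = self.hits[key]
        # Forget hits that fell out of the time window.
        while window and now - window[0] >= self.period:
            window.popleft()
        # Record this attempt, then compare the count against the limit.
        window.append(now)
        return len(window) > self.limit


throttle = PageThrottle(limit=1, period=600)  # one edit per page every 10 minutes
print(throttle.should_block("Sandbox", now=0))    # False: first edit in the window
print(throttle.should_block("Sandbox", now=30))   # True: second edit within 10 minutes
print(throttle.should_block("Sandbox", now=700))  # False: earlier hits have expired
```

The filter pattern itself only needs to describe the edit (for example, who is making it); the time tracking lives entirely in the throttle, keyed by page, by user, or both.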

Alternatives to added_lines for new pages

Wedhro (talkcontribs)

I'm writing a basic anti-spam filter that checks added_lines but it seems to only work on edits of already existing pages, not for new pages, which have no diffs. Would new_text do the job? Any other alternative?

Ciencia Al Poder (talkcontribs)

added_lines works for new pages too. On page creation, added_lines and new_text are the same

Wedhro (talkcontribs)

I'm asking this because the spam filter prevented all edits including a URL except one, which was a newly created page.

Daimona Eaytoy (talkcontribs)

added_lines is likely not the cause. The variable is available for all edits, including new page creations. As an alternative, you may try new_wikitext (not new_text), but the result will be the same.

Wedhro (talkcontribs)

You're right, now I see where the problem is: there's a & !"://" in removed_lines to allow edits on already existing URLs so that only new URLs are prevented, but that doesn't work on new pages because they have no previous version without a URL. I have no idea how to make this work logic-wise; can you help me?

Daimona Eaytoy (talkcontribs)

If you want to catch the addition of new links exclusively, you can use added_links; but beware, that tends to be slowish. You'd better evaluate that variable only as the last condition of the filter (i.e. after user_groups checks etc.).

Wedhro (talkcontribs)

I thought about that but it's not very clear how it works, especially where it says "Only unique links are added to the array." Would a generic search for "https://" or such work, or does it only accept full URLs to specific websites?

Daimona Eaytoy (talkcontribs)

That means that every link only appears once. Since it is an array, you can use length() on it, or string manipulation functions.
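
As a rough mental model of "only unique links, only newly added ones", here is a Python sketch. It is an assumption-laden illustration: the real variable is computed by MediaWiki from the parsed page, not by a regex, and URL_RE, extract_links and added_links_sketch are hypothetical names used only here.

```python
import re

# Simplistic URL matcher for illustration only; MediaWiki's own link extraction differs.
URL_RE = re.compile(r'https?://[^\s\]|<>"]+')


def extract_links(wikitext):
    """Return the unique URLs in wikitext, in order of first appearance."""
    return list(dict.fromkeys(URL_RE.findall(wikitext)))


def added_links_sketch(old_wikitext, new_wikitext):
    """Return the unique URLs present in the new text but not in the old text."""
    old_links = set(extract_links(old_wikitext))
    return [url for url in extract_links(new_wikitext) if url not in old_links]


# New page: there is no old text, so every link counts as added (once each).
print(added_links_sketch("", "See https://example.org and https://example.org again"))

# Existing page that already contained the URL: nothing new is reported.
print(added_links_sketch("Ref https://example.org here", "Ref https://example.org here, reworded"))
```

Under this model, a check on the length of the resulting array covers both new pages and edits to existing pages, without needing a separate test on removed_lines.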

Ciencia Al Poder (talkcontribs)

Oops, yes, I meant to say added_lines was the same as new_wikitext, not new_text

Identifying new pages

Erutuon (talkcontribs)

I have a question about statements in the documentation about identifying new pages. I am an admin in the English Wiktionary. In the documentation for page_id and page_age, it seems to say that for new pages page_id may be 0 and page_age is always 0, but there were no matches for either condition (page_id == 0 or page_age == 0) in Special:AbuseFilter/test except for page deletions, whereas old_size == 0 yields a lot of edits that created pages (for which neither page_id nor page_age were equal to 0). Is the documentation wrong that page_id == 0 and page_age == 0 identify newly created pages or am I misinterpreting what it says somehow?

Ciencia Al Poder (talkcontribs)

page_id is a non-deterministic value that changes when the page is created. The filters are checked before the new page is assigned a new ID, hence the filter matches when the page is evaluated at the time of creation, but not when the page has already been created, because then the page already has an ID.

Erutuon (talkcontribs)

So basically the page creation actions seen in Special:AbuseFilter/test do not contain the actual value of page_id when the action occurred, the value that a filter will see when it is saved and in operation? And the same with page_age? That is not helpful because it does not allow Special:AbuseFilter/test to give a completely realistic picture of how a filter will work, as far as page_id is concerned. I would have expected to be searching a "snapshot" of the actual values when the action happened. But I suppose the workaround is to create a test filter.

Ciencia Al Poder (talkcontribs)

Yeah, that data is not saved. Probably because that would be a lot of information to store for each edit/action that doesn't match any filter. This is task T102944

Erutuon (talkcontribs)

Thanks for explaining this to me, and for posting about this on the Phabricator task.
