Extension:AbuseFilter/Conditions

From mediawiki.org

Essay: The condition limiter is a somewhat improvised tool for preventing performance problems. If performance is what you want to worry about, execution time is generally the better measure. The per-filter time and condition numbers are not fully reliable (race conditions can cause them to be off), but most of the time they are good enough to rely on.

The condition count tracks, more or less, the number of comparison operators evaluated plus the number of function calls entered. However, the evaluator is also smart enough to skip functions and parenthetical groups whose value cannot affect the result. For example, in the expression A & B, the details of B are only evaluated if A is true. For that reason, it is beneficial for performance to put simple limiting conditions, e.g., checks for article namespace, in front of more complex expressions. Lastly, function call results are cached, so a call only adds to the condition count the first time that specific result is requested.
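As an illustration of the ordering advice above, the hypothetical rule below puts a cheap namespace check in front of a regular-expression test. The pattern is a placeholder invented for this sketch and is not taken from a real filter.

```
/* Cheap limiting condition first: for edits outside namespace 0,
   evaluation stops here and the regex below is never run. */
page_namespace == 0
/* More complex test, evaluated only when the namespace check is true. */
& added_lines rlike "placeholder-pattern"
```

With the two tests swapped, the regular expression would run on every edit, including those that the namespace check would have rejected immediately.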

MediaWiki 1.27 and later

Practical advice

  • Put conditions that are easy to evaluate but hard to match at the front of a filter. This lets filter matching finish as early as possible, which improves run times and reduces condition usage.
  • When checking for occurrences of multiple strings in text (common in filters detecting spam), it is a lot faster to use contains_any(text, 'a', 'b', 'c') or text rlike 'a|b|c' than a separate test for each string ('a' in text | 'b' in text | 'c' in text). It consumes fewer conditions, too.
  • All user_* variables except for user_name potentially require a database query, so using them is more expensive than pre-computed variables like action and page_namespace. They probably shouldn't be used as the first condition of a filter. (This might decrease or increase condition count, depending on whether the new order causes the matching to finish earlier or later, but should improve the actual performance.)
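A sketch of how the three points above might combine in a single spam filter. The search strings and the edit-count threshold are invented for illustration; the variables and functions are assumed to be available as in a standard AbuseFilter installation.

```
/* Pre-computed variables first: cheap to evaluate, and they rule out many actions. */
action == "edit"
& page_namespace == 0
/* One function call instead of three separate tests
   ("buy now" in added_lines | "cheap pills" in added_lines | ...). */
& contains_any(added_lines, "buy now", "cheap pills", "free money")
/* user_* variable last: it may require a database query, so it is only
   looked up for edits that already passed the cheaper tests. */
& user_editcount < 10
```

Whether the user_editcount test belongs before or after the text test depends on which of the two fails more often on the wiki in question; the only firm point is that it should not come first.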

General counting

Condition counting

Rule: 'foo' == 'bar'
Conditions used: 1
Notes: A simple test counts as one condition.

Rule: 'foo' == 'bar' | 'baz' == 'qaz'
Conditions used: 2

Rule: 'foo' == 'bar' & 'baz' == 'qaz'
Conditions used: 1

Rule: 'foo' == 'foo' | 'baz' == 'qaz'
Conditions used: 1

Notes for the two rules above: Tests are not counted when they don't need to be evaluated to determine the result (short-circuit evaluation).
  • In the first of these two rules, 'foo' == 'bar' is false, so the overall result is also false regardless of the second test.
  • In the second of these two rules, 'foo' == 'foo' is true, so the overall result is also true regardless of the second test.

Rule: str_replace( 'FooFoo', 'Foo', '' ) == 'bar'
Conditions used: 2
Notes: Each function call also counts as one condition.

Rule: str_replace( 'FooFoo', 'Foo', '' ) == 'bar'
| str_replace( 'FooFoo', 'Foo', '' ) == 'baz'
Conditions used: 3
Notes: Repeated function calls with identical arguments are only counted once.

Example 1

For a practical example, consider filter 59 from the English Wikipedia:

page_namespace == 6
& !("autoconfirmed" in user_groups)
& !(user_name in page_recent_contributors)
& rcount ("\{\{.*\}\}", removed_lines) > rcount ("\{\{.*\}\}", added_lines)

This can be simplified as:

A & !B & !C & fun1() > fun2()

Depending on the values of variables, the filter can consume from 1 to 6 conditions:

  • 1 condition (1 comparison) if the first test is false—remaining tests are not evaluated
  • 2 conditions (2 comparisons) if the first test is true, but the second is false—remaining tests are not evaluated
  • 3 conditions (3 comparisons) if the first and second test are true, but the third is false—remaining tests are not evaluated
  • 6 conditions (3 comparisons + 2 function calls + 1 comparison) if the first, second and third tests are true

If the initial condition is rarely true, as page_namespace == 6 probably is, the filter will consume only one condition in most runs.

Prior to MediaWiki 1.27