Extension:AbuseFilter/Conditions

Essay: The condition limiter is a somewhat ad hoc tool for preventing performance problems. To the extent that you want to worry about performance, execution times are generally a better measure to think about. The per-filter time and condition numbers are somewhat broken (race conditions can cause them to be off), but most of the time they should be good enough to rely on.

The condition limit tracks, more or less, the number of comparison operators plus the number of function calls entered. However, it is also smart enough to bypass functions and parenthetical groups if their value doesn't matter. For example, in the expression A & B, the details of B are only evaluated if A is true. For that reason, it is beneficial to performance to put simple limiting conditions, e.g. checks for article namespace, in front of more complex expressions. Lastly, I should note that function calls are cached, so they only add to the condition count the first time a specific function result is asked for.
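The counting rules described above can be approximated with a toy model. The sketch below is written in Python rather than in the filter language, and it is not the extension's actual implementation: the ConditionCounter class, its method names, and the modelled expression are hypothetical, chosen only to illustrate short-circuiting and caching.

```python
class ConditionCounter:
    """Toy tally of conditions; an illustration, not AbuseFilter's real code."""

    def __init__(self):
        self.used = 0
        self._cache = {}

    def compare(self, result):
        # Every comparison that is actually evaluated costs one condition.
        self.used += 1
        return result

    def call(self, name, func, *args):
        # A function call costs one condition only the first time a
        # specific result is requested; later requests hit the cache.
        key = (name, args)
        if key not in self._cache:
            self.used += 1
            self._cache[key] = func(*args)
        return self._cache[key]


def run(namespace, text):
    """Model a hypothetical `namespace == 6 & length(text) > 10 & length(text) < 100`."""
    c = ConditionCounter()
    # Python's `and` short-circuits, just as `&` is described to do above.
    matched = (
        c.compare(namespace == 6)
        and c.compare(c.call("length", len, text) > 10)
        and c.compare(c.call("length", len, text) < 100)
    )
    return matched, c.used
```

In this model, an edit outside namespace 6 costs a single condition, because nothing after the first test is evaluated. An edit that passes every test costs four: three comparisons plus one function call, since the second request for the length is served from the cache.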

Practical advice

 * Put the easy-to-evaluate but hard-to-match conditions at the front of a filter. This allows filter matching to finish as soon as possible, which improves run times and reduces condition usage.
 * When checking for occurrences of multiple strings in text (common in filters detecting spam), it is a lot faster to use   or  than a separate test for each string. It consumes fewer conditions too.
 * All   variables except for   potentially require a database query, so using them is more expensive than pre-computed variables like   and  . They probably shouldn't be used as the first condition of a filter. (This might decrease or increase condition count, depending on whether the new order causes the matching to finish earlier or later, but should improve the actual performance.)
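To see why ordering matters, the following Python sketch totals the conditions consumed by two orderings of the same two-test filter over a batch of edits. Everything here is an assumption made for illustration: the edit data is invented, the complex test is simply taken to cost 3 conditions, and the function name is hypothetical.

```python
def total_conditions(edits, cheap_first):
    """Sum conditions over edits for `cheap & complex` vs `complex & cheap`.

    Assumed costs: the cheap test is 1 comparison; the complex test is
    taken to cost 3 conditions (e.g. two function calls and a comparison).
    """
    CHEAP_COST, COMPLEX_COST = 1, 3
    total = 0
    for in_namespace, complex_matches in edits:
        if cheap_first:
            total += CHEAP_COST
            if in_namespace:          # only then is the complex test run
                total += COMPLEX_COST
        else:
            total += COMPLEX_COST
            if complex_matches:       # only then is the cheap test run
                total += CHEAP_COST
    return total


# 100 hypothetical edits: 5 are in the target namespace, and the complex
# test would match on 40 of them.
edits = [(i < 5, i % 5 < 2) for i in range(100)]

print(total_conditions(edits, cheap_first=True))
print(total_conditions(edits, cheap_first=False))
```

With these assumed numbers, the cheap-first ordering uses 115 conditions and the complex-first ordering uses 340, even though both orderings match exactly the same edits.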

Example 1
For a practical example, consider filter 59 from English Wikipedia:

article_namespace == 6 & !("autoconfirmed" in user_groups) & !(user_name in article_recent_contributors) & rcount ("\{\{.*\}\}", removed_lines) > rcount ("\{\{.*\}\}", added_lines)

This can be simplified as:


 * A & !B & !C & fun1 > fun2

Depending on the values of the variables, the filter can consume from 1 to 6 conditions:
 * 1 condition (1 comparison) if the first test is false – remaining tests are not evaluated
 * 2 conditions (2 comparisons) if the first test is true, but the second is false – remaining tests are not evaluated
 * 3 conditions (3 comparisons) if the first and second tests are true, but the third is false – remaining tests are not evaluated
 * 6 conditions (3 comparisons + 2 function calls + 1 comparison) if the first, second and third tests are true

If the initial condition is rarely true, as article_namespace == 6 probably is, the filter will consume only one condition in most runs.
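The tally for this example can be reproduced with a small function. This is a Python sketch of the counting described here, not code from the extension; the function name and its boolean parameters are hypothetical stand-ins for the truth values of A, B and C in the simplified form A & !B & !C & fun1 > fun2.

```python
def conditions_used(a, b, c):
    """Count conditions for A & !B & !C & fun1 > fun2.

    a, b, c are the truth values of A, B and C (before negation).
    """
    used = 1          # A: one comparison, always evaluated
    if not a:
        return used   # A is false: everything else is skipped
    used += 1         # B: one comparison
    if b:
        return used   # !B is false: the rest is skipped
    used += 1         # C: one comparison
    if c:
        return used   # !C is false: the rest is skipped
    used += 2 + 1     # two function calls plus the final comparison
    return used


for case in [(False, False, False), (True, True, False),
             (True, False, True), (True, False, False)]:
    print(case, conditions_used(*case))
```

The four cases print 1, 2, 3 and 6 conditions respectively, matching the list.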