Extension:AbuseFilter/Conditions/Prior to MediaWiki 1.27

Prior to MediaWiki 1.27, the per-filter reporting of condition counts was broken and should not be considered accurate, so do not rely on those numbers when identifying problems. This was fixed in https://gerrit.wikimedia.org/r/#/c/282399/.

The condition count was also incremented for every function parameter and every nested parenthetical condition. This was fixed in https://gerrit.wikimedia.org/r/#/c/282477/.

General counting

Condition counting

Rules                                          Conditions used  Notes
'foo' == 'bar'                                 1                A simple test counts as one condition
false & false & false & false & false          5[1]
( 'foo' == 'bar' )                             2                Evaluating parenthesis also counts as conditions
( 'foo' ) == ( 'bar' )                         3
(((( 'foo' == 'bar' ))))                       5
false & ( false & false & false & false )      2                But they can be used to force a short-circuit
false & ( true & true & true & true )          2
true & ( false & false & false & false )       6                Rearranging and grouping the conditions according to their likelihood of being true might represent a big difference in the total number of conditions used by a complex filter
true & ( false & ( false & false & false ) )   4
str_replace( 'FooFoo', 'Foo', '' ) == 'bar'    5                Each function call and each parameter evaluation also counts as one condition
str_replace( 'FooFoo', 'Foo', '' ) == 'bar'    9                1 from str_replace + 3 from its parameters + 1 from the first == + 3 for the same parameters + 1 for the second ==
| str_replace( 'FooFoo', 'Foo', '' ) == 'baz'
str_replace( 'FooFoo', 'Foo', '' )             5                equivalent to "str_replace( 'FooFoo', 'Foo', '' ) = 1"

[1] The last 4 conditions are counted due to phab:T43693
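
The counting rules in the table can be approximated with a small Python model. This is only a sketch written to reproduce the numbers listed above, not the actual AbuseFilter parser: the tuple-based expression format and the names State, eval_expr, eval_operand, eval_term and conditions_used are all hypothetical. It assumes that every boolean operand, every parenthesis body and every function parameter costs one condition, that a function call costs one more, and that parentheses and function calls are skipped once a short-circuit is active. The result cache for repeated identical calls is also an assumption, made so that the 9-condition row comes out as listed.

```python
# Toy model of the pre-1.27 counting rules; hypothetical, not AbuseFilter code.
# Expressions are nested tuples:
#   ("lit", value), ("==", a, b), ("paren", expr),
#   ("call", name, [expr, ...]), ("&", [operand, ...]), ("|", [operand, ...])

FUNCS = {"str_replace": lambda s, old, new: s.replace(old, new)}


class State:
    def __init__(self):
        self.count = 0   # conditions used so far
        self.cache = {}  # assumed cache of function calls already made


def eval_expr(node, st, skip=False):
    """Evaluate a full expression: top level, paren body, or function argument."""
    if node[0] not in ("&", "|"):
        return eval_operand(node, st, skip)
    op, operands = node
    result = bool(eval_operand(operands[0], st, skip))
    for operand in operands[1:]:
        short = (op == "&" and not result) or (op == "|" and result)
        value = bool(eval_operand(operand, st, skip or short))
        if not short:
            result = (result and value) if op == "&" else (result or value)
    return result


def eval_operand(node, st, skip):
    st.count += 1  # every boolean operand counts, even when short-circuited
    return eval_term(node, st, skip)


def eval_term(node, st, skip):
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "==":
        left = eval_term(node[1], st, skip)
        right = eval_term(node[2], st, skip)
        return left == right
    if skip:
        return False  # parentheses and function calls are skipped entirely
    if kind == "paren":
        return eval_expr(node[1], st)  # the paren body is a new expression
    # kind == "call": each parameter is a new expression, +1 each
    args = tuple(eval_expr(arg, st) for arg in node[2])
    key = (node[1], args)
    if key not in st.cache:
        st.count += 1  # +1 for the call itself, unless already cached
        st.cache[key] = FUNCS[node[1]](*args)
    return st.cache[key]


def conditions_used(node):
    st = State()
    eval_expr(node, st)
    return st.count


F, T = ("lit", False), ("lit", True)
foo_eq_bar = ("==", ("lit", "foo"), ("lit", "bar"))
call = ("call", "str_replace", [("lit", "FooFoo"), ("lit", "Foo"), ("lit", "")])

print(conditions_used(foo_eq_bar))                                  # 1
print(conditions_used(("&", [F, F, F, F, F])))                      # 5
print(conditions_used(("paren", foo_eq_bar)))                       # 2
print(conditions_used(("&", [F, ("paren", ("&", [F, F, F, F]))])))  # 2
print(conditions_used(("&", [T, ("paren", ("&", [F, F, F, F]))])))  # 6
print(conditions_used(("==", call, ("lit", "bar"))))                # 5
```

Under these assumptions the model returns the same counts as the table for the rows shown, which makes it a convenient way to experiment with how regrouping a rule changes its cost.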

Example 1

For a practical example, consider filter 59:

article_namespace == 6
& !("autoconfirmed" in user_groups)
& !(user_name in article_recent_contributors)
& rcount ("\{\{.*\}\}", removed_lines) > rcount ("\{\{.*\}\}", added_lines)

This can be simplified as:

A & !(B) & !(C) & rcount( D, E ) > rcount( F, G )

Let's consider the branching chart:

  • A is true: new boolean operand, +1 condition
    • B is true: new boolean operand, and enter paren, +2 conditions
      A & !(B) is false, enter bypass mode
      • C is true / false: new boolean operand, skip paren, +1 condition
        • rcount expressions: new boolean operand, skip functions, +1 condition
          • Total: 5 conditions
    • B is false: new boolean operand, and enter paren, +2 conditions
      • C is true: new boolean operand, and enter paren, +2 conditions
        A & !(B) & !(C) is false, enter bypass mode
        • rcount expressions: new boolean operand, skip functions, +1 condition
          • Total: 6 conditions
      • C is false: new boolean operand, and enter paren, +2 conditions
        • rcount expressions: new boolean operand, evaluate D, E, F, and G, and evaluate rcount( D, E ) and rcount( F, G ), +7 conditions
          • Total: 12 conditions
  • A is false: new boolean operand, +1 condition
    A is false, enter bypass mode
    • B is true / false: new boolean operand, skip paren, +1 condition
      • C is true / false: new boolean operand, skip paren, +1 condition
        • rcount expressions: new boolean operand, skip functions, +1 condition
          • Total: 4 conditions

So this filter uses between 4 conditions (when the first operand is false) and 12 conditions (when every operand must be evaluated).
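
The totals in the branching chart are plain sums of the per-step increments, which can be checked with a few lines of Python. The dictionary below is just a transcription of the chart for Example 1; the branch labels and variable names are illustrative only.

```python
# Per-branch increments transcribed from the Example 1 branching chart.
# Each list holds the "+N" steps in order: A, !(B), !(C), rcount comparison.
branches = {
    "A false":                  [1, 1, 1, 1],
    "A true, B true":           [1, 2, 1, 1],
    "A true, B false, C true":  [1, 2, 2, 1],
    "A true, B false, C false": [1, 2, 2, 7],
}

totals = {name: sum(steps) for name, steps in branches.items()}
for name, total in totals.items():
    print(f"{name}: {total} conditions")

# Cheapest and most expensive paths through the filter.
print("range:", min(totals.values()), "to", max(totals.values()))  # range: 4 to 12
```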

Example 2

Now consider an alternative construction that adds explicit parentheses for grouping and removes the excess parentheses around the "in" operations:

article_namespace == 6 & 
( 
  ! "autoconfirmed" in user_groups & 
  (
    ! user_name in article_recent_contributors & 
    rcount ("\{\{.*\}\}", removed_lines) > rcount ("\{\{.*\}\}", added_lines)
  )
)

This can be simplified as:

A & ( ! B & ( ! C & rcount( D, E ) > rcount( F, G ) ) )

Let's consider the branching chart:

  • A is true: new boolean operand, +1 condition
    • B is true: new boolean operand, and enter paren, +2 conditions
      A & ! B is false, enter bypass mode
      • C is true / false: new boolean operand, skip paren, +1 condition
        • Total: 4 conditions
    • B is false: new boolean operand, and enter paren, +2 conditions
      • C is true: new boolean operand, and enter paren, +2 conditions
        A & ! B & ! C is false, enter bypass mode
        • rcount expressions: new boolean operand, skip functions, +1 condition
          • Total: 6 conditions
      • C is false: new boolean operand, and enter paren, +2 conditions
        • rcount expressions: new boolean operand, evaluate D, E, F, and G, and evaluate rcount( D, E ) and rcount( F, G ), +7 conditions
          • Total: 12 conditions
  • A is false: new boolean operand, +1 condition
    A is false, enter bypass mode
    • B is true / false: new boolean operand, skip paren, +1 condition
      • Total: 2 conditions

So this filter uses between 2 conditions (when the first operand is false) and 12 conditions (when every operand must be evaluated). If the initial condition is rarely true, as article_namespace == 6 probably is, the modified filter consumes only 2 conditions in most runs, compared to 4 for the version without explicit parentheses. Placing conditions that are cheap to evaluate but rarely match at the front of a filter generally improves run times and reduces condition usage. In most cases, explicit parentheses also help the edit filter parser determine branching more efficiently, which reduces both condition counts and run times.
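
To get a rough feel for the difference, the branch totals of the two constructions can be weighted by how often each branch is reached. The sketch below is illustrative only: the probabilities p_a, p_b and p_c are made-up values, not measurements from any wiki, and the conditions A, B and C are assumed to be independent. The function name expected_conditions is likewise hypothetical.

```python
# Branch totals taken from the two branching charts (Example 1 and Example 2).
example_1 = {"A false": 4, "A true, B true": 5,
             "A true, B false, C true": 6, "A true, B false, C false": 12}
example_2 = {"A false": 2, "A true, B true": 4,
             "A true, B false, C true": 6, "A true, B false, C false": 12}


def expected_conditions(totals, p_a, p_b, p_c):
    """Weight each branch total by the probability of reaching that branch.

    p_a, p_b, p_c are the (assumed, independent) probabilities that
    A, B and C are true.
    """
    return (
        (1 - p_a) * totals["A false"]
        + p_a * p_b * totals["A true, B true"]
        + p_a * (1 - p_b) * p_c * totals["A true, B false, C true"]
        + p_a * (1 - p_b) * (1 - p_c) * totals["A true, B false, C false"]
    )


# Hypothetical probabilities: A (file namespace) is rarely true.
p_a, p_b, p_c = 0.05, 0.5, 0.5

avg_1 = expected_conditions(example_1, p_a, p_b, p_c)
avg_2 = expected_conditions(example_2, p_a, p_b, p_c)
print(f"without explicit grouping: {avg_1:.3f} conditions per run on average")
print(f"with explicit grouping:    {avg_2:.3f} conditions per run on average")
```

With a rarely true first condition, the average is dominated by the "A false" branch, so the grouped version comes out at roughly half the cost under these assumed numbers. Both versions cost the same on the branches where the rcount comparison is reached.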