Preprocessor ABNF

MediaWiki preprocessor syntax in ABNF (RFC 5234).

Ideal rules
Since MW 1.12:

Where parts-L2 is like parts except that it allows headings inside it:

These "broken" rules, when matched, produce output similar to a literal start followed by ordinary wikitext. The difference is that they compete on the same precedence level as the unbroken rules. So the previous example is parsed as a broken-template containing a broken-link containing a long string and a literal "}}". Based on the ideal rules, we would expect the literal interpretation of "}}" to have a lower precedence than its interpretation as the end of a template. But with the "broken" rules, the broken-link takes precedence over the template, being the rightmost-opened structure.

Broken rules always run to the end of the input string, because the only other way to terminate a broken rule is to turn it into an unbroken rule by closing it.
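
The "rightmost-opened" behaviour can be illustrated with a small stack-based scanner. This is only a minimal sketch under simplifying assumptions, not the preprocessor's actual code: the function name <tt>scan</tt> and the <tt>OPENERS</tt> table are hypothetical, only double braces and double brackets are recognised, and triple braces, pipes and headings are ignored. A closing delimiter is honoured only if it matches the structure on top of the stack; otherwise it is literal text. Whatever remains on the stack at the end of the input is "broken".

```python
# Hypothetical, simplified delimiter table: opener -> matching closer.
OPENERS = {"{{": "}}", "[[": "]]"}


def scan(text):
    """Return (closed, broken) structures found in text.

    closed: list of (opener, start, end) for structures that were closed.
    broken: list of (opener, start) for structures still open at the end
            of the input, outermost first.
    """
    stack = []   # currently open structures as (opener, start_index)
    closed = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in OPENERS:
            # A new structure opens and becomes the rightmost-opened one.
            stack.append((pair, i))
            i += 2
        elif stack and pair == OPENERS[stack[-1][0]]:
            # Only the rightmost-opened structure can be closed.
            opener, start = stack.pop()
            closed.append((opener, start, i + 2))
            i += 2
        else:
            # Anything else, including a non-matching closer, is literal.
            i += 1
    return closed, stack
```

For the input <tt><nowiki>{{a|[[b}}</nowiki></tt>, the sketch reports nothing closed and both structures broken: the "}}" is literal because the link, not the template, is on top of the stack.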

Because a <tt>heading</tt> or a <tt>broken-heading</tt> can appear in a <tt>part-L2</tt>, there is now ambiguity between the equals sign of the name/value separator, and the equals sign for the heading. We resolve it in the following way:


 * For level 1 headings (i.e. one equals sign on each side), the <tt>part</tt> takes precedence.
 * For level 2-6 headings, the heading takes precedence.
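
The two cases above can be sketched as a small classifier. This is an illustrative assumption-laden sketch, not the preprocessor's logic: the name <tt>classify_line</tt> and the regular expression are hypothetical, the heading level is taken as the smaller of the two equals-sign runs, and details such as trailing comments are ignored.

```python
import re

# Hypothetical approximation of a heading line: 1-6 equals signs,
# some content, 1-6 equals signs, optional trailing spaces or tabs.
HEADING_RE = re.compile(r"^(={1,6})(.+?)(={1,6})[ \t]*$")


def classify_line(line):
    """Return "heading" or "part" for a line inside a part-L2."""
    m = HEADING_RE.match(line)
    if not m:
        # Not heading-shaped at all: any "=" is a name/value separator.
        return "part"
    level = min(len(m.group(1)), len(m.group(3)))
    # Level 1 loses to the name/value separator; levels 2-6 win.
    return "part" if level == 1 else "heading"
```

Under these assumptions, <tt>classify_line("=foo=")</tt> returns <tt>"part"</tt>, while <tt>classify_line("==foo==")</tt> returns <tt>"heading"</tt>.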

If the <tt>part-L2</tt> later becomes a <tt>part</tt> because the <tt>template</tt> or <tt>tplarg</tt> is closed, we could now have an errant <tt>heading</tt> in <tt>wikitext-L3</tt>, where it's not allowed. The <tt>heading</tt> can easily be disabled, but the name/value separator can't easily be recovered. To represent the syntactic effect of this, we introduce another rule:

The disambiguation of <tt>disabled-heading</tt> with <tt>part</tt> works in the same way as the disambiguation of <tt>heading</tt> with <tt>part-L2</tt>, described above.

Possible improvements
If an efficient algorithm could be found for disambiguating the ideal rules without introducing "broken" rules, it would be worth adopting. It would be a backwards-compatibility break, but probably a beneficial one. Backwards compatibility was broken anyway by introducing <tt>broken-heading</tt> (the "newsome" bug on MNPP).

Line-eating comments could very easily be made to match at the start of the string. Currently they don't, since there is no <tt>LF</tt> at the start of the string, just a <tt>LINE-START</tt>.

The "rightmost opening" rule for bracketed precedence is arbitrary, an artifact of the implementation. Leftmost opening would probably be more intuitive.