Parsoid/MediaWiki DOM spec/Parser Functions

From mediawiki.org

In T373253 we introduced an updated DOM specification for parser functions. This was initially restricted to parser functions which used the new fragment handler API (T374616), but we expect to gradually roll out this new DOM output for all parser functions: first expanding to all parser function invocations which begin with a hash (T394834) and then to all parser functions (T394836).

For context, Parsoid's original DOM spec has no special support for parser function markup: a parser function invocation is represented with a mw:Transclusion typeof attribute, and the knowledge that it is a parser function is embedded deep inside the rich data-mw attribute value.

For the input wikitext {{#padleft:xyz|4|-}}[1] the current MediaWiki DOM Spec 2.8.0 output is:

<p about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"#padleft:xyz","function":"padleft"},"params":{"1":{"wt":"4"},"2":{"wt":"-"}},"i":0}}]}'>-xyz</p>

and the new output will be:

<p about="#mwt1" typeof="mw:Transclusion mw:ParserFunction/padleft" data-mw='{"parts":[{"parserfunction":{"target":{"wt":"#padleft","key":"padleft"},"params":{"1":{"wt":"xyz","order":1},"2":{"wt":"4","order":2},"3":{"wt":"-","order":3}},"i":0}}]}'>-xyz</p>

Changes from the MediaWiki DOM Spec 2.8.0 output include:

| Before: Attribute/sub-value (jq notation) | Before: Value | After: Attribute/sub-value (jq notation) | After: Value | Change |
| --- | --- | --- | --- | --- |
| typeof | mw:Transclusion | typeof | mw:Transclusion mw:ParserFunction/padleft | Add mw:ParserFunction/key, with the key being the message key for the parser function name |
| data-mw.parts[0].template.target | {"wt":"#padleft:xyz","function":"padleft"} | data-mw.parts[0].parserfunction.target | {"wt":"#padleft","key":"padleft"} | template -> parserfunction; function -> key; wt has zeroth param moved out |
| data-mw.parts[0].template.params | {"1":{"wt":"4"},"2":{"wt":"-"}} | data-mw.parts[0].parserfunction.params | {"1":{"wt":"xyz","order":1},"2":{"wt":"4","order":2},"3":{"wt":"-","order":3}} | Zeroth param in wt is now first param, others renumbered. An "order" property clarifies the order of the arguments (see below). |

Note that the data-mw.parts[0].parserfunction.target.wt value will have the # included iff the original wikitext did, but the key is the magic word ID and will never have a #.
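As an illustration of how a client might read the key, here is a minimal Python sketch. The helper name parser_function_key is hypothetical and not part of any Parsoid API; the attribute values are copied from the padleft example above.

```python
import json
from typing import Optional

# Prefix of the typeof token that carries the parser function key.
PF_PREFIX = "mw:ParserFunction/"


def parser_function_key(typeof: str) -> Optional[str]:
    """Return the key from a mw:ParserFunction/<key> token, or None if absent.

    Hypothetical helper for illustration only.
    """
    for token in typeof.split():
        if token.startswith(PF_PREFIX):
            return token[len(PF_PREFIX):]
    return None


# Attribute values taken from the padleft example in this page.
typeof = "mw:Transclusion mw:ParserFunction/padleft"
data_mw = json.loads(
    '{"parts":[{"parserfunction":{"target":{"wt":"#padleft","key":"padleft"},'
    '"params":{"1":{"wt":"xyz","order":1},"2":{"wt":"4","order":2},'
    '"3":{"wt":"-","order":3}},"i":0}}]}'
)
target = data_mw["parts"][0]["parserfunction"]["target"]

print(parser_function_key(typeof))  # key from typeof, never has a leading '#'
print(target["key"], target["wt"])  # key vs. wikitext name as written
```

In this example target.wt keeps the # from the source wikitext while both the typeof token and target.key carry the bare key.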

The legacy parser did not support named arguments for parser functions (T204307), so {{#foo|bar|bat}} and {{#foo|2=bat|1=bar}} can yield different results for certain legacy parser functions. Because of this, each entry in the params object for a parser function can carry two new properties which are not present or necessary for template invocations: order and eq. The order property gives the (1-based) position of the argument in the wikitext, since the params JSON object is not guaranteed to preserve key order. The eq property distinguishes {{#foo|bar}} from {{#foo|1=bar}}; it will be true in the latter example to indicate that an explicit equal sign was present in the wikitext.

For conciseness and greater compatibility with our previous output, if the key is a numeric string (matches /^[0-9]+$/), the order property will default to the numeric value of that string. This enables us to omit order entirely when none of the parameters have eq set to true (that is, none of the parameters are named or out-of-order parameters; the keys in this case will all be sequential numeric strings starting with "1"). Similarly, the eq property will default to false if the key is numeric (matches /^[0-9]+$/) and true otherwise, which will reduce the need to explicitly encode eq for most named arguments.
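The defaulting rules above can be sketched in Python as follows. The function names effective_order and effective_eq are hypothetical, and returning None for a non-numeric key without an explicit order is an assumption of this sketch, since the text defines no default for that case.

```python
import re
from typing import Optional

# Numeric-string test; fullmatch plays the role of the ^...$ anchors.
NUMERIC_KEY = re.compile(r"[0-9]+")


def effective_order(key: str, entry: dict) -> Optional[int]:
    """Explicit order if present, else the numeric value of a numeric key."""
    if "order" in entry:
        return entry["order"]
    if NUMERIC_KEY.fullmatch(key):
        return int(key)
    return None  # assumption: no default is defined for non-numeric keys


def effective_eq(key: str, entry: dict) -> bool:
    """Explicit eq if present, else False for numeric keys, True otherwise."""
    if "eq" in entry:
        return entry["eq"]
    return NUMERIC_KEY.fullmatch(key) is None


print(effective_order("2", {"wt": "4"}), effective_eq("2", {"wt": "4"}))
print(effective_order("foo", {"wt": "bar", "order": 4}), effective_eq("foo", {"wt": "bar"}))
```

This sketch looks only at the key as given and does not handle the duplicate-key prefix.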

Since data-mw.parts[0].parserfunction.params remains a map, not an ordered list, arguments with duplicate keys could get lost. To prevent this, we prefix entries for any duplicate keys with a unique string matching /=\d+=/, since legitimate keys can't begin with an equal sign. This prefix should be stripped from the key if present; the order property will be sufficient to reconstruct an ordered list of arguments even in the presence of duplicated keys. (Although a number is present in the duplicate key prefix, it is just for uniqueness and should not be considered meaningful; use the order property, not the contents of the prefix string.)

For the wikitext with duplicated argument names:

{{#f9:1|4|2=2|foo=bar|3|foo=bat}}

The DOM output is:

<span typeof="mw:Transclusion mw:ParserFunction/f9" data-mw='{"parts":[{"parserfunction":{"target":{"wt":"#f9","key":"f9"},"params":{"1":{"wt":"1"},"2":{"wt":"4"},"=2=2":{"wt":"2","eq":true,"order":3},"foo":{"wt":"bar","order":4},"3":{"wt":"3","order":5},"=5=foo":{"wt":"bat","order":6}},"i":0}}]}'>4</span>
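To make the reconstruction concrete, here is a hedged Python sketch that turns the params map from this example back into an ordered argument list. The function name ordered_arguments is hypothetical; anchoring the prefix pattern at the start of the key and rejecting non-numeric keys that lack an explicit order are assumptions of this sketch, not statements of the spec.

```python
import re

# Duplicate-key prefix, anchored at the start of the key (assumption).
DUPLICATE_PREFIX = re.compile(r"^=[0-9]+=")
NUMERIC_KEY = re.compile(r"[0-9]+")


def ordered_arguments(params: dict) -> list:
    """Return (key, wt) pairs sorted by order, with duplicate prefixes stripped."""
    rows = []
    for raw_key, entry in params.items():
        order = entry.get("order")
        if order is None:
            # A missing order defaults to the numeric value of a numeric key.
            if not NUMERIC_KEY.fullmatch(raw_key):
                raise ValueError(f"no order for non-numeric key {raw_key!r}")
            order = int(raw_key)
        # The number inside the prefix is only for uniqueness; discard it.
        key = DUPLICATE_PREFIX.sub("", raw_key, count=1)
        rows.append((order, key, entry["wt"]))
    rows.sort(key=lambda row: row[0])
    return [(key, wt) for _, key, wt in rows]


# params copied from the data-mw of the #f9 example above.
params = {
    "1": {"wt": "1"},
    "2": {"wt": "4"},
    "=2=2": {"wt": "2", "eq": True, "order": 3},
    "foo": {"wt": "bar", "order": 4},
    "3": {"wt": "3", "order": 5},
    "=5=foo": {"wt": "bat", "order": 6},
}

for key, wt in ordered_arguments(params):
    print(key, wt)
```

The resulting list contains the keys 2 and foo twice each, in wikitext order, which is the information the map alone could not hold without the prefix.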

Named arguments are supported by the Parsoid PFragmentHandler API. The data-mw output for a parser function with named parameters is still considered experimental and may change (T390344). Note that the typical wikitext semantics for an invocation with duplicated argument names is that the rightmost definition wins; that is, Arguments::getNamedArgs(...) would return [1=>'1',2=>'2',3=>'3','foo'=>'bat'].[2]
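The rightmost-wins rule can be illustrated with a short Python sketch. This is not the implementation of Arguments::getNamedArgs; the function name rightmost_wins is hypothetical, keys are kept as strings rather than PHP integer keys, and values are plain strings rather than PFragment objects.

```python
def rightmost_wins(ordered_args: list) -> dict:
    """Collapse ordered (key, value) pairs so that later duplicates override earlier ones."""
    result = {}
    for key, value in ordered_args:
        result[key] = value  # a later pair with the same key replaces the earlier one
    return result


# Arguments of {{#f9:1|4|2=2|foo=bar|3|foo=bat}} in wikitext order.
ordered_args = [
    ("1", "1"),
    ("2", "4"),
    ("2", "2"),
    ("foo", "bar"),
    ("3", "3"),
    ("foo", "bat"),
]

print(rightmost_wins(ordered_args))
```

Here the explicit 2=2 overrides the positional 4, and foo=bat overrides foo=bar, matching the mapping quoted above.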

  1. Note that the padleft parser function doesn't "really" have a leading # in MediaWiki; we've added one for this example just to show where it would/would not show up in the Parsoid output.
  2. The values in this mapping are actually PFragment objects, not strings, but they are presented as strings here for readability.