User:Kephir/XML parse tree

The following is unofficial documentation of the XML parse tree format returned by Special:ExpandTemplates and by API modules such as API:Expandtemplates and API:Properties when XML parse tree generation is requested in the API call.

Elements

 * root
 * The root element. Has no interesting attributes by itself.
 * Since whitespace is significant in reconstructing wiki markup, it is a good idea to parse the XML document as if <tt>root</tt> had an <tt>xml:space="preserve"</tt> attribute. MediaWiki does not specify it explicitly, however.


 * template
 * Indicates a template invocation (<tt>{&#123;...}}</tt>). Must contain at least a <tt>title</tt> element, followed by optional <tt>part</tt> elements.
 * The <tt>lineStart</tt> attribute is present and set to 1 if the template immediately follows a newline.


 * tplarg
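 * Indicates a template argument reference (<tt>{&#123;{...}}}</tt>). Contents are just like those of <tt>template</tt>: a <tt>title</tt> element followed by optional <tt>part</tt> elements. The <tt>lineStart</tt> attribute has the same meaning as above.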


 * part
 * Indicates a template argument (or default value for a template argument reference). Always contains a <tt>name</tt> and a <tt>value</tt> element, in that order, with an equal sign between them if the name is given explicitly. If the template argument is an implicitly numbered one, the <tt>name</tt> element will be empty and contain an <tt>index</tt> attribute specifying the index.
 * For <tt>tplarg</tt> elements, only the first <tt>part</tt> child should be looked at to provide the default value; the rest are ignored. The split into <tt>name</tt> and <tt>value</tt> is disregarded.
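The description of <tt>template</tt> and <tt>part</tt> above can be illustrated with a minimal Python sketch. It assumes the child element names <tt>title</tt>, <tt>part</tt>, <tt>name</tt> and <tt>value</tt> and the <tt>index</tt> and <tt>lineStart</tt> attributes; the sample document is hand-written for illustration (roughly what <tt>{&#123;foo|bar|x=y}}</tt> at the start of a line might produce), and the helper name <tt>template_arguments</tt> is hypothetical.

```python
import xml.etree.ElementTree as ET


def template_arguments(template):
    """Map argument names to their raw value text for one <template> element."""
    args = {}
    for part in template.findall("part"):
        name = part.find("name")
        value = part.find("value")
        # Implicitly numbered arguments have an empty <name> with an index attribute.
        key = name.get("index")
        if key is None:
            # Explicit name; trimming it is a convenience choice of this sketch.
            key = "".join(name.itertext()).strip()
        # itertext() flattens nested elements, which is enough for plain-text values.
        args[key] = "".join(value.itertext())
    return args


# Hand-written sample, not actual MediaWiki output.
SAMPLE = (
    '<root><template lineStart="1"><title>foo</title>'
    '<part><name index="1"/><value>bar</value></part>'
    '<part><name>x</name>=<value>y</value></part>'
    '</template></root>'
)

template = ET.fromstring(SAMPLE).find("template")
title = "".join(template.find("title").itertext())
print(title, template.get("lineStart"), template_arguments(template))
```

Values that themselves contain templates are flattened to their text here; code that needs the nested structure should walk the <tt>value</tt> element instead.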


 * h
 * Indicates a header (<tt>== ... ==</tt>). The <tt>level</tt> attribute contains the header level, while <tt>i</tt> contains the section number, regardless of level (the same one that the <tt>section</tt> query string parameter uses).
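As a small illustration, the following Python sketch collects every header together with its level and section number. It assumes the attribute names <tt>level</tt> and <tt>i</tt>; the sample document is hand-written and the function name <tt>list_headers</tt> is hypothetical.

```python
import xml.etree.ElementTree as ET


def list_headers(root):
    """Return (level, section number, raw header markup) for every <h> element."""
    return [
        (int(h.get("level")), int(h.get("i")), "".join(h.itertext()))
        for h in root.iter("h")
    ]


# Hand-written sample, not actual MediaWiki output.
SAMPLE = (
    '<root><h level="2" i="1">== First ==</h>\n'
    'Some text.\n'
    '<h level="3" i="2">=== Nested ===</h>\n</root>'
)

headers = list_headers(ET.fromstring(SAMPLE))
for level, number, markup in headers:
    print(level, number, markup)
```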


 * ext
 * Indicates a parser extension tag, such as <tt>&lt;nowiki></tt>, <tt>&lt;pre></tt> or <tt>&lt;gallery></tt>. Not all tags are parser extension tags; <tt>&lt;b></tt> or <tt>&lt;span></tt>, for example, are not. Which tags are considered parser tags depends on the MediaWiki installation. To obtain a list of extension tags, use API:Meta with the <tt>siprop=extensiontags</tt> query parameter.
 * This element always contains (possibly empty) <tt>name</tt> (tag name) and <tt>attr</tt> (attributes) child elements, optionally an <tt>inner</tt> element, and optionally a <tt>close</tt> element following it. The contents of <tt>attr</tt> need not conform to HTML or XML attribute syntax.
 * If the parser tag is specified in a self-closing form (e.g. <tt>&lt;nowiki/></tt>), the <tt>ext</tt> element will lack <tt>inner</tt> and <tt>close</tt> child elements.
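The structure of <tt>ext</tt> can be inspected with a short Python sketch like the one below. It assumes the child element names <tt>name</tt>, <tt>attr</tt> and <tt>inner</tt>; the helper names and the sample document are hypothetical. A missing <tt>inner</tt> child is reported as <tt>None</tt>, which marks the self-closing form.

```python
import xml.etree.ElementTree as ET


def child_text(elem, tag):
    """Text of the first child called `tag`, or None if there is no such child."""
    child = elem.find(tag)
    return None if child is None else "".join(child.itertext())


def extension_tags(root):
    """Return (tag name, raw attribute text, inner text or None) for every <ext>."""
    return [
        (child_text(ext, "name"), child_text(ext, "attr"), child_text(ext, "inner"))
        for ext in root.iter("ext")
    ]


# Hand-written sample, not actual MediaWiki output.
SAMPLE = (
    "<root>"
    "<ext><name>nowiki</name><attr></attr><inner>''x''</inner>"
    "<close>&lt;/nowiki&gt;</close></ext>"
    "<ext><name>nowiki</name><attr/></ext>"
    "</root>"
)

tags = extension_tags(ET.fromstring(SAMPLE))
print(tags)
```

The attribute text is returned unparsed, since it need not follow HTML or XML attribute syntax.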


 * ignore
 * Indicates text to be ignored, usually a <tt>&lt;noinclude></tt>, <tt>&lt;includeonly></tt> or <tt>&lt;onlyinclude></tt> tag and/or its contents.
 * There is no option in the publicly available API to preprocess wikitext in transclusion mode, i.e. ignoring the contents of <tt>&lt;noinclude></tt> while parsing <tt>&lt;includeonly></tt>, or restricting parsing to <tt>&lt;onlyinclude></tt> (bug #49353).


 * comment
 * Indicates an HTML-style comment, i.e. <tt>&lt;!-- ... --></tt>. The contents of this element include the comment start mark <tt>&lt;!--</tt> and end mark <tt>--></tt>.

Serialisation
Turning the XML parse tree back into wiki markup is rather simple. It amounts to four substitutions, three of them being:

 &lt;template>...&lt;/template> → {&#123;...}}
 &lt;tplarg>...&lt;/tplarg> → {&#123;{...}}}
 &lt;part>...&lt;/part> → |...

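Care has to be taken when handling <tt>ext</tt> elements. For elements that contain an <tt>inner</tt> element, the following substitution is appropriate (the <tt>close</tt> element may be absent, in which case nothing is emitted for it):

&lt;ext>&lt;name>NAME&lt;/name>&lt;attr>ATTR&lt;/attr>&lt;inner>INNER&lt;/inner>&lt;close>CLOSE&lt;/close>&lt;/ext> → &lt;NAMEATTR>INNERCLOSE

Otherwise, use:

&lt;ext>&lt;name>NAME&lt;/name>&lt;attr>ATTR&lt;/attr>&lt;/ext> → &lt;NAMEATTR/>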

Other elements can have their contents passed through as is.
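The substitutions described in this section can be sketched in Python roughly as follows. This is an illustrative sketch, not MediaWiki's own code: it assumes the <tt>ext</tt> child element names <tt>name</tt>, <tt>attr</tt>, <tt>inner</tt> and <tt>close</tt>, the function names are hypothetical, and the sample document is hand-written.

```python
import xml.etree.ElementTree as ET

# Markup wrapped around the contents of each element; anything else passes through.
WRAPPERS = {
    "template": ("{{", "}}"),
    "tplarg": ("{{{", "}}}"),
    "part": ("|", ""),
}


def _child_text(elem, tag):
    """Text of the first child called `tag`, or None if that child is absent."""
    child = elem.find(tag)
    return None if child is None else "".join(child.itertext())


def _serialise_ext(elem):
    name = _child_text(elem, "name") or ""
    attr = _child_text(elem, "attr") or ""
    inner = _child_text(elem, "inner")
    if inner is None:
        # No <inner> child: the tag was written in self-closing form.
        return f"<{name}{attr}/>"
    close = _child_text(elem, "close") or ""
    return f"<{name}{attr}>{inner}{close}"


def serialise(elem):
    """Turn a parse tree element back into wiki markup."""
    if elem.tag == "ext":
        return _serialise_ext(elem)
    opening, closing = WRAPPERS.get(elem.tag, ("", ""))
    pieces = [opening, elem.text or ""]
    for child in elem:
        pieces.append(serialise(child))
        pieces.append(child.tail or "")  # text between this child and the next
    pieces.append(closing)
    return "".join(pieces)


# Hand-written sample, not actual MediaWiki output.
SAMPLE = (
    '<root>a <template><title>foo</title>'
    '<part><name index="1"/><value>bar</value></part>'
    '<part><name>x</name>=<value>y</value></part></template> '
    '<tplarg><title>1</title><part><name index="1"/><value>d</value></part></tplarg>'
    '<ext><name>nowiki</name><attr></attr><inner>t</inner>'
    '<close>&lt;/nowiki&gt;</close></ext>'
    '<ext><name>nowiki</name><attr/></ext></root>'
)

wikitext = serialise(ET.fromstring(SAMPLE))
print(wikitext)
```

Python's <tt>xml.etree.ElementTree</tt> keeps whitespace in text nodes by default, so no extra handling is needed here to preserve it.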

The whole process is equivalent to applying an XSLT stylesheet that implements these substitutions.

Implementation

 * [//git.wikimedia.org/blob/mediawiki%2Fcore.git/HEAD/includes%2Fparser%2FPreprocessor_DOM.php <tt>Preprocessor_DOM.php</tt> in the git tree]
 * [//git.wikimedia.org/blob/mediawiki%2Fcore.git/HEAD/includes%2Fparser%2FPreprocessor_Hash.php <tt>Preprocessor_Hash.php</tt> in the git tree]