User:DanielRenfro/Parser Notes

The goals of these notes are to:
 * 1) understand parsing in general
 * 2) thoroughly understand and document the MediaWiki parser

General

 * Lexical analysis
 * Parser

Software

 * Lex
 * Yacc
 * Flex lexical analyser
 * GNU bison

Requirements

 * UTF-8

Types of tokens

 * syntax
 * human-language
 * namespaces
 * magic words
 * html-like tags
 * templates (recursive)
 * parser functions
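
To make the token types above concrete, here is a minimal lexer sketch in Python. The category names and regular expressions are assumptions made for illustration only; MediaWiki itself does not tokenize wikitext this way, and this toy version deliberately handles only non-nested templates, so it sidesteps the recursive case noted in the list.

```python
import re

# Hypothetical token categories for illustration; not MediaWiki's own lexer.
TOKEN_SPEC = [
    ("TEMPLATE", r"\{\{[^{}]*\}\}"),       # {{name|arg}}, non-nested only
    ("MAGIC_WORD", r"__[A-Z]+__"),        # double-underscore magic words
    ("HTML_TAG", r"</?[A-Za-z][^<>]*>"),  # HTML-like tags such as <b> or </ref>
    ("LINK", r"\[\[[^\[\]]*\]\]"),         # [[Namespace:Title]]
    ("TEXT", r"[^{_<\[]+|."),              # human-language text; "." is a one-character fallback
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(wikitext):
    """Return a list of (kind, text) pairs that together cover the whole input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(wikitext)]


if __name__ == "__main__":
    for kind, text in tokenize("Hi {{tpl|x}} __TOC__ <b>[[Help:Foo]]"):
        print(kind, repr(text))
```

Because the alternatives are tried in order, the more specific patterns win over TEXT, and the one-character fallback guarantees that no input is dropped. Handling nested templates would need a real parser (or a stack) rather than a single regular expression.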


 * 1) Parser::parse calls the helper function Parser::internalParse, which in turn calls
 * 2) Parser::replaceVariables, which replaces magic variables, templates, and template arguments with the appropriate text.
 * 3) It calls Parser::preprocessToDom, which preprocesses some wikitext and returns the document tree.
 * 4) Next it creates a PPFrame DOM object and calls its expand method to do the actual template magic.
 * 5) Sanitizer::removeHTMLtags, which cleans up HTML, removes dangerous tags and attributes, and removes HTML comments.
 * 6) Parser::doTableStuff, which handles and renders the wikitext for tables.
 * 7) Parser::doDoubleUnderscore, which removes valid double-underscore items and records them in an array.
 * 8) Parser::doHeadings, which parses and renders section headers.
 * 9) Parser::replaceInternalLinks, which processes internal links and stores them in a LinkHolderArray object.
 * 10) Parser::doAllQuotes, which replaces wikitext quote markup ('' and ''') with HTML markup (<i>, <b>, etc.).
 * 11) Parser::replaceExternalLinks, which replaces and renders external links.
 * 12) Parser::doMagicLinks, which replaces special strings like "ISBN xxx" and "RFC xxx" with magic external links.
 * 13) Parser::formatHeadings, which:
   * auto-numbers headings if that option is enabled,
   * adds an [edit] link to sections for users who have enabled the option and can edit the page,
   * adds a table of contents at the top for users who have enabled the option, and
   * auto-anchors headings.
 * 14) Next, Parser::parse calls Parser::doBlockLevels, which renders lists from lines starting with ':', '*', '#', etc.
 * 15) Parser::replaceLinkHolders is called, which calls LinkHolderArray::replace to replace link placeholders with actual links in the buffer. The placeholders are created in Skin::makeLinkObj.
 * 16) Next, the text is language-converted (when applicable) using the convert method of the appropriate Language object.
 * 17) Parser::replaceTransparentTags is called, which replaces transparent tags with values provided by the callback functions in $Parser->mTransparentTagHooks. Transparent tag hooks are like regular XML-style tag hooks, except that they operate late in the transformation sequence, on HTML instead of wikitext.
 * 18) Sanitizer::normalizeCharReferences is called, which ensures that any entities and character references are legal for XML and XHTML specifically.
 * 19) If HTML Tidy is enabled, MWTidy::tidy is called to do the tidying.
 * 20) Finally, the rendered HTML result of the parse process is stored in the ParserOutput object, which is returned to the caller of Parser::parse.
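
The overall shape of the steps above is a fixed sequence of text-to-text passes, with internal links parked as placeholders early on and resolved near the end. The following Python sketch imitates that shape with four toy passes. Everything in it is an assumption made for illustration: the function names only echo the PHP method names, the placeholder format is hypothetical, and the regular expressions are far simpler than what MediaWiki actually does.

```python
import re


def do_headings(text):
    # "== Title ==" on its own line becomes <h2>Title</h2>; level = number of "=".
    def repl(m):
        level = len(m.group(1))
        return f"<h{level}>{m.group(2).strip()}</h{level}>"
    return re.sub(r"^(={1,6})(.+?)\1[ \t]*$", repl, text, flags=re.MULTILINE)


def replace_internal_links(text, holders):
    # Swap each [[Target]] for an opaque placeholder and remember the target.
    def repl(m):
        holders.append(m.group(1))
        return f"<!--LINK {len(holders) - 1}-->"  # hypothetical placeholder format
    return re.sub(r"\[\[([^\[\]|]+)\]\]", repl, text)


def do_all_quotes(text):
    # Bold (''') must be handled before italic ('') so the longer run wins.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)


def replace_link_holders(text, holders):
    # Resolve every placeholder back into an actual anchor element.
    def repl(m):
        target = holders[int(m.group(1))]
        href = "/wiki/" + target.replace(" ", "_")
        return f'<a href="{href}">{target}</a>'
    return re.sub(r"<!--LINK (\d+)-->", repl, text)


def parse(text):
    """Run the toy passes in the same relative order as the steps listed above."""
    holders = []
    text = do_headings(text)
    text = replace_internal_links(text, holders)
    text = do_all_quotes(text)
    text = replace_link_holders(text, holders)
    return text


if __name__ == "__main__":
    print(parse("== Intro ==\nSee ''the'' [[Main Page]]."))
```

In this sketch the quote pass never sees the link target, because by then the link is only a placeholder; the real parser's reasons for using a LinkHolderArray may well go beyond that.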

External link

 * The MediaWiki parser, uncovered