User:DanielRenfro/Parser Notes

The goals of these notes are to:
 * 1) understand parsing in general
 * 2) thoroughly understand and document the MediaWiki parser

General

 * Lexical analysis
 * Parser

Software

 * Lex
 * Yacc
 * Flex lexical analyser
 * GNU bison

Requirements

 * UTF-8

Types of tokens

 * syntax
 * human-language
 * namespaces
 * magic words
 * html-like tags
 * templates (recursive)
 * parser functions
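
To make the token types above concrete, here is a minimal lexer sketch in Python. It is an illustration only: the token names (TEMPLATE_OPEN, MAGIC_WORD, and so on), the patterns, and the helpers `tokenize` and `template_depth` are hypothetical and are not taken from MediaWiki, which does not tokenize wikitext this way. It covers only a few of the listed types (templates, internal links, double-underscore magic words, html-like tags) and treats everything else as text.

```python
import re

# Hypothetical token names and patterns, for illustration only.
TOKEN_SPEC = [
    ("TEMPLATE_OPEN", r"\{\{"),
    ("TEMPLATE_CLOSE", r"\}\}"),
    ("LINK_OPEN", r"\[\["),
    ("LINK_CLOSE", r"\]\]"),
    ("MAGIC_WORD", r"__[A-Z]+__"),
    ("HTML_TAG", r"</?[a-zA-Z][^<>]*>"),
    ("PIPE", r"\|"),
    # Runs of ordinary characters, or any single leftover character.
    ("TEXT", r"[^{}\[\]|<_]+|."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(text):
    """Split text into (kind, value) pairs, merging adjacent TEXT tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "TEXT" and tokens and tokens[-1][0] == "TEXT":
            tokens[-1] = ("TEXT", tokens[-1][1] + value)
        else:
            tokens.append((kind, value))
    return tokens


def template_depth(tokens):
    """Return the deepest template nesting seen (templates are recursive)."""
    depth = max_depth = 0
    for kind, _ in tokens:
        if kind == "TEMPLATE_OPEN":
            depth += 1
            max_depth = max(max_depth, depth)
        elif kind == "TEMPLATE_CLOSE":
            depth = max(depth - 1, 0)
    return max_depth
```

Because templates nest, a flat token stream is not enough on its own: a parser still has to pair each opening token with its closing token, which is what `template_depth` hints at by tracking nesting.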


 * 1) Call helper function , which in turn calls
 * , which replaces magic variables, templates, and template arguments with the appropriate text.
 * 1) It calls, which preprocesses some wikitext and returns the document tree.
 * 2) Next it creates a object and calls its   method to do the actual template magic.
 * , which cleans up HTML, removes dangerous tags and attributes, and removes HTML comments.
 * , which handles and renders the wikitext for tables.
 * , which removes valid double-underscore items, like, and records them in array.
 * , which parses and renders section headers.
 * , which processes internal links and stores them in   (a  object).
 * , which replaces single quotes with HTML markup ( , , etc).
 * , which replaces and renders external links.
 * , which replaces special strings like "ISBN xxx" and "RFC xxx" with magic external links.
 * , which:
 * 1) * auto-numbers headings if that option is enabled,
 * 2) * adds an [edit] link to sections for users who have enabled the option and can edit the page,
 * 3) * adds a table of contents at the top for users who have enabled the option, and
 * 4) * auto-anchors headings.
 * 5) Next,  calls , which renders lists from lines starting with ':', '*', '#', etc.
 * 6)  is called, which calls   on   to replace link placeholders with actual links in the buffer. The placeholders are created in Skin::makeLinkObj.
 * 7) Next, the text is language converted (when applicable) using the  method of the appropriate  object.
 * 8)  is called, which replaces transparent tags  with values which are provided by the callback functions in  . Transparent tag hooks are like regular XML-style tag hooks, except they operate late in the transformation sequence, on HTML instead of wikitext.
 * 9)  is called, which ensures that any entities and character references are legal for XML and XHTML specifically.
 * 10) If HTML tidy is enabled,  is called to do the tidying.
 * 11) Finally the rendered HTML result of the parse process is stored in the object , which is returned to the caller of.
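
The steps above amount to a fixed sequence of text-rewriting passes, each taking the output of the previous one. The following Python sketch shows that shape for four of the passes (double-underscore items, section headers, quotes, and "RFC xxx" magic links). All function names here are hypothetical, the regular expressions are heavily simplified, and the RFC URL is an assumption chosen for the example; none of this is MediaWiki's actual code, which is written in PHP and handles many more edge cases (unbalanced quotes, for instance).

```python
import re


def remove_double_underscores(text, found):
    """Strip items such as __NOTOC__ and record their names in `found`."""
    def record(match):
        found.append(match.group(1))
        return ""
    return re.sub(r"__([A-Z]+)__", record, text)


def render_headings(text):
    """Turn '== Title ==' lines into <h2>Title</h2> (levels 1 to 6)."""
    def repl(match):
        level = len(match.group(1))
        return f"<h{level}>{match.group(2).strip()}</h{level}>"
    return re.sub(r"^(={1,6})(.+?)\1[ \t]*$", repl, text, flags=re.MULTILINE)


def render_quotes(text):
    """Simplified: ''' becomes <b>, '' becomes <i>; balanced pairs only."""
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)


def render_magic_links(text):
    """Link 'RFC 1234' strings; the target URL is an illustrative assumption."""
    def repl(match):
        number = match.group(1)
        url = f"https://www.rfc-editor.org/rfc/rfc{number}"
        return f'<a href="{url}">RFC {number}</a>'
    return re.sub(r"\bRFC (\d+)\b", repl, text)


def parse(text):
    """Run the passes in a fixed order and return the result with metadata."""
    found = []
    text = remove_double_underscores(text, found)
    for step in (render_headings, render_quotes, render_magic_links):
        text = step(text)
    return {"html": text, "double_underscores": found}
```

A call such as `parse("__NOTOC__\n== Intro ==\nSee ''the'' '''spec''': RFC 2616")` runs each pass in turn. The order matters: bold is handled before italic so that three quotes are not consumed as two plus one, and the double-underscore items are removed first so that later passes never see them.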

External link

 * The MediaWiki parser, uncovered