Manual:Parser

This is an overview of the design of the MediaWiki parser.

Design principles
The MediaWiki parser is not a parser in the strict sense of the word: it does not recognise a grammar, but instead translates wikitext to HTML. It was called a parser for want of a better word. Even before the term was introduced as a class name, it was generally understood what was meant by "the MediaWiki parser".

Performance is its primary goal, taking precedence over readability of the code and the simplicity of the markup language it defines. As such, changes which improve the performance of the parser will be warmly received.

Since the parser operates on potentially malicious user input of up to 2 MB in size, it is essential that its worst-case execution time is proportional to the input size, rather than to the square of the input size.
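The difference between linear and quadratic worst-case behaviour can be illustrated with a small sketch. It is not MediaWiki code: the function name `find_link_spans` and the simplified, non-nested `[[...]]` matching are assumptions made for illustration. The point is that the scan position only ever moves forward, and that a failed search for a closing delimiter ends the scan instead of being repeated for every later opener.

```python
def find_link_spans(text):
    """Return (start, end) spans of simple '[[...]]' pairs in one forward scan.

    Hypothetical helper for illustration only; real wikitext link
    handling is considerably more involved.
    """
    spans = []
    pos = 0
    while True:
        start = text.find("[[", pos)
        if start == -1:
            break
        end = text.find("]]", start + 2)
        if end == -1:
            # No closer exists anywhere ahead. Stopping here avoids
            # re-scanning the tail once per remaining opener, which is
            # what makes a naive loop quadratic on input like "[[[[[[...".
            break
        spans.append((start, end + 2))
        pos = end + 2  # never move backwards
    return spans
```

Because each search starts where the previous one ended, every character is examined at most a constant number of times, so a 2 MB input consisting only of `[[` costs a single pass rather than one pass per opener.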

The parser targets a low-memory environment, assuming a few hundred MB of RAM, and thus it uses markup as intermediate state where possible instead of generating inefficient PHP data structures.
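One way to picture "markup as intermediate state" is a placeholder scheme: instead of building a tree of objects, an early pass leaves a short marker in the text and a later pass expands it. The sketch below is an assumption-laden illustration, not the parser's implementation. The marker format `<!--LINK n-->`, the `/wiki/` URL prefix and the function names `hold_links` and `replace_holders` are all hypothetical.

```python
import html
import re

# Hypothetical patterns: a simple unpiped link and its placeholder.
_LINK_RE = re.compile(r"\[\[([^\[\]|]+)\]\]")
_HOLDER_RE = re.compile(r"<!--LINK (\d+)-->")


def hold_links(text):
    """Replace each [[Title]] with a placeholder; return (text, titles)."""
    titles = []

    def stash(match):
        titles.append(match.group(1))
        return f"<!--LINK {len(titles) - 1}-->"

    return _LINK_RE.sub(stash, text), titles


def replace_holders(text, titles):
    """Expand placeholders into anchors once all titles are known."""

    def expand(match):
        title = titles[int(match.group(1))]
        href = html.escape("/wiki/" + title.replace(" ", "_"), quote=True)
        return f'<a href="{href}">{html.escape(title)}</a>'

    return _HOLDER_RE.sub(expand, text)
```

The intermediate state here is one string plus a flat list of titles. A real implementation would also have to ensure that user input cannot forge a placeholder, which this sketch does not attempt.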

Security is also a critical goal: user input must not be allowed to leak through into unvalidated HTML output, unless the wiki is specifically configured to permit this. Remote images, and other markup that causes the client to send a request to an arbitrary remote server, are not allowed by default, for privacy reasons.
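As a rough illustration of these two rules, the sketch below escapes every piece of user-supplied text before it reaches the output and renders a remote image URL as an ordinary link unless a flag enables inline images. The function name `render_external_url`, the `allow_external_images` flag, the extension list and the `rel="nofollow"` attribute are assumptions for this example, not a description of the parser's actual behaviour.

```python
import html

# Hypothetical list of extensions treated as images.
_IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")


def render_external_url(url, allow_external_images=False):
    """Render a user-supplied URL as HTML without letting raw input through."""
    # Anything that is not plain http(s) is emitted as escaped text only.
    if not url.startswith(("http://", "https://")):
        return html.escape(url)

    safe = html.escape(url, quote=True)
    if allow_external_images and url.lower().endswith(_IMAGE_SUFFIXES):
        # Only reached when the (hypothetical) wiki setting opts in, since
        # an <img> makes the reader's browser contact the remote server.
        return f'<img src="{safe}" alt="">'
    return f'<a href="{safe}" rel="nofollow">{safe}</a>'
```

With the default setting, a reader's browser makes no request to the remote server just by viewing the page; the request happens only if the reader chooses to follow the link.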

TODO: write remainder

Markup transformation passes

 * doTableStuff
 * doDoubleUnderscore
 * doHeadings
 * replaceInternalLinks
 * doAllQuotes
 * replaceExternalLinks
 * doMagicLinks
 * formatHeadings
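
The general shape of such a pipeline, in which each pass takes the text produced by the previous one and returns transformed text, can be sketched as follows. The two passes are simplified stand-ins whose names merely echo entries in the list above; their regular expressions are assumptions and do not reproduce the real behaviour of `doHeadings` or `doAllQuotes`. Escaping of user input is omitted for brevity.

```python
import re


def do_headings(text):
    """Stand-in pass: turn '== Title ==' lines into <h2> elements."""
    return re.sub(
        r"^==[ \t]*(.+?)[ \t]*==[ \t]*$",
        r"<h2>\1</h2>",
        text,
        flags=re.MULTILINE,
    )


def do_all_quotes(text):
    """Stand-in pass: handle bold before italic, since ''' contains ''."""
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)


# Order matters: each pass sees the output of the one before it.
PASSES = [do_headings, do_all_quotes]


def run_passes(text, passes=PASSES):
    """Apply each transformation pass to the text in sequence."""
    for apply_pass in passes:
        text = apply_pass(text)
    return text
```

For example, `run_passes("== Title ==\n'''bold''' and ''italic''")` yields an `<h2>` heading followed by a line with `<b>` and `<i>` elements. The text between passes is a mixture of wikitext and HTML rather than a separate data structure.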

internalParseHalfParsed

 * Guillemet
 * doBlockLevels
 * replaceLinkHolders
 * Language conversion
 * Tidy
 * The non-tidy cases