Markup spec/BNF/Noparse-block

A "noparse block" (my term) is a block whose contents are handled by logic entirely separate from the normal markup rules. Extracting these blocks is the first thing the current parser does after preprocessing, in the "strip" method. The only thing that ends one of these blocks is the matching close tag.

Notes:
 * Nowiki, pre and html-comment are always available.
 * Html is available if $wgRawHtml is true in LocalSettings.php.
 * Math is available if the math extension is installed
 * Other tags may be available if installed and present in parser->mTagHooks.
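
The strip step described above can be sketched roughly as follows. This is only an illustration in Python, not the parser's actual code: the function name strip_noparse, the placeholder format and the fixed tag list are all assumptions, and only closed blocks are handled.

```python
import re

# Hypothetical tag list for illustration; the real set depends on configuration.
NOPARSE_TAGS = ("nowiki", "pre")


def strip_noparse(text, tags=NOPARSE_TAGS):
    """Swap each closed noparse block for a placeholder.

    Returns the rewritten text and a dict mapping each placeholder to a
    (tag name, raw body) pair, so the bodies can be rendered separately later.
    """
    blocks = {}
    # \1 forces the close tag to match the open tag, so other tags that
    # appear inside the body do not end the block.
    pattern = re.compile(
        r"<(%s)>(.*?)</\1>" % "|".join(re.escape(t) for t in tags),
        re.DOTALL | re.IGNORECASE,
    )

    def stash(match):
        # The placeholder format is an assumption made for this sketch.
        key = "\x7fNOPARSE-%d\x7f" % len(blocks)
        blocks[key] = (match.group(1).lower(), match.group(2))
        return key

    return pattern.sub(stash, text), blocks
```

Because the bodies are set aside before any other markup is processed, text such as '' inside a block never reaches the rest of the parser.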

Nowiki
The &lt;nowiki&gt; tag prevents special markup (like  ''  for italics) from being recognized.

In words, if a &lt;nowiki&gt; tag is not closed, then it is taken to run until the end of the input. (?=EOF) is a look-ahead assertion, like in PCRE. It asserts that an EOF follows, but does not consume the EOF.
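
The same rule can be written as a regular expression. The following is a small Python sketch, assuming the input is a single string and using \Z (end of string) in place of EOF; the names NOWIKI and nowiki_bodies are made up for the example.

```python
import re

# A nowiki body ends at the close tag or, failing that, at end of input.
# (?=\Z) is a look-ahead: it asserts the end of the string without consuming it.
NOWIKI = re.compile(r"<nowiki>(.*?)(?:</nowiki>|(?=\Z))", re.DOTALL)


def nowiki_bodies(text):
    """Return the raw body of every nowiki block in text."""
    return [m.group(1) for m in NOWIKI.finditer(text)]
```

For example, in the input a&lt;nowiki&gt;b&lt;/nowiki&gt;c&lt;nowiki&gt;d the second block is never closed, so its body is everything up to the end of the input.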

Translating to HTML
To translate a nowiki tag to HTML, perform the following transformations:
 *  is replaced with &lt;p&gt;
 *  is replaced with &lt;/p&gt;
 *  terminals within  are replaced with the appropriate  (see Fundamental elements).
 *  is otherwise output more or less literally. Whitespace is treated as normal: single new lines are ignored, consecutive new lines are converted into  and   elements. Leading and trailing space from each line is removed, and runs of spaces are normalised to a single space within a line.
 * The elements in the top-level  are discarded.

 * ''Actually, this is not true. The  is treated as paragraphs of text, as in the main tag, with blank lines being replaced with , and each paragraph being trimmed. --HappyDog 15:11, 18 June 2006 (UTC)''
 * Noted, thanks. Stevage 04:43, 4 December 2007 (UTC)
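
As a rough illustration of the transformations listed above, here is a Python sketch. It rests on several assumptions that the text does not confirm: that the unsafe symbols are &amp;, &lt; and &gt;, that paragraphs are separated by blank lines, and that the lines of one paragraph are joined with a single space. The function name nowiki_to_html is hypothetical.

```python
import html
import re


def nowiki_to_html(body):
    """Render a nowiki body as escaped, paragraph-wrapped HTML (sketch only)."""
    rendered = []
    # One or more blank lines separate paragraphs.
    for para in re.split(r"\n\s*\n", body.strip()):
        # Trim each line and collapse runs of spaces to a single space.
        lines = [re.sub(r" +", " ", line.strip()) for line in para.split("\n")]
        # Single newlines are not significant: join the lines with a space.
        text = " ".join(line for line in lines if line)
        if text:
            # Escape &, < and > so the body cannot be read as markup.
            rendered.append("<p>" + html.escape(text, quote=False) + "</p>")
    return "\n".join(rendered)
```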

Pre
The &lt;pre&gt; tag behaves much like nowiki, but generates a literal &lt;pre&gt; tag, which causes different output. Notably, a nowiki is treated literally inside a pre tag, and vice versa.

Notes:
 * Not quite accurate,  is recognised, although is not.

Translating to HTML

 *  terminals are replaced.
 * New lines are retained literally.
 * The whole block is wrapped in &lt;pre&gt; and &lt;/pre&gt;.
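
A minimal Python sketch of the pre translation, assuming that the terminals being replaced are the unsafe symbols &amp;, &lt; and &gt;, and that the wrapper is the literal pre element mentioned earlier. The function name pre_to_html is made up for the example.

```python
import html


def pre_to_html(body):
    """Escape unsafe symbols, keep newlines and spacing, wrap in a pre element."""
    # quote=False leaves quote characters alone; only &, < and > are escaped.
    return "<pre>" + html.escape(body, quote=False) + "</pre>"
```

Unlike the nowiki case, nothing is trimmed or joined here, so line breaks and indentation in the body survive into the output.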

Html

Translating to HTML

 * All characters, including whitespace, newlines, and "html-unsafe-symbol" terminals are output literally.
 * The block is not wrapped in anything.

Html-comment

Translating to HTML
HTML comments are stripped out completely and never reach the output. This behaviour could possibly be changed with the new parser. Its main purpose was to avoid conflicts with other parts of the parser that generate internal comments, for example to identify section headings.

Note: Unlike in HTML, this stripping is repeated until there is nothing left to strip, i.e.  becomes <!> (nothing).
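
The repeated stripping can be sketched in Python as a loop that runs until a pass removes nothing. This is an illustration only: the names COMMENT and strip_comments are made up, and unclosed comments are deliberately left alone because their handling is not described here.

```python
import re

# A comment runs from <!-- to the nearest following -->.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text):
    """Remove HTML comments repeatedly until no comment is left."""
    while True:
        text, count = COMMENT.subn("", text)
        if count == 0:
            # Nothing was removed on this pass, so the text is stable.
            return text
```

A single pass is not enough, because removing one comment can join the text on either side of it into a new comment, which only a later pass will catch.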