Markup spec/BNF/Article

Wiki-page
The top-level element is wiki-page which describes the contents of a page. A page can either be a redirect or a normal article.

<wiki-page>              ::= [ ] | [ ]
                         ::=    ( | | EOL)
                         ::= FROM_LANGUAGE_FILE

The link nonterminals used in the redirect rule are defined in ../Links/.

Notes:

The redirect tag is language-specific, and may have more than one possible value. By default the value for the right-hand side of the expression (replacing FROM_LANGUAGE_FILE) is <tt>#REDIRECT</tt>, but some languages define their own value; Estonian is one example. This match is case-insensitive (though this may in turn be overridden in the language file).

The run of characters between the redirect tag and the redirect link should be non-greedy, matching the largest subset of characters that does not contain the start of an internal link.

For example, the redirect rule will match the following, and treat it as a redirect to foo:
 * <tt>#REDireCTnon%^sense[[foo|and this is parsed as article content</tt>


 * Interwiki prefixes may not be supported in redirect links. (Is this configurable?)
 * The text following the redirect link is not rendered. However, it is parsed, so interwiki links, category links and even normal links are still processed and behave "normally".
 * Anchors (Article#Section) are supported, but not yet described in the grammar.
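The redirect matching described above can be approximated with a regular expression. The following is only a sketch under stated assumptions: the tag is hard-coded to the default <tt>#REDIRECT</tt> instead of being read from a language file, the character class used for the title is a guess (the real definition lives in ../Links/), and the names <tt>REDIRECT_TAG</tt>, <tt>REDIRECT_RE</tt> and <tt>redirect_target</tt> are hypothetical.

```python
import re

# Assumption: the default tag; a real parser would load this from the language file.
REDIRECT_TAG = "#REDIRECT"

# tag, then a non-greedy run of characters, then "[[", then the title,
# terminated by "]]", a pipe, or the end of the line.
REDIRECT_RE = re.compile(
    re.escape(REDIRECT_TAG)
    + r"(?P<junk>.*?)\[\[(?P<title>[^\[\]|\n]+)(?:\]\]|\||$)",
    re.IGNORECASE | re.MULTILINE,
)


def redirect_target(page_text):
    """Return the redirect target if the page starts with a redirect, else None."""
    match = REDIRECT_RE.match(page_text)
    return match.group("title") if match else None
```

With this sketch, the example line from the notes yields the target <tt>foo</tt>, while a page that does not begin with the tag yields <tt>None</tt>.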

Article
This describes the contents of an article. An article consists of blocks, which come in two flavours: paragraphs and special blocks. Both of them end with a newline. Paragraphs are separated by empty lines.

<special-block-and-more> ::= <special-block> ( EOF | [<newline>] <special-block-and-more>
                                                   | (<newline> | "") <paragraph-and-more> )
<paragraph-and-more>     ::= <paragraph> ( EOF | [<newline>] <special-block-and-more>
                                               | <newline> <paragraph-and-more> )

The nonterminals <tt>special-block-and-more</tt> and <tt>paragraph-and-more</tt> are not disjoint; the parser should first try to match against <tt>special-block-and-more</tt>.

The expression <tt>(<newline> | "")</tt> is a greedy version of <tt>[<newline>]</tt>. If both the empty string and a newline can be matched, then the former expression matches the newline, while the latter expression would match the empty string according to the conventions on ../.
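The difference between the two forms of "optional newline" can be illustrated with Python regular expressions. This is only an analogy, not part of the specification: an ordered alternation that tries the newline first plays the role of the greedy form, and a lazy optional plays the role of the bracketed form. The function names are hypothetical.

```python
import re


def optional_newline_greedy(text):
    """Analogue of the greedy form: try the newline first, then the empty string."""
    return re.match(r"(\n|)", text).group(1)


def optional_newline_lazy(text):
    """Analogue of the bracketed form: prefer the empty string when both can match."""
    return re.match(r"(\n??)", text).group(1)
```

Applied to a string that starts with a newline, the first function consumes the newline and the second consumes nothing; applied to a string without a leading newline, both consume nothing.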

Paragraph
Every paragraph ends with a newline character. A paragraph is translated to a &lt;p&gt; element.

<lines-of-text>          ::= <line-of-text> [<lines-of-text>]
<line-of-text>           ::= <inline-text>
<inline-text>            ::= <inline-element> [<inline-text>]
<inline-element>         ::= <link> | <magic-link> | <nowiki-tag> | ... | <text>
<text>                   ::= ( - ) [ ]

In the penultimate rule, <tt>link</tt>, <tt>magic-link</tt> and <tt>nowiki-tag</tt> are described in ../Links/, ../Magic links/ and ../Nowiki/, respectively. The dots need to be filled in. Again, <tt>link</tt> and <tt>text</tt> are not disjoint; the parser should try <tt>text</tt> last.

The recursion in the <tt>lines-of-text</tt> rule should be non-greedy, i.e., it should match as few lines as possible. For instance,
 * <tt>abc</tt>
 * <tt>----</tt>

should be parsed as one <tt>line-of-text</tt> and one <tt>horizontal-rule</tt>, but
 * <tt>abc</tt>
 * <tt>---</tt>

should be parsed as two <tt>line-of-text</tt> nonterminals.

If a paragraph starts with a newline, the newline is translated to a &lt;br&gt; element.

Special block
Special blocks are things like itemized lists starting with <tt>*</tt>; they can only be specified at the start of a line and usually run until the end of the line.

<special-block>          ::= <horizontal-rule> | <heading> | <list-item> | ... |

The dots need to be filled in.

Horizontal rule
A horizontal rule is specified by 4 or more dashes. It is translated to an &lt;hr&gt; element.

<horizontal-rule>        ::= "----" [<dashes>] [<inline-text>]
<dashes>                 ::= "-" [<dashes>]

If the <tt>inline-text</tt> is present, it is not wrapped in a &lt;p&gt; element.
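A minimal sketch of how a single line could be tested against the horizontal-rule description: four or more dashes, optionally followed by inline text. The names <tt>HR_RE</tt> and <tt>parse_horizontal_rule</tt> are hypothetical, and the function assumes it receives one line without its trailing newline.

```python
import re

# Four or more dashes at the start of the line, then optional trailing inline text.
HR_RE = re.compile(r"-{4,}(?P<rest>.*)")


def parse_horizontal_rule(line):
    """Return the trailing inline text if the line is a horizontal rule, else None."""
    match = HR_RE.fullmatch(line)
    if match is None:
        return None
    return match.group("rest")
```

A line of three dashes is rejected and would therefore fall through to the paragraph rules as an ordinary line of text.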

Heading
A level-n heading is translated to an &lt;hn&gt; element.

<heading>                ::= <level-6-heading> | <level-5-heading> | <level-4-heading>
                           | <level-3-heading> | <level-2-heading> | <level-1-heading>
<level-6-heading>        ::= "======" <inline-text> "======" <space-tabs>
<level-5-heading>        ::= "====="  <inline-text> "====="  <space-tabs>
<level-4-heading>        ::= "===="   <inline-text> "===="   <space-tabs>
<level-3-heading>        ::= "==="    <inline-text> "==="    <space-tabs>
<level-2-heading>        ::= "=="     <inline-text> "=="     <space-tabs>
<level-1-heading>        ::= "="      <inline-text> "="      <space-tabs>

The alternatives in the first rule need to be tried from left to right.

Some notes (as implied by the grammar):
 * An unterminated heading tag is treated as normal text.
 * Unbalanced tags are treated as the shorter of the two tags (e.g. <tt>==== heading ==</tt> renders as a level 2 heading with the text <tt>== heading</tt>).
 * More than 6 = signs are treated as 6, with the extra symbols being included in the header.
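The left-to-right ordering of the heading alternatives and the three notes above can be reproduced with a short sketch. It assumes the input is a single line without its newline, treats the inline text as an opaque non-empty string, and uses the hypothetical name <tt>parse_heading</tt>.

```python
def parse_heading(line):
    """Return (level, inner_text) for a heading line, or None for normal text."""
    stripped = line.rstrip(" \t")  # trailing spaces and tabs are allowed
    # Try level 6 first and level 1 last, mirroring the order of the alternatives.
    for level in range(6, 0, -1):
        marker = "=" * level
        has_content = len(stripped) > 2 * level  # inline text must be non-empty
        if has_content and stripped.startswith(marker) and stripped.endswith(marker):
            return level, stripped[level:-level]
    return None
```

Because the longest markers are tried first, an unbalanced line falls through to the shorter of its two tags, and any surplus <tt>=</tt> signs stay in the heading text.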

List item
<list-item>              ::= <indent-item> | <enumerated-item> | <bullet-item>
<indent-item>            ::= ":" [(<list-item> | <item-body>)]
<enumerated-item>        ::= "#" [(<list-item> | <item-body>)]
<bullet-item>            ::= "*" [(<list-item> | <item-body>)]
<item-body>              ::= <defined-term> | [ ] <inline-text>

<defined-term>           ::= ";" [ ]
                         ::= ":" <inline-text>

Semantics:
 * An <indent-item> and the ":" part that follows a <defined-term> are translated to a &lt;dd> element wrapped in a &lt;dl>.
 * A <bullet-item> is translated to a &lt;li> element wrapped in a &lt;ul>.
 * An <enumerated-item> is translated to a &lt;li> element wrapped in a &lt;ol>.
 * A <defined-term> is translated to a &lt;dt> element wrapped in a &lt;dl>.
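The recursive structure of the list-item rules and the semantics above can be sketched as follows. This is an illustration only: it handles the three markers <tt>:</tt>, <tt>#</tt> and <tt>*</tt>, does not handle <defined-term>, strips surrounding whitespace from the body as an assumption, and leaves the inline text unparsed. The names <tt>WRAPPERS</tt> and <tt>render_list_item</tt> are hypothetical.

```python
# Marker -> (wrapping element, item element), following the semantics listed above.
WRAPPERS = {":": ("dl", "dd"), "#": ("ol", "li"), "*": ("ul", "li")}


def render_list_item(line):
    """Return HTML for a list-item line, or None if it has no list marker."""
    if not line or line[0] not in WRAPPERS:
        return None
    outer, inner = WRAPPERS[line[0]]
    rest = line[1:]
    # A list item may contain a nested list item or a plain item body.
    nested = render_list_item(rest)
    body = nested if nested is not None else rest.strip()
    return f"<{outer}><{inner}>{body}</{inner}></{outer}>"
```

Each call produces a fully wrapped single item, so successive items come out as separate lists that still need to be grouped.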

Notes:
 * The grouping of successive list items cannot be captured in EBNF. The simplest approach would appear to be a second pass whereby successive pairings of close/open list are eliminated. For example, <ol><li>Foo</li> </ol><ol> <li>Boo</li></ol> would be rewritten as <ol><li>Foo</li><li>Boo</li></ol>
 * <list-item> and <defined-term> are obviously matched in preference to <inline-text>. The user has to insert whitespace in order to get inline-text starting with #, ;, * or :.
 * The current parser accepts a wider range of syntax than the above, allowing other list items to appear after a definition list. This appears to be arbitrary, unpredictable and not particularly useful. See bug 11894.
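The second pass suggested in the first note can be sketched with a regular expression that removes each close tag immediately followed by an open tag of the same list type. This is a naive illustration: it works on the flat HTML string, also removes the whitespace around the eliminated pair, and makes no attempt to handle nested lists correctly. The names <tt>ADJACENT_LISTS_RE</tt> and <tt>merge_adjacent_lists</tt> are hypothetical.

```python
import re

# A closing list tag directly followed by an opening tag of the same kind.
ADJACENT_LISTS_RE = re.compile(r"\s*</(ol|ul|dl)>\s*<\1>\s*")


def merge_adjacent_lists(html):
    """Eliminate close/open pairs so successive items share one list element."""
    return ADJACENT_LISTS_RE.sub("", html)
```

Applied to the example from the note, the two single-item <tt>ol</tt> elements collapse into one list containing both items.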