Markup spec/BNF/Article

Wiki-page
The top-level element is wiki-page which describes the contents of a page. A page can either be a redirect or a normal article.

<wiki-page>              ::= [ ] | [ ]
                         ::=    ( | | EOL)
                         ::= FROM_LANGUAGE_FILE

The link-related nonterminals used in these rules are defined in ../Links/.

The redirect keyword is language-specific and may have more than one possible value. By default the value on the right-hand side of the rule (replacing FROM_LANGUAGE_FILE) is #REDIRECT, but other languages, such as Estonian, define their own value. The match is case-insensitive (though this, too, may be overridden in the language file).

The text between the redirect keyword and the link should be matched non-greedily: it is the longest run of characters that does not contain the start of the link.

For example, the following text is matched as a redirect to foo:
 * #REDireCTnon%^sense[[foo|and this is parsed as article content
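A minimal Python sketch of redirect detection. It assumes only the default keyword #REDIRECT and a much-simplified link (just [[ followed by the target, ending at the first |, ] or newline), and it assumes the junk before the link stays on the first line. The names redirect_target and DEFAULT_REDIRECT_KEYWORDS are hypothetical. It reproduces the example above:

```python
import re

# Assumption: only the default keyword is listed; a language file could supply others.
DEFAULT_REDIRECT_KEYWORDS = ("#REDIRECT",)


def redirect_target(page_text, keywords=DEFAULT_REDIRECT_KEYWORDS):
    """Return the redirect target of page_text, or None if it is a normal article.

    Simplification: the link is reduced to "[[" followed by the target, which
    ends at the first "|", "]" or newline. The real link grammar is richer.
    """
    keyword_alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(
        r"(?:" + keyword_alternatives + r")"  # language-specific keyword
        r".*?"                                # junk before the link, non-greedy, same line
        r"\[\[([^|\]\n]+)",                   # start of the link: [[target
        re.IGNORECASE,                        # the keyword match is case-insensitive
    )
    match = pattern.match(page_text)          # the keyword must start the page
    return match.group(1).strip() if match else None


print(redirect_target("#REDireCTnon%^sense[[foo|and this is parsed as article content"))
# prints: foo
```

The non-greedy .*? followed by \[\[ stops at the first [[ on the line, so everything before it is discarded as junk.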

Article
This describes the contents of an article. An article consists of blocks, which come in two flavours: paragraphs and special blocks. Both of them end with a newline. Paragraphs are separated by empty lines.

<special-block-and-more> ::= <special-block> ( EOF | [<newline>] <special-block-and-more>
                                             | (<newline> | "") <paragraph-and-more> )
<paragraph-and-more>     ::= <paragraph> ( EOF | [<newline>] <special-block-and-more>
                                         | <paragraph-and-more> )

The nonterminals special-block-and-more and paragraph-and-more are not disjoint; the parser should first try to match against special-block-and-more.

The expression (<newline> | "") is a greedy version of [<newline>]. If both the empty string and a newline can be matched, the former expression matches the newline, while the latter matches the empty string according to the conventions on ../.
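The difference between the two forms can be illustrated with Python's regex engine. This is an analogy, not part of the spec: alternation is tried from left to right, so (?:\n|) prefers the newline, while the lazy optional (?:\n)?? prefers the empty string but still takes the newline when the rest of the pattern requires it. The helper consumed is hypothetical.

```python
import re

# Analogy only: Python's regex engine tries alternatives from left to right.
# (<newline> | "")  ~  (?:\n|)    prefers consuming the newline
# [<newline>]       ~  (?:\n)??   lazy optional, prefers the empty string
GREEDY_OPTIONAL_NEWLINE = re.compile(r"(?:\n|)")
LAZY_OPTIONAL_NEWLINE = re.compile(r"(?:\n)??")


def consumed(pattern, text):
    """Return the prefix of text that pattern matches first."""
    return pattern.match(text).group()


print(repr(consumed(GREEDY_OPTIONAL_NEWLINE, "\nabc")))  # '\n'
print(repr(consumed(LAZY_OPTIONAL_NEWLINE, "\nabc")))    # ''
# The lazy form still takes the newline when the rest of the rule needs it:
print(repr(re.match(r"(?:\n)??abc", "\nabc").group()))   # '\nabc'
```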

Paragraph
Every paragraph ends with a newline character. A paragraph is translated to a <p> element.

<lines-of-text>          ::= <line-of-text> [<lines-of-text>]
<line-of-text>           ::= <inline-text>
<inline-text>            ::= <inline-element> [<inline-text>]
<inline-element>         ::= <link> | <magic-link> | <nowiki-tag> | ... | <text>
                         ::= ( - ) [ ]

In the penultimate rule, link, magic-link and nowiki-tag are described in ../Links/, ../Magic links/ and ../Nowiki/, respectively. The dots need to be filled in. Again, link and text are not disjoint; the parser should try text last.
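A rough Python sketch of this ordering: the hypothetical tokenize_inline tries each inline element in turn and falls back to plain text only when none matches, one character at a time. The link and nowiki patterns are simplified stand-ins, not the definitions from ../Links/ and ../Nowiki/, and magic links are omitted.

```python
import re

# Hypothetical, simplified stand-ins for rules defined on other pages
# (../Links/, ../Nowiki/); magic links are left out of this sketch.
INLINE_RULES = [
    ("link", re.compile(r"\[\[[^\[\]\n]+\]\]")),
    ("nowiki-tag", re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL)),
]


def tokenize_inline(text):
    """Split inline text into (kind, source) tokens, trying <text> last."""
    tokens, pending_text = [], []
    pos = 0
    while pos < len(text):
        for kind, pattern in INLINE_RULES:  # specific elements first, in order
            match = pattern.match(text, pos)
            if match:
                if pending_text:           # close the current run of plain text
                    tokens.append(("text", "".join(pending_text)))
                    pending_text = []
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:                              # no element matched: fall back to <text>
            pending_text.append(text[pos])
            pos += 1
    if pending_text:
        tokens.append(("text", "".join(pending_text)))
    return tokens
```

For instance, tokenize_inline("see [[Foo]] now") yields a text token, a link token and another text token, while an unterminated "[[b" is left as plain text because the link pattern fails and text is the fallback.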

The recursion in the lines-of-text rule should be non-greedy, i.e., it should match as few lines as possible. For instance,
 * abc
 * ----

should be parsed as one line-of-text and one horizontal-rule, but
 * abc
 * ---

should be parsed as two line-of-text nonterminals.
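A rough sketch of the block-level behaviour, assuming a hypothetical is_special_line that recognizes only horizontal rules and list markers (headings are left out because an unterminated heading is ordinary text). Special blocks are tried before a line is added to the current paragraph, and an empty line closes the paragraph. The function split_blocks is likewise hypothetical.

```python
def is_special_line(line):
    """Hypothetical, partial test for a special block: horizontal rules and list items only."""
    return line.startswith("----") or line[:1] in ("*", "#", ";", ":")


def split_blocks(text):
    """Group lines into ("paragraph", [lines]) and ("special", line) blocks."""
    blocks, paragraph = [], []

    def flush():
        # Close the current paragraph, if any.
        if paragraph:
            blocks.append(("paragraph", paragraph.copy()))
            paragraph.clear()

    for line in text.split("\n"):
        if is_special_line(line):   # special blocks are tried first
            flush()
            blocks.append(("special", line))
        elif line == "":            # an empty line separates paragraphs
            flush()
        else:
            paragraph.append(line)  # another line-of-text in the current paragraph
    flush()
    return blocks
```

With this split, abc followed by ---- gives a paragraph and a special block, while abc followed by --- stays a single two-line paragraph.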

If a paragraph starts with a newline, the newline is translated to a <br> element.

Special block
Special blocks are constructs such as itemized lists starting with *; they can only be specified at the start of a line and usually run until the end of the line.

<special-block>          ::= <horizontal-rule> | | | ... |

The dots need to be filled in.

Horizontal rule
A horizontal rule is specified by 4 or more dashes. It is translated to an <hr> element.

<horizontal-rule>        ::= "----" [ ] [<inline-text>]
                         ::= "-" [ ]

If the inline-text is present, it is not wrapped in a <p> element.
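A minimal sketch of the horizontal-rule rule. The name render_horizontal_rule is hypothetical, and emitting a bare <hr> followed by the trailing text is this sketch's own output format, not necessarily what MediaWiki produces.

```python
import re

HR_PREFIX = re.compile(r"-{4,}")  # "----" followed by any further dashes


def render_horizontal_rule(line):
    """Return HTML for a horizontal-rule line, or None if the line is not one."""
    match = HR_PREFIX.match(line)
    if not match:
        return None
    trailing_text = line[match.end():]  # optional <inline-text>
    # The trailing text is emitted after <hr> as-is, not wrapped in <p>.
    return "<hr>" + trailing_text
```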

Heading
A level-n heading is translated to an <hn> element.

<heading>                ::= <level-6-heading> | <level-5-heading> | <level-4-heading>
                           | <level-3-heading> | <level-2-heading> | <level-1-heading>
<level-6-heading>        ::= "======" <inline-text> "======" <space-tabs>
<level-5-heading>        ::= "====="  <inline-text> "====="  <space-tabs>
<level-4-heading>        ::= "===="   <inline-text> "===="   <space-tabs>
<level-3-heading>        ::= "==="    <inline-text> "==="    <space-tabs>
<level-2-heading>        ::= "=="     <inline-text> "=="     <space-tabs>
<level-1-heading>        ::= "="      <inline-text> "="      <space-tabs>

The alternatives in the first rule need to be tried from left to right.

Some notes (as implied by the grammar):
 * An unterminated heading tag is treated as normal text.
 * Unbalanced tags are treated as the shorter of the two tags (e.g. ==== heading == renders as a level-2 heading whose text is == heading).
 * More than 6 = signs are treated as 6, with the extra symbols being included in the heading text.
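Because the level-6 rule is tried first, the level that wins is the largest n for which the line both starts and ends with n = signs, capped at 6. A minimal sketch under that reading; parse_heading is hypothetical, and it trims whitespace around the heading text, which the grammar does not specify.

```python
def parse_heading(line):
    """Return (level, text) for a heading line, or None if it is not a heading."""
    s = line.rstrip(" \t")                 # drop trailing <space-tabs>
    opening = len(s) - len(s.lstrip("="))  # length of the leading run of '='
    closing = len(s) - len(s.rstrip("="))  # length of the trailing run of '='
    level = min(opening, closing, 6)       # unbalanced: shorter run wins; cap at 6
    # Leave at least one character between the opening and closing runs.
    while level > 0 and len(s) <= 2 * level:
        level -= 1
    if level == 0:                         # unterminated heading: ordinary text
        return None
    return level, s[level:len(s) - level].strip()


print(parse_heading("==== heading =="))  # (2, '== heading')
```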

Lists
<definition-list>        ::= (<undefined-term> | <defined-term> | <termless-definition>) [ <definition-list>]

<bullet-list>            ::= <bullet-item> [ <bullet-list>]
<bullet-item>            ::= "*" ( | [<inline-text>])

<enumerated-list>        ::= <enumerated-item> [ <enumerated-list>]
<enumerated-item>        ::= "#" ( | [<inline-text>])

<undefined-term>         ::= ";" <inline-text>
<defined-term>           ::= ";" (<inline-text> | )
<definition>             ::= ":" (<inline-text>)
<termless-definition>    ::= ":" (<inline-text> | )

Semantics:
 * A definition-list is translated to a <dl> element.
 * A bullet-list is translated to a <ul> element.
 * An enumerated-list is translated to an <ol> element.
 * A defined-term or undefined-term is translated to a <dt> element. If it contains a definition, the <dt> element is closed first.
 * A definition or termless-definition is translated to a <dd> element.
 * A bullet-item or enumerated-item is translated to a <li> element.
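A minimal sketch of these semantics for single-level lists. The names render_list and LIST_ELEMENTS are hypothetical, nesting (e.g. *# or ;#) is not handled, and consecutive lines are merged into one list only when their markers map to the same container element.

```python
# Hypothetical mapping from list markers to (container, item) elements,
# following the semantics above. Only one nesting level is handled.
LIST_ELEMENTS = {
    "*": ("ul", "li"),  # bullet-item
    "#": ("ol", "li"),  # enumerated-item
    ";": ("dl", "dt"),  # defined-term / undefined-term
    ":": ("dl", "dd"),  # definition / termless-definition
}


def render_list(lines):
    """Render consecutive single-level list lines as HTML."""
    out, open_container = [], None
    for line in lines:
        marker, text = line[0], line[1:].strip()
        container, item = LIST_ELEMENTS[marker]
        if container != open_container:    # close the old list, open a new one
            if open_container:
                out.append(f"</{open_container}>")
            out.append(f"<{container}>")
            open_container = container
        if marker == ";" and ":" in text:   # ";term:definition" on one line
            term, definition = text.split(":", 1)
            out.append(f"<dt>{term.strip()}</dt><dd>{definition.strip()}</dd>")
        else:
            out.append(f"<{item}>{text}</{item}>")
    if open_container:
        out.append(f"</{open_container}>")
    return "".join(out)
```

For example, a # line followed by a * line yields an <ol> and then a separate <ul>.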

Notes:
 * These rules are a very rough draft. Not much attention has been paid to newlines, for instance.
 * There is at least one mistake in the above grammar: ;#foo:bar renders as <dl> <dt>foo</dt> <dd> <ol> <li>bar</li> </ol> </dd> </dl>. That is, the OL element is applied to the definition rather than the defined-term.
 * The above grammar does not come close to capturing the logic for sequences of list items. For instance,
   #one
   :#two

creates two separate enumerated lists. Can this even be expressed in BNF?