Markup spec

From mediawiki.org
Revision as of 12:47, 27 May 2006 by Jitse Niesen (talk | contribs) (add some, move table to binary section)

MediaWiki markup spec project:

Goals

  • Produce a specification of MediaWiki's markup format that is sufficiently complete and consistent that multiple compatible parser implementations can be built from it.
    • Spec may or may not use EBNF etc. Might have to just use lots of words. ;)
  • Define a data model for a parse tree
    • The data model should be representable in XML, though an official XML schema for such a representation may or may not be defined.
    • Round-trip conversion between source code and the data model must be possible. There may be a many-to-one relationship between source code and parse trees, but the canonical transformation from parse tree to source code should always parse back to the same parse tree.
  • A parser built from this spec will replace MediaWiki's current parser in the future.
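
The round-trip goal above can be illustrated with a toy example. This is a minimal sketch and not the project's data model: it assumes a made-up subset of the language in which each line is either a horizontal rule (4 or more hyphens filling the whole line) or plain text, and the element names page, hr and text are hypothetical. Several sources map to one tree, and the canonical serialization parses back to that same tree.

```python
import re
import xml.etree.ElementTree as ET

# Toy rule: a line made only of 4 or more hyphens is a horizontal rule.
HR = re.compile(r"^-{4,}$")


def parse(source: str) -> ET.Element:
    """Parse the toy subset into an XML tree (element names are hypothetical)."""
    root = ET.Element("page")
    for line in source.split("\n"):
        if HR.match(line):
            ET.SubElement(root, "hr")
        else:
            ET.SubElement(root, "text").text = line
    return root


def serialize(tree: ET.Element) -> str:
    """Canonical tree-to-source transformation: every rule becomes '----'."""
    lines = []
    for node in tree:
        lines.append("----" if node.tag == "hr" else (node.text or ""))
    return "\n".join(lines)


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Compare two trees by their XML serialization."""
    return ET.tostring(a) == ET.tostring(b)


source = "intro\n------\noutro"
tree = parse(source)
canonical = serialize(tree)
```

Here canonical differs from source (the six hyphens collapse to four), yet parse(canonical) yields the same tree as parse(source), which is the property the goal asks for.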

Compatibility

  • In general, the spec will avoid deviating from present behavior where that behavior is reasonable and well-defined, and will avoid adding new behavior without first considering whether it may break existing pages.
  • Where the current parser's behavior is undefined or obviously buggy, the spec may define new, different behavior.

Difficulties

  • Some of the syntax is kind of hairy. Bleah!
  • Language-sensitive and otherwise customizable keywords.
  • Extensions...
  • Integrated HTML and HTML-like tags.
  • Lots of scary context-sensitivity.

Resources

The Markup Language

The MediaWiki markup language (commonly called wikitext within the MediaWiki community, though the term is ambiguous in the larger wiki community) uses non-textual ASCII characters, sometimes in pairs, to tell the parser how the editor wishes an item or section of text to be displayed. The parser translates these tokens into (X)HTML as closely as semantically possible.

Current (v1.6) Markup tokens

The markup tokens fall into two broad categories: unary tokens (like : or * used at the beginning of a line), which stand alone, and binary tokens (like those for italic or boldface) which must be used in matched pairs.

Unary

Start of line only

  • Horizontal line: ---- (4 or more hyphens)
  • Pre-formatted text: (space)
  • Lists
    • Bulleted: *
    • Numbered: #
    • Indent with no marking: :
    • Definition list: ;
    Notes:
    • These may be combined at the start of the line to create nested lists, e.g. *** to give a bulleted list three levels deep, or **# to have a numbered list nested within two levels of bulleted list.
  • Redirects: #redirect or #REDIRECT (followed by wikilink)
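
The start-of-line tokens listed above could be recognized by a small classifier like the following. This is a rough sketch under stated assumptions, not the current parser's logic: the function name classify_line and the returned tuple shape are hypothetical, only the two redirect spellings given above are accepted, and the redirect check runs first because # is also the numbered-list marker.

```python
import re

# Names for the four list markers described above.
LIST_KINDS = {"*": "bulleted", "#": "numbered", ":": "indent", ";": "definition"}


def classify_line(line):
    """Return (kind, detail, rest) for one source line under the toy rules."""
    # Redirects must be tested before lists, since '#' also starts a numbered list.
    if line.startswith(("#redirect", "#REDIRECT")):
        return ("redirect", None, line[len("#redirect"):].strip())
    # Four or more hyphens at the start of the line give a horizontal line.
    if line.startswith("----"):
        return ("hr", None, line.lstrip("-"))
    # Any run of list markers gives one nesting level per marker.
    markers = re.match(r"[*#:;]+", line)
    if markers:
        kinds = [LIST_KINDS[ch] for ch in markers.group()]
        return ("list", kinds, line[markers.end():].strip())
    # A leading space marks pre-formatted text.
    if line.startswith(" "):
        return ("pre", None, line[1:])
    return ("text", None, line)
```

For example, classify_line("**# item") reports a list whose nesting is bulleted, bulleted, numbered, matching the **# case in the notes above.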

Can be used anywhere

  • "Magic words", e.g. __FORCETOC__, __NOEDITSECTION__ (see m:Help:Magic words)
  • Signatures:
    • ~~~ is replaced with your username
    • ~~~~ is replaced with your username and the date
    • ~~~~~ is replaced with the date
    Notes:
    • These tags are replaced at the point the edit is saved.
  • Magic links: ISBN, RFC, PMID
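
The signature rules above amount to a substitution keyed on the number of tildes. The sketch below is illustrative only: expand_signatures is a hypothetical name, the timestamp is passed in as a ready-made string because its exact format is not specified here, the output is plain text rather than whatever link markup the real parser emits, and the handling of runs longer than five tildes (the first five are consumed, the rest is scanned again) is an assumption.

```python
import re


def expand_signatures(text: str, username: str, timestamp: str) -> str:
    """Replace ~~~, ~~~~ and ~~~~~ as they would be at save time (toy version)."""
    replacements = {
        3: username,                    # ~~~   -> username
        4: f"{username} {timestamp}",   # ~~~~  -> username and date
        5: timestamp,                   # ~~~~~ -> date only
    }
    # The greedy quantifier prefers the longest run, so ~~~~~ is never
    # misread as ~~~ followed by stray tildes.
    return re.sub(r"~{3,5}", lambda m: replacements[len(m.group())], text)


signed = expand_signatures("Thanks. ~~~~", "Alice", "12:47, 27 May 2006 (UTC)")
```

Because the replacement happens when the edit is saved, the stored source contains the expanded text and no longer the tildes.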

Binary

The ellipses (...) are used to indicate where the content goes and are not part of the markup.

  • Square brackets are used for links:
    • Internal/interwiki link + language links + category links + images: [[ ... ]] (see also Namespaces below)
      vertical bars separate optional parameters, which are:
      • link: the first parameter is the display text (it can be defaulted using the "pipe trick"; text concatenated directly after the closing brackets, e.g. s for a plural, is also included in the displayed link)
      • image: many parameters; see w:Wikipedia:Extended image syntax
      • category: first parameter: sort order in category list
    • External link: [ ... ]
      space separates optional first parameter, which is display text
    • undecorated URLs are also recognized and hotlinked
  • Apostrophes are used for formatting:
    • Italic: '' ... ''
    • Bold: ''' ... '''
    • Bold + Italic: ''''' ... '''''
  • Curly braces are used for transclusion:
    • Include template: {{ ... }} (see also Namespaces below)
    • Include template parameter: {{{ ... }}}
    • Interpolate built-in variable: {{PAGENAME}} (see m:Help:Variable)
  • Equals signs are used for headings (must be at start of line)
    • 1st level heading: = ... =
    • 2nd level heading: == ... ==
    • 3rd level heading: === ... ===
    • 4th level heading: ==== ... ====
    • 5th level heading: ===== ... =====
    • 6th level heading: ====== ... ======
    Some notes:
    • An unterminated heading tag is treated as normal text.
    • Unbalanced tags are treated as the shorter of the two tags (e.g. ==== heading == renders as a level 2 heading with the text == heading)
    • Text after the closing tag appears on the next line as standard text.
    • More than 6 = signs are treated as 6, with the extra symbols being included in the header.
  • The whole quagmire that is table formatting: {| ... |}, with |-, |+, ||, |, !! and ! used in between
  • Various HTML style tags:
    • <nowiki>
    • <math> if $wgUseTeX is set
    • <html> if $wgRawHtml is set
    • <pre>
    • <gallery>
    • <onlyinclude> <noinclude> <includeonly>
    • Parser extension tags, like <ref> (using Cite.php)
    • Plus most 'non-dangerous' HTML tags: 'b', 'del', 'i', 'ins', 'u', 'font', 'big', 'small', 'sub', 'sup', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'cite', 'code', 'em', 's', 'strike', 'strong', 'tt', 'var', 'div', 'center', 'blockquote', 'ol', 'ul', 'dl', 'table', 'caption', 'pre', 'ruby', 'rt', 'rb', 'rp', 'p', 'span', 'br', 'hr', 'li', 'dt', 'dd', 'td', 'th', 'tr'
  • <!-- ... --> HTML-style comments
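
The heading notes listed above (unterminated, unbalanced, and more than six = signs) can be sketched as a single function. This is a hedged illustration, not the current parser's code: parse_heading is a hypothetical name, the sketch requires the line to end with the closing = run and so does not model text after the closing tag, and treating a line made only of = signs as ordinary text is an assumption.

```python
def parse_heading(line: str):
    """Return (level, text) for a heading line, or None for ordinary text."""
    stripped = line.rstrip()
    # Count the '=' signs at each end of the line.
    opening = len(stripped) - len(stripped.lstrip("="))
    closing = len(stripped) - len(stripped.rstrip("="))
    # No opening run, no closing run (unterminated), or nothing but '=' signs.
    if opening == 0 or closing == 0 or opening == len(stripped):
        return None
    # The shorter run decides the level, capped at 6.
    level = min(opening, closing, 6)
    # Surplus '=' signs on either side stay in the heading text.
    text = stripped[level:len(stripped) - level].strip()
    return level, text
```

With these rules, parse_heading("==== heading ==") gives level 2 with the text == heading, as in the unbalanced example above.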

Namespaces

In wikilinks and template inclusions, colons set off namespaces and other modifiers:

  • proper namespaces: Talk:, User:, Project:, etc.
  • "special" namespaces: Image:, Category:, Template:
  • pseudo-namespaces: Special:, Media:
  • lone/leading :
    • lone : forces main namespace
    • leading : allows link to image page rather than inline image, or similarly to category or template page
  • interwiki links:
    • same project, different language: two-letter code
    • different project, same language: w: for Wikipedia, wikt: for Wiktionary, m: for Meta, etc.
  • subst: force one-time template substitution upon edit, rather than dynamic expansion on each view
  • int:, msg:, msgnw:, raw: -- see m:Help:Magic words#Template modifiers
  • MediaWiki: magically access MediaWiki formatting and boilerplate text (e.g. MediaWiki:copyrightwarning)
  • Colon functions: UC:, LC:, etc. (see m:Help:Colon function)
  • Parser functions: #expr:, #if:, #switch:, etc. (see m:ParserFunctions)
  • other extensions?

Several combinations of the above are possible, e.g. m:Help:Variable -- help namespace within Meta project.
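
How such combinations decompose can be shown with a small splitter for link targets. This is a sketch under stated assumptions: the INTERWIKI and NAMESPACES sets are illustrative subsets only (a real wiki configures these, and namespace names are localizable), matching is case-sensitive for simplicity, and the function names are hypothetical.

```python
# Illustrative subsets only; a real wiki defines these in its configuration.
INTERWIKI = {"w", "m"}
NAMESPACES = {"Talk", "User", "Help", "Image", "Category",
              "Template", "Special", "Media", "MediaWiki"}


def split_link(body: str):
    """Split the inside of [[ ... ]] into the target and its | parameters."""
    target, *params = body.split("|")
    return target, params


def split_target(target: str) -> dict:
    """Peel a leading colon, an interwiki prefix and a namespace off a target."""
    leading_colon = target.startswith(":")
    rest = target[1:] if leading_colon else target
    interwiki = namespace = None
    head, sep, tail = rest.partition(":")
    if sep and head in INTERWIKI:
        interwiki, rest = head, tail
        head, sep, tail = rest.partition(":")
    if sep and head in NAMESPACES:
        namespace, rest = head, tail
    # Anything not recognized as a prefix stays part of the title.
    return {"leading_colon": leading_colon, "interwiki": interwiki,
            "namespace": namespace, "title": rest}


parts = split_target("m:Help:Variable")
```

For m:Help:Variable this yields the interwiki prefix m, the namespace Help and the title Variable, while an unrecognized prefix such as the one in "Foo: bar" is left in the title.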

See also m:Help:Magic words#Template modifiers.