Alternative parsers/diq

This page is a compilation of links, descriptions, and status reports of the various alternative MediaWiki parsers—that is, programs and projects, other than MediaWiki itself, which are able or intended to translate MediaWiki's text markup syntax into something else. Some of these have quite narrow purposes, while others are possible contenders for replacing the somewhat labyrinthine code that currently drives MediaWiki itself.

Many of the projects linked here are likely to be out of date and under-maintained, or even abandoned. But in the interest of not duplicating the same work over and over, it seemed sensible to collect what is "out there". Note also that, although many alternative parsers exist, almost no wiki site is powered by an unofficial parser; one exception is wikitextparser, which powers the OpenTTD wiki through TrueWiki.

Parsers that build an abstract syntax tree (AST) and provide access to it are listed under #Parsers providing an AST; parsers that don't build an AST but extract some information are listed under #Parsers extracting some information; the rest of the parsers are listed under #Other parsers.

A non-parser dumper
One of the common uses of alternative parsers is to dump wiki content into a static form, such as HTML or PDF. Tim Starling has written a script which isn't a parser, but uses MediaWiki's internal code to dump an entire wiki to HTML from the command line. See Extension:DumpHTML. This was used (years ago) to create the static dumps at https://dumps.wikimedia.org.

There are also similar dumpers as part of the Kiwix project, for example mwoffliner, and you can query the RESTBase API to obtain HTML-format output with semantic information (such as transclusions) included.
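As a rough illustration of the RESTBase route, the sketch below builds the URL for a page's rendered HTML and fetches it using only the Python standard library. It assumes the wiki exposes the `/api/rest_v1/page/html/{title}` endpoint, as Wikimedia-hosted wikis have done; the helper names `rest_html_url` and `fetch_html`, the default host, and the User-Agent string are illustrative choices and not part of any tool described on this page.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def rest_html_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build the REST URL for a page's rendered HTML (assumed endpoint layout)."""
    # Page titles use underscores instead of spaces; every other reserved
    # character, including "/", is percent-encoded so it stays in one path segment.
    slug = quote(title.replace(" ", "_"), safe="")
    return f"https://{host}/api/rest_v1/page/html/{slug}"


def fetch_html(title: str, host: str = "en.wikipedia.org",
               user_agent: str = "example-script/0.1 (hypothetical contact)") -> str:
    """Download the HTML for one page. Requires network access."""
    request = Request(rest_html_url(title, host), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_html() to perform the actual request.
    print(rest_html_url("Main Page"))
```

The returned HTML can then be handed to any ordinary HTML parser, which avoids parsing wikitext at all.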

Related topics

 * If you want to convert MediaWiki documents into some other format, the above tools are useful. If you want to convert HTML documents or other formats into MediaWiki documents, you may find Wikipedia:Tools/Editing tools and Manual:Importing external content more useful.
 * One-pass parser
 * MediaWiki lexer and MediaWiki flexer (not parsers as such, just grammar definitions; probably superseded by/within other projects below)
 * en:Wikipedia:Text editor support lists scripts and extensions that add features such as syntax highlighting to Emacs, Vim, and other editors; some of these may include rudimentary parsing capabilities.
 * Here are some proof-of-concept rules for a subset of the MediaWiki markup: these are written in a metalanguage that treats preformatted text as source text, and everything else as comment.
 * Markup spec aims to produce a specification of MediaWiki's markup format.
 * Help:Extension:ParserFunctions is the main parser extension for MediaWiki.
 * mwparserfromhell and Parsoid's similar jsapi are useful tools for extraction and transformation tasks.
 * If no library suits your needs, you still have the option of parsing the data dumps: see meta:Data_dumps and meta:Data_dumps/Other_tools.
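For the data-dump route mentioned in the last item, a minimal sketch using only the Python standard library might look like the following. It assumes the usual XML export layout (`<page>` elements containing a `<title>` and a `<revision>` with a `<text>` element); the inline `SAMPLE_DUMP` and the helper names `local_name` and `iter_pages` are made up for illustration. The wikitext it yields is still raw markup, which could then be passed to a library such as mwparserfromhell.

```python
import io
import xml.etree.ElementTree as ET

# A tiny hand-written stand-in for a real dump file (illustrative only).
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision>
      <text>'''Example''' is a [[page]].</text>
    </revision>
  </page>
  <page>
    <title>Other</title>
    <revision>
      <text>See [[Example]].</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs while streaming through the dump.

    If a page carries several revisions, the text of the last one is yielded.
    """
    title = text = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title = text = None
            elem.clear()  # release the finished page's subtree to limit memory use


if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE_DUMP)):
        print(page_title, "->", wikitext)
```

Streaming with `iterparse` rather than loading the whole tree matters here because real dumps can be far larger than available memory; for a real file, pass an open binary file object (or a decompressing wrapper such as `bz2.open(path, "rb")`) instead of the `io.BytesIO` sample.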