Alternative parsers
From MediaWiki.org
This page lists links, descriptions, and status reports for the various alternative MediaWiki parsers: programs and projects, other than MediaWiki itself, that can (or are intended to) translate MediaWiki's text markup syntax into something else. Some have quite narrow purposes; others are possible contenders for replacing the somewhat labyrinthine code that currently drives MediaWiki itself.
Many of the projects linked here are likely to be out of date and under-maintained, or even abandoned. Still, to avoid duplicating the same work over and over, it seemed sensible to collect what is "out there" in one place.
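To make concrete what "translating MediaWiki's text markup into something else" involves, here is a minimal sketch in Python. It is not taken from any of the projects listed on this page; the function names (`wikitext_to_html`, `_inline`) and the `/wiki/` URL prefix are assumptions made for illustration. It handles only headings, bold, italic, and internal links, whereas real MediaWiki markup (templates, tables, nested lists, parser functions) is far more involved.

```python
import html
import re


def _inline(text):
    """Convert bold, italic, and internal links within one line (hypothetical helper)."""
    # Bold must be handled before italic, because ''' contains ''.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)

    def link(match):
        target = match.group(1).strip()
        label = match.group(2) or target
        # The "/wiki/" prefix is an assumption about the site's URL layout.
        href = "/wiki/" + target.replace(" ", "_").replace('"', "&quot;")
        return f'<a href="{href}">{label}</a>'

    # Matches [[Target]] or [[Target|label]].
    return re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", link, text)


def wikitext_to_html(text):
    """Translate a tiny, illustrative subset of MediaWiki markup to HTML."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate blocks
        # Escape &, < and > but leave quotes alone: ' is markup here.
        line = html.escape(line, quote=False)
        # "== Title ==" becomes <h2>; the number of '=' signs sets the level.
        heading = re.fullmatch(r"(={1,6})\s*(.+?)\s*\1", line)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{_inline(heading.group(2))}</h{level}>")
        else:
            blocks.append(f"<p>{_inline(line)}</p>")
    return "\n".join(blocks)
```

For example, `wikitext_to_html("== Intro ==")` returns `<h2>Intro</h2>`. A line-by-line, regex-based approach like this is enough for previews of simple text, which is roughly the "markup fragment to HTML" niche several entries in the table below occupy; it breaks down quickly on nested or multi-line constructs.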
Related topics
- One-pass parser
- MediaWiki lexer and MediaWiki flexer (not parsers as such, just grammar definitions; probably superseded by/within other projects below)
- en:Wikipedia:Text editor support lists scripts and extensions, such as syntax highlighting, for editors including Emacs and Vim; some of these may include rudimentary parsing capabilities.
- Here are some proof-of-concept rules for a subset of the MediaWiki markup, written in a metalanguage that treats preformatted text as source text and everything else as comment.
- Markup spec aims to produce a specification of MediaWiki's markup format.
A non-parser dumper
One common use of alternative parsers is to dump wiki content into a static form, such as HTML or PDF. Tim Starling has written a script that is not a parser but uses MediaWiki's internal code to dump an entire wiki to HTML from the command line. See dumpHTML.php and dumpHTML.inc in the MediaWiki CVS repository. The static dumps at http://static.wikipedia.org were created with it years ago.
Known implementations
| Name and link | Principal author(s) | Language | Input | Output | Comments / other info |
|---|---|---|---|---|---|
| mw2html | Connelly Barnes | Python | Wiki URL | HTML | Minimal setup; gets the basic job of creating a static copy of the wiki done |
| mwlib | PediaPress.com | Python | Markup, XML dump, and other | parse tree, HTML, PDF, XML, DocBook, OpenDocument | Part of cooperation between Wikimedia Foundation and PediaPress |
| Mediawiki2HTML Machine | Johannes Buchner | PHP | Markup | HTML | New project for parsing without the MediaWiki engine. |
| Mylyn WikiText | David Green | Java | Local files | HTML, DocBook, Eclipse Help, DITA, extensible | Integration with Ant and Eclipse runtime |
| Java API (Bliki engine) and Eclipse Plugin | axelcl | Java | Markup fragment (supports ParserFunctions) | On-screen preview, HTML, PDF | Java Wikipedia API and a plugin for the Eclipse IDE for assisted editing of Wikipedia (and anything else MediaWiki-based) |
| FlexBisonParse | Timwi | flex, bison and C | Markup fragment | Custom XML | Intended as an eventual replacement to the parsing code inside MediaWiki itself |
| JAMWiki | Ryan | Java | JAMWiki front-end | HTML | Java wiki engine that supports MediaWiki syntax. The roadmap also calls for XML import and export that will be compatible with MediaWiki. |
| Live Preview | Pilaf | JavaScript | Markup fragment | HTML | Provides instant preview while editing a page (without reloading). Note: name change pending. |
| Magnus' magic wiki-to-XML converter | Magnus Manske | PHP | Markup fragment or list of article titles | Custom XML, plain text, DocBook XML | Feature-complete parser (except math and timeline); pure PHP, so slow but portable. Can directly generate PDFs if DocBook infrastructure is installed |
| Perl MediaWiki Emulator | Victor Porton | Perl | Markup fragment | (X)HTML+ | Perl MediaWiki formatter developed for Keywords Homepages, a site that has since turned into a spam site. See history for old URLs |
| Perl Wikipedia Toolkit | Michal Jurosz | Perl | XML dump, SQL dump | Own parse tree, MediaWiki markup | Perl Wikipedia Toolkit developed for computer-assisted Wikipedia translation. (Limited functionality.) |
| ? | Head | Java | SQL dump | HTML | Hasn't been updated since 2003. Looks dangerous. |
| Tero-dump | Tero Karvinen | ? | Local wiki installation, including MySQL, PHP, web server | HTML | Scripts for grabbing the whole wiki; does not include images. |
| Textile-J | David Green | Java | Local files | HTML, DocBook, Eclipse Help, extensible | Integration with Ant and Eclipse runtime. Now moved to Mylyn WikiText |
| TomeRaider export | Erik Zachte | Perl | XML dump | TomeRaider database | See en:Wikipedia:TomeRaider database for more details |
| Waikiki | Magnus Manske | C++ | SQL dump (via SQLite) | HTML | Abandoned in favour of "flexbisonparse", but has been used inside some experimental "front ends" |
| Wikiwyg | Jim Higson | JavaScript | A live installation of MediaWiki | HTML (via XML) | More than just a parser; attempts to create a fully functional client-side interface |
| wik2dict | Guaka | Python | SQL dump | DICT | |
| wiki2pdf | Stephan Walter | Python (and PHP) | Markup fragment or set of online articles | LaTeX, PDF | Project is incomplete and dormant |
| WikiPDF | Felipe Sanches | Python (and PHP) | One selected article | LaTeX based on templates, PDF | MediaWiki extension that uses Stephan Walter's wiki2pdf as backend. |
| wiki2static | Alfio Puglisi | Perl | SQL dump | HTML | Used during 2004 to generate static HTML dumps of various languages; since taken offline. |
| Wiki2XML | Magnus Manske | C++ | Markup fragment (?) | Custom XML | Another aborted project on the way to 'flexbisonparse' |
| HTML2FPDF | Renato A. C. | PHP | HTML | PDF (HTML -> HTML2FPDF -> FPDF -> PDF) | A PHP class that transforms HTML into a feed for FPDF, resulting in a PDF file. Not specifically for MediaWiki, but easy to install using an updated version of this tool (updated html2fpdf.php). See HTML2FPDF and Mediawiki for more instructions. |
| WikiOnCD | Andrew Rodland | Perl | SQL dump or markup | HTML, parse tree (eventually?) | Started out as an offline wiki browser, but grew a parser when wiki2static turned out to be too limiting. No web presence yet; code is in the SVN. |
| WikiPress Publisher | Erwin Jurschitza | Delphi 7 | XML dump | DocBook XML, Digibib XML, HTML | Used for the German DVD, generates lists of bad markup. |
| WikiTaxi | Ralf Junker | Delphi / Pascal | MediaWiki markup, page or fragment | Node-tree, HTML, potentially others | Hand-crafted parser with template expansion, parser functions (core and extended), tag extensions (<ref>, <source>), wiki text parsing. Used for the WikiTaxi offline reader. |
| Wikifilter | ? | C++ (VS) | XML dumps | HTML | A Windows program that uses Apache/IIS to serve the pages. (May 2006) |
| Wikipedia Dump Reader | Benjamin Thyreau | Python | XML dumps | On screen | Cross platform viewer (GPLv2/~BSD license) |

