
Alternative parsers: Difference between revisions

From mediawiki.org
added Mediawiki2HTML Machine

Revision as of 16:44, 7 May 2006

This page is a list of links, descriptions, and status reports of the various alternative MediaWiki parsers - that is, programs and projects, other than MediaWiki itself, which are able or intended to translate MediaWiki's text markup syntax into something else. Some of these have quite narrow purposes, others are possible contenders for replacing the somewhat labyrinthine code that currently drives MediaWiki itself.

Many of the things linked here are likely to be out of date and under-maintained, even abandoned. But in the interest of not duplicating the same work over and over, it seemed sensible to collect together what was "out there".

  • One-pass parser
  • MediaWiki lexer and MediaWiki flexer (not parsers as such, just grammar definitions; probably superseded by, or absorbed into, other projects below)
  • w:en:Wikipedia:Text editor support lists various scripts and extensions, such as syntax highlighting for Emacs, Vim, and other editors; some of these may include rudimentary parsing capabilities.
  • Here are some proof-of-concept rules for a subset of the MediaWiki markup: these are written in a metalanguage that treats preformatted text as source text, and everything else as comment.
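
To make concrete what "translating a markup fragment into something else" involves, here is a minimal sketch in Python that converts a very small subset of MediaWiki markup (headings, bold, italics, and internal links) to HTML. It is not taken from any of the projects listed on this page and comes nowhere near the behaviour of MediaWiki's own parser; the function name `wikitext_to_html` and the `/wiki/` link prefix are assumptions made for illustration only.

```python
import html
import re

# Hypothetical URL prefix for internal links; real wikis configure their own.
LINK_PREFIX = "/wiki/"


def _inline(text: str) -> str:
    """Convert bold, italic and internal-link markup within one line."""
    # '''bold''' must be handled before ''italic'' so the quotes pair up.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)

    def link(match: re.Match) -> str:
        target = match.group(1)
        label = match.group(2) or target
        href = LINK_PREFIX + target.replace(" ", "_")
        return f'<a href="{href}">{label}</a>'

    # [[Target]] or [[Target|label]]
    return re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", link, text)


def wikitext_to_html(fragment: str) -> str:
    """Convert a tiny, illustrative subset of MediaWiki markup to HTML."""
    out = []
    for line in fragment.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks in this sketch
        # Escape <, > and & but keep apostrophes, which carry markup meaning.
        text = html.escape(line.strip(), quote=False)
        heading = re.fullmatch(r"(={1,6})\s*(.+?)\s*\1", text)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{_inline(heading.group(2))}</h{level}>")
        else:
            out.append(f"<p>{_inline(text)}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "== Known implementations ==\nA script which ''isn't'' a parser."
    print(wikitext_to_html(sample))
```

Even this toy version shows why one-line-at-a-time regular expressions fall short for real pages: templates, tables, nested lists, and `<nowiki>` sections all need state that spans lines, which is part of what the projects below attempt to handle.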

A non-parser dumper

One of the common uses of alternative parsers is to dump wiki content into a static form, such as HTML or PDF. Tim Starling has written a script which isn't a parser, but uses MediaWiki's internal code to dump an entire wiki to HTML from the command line. See dumpHTML.php and dumpHTML.inc in the MediaWiki CVS repository. This was used (years ago) to create the static dumps at http://static.wikipedia.org.

Known implementations

Name and link | Principal author(s) | Language | Input | Output | Comments / other info
Mediawiki2HTML Machine | Johannes Buchner | PHP | Markup | HTML | New project for parsing without the MediaWiki engine.
Eclipse Wikipedia Plugin | axelcl | Java | Markup fragment | On-screen preview, HTML, PDF | Plugin for the Eclipse IDE for assisted editing of Wikipedia (and anything else MediaWiki-based).
FlexBisonParse | Timwi | flex, bison and C | Markup fragment | Custom XML | Intended as an eventual replacement for the parsing code inside MediaWiki itself.
Live Preview | Pilaf | JavaScript | Markup fragment | HTML | Provides instant preview while editing a page (without reloading). Note: name change pending.
Magnus' magic wiki-to-XML converter | Magnus Manske | PHP | Markup fragment or list of article titles | Custom XML, plain text, DocBook XML | Feature-complete parser (except math and timeline); pure PHP, so slow but portable. Can directly generate PDFs if DocBook infrastructure is installed.
Perl MediaWiki Emulator | Victor Porton | Perl | Markup fragment | (X)HTML+ | Perl MediaWiki formatter developed for Keywords Homepages. (Limited functionality, but under development as of 30 March 2005.)
Head |  | Java | SQL dump | HTML |
Tero-dump | Tero Karvinen | ? | Local wiki installation, including MySQL, PHP, web server | HTML | Scripts for grabbing the whole wiki; does not include images.
TomeRaider export | Erik Zachte | Perl | XML dump | TomeRaider database | See en:Wikipedia:TomeRaider database for more details.
Waikiki | Magnus Manske | C++ | SQL dump (via SQLite) | HTML | Abandoned in favour of "flexbisonparse", but has been used inside some experimental "front ends".
Wikiwyg | Jim Higson | JavaScript | A live installation of MediaWiki | HTML (via XML) | More than just a parser; attempts to create a fully functional client-side interface.
wiki2PDF | Stephan Walter | Python (and PHP) | Markup fragment or set of online articles | LaTeX, PDF | Project is incomplete and dormant.
wiki2static | Alfio Puglisi | Perl | SQL dump | HTML | Used during 2004 to generate static HTML dumps of various languages, now taken offline. Development ceased.
Wiki2XML | Magnus Manske | C++ | Markup fragment (?) | Custom XML | Another aborted project on the way to "flexbisonparse".
HTML2FPDF | Renato A. C. | PHP | A PHP class that transforms HTML into a feed for FPDF resulting in a PDF file | HTML -> HTML2FPDF -> FPDF -> PDF | Not specifically for MediaWiki, but easy to install using an updated version of this tool: updated html2fpdf.php. See HTML2FPDF and Mediawiki for more instructions.
WikiOnCD | Andrew Rodland | Perl | SQL dump or markup | HTML, parse tree (eventually?) | Started out as an offline wiki browser, but grew a parser when wiki2static turned out to be too limiting. No web presence yet; code is in the SVN.
WikiPress Publisher | Erwin Jurschitza | Delphi 7 | XML dump | DocBook XML, Digibib XML, HTML | Used for the German DVD; generates lists of bad markup.