This page was moved from MetaWiki.
This is a proposal page opened some years ago. See Extension:XML_Bridge for a current implementation.
MediaWiki 1.1.0 has a feature for exporting articles to a simplified XML format that wraps the raw wikitext with revision metadata (author, timestamp, comment, title). This proposal is to supplement that feature with an export to DocBook XML.
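To make the metadata mapping concrete, here is a minimal Python sketch. It reads the page title and revision metadata from an export document and writes them into a DocBook 4-style article, with the revision history in articleinfo/revhistory. It assumes a namespace-free sample of the export format (real exports declare a versioned MediaWiki namespace), and the name export_to_docbook is hypothetical. Converting the wikitext body itself is a separate problem and is not attempted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample of the export format described above.
SAMPLE_EXPORT = """<mediawiki>
  <page>
    <title>DocBook</title>
    <revision>
      <timestamp>2004-03-01T12:00:00Z</timestamp>
      <contributor><username>ExampleUser</username></contributor>
      <comment>initial import</comment>
      <text>DocBook is a markup language.</text>
    </revision>
  </page>
</mediawiki>"""


def export_to_docbook(export_xml: str) -> ET.Element:
    """Map the title and revision metadata of one exported page to a DocBook <article>."""
    page = ET.fromstring(export_xml).find("page")
    article = ET.Element("article")
    # In DocBook 4.x the article title may precede <articleinfo>.
    ET.SubElement(article, "title").text = page.findtext("title")
    info = ET.SubElement(article, "articleinfo")
    history = ET.SubElement(info, "revhistory")
    for rev in page.findall("revision"):
        revision = ET.SubElement(history, "revision")
        ET.SubElement(revision, "date").text = rev.findtext("timestamp")
        # <authorinitials> takes plain text, so the wiki username (or the IP
        # address of an anonymous editor) is stored there.
        editor = rev.findtext("contributor/username") or rev.findtext("contributor/ip") or ""
        ET.SubElement(revision, "authorinitials").text = editor
        # The edit summary becomes the revision remark.
        ET.SubElement(revision, "revremark").text = rev.findtext("comment") or ""
    return article
```

For example, ET.tostring(export_to_docbook(SAMPLE_EXPORT), encoding="unicode") yields an article whose single revision element carries the timestamp, the username and the edit summary.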
DocBook XML is a standard for marking up books and articles in XML. DocBook originated in the early 1990s (initially as an SGML application) as an interchange format for printers and desktop-publishing software, and it was tuned for writing software documentation. It has since become a de facto standard for marking up many kinds of formatted text documents.
Using a standard XML markup for MediaWiki means that we can build on existing work in document processing. A number of existing tools, both open source and proprietary, convert DocBook XML to other formats such as Rich Text Format, PostScript, and PDF. Some word processors, such as OpenOffice, now support DocBook as an input and output format.
DocBook is a very large markup language: its Document Type Definition (DTD) declares roughly 400 elements. Simplified DocBook is a much smaller subset intended for single documents such as articles rather than books, and it drops many of these elements. It would probably be sufficient for MediaWiki articles.
Comment: I really like this idea, but perhaps the target ought to be the new DocBook NG schema under development, one of whose benefits is that it is very easy to customize. It is also namespaced, which lets documents do away with doctype declarations.
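DocBook NG was the development name of what became DocBook 5.0, whose elements live in the namespace http://docbook.org/ns/docbook. The Python sketch below shows what "namespaced, no doctype" means in practice. It builds a minimal article in that namespace and serializes it without any DOCTYPE. The helper names db and make_article are hypothetical.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
# Serialize the DocBook namespace as the default (unprefixed) namespace.
ET.register_namespace("", DOCBOOK_NS)


def db(tag: str) -> str:
    """Qualify a local element name with the DocBook 5 namespace."""
    return f"{{{DOCBOOK_NS}}}{tag}"


def make_article(title: str, paragraphs: list[str]) -> str:
    """Build a minimal DocBook 5 article; the namespace alone identifies the vocabulary."""
    article = ET.Element(db("article"), {"version": "5.0"})
    info = ET.SubElement(article, db("info"))
    ET.SubElement(info, db("title")).text = title
    for text in paragraphs:
        ET.SubElement(article, db("para")).text = text
    # No DOCTYPE is written: validation, if wanted, uses the schema instead.
    return ET.tostring(article, encoding="unicode")
```

The output starts with <article xmlns="http://docbook.org/ns/docbook" version="5.0">, so a processor can recognize the document from its namespace rather than from a doctype declaration.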
The user interface for exporting to DocBook would be similar to the "Printable version" link currently found on MediaWiki pages: a separate "Save as XML" or "Save as DocBook" link would export the current article to DocBook XML.
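The core of such an export is turning wikitext into DocBook elements. The Python sketch below handles only a tiny, assumed subset of the markup:

* == headings == become flat section elements (heading levels are ignored).
* Blank lines separate para elements.
* '''bold''' and ''italic'' become emphasis role="bold" and emphasis.

The function names are hypothetical. A real converter would more likely reuse MediaWiki's own parser than regular expressions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical subset: headings, paragraphs, bold and italic only.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1$")
INLINE = re.compile(r"'''(.+?)'''|''(.+?)''")


def _append_text(parent: ET.Element, last, text: str) -> None:
    """Place text after the last child (as its tail) or inside the parent."""
    if not text:
        return
    if last is None:
        parent.text = (parent.text or "") + text
    else:
        last.tail = (last.tail or "") + text


def add_inline(parent: ET.Element, text: str) -> None:
    """Append text to parent, turning bold/italic markup into <emphasis> children."""
    last, pos = None, 0
    for m in INLINE.finditer(text):
        _append_text(parent, last, text[pos:m.start()])
        if m.group(1) is not None:  # '''bold'''
            last = ET.SubElement(parent, "emphasis", {"role": "bold"})
            last.text = m.group(1)
        else:  # ''italic''
            last = ET.SubElement(parent, "emphasis")
            last.text = m.group(2)
        pos = m.end()
    _append_text(parent, last, text[pos:])


def wikitext_to_docbook(title: str, wikitext: str) -> ET.Element:
    """Convert the supported wikitext subset into a DocBook <article>."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    container = article  # paragraphs before the first heading stay at article level
    for block in re.split(r"\n\s*\n", wikitext.strip()):
        lines = block.splitlines()
        heading = HEADING.match(lines[0].strip())
        if heading:
            container = ET.SubElement(article, "section")
            ET.SubElement(container, "title").text = heading.group(2)
            lines = lines[1:]
        if lines:
            para = ET.SubElement(container, "para")
            add_inline(para, " ".join(line.strip() for line in lines))
    return article
```

Anything outside the subset, such as links, lists and templates, simply ends up as paragraph text. That limitation is why this is only an illustration of the mapping.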
It may also be useful to produce DocBook dumps of Wikipedia rather than the SQL dumps that are currently used. Such dumps could include author credits and the other information needed to comply with copyleft licenses.
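As an illustration of the author-credit part, the Python sketch below collects every contributor listed in an export document and emits a DocBook authorgroup. Each name appears once, in order of first appearance. It assumes the same namespace-free export layout as the title/revision example, and it uses author/personname, which DocBook 4.2 and later (including 5.0) accept. The name credits_from_export is hypothetical.

```python
import xml.etree.ElementTree as ET


def credits_from_export(export_xml: str) -> ET.Element:
    """Build a DocBook <authorgroup> crediting each contributor of the exported revisions once."""
    root = ET.fromstring(export_xml)
    # Anonymous edits carry an <ip> instead of a <username>; the IP stands in as the name.
    names = [c.findtext("username") or c.findtext("ip") or "" for c in root.iter("contributor")]
    group = ET.Element("authorgroup")
    # dict.fromkeys removes duplicates while preserving first-seen order.
    for name in dict.fromkeys(n for n in names if n):
        author = ET.SubElement(group, "author")
        ET.SubElement(author, "personname").text = name
    return group
```

Whether anonymous editors should be credited by IP address is a policy question that a real dump format would need to settle.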
A fair amount of the HTML in current content won't parse as well-formed XHTML/XML. One possible solution is to apply tidy -asxhtml to newly saved or previewed content. Tidy would have to leave the special wiki tags alone; its configuration file (tidy.conf) has options for declaring custom tags, such as new-blocklevel-tags and new-inline-tags. This approach would need a fair amount of testing.
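Before running tidy, it helps to know which content actually fails. The Python sketch below is a detector, not a replacement for tidy. It removes wiki-specific tag blocks and then checks whether what remains parses as XML. The tag list (nowiki, math) is illustrative only, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative list of wiki-specific tags whose contents are not HTML;
# a real list would depend on the parser extensions installed.
WIKI_TAGS = re.compile(r"<(nowiki|math)>.*?</\1>", re.DOTALL)


def mask_wiki_tags(text: str) -> str:
    """Drop wiki-specific tag blocks so that only the ordinary HTML is checked."""
    return WIKI_TAGS.sub("", text)


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a single root element."""
    try:
        # A fragment may have several top-level nodes, so give it one root.
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True
```

Typical failures are unclosed void elements such as <br> and HTML entities such as &nbsp; that XML does not predefine. A math block containing a bare "<" also fails unless it is masked first.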
Wiki2XML - A separate (now abandoned) web-based conversion utility that converts MediaWiki wikitext to various formats, including DocBook. Its PHP source code is in Subversion. It ignores any blocks of preformatted text.
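Preformatted text is not hard to carry over. In wikitext, lines that begin with a space are rendered as preformatted, and DocBook has verbatim elements for such content. The Python sketch below groups runs of space-indented lines into literallayout class="monospaced" elements. It assumes that one leading space marks a line and that a non-indented line ends the block. Explicit <pre> tags are not handled, and the name preformatted_blocks is hypothetical.

```python
import xml.etree.ElementTree as ET


def preformatted_blocks(wikitext: str) -> list[ET.Element]:
    """Group runs of space-indented lines into DocBook <literallayout> elements."""
    blocks: list[ET.Element] = []
    run: list[str] = []
    for line in wikitext.splitlines() + [""]:  # empty sentinel flushes the final run
        if line.startswith(" "):
            run.append(line[1:])  # drop only the single space that marks the line
        elif run:
            block = ET.Element("literallayout", {"class": "monospaced"})
            block.text = "\n".join(run)  # keep the remaining indentation verbatim
            blocks.append(block)
            run = []
    return blocks
```

A full converter would emit these elements in place, between the surrounding paragraphs, rather than returning them separately.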