Extension:XML Bridge/Examples
From MediaWiki.org
Example using MWXHTML
http://localhost:8000/mwxhtml/mediawiki.org/w/Extension:XML_Bridge
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"> <head> <style media="screen, projection" type="text/css"> @import "http://en.wikipedia.org/skins-1.5/common/shared.css?156"; @import "http://en.wikipedia.org//skins-1.5/monobook/main.css?156"; </style> <title>Extension:XML_Bridge</title> </head><body> <div class="mwx.section"> <h1>Extension:XML_Bridge</h1> <div class="mwx.paragraph" /> <table class="ext-infobox ext-status-unstable" style="float:right; min-width:20%; background-color:white"> <strong> <a class="mwx.link.article" href="Manual:Extensions">Manual on MediaWiki Extensions</a> </strong><br /> <strong> <a class="mwx.link.article" href="Extension Matrix">List of MediaWiki Extensions</a> </strong><tr> <td colspan="2" style="padding-top:0.5em"> <a class="mwx.link.image" href="Image:Crystal Clear app error.png"> <img alt="" src="/imageresolver/Crystal Clear app error.png" width="40" />Crystal Clear app error.png</a><span style="font-size: 130%;">XML Bridge</span> <br /> Release status: unstable </td> </tr><tr> <td style="vertical-align:top"> <a class="mwx.link.special" href="Extension#description"> <strong>Description</strong> </a> </td><td> Converts MediaWiki markup to XHTML </td> </tr><tr> <td style="vertical-align:top"> <a class="mwx.link.special" href="Extension#download"> <strong>Download</strong> </a> </td><td> <a class="mwx.link.external" href="http://code.pediapress.com/wiki/wiki/mwlib">http://code.pediapress.com/wiki/wiki/mwlib</a> </td> </tr> </table><div class="mwx.paragraph" /> <div class="mwx.section"> <h2>Introduction </h2> <div class="mwx.paragraph">Wiki syntax, due to its lack of formalization and “ad hoc” nature, is not well- suited for text transformation to other formats. 
It is desirable to implement support for an intermediate format based on XML, which will make it possible to use standard XML parsing and transformation libraries on the source content. While MediaWiki's native parser exports to XHTML- transitional, the conversion from wiki syntax to XHTML is a lossy one: information about the templates used, the parameters for extensions and images, and so on, is not preserved. This makes many conversions impossible, because the information needed for the conversion is not present.</div> <div class="mwx.paragraph">It is therefore planned to develop software that converts MediaWiki-articles to an XHTML-based representation. XHTML is well suited to derive other formats like PDF or ODF.</div> <div class="mwx.paragraph">As much semantic information from the wiki source text as possible will be preserved by using XHTML features such as namespaces.</div> <div class="mwx.paragraph">The transformation to an XHTML-based format that preserves semantic information will enable a vast number of uses by programmers, and will also allow a long-term transition to XML as a backend storage format for wiki articles.</div> <div class="mwx.paragraph">Development is assigned to <a class="mwx.link.external" href="http://pediapress.com"> PediaPress</a> and funded by the <a class="mwx.link.external" href="http://col.org"> Commonwealth of Learning</a> </div> </div><div class="mwx.section"> <h2>Current Status </h2> <div class="mwx.paragraph">An initial alpha code release is available as part of the <a class="mwx.link.external" href="http://code.pediapress.com"> mwlib</a> python MediaWiki library. (see <a class="mwx.link.external" href="http://code.pediapress.com/hg/mwlib/file/tip/mwlib/xhtmlwriter.py"> xhtmlwriter.py</a>). 
Feel free to use and comment on it.</div><div class="mwx.paragraph">Although this code is still lacking some features it may be a good starting point to develop alternative XML output formats.</div> <div class="mwx.paragraph">There is a <a class="mwx.link.external" href="http://groups.google.com/group/mwlib"> google group</a> for support and discussion of mwlib and derived applications.</div><div class="mwx.paragraph">See <a class="mwx.link.external" href="http://code.pediapress.com/wiki/wiki/mwlib"> this page for installation instructions</a>.</div> </div><div class="mwx.section"> <h2>Current Implementation </h2> <div class="mwx.paragraph">The initial implementation is based on XHTML1.0 transitional extended by <a class="mwx.link.external" href="http://en.wikipedia.org/wiki/Microformats"> Microformats</a> where necessary. </div><div class="mwx.paragraph">This is to support the <a class="mwx.link.external" href="http://meta.wikimedia.org/wiki/Help:HTML_in_wikitext"> presentational HTML4.01 Elements allowed in wikitext</a> by MediaWiki.</div><div class="mwx.paragraph">A future implementation could be based on XHTML1.1 strict plus MathML.</div> <div class="mwx.paragraph">You may want to have a look at the proposed XML-Format <a class="mwx.link.article" href="Extension:XML Bridge/MWXHTML">Extension:XML Bridge/MWXHTML</a> </div><div class="mwx.paragraph">The XML is generated based on the parse-tree generated by the <a class="mwx.link.external" href="http://code.pediapress.com"> mwlib</a> MediaWiki-markup parsing library.</div> </div><div class="mwx.section"> <h2>Development & Evaluation </h2> <div class="mwx.paragraph">The xhtmlwriter.py is part of the <a class="mwx.link.external" href="http://code.pediapress.com"> mwlib python library</a>. 
See <a class="mwx.link.external" href="http://code.pediapress.com"> this page</a> for installation instructions.</div><div class="mwx.paragraph">There is a xml-server app in the sandbox directory, which acts as a Mediawiki (which must support the new API) proxy, converting wikitext to xhtml as you browse.</div> </div><div class="mwx.section"> <h2>Long Term Goal </h2> <div class="mwx.paragraph">... is to have a solid XML-Export/Import that allows to replace the MediaWiki-Markup with a XML-representation, this may coincide with WYSIWIG-editing in MediaWiki.</div> <div class="mwx.paragraph">Steps toward this goal: </div> <ul> <li> initial release of XML-Exporter code </li> <li> develop XML->mw-markup converter so one can convert back and forth </li> <li> discuss and incrementally improve the xml-markup </li> <li> discuss whether usage of certain html-styling and template usage can be labeled deprecated</li> <li> check edits and notify users if using wrong or deprecated markup </li> <li> fix or remove all broken/deprecated markup,html-styling,inappropriate template usage</li> <li> switch to xml</li> </ul><div class="mwx.paragraph" /> </div><div class="mwx.section"> <h2>Other considered Implementation Options </h2> <div class="mwx.section"> <h3>MediaWiki specific XML Language </h3> <div class="mwx.paragraph"> <a class="mwx.link.external" href="http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages"> Don’t Invent XML Languages</a> - at least for now.</div> </div><div class="mwx.section"> <h3>DocBook </h3> <div class="mwx.paragraph">DocBook is a very large markup language. A more abbreviated version, Simplified DocBook, removes a number of redundant elements. DocBook NG schema (customizable namespaces) is under development. 
</div> <div class="mwx.paragraph">Conclusion: DocBook is overly complicated while still lacking features in order to fully support a lossless representation of MW markup.</div> </div><div class="mwx.section"> <h3>XHTML </h3> <div class="mwx.paragraph">XHTML is well supported by many applications and libraries. XHTML can be mixed with other namespaces (xhtml remains if stripped). Currently MW-markup allows to mix in HTML and even css styles. Therefore a lossless XML representation would need to support a subset of the XHTML specification.</div> <div class="mwx.paragraph">MW-markup expresses semantics (e.g. sections) which are not supported by XHTML1.0. Hence pure XHTML is not sufficient. </div> </div><div class="mwx.section"> <h3>XHTML + Additional namespace </h3> <div class="mwx.paragraph">We considered to combine XHTML with a proprietary MediaWiki specific namesspace (xmlns:mwx) using the best of two worlds (compatibility with existing tools and lossless representation).</div> <div class="mwx.paragraph">For e.g. 
a category-link could be written as:</div> <div class="mwx.paragraph"> <code class="mwx.source"><a href="Kategorie:Extensions" mwx:linktype="category">Extensions</a></code> </div><div class="mwx.paragraph">Other suitable namespaces can be included like <a class="mwx.link.external" href="http://www.w3.org/Math/"> MathML</a> </div><div class="mwx.paragraph">This still requires to invent and add a new XML-Language which is considered harmful(see above).</div> </div><div class="mwx.section"> <h3>XHTML + Microformats </h3> <div class="mwx.paragraph">Use Microformats to semantically annotate generated XHTML.</div> <div class="mwx.paragraph"> <code class="mwx.source"> <div class="mwx.section" title="some heading"> <h2>some heading</h2> <a href="SomePage" class="mwx.link.internal">some page within the same wiki</a> </div> </code> </div><div class="mwx.paragraph"> <a class="mwx.link.external" href="http://microformats.org/wiki?title=mediawiki-mark-up-issues&redirect=no"> Discussion on microformats in MediaWikis</a> </div><div class="mwx.paragraph">See the planned implementation: <a class="mwx.link.article" href="Extension:XML_Bridge/MWXHTML">Extension:XML_Bridge/MWXHTML</a> </div> </div> </div><div class="mwx.section"> <h2>Open Issues </h2> <div class="mwx.section"> <h3>Templates </h3> <ul> <li> Currently it seem impossible to correctly mark all uses of templates within the XML output.</li> </ul> </div> </div><div class="mwx.section"> <h2>Related Projects </h2> <ul> <li> <a class="mwx.link.article" href="Extension:Wiki2xml">Extension:Wiki2xml</a> (abandoned)</li><li> <a class="mwx.link.article" href="DocBook_XML_export">DocBook_XML_export</a> (never started)</li><li> <a class="mwx.link.article" href="Extension:Open_Office_Export">Extension:Open_Office_Export</a> (abandoned)</li><li> <a class="mwx.link.article" href="Extension:Data Transfer">Extension:Data Transfer</a> </li><li> <a class="mwx.link.external" href="http://cnx.org/help/CNXMLLanguage"> Connexions XML 
Language</a> </li><li> <a class="mwx.link.external" href="http://ilps.science.uva.nl/WikiXML/xmlformat.php"> WikiXML: XML format</a> </li><li> <a class="mwx.link.external" href="http://www.riehle.org/wp-content/uploads/2008/01/a5-junghans.pdf"> An XML Interchange Format for Wiki Creole 1.0</a> </li><li> <a class="mwx.link.external" href="http://web.archive.org/web/20060704163408/doc-book.sourceforge.net/homepage/"> DocBook Wiki</a> </li><li> <a class="mwx.link.external" href="http://meta.wikimedia.org/wiki/XHTML"> about XHTML produced by MediaWiki</a> </li><li> <a class="mwx.link.external" href="http://meta.wikimedia.org/wiki/Wikitext_standard"> Wikitext Standard</a> ... describe and formalize a 1.0 version of the Wikitext language, based on what is used currently. (last edit: 29 June 2007)</li><li> <a class="mwx.link.external" href="http://www.usemod.com/cgi-bin/mb.pl?WikiMarkupStandard"> WikiMarkup Standard</a> discusses ways to allow visitors from one wiki engine to edit pages on other wikis without having to learn their WikiSyntax. 
(last edit: May 10, 2008 <strong>active discussion</strong>)</li><li> <a class="mwx.link.external" href="http://web.archive.org/web/20060716015316/http://hula-project.org/Wiki_Conversion"> page collecting info on "wiki conversion"</a> (last modified 15:40, 21 Jun 2006.)</li><li> <strong> <a class="mwx.link.article" href="Wikipedia_DTD">Wikipedia DTD</a> (last real edit 9 April 2006)</strong> </li> </ul> </div><div class="mwx.section"> <h2>See also </h2> <ul> <li> <a class="mwx.link.external" href="http://en.wikipedia.org/wiki/MediaWiki#Limitations"> MediaWiki Limitations</a> </li><li> <a class="mwx.link.article" href="Alternative parsers">Alternative parsers</a> </li><li> <a class="mwx.link.external" href="http://blogs.sun.com/lebo/entry/who_isn_t_using_a"> Can you roundtrip OpenOffice.org to MediaWiki and back again?</a> </li><li> <a class="mwx.link.external" href="http://code.pediapress.com/">http://code.pediapress.com/</a> </li><li> <a class="mwx.link.external" href="http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages"> Don’t Invent XML Languages</a> </li><li> <a class="mwx.link.external" href="http://www.codinghorror.com/blog/archives/001116.html"> Is HTML a Humane Markup Language?</a> </li><li> <a class="mwx.link.external" href="http://freshmeat.net/projects/xmldiff/"> XML Diff</a> </li><li> <a class="mwx.link.article" href="WYSIWYG_editor">WYSIWYG_editor</a> </li> </ul><div class="mwx.paragraph" /> </div> </div> </body><!-- Article 'Extension:XML_Bridge': 12 children Paragraph '': 0 children Table '' {'style': {u'float': u'right', u'min-width': u'20%', u'background-color': u'white'}, u'class': u'ext-infobox ext-status-unstable'}: 4 children Caption '' {}: 4 children u' ' Strong '': 1 children Link '': 1 children u'Manual on MediaWiki Extensions' BreakingReturn u'br': 0 children Strong '': 1 children Link '': 1 children u'List of MediaWiki Extensions' Row '' {u'class': u'ext-header'}: 1 children Cell '' {u'colspan': 2, 'style': 
{u'padding-top': u'0.5em'}}: 5 children ImageLink '': 1 children u'Crystal Clear app error.png' Span u'span': 1 children u'XML Bridge' BreakingReturn u'br': 0 children u' Release status: unstable ' CategoryLink '': 1 children u'unstable extensions' Row '' {}: 2 children Cell '' {'style': {u'vertical-align': u'top'}}: 1 children SpecialLink '': 1 children Strong '': 1 children u'Description' Cell '' {}: 1 children u' Converts MediaWiki markup to XHTML ' Row '' {}: 2 children Cell '' {'style': {u'vertical-align': u'top'}}: 1 children SpecialLink '': 1 children Strong '': 1 children u'Download' Cell '' {}: 1 children URL u'http://code.pediapress.com/wiki/wiki/mwlib': 0 children Paragraph '': 1 children CategoryLink '': 1 children u'All extensions' Section '': 5 children Paragraph '': 1 children u"Wiki syntax, due to its lack of formalization and \u201cad hoc\u201d nature, is not well- suited for text transformation to other formats. It is desirable to implement support for an intermediate format based on XML, which will make it possible to use standard XML parsing and transformation libraries on the source content. While MediaWiki's native parser exports to XHTML- transitional, the conversion from wiki syntax to XHTML is a lossy one: information about the templates used, the parameters for extensions and images, and so on, is not preserved. This makes many conversions impossible, because the information needed for the conversion is not present." Paragraph '': 1 children u'It is therefore planned to develop software that converts MediaWiki-articles to an XHTML-based representation. XHTML is well suited to derive other formats like PDF or ODF.' Paragraph '': 1 children u'As much semantic information from the wiki source text as possible will be preserved by using XHTML features such as namespaces.' 
Paragraph '': 1 children u'The transformation to an XHTML-based format that preserves semantic information will enable a vast number of uses by programmers, and will also allow a long-term transition to XML as a backend storage format for wiki articles.' Paragraph '': 4 children u'Development is assigned to ' NamedURL u'http://pediapress.com': 1 children u' PediaPress' u' and funded by the ' NamedURL u'http://col.org': 1 children u' Commonwealth of Learning' Section '': 4 children Paragraph '': 5 children u'An initial alpha code release is available as part of the ' NamedURL u'http://code.pediapress.com': 1 children u' mwlib' u' python MediaWiki library. (see ' NamedURL u'http://code.pediapress.com/hg/mwlib/file/tip/mwlib/xhtmlwriter.py': 1 children u' xhtmlwriter.py' u'). Feel free to use and comment on it.' Paragraph '': 1 children u'Although this code is still lacking some features it may be a good starting point to develop alternative XML output formats.' Paragraph '': 3 children u'There is a ' NamedURL u'http://groups.google.com/group/mwlib': 1 children u' google group' u' for support and discussion of mwlib and derived applications.' Paragraph '': 3 children u'See ' NamedURL u'http://code.pediapress.com/wiki/wiki/mwlib': 1 children u' this page for installation instructions' u'.' Section '': 5 children Paragraph '': 3 children u'The initial implementation is based on XHTML1.0 transitional extended by ' NamedURL u'http://en.wikipedia.org/wiki/Microformats': 1 children u' Microformats' u' where necessary. ' Paragraph '': 3 children u'This is to support the ' NamedURL u'http://meta.wikimedia.org/wiki/Help:HTML_in_wikitext': 1 children u' presentational HTML4.01 Elements allowed in wikitext' u' by MediaWiki.' Paragraph '': 1 children u'A future implementation could be based on XHTML1.1 strict plus MathML.' 
Paragraph '': 2 children u'You may want to have a look at the proposed XML-Format ' Link '': 1 children u'Extension:XML Bridge/MWXHTML' Paragraph '': 3 children u'The XML is generated based on the parse-tree generated by the ' NamedURL u'http://code.pediapress.com': 1 children u' mwlib' u' MediaWiki-markup parsing library.' Section '': 2 children Paragraph '': 5 children u'The xhtmlwriter.py is part of the ' NamedURL u'http://code.pediapress.com': 1 children u' mwlib python library' u'. See ' NamedURL u'http://code.pediapress.com': 1 children u' this page' u' for installation instructions.' Paragraph '': 1 children u'There is a xml-server app in the sandbox directory, which acts as a Mediawiki (which must support the new API) proxy, converting wikitext to xhtml as you browse.' Section '': 4 children Paragraph '': 1 children u'... is to have a solid XML-Export/Import that allows to replace the MediaWiki-Markup with a XML-representation, this may coincide with WYSIWIG-editing in MediaWiki.' Paragraph '': 1 children u'Steps toward this goal: ' ItemList '': 7 children Item '': 1 children u' initial release of XML-Exporter code ' Item '': 1 children u' develop XML->mw-markup converter so one can convert back and forth ' Item '': 1 children u' discuss and incrementally improve the xml-markup ' Item '': 1 children u' discuss whether usage of certain html-styling and template usage can be labeled deprecated' Item '': 1 children u' check edits and notify users if using wrong or deprecated markup ' Item '': 1 children u' fix or remove all broken/deprecated markup,html-styling,inappropriate template usage' Item '': 1 children u' switch to xml' Paragraph '': 0 children Section '': 5 children Section '': 1 children Paragraph '': 2 children NamedURL u'http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages': 1 children u' Don\u2019t Invent XML Languages' u' - at least for now.' 
Section '': 2 children Paragraph '': 1 children u'DocBook is a very large markup language. A more abbreviated version, Simplified DocBook, removes a number of redundant elements. DocBook NG schema (customizable namespaces) is under development. ' Paragraph '': 1 children u'Conclusion: DocBook is overly complicated while still lacking features in order to fully support a lossless representation of MW markup.' Section '': 2 children Paragraph '': 1 children u'XHTML is well supported by many applications and libraries. XHTML can be mixed with other namespaces (xhtml remains if stripped). Currently MW-markup allows to mix in HTML and even css styles. Therefore a lossless XML representation would need to support a subset of the XHTML specification.' Paragraph '': 1 children u'MW-markup expresses semantics (e.g. sections) which are not supported by XHTML1.0. Hence pure XHTML is not sufficient. ' Section '': 5 children Paragraph '': 1 children u'We considered to combine XHTML with a proprietary MediaWiki specific namesspace (xmlns:mwx) using the best of two worlds (compatibility with existing tools and lossless representation).' Paragraph '': 1 children u'For e.g. a category-link could be written as:' Paragraph '': 1 children Source u'source': 1 children u'<a href="Kategorie:Extensions" mwx:linktype="category">Extensions</a>' Paragraph '': 2 children u'Other suitable namespaces can be included like ' NamedURL u'http://www.w3.org/Math/': 1 children u' MathML' Paragraph '': 1 children u'This still requires to invent and add a new XML-Language which is considered harmful(see above).' Section '': 4 children Paragraph '': 1 children u'Use Microformats to semantically annotate generated XHTML.' 
Paragraph '': 1 children Source u'source': 1 children u'\n <div class="mwx.section" title="some heading">\n <h2>some heading</h2>\n \n <a href="SomePage" class="mwx.link.internal">some page within the same wiki</a>\n \n </div>\n' Paragraph '': 1 children NamedURL u'http://microformats.org/wiki?title=mediawiki-mark-up-issues&redirect=no': 1 children u' Discussion on microformats in MediaWikis' Paragraph '': 2 children u'See the planned implementation: ' Link '': 1 children u'Extension:XML_Bridge/MWXHTML' Section '': 1 children Section '': 1 children ItemList '': 1 children Item '': 1 children u' Currently it seem impossible to correctly mark all uses of templates within the XML output.' Section '': 1 children ItemList '': 13 children Item '': 2 children Link '': 1 children u'Extension:Wiki2xml' u' (abandoned)' Item '': 2 children Link '': 1 children u'DocBook_XML_export' u' (never started)' Item '': 2 children Link '': 1 children u'Extension:Open_Office_Export' u' (abandoned)' Item '': 1 children Link '': 1 children u'Extension:Data Transfer' Item '': 1 children NamedURL u'http://cnx.org/help/CNXMLLanguage': 1 children u' Connexions XML Language' Item '': 1 children NamedURL u'http://ilps.science.uva.nl/WikiXML/xmlformat.php': 1 children u' WikiXML: XML format' Item '': 1 children NamedURL u'http://www.riehle.org/wp-content/uploads/2008/01/a5-junghans.pdf': 1 children u' An XML Interchange Format for Wiki Creole 1.0' Item '': 1 children NamedURL u'http://web.archive.org/web/20060704163408/doc-book.sourceforge.net/homepage/': 1 children u' DocBook Wiki' Item '': 1 children NamedURL u'http://meta.wikimedia.org/wiki/XHTML': 1 children u' about XHTML produced by MediaWiki' Item '': 2 children NamedURL u'http://meta.wikimedia.org/wiki/Wikitext_standard': 1 children u' Wikitext Standard' u' ... describe and formalize a 1.0 version of the Wikitext language, based on what is used currently. 
(last edit: 29 June 2007)' Item '': 4 children NamedURL u'http://www.usemod.com/cgi-bin/mb.pl?WikiMarkupStandard': 1 children u' WikiMarkup Standard' u' discusses ways to allow visitors from one wiki engine to edit pages on other wikis without having to learn their WikiSyntax. (last edit: May 10, 2008 ' Strong '': 1 children u'active discussion' u')' Item '': 2 children NamedURL u'http://web.archive.org/web/20060716015316/http://hula-project.org/Wiki_Conversion': 1 children u' page collecting info on "wiki conversion"' u' (last modified 15:40, 21 Jun 2006.)' Item '': 1 children Strong '': 2 children Link '': 1 children u'Wikipedia DTD' u' (last real edit 9 April 2006)' Section '': 2 children ItemList '': 8 children Item '': 1 children NamedURL u'http://en.wikipedia.org/wiki/MediaWiki#Limitations': 1 children u' MediaWiki Limitations' Item '': 1 children Link '': 1 children u'Alternative parsers' Item '': 1 children NamedURL u'http://blogs.sun.com/lebo/entry/who_isn_t_using_a': 1 children u' Can you roundtrip OpenOffice.org to MediaWiki and back again?' Item '': 1 children URL u'http://code.pediapress.com/': 0 children Item '': 1 children NamedURL u'http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages': 1 children u' Don\u2019t Invent XML Languages' Item '': 1 children NamedURL u'http://www.codinghorror.com/blog/archives/001116.html': 1 children u' Is HTML a Humane Markup Language?' Item '': 1 children NamedURL u'http://freshmeat.net/projects/xmldiff/': 1 children u' XML Diff' Item '': 1 children Link '': 1 children u'WYSIWYG_editor' Paragraph '': 1 children CategoryLink '': 1 children u'Data_extraction_extensions' --> </html>
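Because the MWXHTML output carries its semantic annotations in `class` attributes (`mwx.section`, `mwx.paragraph`, `mwx.link.article`, `mwx.link.external`, and so on), a consumer can recover link types and section headings with any standard XML library. The following is a minimal sketch in Python using only the standard library. It is not part of mwlib: the function names `links_by_type` and `section_headings` and the abbreviated `SAMPLE` document are illustrative assumptions, and the sample is a hand-shortened, well-formed excerpt modeled on the output above rather than the full output.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <html> element of the MWXHTML output.
XHTML = "{http://www.w3.org/1999/xhtml}"

# Hand-shortened excerpt modeled on the output above (an assumption for
# illustration; the real output is much longer).
SAMPLE = """\
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="mwx.section">
      <h2>Current Status</h2>
      <div class="mwx.paragraph">Part of the
        <a class="mwx.link.external" href="http://code.pediapress.com">mwlib</a>
        library; see
        <a class="mwx.link.article" href="Extension:XML Bridge/MWXHTML">Extension:XML Bridge/MWXHTML</a>
      </div>
    </div>
  </body>
</html>
"""

HEADING_TAGS = {XHTML + "h%d" % level for level in range(1, 7)}


def links_by_type(xhtml_text):
    """Group link targets by the suffix of their mwx.link.* class."""
    root = ET.fromstring(xhtml_text)
    result = {}
    for anchor in root.iter(XHTML + "a"):
        cls = anchor.get("class", "")
        if cls.startswith("mwx.link."):
            kind = cls[len("mwx.link."):]
            result.setdefault(kind, []).append(anchor.get("href"))
    return result


def section_headings(xhtml_text):
    """Return the heading text of every div marked as mwx.section."""
    root = ET.fromstring(xhtml_text)
    headings = []
    for div in root.iter(XHTML + "div"):
        if div.get("class") != "mwx.section":
            continue
        # Only direct children count, so nested sections report their own heading.
        for child in div:
            if child.tag in HEADING_TAGS:
                headings.append((child.text or "").strip())
    return headings


if __name__ == "__main__":
    print(links_by_type(SAMPLE))
    print(section_headings(SAMPLE))
```

The sketch keys on the class names only, so it keeps working if the surrounding presentational markup changes. It does require well-formed input; characters such as a bare `&` must be escaped in the document being parsed.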
Example using MWXML
http://localhost:8000/mwxml/mediawiki.org/w/Extension:XML_Bridge
<?xml version="1.0" encoding="UTF-8"?> <mwlibxml> <article caption="Extension:XML_Bridge"> <paragraph> <table class="ext-infobox ext-status-unstable" style="float:right; min-width:20%; background-color:white"> <caption> <strong> <link target="Manual:Extensions">Manual on MediaWiki Extensions</link> </strong><breakingreturn caption="br" starttext="&lt;br /&gt;" /> <strong> <link target="Extension Matrix">List of MediaWiki Extensions</link> </strong> </caption><row class="ext-header"> <cell colspan="2" style="padding-top:0.5em"> <imagelink align="left" target="Crystal Clear app error.png" width="40">Crystal Clear app error.png</imagelink> <span caption="span" endtext="&lt;/span&gt;" starttext="&lt;span style=&quot;font-size: 130%;&quot;&gt;" style="font-size:130%">XML Bridge</span> <breakingreturn caption="br" starttext="&lt;br /&gt;" /> Release status: unstable <categorylink target="unstable extensions">unstable extensions</categorylink> </cell> </row><row /> <row /> <row> <cell style="vertical-align:top"> <speciallink ns="template" target="Extension#description"> <strong>Description</strong> </speciallink> </cell><cell> Converts MediaWiki markup to XHTML </cell> </row><row /> <row /> <row /> <row /> <row> <cell style="vertical-align:top"> <speciallink ns="template" target="Extension#download"> <strong>Download</strong> </speciallink> </cell><cell> <url caption="http://code.pediapress.com/wiki/wiki/mwlib" /> </cell> </row><row /> <row /> <row /> <row /> </table><categorylink target="All extensions">All extensions</categorylink> </paragraph><section level="2"> <node>Introduction </node> <paragraph>Wiki syntax, due to its lack of formalization and “ad hoc” nature, is not well- suited for text transformation to other formats. It is desirable to implement support for an intermediate format based on XML, which will make it possible to use standard XML parsing and transformation libraries on the source content. 
While MediaWiki's native parser exports to XHTML- transitional, the conversion from wiki syntax to XHTML is a lossy one: information about the templates used, the parameters for extensions and images, and so on, is not preserved. This makes many conversions impossible, because the information needed for the conversion is not present.</paragraph> <paragraph>It is therefore planned to develop software that converts MediaWiki-articles to an XHTML-based representation. XHTML is well suited to derive other formats like PDF or ODF.</paragraph> <paragraph>As much semantic information from the wiki source text as possible will be preserved by using XHTML features such as namespaces.</paragraph> <paragraph>The transformation to an XHTML-based format that preserves semantic information will enable a vast number of uses by programmers, and will also allow a long-term transition to XML as a backend storage format for wiki articles.</paragraph> <paragraph>Development is assigned to <namedurl caption="http://pediapress.com"> PediaPress</namedurl> and funded by the <namedurl caption="http://col.org"> Commonwealth of Learning</namedurl> </paragraph> </section><section level="2"> <node>Current Status </node> <paragraph>An initial alpha code release is available as part of the <namedurl caption="http://code.pediapress.com"> mwlib</namedurl> python MediaWiki library. (see <namedurl caption="http://code.pediapress.com/hg/mwlib/file/tip/mwlib/xhtmlwriter.py"> xhtmlwriter.py</namedurl>). 
Feel free to use and comment on it.</paragraph><paragraph>Although this code is still lacking some features it may be a good starting point to develop alternative XML output formats.</paragraph> <paragraph>There is a <namedurl caption="http://groups.google.com/group/mwlib"> google group</namedurl> for support and discussion of mwlib and derived applications.</paragraph><paragraph>See <namedurl caption="http://code.pediapress.com/wiki/wiki/mwlib"> this page for installation instructions</namedurl>.</paragraph> </section><section level="2"> <node>Current Implementation </node> <paragraph>The initial implementation is based on XHTML1.0 transitional extended by <namedurl caption="http://en.wikipedia.org/wiki/Microformats"> Microformats</namedurl> where necessary. </paragraph><paragraph>This is to support the <namedurl caption="http://meta.wikimedia.org/wiki/Help:HTML_in_wikitext"> presentational HTML4.01 Elements allowed in wikitext</namedurl> by MediaWiki.</paragraph><paragraph>A future implementation could be based on XHTML1.1 strict plus MathML.</paragraph> <paragraph>You may want to have a look at the proposed XML-Format <link target="Extension:XML Bridge/MWXHTML">Extension:XML Bridge/MWXHTML</link> </paragraph><paragraph>The XML is generated based on the parse-tree generated by the <namedurl caption="http://code.pediapress.com"> mwlib</namedurl> MediaWiki-markup parsing library.</paragraph> </section><section level="2"> <node>Development & Evaluation </node> <paragraph>The xhtmlwriter.py is part of the <namedurl caption="http://code.pediapress.com"> mwlib python library</namedurl>. See <namedurl caption="http://code.pediapress.com"> this page</namedurl> for installation instructions.</paragraph><paragraph>There is a xml-server app in the sandbox directory, which acts as a Mediawiki (which must support the new API) proxy, converting wikitext to xhtml as you browse.</paragraph> </section><section level="2"> <node>Long Term Goal </node> <paragraph>... 
is to have a solid XML-Export/Import that allows to replace the MediaWiki-Markup with a XML-representation, this may coincide with WYSIWIG-editing in MediaWiki.</paragraph> <paragraph>Steps toward this goal: <itemlist> <item prefix="*"> initial release of XML-Exporter code </item> <item prefix="*"> develop XML->mw-markup converter so one can convert back and forth </item> <item prefix="*"> discuss and incrementally improve the xml-markup </item> <item prefix="*"> discuss whether usage of certain html-styling and template usage can be labeled deprecated</item> <item prefix="*"> check edits and notify users if using wrong or deprecated markup </item> <item prefix="*"> fix or remove all broken/deprecated markup,html-styling,inappropriate template usage</item> <item prefix="*"> switch to xml</item> </itemlist> </paragraph> </section><section level="2"> <node>Other considered Implementation Options </node> <section level="3"> <node>MediaWiki specific XML Language </node> <paragraph> <namedurl caption="http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages"> Don’t Invent XML Languages</namedurl> - at least for now.</paragraph> </section><section level="3"> <node>DocBook </node> <paragraph>DocBook is a very large markup language. A more abbreviated version, Simplified DocBook, removes a number of redundant elements. DocBook NG schema (customizable namespaces) is under development. </paragraph> <paragraph>Conclusion: DocBook is overly complicated while still lacking features in order to fully support a lossless representation of MW markup.</paragraph> </section><section level="3"> <node>XHTML </node> <paragraph>XHTML is well supported by many applications and libraries. XHTML can be mixed with other namespaces (xhtml remains if stripped). Currently MW-markup allows to mix in HTML and even css styles. Therefore a lossless XML representation would need to support a subset of the XHTML specification.</paragraph> <paragraph>MW-markup expresses semantics (e.g. 
sections) which are not supported by XHTML1.0. Hence pure XHTML is not sufficient. </paragraph> </section><section level="3"> <node>XHTML + Additional namespace </node> <paragraph>We considered to combine XHTML with a proprietary MediaWiki specific namesspace (xmlns:mwx) using the best of two worlds (compatibility with existing tools and lossless representation).</paragraph> <paragraph>For e.g. a category-link could be written as:</paragraph> <paragraph> <xsource caption="source" lang="xml"><a href="Kategorie:Extensions" mwx:linktype="category">Extensions</a></xsource> </paragraph><paragraph>Other suitable namespaces can be included like <namedurl caption="http://www.w3.org/Math/"> MathML</namedurl> </paragraph><paragraph>This still requires to invent and add a new XML-Language which is considered harmful(see above).</paragraph> </section><section level="3"> <node>XHTML + Microformats </node> <paragraph>Use Microformats to semantically annotate generated XHTML.</paragraph> <paragraph> <xsource caption="source" lang="xml"> <div class="mwx.section" title="some heading"> <h2>some heading</h2> <a href="SomePage" class="mwx.link.internal">some page within the same wiki</a> </div> </xsource> </paragraph><paragraph> <namedurl caption="http://microformats.org/wiki?title=mediawiki-mark-up-issues&amp;redirect=no"> Discussion on microformats in MediaWikis</namedurl> </paragraph><paragraph>See the planned implementation: <link target="Extension:XML_Bridge/MWXHTML">Extension:XML_Bridge/MWXHTML</link> </paragraph> </section> </section><section level="2"> <node>Open Issues </node> <section level="3"> <node>Templates </node> <paragraph> <itemlist> <item prefix="*"> Currently it seem impossible to correctly mark all uses of templates within the XML output.</item> </itemlist> </paragraph> </section> </section><section level="2"> <node>Related Projects </node> <paragraph> <itemlist> <item prefix="*"> <link target="Extension:Wiki2xml">Extension:Wiki2xml</link> (abandoned)</item><item 
prefix="*"> <link target="DocBook_XML_export">DocBook_XML_export</link> (never started)</item><item prefix="*"> <link target="Extension:Open_Office_Export">Extension:Open_Office_Export</link> (abandoned)</item><item prefix="*"> <link target="Extension:Data Transfer">Extension:Data Transfer</link> </item><item prefix="*"> <namedurl caption="http://cnx.org/help/CNXMLLanguage"> Connexions XML Language</namedurl> </item><item prefix="*"> <namedurl caption="http://ilps.science.uva.nl/WikiXML/xmlformat.php"> WikiXML: XML format</namedurl> </item><item prefix="*"> <namedurl caption="http://www.riehle.org/wp-content/uploads/2008/01/a5-junghans.pdf"> An XML Interchange Format for Wiki Creole 1.0</namedurl> </item><item prefix="*"> <namedurl caption="http://web.archive.org/web/20060704163408/doc-book.sourceforge.net/homepage/"> DocBook Wiki</namedurl> </item><item prefix="*"> <namedurl caption="http://meta.wikimedia.org/wiki/XHTML"> about XHTML produced by MediaWiki</namedurl> </item><item prefix="*"> <namedurl caption="http://meta.wikimedia.org/wiki/Wikitext_standard"> Wikitext Standard</namedurl> ... describe and formalize a 1.0 version of the Wikitext language, based on what is used currently. (last edit: 29 June 2007)</item><item prefix="*"> <namedurl caption="http://www.usemod.com/cgi-bin/mb.pl?WikiMarkupStandard"> WikiMarkup Standard</namedurl> discusses ways to allow visitors from one wiki engine to edit pages on other wikis without having to learn their WikiSyntax. 
(last edit: May 10, 2008 <strong>active discussion</strong>)</item><item prefix="*"> <namedurl caption="http://web.archive.org/web/20060716015316/http://hula-project.org/Wiki_Conversion"> page collecting info on "wiki conversion"</namedurl> (last modified 15:40, 21 Jun 2006.)</item><item prefix="*"> <strong> <link target="Wikipedia_DTD">Wikipedia DTD</link> (last real edit 9 April 2006)</strong> </item> </itemlist> </paragraph> </section><section level="2"> <node>See also </node> <paragraph> <itemlist> <item prefix="*"> <namedurl caption="http://en.wikipedia.org/wiki/MediaWiki#Limitations"> MediaWiki Limitations</namedurl> </item><item prefix="*"> <link target="Alternative parsers">Alternative parsers</link> </item><item prefix="*"> <namedurl caption="http://blogs.sun.com/lebo/entry/who_isn_t_using_a"> Can you roundtrip OpenOffice.org to MediaWiki and back again?</namedurl> </item><item prefix="*"> <url caption="http://code.pediapress.com/" /> </item><item prefix="*"> <namedurl caption="http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages"> Don’t Invent XML Languages</namedurl> </item><item prefix="*"> <namedurl caption="http://www.codinghorror.com/blog/archives/001116.html"> Is HTML a Humane Markup Language?</namedurl> </item><item prefix="*"> <namedurl caption="http://freshmeat.net/projects/xmldiff/"> XML Diff</namedurl> </item><item prefix="*"> <link target="WYSIWYG_editor">WYSIWYG_editor</link> </item> </itemlist> </paragraph><paragraph> <categorylink target="Data_extraction_extensions">Data_extraction_extensions</categorylink> </paragraph> </section> </article><!-- Article 'Extension:XML_Bridge': 10 children Paragraph '': 2 children Table '' {'style': {u'float': u'right', u'min-width': u'20%', u'background-color': u'white'}, u'class': u'ext-infobox ext-status-unstable'}: 14 children Caption '' {}: 4 children u' ' Strong '': 1 children Link '': 1 children u'Manual on MediaWiki Extensions' BreakingReturn u'br': 0 children Strong '': 1 children 
Link '': 1 children u'List of MediaWiki Extensions' Row '' {u'class': u'ext-header'}: 1 children Cell '' {u'colspan': 2, 'style': {u'padding-top': u'0.5em'}}: 5 children ImageLink '': 1 children u'Crystal Clear app error.png' Span u'span': 1 children u'XML Bridge' BreakingReturn u'br': 0 children u' Release status: unstable ' CategoryLink '': 1 children u'unstable extensions' Row '' {}: 0 children Row '' {}: 0 children Row '' {}: 2 children Cell '' {'style': {u'vertical-align': u'top'}}: 1 children SpecialLink '': 1 children Strong '': 1 children u'Description' Cell '' {}: 1 children u' Converts MediaWiki markup to XHTML ' Row '' {}: 0 children Row '' {}: 0 children Row '' {}: 0 children Row '' {}: 0 children Row '' {}: 2 children Cell '' {'style': {u'vertical-align': u'top'}}: 1 children SpecialLink '': 1 children Strong '': 1 children u'Download' Cell '' {}: 1 children URL u'http://code.pediapress.com/wiki/wiki/mwlib': 0 children Row '' {}: 0 children Row '' {}: 0 children Row '' {}: 0 children Row '' {}: 0 children CategoryLink '': 1 children u'All extensions' Section '': 6 children Node '': 1 children u'Introduction ' Paragraph '': 1 children u"Wiki syntax, due to its lack of formalization and \u201cad hoc\u201d nature, is not well- suited for text transformation to other formats. It is desirable to implement support for an intermediate format based on XML, which will make it possible to use standard XML parsing and transformation libraries on the source content. While MediaWiki's native parser exports to XHTML- transitional, the conversion from wiki syntax to XHTML is a lossy one: information about the templates used, the parameters for extensions and images, and so on, is not preserved. This makes many conversions impossible, because the information needed for the conversion is not present." Paragraph '': 1 children u'It is therefore planned to develop software that converts MediaWiki-articles to an XHTML-based representation. 
XHTML is well suited to derive other formats like PDF or ODF.' Paragraph '': 1 children u'As much semantic information from the wiki source text as possible will be preserved by using XHTML features such as namespaces.' Paragraph '': 1 children u'The transformation to an XHTML-based format that preserves semantic information will enable a vast number of uses by programmers, and will also allow a long-term transition to XML as a backend storage format for wiki articles.' Paragraph '': 4 children u'Development is assigned to ' NamedURL u'http://pediapress.com': 1 children u' PediaPress' u' and funded by the ' NamedURL u'http://col.org': 1 children u' Commonwealth of Learning' Section '': 5 children Node '': 1 children u'Current Status ' Paragraph '': 5 children u'An initial alpha code release is available as part of the ' NamedURL u'http://code.pediapress.com': 1 children u' mwlib' u' python MediaWiki library. (see ' NamedURL u'http://code.pediapress.com/hg/mwlib/file/tip/mwlib/xhtmlwriter.py': 1 children u' xhtmlwriter.py' u'). Feel free to use and comment on it.' Paragraph '': 1 children u'Although this code is still lacking some features it may be a good starting point to develop alternative XML output formats.' Paragraph '': 3 children u'There is a ' NamedURL u'http://groups.google.com/group/mwlib': 1 children u' google group' u' for support and discussion of mwlib and derived applications.' Paragraph '': 3 children u'See ' NamedURL u'http://code.pediapress.com/wiki/wiki/mwlib': 1 children u' this page for installation instructions' u'.' Section '': 6 children Node '': 1 children u'Current Implementation ' Paragraph '': 3 children u'The initial implementation is based on XHTML1.0 transitional extended by ' NamedURL u'http://en.wikipedia.org/wiki/Microformats': 1 children u' Microformats' u' where necessary. 
' Paragraph '': 3 children u'This is to support the ' NamedURL u'http://meta.wikimedia.org/wiki/Help:HTML_in_wikitext': 1 children u' presentational HTML4.01 Elements allowed in wikitext' u' by MediaWiki.' Paragraph '': 1 children u'A future implementation could be based on XHTML1.1 strict plus MathML.' Paragraph '': 2 children u'You may want to have a look at the proposed XML-Format ' Link '': 1 children u'Extension:XML Bridge/MWXHTML' Paragraph '': 3 children u'The XML is generated based on the parse-tree generated by the ' NamedURL u'http://code.pediapress.com': 1 children u' mwlib' u' MediaWiki-markup parsing library.' Section '': 3 children Node '': 1 children u'Development & Evaluation ' Paragraph '': 5 children u'The xhtmlwriter.py is part of the ' NamedURL u'http://code.pediapress.com': 1 children u' mwlib python library' u'. See ' NamedURL u'http://code.pediapress.com': 1 children u' this page' u' for installation instructions.' Paragraph '': 1 children u'There is an xml-server app in the sandbox directory, which acts as a proxy for a MediaWiki installation (which must support the new API), converting wikitext to XHTML as you browse.' Section '': 3 children Node '': 1 children u'Long Term Goal ' Paragraph '': 1 children u'... is to have a solid XML export/import that allows the MediaWiki markup to be replaced with an XML representation; this may coincide with WYSIWYG editing in MediaWiki.' 
Paragraph '': 2 children u'Steps toward this goal: ' ItemList '': 7 children Item '': 1 children u' initial release of XML-Exporter code ' Item '': 1 children u' develop an XML->mw-markup converter so one can convert back and forth ' Item '': 1 children u' discuss and incrementally improve the XML markup ' Item '': 1 children u' discuss whether certain HTML styling and template usage can be labeled deprecated' Item '': 1 children u' check edits and notify users if they use wrong or deprecated markup ' Item '': 1 children u' fix or remove all broken/deprecated markup, HTML styling, and inappropriate template usage' Item '': 1 children u' switch to XML' Section '': 6 children Node '': 1 children u'Other considered Implementation Options ' Section '': 2 children Node '': 1 children u'MediaWiki specific XML Language ' Paragraph '': 2 children NamedURL u'http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages': 1 children u' Don\u2019t Invent XML Languages' u' - at least for now.' Section '': 3 children Node '': 1 children u'DocBook ' Paragraph '': 1 children u'DocBook is a very large markup language. A more abbreviated version, Simplified DocBook, removes a number of redundant elements. DocBook NG schema (customizable namespaces) is under development. ' Paragraph '': 1 children u'Conclusion: DocBook is overly complicated while still lacking the features needed to fully support a lossless representation of MW markup.' Section '': 3 children Node '': 1 children u'XHTML ' Paragraph '': 1 children u'XHTML is well supported by many applications and libraries. XHTML can be mixed with other namespaces (the XHTML remains if they are stripped). Currently MW-markup allows HTML and even CSS styles to be mixed in. Therefore a lossless XML representation would need to support a subset of the XHTML specification.' Paragraph '': 1 children u'MW-markup expresses semantics (e.g. sections) which are not supported by XHTML1.0. Hence pure XHTML is not sufficient. 
' Section '': 6 children Node '': 1 children u'XHTML + Additional namespace ' Paragraph '': 1 children u'We considered combining XHTML with a proprietary MediaWiki-specific namespace (xmlns:mwx) to get the best of both worlds (compatibility with existing tools and a lossless representation).' Paragraph '': 1 children u'For example, a category link could be written as:' Paragraph '': 1 children Source u'source': 1 children u'<a href="Kategorie:Extensions" mwx:linktype="category">Extensions</a>' Paragraph '': 2 children u'Other suitable namespaces can also be included, such as ' NamedURL u'http://www.w3.org/Math/': 1 children u' MathML' Paragraph '': 1 children u'This still requires inventing and adding a new XML language, which is considered harmful (see above).' Section '': 5 children Node '': 1 children u'XHTML + Microformats ' Paragraph '': 1 children u'Use Microformats to semantically annotate generated XHTML.' Paragraph '': 1 children Source u'source': 1 children u'\n <div class="mwx.section" title="some heading">\n <h2>some heading</h2>\n \n <a href="SomePage" class="mwx.link.internal">some page within the same wiki</a>\n \n </div>\n' Paragraph '': 1 children NamedURL u'http://microformats.org/wiki?title=mediawiki-mark-up-issues&redirect=no': 1 children u' Discussion on microformats in MediaWikis' Paragraph '': 2 children u'See the planned implementation: ' Link '': 1 children u'Extension:XML_Bridge/MWXHTML' Section '': 2 children Node '': 1 children u'Open Issues ' Section '': 2 children Node '': 1 children u'Templates ' Paragraph '': 1 children ItemList '': 1 children Item '': 1 children u' Currently it seems impossible to correctly mark all uses of templates within the XML output.' 
Section '': 2 children Node '': 1 children u'Related Projects ' Paragraph '': 1 children ItemList '': 13 children Item '': 2 children Link '': 1 children u'Extension:Wiki2xml' u' (abandoned)' Item '': 2 children Link '': 1 children u'DocBook_XML_export' u' (never started)' Item '': 2 children Link '': 1 children u'Extension:Open_Office_Export' u' (abandoned)' Item '': 1 children Link '': 1 children u'Extension:Data Transfer' Item '': 1 children NamedURL u'http://cnx.org/help/CNXMLLanguage': 1 children u' Connexions XML Language' Item '': 1 children NamedURL u'http://ilps.science.uva.nl/WikiXML/xmlformat.php': 1 children u' WikiXML: XML format' Item '': 1 children NamedURL u'http://www.riehle.org/wp-content/uploads/2008/01/a5-junghans.pdf': 1 children u' An XML Interchange Format for Wiki Creole 1.0' Item '': 1 children NamedURL u'http://web.archive.org/web/20060704163408/doc-book.sourceforge.net/homepage/': 1 children u' DocBook Wiki' Item '': 1 children NamedURL u'http://meta.wikimedia.org/wiki/XHTML': 1 children u' about XHTML produced by MediaWiki' Item '': 2 children NamedURL u'http://meta.wikimedia.org/wiki/Wikitext_standard': 1 children u' Wikitext Standard' u' ... describe and formalize a 1.0 version of the Wikitext language, based on what is used currently. (last edit: 29 June 2007)' Item '': 4 children NamedURL u'http://www.usemod.com/cgi-bin/mb.pl?WikiMarkupStandard': 1 children u' WikiMarkup Standard' u' discusses ways to allow visitors from one wiki engine to edit pages on other wikis without having to learn their WikiSyntax. 
(last edit: May 10, 2008 ' Strong '': 1 children u'active discussion' u')' Item '': 2 children NamedURL u'http://web.archive.org/web/20060716015316/http://hula-project.org/Wiki_Conversion': 1 children u' page collecting info on "wiki conversion"' u' (last modified 15:40, 21 Jun 2006.)' Item '': 1 children Strong '': 2 children Link '': 1 children u'Wikipedia DTD' u' (last real edit 9 April 2006)' Section '': 3 children Node '': 1 children u'See also ' Paragraph '': 1 children ItemList '': 8 children Item '': 1 children NamedURL u'http://en.wikipedia.org/wiki/MediaWiki#Limitations': 1 children u' MediaWiki Limitations' Item '': 1 children Link '': 1 children u'Alternative parsers' Item '': 1 children NamedURL u'http://blogs.sun.com/lebo/entry/who_isn_t_using_a': 1 children u' Can you roundtrip OpenOffice.org to MediaWiki and back again?' Item '': 1 children URL u'http://code.pediapress.com/': 0 children Item '': 1 children NamedURL u'http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages': 1 children u' Don\u2019t Invent XML Languages' Item '': 1 children NamedURL u'http://www.codinghorror.com/blog/archives/001116.html': 1 children u' Is HTML a Humane Markup Language?' Item '': 1 children NamedURL u'http://freshmeat.net/projects/xmldiff/': 1 children u' XML Diff' Item '': 1 children Link '': 1 children u'WYSIWYG_editor' Paragraph '': 1 children CategoryLink '': 1 children u'Data_extraction_extensions' --> </mwlibxml>
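The "XHTML + Microformats" option in the example above annotates ordinary XHTML elements with class names such as mwx.section and mwx.link.internal. As a rough sketch of how a consumer might read that annotation back out, the following Python snippet groups link targets by the suffix of their mwx.link.* class. It uses only the standard library; the function name links_by_type and the SAMPLE fragment (including the namespace declaration added to make it parse on its own) are assumptions made for illustration and are not part of mwlib or xhtmlwriter.py.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative fragment modelled on the microformat example in the article.
# The xmlns declaration is added here so the fragment parses standalone.
SAMPLE = """<div xmlns="http://www.w3.org/1999/xhtml" class="mwx.section" title="some heading">
  <h2>some heading</h2>
  <a href="SomePage" class="mwx.link.internal">some page within the same wiki</a>
</div>"""

LINK_PREFIX = "mwx.link."


def links_by_type(xhtml_text):
    """Hypothetical helper: map each mwx.link.* suffix to a list of hrefs."""
    root = ET.fromstring(xhtml_text)
    result = {}
    # ElementTree exposes namespaced tags as "{namespace}localname".
    for anchor in root.iter("{%s}a" % XHTML_NS):
        # An element may carry several space-separated class names.
        for cls in anchor.get("class", "").split():
            if cls.startswith(LINK_PREFIX):
                link_type = cls[len(LINK_PREFIX):]
                result.setdefault(link_type, []).append(anchor.get("href"))
    return result


if __name__ == "__main__":
    print(links_by_type(SAMPLE))
```

Because the semantic information lives in plain class attributes, any XHTML-aware tool can still process the output unchanged, while a wiki-aware consumer can filter on the mwx. prefix as shown.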
