Help:Export

Wiki pages can be exported in a special XML format to import into another MediaWiki installation (see Help:Import; this works only if the function is enabled on the destination wiki and the user is a sysop there), or to use otherwise, for instance for analysing the content.

See also m:Syndication feeds for exporting information other than pages, and Help:Import on importing pages.

How to export
There are at least four ways to export pages:


 * Paste the names of the articles into the box in Special:Export, or use Special:Export/Title_of_the_article.


 * The backup script dumpBackup.php dumps all the wiki pages into an XML file. dumpBackup.php only works on MediaWiki 1.5 or newer. You need to have direct access to the server to run this script. Dumps of Wikimedia projects are regularly made available at https://dumps.wikimedia.org/.


 * Note: you might need to configure AdminSettings.php in order to run dumpBackup.php successfully. See MediaWiki for more information.


 * There is an OAI-PMH interface to regularly fetch pages that have been modified since a specific time. For Wikimedia projects this interface is not publicly available. OAI-PMH contains a wrapper format around the actual exported articles.


 * Use the Python Wikipedia Robot Framework (http://pywikipediabot.sourceforge.net/). This won't be explained here.

By default only the current version of a page is included.

Optionally you can get all versions with date, time, user name and edit summary.

Optionally the latest version of every template called directly or indirectly is also exported.

Additionally you can copy the SQL database.

This is how dumps of the database were made available before MediaWiki 1.5 and it won't be explained here further.

Using 'Special:Export'
For example, to export all pages of a namespace:

1. Get the names of pages to export
An example is easier to follow than an abstract description, so the steps below walk through one.


 * Go to Special:Allpages and choose the desired article/file.

2. Perform the export

 * Go to Special:Export and paste all your page names into the textbox, making sure there are no empty lines.


 * Click 'Submit query'


 * Save the resulting XML to a file using your browser's save facility.

and finally...

 * Open the XML file in a text editor. Scroll to the bottom to check for error messages.

Now you can use this XML file to perform an import (see Help:Import).
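The same steps can also be scripted. The following is a minimal sketch in Python, not an official client: it only builds the form body that a script might submit to Special:Export. The field names `pages` and `curonly` are assumptions based on the export form and should be checked against the target wiki; the helper name `build_export_form` is hypothetical.

```python
from urllib.parse import urlencode


def build_export_form(page_names, current_only=True):
    """Build a URL-encoded form body for Special:Export (field names assumed)."""
    # Drop empty entries, mirroring the "no empty lines" advice above.
    titles = [name.strip() for name in page_names if name.strip()]
    # Assumption: page titles are sent newline-separated in a "pages" field.
    fields = {"pages": "\n".join(titles)}
    if current_only:
        # Assumption: "curonly" restricts the export to the current revision.
        fields["curonly"] = "1"
    return urlencode(fields)


form_body = build_export_form(["Help:Export", "", "Help:Import "])
print(form_body)
```

The resulting string could then be posted to the wiki's Special:Export page with any HTTP client; the response is the same XML you would save from the browser.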

Exporting the full history
A checkbox in the Special:Export interface selects whether to export the full history (all versions of an article) or only the most recent version of articles.

A maximum of 100 revisions is returned; other revisions can be requested separately.

Export format
The XML file you receive has the same format regardless of which export method you use.

It is codified in XML Schema at https://www.mediawiki.org/xml/export-0.10.xsd

This format is not intended for viewing in a web browser.

Some browsers show you pretty-printed XML with "+" and "-" links to view or hide selected parts.

Alternatively the XML-source can be viewed using the "view source" feature of the browser, or after saving the XML file locally, with a program of choice.

If you directly read the XML source it won't be difficult to find the actual wikitext.

If you don't use a special XML editor, "<" and ">" appear as &lt; and &gt;, to avoid a conflict with XML tags; to avoid ambiguity, "&" is coded as "&amp;".
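An XML parser undoes this escaping for you. As a small illustration (the fragment below is made up for this example and is not a complete export file), Python's standard library returns the original wikitext characters:

```python
import xml.etree.ElementTree as ET

# A hand-written fragment standing in for the text element of an export file.
fragment = "<text>&lt;nowiki&gt;Q&amp;A&lt;/nowiki&gt;</text>"

# The parser decodes the entities back into the characters typed in the editor.
wikitext = ET.fromstring(fragment).text
print(wikitext)
```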

In the current version the export format does not contain an XML replacement of wiki markup (see Wikipedia DTD for an older proposal).

You only get the same wikitext that you see when editing the article.

DTD
Here is an unofficial, short Document Type Definition version of the format.

If you don't know what a DTD is just ignore it.

Processing XML export
Many tools can process the exported XML.

If you process a large number of pages (for instance a whole dump) you probably won't be able to fit the document in main memory, so you will need a parser based on SAX or other event-driven methods.

You can also use regular expressions to directly process parts of the XML code.

This may be faster than other methods but is not recommended because it's difficult to maintain.
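As a rough sketch of the event-driven approach, the following Python example uses the standard `xml.sax` module to collect page titles and wikitext lengths without building the whole document in memory. The sample document is a simplified, hand-written stand-in for a real export (it assumes `page`, `title` and `text` elements and omits everything else), and the class and function names are hypothetical.

```python
import xml.sax


class PageHandler(xml.sax.ContentHandler):
    """Collect (title, wikitext length) pairs as the parser streams events."""

    def __init__(self):
        super().__init__()
        self.pages = []
        self._current = None  # name of the element currently being buffered
        self._buffer = []
        self._title = None

    def startElement(self, name, attrs):
        if name in ("title", "text"):
            self._current = name
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            value = "".join(self._buffer)
            if name == "title":
                self._title = value
            else:
                self.pages.append((self._title, len(value)))
            self._current = None


# Simplified sample input; a real export contains more elements and attributes.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Example</title>
    <revision><text>Hello [[world]]</text></revision>
  </page>
  <page>
    <title>Other</title>
    <revision><text>abc</text></revision>
  </page>
</mediawiki>"""


def summarize(xml_bytes):
    handler = PageHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.pages


print(summarize(SAMPLE))
```

For a real dump, `xml.sax.parse` can be given a file name or file object instead, so the file is read incrementally. With a full-history export, each page yields one entry per revision.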

Please list methods and tools for processing XML export here:


 * Parse Mediawiki Dump (crates.io) is a Rust crate to parse XML dumps. Parse Wiki Text (crates.io) is a Rust crate to parse wiki text into a tree of elements.


 * Parse::MediaWikiDump is a Perl module for processing the XML dump file.


 * m:Processing MediaWiki XML with STX - Stream based XML transformation


 * The m:IBM History flow project can read it after applying a small Python program, export-historyflow-expand.py.

Details and practical advice
 * To determine the namespace of a page you have to match its title to the prefixes defined in


 * Possible restrictions are
 * ( protected pages )
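Matching a title against namespace prefixes could look roughly like this in Python. This is only a sketch: the prefix table is a hypothetical, hard-coded example that a real tool would fill from the export file itself, and the function name is made up.

```python
def namespace_of(title, prefixes):
    """Return the namespace key whose prefix matches the title, else 0 (main)."""
    prefix, sep, _rest = title.partition(":")
    # Only treat the part before the first colon as a namespace if it is known.
    if sep and prefix in prefixes:
        return prefixes[prefix]
    return 0


# Hypothetical prefix table for illustration only.
PREFIXES = {"Talk": 1, "User": 2, "Help": 12}

print(namespace_of("Help:Export", PREFIXES))
print(namespace_of("2001: A Space Odyssey", PREFIXES))
```

Titles that merely contain a colon, such as the second one, fall through to the main namespace because their prefix is not in the table.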

Why to export
Why not just use a dynamic database download?

Suppose you are building a piece of software that at certain points displays information that came from Wikipedia.

If you want your program to display the information differently from the live version, you'll probably need the wikicode that was used to enter it, instead of the finished HTML.

Also if you want to get all of the data, you'll probably want to transfer it in the most efficient way that's possible.

The Wikimedia servers need to do quite a bit of work to convert the wikicode into HTML.

That's time consuming both for you and for the Wikimedia servers, so simply spidering all pages is not the way to go.

To access any article in XML, one at a time, link to:

Special:Export/Title_of_the_article
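A small Python helper can build such a link from a page title. This is a sketch under assumptions: the base URL passed in (here https://www.mediawiki.org/wiki) depends on how the wiki is configured, and the function name `export_url` is hypothetical.

```python
from urllib.parse import quote


def export_url(base, title):
    """Build a Special:Export link for one page title."""
    # Spaces in titles are written as underscores in URLs.
    path_title = title.replace(" ", "_")
    # Keep ":" and "/" readable; percent-encode other special characters.
    return base.rstrip("/") + "/Special:Export/" + quote(path_title, safe=":/")


print(export_url("https://www.mediawiki.org/wiki", "Help:Export"))
```

Fetching the resulting URL with any HTTP client returns the XML for that single page.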