Manual:Importing external content

Migrating an existing site to MediaWiki's structure is difficult. "Wikifying" existing content from text files, HTML websites, or even office documents can be partly automated, but you will have to write appropriate scripts yourself, and you will almost always need to edit the result manually.

Learn wikitext first, so that you can do each manual edit correctly, once, using every relevant feature. Consider installing Semantic Bundle and other extensions that extend the markup before devoting a lot of manual effort to what may end up being an unsupportable set of conventions. For instance, Semantic Bundle deals well with page and object properties.

There are no general-purpose, ready-to-run scripts for data imports that anyone supports on more than a case-by-case basis. Most wiki farm or MediaWiki administrators know some ways to convert data to SQL or to supportable files, and a number of commercial forks of MediaWiki (like BlueSpice) claim to offer additional facilities for commonly used formats. These are commercial efforts, and documenting them is beyond the scope of this manual.

Unlike proprietary CMSs like Hyperwave or HTML editors like Microsoft FrontPage, MediaWiki (like most open source software except WordPress) includes few import filters. With tens of millions of pages of some of the most heavily accessed and trusted content in the world already in MediaWiki format, it is generally up to those maintaining data in incompatible formats to make it accessible in MediaWiki, not the other way around. MediaWiki is focused on presenting its own wikitext effectively in every language and on every device; it is fundamentally not focused on importing old or obsolete databases. There are no plans to "sync" MediaWiki with anything but other MediaWiki-based sites. Conversion to MediaWiki should be a one-time operation, with the data maintained in MediaWiki format thereafter, with a few exceptions discussed below.

One glaring gap is the lack of any easy way to mirror LDAP, SMB or NFS directories into wiki pages. The best workaround at present is to link to an HTML page on the same server, since web browsers usually generate a listing page on the fly for such directories.

MediaWiki and wikitext-based compatibles
To do a one-time import from another MediaWiki wiki, please see Manual:Importing XML dumps, Manual:Importing revisions, and Manual:Restoring a wiki from backup. XML dumps are strongly preferred, because importing an older MediaWiki's SQL requires significant expertise with MySQL and similar tools. To restore an old SQL version, it is best to install the original MediaWiki version it was created for, restore the database in phpMyAdmin, and then upgrade the code forward. A riskier method is to install a new MediaWiki, rename its database in phpMyAdmin, import the older database, rename it to match the name the newer install expects, and carefully restore privileges from the renamed database to match. Then run the mw-config update (not the maintenance script, which may not work) and export the wiki to XML properly as a backup.

Live mirroring
Current MediaWiki has no facility similar to the old GetWiki 1.0 live XML importing, which mirrored another MediaWiki wiki (usually Wikipedia) and used its page content as the default unless the page had been edited at the new (mirroring) wiki. This feature can, however, be weakly simulated by frequently importing XML dumps for pages that do not exist on the importing wiki, either ignoring versions of pages that do exist or overwriting them from a backup taken just prior to importing the XML dump.
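The "import only the pages that are missing" step can be sketched as a filter applied to a dump before it is imported. The following is a minimal Python sketch, not a supported tool: the function names are hypothetical, the set of titles already present on the importing wiki is assumed to have been collected separately, and the whole dump is loaded into memory, which is only practical for small dumps.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def drop_existing_pages(dump_xml, existing_titles):
    """Return the dump with every <page> whose <title> already exists removed.

    existing_titles is an assumed input: a set of page titles already
    present on the importing wiki.
    """
    root = ET.fromstring(dump_xml)
    if root.tag.startswith("{"):
        # Keep the dump's namespace as the default one when serializing.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    for page in list(root):
        if local_name(page.tag) != "page":
            continue  # leave <siteinfo> and any other elements untouched
        title = next(
            (child.text for child in page if local_name(child.tag) == "title"),
            None,
        )
        if title in existing_titles:
            root.remove(page)
    return ET.tostring(root, encoding="unicode")
```

The filtered XML can then be imported as described in Manual:Importing XML dumps. For large dumps a streaming parser would be a better fit than this in-memory approach.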

Wikipedia did not support the GetWiki approach to live mirroring, for load reasons and perhaps to avoid early forks of the "community". These reasons are historical and do not prevent current extensions from pursuing this approach. The feature would be immensely useful for intranet purposes, for instance to maintain more and less secure versions of the same wiki content, with extra details added for the more trusted users.

Ironically, it is easier to embed MediaWiki content in WordPress than in MediaWiki itself. Some users of both, like PRwatch/SourceWatch, make use of this approach.

Because of this lack of capability, so-called "embedded wikis" are never MediaWiki-based: developers who want to embed a wiki into other social media or websites almost never use MediaWiki.

Jamwiki
Jamwiki stores data in a variety of forms, including flat files. Converting a Jamwiki database to flat files should produce pages that are entirely compatible with MediaWiki and can be uploaded with any bot script or manually, since the format is identical to MediaWiki's.

Incompatible (non-wikitext) wikis
MediaWiki's wikitext markup is by far the most commonly used in the world and supports by far the most natural languages and character sets. There are some projects purporting to set wiki markup "standards", but in practice the MediaWiki/Jamwiki format is so well supported that no "universal wiki" converter is contemplated. The vast majority of conversions are one-way, to abandon older wikis, which now have mostly niche uses. The availability of wiki farms (Wikia, Referata, etc.), Bitnami installers, pre-configured cloud images, good wiki support on shared hosts (especially DreamHost), more powerful desktops and LAN servers, and improved XML-based backup and restore tools has made it possible to maintain a MediaWiki wiki without special expertise, making most other wiki engines obsolete.

Converting content from a UseMod Wiki
Prior to MediaWiki (Wikipedia Software Phase III and Phase II), Wikipedia ran on the UseMod Wiki software written by Clifford Adams. UseModWiki is a Perl script which uses a database of text files to generate a WikiWiki site. It usually runs as a CGI script in response to web requests, but can be called directly by other Perl programs.

The storage format of UseMod Wiki is well documented.

Converting content from a PHPWiki
If you only have a few pages to convert, and the content isn't sensitive, you might want to try WebForce's online markup converter.

For larger PHPWikis, Isaac Wilcox has written a Perl script to do the conversion. It converts all the commonly used markup (still not 100% of markup, but most PHPWikis will only need minor tweaks after conversion; patches are welcome). It's written for the MediaWiki 1.4.x database schema, though updating it to handle 1.5.x should be fairly easy (again, patches welcome).

The above script works well, but the schema has changed quite a bit since it was written. One user reports that it was easier to install the last stable 1.4.x version, import the data, and then upgrade MediaWiki. The script did an excellent job of preserving almost all of the formatting.

Also see PhpWiki conversion for a solution that uses "sed".

Another solution (combination of already mentioned ones): User:Atrox/Phpwiki2Mediawiki.

Converting MoinMoin format to MediaWiki format
There are various scripts for this, none of them fully reliable. See MoinMoin.

Converting WackoWiki to MediaWiki
There is a WackoWiki converter (developed for the migration of http://freesource.info/ to http://altlinux.org/); however, it will need additional tweaking before use.

Converting TikiWiki format to MediaWiki format
You can convert TikiWiki pages to MediaWiki format using this script.

Converting GoogleCode Wiki to MediaWiki
There is an example of migrating a site to MediaWiki while continuing to store the pages in svn: http://ahuman.org, with the code at http://usvn.ahuman.org/svn/ahwiki/tools.

It allows pages to be stored in both formats, .gw (Google Code) and .mw (MediaWiki), and includes scripts that support bidirectional transfer between svn and MediaWiki.

Converting Wikispaces format to MediaWiki
See Wikispaces.

Converting content from tabular (row/column) formats
Common methods of converting tables and charts are explained on that page. This includes LibreOffice Calc, Excel, and OpenOffice.org formats, among others. It has been proposed to merge that page with this one.

Most simple tabular formats can be exported to "comma-separated" CSV files so these are commonly

Linux
On Linux, csv2wiki imports CSV format.

from/on Windows CSV text file
If you are using Windows you can try csv2other. It produces an output file with .txt extension containing code for a wiki table.
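Where neither tool is available, the conversion itself is simple enough to script. The following is a minimal Python sketch and not the method used by csv2wiki or csv2other: the function name is hypothetical, the first CSV row is assumed to be the header, and pipe characters inside cells are replaced with an HTML entity so that they do not break the table markup.

```python
import csv
import io


def csv_to_wikitable(csv_text, table_class="wikitable"):
    """Convert CSV text to MediaWiki table markup (first row = header)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""

    def clean(cell):
        # A literal "|" would be read as a cell separator, so encode it.
        return cell.strip().replace("|", "&#124;")

    lines = ['{| class="%s"' % table_class]
    lines.append("! " + " !! ".join(clean(c) for c in rows[0]))
    for row in rows[1:]:
        lines.append("|-")  # start a new table row
        lines.append("| " + " || ".join(clean(c) for c in row))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    print(csv_to_wikitable("Item,Count\nApples,3\nPears,5\n"))
```

The output can be pasted into a wiki page or written to a text file for later import.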

from/on Windows directory/folder listing
Dir2html creates simple HTML pages from Windows directories, so that these may be treated like any other HTML when imported below:

Converting content from HTML

 * Pandoc has an online demo. It provides a command-line tool and Python/Ruby integrations, and is written in Haskell.
 * HTML2Mediawiki, a Java library with an online demo.
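For batch conversion, the Pandoc command-line tool can be driven from a script. The following is a minimal Python sketch that assumes the pandoc executable is installed and on the PATH; the function names and file paths are hypothetical, and only Pandoc's documented --from, --to and --output options are used.

```python
import subprocess


def pandoc_command(html_path, wiki_path):
    """Build the pandoc argument list for an HTML to MediaWiki conversion."""
    return [
        "pandoc",
        "--from", "html",
        "--to", "mediawiki",
        "--output", wiki_path,
        html_path,
    ]


def convert_html_file(html_path, wiki_path):
    """Run pandoc; raises CalledProcessError if pandoc reports a failure."""
    subprocess.run(pandoc_command(html_path, wiki_path), check=True)
```

Calling convert_html_file("page.html", "page.wiki") would write the converted wikitext to page.wiki. The result usually still needs manual cleanup, as noted at the top of this page.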

Older tools

 * https://tools.wmflabs.org/magnustools/html2wiki.php can convert HTML tables into MediaWiki table syntax
 * HTML-WikiConverter-0.68 Perl module
 * MwImporter, a PHP script for importing entire websites; it uses html2wiki and other MediaWiki maintenance scripts to import whole directories of static HTML and image files while preserving relative links, etc.
 * The Html2Wiki extension. The (unmaintained) extension is a wrapper around pandoc.

Converting content from a MS-Word document
Microsoft Office Word Add-in For MediaWiki saves documents from Microsoft Office Word straight into MediaWiki.

LibreOffice also does a good job of reading MS Word and a usable job of exporting as MediaWiki wikitext.

Converting content from plain text files
You can use the importTextFiles.php maintenance script.
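If the content is not yet in one file per page, it can be split up first. The following is a minimal Python sketch under the assumption that the script derives each page title from the file name (check the script's own help output before relying on this); the function name and the example page are hypothetical.

```python
from pathlib import Path


def write_page_files(pages, out_dir):
    """Write one UTF-8 .txt file per page, named after the intended title.

    pages is a dict mapping page titles to wikitext.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title, wikitext in pages.items():
        # A slash would create a subdirectory, so it is replaced here;
        # such pages need to be renamed after import.
        safe_name = title.replace("/", "_")
        path = out / (safe_name + ".txt")
        path.write_text(wikitext, encoding="utf-8")
        written.append(path)
    return written
```

The resulting files can then be passed to importTextFiles.php as arguments.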

Converting content from other sources
If you are able and willing to do some scripting yourself, it is possible to import almost any existing textual content with a documented file format into MediaWiki.
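One common pattern is to have the script emit an XML file that can be loaded with the import tools described in Manual:Importing XML dumps. The following is a minimal Python sketch, not a complete implementation of the export format: the function name is hypothetical, the document contains only page titles and a single revision text per page, and whether a given MediaWiki version accepts such a stripped-down file should be verified on a test wiki first.

```python
import xml.etree.ElementTree as ET


def build_import_xml(pages):
    """Build a minimal <mediawiki> import document from {title: wikitext}."""
    root = ET.Element("mediawiki")
    for title, wikitext in pages.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes <, > and & in the text automatically.
        ET.SubElement(revision, "text").text = wikitext
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample page, for illustration only.
    print(build_import_xml({"Sample page": "Some '''bold''' text."}))
```

Parsing the source format into the {title: wikitext} dictionary is the part that has to be written for each data source.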

Example: CIA World Factbook 2002
As an example, the public domain data from the CIA World Factbook 2002 was imported into Wikitravel, a MediaWiki-based wiki.

This is a one-time script: most paths and settings are hard-coded, and much of the code is for parsing the CIA World Factbook print pages, but it might serve as a good example of what can be done.

Importing content in Windows PowerShell
Manual:Importing XML dumps describes various tools to import XML dumps of wiki pages, including the Special:Import wiki page.