Manual:Importing external content

From MediaWiki.org

Migrating an existing site to a wiki structure is always a tough task. Converting existing content from text files, HTML websites, or even office documents into wiki markup can be automated, but you will have to write appropriate scripts yourself. As far as we know, there are no general-purpose, ready-to-run scripts available for this purpose. Unlike commercial content management systems such as Hyperwave or HTML editors such as Microsoft FrontPage, MediaWiki and other open-source wiki software do not include import filters. There are some exceptions, which are discussed in the following sections.

Wikis

Converting content from a UseMod Wiki

Before MediaWiki (Wikipedia software Phase III and Phase II), Wikipedia used the UseModWiki software written by Clifford Adams. UseModWiki is a Perl script that uses a database of text files to generate a wiki site. It is primarily accessed through CGI via the Web, but it can also be called directly by other Perl programs. To convert an existing UseModWiki site, use the script available in the /maintenance subdirectory of your MediaWiki source folder.

The storage format of UseMod Wiki is well documented: DataBase.

Converting content from a PHPWiki

If you only have a few pages to convert, and the content isn't sensitive, you might want to try WebForce's online markup converter.

For larger PHPWikis, Isaac Wilcox has written a Perl script to do the conversion. It converts all the commonly used markup. Coverage is not 100%, but most PHPWikis will need only minor tweaks after conversion (patches are welcome). It was written for the MediaWiki 1.4.x database schema, though updating it to handle 1.5.x should be fairly easy (again, patches welcome).

The above script works well, but the schema has changed quite a bit since it was written. One contributor found it easier to install the last stable 1.4.x version, import the data, and then upgrade MediaWiki. The script did an excellent job of preserving almost all of the formatting.

Also see PhpWiki conversion for a solution that uses "sed".

Converting JSPWiki format to MediaWiki format

You can use jspwiki2mediawiki.pl to convert JSPWiki pages to MediaWiki format.
The tool is based on php2mediawiki by Isaac Wilcox, with modifications added to support the JSPWiki format.

Converting TracWiki format to MediaWiki format

You can use tracwiki2mediawiki.pl to convert TracWiki pages to MediaWiki format.
The tool is based on php2mediawiki by Isaac Wilcox, with modifications added to support the TracWiki format.

Converting MoinMoin format to MediaWiki format

Several scripts exist for this, but none of them is reliable. See mw:MoinMoin.

Converting WackoWiki to MediaWiki

There is a WackoWiki converter (developed for the migration of http://freesource.info/ to http://altlinux.org/); however, it will need additional tweaking before use.

Converting TikiWiki format to MediaWiki format

You can convert TikiWiki pages to MediaWiki format using this script.

Converting content from a CSV text file

to do
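
No ready-made converter is listed here yet. As a starting point, the following is a minimal Python sketch that turns CSV text into MediaWiki table markup. The function name csv_to_wikitable is hypothetical, and the sketch assumes that the first CSV row holds the column headers and that no cell contains a "|" or "!!" sequence, which would need escaping.

```python
import csv
import io


def csv_to_wikitable(csv_text: str) -> str:
    """Convert CSV text to MediaWiki table markup.

    Assumption: the first row is the header row. Cells containing
    table syntax such as "|" are NOT escaped in this sketch.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""

    header, *body = rows
    lines = ['{| class="wikitable"']
    # Header cells start with "!" and are separated by "!!".
    lines.append("! " + " !! ".join(cell.strip() for cell in header))
    for row in body:
        # "|-" starts a new table row; data cells are separated by "||".
        lines.append("|-")
        lines.append("| " + " || ".join(cell.strip() for cell in row))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_wikitable("Name,Capital\nFrance,Paris\n"))
```

The resulting text can be pasted into a page or fed to whichever import method you use.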

Converting content from an HTML text file

If you have only an HTML excerpt or a few pages to convert, you might want to try Diberri's html2wiki converter, which uses the HTML::WikiConverter Perl module from CPAN. For a larger collection of files, you should probably use the module itself.

There are probably other HTML to Wiki markup converters.

See the section below for importing into MediaWiki.
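
To illustrate the kind of mapping such a converter performs, here is a minimal Python sketch that handles only a handful of tags (bold, italic, headings, paragraphs, and links). It is not how HTML::WikiConverter works internally; the names MiniHtmlToWiki and html_to_wikitext are hypothetical, and every tag not listed is simply dropped while its text is kept.

```python
from html.parser import HTMLParser


class MiniHtmlToWiki(HTMLParser):
    """Toy converter: maps a few HTML tags to wikitext, ignores the rest."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
    HEADINGS = {"h1": "=", "h2": "==", "h3": "==="}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []         # collected wikitext fragments
        self.link_stack = []  # True for each open <a> that had an href

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag] + " ")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # External link syntax: [URL label]
                self.out.append("[" + href + " ")
            self.link_stack.append(bool(href))

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append(" " + self.HEADINGS[tag] + "\n")
        elif tag == "p":
            # A blank line separates paragraphs in wikitext.
            self.out.append("\n\n")
        elif tag == "a" and self.link_stack and self.link_stack.pop():
            self.out.append("]")

    def handle_data(self, data):
        self.out.append(data)


def html_to_wikitext(html: str) -> str:
    parser = MiniHtmlToWiki()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


if __name__ == "__main__":
    sample = '<h2>Title</h2><p>Some <b>bold</b> and <a href="http://example.org/">a link</a>.</p>'
    print(html_to_wikitext(sample))
```

Real pages contain tables, lists, images, and nested markup that this sketch ignores, which is why a maintained converter is the better choice for anything beyond trivial input.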

Converting content from an MS Word document

Try: Word2MediaWikiPlus.

The OpenOffice.org office suite also does a good job of reading MS Word documents and a usable job of exporting them as MediaWiki wikitext.

Converting content from other sources

If you are able and willing to do some scripting yourself, it is possible to import almost any existing textual content with a documented file format into MediaWiki.

Importing content from PowerShell

If you also use your MediaWiki installation for internal documentation or as a team knowledge base, you might want to automate page creation from a shell. You can use a PowerShell script that works over HTTP, so it should work from most of your servers without firewall problems.

MediaWiki offers a special page where you can import pages. This is not a web service API, however: it is an HTML form with an XML file upload. You therefore have to create a MediaWiki-compliant XML file and simulate an HTML form submission with the correct parameters. You can find the script at slash4 powershell mediawiki script.
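
The script referenced above is written in PowerShell; the following Python sketch only illustrates the first half of the job, building the XML file. It is a simplified skeleton modelled on the page/title/revision/text layout of MediaWiki's export format, and it is an assumption that your wiki accepts it as is: a real installation may also require the schema namespace and version attributes, timestamps, or contributor information. The function name build_import_xml is hypothetical, and the form submission step is deliberately left out because its parameters depend on the MediaWiki version.

```python
import xml.etree.ElementTree as ET


def build_import_xml(pages: dict) -> str:
    """Build a minimal import document from a {title: wikitext} mapping.

    Assumption: a bare <mediawiki> root with <page>/<title>/<revision>/<text>
    children is sufficient; check an export from your own wiki to see which
    additional elements and attributes it expects.
    """
    root = ET.Element("mediawiki")
    for title, wikitext in pages.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes &, < and > in the text automatically.
        ET.SubElement(revision, "text").text = wikitext
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_doc = build_import_xml({"Sandbox": "Hello & '''world'''"})
    with open("import.xml", "w", encoding="utf-8") as handle:
        handle.write(xml_doc)
```

Comparing the output with a file produced by your wiki's own export page is a quick way to find out which extra elements your version needs.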

CIA World Factbook 2002

As an example, the following script may be of interest: it imports the public domain data from the CIA World Factbook 2002 into MediaWiki. It was written by Evan Prodromou (E-Mail: evan(at)wikitravel(dot)org) for his own use on Wikitravel - The free, complete, up-to-date and reliable world-wide travel guide. It is licensed under the GNU General Public License and is available for download.

Please note that it is a one-time script: most paths and other settings are hard-coded, and much of the code deals with parsing the CIA World Factbook print pages. It may still serve as a good example of what can be done.

The script can be found at http://wikitravel.org/en/Wikitravel:CIA_World_Factbook_2002_import.