Manual:Importing external content
Migrating an existing site to a wiki structure is always a tough task. The process of wikifying existing content from text files, HTML websites, or even office documents can be automated, but you will have to write appropriate scripts yourself. As far as we know, there are no ready-to-run, general-purpose scripts available for this purpose. Unlike commercial content management systems such as Hyperwave or HTML editors such as Microsoft FrontPage, MediaWiki and other open-source WikiWiki software do not include import filters. There are some exceptions, which are discussed in the following sections.
- 1 Wikis
- 1.1 Converting content from a UseMod Wiki
- 1.2 Converting content from a PHPWiki
- 1.3 Converting JSPWiki format to MediaWiki format
- 1.4 Converting TracWiki format to MediaWiki format
- 1.5 Converting MoinMoin format to MediaWiki format
- 1.6 Converting WackoWiki to MediaWiki
- 1.7 Converting TikiWiki format to MediaWiki format
- 1.8 Converting GoogleCode Wiki to MediaWiki
- 2 Converting content from a CSV text file
- 3 Converting content from HTML
- 4 Converting content from a MS-Word document
- 5 Converting content from plain text files
- 6 Converting content from other sources
- 7 Importing content in Windows PowerShell
Converting content from a UseMod Wiki
Prior to MediaWiki (Wikipedia Software Phase III and Phase II), Wikipedia ran on the UseMod Wiki software written by Clifford Adams. UseModWiki is a Perl script which uses a database of text files to generate a WikiWiki site. It usually runs as a CGI script in response to web requests, but can be called directly by other Perl programs.
The storage format of UseMod Wiki is well documented.
Converting content from a PHPWiki
For larger PHPWikis, Isaac Wilcox has written a Perl script to do the conversion. It converts all the commonly used markup (not 100% of it, but most PHPWikis will need only minor tweaks after conversion; patches are welcome). It is written for the MediaWiki 1.4.x database schema, though updating it to handle 1.5.x should be fairly easy (again, patches welcome).
The above script works well, but the schema has changed quite a bit since it was written. I found it easier to install the last stable 1.4.x version, import my data, and then upgrade MediaWiki. The script did an excellent job of preserving almost all of the formatting.
Also see PhpWiki conversion for a solution that uses "sed".
Another solution (combination of already mentioned ones): User:Atrox/Phpwiki2Mediawiki.
Converting JSPWiki format to MediaWiki format
You can use jspwiki2mediawiki.pl to convert JSPWiki pages to MediaWiki format.
This tool is based on php2mediawiki by Isaac Wilcox, with modifications added to support conversion of the JSPWiki format.
Converting TracWiki format to MediaWiki format
You can use tracwiki2mediawiki.pl to convert TracWiki pages to MediaWiki format.
This tool is based on php2mediawiki by Isaac Wilcox, with modifications added to support conversion of the TracWiki format.
Converting MoinMoin format to MediaWiki format
There are various scripts for this, all dodgy. See MoinMoin.
Converting WackoWiki to MediaWiki
Converting TikiWiki format to MediaWiki format
Converting GoogleCode Wiki to MediaWiki
It allows pages to be stored in both formats, .gw (Google Code) and .mw (MediaWiki), and includes scripts that support bidirectional transfer between svn and MediaWiki.
Converting content from a CSV text file
If you are using Windows, you can try csv2other. It produces an output file with a .txt extension containing the code for a wiki table.
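On other platforms, or if you prefer a script, the same kind of conversion can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for csv2other: the function name `csv_to_wikitable` is hypothetical, the first CSV row is assumed to be the header, and only pipe characters are escaped.

```python
import csv
import io


def _cell(text):
    # Escape pipes so cell text is not read as a cell separator.
    return text.strip().replace("|", "&#124;")


def csv_to_wikitable(csv_text, delimiter=","):
    """Convert CSV text to MediaWiki table markup (first row = header)."""
    # Blank lines come back from csv.reader as empty lists; skip them.
    rows = [r for r in csv.reader(io.StringIO(csv_text), delimiter=delimiter) if r]
    if not rows:
        return ""
    header, *body = rows
    lines = ['{| class="wikitable"', "! " + " !! ".join(_cell(c) for c in header)]
    for row in body:
        lines.append("|-")  # row separator
        lines.append("| " + " || ".join(_cell(c) for c in row))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_wikitable("Name,Quantity\nBolt,12\nNut,30\n"))
```

The resulting text can be pasted into a wiki page. Quoted fields and embedded commas are handled by Python's `csv` module; other wiki markup inside cells is passed through unchanged.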
Converting content from HTML
- The Html2Wiki extension relies on Pandoc, a command-line document conversion tool. It allows users to import HTML content, including images, directly into the wiki. You can import entire websites or complete web pages saved from the browser (such as Google Docs).
- Pandoc has an online demo. It is written in Haskell, provides a command-line tool, and has Python and Ruby integrations.
- HTML2Mediawiki Java library with online demo.
- https://tools.wmflabs.org/magnustools/html2wiki.php can convert HTML tables into MediaWiki table syntax
- HTML-WikiConverter-0.68 Perl module
- MwImporter, a PHP script for importing entire websites; it uses html2wiki and other MediaWiki maintenance scripts to import entire directories of static HTML and image files while preserving relative links, etc.
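To give an idea of what the table converters listed above do, here is a rough Python sketch that turns a simple HTML table into MediaWiki table syntax using only the standard library. It is not the code of any of those tools. The names `TableToWiki` and `html_table_to_wikitext` are hypothetical, and the sketch assumes a single, non-nested table whose cells have explicit closing tags; markup inside cells is reduced to plain text.

```python
from html.parser import HTMLParser


class TableToWiki(HTMLParser):
    """Collect <tr>/<th>/<td> text and emit MediaWiki table markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.cell = None    # text pieces of the cell being read, or None
        self.marker = "|"   # "!" for header cells, "|" for data cells

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.lines.append('{| class="wikitable"')
        elif tag == "tr":
            self.lines.append("|-")
        elif tag in ("th", "td"):
            self.marker = "!" if tag == "th" else "|"
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            # Collapse whitespace and escape pipes inside the cell text.
            text = " ".join("".join(self.cell).split()).replace("|", "&#124;")
            self.lines.append(f"{self.marker} {text}".rstrip())
            self.cell = None
        elif tag == "table":
            self.lines.append("|}")

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def html_table_to_wikitext(html):
    parser = TableToWiki()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    print(html_table_to_wikitext(
        "<table><tr><th>A</th><th>B</th></tr>"
        "<tr><td>1</td><td>2</td></tr></table>"
    ))
```

For anything beyond simple tables (links, images, nested markup), one of the dedicated tools in the list is likely the better choice.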
Converting content from a MS-Word document
Microsoft Office Word Add-in For MediaWiki saves documents from Microsoft Office Word straight into MediaWiki.
LibreOffice also does a good job of reading MS Word and a usable job of exporting as MediaWiki wikitext.
Converting content from plain text files
You can use the importTextFiles.php maintenance script.
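If your content is not yet stored as one file per page, a small script can prepare the files first. The following Python sketch assumes that the maintenance script derives each page title from the file name; check the script's own help output to confirm this for your MediaWiki version. The function name `write_pages` and the choice to replace "/" with "-" are illustrative only.

```python
import re
from pathlib import Path

# Characters that MediaWiki does not allow in page titles.
_FORBIDDEN = re.compile(r"[#<>\[\]|{}]")


def write_pages(pages, out_dir):
    """Write a {title: text} mapping as one UTF-8 .txt file per page."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for title, text in pages.items():
        # Drop forbidden characters; "/" cannot appear in a file name.
        name = _FORBIDDEN.sub("", title).replace("/", "-").strip()
        if not name:
            continue  # nothing usable is left of this title
        path = out_dir / f"{name}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    for p in write_pages({"Sample page": "Some ''wikitext'' here."}, "import_me"):
        print(p)
```

Two titles that reduce to the same file name will overwrite each other, so check the output directory before running the import.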
Converting content from other sources
If you are able and willing to do some scripting yourself, it is possible to import almost any existing textual content with a documented file format into MediaWiki.
Example: CIA World Factbook 2002
This is a one-time script; most paths and coding are hard-coded, and lots of the code is for parsing the CIA World Factbook print pages, but it might serve as a good example of what can be done.
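As a much smaller illustration of the same idea (unrelated to the Factbook script itself), the Python sketch below converts a made-up record format into wikitext. The input format is purely hypothetical: records are separated by blank lines, the first line of each record is the page title, and the remaining "Field: value" lines become a definition list. The function name `records_to_wikitext` is likewise invented for this example.

```python
import re


def records_to_wikitext(source):
    """Map each record's title to wikitext built from its field lines."""
    pages = {}
    # Records are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        title, *fields = lines
        body = []
        for line in fields:
            key, sep, value = line.partition(":")
            if sep:
                # Definition list: term on one line, definition on the next.
                body.append(f"; {key.strip()}")
                body.append(f": {value.strip()}")
            else:
                body.append(line)  # keep free text as-is
        pages[title] = "\n".join(body)
    return pages


if __name__ == "__main__":
    sample = "Exampleland\nCapital: Sample City\n\nTestonia\nCapital: Demo Town\n"
    for title, text in records_to_wikitext(sample).items():
        print(f"== {title} ==\n{text}\n")
```

A real source will need its own parsing rules, but the overall shape (parse records, emit wikitext per page, then import) stays the same.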
Importing content in Windows PowerShell
Manual:Importing XML dumps describes various tools to import XML dumps of wiki pages, including the Special:Import wiki page.