Manual:Importing external content

From MediaWiki.org

Migrating an existing site to MediaWiki is difficult. "Wikifying" existing content from text files, HTML websites, or even office documents can be partly automated, but you will usually have to write the appropriate scripts yourself, and some manual editing is almost always needed.

Learn wikitext first, so that any manual edits are done correctly the first time and make use of every relevant feature. Before devoting a lot of manual effort to what may end up being an unsupportable set of conventions, consider installing Semantic Bundle and other extensions that extend the markup. For instance, Semantic Bundle handles page and object properties well.

There are no ready-to-run import scripts for general users that anyone supports beyond a case-by-case basis. Wiki farm operators and MediaWiki administrators usually know some ways to convert data to SQL or to other supportable files, and a number of commercial forks of MediaWiki (such as BlueSpice) claim to offer additional import facilities for commonly used formats. These are commercial efforts, and documenting them is beyond the scope of this manual.

Unlike proprietary content management systems such as Hyperwave or HTML editors such as Microsoft FrontPage, MediaWiki (like most open source software, WordPress being an exception) includes few import filters. With tens of millions of pages of some of the world's most heavily accessed and trusted content already in MediaWiki format, it is generally up to those maintaining data in incompatible formats to make it accessible in MediaWiki, not the other way around. MediaWiki focuses on presenting its own wikitext effectively in every language and on every device; it is fundamentally not focused on importing old or obsolete databases. Synchronizing MediaWiki with anything other than other MediaWiki-based sites is not contemplated. Conversion to MediaWiki should be done once, with the data maintained in MediaWiki format thereafter, apart from a few exceptions discussed below.

One notable gap is the lack of any easy way to mirror LDAP, SMB or NFS directories into wiki pages. The best workaround at present is to link to an HTML page on the same server, since browsers usually generate directory listing pages on the fly.

MediaWiki and wikitext-based compatibles

For a one-time import from another MediaWiki, see Manual:Importing XML dumps, Manual:Importing revisions and Manual:Restoring a wiki from backup. XML dumps are effectively required, because importing an older MediaWiki's SQL requires significant expertise with MySQL or a similar database. To restore an old SQL backup, it is best to install the MediaWiki version it was made for, restore the database in phpMyAdmin, and then upgrade the code forward. A riskier method is to install a new MediaWiki, rename its database in phpMyAdmin, import the older database, rename that to the database name the newer install's code expects, and carefully restore privileges from the renamed database to match. Then run the mw-config web updater (not the maintenance script, which may not work) and export the wiki to XML as a proper backup.

Live mirroring

Current MediaWiki versions have no equivalent of the old GetWiki 1.0 live XML import, which mirrored another MediaWiki (usually Wikipedia) and used its page content as the default unless a page was edited on the new (mirroring) wiki. This feature can, however, be weakly simulated by frequently importing XML dumps for pages that do not exist on the importing wiki, either ignoring incoming versions of pages that do exist or overwriting them afterwards from a backup taken just before the import.
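The filtering step in that workaround can be sketched in Python. This is a minimal illustration, not an existing tool: it assumes the dump fits in memory and that you already have the set of titles that exist on the importing wiki (how you collect that set is left open). Every page element whose title is in that set is dropped, so importing the result with Special:Import or importDump.php only creates missing pages.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def filter_dump(xml_text: str, existing_titles: set) -> str:
    """Drop every <page> whose <title> already exists on the importing wiki."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # Re-emit the export namespace as the default so tags stay unprefixed.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    pages = [child for child in root if local_name(child.tag) == "page"]
    for page in pages:
        title = next((c.text for c in page if local_name(c.tag) == "title"), None)
        if title in existing_titles:
            root.remove(page)
    return ET.tostring(root, encoding="unicode")
```

For multi-gigabyte dumps, a streaming parser such as xml.etree.ElementTree.iterparse would be the better choice.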

Wikipedia did not support the GetWiki approach to live mirroring, for load reasons and perhaps to avoid early forks of the "community". These reasons are historical and do not prevent pursuing this approach in current extensions. The feature would be immensely useful on intranets, for instance to keep more and less secure versions of the same wiki content, with extra details added for more trusted contributors.

Ironically, it is easier to embed MediaWiki content in WordPress than in MediaWiki itself. Some sites that use both, such as PRWatch/SourceWatch, take this approach.
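One generic way to pull rendered MediaWiki content into another site is the MediaWiki Action API's action=parse module, which returns a page's HTML. The sketch below only illustrates that request; it is not necessarily how PRWatch/SourceWatch or any WordPress plugin does it, and the endpoint URL used in the usage note is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def parse_url(api_endpoint: str, title: str) -> str:
    """Build an action=parse request that returns a page's rendered HTML as JSON."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",        # only the rendered HTML body
        "format": "json",
        "formatversion": "2",  # with version 2, parse.text is a plain string
    }
    return api_endpoint + "?" + urllib.parse.urlencode(params)


def fetch_rendered_html(api_endpoint: str, title: str) -> str:
    """Fetch the rendered HTML of one page (requires network access)."""
    with urllib.request.urlopen(parse_url(api_endpoint, title)) as response:
        data = json.load(response)
    return data["parse"]["text"]
```

For example, fetch_rendered_html("https://wiki.example.org/w/api.php", "Main Page") would return that page's HTML body for the embedding site to place in its own template. Some public wikis ask clients to send a descriptive User-Agent header, which this sketch does not set.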

So-called "embedded wikis" are rarely MediaWiki-based because of this missing capability; developers who want to embed a wiki into other social media or websites almost never use MediaWiki.

JAMWiki

JAMWiki can store data in a variety of forms, including flat files. Since its markup is identical to MediaWiki's, converting a JAMWiki database to flat files should produce pages fully compatible with MediaWiki, which can then be uploaded manually or with any bot script.
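Instead of a bot, the flat files can also be wrapped in a MediaWiki XML import file. A minimal sketch, assuming the pages have already been read into a {title: wikitext} dictionary and that the target wiki accepts the export-0.10 schema (an assumption; check the xmlns in your own wiki's Special:Export output). Real exports also carry ns, id, timestamp and contributor elements, which this sketch omits and which your MediaWiki version may require.

```python
import xml.etree.ElementTree as ET

# Assumed schema version; match it to the target wiki's Special:Export output.
EXPORT_NS = "http://www.mediawiki.org/xml/export-0.10/"


def build_import_xml(pages: dict) -> str:
    """Wrap {title: wikitext} pairs in a minimal MediaWiki export-style document."""
    root = ET.Element("mediawiki", {"xmlns": EXPORT_NS})
    for title, wikitext in pages.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes &, < and > in the wikitext automatically.
        ET.SubElement(revision, "text").text = wikitext
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be saved to a file and loaded through Special:Import or maintenance/importDump.php.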

Incompatible (non-wikitext) wikis

MediaWiki's wikitext markup is by far the most widely used wiki markup in the world and supports by far the most natural languages and character sets. Some projects purport to set wiki markup "standards", but in practice the MediaWiki/JAMWiki format is so well supported that no "universal wiki" converter is contemplated. The vast majority of conversions are one-way, made to abandon older wikis, which are now mostly used for niche purposes. Wiki farms (Wikia, Referata, etc.), Bitnami installers, pre-configured cloud images, good wiki support on shared hosts (especially DreamHost), more powerful desktops and LAN servers, and improved XML-based backup and restore tools have made it possible to maintain a MediaWiki without special expertise, making most other wikis obsolete.

Converting content from a UseMod Wiki

Before MediaWiki (Wikipedia software Phase III) and its predecessor, Phase II, Wikipedia ran on the UseModWiki software written by Clifford Adams. UseModWiki is a Perl script that uses a database of text files to generate a WikiWiki site. It usually runs as a CGI script in response to web requests, but it can also be called directly by other Perl programs.

The storage format of UseMod Wiki is well documented.

Converting content from a PHPWiki

If you only have a few pages to convert, and the content isn't sensitive, you might want to try WebForce's online markup converter.

For larger PHPWikis, Isaac Wilcox has written a Perl script to do the conversion. It converts all the commonly used markup (still not 100% of it, but most PHPWikis will only need minor tweaks after conversion; patches are welcome). It is written for the MediaWiki 1.4.x database schema, though updating it to handle 1.5.x should be fairly easy (again, patches welcome).

The above script works well, but the schema has changed quite a bit since it was written. I found it easier to install the last stable 1.4.x version, import my data, and then upgrade MediaWiki. The script did an excellent job of preserving almost all of the formatting.

Also see PhpWiki conversion for a solution that uses "sed".

Another solution (combination of already mentioned ones): User:Atrox/Phpwiki2Mediawiki.

Converting JSPWiki format to MediaWiki format

You can use jspwiki2mediawiki.pl to convert JSPWiki pages to MediaWiki format.
It is based on php2mediawiki by Isaac Wilcox, which provided a convenient starting point; the modifications made to it add support for the JSPWiki format.
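To give an idea of what such converters do internally, here is a minimal line-by-line sketch covering only JSPWiki headings and bold text. It is not jspwiki2mediawiki.pl; mapping JSPWiki's largest heading (!!!) to a MediaWiki level-2 heading is our assumption, and links, lists, tables and preformatted blocks are not handled.

```python
import re

# Assumed mapping: JSPWiki's largest heading (!!!) becomes a MediaWiki level-2 heading.
HEADING_MARKS = {"!!!": "==", "!!": "===", "!": "===="}


def convert_jspwiki_line(line: str) -> str:
    """Convert JSPWiki bold (__text__) and headings (!, !!, !!!) on one line."""
    # JSPWiki bold is __text__; MediaWiki bold is '''text'''.
    line = re.sub(r"__(.+?)__", r"'''\1'''", line)
    match = re.match(r"^(!{1,3})\s*(.*?)\s*$", line)
    if match:
        marks = HEADING_MARKS[match.group(1)]
        return f"{marks} {match.group(2)} {marks}"
    return line


def convert_jspwiki(text: str) -> str:
    """Apply the line converter to every line of a page."""
    return "\n".join(convert_jspwiki_line(ln) for ln in text.splitlines())
```

A real converter also has to avoid touching preformatted text, where double underscores (for example in code) are not bold markup.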

Converting TracWiki format to MediaWiki format

You can use tracwiki2mediawiki.pl to convert TracWiki pages to MediaWiki format.
It is based on php2mediawiki by Isaac Wilcox, which provided a convenient starting point; the modifications made to it add support for the TracWiki format.
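A similar minimal sketch for a few TracWiki constructs: {{{ ... }}} preformatted blocks become pre blocks, inline {{{code}}} becomes code tags, and [wiki:PageName label] links become [[PageName|label]]. Trac headings and bold/italic quoting are already close to MediaWiki syntax, so they are left alone here. This illustrates the technique under those assumptions and is not tracwiki2mediawiki.pl itself; CamelCase auto-links, ticket links and macros are not handled.

```python
import re


def convert_trac(text: str) -> str:
    """Convert a few TracWiki constructs to MediaWiki markup (partial sketch)."""
    out = []
    in_pre = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_pre and stripped == "{{{":
            in_pre = True
            out.append("<pre>")
        elif in_pre and stripped == "}}}":
            in_pre = False
            out.append("</pre>")
        elif in_pre:
            out.append(line)  # leave preformatted content untouched
        else:
            # Inline {{{code}}} -> <code>code</code>
            line = re.sub(r"\{\{\{(.+?)\}\}\}", r"<code>\1</code>", line)
            # [wiki:Page label] -> [[Page|label]], then bare [wiki:Page] -> [[Page]]
            line = re.sub(r"\[wiki:([^\s\]]+)\s+([^\]]+)\]", r"[[\1|\2]]", line)
            line = re.sub(r"\[wiki:([^\s\]]+)\]", r"[[\1]]", line)
            out.append(line)
    return "\n".join(out)
```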

Converting MoinMoin format to MediaWiki format

There are various scripts for this, none of them fully reliable. See MoinMoin.

Converting WackoWiki to MediaWiki

There is a WackoWiki converter (developed for the migration of http://freesource.info/ to http://altlinux.org/), but it needs additional tweaking before use.

Converting TikiWiki format to MediaWiki format

You can convert TikiWiki pages to MediaWiki format using this script.

Converting GoogleCode Wiki to MediaWiki

There is an example of migrating a site to MediaWiki while keeping page storage in SVN: http://ahuman.org, with the code at http://usvn.ahuman.org/svn/ahwiki/tools.

It allows pages to be stored in both formats, .gw (Google Code) and .mw (MediaWiki), and includes scripts supporting bidirectional transfer between SVN and MediaWiki.

Converting content from tabular (row/column) formats

Methods of converting tables and charts, including from LibreOffice Calc, Excel and OpenOffice.org formats, are explained on the Commons page [https://commons.wikimedia.org/wiki/Commons:Convert_tables_and_charts_to_wiki_code_or_image_files Convert tables and charts to wiki code or image files]. It has been proposed to merge that page with this one.

Most simple tabular formats can be exported to comma-separated values (CSV) files, so CSV is commonly used as an intermediate format for conversion.
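As an illustration of the CSV route, this Python sketch turns CSV text into MediaWiki table markup using the standard {| ... |} syntax. It treats the first row as a header by default and does not escape cells containing the pipe character (those would need escaping, for example with {{!}}), so it is a starting point rather than a finished converter.

```python
import csv
import io


def csv_to_wikitable(csv_text: str, has_header: bool = True) -> str:
    """Render CSV text as a MediaWiki table using the {| ... |} syntax."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    lines = ['{| class="wikitable"']
    for i, row in enumerate(rows):
        if i > 0:
            lines.append("|-")  # row separator
        if i == 0 and has_header:
            lines.append("! " + " !! ".join(row))  # header cells
        else:
            lines.append("| " + " || ".join(row))  # data cells
    lines.append("|}")
    return "\n".join(lines)
```

The csv module takes care of quoted fields such as "Smith, John", which a naive split on commas would break apart.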

Linux

On Linux, [https://www.linux.com/news/two-handy-mediawiki-extensions csv2wiki] ([https://www.organicdesign.co.nz/Csv2wiki.pl]) imports CSV files.

Windows

From/on Windows CSV text file

If you are using Windows you can try csv2other. It produces an output file with .txt extension containing code for a wiki table.

From/on Windows directory/folder listing

[https://www.addictivetips.com/windows-tips/how-to-convert-a-directory-into-html-index-file-or-create-a-sitemap/ Dir2html] creates simple HTML pages from Windows directories, so that the listings can then be imported like any other HTML, as described in the next section.
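If the goal is wikitext rather than HTML, a directory listing can also be rendered directly as a nested bullet list. This Python sketch produces a one-time snapshot (it is not the live directory mirroring the wiki lacks); it lists symbolic links but does not descend into them, to avoid loops, and it does not escape file names that happen to contain wiki markup.

```python
import os


def directory_to_wikitext(root: str) -> str:
    """Render a directory tree as a nested wikitext bullet list (one-time snapshot)."""
    lines = []

    def walk(path: str, depth: int) -> None:
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            # Only descend into real directories, not symlinks, to avoid cycles.
            is_dir = os.path.isdir(full) and not os.path.islink(full)
            # One '*' per nesting level gives MediaWiki nested bullets.
            lines.append("*" * depth + " " + (name + "/" if is_dir else name))
            if is_dir:
                walk(full, depth + 1)

    walk(root, 1)
    return "\n".join(lines)
```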

Converting content from HTML

  • The Html2Wiki extension. The extension relies on Pandoc, a command-line document conversion tool, and lets users import HTML content directly into the wiki, including images. It can import entire websites, or complete web pages saved from the browser (such as Google Docs).
  • Pandoc has an online demo. It is written in Haskell, provides a command-line tool, and has Python and Ruby integrations.
  • HTML2Mediawiki, a Java library with an online demo.
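For one-off conversions, Pandoc's command-line tool can be called from a script. The sketch below assumes pandoc is installed and on the PATH; it uses Pandoc's html reader and mediawiki writer and passes the HTML on standard input. Images, styling and site navigation will generally still need manual cleanup.

```python
import subprocess


def pandoc_args() -> list:
    """Arguments for converting HTML (read from stdin) to MediaWiki markup (on stdout)."""
    # -f/--from selects the input format, -t/--to the output format.
    return ["pandoc", "-f", "html", "-t", "mediawiki"]


def html_to_wikitext(html: str) -> str:
    """Run pandoc on an HTML string; raises CalledProcessError if pandoc fails."""
    result = subprocess.run(
        pandoc_args(), input=html, capture_output=True, text=True, check=True
    )
    return result.stdout
```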

Older tools

Converting content from a MS-Word document

Word2MediaWikiPlus (deprecated).

Microsoft Office Word Add-in For MediaWiki saves documents from Microsoft Office Word straight into MediaWiki.

LibreOffice also does a good job of reading MS Word and a usable job of exporting as MediaWiki wikitext.

Converting content from plain text files

You can use the importTextFiles.php maintenance script.
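The preparation can be scripted: write one .txt file per page, then pass the files to importTextFiles.php. The sketch assumes, as we understand the script's documentation, that each page title is taken from the file name without its extension; titles containing characters not valid in file names (such as /) would need separate handling. It only builds the command; run it from the wiki's installation directory (recent MediaWiki versions may also invoke maintenance scripts via maintenance/run.php).

```python
import pathlib


def write_pages(pages: dict, out_dir: str) -> list:
    """Write one UTF-8 .txt file per page; the file stem is meant to become the title."""
    directory = pathlib.Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for title, wikitext in pages.items():
        path = directory / f"{title}.txt"
        path.write_text(wikitext, encoding="utf-8")
        paths.append(str(path))
    return paths


def import_command(paths: list) -> list:
    """Build the maintenance script call for the given files."""
    return ["php", "maintenance/importTextFiles.php", *paths]
```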

Converting content from other sources

If you are able and willing to do some scripting by yourself, it is possible to import almost any existing textual content with a documented file format into MediaWiki.

Example: CIA World Factbook 2002

As an example, the public domain data from the CIA World Factbook 2002 was imported into Wikitravel, which runs on MediaWiki.

This is a one-time script: most paths and settings are hard-coded, and much of the code is for parsing the CIA World Factbook print pages, but it may serve as a good example of what can be done.

Importing content in Windows PowerShell

Manual:Importing XML dumps describes various tools to import XML dumps of wiki pages, including the Special:Import wiki page.