Extension:Data Transfer

Data Transfer is an extension to MediaWiki that lets users export data from the wiki and import data into it. Export is done in XML format; import is possible in XML, CSV and some spreadsheet formats. The extension makes light use of the Semantic MediaWiki extension, but it does not require the presence of SMW. Still, in spirit the extension could be considered a member of the Semantic MediaWiki "family", since it has the same data-centric approach to wiki content used by SMW and spinoff extensions like Semantic Forms.

Note that Data Transfer is not an ideal solution for backing up a wiki, or for transferring wiki pages from one MediaWiki site to another; for those tasks, MediaWiki's built-in "Special:Export" and "Special:Import" pages are a much better solution.

Code and download
You can download the Data Transfer code, in .zip format, here.

You can also download the code directly via Git from the MediaWiki source code repository. From a command line, you can call the following:

To view the code online, including version history for each file, you can go here.

Installation
After you've obtained a 'DataTransfer' directory (either by extracting a compressed file or downloading via Git), place this directory within the main MediaWiki 'extensions' directory. Then, in the file 'LocalSettings.php' in the main MediaWiki directory, add the following line:

By default, the importing of files is allowed only for administrators/sysops. If you want other groups to be able to import files, you can add additional lines to LocalSettings.php to allow that. This line, for example, will allow all users to import files:

To allow anyone reading the wiki to import files, you could add the following (though it's not usually recommended):

Exporting data
Data Transfer defines a special page, "Special:ViewXML", that lets users view (and thus save) the pages in any combination of the wiki's categories and namespaces in XML form. The fields and values in the XML are taken from the fields and values in any template calls contained in the page; any non-template text is put into one or more "free text" tags. In addition, an "ID" field is displayed for every page, using MediaWiki's internal "article ID" for that page; this is done so that outside systems can track a page with a more stable identifier than its name (which can change often). The XML contains only the current state of each page: information on authors and modification dates, and information on previous versions of each page, is not recorded.

Two formats for export are supported: the first, or standard one, contains tags of the form  and. The second, or "simplified" one, contains tags of simply the form  and.

Special:ViewXML can also be used to generate XML for individual pages, by adding a "&titles=" parameter to the URL, like "&titles=Page 1|Page 2|Page 3".
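As a rough illustration, the following Python sketch builds such a URL. The base URL is a placeholder for your own wiki's path to Special:ViewXML, and the helper name view_xml_url is invented here; only the "titles" parameter and the "|" separator come from the description above.

```python
from urllib.parse import quote

# Hypothetical base URL; substitute your own wiki's path to Special:ViewXML.
BASE_URL = "https://example.org/w/index.php?title=Special:ViewXML"


def view_xml_url(titles, base_url=BASE_URL):
    """Append a "titles" parameter listing the given pages, separated by "|"."""
    joined = "|".join(titles)
    # Percent-encode spaces and the "|" separator so the URL survives copy-paste.
    return base_url + "&titles=" + quote(joined, safe="")


print(view_xml_url(["Page 1", "Page 2", "Page 3"]))
```

Percent-encoding is optional in most browsers, but it avoids problems when the URL is passed to scripts or command-line tools.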

By default, the "free text" (non-template) part of a page is parsed by the MediaWiki parser, so that wikitext gets converted into HTML; whereas the values within template calls are not. To disable parsing for the free text, add the following to LocalSettings.php:

Conversely, to add parsing for template field values, add the following:

Importing data
Data Transfer defines three special pages, "Special:ImportXML", "Special:ImportCSV" and "Special:ImportSpreadsheet", that let users with administrator privileges (by default) upload XML, CSV and assorted spreadsheet files, respectively. Once uploaded, the data is turned into pages in the wiki (or, if pages with those names already exist in the wiki, into new versions of those pages).

The XML import requires the standard, i.e. non-simplified, XML format that "ViewXML" produces, although with several differences: the "ID" attribute for each page should not be present, and tags called "Category" or "Namespace" (in whatever language the wiki is in) should not be present.

For CSV import, the file must be a true CSV file, i.e. with values separated by commas, as opposed to semicolons or anything else. If the file contains non-ASCII characters, it must be encoded in either UTF-8 or UTF-16 (the latter is simply called "Unicode" in some Windows programs). Especially if you're using Mac OS, note that the file's line breaks should contain "line feeds" ("\n") as opposed to just "carriage returns" ("\r"). The top row must contain the name of each column. One of the columns must contain the title of each page, and its column name must be "Title" (in whatever language the wiki is in). Another column can contain all the free, non-template text in the page; the name of this column must be "Free Text" (again, in the language of the wiki). Every other column must represent the contents of a single field of a single template call; the name of such a column should be of the form "template-name[field-name]" (whitespace allowed). There is no need to separately specify the names of the template(s) called in the page.
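Before uploading, it can be handy to check a header row against these naming rules. The Python sketch below is one possible pre-flight check, not part of Data Transfer itself: the function name check_header is invented, the comparison with "Title" and "Free Text" is assumed to be exact, and the two names are parameters so that a non-English wiki can pass its own values.

```python
import re

# Matches "template-name[field-name]"; whitespace around either part is tolerated.
TEMPLATE_FIELD = re.compile(r"^\s*([^\[\]]+?)\s*\[\s*([^\[\]]+?)\s*\]\s*$")


def check_header(columns, title_name="Title", free_text_name="Free Text"):
    """Return a list of problems found in a header row (an empty list means OK)."""
    problems = []
    if title_name not in columns:
        problems.append(f'missing required "{title_name}" column')
    for col in columns:
        if col in (title_name, free_text_name):
            continue  # the two special columns need no template prefix
        if not TEMPLATE_FIELD.match(col):
            problems.append(f'"{col}" is not of the form template-name[field-name]')
    return problems


print(check_header(["Title", "Cheese[Country]", "Cheese[Texture]", "Free Text"]))
print(check_header(["Name", "Country"]))
```

The first call reports no problems; the second reports the missing "Title" column and both malformed column names.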

A brief tutorial on the CSV format: if a value contains a comma, you must enclose it in double quotes. If a value that is enclosed in double quotes itself contains one or more double quotes, each of those inner double quotes should be escaped by doubling it (""). An empty field can either be left empty, or be written as a pair of double quotes (""). You can see here for the full CSV specification.
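You rarely need to apply these quoting rules by hand. As a small sketch, Python's standard csv module with its default settings quotes values containing commas and doubles any embedded double quotes; the helper name to_csv_line is invented for this example.

```python
import csv
import io


def to_csv_line(values):
    """Serialize one row using the csv module's default quoting rules."""
    buffer = io.StringIO()
    # lineterminator="\n" gives plain line feeds rather than "\r\n".
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()


row = ["Gorgonzola", "Italy", "buttery or firm, crumbly",
       'salty, with a "bite" from its blue veining']
print(to_csv_line(row), end="")
```

When writing to a real file, open it with an explicit encoding such as UTF-8 and with newline="" so that the csv module controls the line endings.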

Here is an example of a CSV file that can be parsed by Data Transfer:

Title,Cheese[Country],Cheese[Texture],Free Text
Mozarella,Italy,Semi-soft,It's good on pizzas!
Cheddar,England,Hard/semi-hard,"Often sharp, but not always."
Gorgonzola,Italy,"buttery or firm, crumbly","salty, with a ""bite"" from its blue veining"
Stilton,,"",needs more data
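To see how the columns of this example map onto pages, templates and fields, the Python sketch below reads the same data and groups each row's values by template. This only illustrates the column naming convention; it is not how Data Transfer itself is implemented, and the function name rows_to_pages and the shape of the resulting dictionary are invented for the example.

```python
import csv
import io
import re

SAMPLE = '''Title,Cheese[Country],Cheese[Texture],Free Text
Mozarella,Italy,Semi-soft,It's good on pizzas!
Cheddar,England,Hard/semi-hard,"Often sharp, but not always."
Gorgonzola,Italy,"buttery or firm, crumbly","salty, with a ""bite"" from its blue veining"
Stilton,,"",needs more data
'''

# Splits a column name of the form "template-name[field-name]" into its two parts.
COLUMN = re.compile(r"^\s*(.+?)\s*\[\s*(.+?)\s*\]\s*$")


def rows_to_pages(text):
    """Map each page title to its free text and its template field values."""
    pages = {}
    for row in csv.DictReader(io.StringIO(text)):
        page = {"free_text": row.get("Free Text", ""), "templates": {}}
        for column, value in row.items():
            match = COLUMN.match(column)
            if match:  # "Title" and "Free Text" do not match and are skipped
                template, field = match.groups()
                page["templates"].setdefault(template, {})[field] = value
        pages[row["Title"]] = page
    return pages


pages = rows_to_pages(SAMPLE)
print(pages["Gorgonzola"]["templates"])
print(pages["Gorgonzola"]["free_text"])
```

Note that the two empty fields in the "Stilton" row, one left blank and one written as a pair of double quotes, both come out as empty strings.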

For the spreadsheet import, Data Transfer requires the presence of the PHPExcel library, which does the actual spreadsheet processing. PHPExcel can handle spreadsheet files in formats including .xls, .xlsx, .ods, Gnumeric, and even PDF and HTML. The titles of the columns should be the same as for CSV files.

Authors
Data Transfer was mostly written by Yaron Koren, reachable at yaron57 -at- gmail.com. The spreadsheet import functionality was written by Stephan Gambke.

Version history
Data Transfer is currently at version 0.6.1. See the entire version history.

Common problems

 * The import actions are structured as MediaWiki "jobs". This means that the page creations will not be done immediately, and may take minutes, hours or even longer to complete. Normally, jobs get activated every time a page is viewed on the wiki; to speed up the process (or slow it down), you can change the number of jobs run when a page is viewed; the default is 1. For information on how to change it, see the $wgJobRunRate page. To have the wiki run all jobs immediately, execute the script runJobs.php.
 * Related to the above, MediaWiki 1.22 has a bug in which jobs that create or modify pages simply do not get run. If you're running MediaWiki 1.22, it is recommended to add this call in LocalSettings.php, which should restore the correct behavior:

Customizing the export XML
You can specify that any specific page not be included in the XML produced, by adding the category tag " " to that page. You can also add this tag to a template, to exclude any page that uses that template from the XML.

If you have the Semantic MediaWiki extension installed, you can also specify that the XML for any specific page should contain the XML for all its "children" pages; i.e., all the pages that have a specific property pointing to it. You can do this by adding the SMW property tag "[[Has XML grouping::property-name]]" to any page; in any XML displayed for that page, the XML of any page that has that property pointing to that page will be displayed within that page's XML tag. This semantic markup can also be added to templates, guaranteeing the same handling for all pages that use that template.

"Has XML grouping" can also be used cumulatively. In a wiki holding information about continents, countries and cities, for example, a tag like [[Has XML grouping::Has continent]] could be added to the "Continent" template, and a tag like [[Has XML grouping::Has country]] could be added to the "Country" template. Assuming every country has a "Has continent" tag, and every city has a "Has country" tag, the XML for a single continent would contain the XML for each country within that continent, which in turn would contain the XML for each city within that country.
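The cumulative nesting can be pictured with a small Python sketch. Everything here is illustrative: the page data is made up, and the "Page" element and "Title" attribute are placeholder names rather than the actual tags that Special:ViewXML produces. The sketch only shows how grouping by one property per template leads to continent, country and city nesting.

```python
import xml.etree.ElementTree as ET

# Made-up pages: each has a kind (its template) and properties pointing at other pages.
PAGES = {
    "Europe": {"kind": "Continent", "props": {}},
    "France": {"kind": "Country", "props": {"Has continent": "Europe"}},
    "Italy": {"kind": "Country", "props": {"Has continent": "Europe"}},
    "Paris": {"kind": "City", "props": {"Has country": "France"}},
    "Rome": {"kind": "City", "props": {"Has country": "Italy"}},
}

# The "Has XML grouping" value declared in each template.
GROUPING = {"Continent": "Has continent", "Country": "Has country"}


def build(title):
    """Build a placeholder XML node for a page, nesting its children pages inside it."""
    node = ET.Element("Page", {"Title": title})
    grouping_property = GROUPING.get(PAGES[title]["kind"])
    if grouping_property is not None:
        for child_title, child in PAGES.items():
            # A child is any page whose grouping property points at this page.
            if child["props"].get(grouping_property) == title:
                node.append(build(child_title))
    return node


print(ET.tostring(build("Europe"), encoding="unicode"))
```

Here the node for "Europe" contains the nodes for "France" and "Italy", which in turn contain "Paris" and "Rome" respectively.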

Sites that use Data Transfer
Here is one listing of active wikis that use Data Transfer.

Bugs and feature requests
You should use the Semantic MediaWiki mailing list, semediawiki-user, for any questions, suggestions or bug reports about Data Transfer. If possible, please add "[DT]" at the beginning of the subject line, to clarify the subject matter.

Contributing patches to the project
If you have found and fixed a bug, or written code for a new feature, please either submit it as a Git commit, or create a patch by going to the main "DataTransfer" directory and typing:

git diff > descriptivename.patch

If you create a patch, please send it, with a description, to Yaron Koren.

Translating
Translation of Data Transfer is done through translatewiki.net. The translation for this extension can be found here. To add language values or change existing ones, you should create an account on translatewiki.net, then request permission from the administrators to translate a certain language or languages on this page (this is a very simple process). Once you have permission for a given language, you can log in and add or edit whatever messages you want to in that language.