Extension:Data Transfer

Description
Data Transfer is a MediaWiki extension that lets users export data from the wiki and import data into it: export is done in XML format, and import is possible in both XML and CSV formats. The extension makes light use of the Semantic MediaWiki (SMW) extension, but it does not require the presence of SMW. Still, in spirit the extension can be considered a member of the Semantic MediaWiki "family", since it takes the same data-centric approach to wiki content as SMW and spinoff extensions like Semantic Forms.

Data Transfer is not an ideal solution for backing up a wiki, or for transferring wiki pages from one MediaWiki site to another; MediaWiki's built-in "Special:Export" and "Special:Import" pages are much better suited to that.

Code and download
You can download the Data Transfer code in either one of these two compressed files:


 * data_transfer_0.3.4.tar.gz
 * data_transfer_0.3.4.zip

You can also download the code directly via SVN from the MediaWiki source code repository, at http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/DataTransfer/. From a command line, you can call the following:

svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/DataTransfer/

To view the code online, including version history for each file, you can go here.

Installation
After you've obtained a 'DataTransfer' directory (either by extracting a compressed file or downloading via SVN), place this directory within the main MediaWiki 'extensions' directory. Then, in the file 'LocalSettings.php' in the main MediaWiki directory, add the following line:

By default, the importing of files is allowed only for administrators/sysops. If you want other groups to be able to import files, you can add additional lines to LocalSettings.php to allow that. This line, for example, will allow all users to import files:

To allow anyone reading the wiki to import files, you could add the following (though it's not usually recommended):

Exporting data
Data Transfer defines a special page, "Special:ViewXML", that lets users view (and thus save) the pages in any combination of the wiki's categories and namespaces in XML form. The fields and values in the XML are taken from the fields and values in any template calls contained in the page; any non-template text is put into one or more "free text" tags. An "ID" field is also displayed for every page, using MediaWiki's internal "article ID" for that page; this lets outside systems track a page with a more stable identifier than its name (which can change often). The XML contains only the current state of each page: information on authors and modification dates, and information on previous versions of each page, is not recorded.

Two formats for export are supported: the first, or standard, one, and a second, "simplified" one whose tags take a simpler form.

Importing data
Data Transfer defines two special pages, "Special:ImportXML" and "Special:ImportCSV", that let users in the "administrators" group upload XML and CSV files, respectively; the data is turned into pages in the wiki (or, if pages with those names already existed in the wiki, new versions of those pages).

The XML import requires the standard, i.e. non-simplified, XML format that "ViewXML" produces, although with several differences: the "ID" attribute for each page should not be present, and tags called "Category" or "Namespace" (in whatever language the wiki is in) should not be present.
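The differences described above can be applied to an exported file with a short script. What follows is only a sketch under stated assumptions: the element and attribute names in the sample ("Pages", "Page", "Template", "Title", "Name") are hypothetical placeholders and not a statement of what ViewXML actually produces, and the wrapper tag names should be replaced with their equivalents in the wiki's language.

```python
import xml.etree.ElementTree as ET


def _unwrap(parent, wrapper_tags):
    """Replace each wrapper element under parent with that wrapper's own children."""
    new_children = []
    for child in list(parent):
        _unwrap(child, wrapper_tags)
        if child.tag in wrapper_tags:
            new_children.extend(list(child))  # keep the contents, drop the wrapper
        else:
            new_children.append(child)
    parent[:] = new_children


def prepare_for_import(root, wrapper_tags=("Category", "Namespace")):
    """Remove every "ID" attribute and every Category/Namespace wrapper, in place."""
    for element in root.iter():
        element.attrib.pop("ID", None)
    _unwrap(root, set(wrapper_tags))
    return root


# Hypothetical sample; real exported XML may be structured differently.
SAMPLE = (
    '<Pages><Category Name="Cities">'
    '<Page ID="12" Title="Paris"><Template Name="City"/></Page>'
    '</Category></Pages>'
)

root = prepare_for_import(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

The result should still be checked by hand before uploading it through "Special:ImportXML".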

For CSV import, the file must be truly a CSV file (i.e., separated by commas, as opposed to semicolons or anything else). If the file contains non-ASCII characters it must be encoded in either UTF-8 or UTF-16 (the latter being simply called "Unicode" in some Windows programs). The top row must contain the name of each column. One of the columns must contain the title of each page, and so its column name must be "Title" (in whatever language the wiki is in). Another column can contain all the free, non-template text in the page: the title of this column must be "Free Text" (again, in the language of the wiki). Any other column must represent the contents of a single field of a single template call; the name of such a column should be of the form "template-name[field-name]". There is no need to separately specify the names of the template(s) called in the page.
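As an illustration of the column rules above, here is a minimal sketch that builds such a file with Python's standard csv module. The "City" template, its "Country" and "Population" fields, and the page data are made up for the example, and the English column names "Title" and "Free Text" are assumed.

```python
import csv
import io


def build_import_csv(pages, template_name):
    """Return CSV text with a Title column, one column per template field, and Free Text."""
    # Collect field names in first-seen order so the column order is stable.
    field_names = []
    for page in pages:
        for name in page["fields"]:
            if name not in field_names:
                field_names.append(name)

    header = ["Title"]
    header += [f"{template_name}[{name}]" for name in field_names]
    header.append("Free Text")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # comma-separated by default
    writer.writerow(header)
    for page in pages:
        row = [page["title"]]
        row += [page["fields"].get(name, "") for name in field_names]
        row.append(page.get("free_text", ""))
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical pages that all call a hypothetical "City" template.
pages = [
    {
        "title": "Paris",
        "fields": {"Country": "France", "Population": "2,100,000"},
        "free_text": "Capital of France.",
    },
    {"title": "Lyon", "fields": {"Country": "France"}},
]

csv_text = build_import_csv(pages, "City")
csv_bytes = csv_text.encode("utf-8")  # UTF-8 is one of the accepted encodings
print(csv_text)
```

Values that themselves contain commas, such as the population figure here, are quoted by the csv module, so the file stays comma-separated.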

Note: the import actions are structured as MediaWiki "jobs", to ensure that the system is not overloaded if a user starts many imports at the same time. This means that a large set of imports will not be done immediately, and may take minutes, hours or even longer to complete. Jobs get activated every time a page is viewed on the wiki; to speed up the process (or slow it down), you can change the number of jobs run when a page is viewed; the default is 1. For information on how to change it, see the $wgJobRunRate page.
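To get a rough feel for how the job rate affects a large import, the arithmetic can be sketched as follows. This is only an estimate, and it assumes that each imported page corresponds to one job and that no other jobs are waiting in the queue; neither assumption is stated by the extension.

```python
import math


def estimated_page_views(pending_jobs, job_run_rate=1):
    """Estimate how many page views are needed to work through the pending jobs."""
    if job_run_rate <= 0:
        raise ValueError("job_run_rate must be positive for jobs to run on page views")
    return math.ceil(pending_jobs / job_run_rate)


# Compare a few candidate job rates for a hypothetical import of 1000 pages.
for rate in (1, 5, 20):
    print(rate, estimated_page_views(1000, rate))
```

On a low-traffic wiki, the number of page views needed is what makes a large import take hours rather than minutes.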

Languages supported
Data Transfer has full support for English, and partial support for Afrikaans, Arabic, Bulgarian, Taiwanese Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Khmer, Lithuanian, Malayalam, Marathi, Pashto, Polish, Portuguese, Russian, Spanish, Swedish, Tagalog, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Uyghur, Vietnamese and many other languages.

Authors
Data Transfer was written by Yaron Koren, reachable at yaron57 -at- gmail.com.

Version
Data Transfer is currently at version 0.3.4.

The version history is:
 * 0.1 - February 19, 2008 - Initial version
 * 0.1.1 - February 21, 2008 - Several small improvements; language support added for Arabic, Dutch, French, Galician, German, Luxembourgish, Norwegian Bokmål, Portuguese, Slovak, Swedish, Upper Sorbian and Vietnamese
 * 0.1.2 - February 27, 2008 - ID field added for every page; improved conversion of spaces to underscores; language support added for Bulgarian, Taiwanese Chinese, Khmer, Russian and Telugu
 * 0.1.3 - March 7, 2008 - Language support added for Japanese and Seeltersk, and improved for other languages
 * 0.1.4 - April 3, 2008 - Language support added for Catalan, Czech, Danish, Esperanto, Greek, Hungarian, Marathi, Norwegian Nynorsk, Pashto, Polish, Serbian Cyrillic, Tajik and Tetum
 * 0.1.5 - April 14, 2008 - Fixed language-value handling for MediaWiki versions before 1.11; language support added for Hindi, Silesian and Tamil
 * 0.1.6 - April 23, 2008 - Fixed handling of pages with no category; language support added for Manx and Ossetic
 * 0.1.7 - May 9, 2008 - Language support added for Afrikaans, Javanese, Kinaray-a, Malayalam and Volapük
 * 0.1.8 - June 20, 2008 - Language support added for Aragonese, West Frisian, Low German, Hawaiian, Indonesian, Ripuarian, Sundanese and Turkish
 * 0.1.9 - July 9, 2008 - Support added for changes in SMW 1.2; special page and language values now autoloaded; language support added for Belarusian, Lithuanian, Eastern Mari and Rotuman
 * 0.1.10 - October 27, 2008 - Support added for changes in SMW 1.4; language support added for Egyptian Arabic, Croatian, Erzya, Swiss German, Gothic, Ancient Greek, Interlingua, Italian, Mapudungun, Nahuatl, Romanian, Spanish, Thai and Ukrainian
 * 0.2 - April 7, 2009 - Importing of XML files added; minor bug fixes; language support added for Amharic, Bosnian, Simplified Chinese, Cornish, Finnish, Hebrew, Irish, Limburgish, Lower Sorbian, Mirandese, Brazilian Portuguese, Tagalog, Tarantino and Uyghur
 * 0.2.1 - May 18, 2009 - Hook added for Admin Links extension; language support added for Pennsylvania Dutch
 * 0.3 - May 26, 2009 - Importing of CSV files added
 * 0.3.1 - June 10, 2009 - Fix for CSV files with non-UTF-8 encoding
 * 0.3.2 - July 9, 2009 - Improved handling of SMW 1.4.2; removed handling for previous versions of SMW
 * 0.3.3 - July 27, 2009 - Added dropdown for setting encoding of file in 'ImportCSV'; only non-UTF-8 files get UTF-8 encoded
 * 0.3.4 - July 31, 2009 - Fix for handling of UTF-16-encoded files

Customizing the XML
You can specify that any specific page not be included in the XML produced, by adding the category tag " " to that page. You can also add this tag to a template, to exclude any page that uses that template from the XML.

If you have the Semantic MediaWiki extension installed (version 1.4.2 or later is required for this to work), you can also specify that the XML for a page should contain the XML for all of its "children" pages, i.e., all the pages that have a specific property pointing to it. You do this by adding the SMW property tag "[[Has XML grouping::property-name]]" to the page; in any XML displayed for that page, the XML of every page that has that property pointing to it will be displayed within that page's XML tag. This semantic markup can also be added to templates, guaranteeing the same handling for all pages that use that template.

"Has XML grouping" can also be used cumulatively. In a wiki holding information about continents, countries and cities, for example, a tag like [[Has XML grouping::Has continent]] could be added to the "Continent" template, and a tag like [[Has XML grouping::Has country]] could be added to the "Country" template. Assuming every country has a "Has continent" tag and every city has a "Has country" tag, the XML for a single continent would contain the XML for each country within that continent, which in turn would contain the XML for each city within that country.
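The cumulative nesting in the continent, country and city example can be pictured with a small sketch. It only models the grouping logic and is not the extension's code: the page data is invented, and the "Page" element and "Title" attribute are hypothetical stand-ins for whatever tags the real XML uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: each names its template and the property values it sets.
PAGES = {
    "Europe": {"template": "Continent", "props": {}},
    "France": {"template": "Country", "props": {"Has continent": "Europe"}},
    "Paris": {"template": "City", "props": {"Has country": "France"}},
    "Lyon": {"template": "City", "props": {"Has country": "France"}},
}

# The "Has XML grouping" value declared in each template, as in the example.
GROUPING = {"Continent": "Has continent", "Country": "Has country"}


def page_xml(title):
    """Build an element for a page, nesting every page whose grouping property points to it."""
    page = PAGES[title]
    element = ET.Element("Page", {"Title": title})
    prop = GROUPING.get(page["template"])
    if prop is not None:
        for child_title, child in PAGES.items():
            if child["props"].get(prop) == title:
                element.append(page_xml(child_title))  # recurse for cumulative grouping
    return element


europe = page_xml("Europe")
print(ET.tostring(europe, encoding="unicode"))
```

Because the "Country" template carries its own grouping property, the cities end up inside their country, which in turn sits inside the continent.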

Sites that use Data Transfer
Here are some sites that use Data Transfer:


 * Awaycity - Maps for expatriates
 * Discourse DB
 * EPSA - European Public Sector Award 2007
 * Melpedia
 * Verwaltungskooperation - Cooperation in Public Administration
 * All sites on Referata

Bugs and feature requests
You should use the Semantic MediaWiki mailing list, semediawiki-user, for any questions, suggestions or bug reports about Data Transfer. If possible, please add "[DT]" at the beginning of the subject line, to clarify the subject matter.

Contributing patches to the project
If you have found a bug and fixed it, or written code for a new feature, please create a patch by going to the main "DataTransfer" directory and typing:

svn diff > descriptivename.patch

Then send this patch, with a description, to Yaron Koren.

Translating
Translation of Data Transfer is done through translatewiki.net. The translation for this extension can be found here. To add language values or change existing ones, you should create an account on translatewiki.net, then request permission from the administrators to translate a certain language or languages on this page (this is a very simple process). Once you have permission for a given language, you can log in and add or edit whatever messages you want to in that language.