Extension:External Data

The External Data extension allows MediaWiki pages to retrieve, filter, and format structured data from one or more sources. These sources can include external URLs, regular wiki pages, uploaded files, files on the local server, databases and LDAP directories.

Parser functions
The extension defines the following parser functions, along with one tag and six Lua functions:
 * #get_web_data - retrieves CSV, GFF, JSON, XML, HTML or free-form data from a URL and assigns it to variables that can be accessed on the page.
 * #get_soap_data - retrieves data from a URL via the SOAP protocol.
 * #get_file_data - retrieves data from a file on the local server, in the same formats as #get_web_data.
 * #get_db_data - retrieves data from a database.
 * #get_ldap_data - retrieves data from an LDAP server.
 * #get_program_data - retrieves data returned by a program run server-side.
 * #external_value - displays the value of any such variable.
 * #for_external_table - cycles through all the values retrieved for a set of variables, displaying the same "container" text for each one.
 * #store_external_table - cycles through a table of values, storing them as semantic data via the Semantic MediaWiki extension, by mimicking a call to SMW's #subobject function for each row.
 * #display_external_table - cycles through all the values retrieved for a set of variables, displaying each "row" using a template.
 * #clear_external_data - erases the current set of retrieved data.
 * A tag pair that shows raw external data without any wiki postprocessing.
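The retrieve-then-display flow described above can be modeled outside of MediaWiki. The following is a minimal Python sketch, not the extension's code: it assumes CSV input with a header row, and the function names, the sample data, and the column-to-variable mapping are all made up for illustration. It mimics #get_web_data by mapping CSV columns to local variables, and #for_external_table by repeating a "container" string once per row, filling in {{{variable}}} markers.

```python
import csv
import io
import re

def get_csv_data(text, mapping):
    """Parse CSV text with a header row; return {local variable: [values]}.

    mapping is {local variable name: CSV column name} (hypothetical names).
    """
    reader = csv.DictReader(io.StringIO(text))
    values = {local: [] for local in mapping}
    for record in reader:
        for local, column in mapping.items():
            values[local].append(record[column])
    return values

def for_external_table(values, container):
    """Repeat the container once per row, filling in {{{variable}}} markers."""
    row_count = max((len(v) for v in values.values()), default=0)
    output = []
    for i in range(row_count):
        def fill(match):
            # Unknown variables, or rows past a column's end, become empty.
            column = values.get(match.group(1), [])
            return column[i] if i < len(column) else ""
        output.append(re.sub(r"\{\{\{(\w+)\}\}\}", fill, container))
    return "".join(output)

# Made-up sample data standing in for the contents of a remote CSV file.
SAMPLE = "Name,Colour\nApple,red\nBanana,yellow\n"
values = get_csv_data(SAMPLE, {"fruit": "Name", "color": "Colour"})
print(for_external_table(values, "{{{fruit}}} is {{{color}}}. "))
```

The point of the sketch is the two-step design: retrieval fills a set of named variables once, and any number of display calls can then read from them.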

Clearing data
You can also clear all external data that has already been retrieved, so that it doesn't conflict with calls to retrieve external data further down the page. This is most likely to be useful when data is retrieved and displayed in a template that is called more than once on a page. To clear the data, just call "{{#clear_external_data:}}". Note that the ":" has to be there at the end of the call, or else MediaWiki will ignore the parser function.

There is no way to clear the values for only one field; #clear_external_data erases the entire set of data.
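To see why clearing matters for a template called more than once, here is a toy Python model. It is only a sketch under an explicit assumption: that newly retrieved values are added to whatever the page already holds. The class and function names are hypothetical and do not correspond to anything in the extension.

```python
class PageData:
    """Toy model of the per-page store shared by all retrieval calls."""

    def __init__(self):
        self.values = {}

    def retrieve(self, new_values):
        # Assumption: new values are appended to any already held.
        for name, items in new_values.items():
            self.values.setdefault(name, []).extend(items)

    def clear(self):
        # Mirrors #clear_external_data: the whole set goes, never one field.
        self.values.clear()

def render_template(page, fetched, clear_first):
    """Stand-in for a template that retrieves data and then displays it."""
    if clear_first:
        page.clear()
    page.retrieve(fetched)
    return list(page.values["name"])

page = PageData()
render_template(page, {"name": ["Apple"]}, clear_first=False)
# Second call without clearing: the first call's value is still there.
print(render_template(page, {"name": ["Banana"]}, clear_first=False))
# Third call clears first, so only its own data is shown.
print(render_template(page, {"name": ["Cherry"]}, clear_first=True))
```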

Storing data
You can also use External Data to store a table of data that has been retrieved, using the storage capabilities of either the Semantic MediaWiki or the Cargo extension. Once the data has been stored, it can be queried, aggregated, displayed, etc. on the wiki by that extension.

Semantic MediaWiki
If you store data with Semantic MediaWiki, you should note a common problem, which is that the data stored by SMW does not get automatically updated when the data coming from the external source changes. The best solution for this, assuming you expect the data to change over time, is to create a cron job to call the SMW maintenance script "rebuildData.php" at regular intervals, such as once a day; that way, the data is never more than a day old.

To store a table of data using SMW, you can use the #store_external_table function. This function works as a hybrid of the #for_external_table function and the #subobject function, defined in the Semantic MediaWiki extension. Unlike with #subobject, the first parameter is the name of a property that will link from the subobject to the page it's on. #store_external_table loops over each row, and uses variables, in the same way as #for_external_table. You can see a demonstration of this function, including the call itself, on the page Fruits semantic data.
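The "hybrid" behavior can be pictured with a short Python model. This is a sketch of the idea only, not SMW or External Data code: the function name, the property names, and the sample rows are hypothetical. For each retrieved row it builds one subobject, and the first parameter names the property that points from the subobject back to the page.

```python
def store_external_table(page_title, link_property, rows, property_map):
    """Build one subobject (modeled as a dict) per retrieved row.

    link_property: property linking each subobject to its page.
    property_map: {semantic property name: external variable name}.
    """
    subobjects = []
    for row in rows:
        subobject = {link_property: page_title}
        for prop, variable in property_map.items():
            subobject[prop] = row.get(variable)
        subobjects.append(subobject)
    return subobjects

# Made-up rows, as if already retrieved from an external source.
rows = [
    {"name": "Apple", "color": "red"},
    {"name": "Banana", "color": "yellow"},
]
stored = store_external_table(
    "Fruits semantic data",
    "Is fruit in",  # hypothetical link property
    rows,
    {"Has name": "name", "Has color": "color"},  # hypothetical properties
)
for subobject in stored:
    print(subobject)
```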

Cargo
There is no special parser function for storing data via Cargo; instead you should simply use #display_external_table, and include Cargo storage code within the template called by that function. You can see an example of Cargo-based storage using #display_external_table here; it uses this template, and you can see the resulting data here.

Scribunto/Lua
Since version 2.2, External Data defines Lua functions that match the functionality of its six "accessor" parser functions, so that wikis that have the Scribunto extension installed can call these functions directly in order to access and display outside data.

The following functions are defined:

The Lua functions accept the same parameters as parser functions, but please note the following:


 * Technically, there is only one parameter; it is known in Lua as a table, and its keys correspond to the parser function parameters.
 * Comma-separated lists can be replaced with Lua tables, so both the string form and the table form will work.
 * If XML format is used, a special external variable is set, which contains the XML data, preserving, with some limitations, the whole structure of the original XML document. It can be referred to when mapping external variables to internal ones, and the corresponding internal variable will be a nested Lua table.
 * If JSON format is used, a special external variable is set, which contains the JSON data, preserving the whole structure of the original JSON document. It can be referred to when mapping external variables to internal ones, and the corresponding internal variable will be a nested Lua table.
 * "Valueless" parameters can be supplied either as numbered or as named table entries; both forms are valid.
 * Parameters whose names contain a space need to be surrounded with quotes and brackets, unless they are valueless, in which case quotes are enough.
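The equivalences in the list above can be illustrated with a small Python model of a single parameter table. This is a sketch and not the extension's parsing code: a Lua table is approximated by a Python dict in which integer keys stand for numbered entries, and the parameter names "columns" and "some flag" are hypothetical.

```python
def normalize_params(params, list_params):
    """Reduce equivalent spellings of a parameter table to one form.

    params: dict standing in for the single Lua table argument.
    list_params: names of parameters that hold lists (hypothetical).
    """
    normalized = {}
    for key, value in params.items():
        if isinstance(key, int):
            # Numbered entry: a valueless parameter passed positionally.
            normalized[value] = True
        elif key in list_params and isinstance(value, str):
            # Comma-separated string form of a list parameter.
            normalized[key] = [item.strip() for item in value.split(",")]
        else:
            normalized[key] = value
    return normalized

as_string = normalize_params({"columns": "name, color", 1: "some flag"}, {"columns"})
as_table = normalize_params({"columns": ["name", "color"], "some flag": True}, {"columns"})
print(as_string == as_table)
```

Both calls end up with the same normalized table, which is the sense in which the alternative spellings "will work".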

Each Lua function returns two values:


 * 1) A table of external data. Unlike with the parser functions, it will be "row-based", i.e. a numbered array of records with named fields corresponding to external variables. If no external data is fetched, nil will be returned.
 ** If there is only one value for some external variable (it will be in the first record), it will be duplicated as a named field of the returned table, as it is highly likely that it belongs to the rowset as a whole rather than to its first row; it can therefore be accessed both through the first record and directly as a field of the returned table.
 * 2) A numbered table of error messages. If there were no errors, nil will be returned.
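The shape of the first return value can be modeled in Python. The sketch below is an illustration of the description above, not the extension's implementation: the function name and sample variables are hypothetical, a Lua table is approximated by a dict with 1-based integer keys for rows, and None plays the role of nil.

```python
def to_row_based(columns):
    """Turn {variable: [values]} into a row-based table.

    Rows are stored under integer keys starting at 1, as in Lua.
    Returns None (standing in for nil) if there is no data.
    """
    row_count = max((len(v) for v in columns.values()), default=0)
    if row_count == 0:
        return None
    table = {}
    for i in range(row_count):
        table[i + 1] = {
            name: values[i]
            for name, values in columns.items()
            if i < len(values)
        }
    # A variable with exactly one value is also exposed on the table itself,
    # since it most likely describes the whole rowset.
    for name, values in columns.items():
        if len(values) == 1:
            table[name] = values[0]
    return table

data = to_row_based({"fruit": ["Apple", "Banana"], "source": ["demo list"]})
print(data[1]["source"], "|", data["source"])  # same value, two access paths
print(data[2])
```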

Unlike with the parser functions, external data is only returned to the calling Lua module; it is not stored on the page to be retrieved later by the display parser functions.

Example:

Common problems

 * If the call to #get_web_data or #for_external_table isn't returning any data, and the page being accessed is large, the retrieval call may be timing out. You should set the timeout flag in your LocalSettings.php file (which represents a number of seconds) to some number greater than 25, its default value.




 * If the data being accessed has changed, but the wiki page accessing it still shows the old data, it is because that page is being cached by MediaWiki. There are several solutions to this: if you are an administrator, you can hit the "refresh" tab above the page, which will purge the cache. You can also easily disable caching for the entire wiki; see here for how. Finally, if you wait long enough (typically no more than 24 hours), the page will get refreshed on its own and display the new data.


 * If you host a private wiki locally but use a dynamic IP service to access it, your wiki will connect to itself through your public IP and not through localhost or 127.0.0.1 (or an IPv6 equivalent). In such a case, your wiki is not allowed to query itself, so the examples given here will work when the data is hosted on a different server but not when it is hosted on your own wiki. A workaround is to use the NetworkAuth extension (Extension:NetworkAuth), which lets you automatically authenticate your router/box/modem to access your wiki. Note: the security of this approach is not guaranteed.


 * If the extension is not correctly handling non-ASCII characters, the problem might be that your PHP instance lacks the mbstring extension - make sure that it is installed.


 * To query data from another wiki that uses Semantic MediaWiki, it is recommended to use the Special:Ask page, rather than one of SMW's API actions, to construct the URL that will be passed in to #get_web_data, since the API will not output data in a syntax that External Data can use. To construct the URL, go to Special:Ask, create the desired query, then copy the URL from the "Download queried results in CSV format" link.

Version history
External Data is currently at version 2.4.1. See the entire version history.

Bugs and feature requests
The best place to report bugs is on Phabricator - see How to report a bug. The project that should be specified is MediaWiki-extensions-ExternalData.

You can also put any questions, suggestions or bug reports about External Data at the talk page for this extension. Or you can write to the MediaWiki mailing list, mediawiki-l. (If you write to the mailing list, please include "External Data" somewhere in the subject line.)

You can also send specific code patches to Yaron Koren, at yaron57@gmail.com.

Translating
Translation of External Data is done through translatewiki.net. The translation for this extension can be found here. To add language values or change existing ones, you should create an account on translatewiki.net, then request permission from the administrators to translate a certain language or languages on this page (this is a very simple process). Once you have permission for a given language, you can log in and add or edit whatever messages you want to in that language.