Extension:External Data

The External Data extension allows MediaWiki pages to retrieve, filter, and format structured data from one or more sources. These sources can include external URLs, regular wiki pages, uploaded files, files on the local server, databases and LDAP directories.

Parser functions
The extension defines the following parser functions, as well as one tag and six Lua functions:
 * #get_web_data - retrieves CSV, GFF, JSON, XML, HTML or free-form data from a URL and assigns it to variables that can be accessed on the page.
 * #get_soap_data - retrieves data from a URL via the SOAP protocol.
 * #get_file_data - retrieves data from a file on the local server, in the same formats as #get_web_data.
 * #get_db_data - retrieves data from a database.
 * #get_ldap_data - retrieves data from an LDAP server.
 * #get_program_data - retrieves data returned by a program run server-side.
 * #external_value - displays the value of any such variable.
 * #for_external_table - cycles through all the values retrieved for a set of variables, displaying the same "container" text for each one.
 * #store_external_table - cycles through a table of values, storing them as semantic data via the Semantic MediaWiki extension, by mimicking a call to SMW's #subobject function for each row.
 * #display_external_table - cycles through all the values retrieved for a set of variables, displaying each "row" using a template.
 * #clear_external_data - erases the current set of retrieved data.
 * a tag pair that shows raw external data without any wiki postprocessing.

Retrieving data
Data can be retrieved from these sources:


 * a web page containing structured data (including a page on the wiki itself)
 * a SOAP service
 * a database
 * an LDAP server
 * a file or directory
 * some program output.

#get_ldap_data - retrieve data from LDAP directory
The parser function #get_ldap_data allows retrieval of data from external LDAP directories. This function executes LDAP queries and assigns the results to local variables that can then be used with the #external_value function.

A note about security: if you are going to use #get_ldap_data, you should think hard about the security implications. Configuring an LDAP server in LocalSettings.php allows anyone with edit access to your wiki to run queries against that server, so wiki users could extract all sorts of information about your domain. You should bind with a domain user that has the minimum permissions needed for what you are trying to achieve, and you should know what you are doing before enabling this function.

Configuration
The PHP LDAP extension must be enabled. You need to configure each LDAP server in LocalSettings.php. Add the following stanza for each server:

Where:


 * domain is a label to be used when calling #get_ldap_data
 * myDomainuser and myDomainPassword are credentials used to bind to the LDAP server
 * [basedn] is the base DN used for the search.

Example:

Usage
To query the LDAP server, add this call to a wiki page:

Where:


 * domain is the label used in LocalSettings.php
 * filter is the LDAP filter used for the search
 * data is the mappings of LDAP attributes to local variables
 * if all is not added, the query will retrieve only one result.

An example that retrieves a user from Win2003/AD, using a userid passed to a template:

#get_program_data - retrieve data returned by a program run server-side
The parser function #get_program_data allows retrieval of data returned by a program run server-side. Every such program has to be configured, as in the example below:

Once the program is configured this way, it can be invoked thus: and then the retrieved data (SVG in this case) can be shown with a tag pair, which prevents any wiki postprocessing.

A simplified syntax is available in tag emulation mode, using a tag pair.

A simpler example, involving only text processing, is below: and

Although programs are run in a restricted environment, wiki administrators should exercise great caution when configuring programs to be callable with #get_program_data.

The program's output is cached in a database table, as configured by the parser function parameters: and configuration settings:

A set of tested examples can be found here and (with working output) here.

Displaying data
Once you have retrieved the data onto the page, from any source, there are two ways to display it on the page: as individual values or as a table of values.

Displaying individual values
If the retrieval call returned a single value for each variable specified, you can call #external_value as follows:

As an example, this page contains the following text:

 * Germany borders the following countries:
 * Germany has population.
 * Germany has area.
 * Its capital is.

The page gets data from this URL, which contains the following text:

"357,050 km²","Austria,Belgium,Czech Republic,Denmark,France,Luxembourg,Netherlands,Poland,Switzerland",Berlin,"82,411,001"

The page then uses #external_value to display the 'bordered countries' and 'population' values. It also uses the #arraymap function, defined by the Page Forms extension, to apply some transformations to the 'bordered countries' value (you can ignore this detail if you want).
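To make the column-to-variable mapping concrete, here is a minimal Python sketch (not the extension's own code) that reads the CSV line above by column position, as one would for a CSV source without a header row. The variable names in `MAPPING` are assumptions chosen for illustration; the real names are whatever the page's #get_web_data call assigns.

```python
import csv
import io

# The single CSV line served by the example URL (no header row).
CSV_TEXT = (
    '"357,050 km²","Austria,Belgium,Czech Republic,Denmark,France,'
    'Luxembourg,Netherlands,Poland,Switzerland",Berlin,"82,411,001"\n'
)

# Hypothetical mapping of local variable names to 1-based column numbers.
MAPPING = {"area": 1, "bordered countries": 2, "capital": 3, "population": 4}


def read_single_row(csv_text, mapping):
    """Return {local variable: value} for the first row of csv_text."""
    row = next(csv.reader(io.StringIO(csv_text)))
    return {name: row[col - 1] for name, col in mapping.items()}


values = read_single_row(CSV_TEXT, MAPPING)
print(values["capital"])
# The quoted field keeps its inner commas, so it can be split afterwards.
print(values["bordered countries"].split(","))
```

Because the fields containing commas are quoted, the line yields exactly four values; splitting the 'bordered countries' value into separate names is a later step, which is the role #arraymap plays on the wiki page.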

By default, #external_value displays an error message if no fallback/default value is given and either the variable it is called for has not been set, the specified data source is inaccessible, or the data source does not contain any data. You can disable the error message by adding the following to LocalSettings.php:

To prevent any further wiki processing of external data, for example when it is SVG produced by #get_program_data, you can use the tag pair that displays raw external data.

Displaying a table of values
The data returned by #get_web_data or #get_db_data can also be a "table" of data (many values per field), instead of just a single "row" (one value per field); #get_ldap_data supports this only when the all parameter is given. In this case, you can display it using either #for_external_table or #display_external_table.

#for_external_table
This URL contains information similar to that above, but for a few countries instead of just one. Calling #get_web_data with this URL, with the same format as above, will set the local variables to contain arrays of data, rather than single values. You can then call #for_external_table, which has the following format:

...where "expression" is a string that contains one or more variable names, each surrounded by triple curly brackets. This string is then displayed for each retrieved "row" of data.

For an example, this page contains a call to #get_web_data for the URL mentioned above, followed by this call:

The call to #for_external_table holds a single row of a table, in wiki-text; it's surrounded by wiki-text to create the top and bottom of the table. The presence of "{{!}}" is a standard MediaWiki trick to display pipes from within parser functions. Much simpler calls to #for_external_table can be made, if you just want to display a line of text per data "row", but an HTML table is the standard approach.
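The row-by-row expansion can be pictured with a short Python sketch. It is only a conceptual model of the behavior described above, not the extension's implementation; the function name, the regular expression and the sample data are all assumptions made for illustration.

```python
import re


def for_external_table(expression, columns):
    """Repeat expression once per row, replacing {{{name}}} with that row's value.

    columns maps each variable name to its list of retrieved values.
    """
    row_count = max((len(v) for v in columns.values()), default=0)
    pieces = []
    for i in range(row_count):
        def substitute(match):
            # Unknown variables, or rows with no value, expand to nothing.
            values = columns.get(match.group(1), [])
            return values[i] if i < len(values) else ""

        pieces.append(re.sub(r"\{\{\{([^{}]+)\}\}\}", substitute, expression))
    return "".join(pieces)


# Hypothetical retrieved data: two variables, two rows.
columns = {
    "name": ["Germany", "France"],
    "capital": ["Berlin", "Paris"],
}
output = for_external_table("{{{name}}}: {{{capital}}}\n", columns)
print(output, end="")
```

The expression is emitted once per row, so any text that should appear only once, such as the opening and closing of a table, has to sit outside the call.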

There's one other interesting feature of #for_external_table, which is that it lets you modify specific values. You can URL-encode values by calling them with instead of just , and similarly you can HTML-encode values by calling them with.

As an example of the former, if you wanted to show links to Google searches on a set of terms retrieved, you could call:

This is required because standard parser functions can't be used within #for_external_table - so the following, for example, will not work:

#display_external_table
This function is called as:
#display_external_table is similar in concept to #for_external_table, but it passes the values in each row to a template, which handles the display.

An explanation of the parameters:
 * - the name of the template into which each "row" of data will be passed
 * - the data mappings between external variable and local template parameter; much like the  parameters for the other functions
 * - the separator used between one template call and the next; default is a newline. (To include newlines in the delimiter value, use "\n".)
 * - a template displayed before the results set, only if there are any results
 * - a template displayed after the results set, only if there are any results
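As a rough mental model of how these parameters interact, the Python sketch below builds one template call per row, joins the calls with the delimiter, and adds the intro and outro only when there is at least one row. It is not the extension's code: the function and argument names (`intro_template`, `outro_template` and so on) are hypothetical stand-ins for the parameters described above, and the sample data is made up.

```python
def display_external_table(template, data, columns, delimiter="\n",
                           intro_template=None, outro_template=None):
    """Return wikitext with one template call per retrieved row.

    data maps template parameter names to external variable names;
    columns maps each variable name to its list of retrieved values.
    """
    row_count = max((len(columns.get(var, [])) for var in data.values()), default=0)
    if row_count == 0:
        return ""  # no results: neither intro nor outro is shown

    calls = []
    for i in range(row_count):
        args = []
        for param, var in data.items():
            values = columns.get(var, [])
            value = values[i] if i < len(values) else ""
            args.append(f"|{param}={value}")
        calls.append("{{" + template + "".join(args) + "}}")

    intro = "{{" + intro_template + "}}" if intro_template else ""
    outro = "{{" + outro_template + "}}" if outro_template else ""
    return intro + delimiter.join(calls) + outro


# Hypothetical retrieved data with a single variable.
columns = {"name": ["Germany", "France"]}
wikitext = display_external_table(
    "Country info row", {"Country name": "name"}, columns
)
print(wikitext)
```

Keeping the layout in a template means the same retrieved data can be shown in different ways without changing the retrieval call.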

For example, to display the data from the previous example in a table as before, you could create a template called "Country info row", that had the parameters "Country name", "Countries bordered", "Population" and "Area", and then call the following:

The template "Country info row" should then contain wikitext like the following:

Clearing data
You can also clear all external data that has already been retrieved, so that it doesn't conflict with calls to retrieve external data further down the page. The most likely case in which this is useful is when data is retrieved and displayed in a template that is called more than once on a page. To clear the data, just call "{{#clear_external_data:}}". Note that the ":" has to be there at the end of the call, or else MediaWiki will ignore the parser function.

There is no way to clear the values for only one field; #clear_external_data erases the entire set of data.

Storing data
You can also use External Data to store a table of data that has been retrieved; you can do this using the storage capabilities of either the Semantic MediaWiki or Cargo extensions. Once the data has been stored, it can then be queried, aggregated, displayed etc. on the wiki by that extension.

Semantic MediaWiki
If you store data with Semantic MediaWiki, you should note a common problem, which is that the data stored by SMW does not get automatically updated when the data coming from the external source changes. The best solution for this, assuming you expect the data to change over time, is to create a cron job to call the SMW maintenance script "rebuildData.php" at regular intervals, such as once a day; that way, the data is never more than a day old.

To store a table of data using SMW, you can use the #store_external_table function. This function works as a hybrid of the #for_external_table function and the #subobject function, defined in the Semantic MediaWiki extension. Unlike with #subobject, the first parameter is the name of a property that will link from the subobject to the page it's on. You can see a demonstration of this function on the page Fruits semantic data; the call to #store_external_table on that page looks like:
#store_external_table loops over each row, and uses variables, in the same way as #for_external_table.

Cargo
There is no special parser function for storing data via Cargo; instead you should simply use #display_external_table, and include Cargo storage code within the template called by that function. You can see an example of Cargo-based storage using #display_external_table here; it uses this template, and you can see the resulting data here.

Scribunto/Lua
Since version 2.2, External Data defines Lua functions that match the functionality of its six "accessor" parser functions, so that wikis that have the Scribunto extension installed can call these functions directly in order to access and display outside data.

The following functions are defined:

The Lua functions accept the same parameters as parser functions, but please note the following:


 * Technically, there is only one parameter; it is known in Lua as a table, and its keys correspond to the parser function parameters.
 * Comma-separated lists like  can be replaced with Lua tables; so that both   and   will work.
 * If XML format is used, an external variable  is set, which contains XML data preserving, with some limitations, the whole structure of the original XML document. It can be referred to in the   argument, and the corresponding internal variable will be a nested Lua table.
 * If JSON format is used, an external variable  is set, which contains JSON data preserving the whole structure of the original JSON document. It can be referred to in the   argument, and the corresponding internal variable will be a nested Lua table.
 * "Valueless" parameters like  can be supplied both as numbered and named:   and   are both valid.
 * Parameters whose name contains a space, like, need to be surrounded with quotes and brackets, like  , unless they are valueless, in which case quotes are enough.

Each Lua function returns two values:


 * 1) A table of external data. Unlike with the parser functions, it will be "row-based", i.e. a numbered array of records with named fields corresponding to external variables. If external data is not fetched, nil will be returned.
   * If there is only one value for some external variable (it will be in the first record), it will be duplicated as a named field of the returned table, as it is highly likely that it belongs to the rowset as a whole rather than its first row; so that it can be accessed both as  and  .
 * 2) A numbered table of error messages. If there were no errors, nil will be returned.
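The shape of the first return value can be illustrated with a small Python sketch that converts column-based data (one list of values per variable) into the row-based form described above. This is a model written in Python rather than Lua, not the extension's code; the function name and the sample variables are hypothetical, and a Python dict with integer keys starting at 1 stands in for a Lua table.

```python
def to_row_based(columns):
    """Convert {variable: [values...]} into a row-based table.

    Returns None (standing in for Lua's nil) when there is no data.
    """
    row_count = max((len(v) for v in columns.values()), default=0)
    if row_count == 0:
        return None

    table = {}
    for i in range(row_count):
        # Lua arrays are 1-based, so the first record gets key 1.
        table[i + 1] = {
            name: values[i] for name, values in columns.items() if i < len(values)
        }

    # A variable with exactly one value is also exposed at the top level.
    for name, values in columns.items():
        if len(values) == 1:
            table[name] = values[0]
    return table


# Hypothetical data: "title" has one value, "name" has two.
result = to_row_based({"title": ["Fruits"], "name": ["Apple", "Banana"]})
print(result)
```

In this model the single "title" value is reachable both through the first record and directly on the returned table, which mirrors the duplication rule above.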

Unlike with parser functions, external data is only returned to the calling Lua module and is not stored on the page for later retrieval by the display parser functions.

Example:

Common problems

 * If the call to #get_web_data or #for_external_table isn't returning any data, and the page being accessed is large, it could be because the retrieval call is timing out. You should set the flag in your LocalSettings.php file (which represents a number of seconds) to some number greater than 25, its default value. You could call, for instance:




 * If the data being accessed has changed, but the wiki page accessing it still shows the old data, it is because that page is being cached by MediaWiki. There are several solutions to this: if you are an administrator, you can hit the "refresh" tab above the page, which will purge the cache. You can also easily disable caching for the entire wiki; see here for how. Finally, if you wait long enough (typically no more than 24 hours), the page will get refreshed on its own and display the new data.


 * If you host a private wiki locally but use a dynamic IP service to access it, your wiki will connect to itself through your public IP and not through localhost or 127.0.0.1 (or an IPv6 equivalent). In such a case, your wiki is not allowed to query itself, so the examples given here will work when the data is hosted on a different server but not when it is hosted on your wiki. A workaround is to use the NetworkAuth extension, which allows you to automatically authenticate your router/box/modem to access your wiki. Note: the security of this approach is not guaranteed.


 * If the extension is not correctly handling non-ASCII characters, the problem might be that your PHP instance lacks the mbstring extension - make sure that it is installed.


 * To query data from another wiki that uses Semantic MediaWiki, it is recommended to use the Special:Ask page, rather than one of SMW's API actions, to construct the URL that will be passed in to #get_web_data, since the API will not output data in a syntax that External Data can use. To construct the URL, go to Special:Ask, create the desired query, then copy the URL from the "Download queried results in CSV format" link.

Version history
External Data is currently at version 2.4.1. See the entire version history.

Bugs and feature requests
The best place to report bugs is on Phabricator - see How to report a bug. The project that should be specified is MediaWiki-extensions-ExternalData.

You can also put any questions, suggestions or bug reports about External Data at the talk page for this extension. Or you can write to the MediaWiki mailing list, mediawiki-l. (If you write to the mailing list, please include "External Data" somewhere in the subject line.)

You can also send specific code patches to Yaron Koren, at yaron57@gmail.com.

Translating
Translation of External Data is done through translatewiki.net. The translation for this extension can be found here. To add language values or change existing ones, you should create an account on translatewiki.net, then request permission from the administrators to translate a certain language or languages on this page (this is a very simple process). Once you have permission for a given language, you can log in and add or edit whatever messages you want to in that language.