Extension:DataTransclusion

DataTransclusion

Release status: beta

Implementation: Parser function
Description: Fetches data records from a database or web API and formats them using wiki templates
Author(s): Daniel Kinzler for Wikimedia Deutschland e.V.
MediaWiki: 1.16.* or greater
License: GPL/LGPL

The DataTransclusion extension allows individual data records to be retrieved from an external source such as a database or a web API. The fields of the data record are then passed to a template as template parameters for rendering. DataTransclusion is designed to be flexible, so that it can easily be adapted to different applications, while remaining safe and secure to use even on large-scale public wikis.

Usage

To use a record from a given data source, you need the name the data source was given in the configuration, and a template that accepts the fields of that source's records as parameters. Consider for example the following wiki markup:

{{#record:Book|openlibrary|isbn=0451526538}} 

This example has three parameters. The first two are positional and must always be present in exactly this order; the remaining ones are named:

  • the first parameter is the template to use. In the example, "Book" refers to the page Template:Book. The special value #dump causes a table of field names and values to be generated. This is useful to see which fields are actually returned by the data source.
  • the second parameter is the name of the data source. "openlibrary" refers to an entry in the $wgDataTransclusionSources configuration variable (see below).
  • the parameter isbn is the name of a lookup key that was configured for this data source. In this case, the data source contains information about books, and it was configured to allow look-ups using a book's ISBN.
  • any further parameters will be passed through to the template, unless a field of the same name exists in the data record retrieved from the data source. In that case, the value from the data record overrides the value from the {{#record}} call.

If the data source openlibrary was configured to provide the fields author and title, then in the template Template:Book these fields can be accessed using the normal triple-brace syntax for template parameters, i.e. {{{author}}} and {{{title}}} respectively.

Data sources may specify additional information fields that will always be the same for records from that source. These values may indicate e.g. the name or license of the source itself. In this case, {{{library}}} might have been configured to return The Open Library Project. This is especially useful when using the same template with data from different data sources, e.g. different bibliographical databases.

Note that data transclusion influences MediaWiki's parser cache: if a data source specifies that records from that source are cacheable for one hour, then pages containing data transcluded from that source will not be cached for longer than one hour.

Installation

  1. Download and extract the DataTransclusion extension
  2. Place the DataTransclusion directory in the extensions directory of your wiki installation.
  3. Activate the extension in your LocalSettings.php file using the line require_once("$IP/extensions/DataTransclusion/DataTransclusion.php");.
  4. Configure data sources using the $wgDataTransclusionSources configuration variable. See below for details.
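Steps 3 and 4 might look roughly like this in LocalSettings.php. This is only a sketch: the source name openlibrary is taken from the usage example above, and the options array is left as a placeholder to be filled in as described under Configuration.

```php
// LocalSettings.php (excerpt)

// Step 3: load the extension.
require_once( "$IP/extensions/DataTransclusion/DataTransclusion.php" );

// Step 4: register one entry per data source. The key is the logical name
// used as the second parameter of {{#record:...}}.
$wgDataTransclusionSources['openlibrary'] = array(
	// options for this source go here, see "Configuration"
);
```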

Configuration

The DataTransclusion extension is configured entirely through the $wgDataTransclusionSources variable in LocalSettings.php. $wgDataTransclusionSources is an associative array with one entry for each data source. The key of each entry is the logical name of the data source; the value is itself an associative array of configuration options for that data source. Which configuration options can be used depends on the module used for the respective data source. Some options, however, apply to all modules (or are used by the DataTransclusion core itself):

class
the name of the PHP class that implements the module to be used to access this data source. REQUIRED. Built-in modules are:
  • DBDataTransclusionSource for accessing records in a local database. See below for details.
  • WebDataTransclusionSource for accessing records via a simple web API. See below for details.
keyFields
list of fields that can be used as the key for fetching a record. REQUIRED.
optionNames
names of options that can be specified in addition to a key, to refine the output. Optional.
fieldNames
names of all fields present in each record. Fields not listed here will not be available on the wiki, even if they are returned by the data source. If not given, this defaults to $spec['keyFields'] + array_keys($spec['fieldInfo'] ).
fieldInfo
Associative array mapping logical field names to additional information for using and interpreting these fields. Different data sources may allow different hints for each field. The following hints are known for all types of data sources:
type
specifies the data type of the field: 'int' for integers, 'float' or 'decimal' for decimals, or 'string' for string fields. Serialization types 'json', 'wddx' and 'php' are also supported. Defaults to 'string'.
normalization
normalization to be applied for this field, when used as a query key. This may be a callable, or an object that supports the function normalize(), or a regular expression for patterns to be removed from the value. For normalizing identifiers, you may want to use array('ValueNormalizers', 'strip_punctuation').
sourceInfo
associative array of information about the data source that should be made available on the wiki. This information will be present in the record arrays as passed to the template. This is intended to allow information about source, license, etc. to be shown on the wiki. Note that DataTransclusionSource implementations may add extra information to the source info on their own. By default, the entry source-name in the sourceInfo is set to the data source's logical name.
cacheDuration
the number of seconds a result from this source may be cached for. If not set, results are assumed to be cacheable indefinitely. This setting determines the expiry time of the parser cache entry for pages that show data from this source. If cache is set, cacheDuration also determines the expiry time of ObjectCache entries for records from this source.
cache
the object cache to use. Should be given as one of the CACHE_* constants normally used with $wgMainCacheType, usually CACHE_DB. Default is CACHE_NONE. Can also be an ObjectCache instance. The given object cache will be used to cache records from this data source for the number of seconds specified by the cacheDuration option.
transformer
a record transformer specification. This may be an instance of RecordTransformer, or an associative array specifying a record transformer which can then be created using RecordTransformer::newRecordTransformer. In that case, the value for the class field must be the class name of the desired RecordTransformer implementation. Other entries in that array are specific to the individual transformers. The RecordTransformers provided by default are FlattenRecord and XPathFlattenRecord.
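As an illustration of the common options, an entry might be sketched as follows. All concrete values (field names, the library entry, the one hour cache duration) are assumptions chosen to match the usage example, and writing keyFields and fieldNames as PHP arrays is likewise an assumption about the accepted form. Module-specific options are omitted.

```php
// LocalSettings.php (excerpt). All concrete values are illustrative.
$wgDataTransclusionSources['openlibrary'] = array(
	// REQUIRED: the module used to access this source
	'class' => 'WebDataTransclusionSource',
	// REQUIRED: fields that may be used as lookup key in {{#record:...}}
	'keyFields' => array( 'isbn' ),
	// fields made available on the wiki; anything else is dropped
	'fieldNames' => array( 'isbn', 'title', 'author' ),
	'fieldInfo' => array(
		'isbn' => array(
			'type' => 'string',
			// strip punctuation from the ISBN before it is used as query key
			'normalization' => array( 'ValueNormalizers', 'strip_punctuation' ),
		),
	),
	// constant values passed to the template along with every record
	'sourceInfo' => array( 'library' => 'The Open Library Project' ),
	// pages using this source expire from the parser cache after one hour
	'cacheDuration' => 60 * 60,
	// additionally keep fetched records in the object cache
	'cache' => CACHE_DB,
	// module-specific options (such as url) are omitted in this sketch
);
```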

DBDataTransclusionSource

The DBDataTransclusionSource implements access to records in a local database (on the same server as the wiki database).

Additional configuration options:

query
the SQL query for fetching records. May not contain a GROUP BY or LIMIT clause (use querySuffix for that). The WHERE clause is automatically generated from the requested key/value pair. If query already contains a WHERE clause, the condition for the desired key/value pair is appended using AND. Note that subqueries are not supported reliably. REQUIRED.
querySuffix
additional clauses to be added after the WHERE clause. Useful mostly to specify GROUP BY (ORDER BY and LIMIT are pointless, since the query should always return only a single record).
keyTypes
associative array specifying the data types for the key fields (as given in keyFields). Array keys are the field names, the associated values specify the type as 'int' for integers, 'float' or 'decimal' for decimals, or 'string' for string fields.
keyFields
like for DataTransclusionSource, this is a list of fields that can be used as the key for fetching a record. However, it's not required for DBDataTransclusionSource: if not provided, all keys listed in keyTypes will be allowed as key fields.
fieldInfo
like for DataTransclusionSource, an associative array mapping logical field names to additional information for using and interpreting these fields. DBDataTransclusionSource supports the following additional entries in this array:
dbfield
the field's name in the database table, if different from the logical name.
serialized
format, if the field contains a serialized structure as a blob. If deserialization yields an array, it is merged with the data record. Supported formats are 'json', 'wddx' and 'php' for PHP serialized objects.
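A database-backed source could be configured along these lines. This is a sketch under assumptions: the table book_records, its columns, and the source name books are hypothetical, and keyFields is left out so that the keys listed in keyTypes are used.

```php
// LocalSettings.php (excerpt). Table and column names are hypothetical.
$wgDataTransclusionSources['books'] = array(
	'class' => 'DBDataTransclusionSource',
	// No WHERE clause here: it is generated from the requested key/value pair.
	'query' => 'SELECT * FROM book_records',
	// Keys allowed for lookups, with their data types.
	'keyTypes' => array(
		'isbn' => 'string',
		'id' => 'int',
	),
	'fieldInfo' => array(
		// logical field "title" is stored in the column book_title
		'title' => array( 'dbfield' => 'book_title' ),
		// column "details" holds a JSON blob; if it decodes to an array,
		// that array is merged into the record
		'details' => array( 'serialized' => 'json' ),
	),
	'cacheDuration' => 60 * 60, // seconds
);
```

With such an entry, a call like {{#record:Book|books|isbn=0451526538}} would look the record up by its isbn column.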

WebDataTransclusionSource

The WebDataTransclusionSource implements access to records via a simple RESTful web API (generally, JSON or WDDX fetched by an HTTP GET request).

url
base URL for building URLs for retrieving individual records. If the URL contains placeholders of the form {xxx}, these get replaced by the respective key or option values. Otherwise, the key/value pair and options get appended to the URL as regular URL parameters (preceded by ? or &, as appropriate). REQUIRED.
dataFormat
Serialization format returned by the web service. Supported values are 'php' for PHP serialization format, 'json' for JSON, and 'wddx' for XML-based lists/dicts. Default is 'php'.
dataPath
"path" to the actual data in the structure returned from the HTTP request. This is only used if no transformer is set. The syntax of the path is the one defined for the dataPath parameter for the FlattenRecord transformer. REQUIRED if no transformer is defined.
errorPath
"path" to error messages in the structure returned from the HTTP request. This is only used if no transformer is set. The syntax of the path is the one defined for the dataPath parameter for the FlattenRecord transformer. REQUIRED if no transformer is defined.
httpOptions
array of options to pass to Http::get. For details, see Http::request.
timeout
seconds before the HTTP request times out. If not given, the timeout value given in httpOptions is used. If neither is set, 5 seconds are assumed.
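A web API source might be set up as in the following sketch. The URL, the shape of the JSON response implied by dataPath and errorPath, and the field names are all hypothetical; it is also an assumption that the {isbn} placeholder is named after the key field.

```php
// LocalSettings.php (excerpt). URL and response layout are hypothetical.
$wgDataTransclusionSources['openlibrary'] = array(
	'class' => 'WebDataTransclusionSource',
	// {isbn} is replaced by the value of the isbn key from the {{#record}} call
	'url' => 'http://example.org/api/books?isbn={isbn}&format=json',
	'dataFormat' => 'json',
	// assumed response: { "result": { "book": { ...fields... } } }
	'dataPath' => array( 'result', 'book' ),
	// assumed error response: { "error": { "message": "..." } }
	'errorPath' => array( 'error', 'message' ),
	'keyFields' => array( 'isbn' ),
	'fieldNames' => array( 'isbn', 'title', 'author' ),
	'timeout' => 10, // seconds; 5 is assumed if nothing is configured
);
```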

FlattenRecord

FlattenRecord is a record transformer, which can be used with the transformer option of a data transclusion source. It extracts individual field values from a complex nested structure of arrays, such as those typically returned by web APIs or document-based databases.

FlattenRecord uses "paths" to describe which information in the nested array structure to access. Paths are given as indexed arrays (lists), where each entry in the path navigates one step "down into" this structure. Each entry can be either a string (for a lookup in an associative array), an int (an index in a list), or a "meta-key" of the form @@N, where N is an integer. A meta-key refers to the Nth entry (counting from 0) in an ordered associative array: @1 would be "bar" in array( 'x' => "foo", 'y' => "bar" ). Note: this is arcane and idiosyncratic. It should probably be changed to use XPath.
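To make the path mechanism concrete, here is a minimal sketch of how such a path could be resolved. It is not the extension's actual code: the function name sketchResolvePath is hypothetical, and it assumes the single-@ meta-key form used in the @1 example (the text also mentions @@N) with entries counted from 0.

```php
<?php
// Hypothetical helper, NOT part of DataTransclusion: walks a nested array
// along a path and returns the value found, or null if a step fails.
function sketchResolvePath( $data, array $path ) {
	foreach ( $path as $step ) {
		if ( !is_array( $data ) ) {
			return null; // cannot descend into a scalar value
		}
		if ( is_string( $step ) && preg_match( '/^@(\d+)$/', $step, $m ) ) {
			// meta-key (assumed form "@N"): take the Nth entry, counting
			// from 0, regardless of what its key is
			$values = array_values( $data );
			$index = (int)$m[1];
			if ( !array_key_exists( $index, $values ) ) {
				return null;
			}
			$data = $values[$index];
		} elseif ( array_key_exists( $step, $data ) ) {
			// string key (associative lookup) or int key (list index)
			$data = $data[$step];
		} else {
			return null;
		}
	}
	return $data;
}

// Hypothetical response structure for illustration.
$response = array(
	'result' => array(
		'books' => array(
			array( 'title' => 'Animal Farm' ),
		),
	),
);
$title = sketchResolvePath( $response, array( 'result', 'books', 0, 'title' ) );
$second = sketchResolvePath( array( 'x' => 'foo', 'y' => 'bar' ), array( '@1' ) );
```

Here $title ends up as the nested title string, and $second picks the second entry of the associative array by position rather than by key.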

FlattenRecord is configured using an option array, similar to the ones used by transclusion sources. It supports the following fields:

class
the class of the transformer, in this case, always "FlattenRecord".
fieldPathes
an associative array giving a "path" for each field which points to the actual field values inside the record, that is, the structure that $spec['dataPath'] resolved to. Used by the transform() method.
dataPath
"path" to the actual data in the response structure, for use by the extractRecord() method. REQUIRED.
errorPath
"path" to error messages in the response structure, for use by the extractError() method. If an entry is found at the given position in the response structure, the request is assumed to have failed. If not given, the request is assumed to have been successful as long as dataPath can be resolved to a data item.
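Put together, a FlattenRecord specification might be passed through the transformer option as sketched below. The URL and the response layout the paths refer to are hypothetical, and the field names are assumptions carried over from the usage example.

```php
// LocalSettings.php (excerpt). URL and response layout are hypothetical.
$wgDataTransclusionSources['openlibrary'] = array(
	'class' => 'WebDataTransclusionSource',
	'url' => 'http://example.org/api/books?isbn={isbn}',
	'dataFormat' => 'json',
	'keyFields' => array( 'isbn' ),
	'fieldNames' => array( 'isbn', 'title', 'author' ),
	'transformer' => array(
		'class' => 'FlattenRecord',
		// REQUIRED: where the record sits inside the response
		'dataPath' => array( 'result', 'books', 0 ),
		// if something is found here, the request counts as failed
		'errorPath' => array( 'error' ),
		// one path per field, relative to what dataPath resolved to
		'fieldPathes' => array(
			'title' => array( 'title' ),
			'author' => array( 'authors', 0, 'name' ),
		),
	),
);
```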

XPathFlattenRecord

XPathFlattenRecord extends FlattenRecord to handle arbitrary XML using XPath. This is applicable only if the data transclusion source uses an XML DOM as its internal representation for raw data, as is the case with WebDataTransclusionSource if the dataFormat option is set to 'xml'.

In addition to the options supported by the FlattenRecord class, XPathFlattenRecord accepts some additional options, and changes the convention for others.

fieldPathes
like for FlattenRecord, but using W3C XPath syntax.
dataPath
like for FlattenRecord, but using W3C XPath syntax.
errorPath
like for FlattenRecord, but using W3C XPath syntax.
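For an XML source, the same options could be given as XPath expressions, as in this sketch. The URL and the element names are hypothetical, and it is an assumption that the field paths are evaluated relative to the node selected by dataPath.

```php
// LocalSettings.php (excerpt). URL and XML element names are hypothetical.
$wgDataTransclusionSources['catalog'] = array(
	'class' => 'WebDataTransclusionSource',
	'url' => 'http://example.org/catalog?isbn={isbn}',
	'dataFormat' => 'xml', // raw data is handled as an XML DOM
	'keyFields' => array( 'isbn' ),
	'fieldNames' => array( 'isbn', 'title', 'author' ),
	'transformer' => array(
		'class' => 'XPathFlattenRecord',
		// XPath to the element holding the record
		'dataPath' => '/response/book[1]',
		// XPath to an error element, if the service reports one
		'errorPath' => '/response/error',
		// one XPath expression per field (assumed relative to dataPath)
		'fieldPathes' => array(
			'title' => 'title',
			'author' => 'authors/author[1]/name',
		),
	),
);
```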

Hint: this can be used to include RDF-structured data as provided by Extension:Rdf, Extension:SemanticMediaWiki or meta:DBpedia.

See also

  • Extension:External Data - while DataTransclusion uses Templates for formatting data records, External Data loads property values into variables that can be used throughout the wiki page.