Extension:External Data/Configuration

From mediawiki.org

External Data configuration settings consist of two parts: configuration for data sources and a few other settings.

Data sources

Most of the extension's settings configure so-called data sources, which govern calls to {{#*_external_table:}}/{{#external_value:}} in standalone mode, to the {{#get_*_data:}} parser functions in compatibility mode, and to the corresponding Lua functions. At least two data sources are relevant to any parser function call: a more specific data source overrides a more universal one, and the last one ('*') contains global fallback settings.

Settings for the data sources are stored in the associative array $wgExternalDataSources. Its keys are names of data sources and values are arrays containing settings for the data source.
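The layering of a specific data source over the '*' fallback can be sketched as a LocalSettings.php fragment. This is only an illustration: the host api.example.org is hypothetical, and the values are arbitrary.

```php
// Global fallback: used by every data source that does not override the setting.
$wgExternalDataSources['*']['min cache seconds'] = 3600;

// A more specific data source, keyed by host (hypothetical host name).
// Its settings take precedence over '*' for URLs on this host.
$wgExternalDataSources['api.example.org'] = [
	'min cache seconds' => 300,
	'throttle interval' => 2,
];
```

Assigning individual keys of '*' rather than the whole array keeps the other default fallback settings in place.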

The relevant data sources for the data retrieval parser functions (and their Lua analogues) and the relevant settings are:

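Each entry below gives the function, its $wgExternalDataSources indices (with the parser function parameter in parentheses) in lookup order (first, then, then, last), and its settings with their descriptions.
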
Any data retrieving function
Index: data source ID (source, etc.) first; '*' last.
  • params: An array of additional parameters to the data retrieving function, used to substitute wildcards (like $param$) in configuration settings. An array member like 'param' => 'default' sets the default value 'default' for 'param', while a bare 'param' makes 'param' required.
  • param filters: An array of validators for additional parameters. A validator may be a string with a delimited regular expression that must match the parameter value, or a callable that must return true. It is important to validate parameters that are substituted for wildcards, to prevent injections.
  • hidden: If set to true, this data source can only be called with {{#get_external_data:source=id|...}} or {{#get_external_data:id|...}}, and the error messages will be suppressed, as if suppress error had been passed to the parser function.

{{#get_web_data:}}, mw.ext.externalData.getWebData()
Index: URL (url) first; then host (from url); then second-level domain (from url); '*' last.
  • replacements: Replacements in the URLs
  • allowed urls: A whitelist of URLs
  • encodings: A list of charsets to try
  • allow ssl: Whether to allow SSL
  • options: HTTP options
{{#get_soap_data:}}, mw.ext.externalData.getSoapData()
  • throttle key: Throttle key
  • throttle interval: Interval between two throttled calls, in seconds
  • always use stale cache: Always allow stale cache
  • min cache seconds: Cache for at least so many seconds

{{#get_file_data:}}, mw.ext.externalData.getFileData()
Index: file ID (file) or directory ID (directory) first; '*' last.
  • path: File or directory path
  • depth: Allowed directory iteration depth

{{#get_ldap_data:}}, mw.ext.externalData.getLdapData()
Index: LDAP domain (domain) first; '*' last.
  • server: LDAP server
  • user: LDAP user
  • password: LDAP password
  • base dn: Base DN

{{#get_db_data:}}, mw.ext.externalData.getDbData()
Index: database connection ID (db) first; '*' last.
  • server: Database server
  • type: Database type
  • name: Database name
  • user: Database user
  • password: User password
  • directory: SQLite directory
  • flags: Database flags
  • prefix: Table prefix
  • prepared: Prepared statement(s). If a string, this is the only prepared statement for the connection, and the query parameter in the wikitext is not needed. If an associative array, there are several prepared statements, indexed by query. Each of them can be, in turn, either a string containing a prepared statement with no parameters or only string parameters, or an array of the form [ 'query' => 'SELECT ...', 'types' => 'si' /* or other parameter types */ ]
  • types: Parameter types for the prepared statements
  • cache seconds: Cache MongoDB result for so many seconds

{{#get_program_data:}}, mw.ext.externalData.getProgramData()
Index: program ID (program) first; '*' last.
  • command: Shell command
  • input: Name of the parameter to be fed into the program's standard input
  • temp: Name of the temporary file to be used instead of standard output
  • limits: Resource limits
  • env: Environment variables
  • ignore warnings: Ignore warnings that a successfully executed program may send to stderr
  • preprocess: A callable that preprocesses the program's standard input
  • postprocess: A callable that postprocesses the program's standard output
  • name: Program name for Special:Version
  • program url: Program website for Special:Version
  • version: Program version for Special:Version
  • version command: A shell command that outputs the program version for Special:Version
  • tag: Tag for the tag emulation mode
  • throttle key: Throttle key
  • throttle interval: Interval between two throttled calls, in seconds
  • always use stale cache: Always allow stale cache
  • min cache seconds: Cache for at least so many seconds
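
As an illustration of the database settings listed above, here is a minimal sketch of a connection with two prepared statements. The connection ID inventory, the credentials, the table and the column names are all hypothetical; the ? placeholder and the 'i' type letter assume mysqli-style prepared statements, in line with the 'si' example in the description of prepared.

```php
// Hypothetical database connection "inventory"; replace every value with your own.
$wgExternalDataSources['inventory'] = [
	'server' => 'localhost',
	'type' => 'mysql',
	'name' => 'inventory_db',
	'user' => 'wiki_reader',
	'password' => 'change-me', // placeholder, not a real credential
	'prepared' => [
		// Statement without parameters: a plain string is enough.
		'all' => 'SELECT name, price FROM items',
		// Statement with one integer parameter (assumed mysqli-style type string).
		'by_id' => [
			'query' => 'SELECT name, price FROM items WHERE id = ?',
			'types' => 'i',
		],
	],
];
// Wikitext would then select a statement by its key, e.g.:
// {{#get_db_data: db = inventory | query = all | ... }}
```

Keeping the SQL in the configuration means that wiki editors can only choose among the statements an administrator has prepared.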

Remember that {{#*_external_table:}}/{{#external_value:}} in standalone mode, or {{#get_external_data:}}, can replace any of the {{#get_*_data:}} parser functions, just as mw.ext.externalData.getExternalData() can replace any of the mw.ext.externalData.get*Data() Lua functions.

Any parameter of {{#get_*_data:}} can be omitted, provided it is set in the corresponding $wgExternalDataSources['…'] array. The obvious exception is the parameter used as the key to $wgExternalDataSources: url, directory, file, domain, db, program and source (see below).

The parameter source can replace url, directory, file, domain, db and program, provided that the corresponding $wgExternalDataSources['source'] contains all the settings necessary to choose and initialise the proper connector. Furthermore, if the value of source does not contain equal signs, source = can be omitted, i.e., this parameter can be passed anonymously.
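
A sketch of these two rules, assuming a hypothetical data source named rates with a hypothetical URL: the url and format are fixed in the configuration, so the wikitext only has to name the source and map the data.

```php
// Hypothetical data source "rates"; URL and column names are illustrative only.
$wgExternalDataSources['rates'] = [
	'url' => 'https://example.org/rates.csv',
	'format' => 'csv with header',
];
// With the url and format set here, the wikitext can omit them:
// {{#get_external_data: source = rates | data = currency=Currency,rate=Rate }}
// The value "rates" contains no equal sign, so "source =" can be dropped as well:
// {{#get_external_data: rates | data = currency=Currency,rate=Rate }}
```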

Any configuration setting can include wildcards surrounded by dollar signs, like this: 'url' => 'https://raw.githubusercontent.com/lipis/flag-icons/main/flags/4x3/$iso2$.svg'. These wildcards will be substituted from additional parameters to {{#get_*_data:}}. The additional parameters should be declared as required or receive a default value in $wgExternalDataSources['…']['params'], e.g.: $wgExternalDataSources['…']['params'] = [ 'iso2' ];. It is important that a validator is set up for these parameters: $wgExternalDataSources['…']['param filters'] = [ 'iso2' => '/^[a-z]{2}$/' ];. The same mechanism can be used to build the shell commands run by server-side programs.
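
The same wildcard mechanism applied to a shell command might look like the sketch below. The source name calendar and the choice of the Unix cal utility are assumptions made for illustration; year is a required parameter, month has a default, and both are validated before substitution.

```php
// Hypothetical data source "calendar", running the Unix `cal` utility.
$wgExternalDataSources['calendar'] = [
	'command' => 'cal $month$ $year$',
	// 'year' is required; 'month' defaults to '1' when omitted.
	'params' => [ 'year', 'month' => '1' ],
	// Strict filters keep arbitrary text out of the shell command.
	'param filters' => [
		'year' => '/^[0-9]{4}$/',
		'month' => '/^(1[0-2]|[1-9])$/',
	],
	'format' => 'text',
];
// Possible wikitext, following the pattern of the flags example:
// {{#get_external_data: calendar | year = 2024 | month = 2 }}
// {{#external_value:__text}}
```

Anchoring each filter with ^ and $ matters here: an unanchored pattern would accept any value that merely contains a valid fragment.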

With 'hidden' => true, a wiki admin can define hidden data sources, whose very nature is concealed from wiki users. An example of such a source:

$wgExternalDataSources['flags'] = [
	'url' => 'https://raw.githubusercontent.com/lipis/flag-icons/main/flags/4x3/$iso2$.svg',
	'params' => [ 'iso2' ],
	'param filters' => [ 'iso2' => '/^[a-z]{2}$/' ],
	'format' => 'text',
	'hidden' => true
];

If such a source is defined, the following wikitext will show the SVG code for the Israeli flag:

{{#get_external_data: flags | iso2 = il }}
{{#external_value:__text}}

Hidden data sources can only be called with {{#*_external_table:}}/{{#external_value:}} in standalone mode, or with {{#get_external_data:source=(id)|...}} or {{#get_external_data:(id)|...}}. They cannot be called with the other, specific {{#get_*_data:}} functions and their identifiers like db, domain, etc.

The default value for $wgExternalDataSources is:

$wgExternalDataSources = [
	'*' => [
		'min cache seconds' => 3600,
		'always use stale cache' => false,
		'throttle key' => '$2nd_lvl_domain$',
		'throttle interval' => 0,
		'replacements' => [],
		'allowed urls' => [],
		'options' => [ 'timeout' => 'default' ],
		'encodings' => [ 'ASCII', 'UTF-8', 'Windows-1251', 'Windows-1252', 'Windows-1254', 'KOI8-R', 'ISO-8859-1' ],
		'params' => [],
		'param filters' => []
	]
];

Other settings

Other settings are:

  • $wgExternalDataVerbose = true; shows an error message if an internal variable is not set. Note also that {{#external_value:}} allows passing a default/fallback value as its second parameter,
  • $wgExternalDataAllowGetters = true; switches on the compatibility mode, under which the {{#get_…_data:}} data retrieval functions are still available, as well as the mw.ext.externalData.get…Data() Lua functions other than mw.ext.externalData.getExternalData().
    Without the compatibility mode, the only way of accessing external data is the standalone mode of {{#…_external_table:}} and mw.ext.externalData.getExternalData().
    When wiki pages are parsed with Parsoid, only the standalone mode or Lua guarantees that the data is fetched before it is displayed,
  • $wgExternalDataIntegratedConnectors = [...]; is a set of rules regulating the choice of the class that handles the connection to a data source, depending on the parameters of the {{#…_external_data:}} working in standalone mode. Injecting a new rule that calls a new class extending EDConnectorBase makes it possible to add new functionality to this extension,
  • $wgExternalDataConnectors = [...]; is an additional set of rules regulating the choice of the class that handles the connection to a data source, depending on the parser function invoked and its parameters in compatibility mode,
  • $wgExternalDataParsers = [...]; is a set of rules regulating the choice of the text parser that converts text returned by an external service to variables. Injecting a new rule that calls a new class extending EDParserBase makes it possible to add new functionality to this extension.

For the default values of these three variables, refer to the file extension.json.
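
As a minimal sketch, the two boolean settings could be set in LocalSettings.php as follows, assuming the extension is loaded in the usual way with wfLoadExtension(). Whether to enable either of them depends on the wiki.

```php
// Load the extension first, so that the settings below override its defaults.
wfLoadExtension( 'ExternalData' );

// Show an error message if an internal variable is not set.
$wgExternalDataVerbose = true;

// Keep the {{#get_…_data:}} parser functions available (compatibility mode).
$wgExternalDataAllowGetters = true;
```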