Extension:External Data/Local programs

From mediawiki.org

As of version 3.0, you can use External Data to retrieve data returned by a program run server-side. There are two ways to do this: by calling one of the standard parser (or Lua) functions, or by calling a custom tag, defined with the 'tag' field in the data source definition, which handles both the retrieval and the display of the data (the tag emulation mode). One advantage of the latter approach is that it can output raw HTML (most importantly, SVG); see "Tag emulation mode" below for how to do it.

For the former approach: starting with version 3.2, the recommended way to retrieve program data is to use one of the display functions (#external_value, #for_external_table, etc.), passing in the necessary parameters for the data retrieval, most notably "source=". You can also retrieve program data by calling either #get_program_data or #get_external_data. In all of these cases, you must specify the information for the program in the variable $wgExternalDataSources in LocalSettings.php.

For any of these parser functions, you can also call its corresponding Lua function.

Simple example

A simple example, involving only text processing, is below:

// apt-cache show
$wgExternalDataSources['apt-cache show'] = [
    'command'       => 'apt-cache show $package$',
    'params'        => [ 'package' ],
    'param filters' => [ 'package' => '/^[\w-]+$/' ]
];

and

{{#get_program_data:
    program = apt-cache show <!-- The parameter 'program' can be passed as 'source' or even anonymously, provided there are no equal signs in it -->
  | package = graphviz-doc
  | data = key=1,value=2
  | format = CSV
  | delimiter = :
 }}
 {| class="wikitable"
 ! Key !! Value {{#for_external_table:<nowiki/>
 {{!}}-
 {{!}} {{{key}}} {{!}}{{!}} {{{value}}}
 }}
 |}

Tag emulation mode

Below is a more complicated example using tag emulation mode:

// 'lilypond' will be the name under which this program can be invoked by {{#get_program_data:}}.
$wgExternalDataSources['lilypond'] = [
    'name'            => 'LilyPond', // (optional) the name of the program for Special:Version.
    'program url'     => 'http://lilypond.org/', // (optional) program home page for Special:Version.
    'version command' => 'lilypond -v', // (optional) Shell command that will return program version for Special:Version. The results are cached.
    'version'         => 'GNU LilyPond 2.20.0', // (optional) Explicitly set program version for Special:Version. Use only if 'version command' is not an option.
    'limits'          => [ 'memory' => 0, 'time' => 0, 'walltime' => 0, 'filesize' => 0 ], // (optional) Limits override for the program. Use with caution.
    'env'             => [ 'KEY' => 'value' ], // (optional) Environment variables for the program. Parameters will be substituted in the values as well as in the shell command itself.
    'command'         => 'lilypond -dbackend=svg -dpaper-size="$size$" -dsafe -dcrop -o $tmp$ -', // The shell command that receives user input as stdin and outputs its result as stdout (or into a temporary file). $size$ will be replaced with the value of the size argument to the parser function.
    'params'          => [ 'size' => 'a4' ], // Parameters to the parser function with their default values. If there is a numeric key, then the value is the name of a required parameter.
    'param filters'   => [ 'size' => '/^\w+$/' ], // Callables and regular expressions that are used to validate parameter values. Should be as restrictive as possible.
    'input'           => 'notes', // Name of the parser function parameter that will be sent to program's standard input.
    'preprocess'      => null, // (optional) A callable used to preprocess the standard input for the program. There are two pre-defined functions: EDConnectorExe::wikilinks4dot() and EDConnectorExe::wikilinks4uml().
    'temp'            => '$tmp$.cropped.svg', // (optional) Name of the temporary file used instead of standard output (not recommended).
    'ignore warnings' => true, // (optional) Ignore warnings that a program may send to stderr.
    'postprocess'     => null, // (optional) A callable used to postprocess program's standard output. There is one pre-defined function EDConnectorExe::innerXML().
    'tag'             => 'score' // (optional) Bind the program to this tag to emulate the behaviour of some MediaWiki extensions (tag emulation mode).
];

Usage

After a program is configured, it can be invoked in tag emulation mode:

<score size="a5">\paper {
indent = 0\mm
line-width = 110\mm
oddHeaderMarkup = ""
evenHeaderMarkup = ""
oddFooterMarkup = ""
evenFooterMarkup = ""
}
\relative c' { f d f a d f e d cis a cis e a g f e e d c}</score>

This mode outputs raw unwikified data and is suitable for embedding SVG.
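As a further illustration of tag emulation mode, here is a minimal sketch of a second data source that binds Graphviz's dot program to a tag. It uses only the configuration keys described above; the source name 'graphviz', the tag name, the 'layout' parameter, the list of allowed layouts and the choice of preprocess/postprocess callables are assumptions made for this example, not a tested configuration.

```php
// Hypothetical data source: renders a graph description as SVG via Graphviz.
$wgExternalDataSources['graphviz'] = [
    'name'          => 'Graphviz',               // shown on Special:Version
    'program url'   => 'https://graphviz.org/',
    // dot reads the graph from stdin and writes SVG to stdout, so no 'temp' file is needed.
    'command'       => 'dot -K$layout$ -Tsvg',
    'params'        => [ 'layout' => 'dot' ],    // 'layout' is optional and defaults to 'dot'
    // Only a fixed list of layout engines is accepted (assumed list; adjust to your installation).
    'param filters' => [ 'layout' => '/^(dot|neato|twopi|circo|fdp)$/' ],
    'input'         => 'dot',                    // the tag's content is sent to stdin
    // Assumption: the predefined callables named on this page fit this use case.
    'preprocess'    => 'EDConnectorExe::wikilinks4dot',
    'postprocess'   => 'EDConnectorExe::innerXML',
    'tag'           => 'graphviz'                // enables <graphviz>...</graphviz>
];
```

With a definition along these lines, a page could presumably contain <graphviz layout="neato">graph { a -- b }</graphviz>, with the tag attribute supplying the 'layout' parameter and the tag content supplying the input.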

Parameters

All of the parsing-related parameters that #get_web_data supports (|format=, |delimiter=, |use xpath=, etc.) can be used for #get_program_data as well; see Parsing data.

The caching-related parameters (|cache seconds= and |use stale cache=) and settings that #get_web_data supports can be used for #get_program_data as well; see Caching data.

The throttling configuration settings, 'throttle key' and 'throttle interval', can be used as well.
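As a rough sketch of how the throttling settings might be attached to a program, the data source from the simple example is repeated below with both keys added. The values are assumptions: the interval is assumed to be in seconds, and the key is assumed to be a string that identifies which calls share one throttle. Check the throttling documentation for the exact semantics before relying on this.

```php
$wgExternalDataSources['apt-cache show'] = [
    'command'           => 'apt-cache show $package$',
    'params'            => [ 'package' ],
    'param filters'     => [ 'package' => '/^[\w-]+$/' ],
    // Assumption: all calls to this source share a single throttle.
    'throttle key'      => 'apt-cache',
    // Assumption: minimum interval between two runs, in seconds.
    'throttle interval' => 5
];
```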

Limitations

Because MediaWiki's Shell framework quotes each component of a shell command, piping cannot be used in the 'command' setting. To run a pipeline, wrap it in a shell script and call that script instead.
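The wrapper approach could look roughly like this. The script path, the script's contents (shown only as a comment) and the source name 'apt summary' are hypothetical and serve only to illustrate the idea.

```php
// Hypothetical wrapper script /usr/local/bin/apt-summary.sh, which holds the pipeline:
//
//   #!/bin/sh
//   apt-cache show "$1" | grep -E '^(Package|Version|Description):'
//
// The data source then calls the script as a single command, without any pipe.
$wgExternalDataSources['apt summary'] = [
    'command'       => '/usr/local/bin/apt-summary.sh $package$',
    'params'        => [ 'package' ],
    'param filters' => [ 'package' => '/^[\w-]+$/' ]
];
```

The script must be executable by the web server user. Parameter validation stays in 'param filters', so the script only ever receives values that have already passed the filter.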

Caution

Although programs are run in a restricted environment by Shell::command(), wiki admins should exercise great caution when configuring programs to make them callable with #get_program_data.
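One way to act on this advice is to combine a narrow parameter filter with explicit resource limits, as in the sketch below. The regular expression and the numbers are illustrative assumptions. The units of 'limits' are assumed to follow MediaWiki's shell limit settings (kilobytes for memory and file size, seconds for time), which should be verified for your version.

```php
$wgExternalDataSources['apt-cache show'] = [
    'command'       => 'apt-cache show $package$',
    'params'        => [ 'package' ],
    // Accept only strings that look like Debian package names:
    // lowercase letters, digits, '+', '.' and '-', starting with a letter or digit.
    'param filters' => [ 'package' => '/^[a-z0-9][a-z0-9+.-]+$/' ],
    // Assumed units: memory and filesize in KB, time and walltime in seconds.
    'limits'        => [
        'memory'   => 102400, // about 100 MB
        'time'     => 5,      // CPU time
        'walltime' => 10,     // elapsed time
        'filesize' => 1024    // about 1 MB of output files
    ]
];
```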

A set of tested examples can be found here and (with working output) here.