Extension:Pipes

Pipes

Release status: experimental

Implementation: Data extraction
Description: Pass data through a series of data processing nodes
Author(s): Steve Bleazard
Latest version: 0.9.10 (9 Oct 08)
MediaWiki: 1.10
License: GPL
Download: http://sourceforge.net/projects/mediawikipipes/


WARNING: This extension has serious security implications. Do not use on an unrestricted public access site. See Security Implications for more details. In addition this is an experimental module and the installation requires a reasonable understanding of Unix.

This extension implements external data access and processing through a series of configurable nodes that form a Pipe. The extension was inspired by Unix as well as Yahoo!'s Pipes. The general idea is to support server side Mashups so that data can be processed from multiple sources and presented on a single page. The net effect is to create a Wiki based application framework.

The current implementation uses Perl as its scripting language, although other languages could be implemented. For Perl based page updates see CMS::MediaWiki.

Warning

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See Version 2 of the GNU General Public License for more details.

Installation

Use the following procedure to install the Pipes extension:

  1. Download the latest version of MediaWiki Pipes from SourceForge and unpack.
  2. Read the README for any version specific information
  3. Append the contents of Pipes.style.css to the main.css file of each skin - these can be found in the directories under the skins directory in the MediaWiki root. For a quick test just update monobook/main.css, as this is the default skin.
  4. Copy the Pipes.php file into the MediaWiki extensions directory and make sure the file has permissions of at least 444
  5. If possible, create a chrooted area to run the Pipes in. See creating a chroot area for more details.
  6. Install the perl module MWPipeTools.pm in the site_perl perl library area, which will be in the chroot area if this option is used. Set the permissions of the installed file to 444.
  7. Make sure suid perl (sperl) is installed
  8. Install runpipe.pl:
    • Edit runpipe.pl and change the perl #! path as appropriate, in most cases this will not be necessary.
    • If required, change the location of the configuration file (/usr/local/etc/runpipe.cfg by default) defined by the CFGFILE variable.
    • Install in a suitable location, usually /usr/local/sbin, as runpipe. Note that the runpipe script must be outside of the chroot area.
    • Change the installed runpipe script to have ownership and group of root. This is only compulsory if the chroot option is used. Set the permissions to 555. When using the chroot option, set the permissions to 4555 (suid to root).
    • See the configuring runpipe section for configuration information.
  9. Create the database tables.
  10. Complete the Pipes configuration and activation. See the configuration section.

Creating database tables

This process requires write access to the wiki database. If your wiki database tables are prefixed (they are not by default), first edit the files with the .sql extension and add the prefix to the table names. Then change to the Pipes distribution directory and create the tables with the following command:

cat *.sql | mysql -D WIKIDB -u DBUSER -p

Where,

WIKIDB
Name of the wiki database, wikidb by default.
DBUSER
User with write access to the WIKIDB database

Creating a chroot area

Creating a chroot jail is highly system dependent. The jail creation script, lnxmkjail.sh, included in the MediaWiki Pipes distribution works for Ubuntu server and RedHat Enterprise Linux 3. It should also work, possibly with a few changes, with most Linux variants.

Using lnxmkjail.sh

sh ./lnxmkjail.sh --perl=/usr/bin/perl --target=/usr/local/mw --mwuser=www-data

Parameters:

--perl=<path> (required)
Specifies the path to the perl binary that will be copied into the jail. This is all that is needed as the required system and perl library files are derived from the command itself.
--target=<path> (required)
The directory where the jail will be built. Note that the script creates a single directory, jail, in the specified directory.
--mwuser=<user> (defaults to mwuser)
The user name of the user running the MediaWiki web server.
--extrauser=<space separated list of user names>
Includes the specified users in the /etc/passwd of the jail. The users must exist in the system /etc/passwd.
--extragroup=<space separated list of group names>
Includes the specified groups in the /etc/group of the jail. The groups must exist in the system /etc/group.

Note that if LANG, or possibly LANGUAGE or LC_ALL, is set then you may get perl warnings about falling back to locale "C" when the new jail is tested at the end of the process.

Configuring runpipe

runpipe looks for its configuration in /usr/local/etc/runpipe.cfg by default. The following values are supported (with the default value in brackets):

RUNPIPE_PERL
The perl path (/usr/bin/perl). This is the path in the chrooted jail area, if jail is used.
RUNPIPE_UID
The user ID to run the script as, can be a user name (nobody)
RUNPIPE_GID
The group ID to run the script as, can be a group name (nobody)
RUNPIPE_MAXERR
Maximum amount of output to standard error, in bytes. Once reached the output is truncated (64K)
RUNPIPE_MAXOUT
Maximum amount of output to standard out, in bytes. Once reached the output is truncated (64K)
RUNPIPE_MAXTIME
Maximum number of seconds of execution time in real time. Note that real time is used because it is the user experience that counts, not the actual amount of resource. (30)

e.g.

 RUNPIPE_PERL=/usr/bin/perl
 RUNPIPE_UID=nobody
 RUNPIPE_GID=nogroup
 RUNPIPE_MAXERR=65000
 RUNPIPE_MAXOUT=2000000
 RUNPIPE_MAXTIME=30
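
The way the defaults combine with a configuration file can be sketched in Perl. This is only an illustration and is not taken from runpipe itself: parse_runpipe_cfg is a hypothetical helper, the KEY=VALUE syntax is inferred from the example above, and 64K is assumed to mean 64 * 1024 bytes.

```perl
use strict;
use warnings;

# Documented defaults; 64K is assumed here to mean 64 * 1024 bytes.
my %DEFAULTS = (
    RUNPIPE_PERL    => '/usr/bin/perl',
    RUNPIPE_UID     => 'nobody',
    RUNPIPE_GID     => 'nobody',
    RUNPIPE_MAXERR  => 64 * 1024,
    RUNPIPE_MAXOUT  => 64 * 1024,
    RUNPIPE_MAXTIME => 30,
);

# Hypothetical helper: overlay KEY=VALUE lines on top of the defaults.
sub parse_runpipe_cfg {
    my ($text) = @_;
    my %cfg = %DEFAULTS;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*(RUNPIPE_\w+)\s*=\s*(.*?)\s*$/;
        $cfg{$1} = $2 if exists $DEFAULTS{$1};    # unknown keys are ignored
    }
    return \%cfg;
}

my $cfg = parse_runpipe_cfg(" RUNPIPE_GID=nogroup\n RUNPIPE_MAXOUT=2000000\n");
print "$_=$cfg->{$_}\n" for sort keys %$cfg;
```

Values that are not set in the file keep their defaults, so a configuration file only needs to list the settings that differ.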

php.ini

It may also be necessary to change the PHP resource limits in php.ini. See the PHP manual under php.ini directives for more information. In general the following may need to be adjusted:

max_execution_time
Maximum execution time of each script, in seconds
max_input_time
Maximum amount of time each script may spend parsing request data
memory_limit
Maximum amount of memory a script may consume (16MB)

Configuration

The Pipes extension uses a number of LocalSettings.php configuration settings to control behaviour.

Disable cache

As pages will be dynamic, caching makes little sense and may in fact result in really strange behaviour. Insert in LocalSettings.php:

 # Disable cache
 $wgEnableParserCache = false;
 $wgCachePages = false;

Operational Control Settings

The following settings are required and control the operation of the Pipes module:

 $wgPipeJail = "/usr/local/pipes/jail";
 $wgPipeScriptPath="/scripts";
 $wgRunPipe = "/usr/local/sbin/runpipe";
 $wgPipeMultiProcess = false;


They are used as follows:

$wgPipeJail
The path to the chrooted area that the pipe will be run in. Set to '/' to disable chroot usage
$wgPipeScriptPath
The path to the directory used to place the temporary script files. This directory must exist and be writable by the wiki web server. The path is relative to the jail root when this option is used. Defaults to /scripts, which is automatically created by the supplied jail creation scripts.
$wgRunPipe
Path to the runpipe script used to execute the pipes
$wgPipeMultiProcess
Currently, unless you are developing the Pipes extension, set this to false. This controls the way the pipe is executed - as a single program when false, or as a series of linked processes when true. Setting this to true may improve performance on a multi-core processor but only if the Pipes are defined properly.

Data Sources

The data sources are configured via the $wgPipeDataConfig associative array. Here is an example LocalSettings.php entry:

 $wgPipeDataConfig = array(
   "perl" => "/usr/local/bin/perl",
   "files" => array(
     "errors" => "/data/errors",
     "giants" => "/data/giants",
     "runts" => "/data/runts"
   ),
   "databases" => array(
     "db1" => array(
       "type" => "mysql",
       "user" => "dbuser",
       "password" => "dbuser",
       "server" => "mydbserver.my.dom",
       "database" => "db1db",
       "port" => "3306",
       "tables" => array("table1", "table2", "table3")
     )
   )
 );

Data Source Configuration

The following $wgPipeDataConfig keys are processed by the extension:

perl
The path to the perl instance that will be run by the Pipes.php extension to execute the Pipe script. Note that where a chrooted environment is used this will be relative to the chroot directory.
files
An associative array with the keys being the name the pipe definitions will use to refer to the file and the values specifying the path (relative to the chroot directory if chroot is used) of the file.
databases
An associative array with the keys being the name the pipe definitions will use to refer to the database and the values the corresponding configuration data for the database.

Database Configuration

The database configuration entries each require the following key/value pairs:

type
Type of database: currently MySQL and Oracle are supported.
user
Username to access the database with. This user should only have read access to the database tables that the Pipes will access. All other access should be removed.
password
Password to access the database with.
server
FQDN or IP address of the server to connect to.
port
Port on the server to access the database on.
database
Name of the database on the server to access. If this value is missing, the key used to access this entry is used instead. Note that for Oracle this is the SID value.
tables
An array listing the tables that the database allows access to. This is used to catch spelling errors and not for security.


Note that for the database interface to work the Perl libraries must include the appropriate DBD drivers and, especially in the case of Oracle, the necessary operating environment needs to be set up. To pass environment variables to the chrooted execution environment it is necessary to edit runpipe and set the required values in the %ENV variable - use static values only, do not use existing values from the environment unless you know exactly what you are doing.

Activation

Add the following lines to LocalSettings.php:

 # use the Pipes extension
 include_once("extensions/Pipes.php");

Usage

The following examples assume that all of the pipes are defined on the page "TestPipe" and that the files used are already defined in LocalSettings.php. The files contain one entry per line, in the format

 device,interface,count

eg.

 rtr1,int5/1,12

Define some file pre-processing elements:

 <pipe-define name="PortGiants">
   <pipe-input name="GIANTS" type="file" fname="giants" action="split" pat="," />
   <pipe-action type="perl">
   foreach $p (@GIANTS) { $p->[2] > 1   && push(@err, $p); }
   </pipe-action>
   <pipe-result name="PortGiants" var="@err" />
 </pipe-define>
 <pipe-define name="PortRunts">
   <pipe-input name="RUNTS" type="file" fname="runts" action="split" pat="," />
   <pipe-action type="perl">
   foreach $p (@RUNTS) { $p->[2] > 1   && push(@err, $p); }
   </pipe-action>
   <pipe-result name="PortRunts" var="@err" />
 </pipe-define>
 <pipe-define name="PortErrors">
   <pipe-input name="ERRORS" type="file" fname="errors" action="split" pat="," />
   <pipe-action type="perl">
   foreach $p (@ERRORS) { $p->[2] > 1   && push(@err, $p); }
   </pipe-action>
   <pipe-result name="PortErrors" var="@err" />
 </pipe-define>
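
The filtering done by these three pipes can be tried outside the wiki with a few lines of plain Perl. The sketch below is an illustration only: the sample rows are made up, the split is written out by hand to mimic action="split" pat=",", and grep is used in place of the "&& push" idiom.

```perl
use strict;
use warnings;

# Made-up sample rows in the device,interface,count format.
my @lines = ('rtr1,int5/1,12', 'rtr1,int5/2,0', 'rtr2,int1/1,1');

# Mimics action="split" pat=",": each line becomes an array reference.
my @GIANTS = map { [ split /,/, $_ ] } @lines;

# Same condition as the pipe-action: keep rows whose count is greater than 1.
my @err = grep { $_->[2] > 1 } @GIANTS;

print join(',', @$_), "\n" for @err;
```

Only the first sample row has a count above 1, so it is the only row that would be passed on as the pipe result.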

A pipe that analyses the data for errors:

 <pipe-define name="PortLargeErrors">
   <pipe-input name="ERRORS" type="pipe" src="TestPipe.PortErrors" />
   <pipe-action type="perl">
   foreach $p (@ERRORS) { $p->[2] > 10   && push(@err, $p); }
   </pipe-action>
   <pipe-result var="@err" />
 </pipe-define>

The following pipe then combines the output from two pipes generating an array of entries where both GIANTS and RUNTS are greater than 0.

 <pipe-define name="PortRuntGiantError">
   <pipe-input name="GIANTS" type="pipe" src="TestPipe.PortGiants" />
   <pipe-input name="RUNTS" type="pipe" src="TestPipe.PortRunts" />
   <pipe-action type="perl">
   foreach $p (@RUNTS) { $port{$p->[0]}{$p->[1]}[0] = $p->[2]; }
   foreach $p (@GIANTS) { $port{$p->[0]}{$p->[1]}[1] = $p->[2]; }
   foreach $d (sort keys %port)
   {
     foreach $i (sort keys %{$port{$d}}) 
       { $port{$d}{$i}[0] != 0   && $port{$d}{$i}[1] != 0 && push(@RGERR, [ $d, $i, @{$port{$d}{$i}} ]); }
   }
   </pipe-action>
   <pipe-result var="@RGERR" />
 </pipe-define>
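
The join performed by this pipe can be followed in a standalone Perl script. This is a sketch with made-up sample data, written for use strict and use warnings, and it needs Perl 5.10 or later for the // operator. A port that appears in only one input has an undefined slot; the pipe above relies on Perl treating that as 0 in a numeric comparison, while the sketch makes it explicit.

```perl
use strict;
use warnings;

# Made-up outputs of the PortRunts and PortGiants pipes.
my @RUNTS  = ( [ 'rtr1', 'int5/1', 3 ],  [ 'rtr1', 'int5/2', 4 ] );
my @GIANTS = ( [ 'rtr1', 'int5/1', 12 ], [ 'rtr2', 'int1/1', 7 ] );

# Index both inputs by device and interface: slot 0 = runts, slot 1 = giants.
my %port;
$port{ $_->[0] }{ $_->[1] }[0] = $_->[2] for @RUNTS;
$port{ $_->[0] }{ $_->[1] }[1] = $_->[2] for @GIANTS;

my @RGERR;
for my $d (sort keys %port) {
    for my $i (sort keys %{ $port{$d} }) {
        # A port missing from one input has an undefined slot; treat it as 0.
        my ($runts, $giants) = map { $_ // 0 } @{ $port{$d}{$i} }[ 0, 1 ];
        push @RGERR, [ $d, $i, $runts, $giants ] if $runts && $giants;
    }
}

print join(',', @$_), "\n" for @RGERR;
```

With this data only rtr1 int5/1 appears in both inputs with non-zero counts, so it is the only row in the result.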

Similarly, this uses the combined output with another pipe to generate a single stream where all three of GIANTS, RUNTS and ERRORS are non-zero:

 <pipe-define name="PortSignificantError">
   <pipe-input name="ERRSIZE" type="pipe" src="TestPipe.PortRuntGiantError" />
   <pipe-input name="LERR" type="pipe" src="TestPipe.PortLargeErrors" />
   <pipe-action type="perl">
   foreach $p (@ERRSIZE) { @{$port{$p->[0]}{$p->[1]}}[0,1] = @{$p}[2,3]; }
   foreach $p (@LERR) { $port{$p->[0]}{$p->[1]}[2] = $p->[2]; }
   foreach $d (sort keys %port)
   {
     foreach $i (sort keys %{$port{$d}})
     {
       if ($port{$d}{$i}[0] != 0   && $port{$d}{$i}[1] != 0 && $port{$d}{$i}[2] != 0)
         { push(@ERR, [ $d, $i, @{$port{$d}{$i}} ]); }
     }
   }
   </pipe-action>
   <pipe-result var="@ERR" />
 </pipe-define>

The next pipe is a utility pipe, taking data in via the Interface2CardPortIn binding and outputting the result to the Interface2CardPortOut binding. It must be used in conjunction with a <pipe-execute> element. See later.

 <pipe-define name="Interface2CardPort">
   <pipe-input name="INDATA" type="bind" src="Interface2CardPortIn" />
   <pipe-action type="perl">
   foreach (@INDATA) { $_->[1] =~ /^int(\d+)\/(\d+)$/ and push(@$_, ($1, $2)); push(@OUTDATA, $_); }
   </pipe-action>
   <pipe-result name="Interface2CardPortOut" var="@OUTDATA" />
 </pipe-define>
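
What the utility pipe does to each row can be seen in a standalone Perl sketch. The sample rows are made up. Rows whose interface name does not match the int<card>/<port> pattern are passed through unchanged, which is also how the one-line version above behaves.

```perl
use strict;
use warnings;

# Made-up input rows; the second interface does not match the pattern.
my @INDATA = ( [ 'rtr1', 'int5/1', 12 ], [ 'rtr1', 'mgmt0', 3 ] );
my @OUTDATA;

for my $row (@INDATA) {
    # Append card and port only when the interface looks like int<card>/<port>.
    if ($row->[1] =~ m{^int(\d+)/(\d+)$}) {
        push @$row, $1, $2;
    }
    push @OUTDATA, $row;    # non-matching rows pass through unchanged
}

print join(',', @$_), "\n" for @OUTDATA;
```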

The following pipe takes the output from one pipe and passes it through the utility pipe. Note that in the <pipe-bind-in…> another variable could have been used if the data had been pre-processed by the pipe:

 <pipe-define name="PortSignificantErrorInt">
   <pipe-input name="ERR" type="pipe" src="TestPipe.PortSignificantError" />
   <pipe-execute src="TestPipe.Interface2CardPort">
     <pipe-bind-in name="Interface2CardPortIn" var="@ERR" />
     <pipe-bind-out name="Interface2CardPortOut" var="XERR" />
   </pipe-execute>
   <pipe-result var="@XERR" />
 </pipe-define>

Report Generation - this pipe actually generates the wiki text that will be rendered.

 <pipe-define name="PortSignificantErrorReport">
   <pipe-input name="ERR" type="pipe" src="TestPipe.PortSignificantErrorInt" />
   <pipe-action type="perl">
   foreach (@ERR) { $_->[0] eq $PARAM{dev}   && push(@OUT, $_); }
   print Table2Wiki(\@OUT, {tblattr => 'cellspacing="0" cellpadding="5" border="1"',
                            hdr => [ "Dev", "Port", "Card", "Int", "Runts", "Giants", "Large" ],
                            hdrattr => 'align="left" style="background-color: rgb(204, 204, 204);"',
                            oddattr => 'style="background-color: rgb(255, 255, 204);"',
                            columns => [0, 1, 5, 6, 2, 3, 4],
                           });
   </pipe-action>
 </pipe-define>

Run the test pipe. This version runs the pipe directly, using pipe-run's built-in wiki generation:

 <pipe-run name="TestPipe.PortSignificantErrorInt" >
   <pipe-format type="Tabler" >
   TblAttr cellspacing="0" cellpadding="5" border="1"
   Hdr Dev Int Card Port Runts Giants Errors
   HdrAttr align="left" style="background-color: rgb(204, 204, 204);"
   OddAttr style="background-color: rgb(255, 255, 204);"
   Columns 0 1 5 6 2 3 4
   </pipe-format>
 </pipe-run>

Run the report version. This runs the same pipe as the previous example, but via the report generation pipe. In this case the dev parameter is passed, causing the reporting pipe to report only on the specified device:

 <pipe-run name="TestPipe.PortSignificantErrorReport" >
   <pipe-param name="dev" value="rtr1" />
 </pipe-run>

Overview

Pipes are XML-based, so a number of rules must be followed:

  • parameters must be surrounded by double quotes (")
  • < cannot be used except in the XML syntax and must be replaced by &lt;
  • Outside of the XML wrapping &gt; will be replaced by > and &apos; will be replaced by '. These may be required in some cases
  • The > symbol can appear in the perl code but, as noted earlier, < must be replaced by &lt;
  • Elements that have no closing element (<br> in HTML is an example) must be ended with />. eg <br />
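
The rule for < can be illustrated with a small Perl helper that prepares a script for pasting into a <pipe-action> element. This is a sketch only: escape_for_pipe is a hypothetical name, not part of the Pipes distribution, and it assumes that < is the only character that has to be replaced, as the list above suggests.

```perl
use strict;
use warnings;

# Hypothetical helper: make a Perl snippet safe to paste into <pipe-action>.
# Assumes only "<" needs replacing; ">" and "&&" are left untouched.
sub escape_for_pipe {
    my ($code) = @_;
    $code =~ s/</&lt;/g;
    return $code;
}

print escape_for_pipe('foreach $p (@ERRORS) { $p->[2] < 10 && push(@small, $p); }'), "\n";
```

Note that the -> dereference operator contains >, which is allowed, so it passes through unchanged.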

Defining Pipes

Pipes have the general format

<pipe-define name="NAME" ...options...>
[ pipe processing elements ]
</pipe-define>

The pipe processing elements consist of one or more of the elements defined below. They are grouped into a number of areas:

  • Input data specification with the <pipe-input…> element
  • Script definition with the <pipe-action…> element
  • Pipe execution with the <pipe-execute…> element
  • Output specification with the <pipe-result…> element
  • Debugging options with the <pipe-debug…> element

In general the order of these elements is not important. However, the relative order of <pipe-action…> and <pipe-execute…> matters: if the <pipe-action…> comes before the <pipe-execute…>, the script is run before the pipe specified by the <pipe-execute…> is executed; otherwise it is run afterwards. There may be multiple <pipe-action…> and <pipe-execute…> elements. However, all the <pipe-action…> elements are combined into a single script and executed together. The <pipe-execute…> elements are executed in order, with the combined <pipe-action…> script run at the point of the first <pipe-action…> definition.
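
The ordering rule can be modelled in a few lines of Perl. This is a sketch of the rule as described here, not the extension's own code: execution_plan is a hypothetical function, and the element labels are made up.

```perl
use strict;
use warnings;

# Hypothetical model: return the run order for a pipe's elements.
# Each element is [ 'action' | 'execute', label ], in document order.
sub execution_plan {
    my @elements = @_;
    my (@plan, $script);
    for my $e (@elements) {
        my ($kind, $label) = @$e;
        if ($kind eq 'execute') {
            push @plan, "execute $label";
        }
        elsif ($kind eq 'action') {
            if (!$script) {
                $script = [];
                push @plan, $script;    # placeholder at the first action
            }
            push @$script, $label;      # later actions join the same script
        }
    }
    return map { ref($_) ? 'script ' . join('+', @$_) : $_ } @plan;
}

my @plan = execution_plan(
    [ execute => 'A' ], [ action => 's1' ], [ execute => 'B' ], [ action => 's2' ],
);
print "$_\n" for @plan;
```

In this example the second action, s2, is defined after the execution of B but still runs as part of the combined script, before B.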

pipe definition options

The following options are valid in the pipe-define element:

  • name - name of the pipe, used to refer to it. [NAMESPACE:]PAGETITLE.NAME is a unique reference to a pipe.

Input streams - pipe-input

The pipe-input element defines where a pipe gets the input data and how this maps to script variables. The following attributes are defined for the element:

  • name - name of the script variable the data will be bound to. This attribute is required.
  • type - the type of input element. This required attribute may be one of:
    • file for a file local to the wiki installation
    • data for data specified directly within the element
    • db for a pre-defined database interface (see Data Source configuration)
    • dbconnect for a direct connection to a pre-defined database interface (see Data Source configuration)
    • pipe for the output of another pipe
    • bind is used for pipes that are accessed via pipe-execute and defines the binding to the formal parameter

Multiple input streams from any mix of sources are supported as long as each one is bound to a different variable. The following sections detail the options for each of the input types

pipe-input type="file"

The following parameters are valid for the file input type

  • fname - name of file to access. This is defined in the wiki configuration. See Data Sources for more details. This parameter is required.
  • action - set to split to request that the input is split using the pattern specified in the pat attribute. This attribute is optional
  • pat - specifies the split pattern when action="split" is specified

The result of this input type is that the script variable specified in the pipe-input element, treated as an array, is filled with the contents of the file, one line per element. If action=split is specified the array will contain array references and not lines. Thus,

<pipe-input name="FOO" type="file" fname="foofile" />

With foofile containing

a,1
b,2

Will result in @FOO = ("a,1", "b,2"). The following

<pipe-input name="FOO" type="file" fname="foofile" action="split" pat="," />

With the same data will result in @FOO = (["a", 1], ["b", 2]);
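
The two results above can be reproduced in plain Perl. The sketch below is an illustration and not the extension's code: read_input is a hypothetical helper, an in-memory string stands in for foofile, and pat is assumed to be used as a regular expression.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from a filehandle, optionally splitting each.
sub read_input {
    my ($fh, $pat) = @_;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        # Assumption: pat is used as a regular expression.
        push @rows, defined $pat ? [ split /$pat/, $line ] : $line;
    }
    return @rows;
}

my $foofile = "a,1\nb,2\n";    # stands in for the contents of foofile

open my $fh, '<', \$foofile or die "cannot open in-memory file: $!";
my @plain = read_input($fh);         # no action attribute: plain lines
close $fh;

open $fh, '<', \$foofile or die "cannot open in-memory file: $!";
my @split = read_input($fh, ',');    # action="split" pat=","
close $fh;

print "plain: @plain\n";
print "split: ", join(' ', map { "[@$_]" } @split), "\n";
```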

pipe-input type="data"

The following parameters are valid for the data input type

  • split - specifies the split pattern used to split the fields in the input. If missing the lines are not split. An empty string ("") splits on white space.

The result of this input type is that the script variable specified in the pipe-input element, treated as an array, is filled with the contents of the elements data, one line per element. If split="..." is specified the array will contain array references and not lines. Thus,

<pipe-input name="FOO" type="data" split="">
a 1
b 2
</pipe-input>

Will result in @FOO = (["a", 1], ["b", 2]);
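
The special case for an empty split value is worth a note for anyone reading the generated Perl: an empty pattern in Perl's split breaks a string into single characters, so splitting on white space needs the ' ' form instead. The following sketch shows the three cases described above. split_fields is a hypothetical helper, and treating a non-empty value as a regular expression is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper modelling the split attribute of type="data".
sub split_fields {
    my ($line, $split) = @_;
    return $line unless defined $split;             # attribute missing: no split
    return [ split ' ', $line ] if $split eq '';    # "": split on white space
    return [ split /$split/, $line ];               # assumption: value is a regex
}

print join('|', @{ split_fields('a 1', '') }), "\n";
print join('|', @{ split_fields('b:2', ':') }), "\n";
print split_fields('c 3'), "\n";
```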

pipe-input type="db"

The following parameters are valid for the db input type

  • dbname - name of database to access. This is defined in the wiki configuration. See Data Sources for more details. This parameter is required.
  • tname - the table name to access in the database - this is required
  • cond - the conditional component of the SQL statement. In the absence of the sql attribute the SQL will be set to select * from tname where cond. This attribute is optional and if missing the SQL statement becomes select * from tname.
  • sql - a complete SQL statement to use. This replaces the automatically generated statement if present. If the element contents are not empty the value of this attribute is set to the contents.

With the cond and sql attributes the parameter reference $PARAM{...} is replaced by the corresponding value from the pipe-param or pipe-param-post element. See Passing parameters - pipe-param
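
The parameter replacement can be pictured with the following Perl sketch. It is not the extension's implementation: expand_params is a hypothetical function, and replacing an unknown parameter with an empty string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $PARAM{name} references in a cond or sql string.
sub expand_params {
    my ($text, $param) = @_;
    # Assumption: a reference to an unknown parameter becomes an empty string.
    $text =~ s/\$PARAM\{(\w+)\}/exists $param->{$1} ? $param->{$1} : ''/ge;
    return $text;
}

my $cond = expand_params(q{name='$PARAM{dev}'}, { dev => 'rtr1' });
print "select * from mytable where $cond\n";
```

The sketch copies the value into the statement verbatim, with no quoting. Whether the extension quotes values is not stated here, so parameters that come from post or form data should be treated with care.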

The results of the query are placed in the variable specified in the pipe-input element (treated as an array). The net result is an array of array references. Thus,

<pipe-input name="FOO" type="db" dbname="mydb" tname="mytable" />

Will access mydb table mytable using select * from mytable. The following

<pipe-input name="FOO" type="db" dbname="mydb" tname="mytable" cond="name='bar'" />

Will access mydb table mytable using select * from mytable where name='bar'. The same result is obtained with

<pipe-input name="FOO" type="db" dbname="mydb" tname="mytable" sql="select * from mytable where name='bar'" />

and similarly

<pipe-input name="FOO" type="db" dbname="mydb" tname="mytable" >
select * from mytable where name='bar'
</pipe-input >

pipe-input type="dbconnect"

The following parameters are valid for the dbconnect input type

  • dbname - name of database to access. This is defined in the wiki configuration. See Data Sources for more details. This parameter is required.

This input method provides a direct connection to the back-end database, returning the DBI handle in the specified scalar variable. The script can then make use of this handle to access the database. The method is provided to allow strong control over the database access allowing complex scripted queries to be created. Thus,

<pipe-input name="DBH" type="dbconnect" dbname="mydb" />

Will create a connection to mydb and return the DBI handle in the variable $DBH. The <pipe-action...> can then make use of this handle:

<pipe-action type="perl" >
$sql = "select * from mytable where name='bar'";
$sth = $DBH->prepare($sql);
...
</pipe-action >
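
A fuller version of the query step might look like the sketch below. fetch_rows is a hypothetical helper, not part of MWPipeTools; it uses only the standard DBI calls prepare, execute, fetchall_arrayref and errstr, and it passes the value through a ? placeholder rather than building it into the SQL string.

```perl
use strict;
use warnings;

# Hypothetical helper: run a query on a DBI handle and return all rows
# as a reference to an array of array references.
sub fetch_rows {
    my ($dbh, $sql, @bind) = @_;
    my $sth = $dbh->prepare($sql)
        or die "prepare failed: " . $dbh->errstr . "\n";
    $sth->execute(@bind)
        or die "execute failed: " . $sth->errstr . "\n";
    return $sth->fetchall_arrayref;
}

# Inside a <pipe-action>, with $DBH supplied by type="dbconnect", the call
# might look like this (assumption, shown as a comment because no database
# is available in a standalone script):
#   @OUT = @{ fetch_rows($DBH, "select * from mytable where name = ?", 'bar') };
```

The result has the same shape as the output of type="db", an array of array references, so it can be handed to pipe-result in the same way.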

pipe-input type="pipe"

The following parameters are valid for the pipe input type

  • src - name of pipe to execute. This specifies the name of the pipe to run in the format [NAMESPACE:]PAGETITLE.PIPENAME. If NAMESPACE: is missing it defaults to (main).
  • data - specifies the name of the output data item to bind to. This defaults to PIPENAME, which is also the default for pipe-result.

The result of this input type is that the script variable specified in the pipe-input element is set to the output value from the specified pipe. Pipes support compound data types being passed from pipe to pipe, and as such the documentation for the pipe being run needs to be consulted to determine the type and contents of its output variable.

<pipe-input name="FOO" type="pipe" src="Pipes:My Pipe Page.My Pipe" />

With Pipes:My Pipe Page.My Pipe outputting an array (@RESULT) this will set @FOO to the contents of the array. If Pipes:My Pipe Page.My Pipe output a hash (%RESULT) then %FOO would be set to the contents of this hash. The following

<pipe-input name="FOO" type="pipe" src="Pipes:My Pipe Page.My Pipe" data="OutHash" />

Will set %FOO to the contents of the Pipe Pipes:My Pipe Page.My Pipe result variable with the name OutHash (assuming the output value from the pipe is a hash!). See Output stream - pipe-result for more details on specifying output variables.
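
The [NAMESPACE:]PAGETITLE.PIPENAME format used by src can be taken apart with a regular expression, as in the sketch below. parse_pipe_ref is a hypothetical function, and two details are assumptions rather than documented behaviour: the namespace ends at the first colon, and the pipe name starts after the last dot.

```perl
use strict;
use warnings;

# Hypothetical helper: split a pipe reference into namespace, page and pipe.
# Assumptions: namespace ends at the first ":", pipe name follows the last ".".
sub parse_pipe_ref {
    my ($ref) = @_;
    my ($ns, $page, $pipe) = $ref =~ /^(?:([^:]+):)?(.+)\.([^.]+)$/
        or return;
    my %parts = (
        namespace => defined $ns ? $ns : '(main)',
        page      => $page,
        pipe      => $pipe,
    );
    return \%parts;
}

for my $ref ('Pipes:My Pipe Page.My Pipe', 'TestPipe.PortErrors') {
    my $p = parse_pipe_ref($ref);
    print "$ref => namespace=$p->{namespace} page=$p->{page} pipe=$p->{pipe}\n";
}
```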

pipe-input type="bind"

The following parameters are valid for the bind input type

  • src - name of the binding variable to connect to. This is in effect the procedure side of a call using parameter passing by name.

The result of this input type is that the script variable specified in the pipe-input element is set to the value passed to the pipe in the variable specified in the src attribute. Thus,

<pipe-input name="FOO" type="bind" src="indata" />

will result in FOO containing the value passed in indata. The type will depend on the type of data passed in indata. See Input parameter passing - pipe-bind-in for more details on passing parameters to a pipe.

Data processing - pipe-action

pipe-action specifies the script that performs the processing part of the pipe. While optional, it is required for the pipe to do anything useful, except for pipes that perform data formatting with the built-in formatting elements. The following attributes are supported:

  • type - type of script. Currently only "perl" is supported. This attribute is required.

Eg,

<pipe-action type="perl" >
  foreach (@INDATA) { $_->[0] =~ /^int(\d+)\/(\d+)$/ and push(@$_, ($1, $2)); push(@OUTDATA, $_); }
</pipe-action >

During the execution of the pipe, in addition to the specified input binding variables, the PARAM hash is available and contains the parameters passed by the <pipe-run...> through the <pipe-param...> and <pipe-param-post...> elements. Thus, in the following example, parameter dev is accessed as $PARAM{dev} in any of the pipes that are executed.

<pipe-run name="TestPipe.PortSignificantErrorReport" >
  <pipe-param name="dev" value="rtr1" />
  <pipe-param name="mode" value="full" />
</pipe-run>

Output stream - pipe-result

pipe-result specifies the output or result of the pipe's execution. The following attributes are supported:

  • var - the variable that contains the result. The type of variable ($@%) must be specified. Variables can be compound although code references will not work correctly.
  • name - name of the output variable. Defaults to the name of the pipe (which ties in with <pipe-input type="pipe".../>)

The result is that the results of the pipe are available via name. Thus, for the pipe Pipes:My Page.My Pipe

<pipe-result var="@XERR" />

Will make @XERR, the result of the pipe's execution, available under the name My Pipe. A pipe using this output would then bind to it using:

<pipe-input name="ERR" type="pipe" src="Pipes:My Page.My Pipe" />

Specifying an additional result with its own name allows multiple outputs from a single pipe:

<pipe-result var="@DATA" />
<pipe-result var="@XERR" name="Errors" />

However, only pipe-execute can make use of multiple output variables; this is an implementation limitation. See Building pipe chains - pipe-execute.

Building pipe chains - pipe-execute

The following attributes are valid for the pipe-execute element:

  • src - name of pipe to execute. This specifies the name of the pipe to run in the format [NAMESPACE:]PAGETITLE.PIPENAME. If NAMESPACE: is missing it defaults to (main). This attribute is required

Data is passed in to the pipe using the pipe-bind-in child element of the pipe-execute element and similarly results are obtained by using the pipe-bind-out child element. Thus the normal structure of a pipe-execute element is:

<pipe-execute src="Page.Pipe">
  <pipe-bind-in name="PipeIn" var="@INVAR" />
  <pipe-bind-out name="PipeOut" var="OUTVAR" />
</pipe-execute>

Input parameter passing - pipe-bind-in

pipe-bind-in binds a perl variable to the pipe input variable (specified in the <pipe-input type="bind"...> element) of the pipe being executed. The following attributes are valid for the pipe-bind-in element:

  • name - name of pipe input variable to bind to. This attribute is required
  • var - name of perl variable to bind name to. This attribute is required
  • data - value to set the contents of the variable to. If set, it is used to populate var, allowing static content to be bound to a pipe parameter. The contents of the element itself, if non-blank, are used for this attribute. Note that how the data is copied to the variable depends on the perl type of the variable. For a scalar the value is just the plain data. For an array the data is split on newlines. For a hash the data is split on newlines and each line is treated as key <space> value. This attribute is not required when the element has contents: the variable is then populated automatically from the element's contents.

Multiple pipe-bind-in elements are permitted and will create multiple bindings.

Thus, in the following the perl variable @ERR will be bound to the pipe variable PipeIn:

<pipe-execute src="My Page.My Pipe">
  <pipe-bind-in name="PipeIn" var="@ERR" />
  <pipe-bind-out name="PipeOut" var="XERR" />
</pipe-execute>

In the following the Perl variable $VAL's value will be set to some data and bound to the pipe variable PipeIn:

<pipe-execute src="My Page.My Pipe">
  <pipe-bind-in name="PipeIn" var="$VAL" data="some data" />
  <pipe-bind-out name="PipeOut" var="XERR" />
</pipe-execute>

Similarly, in the following the Perl variable $VAL's value will be set to 1 2 3 and bound to the pipe variable PipeIn:

<pipe-execute src="My Page.My Pipe">
  <pipe-bind-in name="PipeIn" var="$VAL">1 2 3</pipe-bind-in>
  <pipe-bind-out name="PipeOut" var="XERR" />
</pipe-execute>
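
How the data attribute fills a variable depending on its type can be modelled in Perl. The sketch below follows the description of the data attribute given above; bind_data is a hypothetical function, and the variable names and sample data are made up.

```perl
use strict;
use warnings;

# Hypothetical model of how static data fills a variable, chosen by its sigil.
sub bind_data {
    my ($var, $data) = @_;
    my $sigil = substr $var, 0, 1;
    return $data if $sigil eq '$';         # scalar: the plain data
    my @lines = split /\n/, $data;
    return \@lines if $sigil eq '@';       # array: one element per line
    if ($sigil eq '%') {                   # hash: "key value" on each line
        my %h = map { my ($k, $v) = split ' ', $_, 2; ($k => $v) } @lines;
        return \%h;
    }
    die "unsupported variable type: $var\n";
}

print bind_data('$VAL', '1 2 3'), "\n";
print scalar @{ bind_data('@ROWS', "a\nb\nc") }, " elements\n";
my $limits = bind_data('%LIMITS', "warn 10\ncrit 50");
print "$_ => $limits->{$_}\n" for sort keys %$limits;
```

For a hash, everything after the first run of white space on a line is taken as the value, so values may themselves contain spaces. That detail is a choice made in this sketch.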

Output parameter passing - pipe-bind-out

pipe-bind-out binds a perl variable to the pipe output variable (specified in the <pipe-result ...> element) of the pipe being executed. The following attributes are valid for the pipe-bind-out element:

  • name - name of pipe output variable to bind to. If the name of the variable is not specified in the <pipe-result...> of the executed pipe then it defaults to the name of the pipe. This attribute is required
  • var - name of perl variable to bind name to. This attribute is required.

Thus, in the following the perl variable @XERR will be bound to the pipe variable PipeOut:

<pipe-execute src="My Page.My Pipe">
  <pipe-bind-in name="PipeIn" var="@ERR" />
  <pipe-bind-out name="PipeOut" var="XERR" />
</pipe-execute>

When this element is executed the pipe My Page.My Pipe will be run with the pipe parameter PipeIn set to the value of @ERR. The results will be placed in @XERR.

Debugging Options

Debugging pipes can be difficult without seeing the input and output data. To help with this process the pipe-define element accepts a pipe-debug child element that controls the reporting level. The output from debugging is appended to the page, after the normal output. The format of the element is:

<pipe-debug class="CLASS" />

pipe-debug

The pipe-debug element accepts the following attributes:

  • class - specifies the class of debugging output to produce. One of:
    • all - generate all classes of output
    • datain - dump the pipe input data
    • dataout - dump the pipe output data
    • pipe - debug the pipe flow, showing the execution of each of the pipes


Running a pipe

Pipes are executed using the pipe-run element. These elements have the format:

<pipe-run name="NAME" ...options...>
[ pipe execution elements ]
</pipe-run>

Where pipe execution elements are one or more of the following elements:

  • <pipe-param…> - used to pass parameters to the pipe.
  • <pipe-param-post…> - used to pass post and form data to the pipe.
  • <pipe-param-post-re…> - used to pass post and form data to the pipe based on a regular expression.
  • <pipe-param-magic…> - used to pass page context magic variables to the pipe.
  • <pipe-run-if-set…> - execution based on a parameter or post value being set or not
  • <pipe-format…> - formatting of the pipe results
  • <pipe-debug-global…> - global debugging

For example:

<pipe-run name="TestPipe.PortSignificantErrorReport" >
  <pipe-param name="dev" value="cr12-ln16" />
  <pipe-param name="mode" value="full" />
  <pipe-param-post name="report" />
  <pipe-run-if-set name="report" />
</pipe-run>

Pipe execution - pipe-run

The pipe-run element executes a specified pipe. The following attributes are defined:

  • name - specifies the name of the pipe to run. This must be the full name including namespace (if it's not MAIN), page name and pipe name in the format [{NAMESPACE}:]PAGE.PIPE.
  • dumpcode - set to yes to dump the generated Perl code
  • dumpwiki - set to yes to dump the wiki text output by the pipe
  • dumprequest - set to yes to dump the values passed to the script via POST. This can be used to help debug FORM based pipes.

The results of the pipe execution are passed to the wiki text rendering engine. As a result any valid wiki text, including extensions, may be used in the final output. The standard error output from the pipe is appended to the final page as a pre formatted block, as is the output from any of the selected dumpxxx attributes.

The execution time and amount of output that a pipe can generate is controlled by the installation parameters. By default these are 64k of wiki text, 64k of error output and 30 seconds real time execution. Note that long running pipes are a bad idea for web pages as the users will often get bored after 15 seconds and give up! In general large amounts of processing should be performed off-line with only high quality data accessed and processed by pipes.

Passing parameters - pipe-param

The pipe-param element allows values to be passed to the Pipe. In Perl this is performed via the variable %PARAM. The following attributes are accepted:

  • name - specifies the name of the parameter
  • value - specifies the value

value may be omitted, in which case the element must have content, and that content becomes the value.

<pipe-param name="reporttable" value="yes" />

This will pass the value yes in the parameter reporttable to the script. In Perl this is compiled to setting $PARAM{reporttable} to yes. The following shows the content version of the same:

<pipe-param name="reporttable">yes</pipe-param>
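
As a minimal sketch of how action code might consume such a parameter, the following Perl fills %PARAM by hand so that it runs standalone (in a real pipe the extension is described as setting it). The report_mode helper is hypothetical and not part of the extension.

```perl
use strict;
use warnings;

# In a real pipe %PARAM is described as being set by the extension;
# it is filled by hand here so the sketch runs on its own.
our %PARAM = (reporttable => 'yes');

# Hypothetical helper: choose an output style from the parameter.
sub report_mode {
    my ($param) = @_;
    my $flag = $param->{reporttable} // 'no';   # treat a missing value as "no"
    return $flag eq 'yes' ? 'table' : 'text';
}

print report_mode(\%PARAM), "\n";
```

Passing the hash by reference keeps the helper testable without depending on the global variable.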

Passing forms data - pipe-param-post

The pipe-param-post element allows values to be passed to the Pipe via a form. In Perl this is performed via the variable %PARAM. The default value is specified in the same way as <pipe-param...>. The following attributes are accepted:

  • name - specifies the name of the parameter
  • value - specifies the default value if the form value is missing

value may be omitted, in which case there is no default value. If the element has content, that content is used as the default value. Thus,

<pipe-param-post name="reporttable" />

Will attempt to access reporttable from the request data with no default value.

<pipe-param-post name="reporttable" value="yes" />

This will pass the value of reporttable from the request, with a default value of yes if the variable is not present. In Perl this sets $PARAM{reporttable} to the value. The following shows the content version of the same:

<pipe-param-post name="reporttable">yes</pipe-param-post>
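
The fallback behaviour described above can be sketched in Perl as follows. This is an illustration under stated assumptions, not the extension's code: post_param and the %request hash standing in for the submitted form data are both hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the described behaviour: take the value
# from the request data when present, otherwise fall back to the default
# (which is undefined when no default was given).
sub post_param {
    my ($request, $name, $default) = @_;
    return exists $request->{$name} ? $request->{$name} : $default;
}

my %request = (device => 'cr12-ln16');   # stand-in for submitted form data
my %PARAM;
$PARAM{reporttable} = post_param(\%request, 'reporttable', 'yes');
$PARAM{device}      = post_param(\%request, 'device');

print "$_=$PARAM{$_}\n" for sort keys %PARAM;
```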

Passing forms data - pipe-param-post-re

The pipe-param-post-re element is similar to pipe-param-post but passes the values of parameters whose name matches a specified pattern to the Pipe. The following attributes are accepted:

  • name - specifies a Perl regular expression that is used to match post parameter names

This element is important for check boxes associated with tables of values where the name of the check box variable is automatically created. Thus,

<pipe-param-post-re name="^del_" />

Will copy all post parameters that start with del_. In Perl this will set $PARAM{del_...} to the value.
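
A minimal Perl sketch of the matching step, assuming the request data is available as a plain hash. The copy_matching helper and the sample parameter names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every request parameter whose name matches
# the pattern into a new parameter hash.
sub copy_matching {
    my ($request, $pattern) = @_;
    my %param;
    for my $name (grep { /$pattern/ } keys %$request) {
        $param{$name} = $request->{$name};
    }
    return %param;
}

# Stand-in for check boxes generated for a table of values.
my %request = (del_17 => 'on', del_42 => 'on', report => 'full');
my %PARAM   = copy_matching(\%request, qr/^del_/);

print join(',', sort keys %PARAM), "\n";
```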

Passing MediaWiki magic word values - pipe-param-magic

The pipe-param-magic element allows values to be passed to the Pipe from MediaWiki's magic words. In Perl this is performed via the variable %PARAM. The value is the string to pass, with all {{...}} replaced by the value of the corresponding MediaWiki magic words.

  • name - specifies the name of the parameter
  • value - string to replace magic words in

Thus,

<pipe-param-magic name="revision" value="Revision is {{REVISIONID}} " />

Will set $PARAM{revision} to 'Revision is 1', assuming the page revision is 1.
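
The substitution can be illustrated with a short Perl sketch. The %magic lookup table is a hypothetical stand-in for MediaWiki's own magic word resolution, and leaving unknown names untouched is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for MediaWiki's magic word lookup.
my %magic = (REVISIONID => 1);

# Replace each {{NAME}} with its value; unknown names are left untouched
# (an assumption, the extension may behave differently).
sub expand_magic {
    my ($text, $lookup) = @_;
    $text =~ s/\{\{(\w+)\}\}/exists $lookup->{$1} ? $lookup->{$1} : '{{' . $1 . '}}'/ge;
    return $text;
}

print expand_magic('Revision is {{REVISIONID}}', \%magic), "\n";
```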

Conditional execution - pipe-run-if-set

When pipes and forms are used together it is often necessary to prevent pipe execution until a parameter has been entered. To make this possible the <pipe-run-if-set.../> element is used to test a specific parameter and only execute the pipe if it has been set. The format of the element is:

<pipe-run-if-set name="PARAM" />

Where, PARAM is the name of the parameter to test. Multiple <pipe-run-if-set.../> elements may be present and all must be satisfied for the pipe to execute.
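
The "all must be satisfied" rule can be sketched in Perl as below. The should_run helper is hypothetical, and treating "set" as "defined and non-empty" is an assumption; the extension may use a different test.

```perl
use strict;
use warnings;

# Hypothetical check: the pipe runs only when every named parameter is set.
# "Set" is taken here to mean defined and non-empty, which is an assumption.
sub should_run {
    my ($param, @required) = @_;
    for my $name (@required) {
        return 0 unless defined $param->{$name} && length $param->{$name};
    }
    return 1;
}

my %PARAM = (report => 'full', dev => '');
print should_run(\%PARAM, 'report') ? "run\n" : "skip\n";
print should_run(\%PARAM, 'report', 'dev') ? "run\n" : "skip\n";
```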

Formatting the output - pipe-format

The pipe-run element runs the specified pipe and expects it to output the wiki-text. However, in some cases it may be desirable to run a pipe that does not output wiki text but returns the results instead. To allow this type of execution the pipe-format element exists.

The following formatting options exist:

  • Texter - converts output to simple text
  • Tabler - Creates a table

The general format of the pipe-format element is:

<pipe-format type="TYPE" bind="VAR">
...format parameters...
</pipe-format>

Where,

  • TYPE is one of the above types
  • VAR is the name of the variable to bind to in the pipe's results. This is optional and defaults to the name of the pipe being run.
  • format parameters are type specific and detailed in the following sections

pipe-format type="Tabler"

The Tabler formatting type creates a table from the pipe output. The element content is used to drive the formatting process. Each line of the content must be in the format param, a space, then value. The following params are defined:

  • TblAttr - the table wiki attributes
  • Hdr - A space separated list of table column headers. To include a space in a column header enclose the whole header in double quotes (")
  • HdrAttr - the header wiki attributes
  • EvenAttr - attributes for even rows in the table
  • OddAttr - attributes for odd rows in the table
  • Columns - space separated list of column numbers to include in the output (0 based). This can be used to re-order the columns in the table.

Example:

<pipe-format type="Tabler" >
  TblAttr cellspacing="0" cellpadding="5" border="1"
  Hdr GPIN EMPLID UBS_FIRST_NAME UBS_LAST_NAME
  HdrAttr align="left" style="background-color: rgb(204, 204, 204);"
  OddAttr style="background-color: rgb(255, 255, 204);"
  Columns 0 3 2 7
</pipe-format>
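
Two of these parameters map onto familiar Perl idioms. The sketch below is illustrative only and does not show how the extension itself parses the content: it uses the core Text::ParseWords module to split a header list while keeping double-quoted headers whole, and an array slice to select and re-order columns. The pick_columns helper and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Split a header list on spaces while keeping double-quoted headers whole.
my @hdr = shellwords('GPIN EMPLID "First Name" "Last Name"');

# Select and re-order columns (0 based) from one row with an array slice.
sub pick_columns {
    my ($row, @columns) = @_;
    return @{$row}[@columns];
}

my @row = qw(a b c d e f g h);
print join('|', @hdr), "\n";
print join('|', pick_columns(\@row, 0, 3, 2, 7)), "\n";
```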

pipe-format type="Texter"

The Texter formatter converts the output, which must be an array, into a string joined with a supplied string. The element content is used to drive the formatting process. Each line of the content must be in the format param, a space, then value. The following params are defined:

  • Join - the string to use to separate the values, defaults to space
  • Columns - columns to extract from the array. Defaults to all.

Example:

<pipe-format type="Texter" >
Columns 1
Join ,
</pipe-format>

Note that if columns is missing then all columns will be joined. The default join string is space. The data passed to Texter can be a simple list, an array of array references or an array of hash references. In the latter case the values will be the columns.
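
One plausible reading of this behaviour is sketched below in Perl. It is a hypothetical re-implementation for illustration, not the extension's code: it assumes rows are flattened in order, that columns are 0 based indexes, and it covers only the simple list and array of array references cases.

```perl
use strict;
use warnings;

# Hypothetical re-implementation, not the extension's code. Assumes rows are
# flattened in order and that "columns" is a list of 0 based indexes.
sub texter_sketch {
    my ($data, %opt) = @_;
    my $join = defined $opt{join} ? $opt{join} : ' ';   # default join is a space
    my @out;
    for my $item (@$data) {
        if (ref $item eq 'ARRAY') {
            # Keep the requested columns, or every column when none are given.
            my @cols = $opt{columns} ? @{ $opt{columns} } : (0 .. $#$item);
            push @out, @{$item}[@cols];
        }
        else {
            push @out, $item;    # simple list element
        }
    }
    return join $join, @out;
}

my @rows = ([ 'cr12', 'up' ], [ 'cr13', 'down' ]);
print texter_sketch(\@rows, columns => [1], join => ','), "\n";
print texter_sketch([qw(a b c)]), "\n";
```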

Using Tabler Directly

It is possible to use Tabler functionality directly in the action element of a pipe. To do so in Perl, call the Tabler function as follows:

$out = Table2Wiki(\@data, \%config);

Here the array @data contains the table data to be output as an array of array refs. %config contains the Tabler configuration information: the keys are the lower case versions of the Tabler configuration items, with all but hdr and columns being strings. The hdr and columns values are array references. For example:

$out = Table2Wiki(\@ERR, {tblattr => 'cellspacing="0" cellpadding="5" border="1"',
                          hdr => [ "Int", "Port", "Runts", "Giants", "Large"],
                          hdrattr => 'align="left" style="background-color: rgb(204, 204, 204);"',
                          oddattr => 'style="background-color: rgb(255, 255, 204);"',
                          columns => [4, 5, 1, 2, 3]});
print $out;
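
Data often arrives as a list of hash references rather than the array of array refs expected here. As a minimal sketch, the following Perl converts one shape into the other with a hash slice. The to_rows helper, the field names and the sample records are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records, for example rows fetched from a database.
my @records = (
    { int => 'Gi0/1', port => 1, runts => 0, giants => 2 },
    { int => 'Gi0/2', port => 2, runts => 5, giants => 0 },
);

# Fix the field order once so every row has the same column layout.
my @fields = qw(int port runts giants);

# Turn each hash reference into an array reference using a hash slice.
sub to_rows {
    my ($records, @fields) = @_;
    return map { [ @{$_}{@fields} ] } @$records;
}

my @data = to_rows(\@records, @fields);
print join(' ', @$_), "\n" for @data;
```

Inside a pipe, the resulting @data could then be passed by reference as the first argument described above.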

Global debugging - pipe-debug-global

Debugging can be enabled globally by using the <pipe-debug-global... /> element. In this case, the specified debugging is applied to all pipes. The format of the element is:

<pipe-debug-global class="CLASS" />

For a list of valid classes, see the pipe-debug section.

Security Implications

General Security

Database passwords are not secure
Due to the way the back end implementation works the final Perl source code can be read by even a moderate programmer. In fact, for debugging purposes the code can be dumped by the Pipe writer. This means any database passwords are not secure. In general, the database should be protected via its own security and, as appropriate, a firewall such as IPTables.
Files are protected via OS security
The file linkage is more a convenience than anything else. The Perl code can easily just open the file and read/write. This means that the files must have proper access control. Under Unix this means read only access and ownership by a different user to the one the Pipes are run as, otherwise the script can just change the permissions.
Any executable accessible by the Pipe code can be run
This has significant implications not least of which is that it would make turning your web server into a mail spam generator very easy.
Any Perl code can be run
This basically means that your server can be turned into a spam bot in half a dozen lines of code. Mix this with execution of a mail transfer agent and you have a spammer's dream. This can be mitigated with a firewall, even IPTables or similar, by blocking all outbound traffic except existing sessions.
Resource usage controls are very crude
A moderately clever user could create a pipe that did password cracking, for example, and with a small amount of cleverness get around the per pipe time limitations.

All in all this adds up to a number of implications for any web site using pipes:

  1. Do not connect any such server directly to the internet.
  2. Make sure all users are strongly authenticated and identified. In general a Web server based authentication process is preferred over MediaWiki's built in authentication.
  3. Make sure any databases and database servers are properly protected and that the Pipes user has minimum access

Linux specific mitigation

Under Linux, and in general most Unix variants, the implementation makes a number of attempts to protect the host environment from accidental as well as deliberate damage.

Scripts are run in a restricted environment
The Linux version runs the final Pipe script in a chrooted environment with no write access to the local disk. This prevents the disk being filled up but may prevent some Perl modules from working.
Output is size limited
To prevent huge amounts of output the script is limited to a specific output size, 64K by default. Error output is similarly controlled. These controls are performed outside of the script and cannot be easily subverted.
Execution time is limited
To prevent resource hogging as well as ensure a good user experience the execution time (real not CPU) is limited to a set time. By default this is 10 seconds.

Additional Linux Security

In addition the following can improve the security:

Block all outbound and inbound traffic except inbound to port 80, outbound established sessions and to/from specific machines
This will prevent the web server being turned into a spam bot or web server hacking tool. This works best if the server is restricted to providing the Pipes enabled Wiki service.



FAQ

Why!?

Pipes was born out of a need to process and generate reports on data coming from many different sources. Traditionally, this would have been solved by developing a fat database application. The problem with this approach is the development time and resources required. Also, often new data sources are added and old ones change, requiring updates to the software.

Another big issue with the traditional approach is data ownership. Often, especially in large corporations, there are well defined data owners with their own data requirements. This makes developing the fat application more difficult as more data owners need to buy into the project.

Enter Pipes. The idea is to separate the three parts of a data processing application: Data gathering, data processing and report generation. The data gathering in Pipes is the configuration of the backend data access. Ideally, all data should be aggregated close to the Wiki server via connectors that standardise the data and insulate the Wiki applications from schema changes as well as database migrations etc. The rendering of the final report is significantly simplified by the Wiki, allowing wikitext and extensions to be used to generate the final output.

The processing is where Pipes adds the value: basically, pipes are processing components that can be glued together. So a database of widgets produced in each factory can be passed through a factory-to-country component and then a summary component, which finally renders as a pie chart of the number of widgets per country. All the components are re-usable as they each perform a fairly generic task: pulling widget data from a database, mapping factory to country and summarising data.

The net result is applications can be developed very quickly with minimum effort.

Why Perl?

Mainly because I do most of my data processing in Perl and providing a MediaWiki interface to this type of work was my main goal. PHP would have been the other obvious choice, but I don't speak it with sufficient fluency to do some of the tricks I needed to. Also, I am not sure that runpipe.pl could have been written in PHP, especially when the security features are included.

What is the performance like?

Pipe definitions are compiled at save time so there is little overhead for syntax checking and validation. Pipe code generation is fast as it simply glues bits of code together into a script and executes it. Execution wise, each pipe-run incurs the overhead of starting a separate process. This isn't large, but having hundreds of pipe-run elements on a page will have a performance impact. In general the script itself will have the most impact on performance.

The following guidelines should help avoid significant performance issues:

  • Pre-process data requiring large amounts of crunching. This will have the biggest single impact on performance in any realistic applications. For example, in capacity planning applications pre-creating averages, min-max values etc, and storing them in separate tables will allow the pipe application to leverage this pre-processing and dramatically speed up the pipe execution.
  • Generate tables with more than a couple of rows in a single pipe-run rather than having a pipe-run per table entry.
  • Hashes are your friend and almost as fast as scalars. Use them instead of loops searching an array.
  • When processing large arrays use map and grep rather than foreach, as a lot of interpreter overhead is avoided.
  • Where loops are executed a large number of times keep the code in the loop to a minimum and use the following speed freak tricks as appropriate:
    • turning an array into a hash: @x{@x} = (1) x @x;
    • processing every element of an array: grep { <bit of code> } @x. Eg grep { $c++ if $_ eq "yes" } @x;
    • reversing a hash mapping: %r = reverse %h;
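
The three idioms above can be tried in a small standalone Perl script. This is a sketch with hypothetical helper names and sample data; it counts matches with grep in scalar context rather than by incrementing a counter inside the block.

```perl
use strict;
use warnings;

# Turn a list into a hash for constant-time membership tests.
sub to_set {
    my %set;
    @set{@_} = (1) x @_;
    return \%set;
}

# Count matching elements; grep in scalar context returns the match count.
sub count_yes {
    return scalar grep { $_ eq 'yes' } @_;
}

# Reverse a hash mapping (values must be unique to survive the reversal).
sub invert {
    my ($h) = @_;
    my %r = reverse %$h;
    return \%r;
}

my @x   = qw(yes no yes maybe);
my $set = to_set(@x);
my $r   = invert({ ldn => 'UK', nyc => 'US' });

my $found = exists $set->{maybe} ? 'found' : 'missing';
print "$found\n";
print count_yes(@x), "\n";
print "$r->{UK}\n";
```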

Do Pipes work under SELinux?

Currently, no. This means that Fedora 8, and possibly later, need to have SELinux disabled (setenforce 0) at present. It should be possible to configure the MAC settings to allow Pipes to write and execute the code but some care would be needed to ensure it adds any value beyond the chroot jail. It's also important to note that securing the host is only half of the issue. See Security Implications.