Extension:ParserFunctions/String functions

This extension defines an additional set of parser functions that operate on strings.

Note: These functions are currently not installed on the Wikimedia projects.

Functions
This module defines these functions: <tt>len</tt>, <tt>pos</tt>, <tt>rpos</tt>, <tt>sub</tt>, <tt>pad</tt>, <tt>replace</tt>, <tt>explode</tt>, <tt>urlencode</tt>, and <tt>urldecode</tt>.

Note: Some parameters of these functions can be limited through global settings to ensure the functions operate in O(n) time complexity, and are therefore safe against DoS attacks. See section Limits below.

#len:
The #len function returns the length of the given string. The syntax is: <tt>{{#len:string}}</tt>

The return value is always the number of characters in the string. If no string is specified, the return value is zero.

Note: Trailing spaces are not counted. Example:  returns 8.

Note: This function is safe with utf-8 multibyte characters. Example:  returns 8.
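The difference between byte length and character length can be illustrated with a minimal PHP sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is available. The sample strings are chosen for illustration only, and trim() merely stands in for the whitespace stripping that happens before the function receives its argument.

```php
<?php
// Illustration only: the sample strings are assumptions, not taken from the extension.
$word = 'Žmržlina';

$bytes = strlen($word);              // byte count: "Ž" and "ž" take two bytes each in UTF-8
$chars = mb_strlen($word, 'UTF-8');  // character count, which is what #len reports

// trim() models the removal of trailing spaces before the length is measured.
$trimmed = mb_strlen(trim('Icecream   '), 'UTF-8');

echo "$bytes bytes, $chars characters, trimmed length $trimmed\n";
```

The byte count (10) differs from the character count (8), which is why the extension relies on the mbstring functions.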

#pos:
The #pos function returns the position of a given needle within the string. The syntax is: <tt>{{#pos:string|needle|offset}}</tt>

The offset parameter, if specified, gives the position at which the search begins.

If the needle is found, the return value is a zero-based integer of the first position within the string. If the needle is not found, the function returns an empty string.

Note: This function is case sensitive.

Note: The maximum allowed length of the needle is limited through $wgStringFunctionsLimitSearch global setting.

Note: This function is safe with utf-8 multibyte characters. Example:  returns 4.
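The lookup rules described above can be sketched in a few lines of PHP. The helper name sketchPos is hypothetical and the sample strings are assumptions; the sketch only mirrors the documented behaviour (zero-based result, empty string when the needle is absent, a space as the fallback needle) and requires mbstring.

```php
<?php
// Hypothetical helper mirroring the documented behaviour of #pos.
function sketchPos(string $haystack, string $needle, int $offset = 0)
{
    // An empty needle falls back to a single space.
    if ($needle === '') {
        $needle = ' ';
    }
    $pos = mb_strpos($haystack, $needle, $offset, 'UTF-8');

    // mb_strpos() returns false when nothing is found; #pos reports that as ''.
    return ($pos !== false) ? $pos : '';
}

var_dump(sketchPos('Žmržlina', 'lina'));  // int(4): counted in characters, not bytes
var_dump(sketchPos('banana', 'a', 2));    // int(3): the search starts at index 2
var_dump(sketchPos('Icecream', 'x'));     // string(0) ""
```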

#rpos:
The #rpos function returns the last position of a given needle within the string. The syntax is: <tt>{{#rpos:string|needle}}</tt>

If the needle is found, the return value is a zero-based integer of its last position within the string. If the needle is not found, the function returns -1.

Tip: When using this to search for the last delimiter, add 1 to the result to retrieve the position just after the last delimiter. This also works when the delimiter is not found, because "-1 + 1" is zero, which is the beginning of the given value.

Note: This function is case sensitive.

Note: The maximum allowed length of the needle is limited through $wgStringFunctionsLimitSearch global setting.

Note: This function is safe with utf-8 multibyte characters. Example:  returns 4.

Note: When this extension is running on PHP 4, the needle can only be a single character. If a string is used as the needle, only the first character of that string will be used. The needle may be a string of more than one character as of PHP 5.0.0.
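The "add 1" tip for #rpos can be sketched in PHP as follows. Both helper names are hypothetical, the page title is an invented sample, and the sketch assumes a single-character delimiter (with a longer delimiter, adding 1 would land inside the delimiter itself).

```php
<?php
// Hypothetical helper mirroring #rpos: last position of the needle, or -1.
function sketchRPos(string $haystack, string $needle): int
{
    if ($needle === '') {
        $needle = ' ';
    }
    if ($haystack === '') {
        return -1;
    }
    $pos = mb_strrpos($haystack, $needle, 0, 'UTF-8');
    return ($pos !== false) ? $pos : -1;
}

// Hypothetical helper: everything after the last single-character delimiter.
function sketchAfterLast(string $value, string $delimiter): string
{
    // When the delimiter is missing, -1 + 1 = 0, so the whole value is returned.
    $start = sketchRPos($value, $delimiter) + 1;
    return mb_substr($value, $start, null, 'UTF-8');
}

echo sketchAfterLast('Help:Contents/Editing', '/'), "\n";  // Editing
echo sketchAfterLast('Editing', '/'), "\n";                // Editing
```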

#sub:
The #sub function returns a substring from the given string. The syntax is: <tt>{{#sub:string|start|length}}</tt>

The start parameter, if positive (or zero), specifies a zero-based index of the first character to be returned. Example:  returns <tt>cream</tt>.

If the start parameter is negative, it specifies how many characters from the end should be returned. Example:  returns <tt>eam</tt>.

The length parameter, if present and positive, specifies the maximum length of the returned string. Example:  returns <tt>cre</tt>.

If the length parameter is negative, it specifies how many characters will be omitted from the end of the string. Example:  returns <tt>cr</tt>.

Note: If the length parameter is zero, it is not used for truncation at all. Example:  returns <tt>cream</tt>.

Note: If start denotes a position beyond the point where a negative length parameter truncates the string, an empty string is returned. Example:  returns an empty string.

The return value is a substring from the string, or an empty string.

Note: This function is safe with utf-8 multibyte characters. Example:  returns <tt>žlina</tt>.
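The start and length rules can be tried out with a small PHP sketch. The helper name sketchSub is hypothetical, and the input string "Icecream" is an assumption chosen so that the results line up with the return values quoted above.

```php
<?php
// Hypothetical helper mirroring the documented behaviour of #sub.
function sketchSub(string $value, int $start = 0, int $length = 0): string
{
    // A zero length means "no truncation": return everything from $start on.
    if ($length === 0) {
        return mb_substr($value, $start, null, 'UTF-8');
    }
    // Positive length caps the result; negative length trims from the end.
    return mb_substr($value, $start, $length, 'UTF-8');
}

echo sketchSub('Icecream', 3), "\n";      // cream
echo sketchSub('Icecream', -3), "\n";     // eam
echo sketchSub('Icecream', 3, 3), "\n";   // cre
echo sketchSub('Icecream', 3, -3), "\n";  // cr
echo sketchSub('Žmržlina', 3), "\n";      // žlina
```

One consequence of treating zero as "no truncation" is that a zero-length substring cannot be requested explicitly.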

#pad:
The #pad function returns the given string extended to a given width. The syntax is: <tt>{{#pad:string|length|padstring|direction}}</tt>

The length parameter specifies the desired length of the returned string.

The padstring parameter, if specified, gives a pattern used to fill the missing space. It may be a single character, which is repeated as many times as necessary, or a string, which is concatenated as many times as necessary and then trimmed to the required length. Example:  returns <tt>XxXxXxXIce</tt>.

If the padstring is not specified, spaces are used for padding.

The direction parameter, if specified, can be one of these values:
 * <tt>left</tt> - the padding will be on the left side of the string. Example:  returns <tt>xxIce</tt>.
 * <tt>right</tt> - the padding will be on the right side of the string. Example:  returns <tt>Icexx</tt>.
 * <tt>center</tt> - the string will be centered in the returned string. Example:  returns <tt>xIcex</tt>.

If the direction is not specified, the padding will be on the left side of the string.

The return value is the given string extended to length characters, using the padstring to fill the missing part(s). If the given string is already longer than length, it is neither extended nor truncated.

Note: The maximum allowed value for the length is limited through $wgStringFunctionsLimitPad global setting.

Note: This function is NOT safe with utf-8 multibyte characters (i.e. non-English characters), because the padding width is measured in bytes rather than characters. Example:  returns <tt>xxŽmržlina</tt> (instead of <tt>xxxxŽmržlina</tt>).
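The multibyte problem comes from PHP's str_pad, which measures width in bytes. The sketch below reproduces the effect and, for comparison, shows one possible character-aware left pad. The helper sketchMbPadLeft is hypothetical and is not part of the extension; the target width of 12 is an assumption chosen to match the return values quoted above.

```php
<?php
// str_pad() counts bytes: "Žmržlina" is 8 characters but 10 bytes in UTF-8,
// so padding to a width of 12 adds only 2 pad characters.
$bytePadded = str_pad('Žmržlina', 12, 'x', STR_PAD_LEFT);

// Hypothetical character-aware alternative (left padding only).
function sketchMbPadLeft(string $value, int $width, string $pad = ' '): string
{
    $missing = $width - mb_strlen($value, 'UTF-8');
    if ($missing <= 0 || $pad === '') {
        return $value; // already wide enough: neither extended nor truncated
    }
    // Repeat the pattern often enough, then trim it to the exact width needed.
    $repeats = (int) ceil($missing / mb_strlen($pad, 'UTF-8'));
    $fill = mb_substr(str_repeat($pad, $repeats), 0, $missing, 'UTF-8');
    return $fill . $value;
}

echo $bytePadded, "\n";                             // xxŽmržlina
echo sketchMbPadLeft('Žmržlina', 12, 'x'), "\n";    // xxxxŽmržlina
echo sketchMbPadLeft('Ice', 10, 'Xx'), "\n";        // XxXxXxXIce
```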

#replace:
The #replace function returns the given string with all occurrences of a needle replaced with product.

If the needle is not specified, or is empty, a single space will be searched for.

If the product is not specified, or is empty, all occurrences of the needle will be removed from the string.

Note: This function is case sensitive.

Note: A plain space given as the product is treated as an empty string. This is a side effect of how the MediaWiki parser handles parameters. To use a space as the product, wrap it in nowiki tags. Example:  returns <tt>My little home page</tt>.

Note: The maximum allowed length of the needle is limited through $wgStringFunctionsLimitSearch global setting.

Note: The maximum allowed length of the product is limited through $wgStringFunctionsLimitReplace global setting.

Note: This function is safe with utf-8 multibyte characters. Example:  returns <tt>Žmrzlina</tt>.
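The replacement rules, including the two length limits, can be sketched in PHP. The helper name sketchReplace and its limit parameters are hypothetical stand-ins for the global settings, and the sample strings are assumptions.

```php
<?php
// Hypothetical helper mirroring the documented behaviour of #replace.
// $limitSearch and $limitReplace stand in for the two global limit settings.
function sketchReplace(
    string $value,
    string $needle = '',
    string $product = '',
    int $limitSearch = 30,
    int $limitReplace = 30
): string {
    // An empty needle falls back to a single space.
    if ($needle === '') {
        $needle = ' ';
    }
    // Both needle and product are silently cut to their maximum length.
    $needle = mb_substr($needle, 0, $limitSearch, 'UTF-8');
    $product = mb_substr($product, 0, $limitReplace, 'UTF-8');

    // str_replace() is case sensitive, so "Ž" is left alone when replacing "ž".
    return str_replace($needle, $product, $value);
}

echo sketchReplace('Žmržlina', 'ž', 'z'), "\n";             // Žmrzlina
echo sketchReplace('My_little_home_page', '_', ' '), "\n";  // My little home page
echo sketchReplace('a b c'), "\n";                          // abc
```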

#explode:
The #explode function splits the given string into pieces and then returns one of the pieces. The syntax is: <tt>{{#explode:string|delimiter|position}}</tt>

The delimiter parameter specifies a string used to divide the string into pieces. The delimiter itself is not part of any piece, and when two delimiters are next to each other, they create an empty piece between them. If this parameter is not specified, a single space is used.

The position parameter specifies which piece is to be returned. Pieces are counted from 0. If this parameter is not specified, the first piece is used (the piece with number 0).

The return value is the position-th piece. This may be an empty string. If there are fewer pieces than the position specifies, an empty string is returned.

Note: This function is case sensitive.

Note: The maximum allowed length of the delimiter is limited through $wgStringFunctionsLimitSearch global setting.

Note: This function is safe with utf-8 multibyte characters. Example:  returns <tt>Žmrž</tt>.
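A short PHP sketch of the splitting rules follows. The helper name sketchExplode is hypothetical and the sample strings are assumptions; the sketch needs PHP 7 or later for the ?? operator.

```php
<?php
// Hypothetical helper mirroring the documented behaviour of #explode.
function sketchExplode(string $value, string $delimiter = '', int $position = 0): string
{
    // An empty delimiter falls back to a single space.
    if ($delimiter === '') {
        $delimiter = ' ';
    }
    $pieces = explode($delimiter, $value);

    // Out-of-range positions yield an empty string instead of an error.
    return $pieces[$position] ?? '';
}

echo sketchExplode('Žmržlina', 'l'), "\n";          // Žmrž
var_dump(sketchExplode('a,,b', ',', 1));             // string(0) "": empty piece between two delimiters
var_dump(sketchExplode('a,,b', ',', 2));             // string(1) "b"
var_dump(sketchExplode('a,b', ',', 5));              // string(0) "": not enough pieces
```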

#urlencode: and #urldecode:
These two functions operate in tandem: #urlencode converts a string into a URL-safe syntax, and #urldecode converts such a string back. The syntax is: <tt>{{#urlencode:string}}</tt> and <tt>{{#urldecode:string}}</tt>

Note: These functions work by directly exposing PHP's urlencode and urldecode functions.
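Since the two functions pass their argument straight to PHP, their effect can be previewed with a minimal sketch. The sample string is an assumption chosen to show several reserved characters.

```php
<?php
// Round trip through PHP's urlencode() and urldecode().
$original = 'a b&c=d/e';

$encoded = urlencode($original);  // spaces become "+", reserved characters become %XX
$decoded = urldecode($encoded);   // restores the original string

echo $encoded, "\n";  // a+b%26c%3Dd%2Fe
echo $decoded, "\n";  // a b&c=d/e
```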

Limits
This module defines these global settings: <tt>$wgStringFunctionsLimitSearch</tt>, <tt>$wgStringFunctionsLimitReplace</tt>, <tt>$wgStringFunctionsLimitPad</tt>.

They are used to limit some parameters of some functions to ensure the functions operate in O(n) time complexity, and are therefore safe against DoS attacks.

$wgStringFunctionsLimitSearch
This setting is used by the #pos:, #rpos:, #replace:, and #explode: functions. All of them search for a substring within a larger string, which can take O(n*m) time and therefore makes the software more vulnerable to DoS attacks. Capping the needle length at a small constant reduces the time complexity to O(n).

This setting limits the maximum allowed length of the string being searched for.

The default value is 30 multibyte characters.
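The practical effect of the search limit is that an over-long needle is silently truncated before the search runs. The sketch below illustrates this with invented sample strings; $limit stands in for $wgStringFunctionsLimitSearch.

```php
<?php
// Sketch: $limit stands in for $wgStringFunctionsLimitSearch (default 30).
$limit = 30;

$haystack = 'xx' . str_repeat('a', 30) . 'b';
$needle = str_repeat('a', 40);  // longer than the limit and not present in the haystack

// The needle is cut to the first $limit characters before searching.
$truncated = mb_substr($needle, 0, $limit, 'UTF-8');

$fullMatch = mb_strpos($haystack, $needle, 0, 'UTF-8');      // false: 40 "a"s never occur
$truncMatch = mb_strpos($haystack, $truncated, 0, 'UTF-8');  // 2: the first 30 "a"s do occur

var_dump($fullMatch, $truncMatch);
```

A needle longer than the limit can therefore match where the full needle would not, which is worth keeping in mind when searching for long strings.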

$wgStringFunctionsLimitReplace
This setting is used by the #replace: function. This function replaces all occurrences of one string with another, which can be used to quickly generate very large amounts of data and can therefore make the software more vulnerable to DoS attacks. This setting limits the maximum allowed length of the replacing string.

The default value is 30 multibyte characters.

$wgStringFunctionsLimitPad
This setting is used by the #pad: function. This function creates a string of the specified length, which can be used to quickly generate very large amounts of data and can therefore make the software more vulnerable to DoS attacks. This setting limits the maximum allowed length of the resulting padded string.

The default value is 100 multibyte characters.

Requirements
This extension requires MediaWiki 1.6+ and the PHP mbstring extension.

If you do not have mbstring installed, you can do without it by deleting all occurrences of "mb_" (without quotes) in the code, which replaces all mbstring functions (mb_strlen, mb_substr, mb_strpos, mb_strrpos) with their non-mbstring equivalents. After this change the extension no longer works correctly with utf-8 multibyte characters (e.g. non-English letters), but it will still run.

Instructions
Do the following to install these functions as an extension to MediaWiki.
 * 1) Copy the source code below to <tt>extensions/StringFunctions/StringFunctions.php</tt>.
 * 2) Add the following to <tt>LocalSettings.php</tt> (near the bottom) in the root of your MediaWiki installation: require_once ("$IP/extensions/StringFunctions/StringFunctions.php");

Compatibility with MediaWiki 1.6
All StringFunctions will work on MediaWiki 1.6, but their syntax is without the # character. If you want to use the # character, find this section of <tt>/extensions/StringFunctions/StringFunctions.php</tt>:
$wgParser->setFunctionHook ( 'len',       array ( &$wgExtStringFunctions, 'runLen'       ) );
$wgParser->setFunctionHook ( 'pos',       array ( &$wgExtStringFunctions, 'runPos'       ) );
$wgParser->setFunctionHook ( 'rpos',      array ( &$wgExtStringFunctions, 'runRPos'      ) );
$wgParser->setFunctionHook ( 'sub',       array ( &$wgExtStringFunctions, 'runSub'       ) );
$wgParser->setFunctionHook ( 'pad',       array ( &$wgExtStringFunctions, 'runPad'       ) );
$wgParser->setFunctionHook ( 'replace',   array ( &$wgExtStringFunctions, 'runReplace'   ) );
$wgParser->setFunctionHook ( 'explode',   array ( &$wgExtStringFunctions, 'runExplode'   ) );
$wgParser->setFunctionHook ( 'urlencode', array ( &$wgExtStringFunctions, 'runUrlEncode' ) );
$wgParser->setFunctionHook ( 'urldecode', array ( &$wgExtStringFunctions, 'runUrlDecode' ) );

Replace the above with this:
$wgParser->setFunctionHook ( '#len',       array ( &$wgExtStringFunctions, 'runLen'       ) );
$wgParser->setFunctionHook ( '#pos',       array ( &$wgExtStringFunctions, 'runPos'       ) );
$wgParser->setFunctionHook ( '#rpos',      array ( &$wgExtStringFunctions, 'runRPos'      ) );
$wgParser->setFunctionHook ( '#sub',       array ( &$wgExtStringFunctions, 'runSub'       ) );
$wgParser->setFunctionHook ( '#pad',       array ( &$wgExtStringFunctions, 'runPad'       ) );
$wgParser->setFunctionHook ( '#replace',   array ( &$wgExtStringFunctions, 'runReplace'   ) );
$wgParser->setFunctionHook ( '#explode',   array ( &$wgExtStringFunctions, 'runExplode'   ) );
$wgParser->setFunctionHook ( '#urlencode', array ( &$wgExtStringFunctions, 'runUrlEncode' ) );
$wgParser->setFunctionHook ( '#urldecode', array ( &$wgExtStringFunctions, 'runUrlDecode' ) );

Code
This code has been tested on MediaWiki 1.6.8 and above.

History:
 * Jan 30, 2007 -- v1.9 -- Added limits to #pos, #pad, #replace, #explode.
 * Oct 30, 2006 -- v1.8 -- Fixed for MediaWiki 1.8.
 * Oct 26, 2006 -- v1.6 -- Fixed spaces in #rpos and #replace.
 * Oct 1, 2006 -- v1.5 -- Added #rpos, #pad, #replace, #explode.
 * May 18, 2006 -- v1.2 -- Renamed #toURL and #fromURL to match MediaWiki's {{urlencode:}} function.
 * May 18, 2006 -- v1.1 -- Added #pos.
 * May 15, 2006 -- v1.0 -- First stable release.

<?php /*

Defines a subset of parser functions that operate with strings.

Returns the length of the given value. See: http://php.net/manual/function.strlen.php

Returns the first position of key inside the given value, or an empty string. If offset is defined, this method will not search the first offset characters. See: http://php.net/manual/function.strpos.php

Returns the last position of key inside the given value, or -1 if the key is not found. When using this to search for the last delimiter, add 1 to the result to retrieve the position just after the last delimiter. This also works when the delimiter is not found, because "-1 + 1" is zero, which is the beginning of the given value. See: http://php.net/manual/function.strrpos.php

Returns a substring of the given value with the given starting position and length. If length is omitted, this returns the rest of the string. See: http://php.net/manual/function.substr.php

Returns the value padded to the given length with the given 'with' string. If the 'with' string is not given, spaces are used for padding. The direction may be specified as: 'left', 'center' or 'right'. See: http://php.net/manual/function.str-pad.php

Returns the given value with all occurrences of 'from' replaced with 'to'. See: http://php.net/manual/function.str-replace.php

Splits the given value into pieces by the given delimiter and returns the position-th piece. Empty string is returned if there are not enough pieces. Note: Pieces are counted from 0. See: http://php.net/manual/function.explode.php

URL-encodes the given value. See: http://php.net/manual/function.urlencode.php

URL-decodes the given value. See: http://php.net/manual/function.urldecode.php

Contributors: Juraj Simlovic Algorithm

*/

$wgExtensionCredits['parserhook'][] = array(
    'name'        => 'StringFunctions',
    'version'     => '1.9.3', // Jan 30, 2007.
    'description' => 'Enhances parser with string functions',
    'author'      => 'Juraj Simlovic',
    'url'         => 'http://meta.wikimedia.org/wiki/StringFunctions',
);

$wgExtensionFunctions[] = 'wfStringFunctions';

$wgHooks['LanguageGetMagic'][] = 'wfStringFunctionsLanguageGetMagic';

function wfStringFunctions() {
    global $wgParser, $wgExtStringFunctions;
    global $wgStringFunctionsLimitSearch, $wgStringFunctionsLimitReplace, $wgStringFunctionsLimitPad;

    $wgExtStringFunctions = new ExtStringFunctions();
    $wgStringFunctionsLimitSearch  = 30;
    $wgStringFunctionsLimitReplace = 30;
    $wgStringFunctionsLimitPad     = 100;

    $wgParser->setFunctionHook ( 'len',       array ( &$wgExtStringFunctions, 'runLen'       ) );

    $wgParser->setFunctionHook ( 'pos',       array ( &$wgExtStringFunctions, 'runPos'       ) );
    $wgParser->setFunctionHook ( 'rpos',      array ( &$wgExtStringFunctions, 'runRPos'      ) );

    $wgParser->setFunctionHook ( 'sub',       array ( &$wgExtStringFunctions, 'runSub'       ) );

    $wgParser->setFunctionHook ( 'pad',       array ( &$wgExtStringFunctions, 'runPad'       ) );

    $wgParser->setFunctionHook ( 'replace',   array ( &$wgExtStringFunctions, 'runReplace'   ) );

    $wgParser->setFunctionHook ( 'explode',   array ( &$wgExtStringFunctions, 'runExplode'   ) );

    $wgParser->setFunctionHook ( 'urlencode', array ( &$wgExtStringFunctions, 'runUrlEncode' ) );
    $wgParser->setFunctionHook ( 'urldecode', array ( &$wgExtStringFunctions, 'runUrlDecode' ) );
}

function wfStringFunctionsLanguageGetMagic( &$magicWords, $langCode = "en" ) {
    switch ( $langCode ) {
        default:
            $magicWords['len']       = array ( 0, 'len' );
            $magicWords['pos']       = array ( 0, 'pos' );
            $magicWords['rpos']      = array ( 0, 'rpos' );
            $magicWords['sub']       = array ( 0, 'sub' );
            $magicWords['pad']       = array ( 0, 'pad' );
            $magicWords['replace']   = array ( 0, 'replace' );
            $magicWords['explode']   = array ( 0, 'explode' );
            $magicWords['urlencode'] = array ( 0, 'urlencode' );
            $magicWords['urldecode'] = array ( 0, 'urldecode' );
    }
    return true;
}

class ExtStringFunctions {
    /**
     * Returns the length of the given string, in characters.
     */
    function runLen ( &$parser, $inStr = '' ) {
        return mb_strlen ( $inStr );
    }

    /**
     * Note: If the needle is an empty string, single space is used instead.
     * Note: If the needle is not found, empty string is returned.
     * Note: The needle is limited to specific length.
     */
    function runPos ( &$parser, $inStr = '', $inNeedle = '', $inOffset = 0 ) {
        global $wgStringFunctionsLimitSearch;

        # empty needle
        if ( $inNeedle === '' ) $inNeedle = ' ';

        # limit needle
        $inNeedle = mb_substr ( $inNeedle, 0, $wgStringFunctionsLimitSearch );

        # strpos
        $ret = mb_strpos ( $inStr, $inNeedle, intval ( $inOffset ) );

        # return empty string upon not found
        return ( $ret !== FALSE ) ? $ret : '';
    }

    /**
     * Note: If the needle is an empty string, single space is used instead.
     * Note: If the needle is not found, -1 is returned.
     * Note: The needle is limited to specific length.
     */
    function runRPos ( &$parser, $inStr = '', $inNeedle = '' ) {
        global $wgStringFunctionsLimitSearch;

        # empty needle
        if ( $inNeedle == '' ) $inNeedle = ' ';

        # limit needle
        $inNeedle = mb_substr ( $inNeedle, 0, $wgStringFunctionsLimitSearch );

        # empty haystack
        if ( $inStr == '' ) return "-1";

        # strrpos
        $ret = mb_strrpos ( $inStr, $inNeedle );

        # return -1 upon not found
        return ( $ret !== FALSE ) ? $ret : "-1";
    }

    /**
     * Note: If length is zero, the rest of the input is returned.
     */
    function runSub ( &$parser, $inStr = '', $inStart = '', $inLength = 0 ) {
        # zero length
        if ( ! ( (int) $inLength ) )
            return mb_substr ( $inStr, intval ( $inStart ) );

        # non-zero length
        return mb_substr ( $inStr, intval ( $inStart ), intval ( $inLength ) );
    }

    /**
     * Note: Length of the resulting string is limited.
     */
    function runPad ( &$parser, $inStr = '', $inLen = 0, $inWith = '', $inDirection = 'left' ) {
        global $wgStringFunctionsLimitPad;

        # direction
        switch ( strtolower ( $inDirection ) ) {
            case 'left':
            default:
                $direction = STR_PAD_LEFT;
                break;
            case 'center':
                $direction = STR_PAD_BOTH;
                break;
            case 'right':
                $direction = STR_PAD_RIGHT;
                break;
        }

        # limit pad length
        if ( $wgStringFunctionsLimitPad > 0 )
            $inLen = min ( intval ( $inLen ), $wgStringFunctionsLimitPad );

        # padding
        if ( $inWith == '' ) $inWith = ' ';

        # pad
        return str_pad ( $inStr, intval ( $inLen ), $inWith, $direction );
    }

    /**
     * Note: If the needle is an empty string, single space is used instead.
     * Note: The needle is limited to specific length.
     * Note: The product is limited to specific length.
     */
    function runReplace ( &$parser, $inStr = '', $inReplaceFrom = '', $inReplaceTo = '' ) {
        global $wgStringFunctionsLimitSearch, $wgStringFunctionsLimitReplace;

        # empty needle
        if ( $inReplaceFrom == '' ) $inReplaceFrom = ' ';

        # limit needle (this is being searched for)
        $inReplaceFrom = mb_substr ( $inReplaceFrom, 0, $wgStringFunctionsLimitSearch );

        # limit product (this is being returned)
        $inReplaceTo = mb_substr ( $inReplaceTo, 0, $wgStringFunctionsLimitReplace );

        # replace
        return str_replace ( $inReplaceFrom, $inReplaceTo, $inStr );
    }

    /**
     * Note: If the divider is an empty string, single space is used instead.
     * Note: The divider is limited to specific length.
     * Note: Empty string is returned if there are not enough exploded chunks.
     */
    function runExplode ( &$parser, $inStr = '', $inDiv = '', $inPos = 0 ) {
        global $wgStringFunctionsLimitSearch;

        # empty divider
        if ( $inDiv == '' ) $inDiv = ' ';

        # limit divider (this is being searched for)
        $inDiv = mb_substr ( $inDiv, 0, $wgStringFunctionsLimitSearch );

        # explode
        $tokens = explode ( $inDiv, $inStr );

        # out of range
        if ( !isset ( $tokens [ intval ( $inPos ) ] ) ) return "";

        # in range
        return $tokens [ intval ( $inPos ) ];
    }

    /**
     *
     */
    function runUrlEncode ( &$parser, $inStr = '' ) {
        # encode
        return urlencode ( $inStr );
    }

    /**
     *
     */
    function runUrlDecode ( &$parser, $inStr = '' ) {
        # decode
        return urldecode ( $inStr );
    }
}

?>

Applications

 * Constructing a suitable sortkey from a page name, in a category tag in the transcluded part of a template.