Extension:Character Escapes

From MediaWiki.org
MediaWiki extensions manual
Character Escapes

Release status: beta
Implementation: Parser extension, Tag
Description: Convenience tag for escaping tags, templates, magic words, and parser function calls nested in tags and parser functions that support character escaping
Author(s): David M. Sledge
Latest version: 0.9.1 (2007-10-02)
MediaWiki: ≥ 1.11.0
License: GNU General Public License 2.0 or later
Download: No link


Character Escapes

Sometimes wiki markup must be parsed (or left unparsed) only under certain conditions. Because certain characters and character sequences are processed before they reach a parser function, escapes are needed to keep markup from being parsed prematurely. MediaWiki has no built-in mechanism for this, so this extension defines its own:

  • \l (less than) is translated to <
  • \g (greater than) is translated to >

  • \o (open double curly braces) is translated to {{
  • \c (close double curly braces) is translated to }}
  • \p (pipe) is translated to |

  • \\ is translated to \

  • \n is translated to a newline

The first two translations make it possible to embed a wiki tag extension in a parameter of a parser function call. The next three make it possible to write a template call, magic word, or parser function call that is not executed until conditions dictate that its result will be displayed. The \\ escape is for cases where you want to display text such as "\p" without having it converted into a pipe; write it as "\\p". The \n escape is for tags that use newline characters as parameter delimiters: it lets a newline be passed as part of a parameter instead of marking where a parameter begins or ends.
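The translation table can be tried outside the wiki with a minimal sketch. The helper names escapeMarkup() and unescapeMarkup() are hypothetical and are not part of the extension. They apply the table in a single pass with strtr(), which is a different technique from the regular expressions in the extension's own source, although for the inputs shown the results should match.

```php
<?php
// Hypothetical standalone helpers (not part of the extension) that apply
// the escape table above in a single pass with strtr().

function escapeMarkup( string $text ): string {
    // strtr() with an array never rescans its own output, so the
    // backslashes it inserts are not escaped a second time.
    return strtr( $text, array(
        "\\" => "\\\\",
        "{{" => "\\o",
        "}}" => "\\c",
        "|"  => "\\p",
        "<"  => "\\l",
        ">"  => "\\g",
        "\n" => "\\n",
    ) );
}

function unescapeMarkup( string $text ): string {
    // strtr() scans left to right, so in "\\p" the pair "\\" is consumed
    // first and the "p" that follows is left alone.
    return strtr( $text, array(
        "\\\\" => "\\",
        "\\o"  => "{{",
        "\\c"  => "}}",
        "\\p"  => "|",
        "\\l"  => "<",
        "\\g"  => ">",
        "\\n"  => "\n",
    ) );
}

echo escapeMarkup( "{{ #var: i }} < 5" ), "\n"; // \o #var: i \c \l 5
echo unescapeMarkup( "\\\\p" ), "\n";           // \p
echo unescapeMarkup( "\\p" ), "\n";             // |
```

Escaping and then unescaping a string returns the original text, which is the property the <esc> and <unesc> tags rely on.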

Example

{{ #vardefine: i | 0 }}{{
  #while: expr
  | <esc>{{ #var: i }} < 5</esc>
  |* <esc>{{ #var: i }}{{ #vardefine: i | {{ #expr: {{ #var: i }} + 1 }} }}</esc>
}}

produces the following:

  • 0
  • 1
  • 2
  • 3
  • 4

Note that the example relies on the Variables and Control Structure Functions extensions.

Limitations

MediaWiki does not support nested tags of the same type (see bug #1310). Given the following:

<esc>{{ #ifexpr:... | <esc>{{ templateB | param }}</esc> | param }}</esc>

The text:

{{ #ifexpr:... | <esc>{{ templateB | param }}

is passed to the underlying function instead of:

{{ #ifexpr:... | <esc>{{ templateB | param }}</esc> | param }}

A workaround is to explicitly write out the nested escape sequences:

<esc>{{ #ifexpr:... | \o templateB \p param \c | param }}</esc>

Another solution is to apply the modification given in the discussion of bug #1310.
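The truncation can be reproduced outside the wiki with a short sketch. It assumes tag extraction behaves like a non-greedy match that stops at the first closing tag. That assumption only models the effect described above and is not MediaWiki's actual parser code.

```php
<?php
// Assumption: tag extraction is modelled as a non-greedy regex match up to
// the first closing tag. This illustrates the effect only.
$pattern = '/<esc>(.*?)<\/esc>/s';

// Nested tags: the match ends at the inner closing tag.
$nested = '<esc>{{ #ifexpr:... | <esc>{{ templateB | param }}</esc> | param }}</esc>';
preg_match( $pattern, $nested, $match );
echo $match[1], "\n";
// {{ #ifexpr:... | <esc>{{ templateB | param }}

// Workaround: no inner tag, so the whole body reaches the hook.
$workaround = '<esc>{{ #ifexpr:... | \o templateB \p param \c | param }}</esc>';
preg_match( $pattern, $workaround, $match );
echo $match[1], "\n";
// {{ #ifexpr:... | \o templateB \p param \c | param }}
```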

Writing Extensions that Use Character Escapes

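If you would like your extension to make use of character escapes, the class CharacterEscapes provides static functions for both directions. CharacterEscapes::charEsc() and CharacterEscapes::charUnesc() are the tag hooks for <esc> and <unesc>; they take the usual ( $input, $args, &$parser ) arguments and replace characters with escapes and escapes with characters, respectively. CharacterEscapes::escChars() and CharacterEscapes::unescChars() perform the same conversions on a plain string.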
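As a rough illustration, a tag hook in another extension might unescape its body before using it. Everything named here apart from CharacterEscapes::unescChars() is hypothetical, including the <repeat> tag and the renderRepeatTag() function. The stand-in class exists only so the sketch runs outside MediaWiki; it approximates the real method with strtr().

```php
<?php
// Stand-in so the sketch runs outside MediaWiki. Inside a wiki, the real
// class from CharacterEscapes.php is already loaded and is used instead.
if ( !class_exists( 'CharacterEscapes' ) ) {
    class CharacterEscapes {
        public static function unescChars( $text ) {
            return strtr( $text, array(
                "\\\\" => "\\", "\\o" => "{{", "\\c" => "}}", "\\p" => "|",
                "\\l" => "<", "\\g" => ">", "\\n" => "\n",
            ) );
        }
    }
}

// Hypothetical hook for a made-up <repeat count="N"> tag: it unescapes its
// body and repeats the resulting wikitext.
function renderRepeatTag( $input, $args ) {
    $count = isset( $args['count'] ) ? (int)$args['count'] : 1;
    $wikitext = CharacterEscapes::unescChars( $input );

    return str_repeat( $wikitext, max( 0, $count ) );
}

echo renderRepeatTag( '\o tpl \p x \c', array( 'count' => 2 ) ), "\n";
// {{ tpl | x }}{{ tpl | x }}
```

In a real hook the unescaped wikitext would still have to be handed to the parser before being returned; the sketch stops at the unescaping step.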

Installation

Create the directory CharacterEscapes in your extensions directory. Then in the new directory create the file CharacterEscapes.php that contains the following source code:

<?php
 
if ( !defined( "MEDIAWIKI" ) ) {
    die( "This file is a MediaWiki extension, it is not a valid entry point" );
}
 
$wgExtensionFunctions[] = array( "CharacterEscapes", "setup" );
$wgExtensionCredits['parserhook'][] = array(
    'author'      => 'David M. Sledge',
    'name'        => 'Character Escapes',
    'version'     => '0.9.1',
    'description' => "Convenience markup for escaping tags, templates, magic " .
        "words, and parser function calls nested in tags and parser " .
        "functions that support character escaping",
    'url'         => 'https://www.mediawiki.org/wiki/Extension:Character_Escapes',
);
 
class CharacterEscapes {
    public static $tags = array( "esc" => "charEsc", "unesc" => "charUnesc" );
 
    public static function setup() {
        global $wgParser;
 
        foreach ( self::$tags as $hook => $method )
            $wgParser->setHook( $hook, array( __CLASS__, $method ) );
    }
 
    public static function unstrip( $input, $args, &$parser ) {
        $regex = "/\x07UNIQ[0-9a-fA-F]{1,16}-(" . implode(
            '|', array_keys( self::$tags ) ) . ")-[0-9a-fA-F]{8}-QINU\x07/";
 
        // find all the unique identifiers for the esc tag
        preg_match_all( $regex, $input, $strippedTags );
        $unstrippedTags = array();
 
        // unstrip each unique identifier
        foreach ( $strippedTags[0] as $strippedTag )
            $unstrippedTags[$strippedTag] =
                $parser->mStripState->unstripGeneral( $strippedTag );
 
        // replace each unique identifier with the unstripped text
        $input = strtr( $input, $unstrippedTags );
 
        return $input;
    }
 
    public static function charEsc( $input, $args, &$parser ) {
        $input = self::unstrip( $input, $args, $parser );
 
        return self::escChars( $input );
    }
 
    // these character escapes are so the nested parser functions and tags are
    // not called before the loops are performed.  Many thanks to Gero Scholz
    // (a Dynamic Page List 2 author) for the basic idea. This implementation
    // is different in that it uses a convention similar to what is seen in
    // many programming languages.
    public static function escChars( $text ) {
        // The following character escape sequences are used to avoid
        // premature tag expansion and parser function execution.
        //
        // character sequence  escape sequence
        //         {{                \o
        //         }}                \c
        //         |                 \p
        //         <                 \l
        //         >                 \g
        //       newline             \n
        //         \                 \\
        // prefix the pre-existing escape sequences with backslashes
        $text = str_replace(
            array(   "\\",  "{{", "}}",   "|",   "<",   ">",  "\n" ),
            array( "\\\\", "\\o", "\\c", "\\p", "\\l", "\\g", "\\n" ), $text );
 
        return $text;
    }
 
    public static function charUnesc( $input, $args, &$parser ) {
        $input = self::unstrip( $input, $args, $parser );
 
        return self::unescChars( $input );
    }
 
    public static function unescChars( $text ) {
        // since we're dealing with regular expressions and strings,
        // the backslash character must be double escaped (4:1 ratio)
        $text = preg_replace(
            array( "/(?<!\\\\)((\\\\\\\\)*)\\\\o/",
                   "/(?<!\\\\)((\\\\\\\\)*)\\\\c/",
                   "/(?<!\\\\)((\\\\\\\\)*)\\\\p/",
                   "/(?<!\\\\)((\\\\\\\\)*)\\\\l/",
                   "/(?<!\\\\)((\\\\\\\\)*)\\\\g/",
                   "/(?<!\\\\)((\\\\\\\\)*)\\\\n/" ),
            array( "$1{{", "$1}}", "$1|", "$1<", "$1>", "$1\n" ), $text );
        $text = str_replace( "\\\\", "\\", $text );
 
        return $text;
    }
}

Then add

require_once("$IP/extensions/CharacterEscapes/CharacterEscapes.php");

to the end of LocalSettings.php.

See also