Extension talk:RegexParserFunctions

From mediawiki.org

Could you put more usage samples here, please? For example, what is the difference in use between match and replacement? --Roc michael 12:17, 15 June 2007 (UTC)

Hi Roc! More detailed examples and usage instructions are available on the RegexParserFunctions Project HomePage. --Jimbojw 04:42, 16 June 2007 (UTC)
Thanks for your help, Jimbojw.--Roc michael 07:10, 16 June 2007 (UTC)


I really love this extension!

I also have an improvement idea:

Replacing like {{#regex:RegexParserFunctions|%^.*/(.*)$%|$1}} would be much more useful if it were possible to do something with the replacement string $1, for example {{#regex:RegexParserFunctions|%^.*/(.*)$%| {{#len:$1}}}}. Of course, $1 would not be a good variable name for such operations, because the match string itself could contain a literal $1, and then we would have a problem. Perhaps you could expose it as a variable for wikis with a variable extension, like: {{#regex:RegexParserFunctions|%^.*/(.*)$%| {{#len:{{#var:$1}}}}}} This would open up many new possibilities, such as running another regex over the matches. --Danwe 20:25, 22 February 2009 (UTC)
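
For comparison, many regex libraries cover this use case with a replacement callback instead of a `$1` placeholder. The following is a minimal Python sketch of that idea only; it is not something the extension provides, and the function name `replace_with_length` and the sample subject are made up for illustration.

```python
import re

def replace_with_length(subject: str) -> str:
    """Replace the subject with the length of the text after the last slash."""
    # The callback receives the match object, so the captured group can be
    # processed (here: measured) before the result is substituted.
    return re.sub(r"^.*/(.*)$", lambda m: str(len(m.group(1))), subject)

print(replace_with_length("Extension/RegexParserFunctions"))  # prints 20
```

Because the callback works on the match object rather than on a textual placeholder, a literal `$1` in the input cannot be confused with the capture group.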

Helpful examples

I think it would be helpful if everybody who has written a generally useful regex added it here. I wrote one which dissolves all links inside a given string, leaving only the link text:

{{#regex: {{{1|}}} |%\[\[:?(?(?=[^\[\]\{{!}}]*\{{!}})[^\[\]\{{!}}]*\{{!}}{{!}})([^\[\]]*)\]\]%|$1}}

Example: "text text [[:link|linktext]] and [[link]]" becomes "text text linktext and link" --Danwe 22:15, 22 February 2009 (UTC)
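
The behaviour described above can be checked outside the wiki with a small Python sketch. Python's `re` module does not support the lookahead conditional `(?(?=...)...)` used in the pattern above, so this version swaps in an optional non-capturing group for the "target|" part, which should give the same result for the cases shown. The function name `strip_links` is hypothetical.

```python
import re

# Optional leading colon, optional "target|" prefix, then the visible text.
LINK = re.compile(r"\[\[:?(?:[^\[\]|]*\|)?([^\[\]]*)\]\]")

def strip_links(text: str) -> str:
    """Replace each [[target|label]] or [[target]] with its visible text."""
    return LINK.sub(r"\1", text)

print(strip_links("text text [[:link|linktext]] and [[link]]"))
# prints: text text linktext and link
```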

Remove Category links

%\[\[Category?(?(?=[^\[\]\{{!}}]*\{{!}})[^\[\]\{{!}}]*\{{!}}{{!}})([^\[\]]*)\]\]\n%

This removes category links like [[Category:Test]], but it doesn't work if the category name includes a colon, as in [[Category:Test: Abc]]. How would I do that? Big thanks! --Subfader 20:32, 5 July 2009 (UTC)

/\[\[Category:.*\]\]\s*/s --Subfader 00:09, 8 July 2009 (UTC)
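
One caveat with the pattern above: `.*` is greedy, and with the `s` flag it also crosses line breaks, so on a page with several category links it removes everything from the first link to the last closing `]]`. The Python sketch below compares it with a variant that excludes brackets from the category name. The sample text and the variable names are made up, and the bracket-excluding pattern is a suggestion, not something from this thread.

```python
import re

text = "Intro [[Category:Test: Abc]]\nBody with [[a link]]\n[[Category:Other]]\n"

# Greedy version, equivalent to /\[\[Category:.*\]\]\s*/s
greedy = re.sub(r"\[\[Category:.*\]\]\s*", "", text, flags=re.DOTALL)

# Bracket-excluding version: each match stops at the first closing ]]
per_link = re.sub(r"\[\[Category:[^\[\]]*\]\]\s*", "", text)

print(repr(greedy))    # 'Intro '
print(repr(per_link))  # 'Intro Body with [[a link]]\n'
```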

Doesn't work in 1.16alpha

Maybe you can take a look now, before everyone using this extension comes here ;) --Subfader 00:09, 8 July 2009 (UTC)

Using Extension:RegexFunctions instead :) --Subfader 22:45, 11 July 2009 (UTC)

New function for array regex (regexall)

I created a function regexall to get all matches of a regex. It returns all matches joined by a separator, so you can use the result with the array extension, for example. Replacing, or returning only the parts captured in parentheses, is not possible yet. Please feel free to implement new features as well or to improve my code.

The function is based on the original regexParserFunction so I post it here for you as well:

/**
* Performs a regular expression search and returns ALL matches, joined by a separator.
* @param Parser $parser Instance of running Parser.
* @param String $subject Input string to evaluate.
* @param String $pattern Regular expression pattern - must use /, | or % delimiter
* @param String $separator String to separate all the matches.
* @param Int    $offset First match to print out. Negative values possible: -1 means last match.
* @param Int    $length Maximum number of matches to print out.
* @return String Result of all matching text parts separated by a string.
*/
function regexallParserFunction( &$parser , $subject=null , $pattern=null , $separator=', ' , $offset=0 , $length=null ) {
	if ($subject===null || $pattern===null) return '';

	// Accept only patterns delimited by /, | or % and followed by the flags i, m, s, u.
	$acceptable = '/^([\\/\\|%]).*\\1[imsu]*$/';
	if (!preg_match($acceptable, $pattern)) return wfMsg('regexp-unacceptable',$pattern);

	if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {

		// A non-numeric offset is treated as invalid input.
		if (!is_numeric($offset)) return '';

		// Keep only the requested window of matches; an empty length means "no limit".
		if (!empty($length) && is_numeric($length)) {
			$matches = array_slice($matches, (int)$offset, (int)$length);
		} else {
			$matches = array_slice($matches, (int)$offset);
		}

		$output = '';
		foreach ($matches as $match) {
			// Index 0 of each set holds the full match.
			$output .= ( $output != '' ? $separator : '' );
			$output .= trim( $match[0] );
		}
		return $output;
	}
	return '';
}

--Danwe 14:02, 1 December 2009 (UTC)
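
To try out the offset and length semantics without a wiki, here is a rough Python sketch of the same idea. It is not a port of the PHP above: it skips the delimiter check, takes a bare pattern without delimiters, and assumes a non-negative length. The name `regex_all` is chosen only for this example.

```python
import re

def regex_all(subject, pattern, separator=", ", offset=0, length=None):
    """Join all matches of `pattern` in `subject`, optionally sliced."""
    matches = [m.group(0).strip() for m in re.finditer(pattern, subject)]
    # Normalise a negative offset so that -1 selects the last match.
    if offset < 0:
        offset = max(len(matches) + offset, 0)
    # A missing or zero length means "no limit", as in the PHP version.
    end = offset + length if length else None
    return separator.join(matches[offset:end])

print(regex_all("a1 b22 c333", r"\d+"))                      # 1, 22, 333
print(regex_all("a1 b22 c333", r"\d+", offset=-1))           # 333
print(regex_all("a1 b22 c333", r"\d+", offset=1, length=1))  # 22
```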

New, (mostly) compatible regex extension with advanced features

I have just released my extension Regex Fun, which also provides the #regex function offered by this extension, but with some extended features such as:

  • More sophisticated regular expression validation. An invalid expression will now output a formatted inline error, catchable by #iferror. This might be the only reason why my new extension could break some of your old code, if you rely on an invalid regex returning an empty string. But you shouldn't have invalid regexes anyway.
  • An 'r' flag for #regex that returns "" if no replacement was made.
  • An 'e' flag that parses the replacement string after the references are inserted and before it replaces the input text.
  • Three more useful parser functions dealing with regular expressions: #regexall, #regex_var and #regexquote

So there is no real reason to stick with RegexParserFunctions right now. Until now I have used that extension as well, and over time I have written my own functions while keeping them compatible. --Danwe 07:32, 5 November 2011 (UTC)
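
As an illustration of what the 'r' flag described above amounts to, here is a minimal Python sketch: return the replaced text only if at least one replacement happened, and an empty string otherwise. This is an assumption-based sketch of the described behaviour, not Regex Fun's implementation; `replace_or_empty` is a hypothetical name.

```python
import re

def replace_or_empty(subject: str, pattern: str, replacement: str) -> str:
    """Return the replaced text, or "" if the pattern never matched."""
    # re.subn returns the new string together with the number of replacements.
    result, count = re.subn(pattern, replacement, subject)
    return result if count > 0 else ""

print(repr(replace_or_empty("abc", "b", "X")))  # 'aXc'
print(repr(replace_or_empty("abc", "z", "X")))  # ''
```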

Real example of a PHP injection?

The page states it's vulnerable to PHP code execution. But there is the following line in the code:

$acceptable = '/^([\\/\\|%]).*\\1[imsu]*$/';

I.e. it only lets the input regexp end in [imsu] flags, so an [e] flag is rejected. And it checks in the default mode (without PCRE_MULTILINE), where $ cannot match before a newline in the middle of the input, so it's not possible to do

/a\/
/e

And nothing else comes to my mind... So, what's the real example of php injection here?

OK, there is a null-byte attack, like /a/e<NULL>/. But I'm curious how one would put a null byte into a MediaWiki article. Even if it's present in the DB, it gets transformed into \xFFFD, probably as part of some UTF-8 sanitizing. So it seems impossible to use this kind of attack either... Or is there a way of doing it?

VitaliyFilippov (talk) 11:40, 7 October 2013 (UTC)

@VitaliyFilippov: Can you do it via Lua? Bawolff (talk) 15:12, 3 September 2020 (UTC)
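
The cases discussed in this thread can be tried against the validation pattern quoted above. The sketch below uses Python's `re` module rather than PCRE; for this particular pattern the default-mode behaviour of `^`, `$` and `.` should be the same, but treat that as an assumption. It checks only what the validator accepts, not what PHP does with the pattern afterwards. `is_acceptable` is a hypothetical helper name.

```python
import re

# Same pattern as $acceptable, without the PHP string escaping.
ACCEPTABLE = re.compile(r"^([/|%]).*\1[imsu]*$")

def is_acceptable(pattern: str) -> bool:
    """Return True if the validator would let this pattern through."""
    return ACCEPTABLE.search(pattern) is not None

print(is_acceptable("/a/i"))         # True: allowed flag
print(is_acceptable("/a/e"))         # False: e is not in [imsu]
print(is_acceptable("/a\\/\n/e"))    # False: the newline trick is rejected
print(is_acceptable("/a/e\x00/"))    # True: the null-byte form passes the check
```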

Breaks the built-in 'urlencode' function

This extension registers its own 'urlencode' parser function, for some reason, which replaces the built-in parser function of the same name. This version of 'urlencode' does not recognize the parameters of the official 'urlencode', leading to incorrect results.

So, for example:
{{urlencode:Dog Cat|WIKI}} <!-- expected: Dog_Cat; actual: Dog+Cat -->

This extension also registers an undocumented 'eval' parser function, which seems like a security risk.

Smith.dan (talk) 17:29, 30 April 2018 (UTC)
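
For readers unfamiliar with the second parameter: the built-in urlencode accepts the modes QUERY (the default), WIKI and PATH, which differ mainly in how spaces are encoded. The Python sketch below approximates that difference for the example above. It is only an approximation (MediaWiki's WIKI mode leaves some additional characters unescaped), and the function is written for this illustration.

```python
from urllib.parse import quote, quote_plus

def urlencode(text: str, mode: str = "QUERY") -> str:
    """Approximate the space handling of the three built-in urlencode modes."""
    if mode == "WIKI":
        # Spaces become underscores, as in page titles.
        return quote(text.replace(" ", "_"), safe="")
    if mode == "PATH":
        # Spaces become %20.
        return quote(text, safe="")
    # QUERY: spaces become +.
    return quote_plus(text)

print(urlencode("Dog Cat", "WIKI"))   # Dog_Cat
print(urlencode("Dog Cat", "QUERY"))  # Dog+Cat
print(urlencode("Dog Cat", "PATH"))   # Dog%20Cat
```

A replacement that ignores the mode argument would always produce the QUERY form, which matches the "actual" result reported above.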

Fatal exception of type MWException

With MediaWiki 1.33.1, when I enable the extension, every page gives [Xf32des6yAjFRcw3CODikAAAAAo] 2019-12-21 10:39:50: Fatal exception of type MWException

I switched to the RegexFunctions extension, which did work for me. Vicarage (talk) 17:43, 21 December 2019 (UTC)

Not compatible with current MediaWiki

I flagged this extension as unmaintained because it's not compatible with current MediaWiki (due to its use of old-style magic words). Bawolff (talk) 15:13, 3 September 2020 (UTC)