User:Danwe/regex

From mediawiki.org

This page contains some of my regular expressions. I just put them here to find them whenever the need arises.

JavaScript

Matches everything up to some word (in this case ABC) and everything after that word, and stores both parts as backrefs. Nothing fancy, I just keep forgetting the syntax of lookaround assertions, and this is probably the one I use most often.

'this is some ABC simple test'.match( /((?!ABC).*?)ABC(.*)/ )
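For reference, a small sketch of what that call returns. The variable names (input, result, before, after) are only there for illustration.

```javascript
// Split a string around the first occurrence of the marker word "ABC".
var input = 'this is some ABC simple test';
var result = input.match( /((?!ABC).*?)ABC(.*)/ );

// result[0] is the whole match, result[1] and result[2] are the captured parts.
var before = result[ 1 ]; // text in front of the first "ABC"
var after = result[ 2 ];  // text behind the first "ABC"

console.log( JSON.stringify( before ) ); // "this is some "
console.log( JSON.stringify( after ) );  // " simple test"
```

Note that (?!ABC) is only checked at the position where the match starts. It is the lazy .*? that makes the first group stop at the first ABC.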

So here are the different lookaround assertions anyhow (source: javascriptkit.com):

lookaheads:

  • (?=pattern) matches only if there is a following pattern in input.
  • (?!pattern) matches only if there is not a following pattern in input.
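A quick sketch of both lookaheads in action. The sample string and the variable names are made up for illustration.

```javascript
var sample = 'foobar foobaz';

// Positive lookahead: match "foo" only when "bar" follows.
// The "bar" itself is not part of the match.
var positive = sample.match( /foo(?=bar)/g );

// Negative lookahead: replace "foo" only when "bar" does not follow.
var negative = sample.replace( /foo(?!bar)/g, 'X' );

console.log( positive.length ); // 1 (only the "foo" in "foobar")
console.log( negative );        // foobar Xbaz
```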

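lookbehinds (not supported in JS when this was written; they were added to the language in ES2018, and the web might help with older engines):

  • (?<=pattern) matches only if there is a preceding pattern in input.
  • (?<!pattern) matches only if there is not a preceding pattern in input.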


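This one is supposed to strip all attributes from all tags in an HTML string:

var crazyHtml = '<div sdfdf="<div>fsdf</div>sdf" sdfsdf="sdgsgf">baa <-- foo!</div>';
// Strip all attributes from all tags. Quoted values get one alternative per
// quote type, since \2 inside a character class is not a backreference in JS.
var regex = /(<\S+)(?:[^<>"']|"[^"]*"|'[^']*')*?(\/?>)/g;

console.log( crazyHtml.replace( regex, '$1$2' ) ); // <div>baa <-- foo!</div>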
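Wrapped in a function, the same idea could look like the sketch below. stripAttributes is a made-up name. Quoted values are matched with one alternative per quote type, because in JavaScript a \2 inside a character class is not a backreference, so a class like [^\2] does not mean "anything but the opening quote".

```javascript
// Hypothetical helper: removes every attribute from every tag in an HTML string.
// Regex-based, so a sketch for simple markup, not a real HTML parser.
function stripAttributes( html ) {
    // $1: "<" plus the tag name, $2: the closing ">" or "/>".
    // In between: single non-quote characters or complete quoted values.
    var regex = /(<\S+)(?:[^<>"']|"[^"]*"|'[^']*')*?(\/?>)/g;
    return html.replace( regex, '$1$2' );
}

// Several tags, mixed quote types, and a ">" inside an attribute value.
console.log( stripAttributes( '<a href="x>y" id=\'z\'>link</a> <br class="c"/>' ) );
// <a>link</a> <br/>
```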


PHP

From the Semantic Expressiveness extension. Matches a certain DOM structure, using a recursive pattern. Used in the ExpressiveStringPieceSQResult class.

// [1] match <span> with 'shortQuery' class if we can make sure...
// [2] ... no further <span>-pairs inside...
//     OR
// [3] ... DOM inside only contains opening+closing <span>-pairs (ensured by recursive regex)
$regex = '/
	(?# COMMENT #1 )
	<span \s+(?:[^>]*\s+|) class\s*=\s*(?P<q>[\'"])(?:[^>\k<q>]*\s+|\s*) shortQuery (?:[^>\k<q>]*\s+|\s*)\k<q>[^>]*>

	(?# COMMENT #2 )
	( (?>(?!<span(?:\s+[^>]*|)>|<\/span>).)*

	(?# COMMENT #3 )
	| (?P<innerDOM> <span(?:\s+[^>]*|)>(?: (?>(?!<span(?:\s+[^>]*|)>|<\/span>).)* | (?&innerDOM) )*?<\/span> )*
	)* <\/span>/sx';
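JavaScript regular expressions have no recursion like (?&innerDOM), so the nesting part needs a loop there. A rough sketch using a depth counter instead of a recursive pattern; findClosingSpan is a made-up name and this is not how the extension does it.

```javascript
// Hypothetical helper: returns the index just past the </span> that closes the
// <span ...> starting at `start`, or -1 if the spans are not balanced.
function findClosingSpan( html, start ) {
    var tag = /<span(?:\s[^>]*)?>|<\/span>/g;
    var depth = 0;
    var m;
    tag.lastIndex = start;
    while ( ( m = tag.exec( html ) ) !== null ) {
        // An opening tag goes one level deeper, a closing tag one level up.
        depth += ( m[ 0 ] === '</span>' ) ? -1 : 1;
        if ( depth < 0 ) {
            return -1; // closing tag without a matching opening tag
        }
        if ( depth === 0 ) {
            return tag.lastIndex;
        }
    }
    return -1; // ran out of input before the outer <span> was closed
}

var html = '<span class="shortQuery">a<span>b</span>c</span>d';
console.log( html.slice( 0, findClosingSpan( html, 0 ) ) );
// <span class="shortQuery">a<span>b</span>c</span>
```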