Topic on Extension talk:Replace Text

ReplaceText's specific regex behaviour/limitation ?

MvGulik (talkcontribs)

I'm having a bit of a problem understanding ReplaceText's specific regex behaviour/limitations.

I'm trying to target the following two cases with a single regex.

" abc1def = 234 "
" abc5def = "

Which should not really be a problem for regex in general.
Using a regex like " *abc[0-9]def *= *[0-9]* *" on https://regexr.com/ (default setup) works fine in this respect.
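As a rough cross-check of the "works fine in general" claim, here is a minimal sketch using Python's `re` module, which, like regexr's default setup, treats `*` as greedy. The variable names are only illustrative, and this says nothing yet about how ReplaceText itself runs the pattern.

```python
import re

# The pattern from the post, with ordinary (greedy) quantifiers.
pattern = re.compile(r" *abc[0-9]def *= *[0-9]* *")

# The two target cases from the post.
samples = [" abc1def = 234 ", " abc5def = "]

# With greedy quantifiers each sample is matched in full,
# including the optional number and the trailing space.
matches = [pattern.search(s).group(0) for s in samples]

for sample, matched in zip(samples, matches):
    print(repr(sample), "->", repr(matched))
```

So the expression itself is sound with greedy quantifiers; any difference has to come from how the engine applies them.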

But ReplaceText's regex implementation does not seem to want to do this.
ReplaceText matches the " abc1def = " part, but skips/ignores the number part in this case.

Am I missing something about ReplaceText's regex implementation?

ReplaceText: 1.4.1
MediaWiki: 1.31.0
PHP: 7.0.30-0+deb9u1 (fpm-fcgi)
MariaDB: 10.1.26-MariaDB-0+deb9u1

Ciencia Al Poder (talkcontribs)

The leading part of your expression is full of optional matches (asterisks). Since those parts don't need to match anything, they may be skipped altogether. Ideally you should add some other character at the end that is not optional (maybe a newline, a pipe, etc.).
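A small sketch of why a mandatory trailing character helps. It assumes (this is an assumption, not something confirmed in this thread) that the optional parts behave as if they were non-greedy; since Python's `re` has no switch for that, the lazy `*?` forms are written out by hand, and the trailing newline in the sample text is made up for illustration.

```python
import re

# Sample line with a (hypothetical) newline after the value.
text = " abc1def = 234 \n"

# Same expression, but every optional part written as lazy (*?),
# to imitate an engine that prefers the shortest possible match.
lazy = r" *?abc[0-9]def *?= *?[0-9]*? *?"

# Nothing after the optional tail forces it to consume anything,
# so the match stops right after the "=".
unanchored = re.search(lazy, text).group(0)

# A required newline at the end forces the lazy parts to expand
# until the newline can be reached.
anchored = re.search(lazy + r"\n", text).group(0)

print(repr(unanchored))
print(repr(anchored))
```

The required character does not change what is optional; it just gives the engine a reason to keep going through the optional parts.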

MvGulik (talkcontribs)

> "Ideally you should add some other character at the end that is not optional"

I spotted something similar happening while trying to work this out, but had not followed up on that yet. And you're right, that actually works. (oddly enough)

Seems a bit of an odd constraint to me. Although using multiple optional matches can be a general source of RE hiccups, this RE is relatively simple (I think).

Anyway. Thanks for the tip. Saves me some unnecessary replace runs. :)

MvGulik (talkcontribs)

After some more testing of ReplaceText's RE in this case, it feels to me like ReplaceText's RE switches quantifiers in some particular cases from greedy to extremely non-greedy.
(Note: My knowledge of RE's 'greedy' vs 'non-greedy' is a bit rusty at the moment)

Extremely non-greedy: as in 'not matching anything even if there is something to match'.

For example "[a-c]*[0-9]*" seems to work fine. Capturing anything from "a" to "abc123". (seems: as in based on replace preview (which can be a bit iffy at times).)
But "abc[0-9]*" will not match the number part in strings like "abc123". (while "abc[0-9]+" will only match the first digit)
It will match all digits after adding the quantifier modifier "?", as in "abc[0-9]*?". (or "abc[0-9]+?" in the second case)

Is ReplaceText's RE 'non-greedy' by default?
(Or am I looking at it from the wrong angle?)
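The observations above would be consistent with the pattern being run in a mode where greediness is inverted. PCRE does have such a mode (the `U` modifier, under which `*` becomes lazy and `*?` becomes greedy), but whether ReplaceText actually uses it is an assumption here, not something established in this thread. Python's `re` has no equivalent flag, so the sketch below flips each quantifier by hand; the `flipped` table is purely illustrative.

```python
import re

text = "abc123"

# Key:   pattern as it would be typed into ReplaceText.
# Value: Python pattern with the greediness of the quantifier flipped
#        by hand (assumed equivalent under inverted greediness).
flipped = {
    "abc[0-9]*": "abc[0-9]*?",
    "abc[0-9]+": "abc[0-9]+?",
    "abc[0-9]*?": "abc[0-9]*",
    "abc[0-9]+?": "abc[0-9]+",
}

# What each typed pattern would match in "abc123" under that assumption.
results = {typed: re.search(py, text).group(0) for typed, py in flipped.items()}

for typed, matched in results.items():
    print(f"{typed!r:14} -> {matched!r}")
```

Under that assumption the plain `*` form stops at "abc", the plain `+` form takes only the first digit, and the forms with an added `?` take all digits, which lines up with what was seen in the replace preview.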
