Topic on Extension talk:Replace Text

I got "Wikimedia\Rdbms\DBQueryError" when I use "?" in regex.

Libattery


When I use a regex such as "(<math>.*?<\/math>)", I get an error. Without the "?", as in "(<math>.*<\/math>)", it works.

How can I get lazy matching in a regex without using "?"?

I found no way around it, because LaTeX equations use nearly every character.

Please help me.
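The difference the question describes can be shown with a small Python sketch. Note that it runs on Python's `re` module, not on the wiki's database engine, and the sample text is made up. It compares the greedy and lazy patterns from the question with one ERE-style alternative that avoids `?` entirely: `([^<]|<[^/])*` repeats "any character except `<`" or "a `<` not followed by `/`", so it cannot run past the first `</`. It is an assumption, worth testing on a copy of the wiki first, that your database's REGEXP accepts this pattern; it also misses the rare case of a `<` placed immediately before `</math>`.

```python
import re

# Hypothetical sample wikitext; the first formula contains a literal "<".
TEXT = "<math>a < b</math> and <math>c</math>"

GREEDY = r"(<math>.*<\/math>)"   # from the question: runs to the LAST </math>
LAZY = r"(<math>.*?<\/math>)"    # from the question: stops at the FIRST </math>
# No "?" anywhere: repeat "any char except <" or "< followed by anything but /".
# Assumes the backend supports ERE alternation and bracket expressions.
NO_LAZY = r"(<math>([^<]|<[^/])*<\/math>)"


def matches(pattern: str, text: str) -> list[str]:
    """Return every whole-match substring, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for name, pattern in [("greedy", GREEDY), ("lazy", LAZY), ("no_lazy", NO_LAZY)]:
    print(name, matches(pattern, TEXT))
# greedy ['<math>a < b</math> and <math>c</math>']
# lazy ['<math>a < b</math>', '<math>c</math>']
# no_lazy ['<math>a < b</math>', '<math>c</math>']
```

The greedy pattern swallows everything between the first `<math>` and the last `</math>`, while the other two stop at each formula's own closing tag.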

Verdy p

The regexp engine is not a full PCRE implementation. Advanced features (such as matching newlines or doing multiline searches, using named capture groups, controlling lazy behavior, or adding inline `(?flags: ... )` groups inside the regexp itself) are therefore not supported.
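If the backend really is a POSIX-style engine, it can help to catch such constructs before a query is ever sent. The Python function below is a hypothetical, rough pre-flight heuristic (not part of Replace Text and not a full regex parser): it flags lazy quantifiers (`*?`, `+?`, `??`, `}?`) and groups opening with `(?` (inline flags, named or non-capturing groups, lookarounds), while skipping escaped characters and bracket expressions. Whether a flagged construct is actually rejected still depends on the SQL backend in use.

```python
def pcre_only_features(pattern: str) -> list[str]:
    """Hypothetical heuristic: list constructs a POSIX ERE engine may reject."""
    found: list[str] = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        nxt = pattern[i + 1] if i + 1 < n else ""
        if c == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if c == "[":
            # Jump over a bracket expression; "?" and "(" are literal inside it.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading "]" is a literal member of the set
            while j < n and pattern[j] != "]":
                j += 1
            i = j + 1
            continue
        if c == "(" and nxt == "?":
            found.append(f"'(?' group at offset {i}")
            i += 2
            continue
        if c in "*+?}" and nxt == "?":
            found.append(f"lazy quantifier '{c}?' at offset {i}")
            i += 2
            continue
        i += 1
    return found


print(pcre_only_features(r"(<math>.*?<\/math>)"))
# ["lazy quantifier '*?' at offset 8"]
print(pcre_only_features(r"(<math>.*<\/math>)"))
# []
```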

Maybe this extension could be adapted to use a full, modern PCRE engine (which would need to be enabled and set up properly). Note that PCRE is integrated and supported on Wikidata for its advanced reports.

So first make sure that a PCRE engine is installed, as it is on Wikidata (it is also used internally by Extension:AbuseFilter on all Wikimedia public wikis). Additional work in the PHP code may be needed to set up and use this alternate engine.

Unfortunately, the type and version of the installed regexp engine are not visible in Special:Version, either as an extension or as a library. These dependencies are hidden inside each extension, each built and tested against its own regexp engine, so several separate engine instances may be installed (possibly in identical versions, but as separately maintained copies). Such dependencies should be documented by each extension.

Looking only at this file: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/ReplaceText/+/refs/heads/master/src/ReplaceTextSearch.php

I see that when regexps are enabled, the search is done by the SQL engine using its own internal regexp engine. PostgreSQL is supported through its '~' operator in WHERE conditions; otherwise the extension uses the 'REGEXP' operator supported by MySQL and its variants. It may not work at all with database engines whose SQL dialect supports neither of these two operators.
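A minimal Python paraphrase of the operator choice described above. The function name, the `'postgres'` type string and the `old_text` column are assumptions for illustration; this is not the extension's PHP code. The quoting is deliberately naive: it only doubles single quotes and ignores backslashes, which MySQL treats as escapes inside string literals by default, so real code should rely on the database layer's own quoting.

```python
def regex_condition(db_type: str, column: str, pattern: str) -> str:
    """Hypothetical sketch: build a WHERE fragment using the backend's regex operator."""
    # PostgreSQL matches with '~'; everything else is assumed to understand REGEXP.
    op = "~" if db_type == "postgres" else "REGEXP"
    # Naive quoting for illustration only: doubles single quotes, ignores backslashes.
    literal = "'" + pattern.replace("'", "''") + "'"
    return f"{column} {op} {literal}"


print(regex_condition("postgres", "old_text", "<math>.*</math>"))
# old_text ~ '<math>.*</math>'
print(regex_condition("mysql", "old_text", "<math>.*</math>"))
# old_text REGEXP '<math>.*</math>'
```

The pattern string is passed through untouched, so the dialect you are writing in is whatever the backend's operator implements.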

The regexp syntax therefore depends on the SQL backend your wiki uses, and you need to look at the installation of the SQL engine rather than that of PHP or MediaWiki itself. It also means that regexps used with Special:ReplaceText (if it is enabled in the MediaWiki installation) or with the "ReplaceAll.php" script (run only from the host system's console by authorized sysadmins, or via custom client programs on that host capable of making SQL requests) are NOT portable, and no effort was made to make them portable: all matches are found by the SQL engine itself, then the PHP code processes these matches, using its own syntax for the replacements, before submitting SQL UPDATE or REPLACE statements.
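The two phases can be modelled with Python's built-in `sqlite3` module. SQLite has no regex operator of its own: `X REGEXP Y` calls a user-defined function `regexp(Y, X)`, so the "database dialect" here is simply whatever that function implements. The table name, schema and helper are made up for illustration and do not reflect Replace Text's actual schema or code. In this toy model both phases happen to use Python's `re`; in a real installation phase 1 runs in MySQL's or PostgreSQL's engine and phase 2 in PHP's, which is why a pattern accepted by one may be rejected by the other. Plausibly, if the SQL-side engine rejects a construct such as `*?`, phase 1 fails with a database error before phase 2 ever runs, which would match the symptom in the question.

```python
import re
import sqlite3


def search_then_replace(texts: list[str], search: str, replacement: str) -> list[str]:
    """Hypothetical toy model: the database selects matching rows, the app rewrites them."""
    conn = sqlite3.connect(":memory:")
    # "body REGEXP ?" becomes regexp(pattern, body); this function IS the SQL-side dialect.
    conn.create_function("regexp", 2, lambda pat, val: re.search(pat, val) is not None)
    conn.execute("CREATE TABLE page_text (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO page_text (body) VALUES (?)", [(t,) for t in texts])

    # Phase 1: matching is done entirely inside the database.
    rows = conn.execute(
        "SELECT id, body FROM page_text WHERE body REGEXP ?", (search,)
    ).fetchall()

    # Phase 2: the application applies the replacement with its own regex engine.
    for row_id, body in rows:
        new_body = re.sub(search, replacement, body)
        conn.execute("UPDATE page_text SET body = ? WHERE id = ?", (new_body, row_id))

    result = [body for (body,) in conn.execute("SELECT body FROM page_text ORDER BY id")]
    conn.close()
    return result


print(search_then_replace(["x <math>a</math>", "no formula"], r"<math>(.*?)</math>", r"<math> \1 </math>"))
# ['x <math> a </math>', 'no formula']
```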

Because this can severely affect the whole database (and can use a lot of resources and be very slow), it is naturally highly restricted, which may explain why the extension is not even installed as Special:ReplaceText on Wikimedia wikis, for obvious security and performance reasons: otherwise there would be a huge risk of a severe denial of service with massive destruction of content, long delays to restore or reload a database from a backup, and the loss of much valid work done since the last backup.

Because it is very risky to use this extension, the sysadmins who can use it should be clearly aware of the exact regexp syntax supported and its effects. This extension currently targets only developers, and only those with strong knowledge of the SQL engine in use and of when and how it is maintained: the regexps you submit with this extension are NOT checked at all, and they can do anything, including unexpected things, even when you thought your regexp was fine and harmless and would match what you expect. Be careful!

It's probably much less risky to use something else, such as Pywikibot, which uses the standard MediaWiki API, under the control of bot authorizations and with limits on the rate of submitted changes, and whose actions are all reversible from the page history (ReplaceAll.php and Special:ReplaceText do not feed the page history at all).
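For comparison, a minimal sketch of the client-side approach, assuming a Pywikibot installation that is already configured (user-config.py, a bot account and the wiki's bot approval). The cleanup rule in `tidy_math()` and both function names are hypothetical; the Pywikibot calls used (`pywikibot.Site`, `pywikibot.Page`, `Page.text`, `Page.save`) are its standard page-editing API. Because the pattern is evaluated by Python's `re` module on the client, lazy quantifiers and `re.DOTALL` work regardless of which database the wiki runs on.

```python
import re

# Hypothetical cleanup rule: trim whitespace just inside <math>...</math>.
# Lazy ".*?" stops at each formula's own closing tag; DOTALL lets "." cross newlines.
MATH = re.compile(r"<math>\s*(.*?)\s*</math>", re.DOTALL)


def tidy_math(text: str) -> str:
    """Apply the hypothetical rule to a page's wikitext."""
    return MATH.sub(r"<math>\1</math>", text)


def tidy_page(site_code: str, family: str, title: str) -> bool:
    """Edit one page through the MediaWiki API; returns True if a change was saved."""
    import pywikibot  # imported here so tidy_math() works without Pywikibot installed

    site = pywikibot.Site(site_code, family)
    page = pywikibot.Page(site, title)
    new_text = tidy_math(page.text)
    if new_text == page.text:
        return False
    page.text = new_text
    page.save(summary="Bot: trim whitespace inside <math> tags")  # recorded in page history
    return True
```

Each saved edit goes through the API as a normal revision, so it can be reviewed and reverted like any other edit.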
