Extension talk:Replace Text

Search using Replace Text not returning results

6
Tiggleshorts (talkcontribs)

Searches for words that should match are returning empty results, from both the web interface and the CLI.


Everything else on the wiki appears to be working fine: regular word searches return page content as expected. Are the Replace Text results generated by some other process? Oddly, searches for words in templates appear to work. Also odd: as a test, a search for 'the' returns one page, and the result shows only that one word with no surrounding text, even though the page contains many other words.


I recently upgraded from 1.35 to 1.39.1. Anyone else see similar issues?

Tiggleshorts (talkcontribs)

After doing a small edit on a page, now that page does show up when I search for text using Replace Text. There have been many edits previously so I'm looking to see why those aren't showing up. Further edited pages now show also.

Perhaps some database needed to be initialized by running Replace Text?

Tiggleshorts (talkcontribs)

I had $wgCompressRevisions set to true, now I see that in the 'known issues' section. Will set that to false and test further.

Ciencia Al Poder (talkcontribs)

Setting it to false won't make it work, unless you edit all your pages to add a new uncompressed revision.
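A rough illustration of why old revisions stay invisible, assuming (as the 'known issues' note implies) that Replace Text matches its pattern against the stored text blob directly in SQL: once a revision is stored compressed, the words are no longer present as plain bytes. The Python below only mimics that idea with `zlib`; the sample text and variable names are made up and nothing here is the extension's own code.

```python
import zlib

# Hypothetical page text; stands in for one stored revision.
page_text = ("The quick brown fox searches the wiki for a needle. " * 20).encode("utf-8")

# Raw deflate (wbits=-15), used here as a stand-in for a compressed revision blob.
compressor = zlib.compressobj(wbits=-15)
compressed_blob = compressor.compress(page_text) + compressor.flush()

# A byte-level search, similar in spirit to a pattern match on the stored column.
found_in_plain = b"needle" in page_text
found_in_compressed = b"needle" in compressed_blob

# Decompressing restores the text, so the content itself is intact.
restored = zlib.decompress(compressed_blob, wbits=-15)

print(found_in_plain, found_in_compressed, restored == page_text)
```

This is also why a fresh edit makes a page searchable again: the new revision is written uncompressed once the setting is off.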

Tiggleshorts (talkcontribs)

Using the suggested alternative MassEditRegex extension worked in my case. If Replace Text could issue a warning of some sort when $wgCompressRevisions is set to true, that would have been helpful. I likely set that following some optimization guide years ago but forgot about it.

ArchATempAcct (talkcontribs)

Same exact bug since upgrading to 1.39, text replace not seeing pages (and some matching text) until I edit the page. Thank you for the alternative tool.

Reply to "Search using Replace Text not returning results"
ArchATempAcct (talkcontribs)

I am trying to delete all category links where the category ends with the word "cross".

I am using: (?i)\[\[:?Category:([a-z0-9]+)cross\]\] with a blank replacement string. What am I screwing up? I would really like to remove all category links with the word "cross" anywhere in them, but I am not ready to tackle that yet.

Ciencia Al Poder (talkcontribs)

Looks like your search input should work (unless the category has spaces anywhere between brackets)

ArchATempAcct (talkcontribs)

Database error

A database query error has occurred. This may indicate a bug in the software.

[ZcgnAl4fIWg2gKD_Lb8zrAABvVQ] /index.php/Special:ReplaceText Wikimedia\Rdbms\DBQueryError: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading or after adding a new extension?


Please see https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Upgrading and https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:How_to_debug for more information.


Error 1139: Got error 'repetition-operator operand invalid' from regexp

Function: MediaWiki\Extension\ReplaceText\Search::doSearchQuery

Query: SELECT page_id,page_namespace,page_title,old_text,slot_role_id FROM `mwfp_page`,`mwfp_revision`,`mwfp_text`,`mwfp_slots`,`mwfp_content` WHERE (old_text REGEXP '(?i)\\[\\[:?Category:([a-z0-9]+)cross\\]\\]') AND page_namespace = 0 AND (rev_id = page_latest) AND (rev_id = slot_revision_id) AND (slot_content_id = content_id) AND (CAST( SUBSTR(content_address, 4) AS SIGNED ) = old_id) ORDER BY page_namespace, page_title LIMIT 1000

Backtrace:

from /home4/rabidpan/public_html/fascipedia/includes/libs/rdbms/database/Database.php(1618)

#0 /home4/rabidpan/public_html/fascipedia/includes/libs/rdbms/database/Database.php(1602): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)

#1 /home4/rabidpan/public_html/fascipedia/includes/libs/rdbms/database/Database.php(1576): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)

#2 /home4/rabidpan/public_html/fascipedia/includes/libs/rdbms/database/Database.php(952): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)

#3 /home4/rabidpan/public_html/fascipedia/includes/libs/rdbms/database/Database.php(1711): Wikimedia\Rdbms\Database->query(string, string, integer)

#4 /home4/rabidpan/public_html/fascipedia/includes/libs/rdbms/database/DBConnRef.php(103): Wikimedia\Rdbms\Database->select(array, array, array, string, array)

#5 /home4/rabidpan/public_html/fascipedia/includes/libs/rdbms/database/DBConnRef.php(326): Wikimedia\Rdbms\DBConnRef->__call(string, array)

#6 /home4/rabidpan/public_html/fascipedia/extensions/ReplaceText/src/Search.php(66): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array)

#7 /home4/rabidpan/public_html/fascipedia/extensions/ReplaceText/src/SpecialReplaceText.php(334): MediaWiki\Extension\ReplaceText\Search::doSearchQuery(string, array, string, string, boolean)

#8 /home4/rabidpan/public_html/fascipedia/extensions/ReplaceText/src/SpecialReplaceText.php(183): MediaWiki\Extension\ReplaceText\SpecialReplaceText->getTitlesForEditingWithContext()

#9 /home4/rabidpan/public_html/fascipedia/extensions/ReplaceText/src/SpecialReplaceText.php(82): MediaWiki\Extension\ReplaceText\SpecialReplaceText->doSpecialReplaceText()

#10 /home4/rabidpan/public_html/fascipedia/includes/specialpage/SpecialPage.php(701): MediaWiki\Extension\ReplaceText\SpecialReplaceText->execute(NULL)

#11 /home4/rabidpan/public_html/fascipedia/includes/specialpage/SpecialPageFactory.php(1428): SpecialPage->run(NULL)

#12 /home4/rabidpan/public_html/fascipedia/includes/MediaWiki.php(316): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, RequestContext)

#13 /home4/rabidpan/public_html/fascipedia/includes/MediaWiki.php(904): MediaWiki->performRequest()

#14 /home4/rabidpan/public_html/fascipedia/includes/MediaWiki.php(562): MediaWiki->main()

#15 /home4/rabidpan/public_html/fascipedia/index.php(50): MediaWiki->run()

#16 /home4/rabidpan/public_html/fascipedia/index.php(46): wfIndexMain()

#17 {main}

ArchATempAcct (talkcontribs)

That error message disappeared when I ran the update script, so no worries there. So we are back to the spaces. How do I get it to see categories with spaces in them?

Ciencia Al Poder (talkcontribs)

This should work: (?i)\[\[[ :]*Category:([a-z0-9 ]+)cross *\]\]

Note that (?i) may not be supported on mariadb REGEXP
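For anyone comparing the two patterns, here is a small sketch in Python. Python's `re` module is not the database's REGEXP engine, so treat it only as an approximation; the `(?i)` prefix is expressed as the `re.IGNORECASE` flag and the sample wikitext is invented.

```python
import re

# First attempt from the thread: no spaces allowed inside the category name.
strict = re.compile(r"\[\[:?Category:([a-z0-9]+)cross\]\]", re.IGNORECASE)
# Suggested revision: spaces allowed in the name and at the link edges.
relaxed = re.compile(r"\[\[[ :]*Category:([a-z0-9 ]+)cross *\]\]", re.IGNORECASE)

# Made-up sample wikitext.
wikitext = "\n".join([
    "[[Category:Ironcross]]",
    "[[Category:Iron cross]]",
    "[[Category:Knights of the Iron Cross ]]",
    "[[Category:Crossbows]]",
])

# Blank replacement string, as in the original question.
strict_result = strict.sub("", wikitext)
relaxed_result = relaxed.sub("", wikitext)

print(relaxed_result)
```

In this sketch the strict pattern removes only the one-word category, while the relaxed pattern also removes the names containing spaces and leaves "Crossbows" alone.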

ArchATempAcct (talkcontribs)

RESOLVED.

A (.*) on each side of "cross" did the trick.

Reply to "category"

How can I select the namespace name as part to be replaced?

3
Rasputin1493 (talkcontribs)

I have a custom namespace from which I want to move a few hundred pages, but it won't recognize the namespace's name as part of the page title, which I want to replace.

Yaron Koren (talkcontribs)

You can't use Replace Text to change pages' namespace, unfortunately - there's a bug about this here, although I don't know how to fix it. Are you trying to just get rid of a namespace, and move all those pages to the main namespace or something? If so, the MW script namespaceDupes.php could be helpful.

Rasputin1493 (talkcontribs)

Oh well. No, I'm searching for a way to mass-move pages on a wiki that I have no shell access to. The current namespace should stay, since it contains pages I want to keep in it.
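One route that needs no shell access, though nobody in this thread suggested it, might be the MediaWiki Action API's `action=move` module, called from a logged-in account with the move right. The sketch below only builds the request parameters and sends nothing; the function name, namespace names and page titles are hypothetical.

```python
def build_move_requests(titles, old_prefix, new_prefix, leave_redirect=True):
    """Return one parameter dict per page, shaped for the Action API's action=move."""
    requests_to_send = []
    for title in titles:
        if not title.startswith(old_prefix + ":"):
            continue  # leave pages from other namespaces alone
        page_name = title[len(old_prefix) + 1:]
        params = {
            "action": "move",
            "from": title,
            "to": f"{new_prefix}:{page_name}",
            "reason": "Namespace reorganisation",
            "format": "json",
            # A CSRF token must be fetched from the wiki before sending.
            "token": "<csrf token goes here>",
        }
        if not leave_redirect:
            params["noredirect"] = 1  # needs the 'suppressredirect' right
        requests_to_send.append(params)
    return requests_to_send


# Hypothetical namespace and page names.
pages = ["Draft:Alpha", "Draft:Beta/Notes", "Help:Contents"]
for params in build_move_requests(pages, "Draft", "Archive"):
    print(params["from"], "->", params["to"])
```

Only the pages you list get moved, so the rest of the namespace stays where it is.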

Reply to "How can I select the namespace name as part to be replaced?"

Supress redirect is missing

7
Tenbergen (talkcontribs)

I remember there being an option to suppress the generation of redirect pages (the known issues section in the instructions mentions this as well). As of ReplaceText version 1.7 I can't see that option on any of my wikis. The user in question has the rights to suppress redirects, and the option is present when I try to move an individual page. How do I make this option available again?

Yaron Koren (talkcontribs)

Do you mean an actual checkbox in the interface, or something like that? I don't remember such a thing. The "Known issues" section only mentions the 'suppressredirect' permission, I think. Or is that what you're talking about?

MvGulik (talkcontribs)

Seems there was such an option, or at least closely related.

See bottom of second screenshot in the Screenshots section. => "For moved pages: ..."

Yaron Koren (talkcontribs)

Oh - were you just talking about the "Save the old titles as redirects to the new titles" checkbox? That checkbox is still there.

Tenbergen (talkcontribs)

I just used it on another of my wikis and that section is there, correct. But on this one and another it's not. Both are MW1.40 and ReplaceText 1.7, both use vector legacy (2010) skin. I uploaded a screenshot to demonstrate.

MvGulik (talkcontribs)

@Tenbergen: The search text 'needs' to be found inside the page-title for those "For Moved Pages" options to show up. (It was not possible to verify if that was, or was not, the case in your screenshots)

Tenbergen (talkcontribs)

Arrrr yes. You are right, I would only expect to see it if there were pages found that could be moved. And when there are, it shows up. Thanks for explaining the obvious, sorry I wasted your time.

Reply to "Supress redirect is missing"

"Replace text in page titles": Empty page-title exception error.

1
MvGulik (talkcontribs)

Not a (direct) showstopper as such.

When a page-title change would result in an empty page-title it triggered the following error. (Replace_text form interface)

[ZLjjNw_BJjdvj9s8jky3lgAAAAQ] /wiki/Special:ReplaceText TypeError: Argument 2 passed to MediaWiki\Page\PageCommandFactory::newMovePage() must be an instance of Title, null given, called in /var/www/mediawiki-1.39.0/extensions/ReplaceText/src/SpecialReplaceText.php on line 386

Backtrace:

from /var/www/mediawiki-1.39.0/includes/page/PageCommandFactory.php(300)
#0 /var/www/mediawiki-1.39.0/extensions/ReplaceText/src/SpecialReplaceText.php(386): MediaWiki\Page\PageCommandFactory->newMovePage(Title, NULL)
#1 /var/www/mediawiki-1.39.0/extensions/ReplaceText/src/SpecialReplaceText.php(186): MediaWiki\Extension\ReplaceText\SpecialReplaceText->getTitlesForMoveAndUnmoveableTitles()
#2 /var/www/mediawiki-1.39.0/extensions/ReplaceText/src/SpecialReplaceText.php(82): MediaWiki\Extension\ReplaceText\SpecialReplaceText->doSpecialReplaceText()
#3 /var/www/mediawiki-1.39.0/includes/specialpage/SpecialPage.php(701): MediaWiki\Extension\ReplaceText\SpecialReplaceText->execute(NULL)
#4 /var/www/mediawiki-1.39.0/includes/specialpage/SpecialPageFactory.php(1428): SpecialPage->run(NULL)
#5 /var/www/mediawiki-1.39.0/includes/MediaWiki.php(316): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, RequestContext)
#6 /var/www/mediawiki-1.39.0/includes/MediaWiki.php(904): MediaWiki->performRequest()
#7 /var/www/mediawiki-1.39.0/includes/MediaWiki.php(562): MediaWiki->main()
#8 /var/www/mediawiki-1.39.0/index.php(50): MediaWiki->run()
#9 /var/www/mediawiki-1.39.0/index.php(46): wfIndexMain()
#10 {main}
MediaWiki: 1.39.0
PHP: 7.4.33 (fpm-fcgi)
MariaDB: 10.5.19-MariaDB
ICU: 50.2
Replace Text	1.7 (cba3752) 18:03, 14 March 2023

The affected page could potentially be flagged as an invalid case and skipped.

Possibly also printing a note that some pages were skipped because the replacement produced an empty page title.
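The kind of guard meant here could look roughly like this. It is a Python illustration of the idea only, not the extension's PHP code, and the function name and sample titles are made up.

```python
import re


def plan_title_moves(titles, pattern, replacement):
    """Split titles into planned moves and titles skipped because the new title is empty."""
    moves, skipped = [], []
    for title in titles:
        new_title = re.sub(pattern, replacement, title).strip()
        if new_title == title:
            continue  # pattern did not match; nothing to move
        if not new_title:
            skipped.append(title)  # would produce an empty page title
        else:
            moves.append((title, new_title))
    return moves, skipped


moves, skipped = plan_title_moves(["Draft", "Draft notes", "Other"], r"Draft", "")
print(moves)
print(f"{len(skipped)} page(s) skipped: replacement left an empty title")
```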

Reply to ""Replace text in page titles": Empty page-title exception error."

Replace spaces with underscores in file names?

10
Mbolatsara (talkcontribs)

Using the Original text:/Replacement text: form interface of this extension and the Use regular expressions option, what expressions can be used to find all spaces in File names to replace them with underscores?

The replacement should consider any markup beginning with [[File: up to .jpg, .png or another file extension.

So anywhere from [[File: up until the first period becomes the searched pattern space:

[[File:Some file name.jpg|

... in which any whitespace should be replaced with underscores, resulting in:

[[File:Some_file_name.jpg|

Thanks

Dinoguy1000 (talkcontribs)

Why do you want to run this replacement? The software treats spaces and underscores the same in page/file names, and the general convention is to prefer spaces.

Mbolatsara (talkcontribs)

I agree, spaces are normally preferred.

However, unlike the usual metadata File page linked via images, I use a simplified image view.

Instead of [[File:Some image|800px]] on www.example.wiki leading to www.example.wiki/File:Some_image.jpg, images sometimes link to www.example.wiki/File.cgi?Some_image.jpg

File.cgi is an exception on my Apache that is bypassed by MW's default behaviour in returning non-existing page URLs, like: [[File:Some image.jpg|link=File.cgi?Some_image.jpg|800px]] would achieve.

Using the link= markup, Replace Text, either by command line or its form-based interface, would be the preferred method of replacing markup across many pages. A search and replace regex could for example:

  • Extract the filename pattern between any [[File: until the first period before jpg, JPG, png, etc.
  • Place in variable and replace spaces in filenames with underscores.
  • Insert the |link=Some_image.jpg variable in each [[File:... enclosure, just before its closing ]].

This way the spaces and wiki markup can remain standard. The regex procedure would need to be idempotent, so that it does not operate on patterns that already have File.cgi somewhere within [[File: .. File.cgi .. .jpg]].
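The three steps and the idempotency requirement can be sketched outside Replace Text, for instance as the core of a bot script. This is a minimal Python sketch under stated assumptions: file links contain no nested [[...]] (for example in captions), and the function and pattern names are hypothetical.

```python
import re

# Matches [[File:Name.ext ... ]] links; group 1 is the file name, group 2 the rest.
# Assumes no nested [[...]] inside the link (e.g. in a caption).
FILE_LINK = re.compile(r"\[\[File:([^|\]]+?\.[A-Za-z0-9]+)((?:\|[^\]]*)?)\]\]")


def add_cgi_link(wikitext):
    def rewrite(match):
        name, rest = match.group(1), match.group(2)
        if "File.cgi" in rest:
            return match.group(0)  # already processed; keeps the pass idempotent
        target = name.strip().replace(" ", "_")
        return f"[[File:{name}{rest}|link=File.cgi?{target}]]"

    return FILE_LINK.sub(rewrite, wikitext)


sample = "[[File:Some file name.jpg|800px]] and [[File:Other.png]]"
once = add_cgi_link(sample)
twice = add_cgi_link(once)  # a second pass should change nothing
print(once)
```

The callback is what makes this a single pass: the file name is captured once and reused with underscores only in the link target, so the File: name itself keeps its spaces.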

Or, perhaps it is easier to change the MW's File: procedures for image displays to link to example.wiki/File.cgi?... instead of to example.wiki/File:..., in which case the underscores will need to be present.

Can this be configured by hooks in LocalSettings or in one particular MW template or PHP file?

Thank you for any ideas.

Dinoguy1000 (talkcontribs)

That's horrifying. Depending on what exactly your File.cgi does, this really seems like something you should be doing via a specific class name + Javascript (whether that JS is just a shim between the markup and File.cgi, or fully implements the functionality of File.cgi itself, also depends on what exactly File.cgi does).

Anyways, you could probably torture regex into doing what you wanted (or something close to it), but tbh this sounds like something a proper bot would be more appropriate for.

Mbolatsara (talkcontribs)

Linking to an external page, file or location via the [[File: ... |link=..]] markup is standard. Anyone who happens to be familiar with Replace Text's regex procedures, please kindly share your ideas on how to insert the link markup above, bearing in mind the underscores needed in linked URLs, which may be missing in the original [[File:Names of files.jpg]] segments.

MvGulik (talkcontribs)

It's not something that can be done with a single RE-job in this extension (at least not that I know of).


Bare basic example that targets file-titles with two words:

REGEX:"(?i)\[\[:?file:([a-z0-9]+)[ ]([a-z0-9]+)\.([a-z]+)\|"

Replace String:"$1_$2.$3"


Titles with other word counts need appropriately adjusted versions of both the regex and the replace string.

Unless you're sure you don't have titles with mixed "Underscore/Space" ... more work ahead (three or more words per title). You could try "[_ ]" instead of "[ ]", as long as you don't hit the maximum page result limit.

This is missing the later-added "|link=" part, as I have not looked at that part. Personally I think it's better/easier to use a single RE-job to completely remove those parts first, and then use accordingly adjusted Replace-Strings (from the example above) to re-add them.
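A quick way to see what the two-word example does is to translate it to Python. This is only an approximation of Replace Text's behaviour: `$1` becomes `\1`, `(?i)` becomes a flag, and the sample links are invented. In this translation the replacement string as posted also drops the matched `[[File:` prefix and the trailing pipe, so a variant that writes them back is shown next to it.

```python
import re

# The two-word pattern from the post above, with (?i) expressed as a flag.
two_words = re.compile(r"\[\[:?file:([a-z0-9]+)[ ]([a-z0-9]+)\.([a-z]+)\|", re.IGNORECASE)

two = "[[File:Some file.jpg|800px]]"
three = "[[File:Some file name.jpg|800px]]"

# Replacement as posted ($1_$2.$3 written in Python's \1 syntax).
as_posted = two_words.sub(r"\1_\2.\3", two)
# Variant that writes the matched prefix and pipe back out.
with_prefix = two_words.sub(r"[[File:\1_\2.\3|", two)
# A three-word name is not matched by the two-word pattern at all.
three_unchanged = two_words.sub(r"[[File:\1_\2.\3|", three)

print(as_posted)
print(with_prefix)
print(three_unchanged)
```

If the extension replaces the whole match the same way, the replace string would need the prefix and pipe added; that is an assumption worth checking on a test page first.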


(I probably don't know what I'm talking about and got it all wrong. And probably should not have reacted. Trying hard to work on that last part though.)

Dinoguy1000 (talkcontribs)

This is why I said ReplaceText regex isn't the right tool for the job. There is no way around having to multi-pass this if that's what you constrain yourself to.

Mbolatsara (talkcontribs)

Thanks for the above example. I presume a procedure to replace the spaces and insert the link= variables would require two or more steps.

There are different numbers of spaces in file names depending on how many words each filename happens to have, but never a mix of spaces and underscores as in [[File:Some_ file name.jpg]] in my wiki situation.

I tested the regex using Special page form interface of Replace Text:

Original text: (?i)\[\[:?file:([a-z0-9]+)[ ]([a-z0-9]+)\.([a-z]+)\|

Replacement text: $1_$2.$3

The following error was returned by the Special page:

Database error: A database query error has occurred. This may indicate a bug in the software.

But without indication of a database table, it's difficult to know the cause of the error in case a MySQL table needs to be repaired.

If that's not the problem, since my Replace Text works with other, simpler regexes, the old REL1_27 version of Replace Text I have may be incompatible with the above code example. Did you by any chance test-run the regex using the Replace Text form interface or the extension's replaceAll.php command line option?

MvGulik (talkcontribs)

Database error) That should not happen. On this side the example RE works ok (Using the Replace Text form interface).

Used Replace_Text version:

1.7 (cba3752) 18:03, 14 March 2023

With:

MediaWiki: 1.39.0
PHP: 7.4.33 (fpm-fcgi)
MariaDB: 10.5.19-MariaDB
ICU: 50.2


Ps: With mixed spaces and underscores I mean [[File:Some_file name.jpg]] style links.


+Dinoguy1000 is right about "ReplaceText regex isn't the right tool for the job" in this case. Personally I would try to create a MW-bot for this. But that is not something that's done in a day if you have never used/created them.

Mbolatsara (talkcontribs)

Thank you for confirming your result. I'll test again with a newer MW installation.

Reply to "Replace spaces with underscores in file names?"

<strike>{n,m} quantifier (?)</strike> Lazy vs Greedy (Default)

4
MvGulik (talkcontribs)

Did not know "Replace Text" supported the {n,m} quantifier.
...
Well at least the {n} version that is.
When trying out the {n,m} format like in "{2,}" or "{2,99}" it just matched 2 cases, while ignoring the trailing rest. (?)

In case it matters.
Tested it with leading blank-lines removal like in "^\n{2,99}(.*)"

(Not tested {n,m} with additional trailing quantifiers ... don't really see the point of that)
(to get the job done I switched back to "^\n+?(.*)".)

MvGulik (talkcontribs)

Hmmm ... Might be related to the fact that quantifiers in 'Replace_Text' seem to be set to be lazy by default. (which I keep generally forgetting as I learned it the other way around)

Guess I need to do some additional reading & testing on this.

MvGulik (talkcontribs)

Yea, it is as expected.

RE: "a{4,8}" => "aaaaaabbbaaaaaacccaaaaaa" <= lazy mode.

RE: "a{4,8}?" => "aaaaaabbbaaaaaacccaaaaaa" <= greedy mode.


So far I don't know of any other RE implementation that does it this way. Even the MySQL page linked from the main page shows it's greedy by default. (No mention of "greedy" or "lazy" though.)

Although defaulting to RE-lazy-behavior for MW might not be a bad thing ...

(Makes testing 'Replace Text' intended RE code on sites like https://regexr.com/ a bit awkward, and error prone, though)

I do think this 'Lazy-Default vs Greedy' should be mentioned(!) on the 'Replace text' main page.
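For comparison, this is how the same test string behaves in an engine with the usual convention, here Python's `re`: the plain quantifier is greedy and the trailing `?` makes it lazy. The reversal seen in Replace Text would be consistent with the pattern being run under an "ungreedy" modifier, but that is an assumption and is not confirmed anywhere in this thread.

```python
import re

text = "aaaaaabbbaaaaaacccaaaaaa"  # three runs of six "a"

greedy = re.findall(r"a{4,8}", text)   # plain quantifier: take as many as allowed
lazy = re.findall(r"a{4,8}?", text)    # trailing "?": stop at the minimum

print(greedy)
print(lazy)
```

When porting a pattern tested on a site like regexr.com, the practical upshot is to swap `X` and `X?` on each quantifier and re-check the result on a test page.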

Lady G2016 (talkcontribs)

I was having the same problem until I saw your solution. I wanted to capture all of the characters in this string:

String: "{{Abc | 1498}}#hist=tab%3A2"

Enabling greedy mode fixed my problem. The % char seemed to cause additional difficulty, so I included it in my group.

RE: "\{\{Abc \| (\d*)\}\}([^ ]|\%)+" => "{{Abc | 1498}}#hist=tab%3A2" <= lazy mode

RE: "\{\{Abc \| (\d*)\}\}([^ ]|\%)+?" => "{{Abc | 1498}}#hist=tab%3A2" <= greedy mode

I agree that 'Lazy-Default vs. Greedy' should be mentioned on the 'Replace text' main page.

Reply to "<strike>{n,m} quantifier (?)</strike> Lazy vs Greedy (Default)"

ReplaceText extension and "expansion depth limit exceeded"

1
Jonathan3 (talkcontribs)

I don't want to duplicate it here, but I reported a problem on the Support Desk at Topic:Xepd60061zwhisr6. Essentially, dozens of pages ended up with this error message after using ReplaceText on them. I wonder whether anyone here has had this experience. Thanks.

Reply to "ReplaceText extension and "expansion depth limit exceeded""

Somewhat misleading feedback (category filter)

4
MvGulik (talkcontribs)

Somewhat minor, but the feedback could be a bit less ambiguous.

When the "Replace only in category" filter is used and Replace-Text can't find the target string in any of the pages listed in the targeted category, it comes back with: No category exists with the name "Category:<+name>".

Technically:
1) Replace-Text just could not find any page that matched the search string.
2) Categories are of course a bit tricky. As they can exist as a page, but still be empty. Or they can contain pages, without actually having some page-content set.

Something like No matching pages found in/for the "Category:<+name>" seems a better feedback text for these cases.
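The distinction being asked for could be sketched like this. It is purely illustrative Python, not the extension's logic; the function name, parameters and the wording of the third message are hypothetical.

```python
def category_feedback(category, category_exists, pages_in_category, matching_pages):
    """Pick a feedback message for a category-filtered search (illustrative only)."""
    if not category_exists and pages_in_category == 0:
        # Neither a category page nor any members: the name is probably wrong.
        return f'No category exists with the name "Category:{category}".'
    if matching_pages == 0:
        # The category is real, the search string just was not found in it.
        return f'No matching pages found in "Category:{category}".'
    return f'{matching_pages} matching page(s) found in "Category:{category}".'


print(category_feedback("Tools", True, 12, 0))
```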

Replace Text: 1.7 (8e35c8f) 19:50, 4 December 2022
MediaWiki: 1.39.0
Cavila (talkcontribs)

> Or they can contain pages, without actually having some page-content set.

Is that the issue in your case? This seems to check if the category exists as a wiki page and if it doesn't, throws the error message you mentioned.

MvGulik (talkcontribs)

Mmm ... will have to think a bit, and explore that code.

MvGulik (talkcontribs)

Although I'm not familiar with PHP ... other than a quick PHP crash-course at w3schools. The related code seems ok as far as I can tell.

The only thing that I could not figure out is what that exists() part in line 280 is doing. It does not seem to be a native PHP function, but I also could not find any reference to it in the other ReplaceText code on GitHub.

Considering the output I got, and the fact that the used category has pages linked to it and also a content-page. A failing category-detection at line 280 would kinda explain that result.

+(The used category also has no special name)

Reply to "Somewhat misleading feedback (category filter)"

Suggestion: Lock "Replace text in page contents" to On. (?)

5
Summary by MvGulik

Closed: Invalid/Insufficient data.

MvGulik (talkcontribs)

-- Searching with the Replace-Text MW-search-page.

-- Used Options shortcuts:
a) [Replace text in page contents]
b) [Replace text in page titles, when possible]
c) [Replace only in category]

These options are presented to the user as if they were independent.

Replace-Text behavior, however, shows that if option [a] is Off, all other options also 'seem' to be non-usable/-active.

For example:

With option [a] Off, but Option [b] On, and looking for "/doc" (template namespace), Replace-Text never finds anything.

Or:

If Option [c] is used (valid category specified), and option [a] is Off, a "/doc" search comes back with "No category exists with the name "Category:<used category>".
-- This seems more related to the fact that no matches were found.

For Replace-Text to provide less ambiguous UI-feedback to the user, based on this behavior, it kinda makes sense to auto-disable all options if [a] is Off.
... or to just lock Option [a] to On.

-- Not specific to any Replace-Text version.

Yaron Koren (talkcontribs)

It seems like the real issue is the bug that you can't replace text in only page titles, no?

Yaron Koren (talkcontribs)

By the way, I can't replicate this bug. What version of Replace Text are you using? And could it be that you have not selected all the necessary namespaces?

MvGulik (talkcontribs)

>It seems like the real issue is the bug that you can't replace text in only page titles, no?

If it's intended that that should be possible with Replace-Text, then yea.


>... could it be that you have not selected all the necessary namespaces?

Unlikely. I first made sure Replace-Text found a bunch of pages, before trying it with the [Replace text in page contents] option disabled.


Specs:

Replace Text: 1.7 (8e35c8f) 19:50, 4 December 2022
- - - -
MediaWiki: 1.39.0
PHP: 7.4.33 (fpm-fcgi)
MariaDB: 10.5.18-MariaDB
ICU: 50.2


Personally I can't recall any time that this worked on any version of Replace-text.
(only have used Replace-Text on the same wiki. Although with different MW-versions over time)
Ps: I'm not running/hosting the wiki. Just being its active admin.

MvGulik (talkcontribs)

Did run into a case where I did get a 'found in title' result, while 1) there were no matches on title content. (Or 2) 'find in pages' was disabled.)

So there seems to be some additional (unknown at this point) trigger in play.

Will re-post, with more appropriate title, when I think I found that potential trigger.

(Closing: "Invalid/Insufficient data")