Extension talk:Replace Text/Archive 2018

Large Wikis
For wikis with numerous pages, an approach based on jobs might be more appropriate. Just a thought. Jean-Lou Dupont 15:48, 29 April 2008 (UTC)


 * Asking administrators to set up a cron job might be overkill - what do you think about the idea of including a command-line script instead, for the case of large wikis? Yaron Koren 19:25, 7 May 2008 (UTC)
 * I was thinking about a MediaWiki custom job written as an extension: each page request (from any user) would trigger X number of replace tasks. This is the general way jobs under MediaWiki work. Jean-Lou Dupont 19:38, 7 May 2008 (UTC)


 * Oh, I wasn't even aware of that functionality. I just looked it up; that definitely clarifies some things about how templates and categories work... do you know if there's documentation somewhere on how to add jobs? Yaron Koren 19:44, 7 May 2008 (UTC)
 * I usually just look at the code in those cases ;-) Look at JobQueue.php. Jean-Lou Dupont 19:49, 7 May 2008 (UTC)


 * Okay, thanks; that looks ideal. It could be useful not just for this extension, but for functionality like SMW's refresh-semantic-data action, which currently runs only as a command-line script. Is there any code of yours I can look at that uses the job queue? Yaron Koren 19:55, 7 May 2008 (UTC)
 * Unfortunately no, but it looks straightforward enough. Jean-Lou Dupont 20:02, 7 May 2008 (UTC)

Error
I have this error:

Fatal error: Maximum execution time of 30 seconds exceeded in path\includes\Database.php on line 681

--85.18.14.29 10:53, 4 May 2008 (UTC)


 * See Jean-Lou Dupont 11:04, 4 May 2008 (UTC)

Regexp and others
Hi Yaron. Could you make this excellent extension support regular expressions, and add a checkbox for a preview mode or a list of the data that will be changed? :)--Roc michael 20:19, 8 May 2008 (UTC)
 * Hi Roc, I totally agree with you. +: I think we could have a way to select in which namespaces or categories we want to do the search, and could be a multiselect box, something like this: (first field) http://lab.arc90.com/tools/multiselect/

Finally!
I was looking for something like this, since it would be my only reason to use bots, which I'm too dumb to set up. Thanks!

Still I get memory problems: On localhost I get this error after ~3 seconds:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 3672 bytes) in C:\xampp\htdocs\wiki12clean\includes\Title.php on line 142

On my server, where I have a PHP memory limit of 30 MB, it gives me a blank page. Error log:

PHP Fatal error: Maximum execution time of 30 seconds exceeded in includes/Database.php on line 818 (MW 1.12.0 / PHP 5.1) --Subfader 21:47, 10 May 2008 (UTC)
 * And I also support the "per namespace" feature. --Subfader 15:56, 11 May 2008 (UTC)

Version 0.2 released
Hi, to everyone who's had problems with the extension before, I just released a new version, 0.2. This version has two big changes: the replacement of text is a two-step process, with the user first shown a list of pages to be replaced so they can choose which ones to replace in; and actual replacement is done through MediaWiki jobs, as Jean-Lou Dupont suggested above. Both of these changes should combine to deal with many of the problems people have experienced, like server timeout, memory overload, and lack of control over replacements. Please try out this new version when you can, and let me know if it improves things for you. Yaron Koren 20:53, 12 May 2008 (UTC)

Still my memory problem
Thanks for the fast work. But still: Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 3672 bytes) in C:\xampp\htdocs\wiki12\includes\Title.php on line 142

I use MW 1.12.0rc and I saw heavy changes are done to Title.php on SVN these days. --Subfader 21:53, 12 May 2008 (UTC)


 * At which stage of the process do you see this error? Yaron Koren 22:59, 12 May 2008 (UTC)
 * Right after hitting Continue --Subfader 05:16, 13 May 2008 (UTC)
 * Okay, I think I see what's going on - the way in which the extension currently finds the list of pages is extremely inefficient. I think there's an easy fix, which I'll add to the next version, which should speed things up, and reduce memory usage, considerably. Yaron Koren 14:54, 13 May 2008 (UTC)
 * Maybe use a form of iterator to solve the issue? Insert a job in the queue with 1st article as target. When a job concludes, insert a new job with the next article id and so on. With this method, no more long-lead-time-fetching. Of course, you could fetch a handful of article IDs per-job and process them as well.  I would also suggest adding an entry in the log when the whole process is finished. Jean-Lou Dupont 15:01, 13 May 2008 (UTC)
 * Actually, this comes before jobs are used - when the list of pages containing the search string is found. Thanks for the previous jobs tip, though. Yaron Koren 15:50, 13 May 2008 (UTC)
 * Well, you could have the job go through all pages one by one also; no more searching first and post replace job after. Jean-Lou Dupont 15:54, 13 May 2008 (UTC)
 * I don't have much of a clue, but would a preview still be possible job-by-job? Speed is no issue with this extension imo, since a bot isn't fast either. So if the preview took 10 minutes to load, that's fine for me. At least it would give me a better feeling than bashing everything without undo. Btw: there are ~8500 articles and 16,112 total pages in the local wiki I tested this on. --Subfader 19:28, 13 May 2008 (UTC)
 * If the time and/or memory limits are still being exceeded on the preview page, you could paginate it and fetch, say, 100 previews at a time. To get around these limits and still have all the checkboxes on one page, you could use an AJAX function to call back to the server and pull the next page of results - it might take several minutes to display all the previews, but it won't time out because it's done in multiple requests.
 * Of course, there are still two places where a large wiki could have problems - the time taken by the database to produce a result set (which you can't really work around - but it's a high enough limit to not really matter), and memory on the client side (making your browser run really slowly - solve by splitting the preview into pages) --MrAngel 23:33, 4 July 2008 (UTC)
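
For readers following this thread, the "one job enqueues the next" idea discussed above can be sketched in standalone PHP. This is only an illustration under stated assumptions: `runChainedJobs`, the `SplQueue` stand-in, and the page IDs are all hypothetical, and nothing here uses MediaWiki's actual job queue or the extension's code.

```php
<?php
/**
 * Hypothetical stand-in for a job queue (not MediaWiki's JobQueue).
 * Each "job" handles one batch of page IDs and, if pages remain,
 * enqueues a follow-up job for the next batch.
 */
function runChainedJobs(array $pageIds, int $batchSize, callable $processPage): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    $pageIds = array_values($pageIds);
    $total = count($pageIds);

    $queue = new SplQueue();
    if ($total > 0) {
        $queue->enqueue(0); // the first job starts at offset 0
    }

    $jobsRun = 0;
    while (!$queue->isEmpty()) {
        $offset = $queue->dequeue();
        // Only one small batch is held in memory per job.
        foreach (array_slice($pageIds, $offset, $batchSize) as $pageId) {
            $processPage($pageId);
        }
        $jobsRun++;
        // Chain the next job only if there is more work left.
        if ($offset + $batchSize < $total) {
            $queue->enqueue($offset + $batchSize);
        }
    }
    return $jobsRun;
}

// Usage: five made-up page IDs, two pages per job.
$processed = [];
$jobs = runChainedJobs([11, 12, 13, 14, 15], 2, function ($id) use (&$processed) {
    $processed[] = $id;
});
```

The same batching would apply to a paginated preview: each request or job touches a bounded number of pages, so no single step needs the whole result set at once.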

Works fine
Hi, it works smoothly now. The "Job" speed varies a lot (on my local wiki): 1-15 actions per minute, or even a 10-minute pause. Is it OK this way?

Preview loads but is a bit buggy when divs are included in the previewed section that contains the "term to be replaced". This broke the layout. The source code of the section is

Thanks anyway, will use it a lot I think :) --Subfader 17:29, 14 May 2008 (UTC)


 * Okay, that's great to hear. I have no idea how long jobs are supposed to take - for me it works almost instantly, on a small wiki, but in any case, I doubt if the process can be sped up. Thanks for the bug report - that's a new bug in the current version; I'll fix it in the next version. Yaron Koren 19:25, 14 May 2008 (UTC)


 * Just wanna let you know that divs or <> seem to make trouble generally. I tried these:
 * find " MDB" | replace with "  NEW" | >> MYSQL error
 * same for " class='username'>MDB " | ">MDB<" gave me a broken preview similar to above example. Subfader 21:47, 15 May 2008 (UTC)
 * 0.2.2 works fine in the preview now, just found another "preview breaker". Center :
 * (source of "MixesDB:Latest Comment Pages")
 * Not bugging me, just to let you know. --Subfader 16:28, 27 May 2008 (UTC)


 * Oh, oops, I was planning to fix that in this version but then I forgot. I just fixed it now, so if you get the "new" version 0.2.2, it should work. :) Thanks for the bug report. Yaron Koren 16:55, 27 May 2008 (UTC)


 * Update on my above note "1-15 actions per minute or even 10 minutes pause": That was on my localhost wiki. On my real server I have constantly ~30 actions per minute. Already did ~2500 replacements and it's just smoof. --Subfader 17:02, 27 May 2008 (UTC)


 * Smoof, awesome. Yaron Koren 17:09, 27 May 2008 (UTC)


 * When someone views a page, it uses the same PHP process to run a Job. So if you get no pageviews for 10 minutes, you'll also have no Jobs run in that time. I suspect this is only likely to happen on very small sites, or on your local/testing servers. --MrAngel 16:35, 1 July 2008 (UTC)

[RESOLVED] Using Bots / Recent Changes
You could insert $wgNewUserSuppressRC into the code in order to enable hiding bot-flagged users from recent changes (as Extension:NewUserMessage does), but better would be to insert only one entry into the normal recent changes (User Bla: Global Text Replace 'old text' to 'new text'). All details should still show up in the user's contribs. --Subfader 07:54, 15 May 2008 (UTC)


 * Sorry, what's a bot-flagged user? Yaron Koren 15:12, 15 May 2008 (UTC)


 * Could be the term is wrong :) A user which you flag as bot / give bot rights. Such users aren't displayed in Recent Changes. But mine is, with the replacing actions. --Subfader 15:54, 15 May 2008 (UTC)


 * Oh, I see. Well, maybe having a bot as the user doing the text replace is a bit misleading, since it's not a script but a special page making the changes? Or maybe that's too subtle a distinction to worry about. Yaron Koren 16:31, 15 May 2008 (UTC)


 * My users have no clue about wikis; this is better than letting them think it was done manually by a "user". That's why I meant that a single entry in Recent Changes, describing it as a bunch of changes, would be useful. If you see no need for it, no problem. --Subfader 16:34, 15 May 2008 (UTC)

Replace text in title (move)
Why is it not possible? I can imagine I'm not the only one who needs it this way. Either soft by additionally moving or hard (if it's possible and easier). Categories won't need to be moved though (that's just one manual change and losing all revisions for the category is ok). So there could be a tickbox before the preview [_] Change article names as well (move)? --Subfader 16:06, 15 May 2008 (UTC)
 * Well, my thought was that (a) there are challenges when moving a page, if a page already exists at the new location, and (b) the number of pages that would need to be moved is too small to justify the effort. How many pages would you want renamed, anyway? Yaron Koren 17:59, 15 May 2008 (UTC)
 * I don't plan to use this for only one job "that finally needs to be done". I (want to) use this regularly; as mentioned, I'm too dumb to set up real bots, and these replace-text tasks would be my only reason to. I waited a long time to find something like this (see my first post). The terms I'd rename are e.g. artist names, which in all cases exist both in the article title and in the article content (as category link). Deleting the old category and creating the new one is no effort, but moving 5-30 pages per replace task is. Subfader 19:46, 15 May 2008 (UTC)

I'd really appreciate such an addition. You could easily separate it into 2 different tasks. On the Special page with the input fields you could make check boxes for: "Replace Text in: [x]Article [_]Title (move)" with article being checked by default. Please let me know what you think about it. --Subfader 09:09, 30 May 2008 (UTC)


 * It does make sense to allow it, and there's probably no need for a separate checkbox, since the user gets to choose exactly what gets replaced anyway. The big complication is when the new title matches some page that already exists. My guess is that it's impossible for the extension to do anything reasonable in that case, so it's best to just ignore those, maybe with some text to the user listing the pages that can't be moved. Otherwise, I guess it's a reasonable idea. Yaron Koren 16:39, 30 May 2008 (UTC)


 * Checkbox: OK, from the point of view that you want to replace misspellings, it makes sense not to split them up. But as soon as you want to treat them differently, it will be unhandy to go through a long list unchecking the title replacements. Grouping article and title replacements would help. New title already exists: Yep, a list afterwards with the (linked) articles that were not moved because the new title already exists (linked as well, for a quick check) would be good. --Subfader 20:45, 30 May 2008 (UTC)

Special Characters
I get a MySQL error trying to replace ‘ or ’ with ', or "n´t" with "n't": 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't%' AND p.page_namespace != 1 AND p.page_namespace != 3 AND p.page_namespace ' at line 5 (localhost) Any idea? --Subfader 19:27, 22 May 2008 (UTC)


 * Thanks for the bug report; this will be fixed in the next version. Yaron Koren 20:01, 22 May 2008 (UTC)
 * Also magic words like aren't found. --Subfader 19:23, 24 May 2008 (UTC)


 * How interesting: I was able to reproduce that bug, but it took me a while to figure out what was causing it. The problem isn't magic words, it's that the extension can't handle strings at the very beginning of pages (as I assume is for you). This, too, will be fixed in the next version; thanks. Yaron Koren 20:18, 25 May 2008 (UTC)
 * Yep, I always added at the very beginning of the article. --Subfader 06:48, 26 May 2008 (UTC)
 * Works fine now in 0.2.2. Thanks! --Subfader 16:31, 27 May 2008 (UTC)
 * It also doesn't find foo eg for replacing --Subfader 18:14, 2 June 2008 (UTC)


 * I don't understand - are you trying to replace "Foo" or "Foo" ? Yaron Koren 18:39, 2 June 2008 (UTC)


 * My ver. 0.2.2 replaced hundreds of cases of " Category:abc123  " with "  ". Brian7632416 18:47, 2 June 2008 (UTC)


 * I tried to replace "" but it wasn't found although it's there. Of course I could replace "Category:Foo" which would work most of the cases if one wants to change the category. Although the problem I wanted to fix was solved otherwise now, I just wanted to let you know. --Subfader 19:05, 2 June 2008 (UTC)


 * Ok that was on my localhost. On my server it does the trick. Sorry for the confusion. --Subfader 07:21, 3 June 2008 (UTC)


 * What's causing the problem on your localhost, do you know? Yaron Koren 13:45, 3 June 2008 (UTC)
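
As background to the SQL syntax error reported at the top of this section: that kind of failure typically arises when a search string containing an apostrophe is pasted directly into the query text. The thread does not show how the extension fixed it, so the following is only a minimal sketch of the general technique, assuming a query with a bound parameter and backslash as the LIKE escape character. The helper names, the table `page_text_view`, and the column `body` are all hypothetical.

```php
<?php
/**
 * Hypothetical helper: escape the LIKE wildcards in a user-supplied string
 * so that % and _ are matched literally. Assumes backslash is the LIKE
 * escape character (the MySQL default).
 */
function escapeLikeLiteral(string $search): string
{
    // strtr with an array works in a single pass, so the backslashes
    // added for % and _ are not escaped a second time.
    return strtr($search, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
}

/** Hypothetical helper: build the bound value for a "contains" search. */
function likeContains(string $search): string
{
    return '%' . escapeLikeLiteral($search) . '%';
}

// The query text holds only a placeholder, so the apostrophe in the search
// string never becomes part of the SQL itself. Table and column names are
// made up for illustration.
$sql = 'SELECT page_id FROM page_text_view WHERE body LIKE ?';
$param = likeContains("n't");
```

With a prepared statement, `$param` would be passed separately from `$sql`; inside MediaWiki, the database layer's own quoting helpers would play that role.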

[RESOLVED] Run as minor edits
It should be possible to let it run as minor edits, since people won't be amused to get lots of watchlist notifications for minor changes on lots of pages. Extension:NewUserMessage uses it if you want to check. --Subfader 19:32, 22 May 2008 (UTC)


 * Hm... that's a good idea, but do you think it should be up to the admin? Or is that too much effort, and it's better to just always make them minor edits? Yaron Koren 20:06, 22 May 2008 (UTC)


 * If possible I think it should be optional minor/not-minor. The replaced text could be a simple change or it could be a significant change occurring across multiple pages. Smaug 20:09, 22 May 2008 (UTC)


 * A checkbox on the special page would be best; if not, an admin-only setting in the PHP files would be enough for me, as I would run everything as minor edits. I cannot think of a case where the watching users should be informed about the change. --Subfader 20:24, 22 May 2008 (UTC)


 * Minor Edit works fine in 0.2.2. Thanks a lot! --Subfader 16:35, 27 May 2008 (UTC)


 * Cool. I set it so that every edit is a minor edit, because that's the simplest solution, and because text replacements are a minor change pretty much by definition. But if anyone still thinks that should be a user option, let me know. Yaron Koren 17:14, 27 May 2008 (UTC)

Multiple lines
Thanks Yaron for another stellar extension. Any plans for handling multiple lines? --Tosfos 03:49, 25 May 2008 (UTC)


 * Hi, I'm glad you find it useful. Regular expressions might be added soon for finding strings, which should allow for replacing multiple lines (replacing something with multiple lines might be trickier, but we'll see.) Yaron Koren 20:20, 25 May 2008 (UTC)


 * I'm thinking it might be easier for non-techies if they were presented with two text areas so that they can represent a new line by simply going to the next line. Perhaps this can be implemented in addition to regular expressions at some point. --Tosfos 01:32, 3 June 2008 (UTC)

Automated Routines
Would be useful to define routines that are run regularly for replacements that need no verification preview but need to be done every now and then (like obvious misspellings) to keep the data clean. A routine could include these settings:
 * define the user under which the actions are run
 * define timespan, e.g. weekly, monthly (where to start?)
 * define replacements 'X' with 'Y'
Only implement it if it doesn't take you too much coding. --Subfader 14:38, 25 May 2008 (UTC)


 * I don't believe MediaWiki has any mechanism for scheduling regular tasks; let me know if you know of something. Yaron Koren 20:22, 25 May 2008 (UTC)
 * No I don't know about schedules either. The point is I have already 20 standard replacements noted I would run every 2 weeks to keep my data clean. Maybe it's easier to forget the schedule but to combine many replacements in 1 job without preview and warnings? Again, if it's too much coding, no problem. I can still run them manually every now and then. doesn't take too much time though. --Subfader 20:39, 25 May 2008 (UTC)

[RESOLVED] Log Out in Job Queue
Whoa, took me a while to realize that you cannot log out while the job queue is running. I logged out like 20 times, cleared the cache, tried to log in as a different user, nothing. I stayed logged in as the bot user I use for the text replacements. Maybe make a note somewhere to avoid confusion for others. ;) --Subfader 19:41, 25 May 2008 (UTC)


 * Thanks for that note. I still don't fully understand how jobs work in MediaWiki, but your observation clarifies some things I've noticed. I'll add something to the documentation. Yaron Koren 20:34, 25 May 2008 (UTC)


 * Glad if I can help. Another thing I realized: while the job queue was running (+1 hour) I changed my own talk page. And every time I logged out, the system logged me in again. I can tell because it always displayed the "you got a new message" thingy for the talk page change. I watched my talk page, logged out, and again: "you got a new message". Cache? I wonder why the system didn't notice I'd already read my talk page since my last edit of it. Not buggy, it just may be helpful for understanding the job queue. --Subfader 20:36, 25 May 2008 (UTC)

Can anyone confirm this behaviour? It is highly confusing for all other logged-in users to see a different user name on top while the job queue is running, which can take hours for my replacements. --Subfader 07:03, 28 May 2008 (UTC)
 * Ok this is weird now but maybe the solution. I just started replacement actions as "User B" (bot grants) on a different machine(!) (a windows VPS server). On my local PC(!) where I'm logged in as "User A" I can see "User B" on the upper right User Links section. Although I'm not logged in as "User B" it makes me think so (see above).   gives me "User A" also when I edit etc.


 * Re-checked. When I run a job queue as User X, everyone sees the UserLinks on top as if they were logged in as User X (but they actually aren't). This is also the case for all visitors who are not logged in. Very confusing for the users. I also tried adding the user to different user groups but no joy ;) --Subfader 15:47, 20 June 2008 (UTC)


 * In ReplaceTextJob.php, you have (lines 43-49)

 if ($num_matches > 0) {
     global $wgUser;
     $wgUser = User::newFromId($this->params['user_id']);
     $edit_summary = $this->params['edit_summary'];
     $flags = EDIT_MINOR;
     $article->doEdit($new_text, $edit_summary, $flags);
 }


 * Note that a Job is run in the same process as a normal page request, after parsing but before output generation. So any code in the Skin file will see the $wgUser that you have changed here. As well as the effects you mention, this may also confuse some other extensions. I'd suggest putting the user back when you've finished with it. Something like:

 if ($num_matches > 0) {
     global $wgUser;
     $realUser = $wgUser; // remember the user of the current page request
     $wgUser = User::newFromId($this->params['user_id']);
     $edit_summary = $this->params['edit_summary'];
     $flags = EDIT_MINOR;
     $article->doEdit($new_text, $edit_summary, $flags);
     $wgUser = $realUser; // put the real user back before output is generated
 }


 * Hope that helps some --MrAngel 16:48, 1 July 2008 (UTC)


 * Oh, oops... that problem makes sense now. Thanks for finding the bug, and explaining a solution. This will go into the next version. Yaron Koren 20:34, 1 July 2008 (UTC)


 * Yay :) --Subfader 10:39, 2 July 2008 (UTC)
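
The save-and-restore pattern suggested above can also be sketched outside MediaWiki. The following is a minimal standalone illustration, not the code that went into the extension: `runAsUser` is a hypothetical helper, plain strings stand in for MediaWiki's `User` objects, and the `try`/`finally` (which assumes PHP 5.5 or later) is an addition that restores the global even if the edit throws.

```php
<?php
// Stand-in for MediaWiki's global user object; a plain string here so the
// sketch runs on its own. Set via $GLOBALS so it is global in any scope.
$GLOBALS['wgUser'] = 'User A';

/**
 * Hypothetical helper: run $callback while $wgUser is temporarily set to
 * $jobUser, then restore the user of the current request.
 */
function runAsUser($jobUser, callable $callback)
{
    global $wgUser;
    $realUser = $wgUser;   // remember who is really making this request
    $wgUser = $jobUser;
    try {
        return $callback();
    } finally {
        // Runs whether the callback returns normally or throws.
        $wgUser = $realUser;
    }
}

// Usage: the callback sees the job's user; later code sees the real one.
$seen = runAsUser('User B', function () {
    global $wgUser;
    return $wgUser;
});
```

After the call, `$seen` holds the job's user while the global is back to the requesting user, which is the behaviour the skin code further along in the request depends on.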

Version 0.2.1 Working Well - and THANKS Yaron Koren

 * MediaWiki: 1.11.1
 * PHP: 5.2.3 (apache2handler)
 * MySQL: 5.0.51a
 * Extensions:
 * MassEmail 0.3.5
 * RSS Reader 0.2.4
 * AWC's MediaWiki Forum 2.3.4
 * Google News Bar 071007

Replace Text extension (0.2.1) works very well for me on the above. The extension page and instructions are easily understandable and a pleasure to read. The installation itself was quick and easy.

I'm a document/text-oriented perfectionist who needed something to tidy up the sloppy submissions on my recipe wiki (DishiWiki). I'm new to MediaWiki and had just begun to investigate bot-editing but wasn't even able to find where to start, even though MediaWiki is full of references to bot-edits. All along, though, all I needed to do was to make changes such as ½ to 1/2 (for legibility, in that case), and similar types of "search and replace" operations. Since I'm new to MediaWiki, I did what I knew until I found a better way, so Replace Text enables me to go back and change what I did poorly when I didn't know any better.

I think a paragraph about what is happening in the background (is it a true cron job?) and how logging in and out (or attempting to do so) affects the "cron job" would be helpful, but Replace Text is running right now over on my wiki, and I'm filling the time by sending this congratulations to you. My experience was similar to Subfader's (above). I get about 10 pages edited per minute (average about 5 replaces per page), with about a 10-minute pause between distinct replace requests that were entered in rapid succession. (This is NOT a deal breaker, IMO.)

I hope somebody who knows how to do so will give you a star (or whatever they do here) to recognize a worthy contribution. Thanks again, Yaron. -- Brian7632416 02:19, 26 May 2008 (UTC)


 * Thanks for the compliments, and I'm glad the extension has been useful to you. I must admit that my similar lack of understanding of bots is partly why I created the extension in the first place. :) I'll definitely try to improve the documentation. Yaron Koren 23:43, 26 May 2008 (UTC)

Will Wild Cards Work in Replace Text?

 * MediaWiki: 1.11.1
 * PHP: 5.2.3 (apache2handler)
 * MySQL: 5.0.51a
 * Extensions:
 * MassEmail 0.3.5
 * RSS Reader 0.2.4
 * AWC's MediaWiki Forum 2.3.4
 * Google News Bar 071007
 * Replace Text 0.2.2

Replace Text 0.2.2 works great. What about wild cards? How could I replace these:


 * [[green onion]] with "green onion," and
 * [[Turnip greens]] with "Turnip greens," and
 * [[Haas avocado]] with "Haas avocado"

all at once to reduce to plain text all the intra-wiki links that appear on pages I have imported? Maybe, e.g.,:

Original text = ""**"" and Replacement text = ** --72.220.151.42 18:41, 27 May 2008 (UTC)


 * There are no plans right now to add wildcard handling, but it could happen. Yaron Koren 19:25, 27 May 2008 (UTC)
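
Since the extension has no wildcard handling at this point, the kind of replacement asked about above would need something like a regular expression. Purely as an illustration of the pattern involved, and not as a feature of Replace Text, here is a standalone sketch that reduces simple wiki links to plain text. The function name `unlinkWikitext` and the sample sentence are assumptions.

```php
<?php
/**
 * Hypothetical helper: reduce simple intra-wiki links to plain text.
 * [[Target]] becomes Target, and [[Target|label]] becomes label.
 */
function unlinkWikitext(string $text): string
{
    // Piped links first: keep only the label after the pipe.
    $text = preg_replace('/\[\[[^\[\]|]+\|([^\[\]|]+)\]\]/', '$1', $text);
    // Then plain links: keep the target itself.
    return preg_replace('/\[\[([^\[\]|]+)\]\]/', '$1', $text);
}

// Usage with made-up recipe text.
$plain = unlinkWikitext('Chop the [[green onion]] and the [[Haas avocado|avocado]].');
```

Note that a pattern this broad would also unlink things written with the same brackets, such as category and image links, so it would need narrowing before being used on real pages.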

Restrict to categories
It would be useful to only let the replacement be applied on pages in a defined category (additional box) so unticking many pages in the preview for some standard text isn't necessary. Combined with the multiple lines this would make it extremely powerful and handy. --Subfader 21:00, 5 June 2008 (UTC)

Handling Quotes
Is there any way to replace a string with quotes in it? For example, style="background:#ffdead;" is what I want to change, but it only finds up to the quote, so it only finds style=. Am I doing something wrong?
 * If you simply want to change the color find  /   and replace.? --Subfader 05:02, 10 June 2008 (UTC)
 * Thanks for the bug report; there was a bug in the handling of strings with quotation marks. This should be fixed in the latest version, 0.2.3. Yaron Koren 14:11, 24 June 2008 (UTC)

Problems
I have an article where I have these words: Mamoeiro and Mamoeiro. I have a lot of other articles that have the word Mamoeiro in the text too. I wanted to replace every Mamoeiro with: Mamoeiro. When I do it in Replace Text, the articles that already have Mamoeiro end up with this wrong long string: Mamoeiro|bairro Mamoeiro]]. So to solve this problem, I really think Replace Text should add the checkbox per word and not per article.


 * Here's how I handle that situation in any type of search-and-replace operation, be it a word-processor document, HTML, XML, etc.:
 * Temporarily replace Mamoeiro with a string that is sure not to appear elsewhere. (I like "%%").
 * Replace all the remaining (standalone) "Mamoeiro"s with Mamoeiro.
 * Replace (restore) all the "%%"s with Mamoeiro.
 * You will want to think through each instance of using this technique to prevent conflicts, and be sure to let the job queue completely finish each replace operation before invoking the next one. Hope this helps. -Brian7632416 00:23, 14 June 2008 (UTC)
 * Hi Brian, it's a workaround that can help, but I think it would be great if we had a real solution.
 * It's just a "replace text" extension. It's helpful to us non-programmers. I don't want to have to check hundreds of check-boxes (in a per-word paradigm). I'll stick to the Replace Text version that does not make that change, and I'll use the "workaround" (?) mentioned above. -Brian7632416 01:04, 14 June 2008 (UTC)
 * Way easier. Use brain! :) Just find the blank after Mamoeiro (cos this is not included in the code you already have added). FIND "Mamoeiro " REPLACE WITH " Mamoeiro ". --Subfader 15:05, 14 June 2008 (UTC)
 * But there may be one of any number of punctuation marks after a freestanding/non-piped/non-linked "Mamoeiro," so multiple swipes would be needed. OP said he has "a lot of other articles with the word on [sic] the text." That's why I like to work backwards, starting by "disguising" the text that should remain unchanged, in the end. -Brian7632416 16:13, 14 June 2008 (UTC)
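
The three-step placeholder workaround described in this thread can be sketched as a standalone PHP function. The original post lost its wiki markup, so the link form `[[Mamoeiro|bairro Mamoeiro]]` used here is an assumption, and `linkBareWord` is a hypothetical helper rather than anything in the extension.

```php
<?php
/**
 * Hypothetical helper: link every bare occurrence of $word without
 * touching text that is already the finished $link.
 */
function linkBareWord(string $text, string $word, string $link, string $placeholder = '%%'): string
{
    // The placeholder must be a string that is sure not to appear elsewhere.
    if (strpos($text, $placeholder) !== false) {
        throw new InvalidArgumentException('Placeholder already occurs in the text.');
    }
    $text = str_replace($link, $placeholder, $text); // 1. disguise existing links
    $text = str_replace($word, $link, $text);        // 2. link the remaining bare words
    return str_replace($placeholder, $link, $text);  // 3. restore the disguised links
}

// Usage with made-up page text.
$result = linkBareWord(
    'Visit [[Mamoeiro|bairro Mamoeiro]]. Mamoeiro is old.',
    'Mamoeiro',
    '[[Mamoeiro|bairro Mamoeiro]]'
);
```

In the extension itself each step would be a separate replace run, and, as noted above, the job queue should finish one run before the next is started.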


 * In this case, I suspect the best fix would be to add a "don't replace text within links" checkbox - easy enough to do (actually, I just did it - need to test thoroughly before submitting a patch, though) --MrAngel 16:14, 5 July 2008 (UTC)


 * Hi MrAngel, it's really a good feature, thanks. As I can see from your comments, it's not a good idea to have per-word replace. If per-word replace really can't be added, I think we should have an option to see all the lines being replaced. Is it possible? Lleoliveirabr 20:03, 5 July 2008 (UTC)
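
The patch mentioned above is not shown in this thread, so the following is not MrAngel's code. It is only a minimal standalone sketch of one way a "don't replace text within links" option could behave, assuming links are anything between `[[` and `]]`. The function name `replaceOutsideLinks` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: replace $search with $replace everywhere except
 * inside [[...]] links.
 */
function replaceOutsideLinks(string $text, string $search, string $replace): string
{
    // Split into alternating segments: plain text, link, plain text, ...
    // The capture group keeps the links in the resulting array.
    $parts = preg_split('/(\[\[.*?\]\])/s', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        // Even indexes are text outside links; odd indexes are the links.
        if ($i % 2 === 0) {
            $parts[$i] = str_replace($search, $replace, $part);
        }
    }
    return implode('', $parts);
}

// Usage with made-up page text.
$fixed = replaceOutsideLinks(
    'Mamoeiro and [[Mamoeiro|bairro Mamoeiro]]',
    'Mamoeiro',
    '[[Mamoeiro|bairro Mamoeiro]]'
);
```

This does not address the request for a per-line preview; it only shows how existing links could be left alone in a single pass.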