Topic on Extension talk:Replace Text

Replace spaces with underscores in file names?

Mbolatsara (talkcontribs)

Using the Original text:/Replacement text: form interface of this extension and the Use regular expressions option, what expressions can be used to find all spaces in File names to replace them with underscores?

The replacement should consider any markup beginning with [[File: and running until .jpg, .png or another file type.

So anywhere from [[File: up until the first period becomes the searched pattern space:

[[File:Some file name.jpg|

... in which any whitespace should be replaced with underscores, resulting in:

[[File:Some_file_name.jpg|
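As a point of comparison outside Replace Text, here is a minimal Python sketch of the transformation described above, for instance for checking exported wikitext in a script. The function name and the sample text are hypothetical, and the pattern assumes the file title contains no "]", "|" or "." before the extension.

```python
import re

# "[[File:" (or "[[:File:"), then the file title up to the first period.
# Titles without a period are not matched and stay unchanged.
FILE_TITLE = re.compile(r"(\[\[:?file:)([^\]|.]+)(?=\.)", re.IGNORECASE)


def underscore_file_titles(wikitext: str) -> str:
    """Replace spaces with underscores inside [[File:...]] titles only."""
    def fix(match):
        prefix, title = match.group(1), match.group(2)
        return prefix + title.replace(" ", "_")

    return FILE_TITLE.sub(fix, wikitext)


sample = "Intro text [[File:Some file name.jpg|800px]] more text."
print(underscore_file_titles(sample))
# prints: Intro text [[File:Some_file_name.jpg|800px]] more text.
```

The callback does the space-to-underscore step for any number of words in one go, which a plain search-and-replace string cannot do.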

Thanks

Dinoguy1000 (talkcontribs)

Why do you want to run this replacement? The software treats spaces and underscores the same in page/file names, and the general convention is to prefer spaces.

Mbolatsara (talkcontribs)

I agree, spaces are normally preferred.

However, unlike the usual metadata File page linked via images, I use a simplified image view.

Instead of [[File:Some image|800px]] on www.example.wiki leading to www.example.wiki/File:Some_image.jpg, images sometimes link to www.example.wiki/File.cgi?Some_image.jpg

File.cgi is an exception on my Apache that is bypassed by MW's default behaviour in returning non-existing page URLs, like: [[File:Some image.jpg|link=File.cgi?Some_image.jpg|800px]] would achieve.

Using the link= markup, Replace Text, either by command line or its form-based interface, would be the preferred method of replacing markup across many pages. A search and replace regex could for example:

  • Extract the filename pattern between any [[File: until the first period before jpg, JPG, png, etc.
  • Place in variable and replace spaces in filenames with underscores.
  • Insert the |link=Some_image.jpg variable in each [[File:... enclosure, just before its closing ]].

This way the spaces and wiki markup can remain standard. The regex procedure would need to be idempotent, so that it does not operate on patterns which already have File.cgi somewhere within [[File: .. File.cgi .. .jpg]].
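The three steps and the idempotency requirement can be sketched in Python as follows. This is only an illustration under stated assumptions, not something Replace Text can run: the function name is hypothetical, the link target follows the File.cgi?Some_image.jpg form used earlier in this post, and file links whose caption contains nested [[...]] markup are deliberately left unmatched.

```python
import re

# One whole [[File:...]] link: group 1 is the file name, group 2 the options.
# Nested [[...]] inside the options is not supported, so such links are skipped.
FILE_LINK = re.compile(
    r"\[\[:?file:([^\[\]|]+)((?:\|[^\[\]]*)?)\]\]", re.IGNORECASE
)


def add_cgi_link(wikitext: str) -> str:
    """Append |link=File.cgi?<name_with_underscores> to each file link."""
    def fix(match):
        whole, name, options = match.group(0), match.group(1), match.group(2)
        # Idempotency: skip links already converted or carrying any link=.
        if "File.cgi" in whole or "link=" in options:
            return whole
        if "." not in name:
            return whole  # no file extension, leave untouched
        target = name.strip().replace(" ", "_")
        # Insert the new parameter just before the closing "]]".
        return whole[:-2] + "|link=File.cgi?" + target + "]]"

    return FILE_LINK.sub(fix, wikitext)


once = add_cgi_link("[[File:Some image.jpg|800px]]")
print(once)
print(add_cgi_link(once) == once)  # a second run changes nothing
```

The displayed file name keeps its spaces; only the link target gets underscores.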

Or, perhaps it is easier to change MW's File: procedures for image displays to link to example.wiki/File.cgi?... instead of to example.wiki/File:..., in which case the underscores will need to be present.

Can this be configured by hooks in LocalSettings or in one particular MW template or PHP file?

Thank you for any ideas.

Dinoguy1000 (talkcontribs)

That's horrifying. Depending on what exactly your File.cgi does, this really seems like something you should be doing via a specific class name + JavaScript (whether that JS is just a shim between the markup and File.cgi, or fully implements the functionality of File.cgi itself, also depends on what exactly File.cgi does).

Anyways, you could probably torture regex into doing what you wanted (or something close to it), but tbh this sounds like something a proper bot would be more appropriate for.

Mbolatsara (talkcontribs)

Linking to an external page, file or location via the [[File: ... |link=..]] markup is standard. Anyone who happens to be familiar with Replace Text's regex procedures, please kindly share your ideas on how to insert the link markup above, bearing in mind the underscores needed in linked URLs, which may be missing in the original [[File:Names of files.jpg]] segments.

MvGulik (talkcontribs)

It's not something that can be done with a single RE-job in this extension (at least not that I know of).


Bare basic example that targets file-titles with two words:

REGEX:"(?i)\[\[:?file:([a-z0-9]+)[ ]([a-z0-9]+)\.([a-z]+)\|"

Replace String:"$1_$2.$3"


For titles with other word counts, one would need appropriately adjusted versions of both the regex and the replace string.

Unless you're sure you don't have titles with mixed "Underscore/Space" ... More work ahead (three or more words per title). You could try "[_ ]" instead of "[ ]" as long as you don't hit the maximum page result limit.

This is missing the later added "|link=" part, as I have not looked at that part. Personally I think it's better/easier to use a single RE-job to completely remove those parts first, and use accordingly adjusted Replace-Strings (from the example above) to re-add them.
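The per-word-count versions can be generated rather than written by hand. The Python sketch below builds a regex and replace string for titles with a given number of words, then emulates the substitution locally. The helper names are hypothetical and this has not been run through Replace Text itself. Since a regex replacement substitutes the whole match, this variant also captures the "[[File:" prefix and the trailing ".ext|" in groups of their own so that they are carried over into the result.

```python
import re


def build_job(words: int, sep: str = "[ ]"):
    """Return (regex, replace_string) for titles with exactly `words` words."""
    word = "([a-z0-9]+)"
    regex = r"(?i)(\[\[:?file:)" + sep.join([word] * words) + r"(\.[a-z]+\|)"
    # Group 1 is the prefix, groups 2..words+1 the words, the last ".ext|".
    word_refs = ["$%d" % i for i in range(2, words + 2)]
    replace_string = "$1" + "_".join(word_refs) + "$%d" % (words + 2)
    return regex, replace_string


def apply_job(text: str, regex: str, replace_string: str) -> str:
    """Emulate the job in Python, translating $n references to \\g<n>."""
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replace_string)
    return re.sub(regex, py_replacement, text)


for n in (2, 3):
    regex, replace_string = build_job(n)
    print(n, regex, replace_string)

regex, replace_string = build_job(3)
print(apply_job("[[File:Some file name.jpg|800px]]", regex, replace_string))
# prints: [[File:Some_file_name.jpg|800px]]
```

Passing sep="[_ ]" gives the mixed underscore/space variant mentioned above.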


(I probably don't know what I'm talking about and got it all wrong. And probably should not have reacted. Trying hard to work on that last part though.)

Dinoguy1000 (talkcontribs)

This is why I said ReplaceText regex isn't the right tool for the job. There is no way around having to multi-pass this if that's what you constrain yourself to.
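To illustrate what multi-pass means here, the Python sketch below uses one pattern that replaces only the first remaining space in each file title, and repeats it until nothing changes. The names are hypothetical, and this only emulates the idea locally; it does not claim the pattern behaves the same inside Replace Text.

```python
import re

# Group 1: "[[File:" plus the title up to the first space.
# Group 2: the rest of the title up to and including the first period.
ONE_SPACE = re.compile(
    r"(\[\[:?file:[^\[\]|. ]*) ([^\[\]|.]*\.)", re.IGNORECASE
)


def run_passes(text: str, max_passes: int = 20):
    """Repeat the single-space replacement until no match is left."""
    passes = 0
    while passes < max_passes:
        new_text, count = ONE_SPACE.subn(r"\1_\2", text)
        if count == 0:
            break
        text, passes = new_text, passes + 1
    return text, passes


print(run_passes("[[File:Some file name.jpg|800px]]"))
# prints: ('[[File:Some_file_name.jpg|800px]]', 2)
```

A title with N words needs N-1 passes, so the number of runs depends on the longest file name on the wiki.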

Mbolatsara (talkcontribs)

Thanks for the above example. I presume a procedure to replace the spaces and insert the link= variables would require two or more steps.

There are different numbers of spaces in files depending on how many words different filenames happen to have, but never a mix of spaces and underscores as in [[File:Some_ file name.jpg]] in my wiki situation.

I tested the regex using the Special page form interface of Replace Text:

Original text: (?i)\[\[:?file:([a-z0-9]+)[ ]([a-z0-9]+)\.([a-z]+)\|

Replacement text: $1_$2.$3

The following error was returned by the Special page:

Database error: A database query error has occurred. This may indicate a bug in the software.

But without indication of a database table, it's difficult to know the cause of the error in case a MySQL table needs to be repaired.

If that's not the problem, since my Replace Text works with other, simpler regexes, the old REL1_27 version of Replace Text I have may be incompatible with the above code example. Did you by any chance test-run the regex using the Replace Text form interface or the extension's replaceAll.php command line option?

MvGulik (talkcontribs)

Database error) That should not happen. On this side the example RE works ok (Using the Replace Text form interface).

Used Replace_Text version:

1.7 (cba3752) 18:03, 14 March 2023

With:

MediaWiki: 1.39.0
PHP: 7.4.33 (fpm-fcgi)
MariaDB: 10.5.19-MariaDB
ICU: 50.2


Ps: With mixed spaces and underscores I mean [[File:Some_file name.jpg]] style links.


+Dinoguy1000 is right about "ReplaceText regex isn't the right tool for the job" in this case. Personally I would try to create a MW-bot for this. But that is not something that's done in a day if you have never used/created them.
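For anyone wanting a starting point, a bare-bones bot could look like the Python sketch below. It assumes Pywikibot is installed and configured for the wiki (user-config.py); the function names and the edit summary are hypothetical, and `transform` stands for whatever text-rewriting function is wanted. The planning helper works on plain dictionaries so it can be tried without touching a wiki.

```python
def plan_edits(pages, transform):
    """Return {title: new_text} for every page the transform would change."""
    planned = {}
    for title, text in pages.items():
        new_text = transform(text)
        if new_text != text:
            planned[title] = new_text
    return planned


def run_bot(transform, dry_run=True):
    """Walk the main namespace and save pages the transform changes.

    Assumption: Pywikibot is set up for the target wiki.
    """
    import pywikibot  # imported here so plan_edits works without Pywikibot

    site = pywikibot.Site()
    for page in site.allpages(namespace=0):
        old_text = page.text
        new_text = transform(old_text)
        if new_text == old_text:
            continue  # unchanged pages are never saved
        if dry_run:
            print("would edit:", page.title())
        else:
            page.text = new_text
            page.save(summary="Add File.cgi link to images")


# Offline check with an in-memory "wiki" and a trivial transform.
demo_pages = {"A": "a b", "B": "ab"}
print(plan_edits(demo_pages, lambda text: text.replace(" ", "_")))
# prints: {'A': 'a_b'}
```

Running with dry_run=True first lists the affected pages before anything is saved.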

Mbolatsara (talkcontribs)

Thank you for confirming your result. I'll test again with a newer MW installation.
