Topic on Project:Support desk

Problem with Umlaut after Upgrade to MW_1.31.1

Summary by Alfredo1066

FileZilla must be forced to use utf8 as the charset when communicating with the server.

Alfredo1066

Hi,

After upgrading from 1.26.3 to 1.31.1, links to files with umlauts in the filename are broken, and I can't call up the file even though it is present in the images directory.

When the link to that specific file is on a page whose name also contains an umlaut, the page name is also distorted in the 404 error message. For example, an ü is shown as %C%BC.

My wiki's language is German, and the following are set in LocalSettings.php:

  • $wgShellLocale = "de_DE.utf8";
  • $wgLanguageCode = "de";

The following software is installed:

  • PHP: 7.2.10-he.0 (apache2handler)
  • MySQL: 5.6.37-82.2-log
  • ICU: 52.1

MW_1.26.3 ran with PHP 5.6 and everything worked fine.

Any ideas?

Bawolff

It's probably not relevant anymore, but older versions of PHP had lots of problems with this on Windows. Is your server on Windows?


> For example, an ü is shown as %C%BC.

This is very weird. %C3%BC is the correct way to encode a ü. I'm not sure how the 3 could be stripped. Maybe check your rewrite rules, and whether you have any unusual Apache modules installed that modify URLs.
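To see why %C%BC cannot be a correct encoding, here is a small check with Python's standard urllib.parse module. ü (U+00FC) is the two UTF-8 bytes 0xC3 0xBC, so it percent-encodes as %C3%BC. In %C%BC the first escape has only one hex digit, so it cannot be decoded back to ü.

```python
from urllib.parse import quote, unquote

# ü is U+00FC; in UTF-8 it is the two bytes 0xC3 0xBC.
print("ü".encode("utf-8"))  # b'\xc3\xbc'

# Percent-encoding those bytes gives the form a browser should send.
print(quote("ü"))  # %C3%BC

# Decoding the correct escape restores the character.
print(unquote("%C3%BC"))  # ü

# "%C" is not a valid escape (it needs two hex digits), so it is kept as
# literal text; the lone 0xBC byte is not valid UTF-8 on its own, so it
# is replaced with U+FFFD.
print(unquote("%C%BC") == "%C\ufffd")  # True
```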

If available, please post a link to your site.

Alfredo1066

Thanks for your reply. No, the wiki is not on Windows; it runs in a Unix environment. The wiki is private, so unfortunately I can't post a link.

My feeling is that this is more of a problem with the database. In LocalSettings.php the following is set:

  • $wgDBprefix = "";
  • $wgDBTableOptions = "ENGINE=MyISAM, DEFAULT CHARSET=binary";
  • $wgDBmysql5 = false;

Collation in the database is binary except for the searchindex table, where it is latin1_german2_ci.

As far as I know, latin1 is a subset of utf8, but could this inconsistency be the source of my troubles? And if so, how do I fix it?

Or is the engine (MyISAM) causing problems? As of MW_1.30, the default engine is InnoDB. Should I change this too?
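How latin1 and utf8 relate at the byte level can be checked in Python. Every latin1 character exists in Unicode, so latin1 text can always be converted to UTF-8 without loss, but only the ASCII range (0x00 to 0x7F) uses the same bytes in both. The word "Grüße" is just an illustrative sample.

```python
word = "Grüße"

latin1_bytes = word.encode("latin-1")  # ü -> 0xFC, ß -> 0xDF (one byte each)
utf8_bytes = word.encode("utf-8")      # ü -> 0xC3 0xBC, ß -> 0xC3 0x9F

print(latin1_bytes)  # b'Gr\xfc\xdfe'
print(utf8_bytes)    # b'Gr\xc3\xbc\xc3\x9fe'

# Decoding as latin1 and re-encoding as UTF-8 is lossless...
assert latin1_bytes.decode("latin-1").encode("utf-8") == utf8_bytes

# ...but the raw latin1 bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```

So a column or file that holds latin1 bytes only looks right if every reader knows it is latin1; a reader that assumes UTF-8 will fail on the umlauts.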

Bawolff

Changing the charset of a DB can cause encoding problems. Usually binary is best; it doesn't really matter whether it's binary or utf8, but when you convert between one and the other it's easy to do it wrong and cause new problems. Normally, though, a conversion issue doesn't look like what you describe: the utf8 bytes get reinterpreted as latin1 characters and then re-encoded as utf8, so a ü would show up as Ã¼.
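The typical conversion failure (often called double encoding) can be reproduced in Python: the UTF-8 bytes of ü are misread as two latin1 characters and then stored again as UTF-8. The result differs clearly from the %C%BC in the original report.

```python
original = "ü"

# Step 1: the correct UTF-8 bytes, 0xC3 0xBC.
utf8_bytes = original.encode("utf-8")

# Step 2: a faulty conversion treats each byte as one latin1 character.
misread = utf8_bytes.decode("latin-1")
print(misread)  # Ã¼

# Step 3: that text is re-encoded as UTF-8, giving four bytes instead of two.
double_encoded = misread.encode("utf-8")
print(double_encoded)  # b'\xc3\x83\xc2\xbc'

# Reversing the steps recovers the original character.
print(double_encoded.decode("utf-8").encode("latin-1").decode("utf-8"))  # ü
```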

If you want to check things, you could look at what the img_name field in the image table and the page_title field in the page table of your DB look like, to see whether they are correct.
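One way to act on this, sketched in Python under assumptions: fetch the raw bytes of img_name or page_title as hex (for example with MySQL's HEX() function, so the client's own charset handling does not interfere) and classify them. The helper classify_title_bytes, its labels, and the sample hex values are hypothetical and not part of MediaWiki.

```python
def classify_title_bytes(raw: bytes) -> str:
    """Roughly classify raw title bytes from the database (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8; a single-byte charset such as latin1 is a likely cause.
        return "not utf-8 (possibly latin1)"
    try:
        # If the text survives a latin1 round trip and decodes as UTF-8 again,
        # it was probably UTF-8 that got encoded a second time.
        again = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "double-encoded utf-8" if again != text else "utf-8"


# Illustrative values as HEX(img_name) might return them for "Müller.jpg".
samples = [
    "4DC3BC6C6C65722E6A7067",      # correct UTF-8
    "4DFC6C6C65722E6A7067",        # latin1
    "4DC383C2BC6C6C65722E6A7067",  # double-encoded UTF-8
]
for hex_value in samples:
    raw = bytes.fromhex(hex_value)
    print(raw, classify_title_bytes(raw))
```

The check is a heuristic: a pure-ASCII title always comes out as "utf-8", which is correct, since ASCII bytes are identical in both charsets.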

My main guess would really be something in the webserver and/or some sort of proxy server issue.

MyISAM vs InnoDB shouldn't affect this. InnoDB is recommended, though: it performs better under load, and it's more reliable (if your computer loses power, for example).

2001:16B8:10C2:3B00:39BB:A6CD:552A:AD56

If the searchindex table is the only table that is not binary, converting it from latin1 to utf8 should be fairly simple:

  • You can TRUNCATE the table with this MySQL query: TRUNCATE searchindex;
  • Then convert the empty table to utf8: ALTER TABLE searchindex CONVERT TO CHARACTER SET utf8;
  • Finally run the maintenance script rebuildtextindex.php to fill it again.

(However, I don't think this affects the umlaut problem you are having.)

Alfredo1066

Thank you both for helping out.

It turned out that the problem arose because FileZilla was using the wrong charset. Once I forced FileZilla to use utf8 when communicating with my server (the wiki resides on a shared host), I was able to call up the file with the umlaut in its name. So, using utf8, I downloaded the whole images directory of mw_1.26.3 and then uploaded all files into the new images directory of mw_1.31.1.
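A plausible mechanism, sketched in Python and not confirmed by the thread: if FileZilla transferred file names in a single-byte charset such as latin1 instead of UTF-8, the name stored on disk contains 0xFC for ü, while the web server, after decoding %C3%BC from the URL, looks for 0xC3 0xBC, so the lookup fails with a 404. The hypothetical helpers below flag names whose raw bytes are not valid UTF-8, which would catch files uploaded that way.

```python
import os


def non_utf8_names(names: list[bytes]) -> list[bytes]:
    """Return the names whose raw bytes are not valid UTF-8."""
    bad = []
    for name in names:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def scan_images_dir(path: str) -> list[bytes]:
    """Walk a directory tree (e.g. a wiki's images directory) and report file
    names that are not valid UTF-8. Paths are handled as bytes so that
    undecodable names are returned unchanged."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(path)):
        for name in non_utf8_names(filenames):
            found.append(os.path.join(dirpath, name))
    return found


# What a latin1 transfer and a UTF-8 transfer would leave on disk for "Müller.jpg".
print(non_utf8_names(["Müller.jpg".encode("latin-1"), "Müller.jpg".encode("utf-8")]))
# [b'M\xfcller.jpg']
```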

I also cleaned up the searchindex table by downloading it and manually replacing the garbled characters with umlauts (in Notepad++). But, as you both suggested, this had no effect on the main problem.

One odd behaviour remains: I get logged out at random while editing. Initially I thought it was related to the problem in this thread, but it isn't.