UTF8 Problem with update from 1.16 to 1.17
Thank you so much Jirka!
I spent 2 hours searching for a solution for the umlaut problem - until I found this thread.
I successfully updated my MediaWiki from 1.16.2 to 1.19.1.
A short explanation for those who might still be confused about what to do:
- Navigate to the old database in phpMyAdmin
- Click on the SQL button (between structure and search)
- Enter the following and press OK: ALTER TABLE page CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin
- Export the old database as gzip file
- Create a new database and import the gzip file
- The imported database still has to be upgraded: navigate to yoursite.com/mediawiki/mw-config/index.php and follow the instructions.
These steps worked for me.
It was sufficient to alter only the page table, nothing more. The umlauts are displayed correctly.
Is it necessary to turn the page table (of the new database) to binary??
It didn't change anything.
But here is how to do it:
- Navigate to the new database in phpMyAdmin
- Click on the SQL button
- Enter the following and press OK: ALTER TABLE page CONVERT TO CHARACTER SET binary
> Is it necessary to turn the page table (of the new database) to binary??
No, it is not. What you must have in the table is UTF-8 encoded data. This is important because MediaWiki always writes data as UTF-8 and always expects to get UTF-8 encoded data back from the DB. You can use either UTF-8 encoding or binary encoding; in the latter case MediaWiki just stores its UTF-8 data in the binary tables, which is fine as well.
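To illustrate the point about encodings, here is a minimal Python sketch (not MediaWiki code; the sample title is made up, and Python's latin-1 codec merely stands in for MySQL's latin1). It shows what happens to a UTF-8 encoded title when the stored bytes come back unchanged, as with a binary column, and when they are interpreted as latin1 instead.

```python
# Hypothetical sample title; any text with umlauts behaves the same way.
title = "Müller"

# MediaWiki sends the title to the database as UTF-8 bytes.
stored = title.encode("utf-8")

# A binary column hands back exactly the same bytes, so decoding works.
from_binary = stored.decode("utf-8")

# If those bytes are interpreted as latin1, every byte becomes one
# character and the umlaut turns into two wrong characters.
as_latin1 = stored.decode("latin-1")

print(from_binary)  # Müller
print(as_latin1)    # MÃ¼ller
```

The second output is the kind of broken umlaut this thread is about.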
Thanks for your reply.
If I understand correctly, you shouldn't leave the table latin1-encoded. So in our case it's necessary to turn the page table back to binary or UTF-8. Please correct me if I'm wrong...
> If I understand correctly, you shouldn't leave the table latin1-encoded.
> So in our case it's necessary to turn the page table back to binary or UTF-8.
Basically you are right: your tables should be set to either UTF-8 or binary. But be careful: when you switch your tables to UTF-8 (or to binary), you must make sure that the data in them really is UTF-8 encoded. The database will generally let you store data of any encoding in a table declared with any other encoding. If you do that, you run into strange problems, such as the ones described at the beginning of this thread.
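One way to check whether a value really is UTF-8 encoded is to try decoding its raw bytes. The following is only a sketch: the helper name `is_valid_utf8` and the sample value are made up for illustration, and fetching the raw bytes from the database is left out.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same hypothetical title, once as UTF-8 bytes and once as latin1 bytes.
utf8_bytes = "Müller".encode("utf-8")      # b'M\xc3\xbcller'
latin1_bytes = "Müller".encode("latin-1")  # b'M\xfcller'

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

A passing check is necessary but not sufficient: plain ASCII text passes either way, and text that was converted twice is still valid UTF-8 even though it displays wrongly.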
Thanks for the information.
I realized that dealing with only one table (page) wasn't enough. Category titles and user pages with umlauts were broken too. So I decided to run "ALTER TABLE tablename CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin" on all tables. Then I exported and imported.
After that I ran "ALTER TABLE tablename CONVERT TO CHARACTER SET binary" on all tables. About 5 tables (among them "searchindex") didn't accept the binary character set and still showed latin1_swedish_ci (I don't know why!?). Anyway, I took those 5 tables and set them to binary via the operations button -> collation, which worked.
I hope this is all okay?
Wiki looks fine now in all places :)
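Typing the same statement for every table gets tedious, so the two rounds of statements described above could be generated with a small script. This is only a sketch: the function name `conversion_statements` and the short table list are assumptions for illustration, and the output still has to be pasted into the SQL box in phpMyAdmin.

```python
def conversion_statements(tables, charset, collation=None):
    """Build one ALTER TABLE ... CONVERT TO CHARACTER SET statement per table."""
    statements = []
    for table in tables:
        sql = f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset}"
        if collation:
            sql += f" COLLATE {collation}"
        statements.append(sql + ";")
    return statements


# Illustrative subset only; use the real table list of your wiki
# (including the table prefix, if you have one).
tables = ["page", "categorylinks", "user"]

# Round 1: run on the old database before exporting.
before_export = conversion_statements(tables, "latin1", "latin1_bin")
# Round 2: run on the new database after importing.
after_import = conversion_statements(tables, "binary")

for sql in before_export + after_import:
    print(sql)
```

Tables that refuse the binary character set, such as the ones mentioned above, would simply be left out of the list for the second round.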
I can't tell you exactly how you have to CONVERT TO CHARSET your data. It depends on which character set you actually had in the database. As long as you end up with UTF-8 after your operation, everything is fine.
The tables that did not accept binary encoding really should not be binary, but UTF-8. If I remember correctly, there actually was a reason to keep these tables in UTF-8. In the end it does not really matter whether you use UTF-8 or binary, because MediaWiki will provide UTF-8 data anyway, so using UTF-8 won't break anything.
This was awesome, it really worked for the page titles, but I'm still having the problem with images that use special characters. They don't display, not even in the image list! I did everything you said on the images table, but nothing seems to work... Any idea?
I'm afraid I can't really help you.
The first time I updated my DB (for testing) everything worked fine - images with special chars were shown properly too.
The second time I updated my DB (now the real version) I had an issue with (only) one image (containing "ü"). I reuploaded the file and then it worked...