Topic on Project:Support desk

[RESOLVED] Pagetitle encoding problems when upgrading a wiki

Vivi-1 (talkcontribs)
88.130.104.212 (talkcontribs)

Hi!

I fixed that kind of error once; it is tedious work. :-(

The result you need in the end is this: your database uses UTF-8 or binary encoding, and the data in it is (in any case) UTF-8-encoded.

First you need to know what kind of encoding your system uses:

  • What is the encoding of the database, of the individual tables, and of all columns that have an encoding (that is, columns holding text as opposed to integers)?
    • Are all these encodings identical?
    • They should all be either UTF-8 (for example with the collation "utf8_general_ci") or "binary", and all the same.
  • What the screenshot shows is that the data in the column you posted does not in fact use the encoding that is declared for that column.
    • So what you need to do is this: if the column's content is in fact already UTF-8-encoded, convert the encoding declaration for that column (not the content in it!) to UTF-8. As far as I remember, that works by temporarily making the column a BLOB (MySQL does not change the content of BLOBs when you convert the charset) and then setting the column back to the type it had before, this time declared as UTF-8.
    • If the content of the column is in fact not UTF-8-encoded, you need to convert the content itself to UTF-8. I think that works the same way as above, but without changing the affected columns to BLOB first.
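Which of the two cases applies can be checked on a sample of the raw bytes before any schema change. The following is a minimal sketch in Python, not something from this thread: the helper names `looks_like_utf8` and `advice_for` are hypothetical, and it assumes you have already fetched the stored bytes of a title (for example via a binary cast in a query).

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def advice_for(raw: bytes) -> str:
    """Map a sample of stored bytes to one of the two cases described above."""
    if raw.isascii():
        # Pure ASCII is byte-identical in latin1 and UTF-8.
        return "ascii-only: this sample cannot tell the cases apart"
    if looks_like_utf8(raw):
        return "already UTF-8: change only the declaration (binary round-trip)"
    return "not UTF-8: convert the content itself"


title_utf8 = "Général".encode("utf-8")      # title stored as UTF-8 bytes
title_latin1 = "Général".encode("latin-1")  # same title stored as latin1 bytes

print(advice_for(title_utf8))
print(advice_for(title_latin1))
print(advice_for(b"Main_Page"))
```

Only titles with non-ASCII characters are useful samples, and a clean UTF-8 decode is strong evidence rather than proof, so it is worth checking several rows.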

I know there also are some scripts around

Vivi-1 (talkcontribs)

Hello,


I'm not on my main computer, so I can't test it on my test installation :(

The collations are utf8_bin or utf8_general_ci or ... latin1_swedish_ci, latin1_bin. I (or maybe another dev who can try it sooner than me) will try your solution. (I can only try tomorrow ^^)

Vivi-1 (talkcontribs)

Hi,

I have tried this:

ALTER TABLE `archive` CHANGE `ar_title` `ar_title` BLOB;
ALTER TABLE `archive` CHANGE `ar_title` `ar_title` TEXT CHARACTER SET utf8 COLLATE utf8_bin;

And MySQL responds:

#1170 - BLOB/TEXT column 'ar_title' used in key specification without a key length
:(
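Error #1170 appears because `ar_title` is part of an index, and MySQL requires a key prefix length for BLOB/TEXT columns in an index. One possible way around it is to do the binary round-trip with VARBINARY and VARCHAR instead, since both carry an explicit length. The sketch below only builds the statements. The function name `redeclare_as_utf8` is hypothetical, and the length of 255 is an assumption about the column that should be checked against your own schema.

```python
from typing import List


def redeclare_as_utf8(table: str, column: str, length: int = 255,
                      collation: str = "utf8_bin") -> List[str]:
    """Build the two ALTER TABLE statements for a binary round-trip.

    VARBINARY and VARCHAR have an explicit length, so an index on the
    column needs no separate key prefix length, unlike BLOB/TEXT.
    The default length of 255 is an assumption, not taken from the thread.
    """
    return [
        # Step 1: make the column binary; the stored bytes stay untouched.
        f"ALTER TABLE `{table}` CHANGE `{column}` `{column}` VARBINARY({length});",
        # Step 2: declare the same bytes as UTF-8 text.
        f"ALTER TABLE `{table}` CHANGE `{column}` `{column}` "
        f"VARCHAR({length}) CHARACTER SET utf8 COLLATE {collation};",
    ]


for statement in redeclare_as_utf8("archive", "ar_title"):
    print(statement)
```

Note that CHANGE redefines the whole column, so attributes such as NOT NULL or a DEFAULT value have to be restated in both statements; they are left out here. Try it on a copy of the database first.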
88.130.88.216 (talkcontribs)

Hi,

I am not really a MySQL expert, but maybe the script here helps you. It shows the logic you have to go through, which should help you understand what is needed. The script is written for a database used by TYPO3; this explains the first few lines, in which the script tries to get the MySQL credentials. These first lines cannot be transferred 1:1 to your situation, but for your needs you could try setting the variables in the script directly. Maybe it then even works for you without much additional manual work. :-)

Vivi-1 (talkcontribs)
echo utf8_decode($data);

I feel stupid.

I'm making a script to correct all the affected columns in the DB.

If you have any problems, have a look at: https://github.com/neitanod/forceutf8
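For context, PHP's `utf8_decode` converts a UTF-8 string to ISO-8859-1. If it makes the titles look right, one plausible reading is that the data was double-encoded: UTF-8 bytes were read as latin1 and encoded to UTF-8 a second time. The thread does not confirm this, so the following Python sketch is an illustration under that assumption, with a hypothetical function name `undo_double_encoding`.

```python
def undo_double_encoding(raw: bytes) -> bytes:
    """Reverse one round of 'UTF-8 read as latin1, then encoded as UTF-8'.

    Returns the input unchanged when it does not look double-encoded.
    """
    try:
        # Undo the second encoding step: UTF-8 text back to latin1 bytes.
        candidate = raw.decode("utf-8").encode("latin-1")
        # Accept the result only if it is valid UTF-8 itself.
        candidate.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return raw
    return candidate


good = "Général".encode("utf-8")
double = good.decode("latin-1").encode("utf-8")  # simulate the damage

print(undo_double_encoding(double) == good)  # damaged input is repaired
print(undo_double_encoding(good) == good)    # healthy input is left alone
```

The check is a heuristic: a correctly stored title that happens to contain a sequence such as "Ã©" would be altered as well, so review the result before writing anything back.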

Vivi-1 (talkcontribs)

Correcting (the test) DB doesn't seem to be a good idea :|


I'm lost. And I have no idea why there is a problem.

When accessing the DB with PHP, the data is shown without errors. But MW has a problem with the title names of pages.

Vivi-1 (talkcontribs)

AANNNDDDD : SOLVED :D

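<?php
// Re-encode every SQL dump in ./sql/ as UTF-8 and write the result to ./sql_utf8/.
namespace ForceUTF8;
error_reporting( -1 );
ini_set( 'display_errors', 1 );
ini_set( 'default_charset', 'utf-8' );

// Encoding.php comes from https://github.com/neitanod/forceutf8
require_once( 'Encoding.php' );
$handle = opendir( './sql/' );
$dirIgnore = array( '.', '..' );

// Compare against false explicitly so that a file named "0" does not end the loop.
while ( ( $file = readdir( $handle ) ) !== false ) {
    if ( !in_array( $file, $dirIgnore ) ) {
        echo 'Encoding ' . $file . "\n";
        file_put_contents( './sql_utf8/' . $file, Encoding::fixUTF8( file_get_contents( './sql/' . $file ) ) );
    }
}
closedir( $handle );

?>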

You need: https://github.com/neitanod/forceutf8

The source .sql files should be in ./sql/; the re-encoded files are written to ./sql_utf8/

I recommend using one file per table.

Please note that this is MY solution; it's not the most powerful one ... it can kill cats, burn your computer ...
