Topic on Extension talk:Arrays

Multi-byte characters are not safe

Dinoguy1000 (talkcontribs)

Currently, the Arrays extension does not properly handle multi-byte characters (though this may be a shortcoming of the regex functions it uses, I don't know):

Character: ナ
Size: {{ #len: ナ }}
Array size: {{ #arraydefine: array | ナ | // }}{{ #arraysize: array }}

results in

Character: ナ
Size: 1
Array size: 5

Note that attempting to use #arrayprint to view the array aborts parsing of the entire page.
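
One plausible explanation for the count of 5, sketched here in Python rather than in the extension's own code: if the split runs on raw UTF-8 bytes (as PHP's preg_split does when the pattern has no `u` modifier), the empty pattern `//` matches between every byte of the three-byte character, giving three one-byte fragments plus an empty string at each end. The helper names below are hypothetical and this is only an illustration under that assumption, not the extension's actual logic.

```python
import re
from typing import List


def split_bytes(text: str, pattern: bytes = b"") -> List[bytes]:
    """Split the UTF-8 encoding of `text` with a byte-level regex.

    Assumption: this mimics a split that is not character-aware.
    """
    return re.split(pattern, text.encode("utf-8"))


def is_valid_utf8(chunk: bytes) -> bool:
    """Return True if `chunk` decodes cleanly as UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


pieces = split_bytes("ナ")
print(len(pieces))                         # 5
print([is_valid_utf8(p) for p in pieces])  # [True, False, False, False, True]
```

The three middle fragments are not valid UTF-8 on their own, which might be why printing the array breaks the page, though that part is a guess.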

Dinoguy1000 (talkcontribs)

rev:45734 may be relevant for fixing this bug.

Dinoguy1000 (talkcontribs)

Don't suppose this could get some attention?

Danwe (talkcontribs)

I guess we should use mb_split instead of preg_split here, and perhaps other mb_ functions throughout the code. I'm not really familiar with those yet, though, and don't know what unwanted side effects they might have.
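
To illustrate the difference a character-aware split would make, here is a minimal Python sketch. `define_array` is a hypothetical stand-in for the split step of #arraydefine, not the extension's code; the `unicode_aware` flag plays the role of switching to a multi-byte-safe function such as mb_split (or, as another option in PHP, adding the `u` modifier to the preg_split pattern).

```python
import re
from typing import List


def define_array(value: str, delimiter_pattern: str, unicode_aware: bool) -> List[str]:
    """Hypothetical stand-in for the split step of #arraydefine."""
    if unicode_aware:
        # Character-aware: the regex sees whole code points.
        return re.split(delimiter_pattern, value)
    # Byte-oriented: the regex sees individual UTF-8 bytes.
    raw = re.split(delimiter_pattern.encode("utf-8"), value.encode("utf-8"))
    # Broken fragments are replaced with U+FFFD so they can still be printed.
    return [piece.decode("utf-8", errors="replace") for piece in raw]


print(len(define_array("ナ", "", unicode_aware=False)))  # 5
print(len(define_array("ナ", "", unicode_aware=True)))   # 3
print(define_array("ナ,ニ", ",", unicode_aware=False))   # ['ナ', 'ニ']
```

As the last line suggests, an ASCII delimiter such as a comma never matches inside a multi-byte character, so under these assumptions the problem would only show up with patterns that can match between bytes, like the empty `//`.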

I'm not really committing much time to MW coding these days. If anyone wants to submit a patch for this (it should include tests, though!), I am willing to review and merge it. I am also available for hire to tackle this issue throughout the extension if anyone is willing to pay for it.
