Thread:Project:Support desk/Is there any updated documntation about the actual implementation of **utf-8 / Unicode normalization** in MediaWiki?/reply

You can always read the source code :P

There are a variety of ways Unicode normalization can be done, depending on your configuration. In general we will use PHP extensions written in C if available, since they're fast (this includes the icu extension and the intl extension; intl is included by default in newer PHP, so it is almost always available). Otherwise we will use a fallback PHP implementation.

Some additional normalizations are done based on content language, for Malayalam and Arabic. This is done per content language for performance reasons, but is kind of evil (in my opinion). The ml normalization caused some problems with external URLs that used the non-normalized versions of characters. Otherwise I'm not aware of any open issues, but I am not an expert on the subject by any means.
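
To make the URL problem a bit more concrete, here is a rough Python sketch (not MediaWiki's actual code). It assumes, purely for illustration, that the extra Malayalam pass rewrites the older NA + VIRAMA + ZWJ spelling of chillu N into the atomic character U+0D7B that Unicode 5.1 added. The `fix_malayalam` helper and the single-entry mapping are hypothetical; the real pass may cover more or different sequences.

```python
import unicodedata
from urllib.parse import quote

# Assumption for illustration: one legacy sequence and its atomic replacement.
LEGACY_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER
ATOMIC_CHILLU_N = "\u0d7b"              # MALAYALAM LETTER CHILLU N


def fix_malayalam(text: str) -> str:
    """Hypothetical stand-in for a language-specific normalization pass."""
    return text.replace(LEGACY_CHILLU_N, ATOMIC_CHILLU_N)


# Plain NFC leaves the legacy spelling alone, so it is the extra pass
# (not NFC) that changes the bytes.
nfc_of_legacy = unicodedata.normalize("NFC", LEGACY_CHILLU_N)

# Percent-encode both spellings the way they would appear in a URL path.
legacy_in_url = quote(LEGACY_CHILLU_N)
fixed_in_url = quote(fix_malayalam(LEGACY_CHILLU_N))

print(legacy_in_url)  # %E0%B4%A8%E0%B5%8D%E2%80%8D
print(fixed_in_url)   # %E0%B5%BB
```

The two percent-encoded forms are different byte strings, so an external site that only knows the legacy spelling will not recognise a link whose text has been rewritten.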

See manual:$wgAllUnicodeFixes, manual:$wgFixMalayalamUnicode, and manual:$wgFixArabicUnicode. (Note: those are for additional normalizations. Converting to normal form C is done regardless.)
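
If it helps to see what normal form C actually does to a string, here is a minimal Python sketch using the standard `unicodedata` module. It only illustrates NFC itself, not how MediaWiki implements it; the `to_nfc` helper name is made up for this example.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode normal form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

# Both render the same, but they are different code point sequences
# until they are normalized.
print(decomposed == composed)          # False
print(to_nfc(decomposed) == composed)  # True
print(len(decomposed), len(to_nfc(decomposed)))  # 2 1
```

Text that is already in NFC comes back unchanged, so running the conversion unconditionally is safe to repeat.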