Extension:Transliterator

An extension to allow transliteration based on ad-hoc schemes stored in the wiki's MediaWiki namespace (by default under the [ [MediaWiki:Transliterator:]] pseudo-namespace).

Usage on pages
Most people will want to use the basic form of the call, which takes a map name and the text to transliterate.

It generates the transliteration of the text based on the map found in [ [MediaWiki:Transliterator:<mapname>]].

Formatting output
The extension supports several extra parameters to help template authors integrate it easily into their code. At a basic level, it allows customisation of the output using the third parameter. With this parameter set, the call outputs nothing if there is no page at [ [MediaWiki:Transliterator:<mapname>]], or the formatted transliteration if the map exists. This should allow template authors to avoid wrapping the call in some form of {{#if: ...}} statement to see whether a transliteration can be created.

User-supplied transliterations
The fourth parameter allows you to set a user override on the output of a transliteration. This has two uses: first, where the transliteration that is generated is incorrect, and second, where the map does not yet exist for a language. If the fourth parameter is not blank, the call outputs the user-supplied transliteration. If it is blank and [ [MediaWiki:Transliterator:<mapname>]] exists, the call outputs the generated transliteration as before. If it is blank and [ [MediaWiki:Transliterator:<mapname>]] does not exist, the call outputs nothing.

Failure to transliterate
The final parameter allows for an "error" message to be displayed instead of a blank output in the two cases above. This can be useful where the transliteration is mission-critical, but should be used sparingly.
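
The fallback order described in the last three sections can be summarised in a small Python sketch. It is only an illustration, not the extension's code: the function name choose_output and its arguments are hypothetical, and the formatting done by the third parameter is deliberately left out.

```python
from typing import Optional


def choose_output(generated: Optional[str], override: str = "", error: str = "") -> str:
    """Pick what a call would display (hypothetical helper, formatting omitted).

    generated: the automatic transliteration, or None when the map page is missing.
    override:  the user-supplied transliteration (fourth parameter).
    error:     the message to show instead of a blank result (final parameter).
    """
    if override.strip():
        # A non-blank user override always wins.
        return override
    if generated is not None:
        # The map exists, so use the generated transliteration.
        return generated
    # No override and no map: blank by default, or the error message if given.
    return error


print(choose_output("privet"))
print(choose_output(None, override="privyet"))
print(choose_output(None, error="no transliteration available"))
```

With no error message supplied, the last case returns an empty string, which matches the "output nothing" behaviour described above.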

Creating maps

 * out of date - should reflect changes to case-sensitivity

A map is simply a correctly formatted page in the [ [MediaWiki:Transliterator:]] pseudo-namespace. Each line of a map should be either a mapping of the form <left> => <right> or a comment beginning with # (the first line may differ, see the advanced note below). All whitespace characters are ignored, so the spacing within a rule makes no difference. Additionally, HTML entities of the form &#<code>; are decoded, so a character can also be written as its numeric entity.
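
As a rough illustration of this format, the following Python sketch reads a map into a dictionary. It is not the extension's parser: parse_map is a hypothetical name, only decimal entities are handled, blank lines are skipped, and decoding entities after stripping whitespace is an assumption.

```python
import re


def decode_entities(text: str) -> str:
    # Decode decimal entities such as &#97; (assumption: decimal form only).
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)


def parse_map(page_text: str) -> dict:
    """Turn the text of a map page into a {left: right} dictionary."""
    rules = {}
    for raw in page_text.splitlines():
        line = re.sub(r"\s+", "", raw)  # all whitespace is ignored
        if not line or line.startswith("#"):
            continue  # blank line (assumed harmless) or comment
        if "=>" not in line:
            raise ValueError("neither a rule nor a comment: " + raw)
        left, right = line.split("=>", 1)
        rules[decode_entities(left)] = decode_entities(right)
    return rules


example = "# a tiny map\na => b\n  aa=>c \n&#97;e => d"
print(parse_map(example))
```

Here the spacing differs on every rule line and the last rule writes its first letter as an entity, yet all three rules are read the same way.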

The algorithm for processing these maps repeatedly finds the longest possible mapping closest to the start of the word. This means that if you have the rules
 a => b
 aa => c
 ae => d
then "aardvark" gives you "crdvbrk": the leading "aa" is matched by the longer rule aa => c rather than by a => b twice, and unmapped letters pass through. The other example outputs are "bb", "ce" and "de".
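
The longest-match-first behaviour can be sketched in Python as follows. This is a simplified model rather than the extension's implementation (which builds a lookup table); the function name transliterate is hypothetical, and the inputs used below are chosen for illustration.

```python
def transliterate(text: str, rules: dict) -> str:
    """Greedy left-to-right replacement, trying the longest rule first."""
    longest = max(map(len, rules), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matches here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


rules = {"a": "b", "aa": "c", "ae": "d"}
print(transliterate("aardvark", rules))  # crdvbrk
print(transliterate("aae", rules))       # ce
```

For "aae" the rule aa => c is applied at the start of the word, so the remaining "e" passes through and the rule ae => d never gets a chance to match.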

By default, any characters that are not in the map are passed through unchanged. This prevents too much loss of information, but it may make errors harder to spot. By specifying a rule with a blank <left>, you can provide a sequence to be used for each character that does not match. For example, the rules
 a => b
 b => c
 => ?
will cause "ab" to give you "bc", and an input of "aba" followed by three unmapped characters to give you "bcb???".
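
A minimal Python sketch of this fallback rule, assuming the blank-left rule is stored under the empty string as its key. The function name and the sample input "abaxyz" are illustrative, not taken from the extension.

```python
def transliterate_with_default(text: str, rules: dict) -> str:
    """Longest-match replacement with an optional default for unmatched characters."""
    default = rules.get("")  # the rule with a blank left side, if present
    longest = max(map(len, rules), default=0)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # Unmatched character: use the default sequence, else pass through.
            out.append(text[i] if default is None else default)
            i += 1
    return "".join(out)


rules = {"a": "b", "b": "c", "": "?"}
print(transliterate_with_default("ab", rules))      # bc
print(transliterate_with_default("abaxyz", rules))  # bcb???
```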

In addition to matching on the letters in the input, the symbols ^ and $ can be used to specify the beginning and end of a word. (In practical terms, each phrase as it arrives to be transliterated has these symbols added, and they are then removed from the final transliteration.) This provides support for languages that have different rules depending on where in the string the symbols appear. For example, the rules
 a => b
 ^a => c
 a$ => d
will cause "rar" to give you "rbr", "aaaa" to give you "cbbd", and "as Eva can be a pain" to give you "cs Evd cbn be c pbin" (note that you could catch the standalone "a" with a rule like "^a$").
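
The word-boundary behaviour can be modelled with the Python sketch below. It assumes the markers are added around each whitespace-separated word, which is what the example output suggests; the function name is hypothetical, and input that already contains a literal ^ or $ is not handled specially.

```python
import re


def transliterate_words(text: str, rules: dict) -> str:
    """Longest-match replacement with ^ and $ marking the ends of each word."""
    longest = max(map(len, rules), default=0)
    pieces = []
    for token in re.split(r"(\s+)", text):
        if not token or token.isspace():
            pieces.append(token)  # keep the original spacing
            continue
        marked = "^" + token + "$"
        out = []
        i = 0
        while i < len(marked):
            for size in range(min(longest, len(marked) - i), 0, -1):
                chunk = marked[i:i + size]
                if chunk in rules:
                    out.append(rules[chunk])
                    i += size
                    break
            else:
                # Drop an unmatched boundary marker, pass other characters through.
                if 0 < i < len(marked) - 1:
                    out.append(marked[i])
                i += 1
        pieces.append("".join(out))
    return "".join(pieces)


rules = {"a": "b", "^a": "c", "a$": "d"}
print(transliterate_words("aaaa", rules))                  # cbbd
print(transliterate_words("as Eva can be a pain", rules))  # cs Evd cbn be c pbin
```

For the standalone "a", both ^a and a$ could apply; the one closer to the start of the word is used, which is why the result is "c".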

ADVANCED: By default this extension treats characters in fully combined chunks (slightly more combined than just NFC). This means that a rule for a plain q will not convert a q́ with a combining accent (even though q́ has no single NFC codepoint). If you are dealing with a language where you want to treat the letters and accents independently, or with other situations where the number of rules can be significantly reduced (such as Korean transliteration), you can specify this in the first non-comment line of your map. The extension will then treat all data in terms of NFD codepoints.
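
The difference between the two modes can be seen with Python's standard unicodedata module. The split_units helper is hypothetical, and grouping combining marks with their base letter is only a guess at what "slightly more combined than NFC" means in practice.

```python
import unicodedata


def split_units(text: str, decompose: bool = False) -> list:
    """Split text into the units that rules would be matched against."""
    if decompose:
        # Decomposed mode: every NFD codepoint is its own unit.
        return list(unicodedata.normalize("NFD", text))
    units = []
    for ch in unicodedata.normalize("NFC", text):
        if units and unicodedata.combining(ch):
            units[-1] += ch  # keep a combining mark attached to its base letter
        else:
            units.append(ch)
    return units


print(split_units("q\u0301"))                        # one unit: q with its accent
print(split_units("q\u0301", decompose=True))        # two units: q, then the accent
print(len(split_units("\ud55c", decompose=True)))    # 3 jamo for one Hangul syllable
```

In decomposed mode a single Hangul syllable becomes three jamo, so a map needs rules only for the individual jamo rather than for every possible syllable.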

Possible errors
All of these error messages appear at the place where the transliteration is invoked. The maps are not parsed when they are saved.


 * Ambiguous rule <rule> in [ [MediaWiki:Transliterator:<mapname>]]
 * This is caused when a map contains two rules with the same content on the left of the =>. This can never be correct, as it would leave the Transliterator to make an impossible decision as to which right-hand side to use as the replacement.


 * Invalid syntax <line> in [ [MediaWiki:Transliterator:<mapname>]]
 * This is caused by a line that contains no "=>" and does not start with a "#". The parser cannot decide whether you meant it to be a comment but forgot to mark it, or whether you meant it to be a rule and got it wrong, so it asks for confirmation.


 * More than <limit> rules in [ [MediaWiki:Transliterator:<mapname>]]
 * To stop this extension from creating massive maps that could consume the server's memory, it limits the size of each map. The limit on the number of rules is configurable, as described below. There is no real solution to this problem unless you work out a better set of rules (with some multi-character sequences there are ways of using the longest-first property to leave out some repetitious rules).


 * Rule <rule> has more than <limit> characters on the left in [ [MediaWiki:Transliterator:<mapname>]]
 * Due to the algorithm used to transliterate, having long rules on the left both increases the size of the map and increases the maximum time that may be taken in transliteration. If you find yourself wanting to break this limit, the chances are that your language cannot be transliterated automatically.
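
The four checks above can be sketched in Python as follows. This is an illustration only: check_map is a hypothetical function, the exact wording and formatting of the messages are assumptions, and the default limits are the ones given under "Advanced customisation".

```python
from typing import Optional


def check_map(page_text: str, mapname: str,
              max_rules: int = 255, max_left: int = 10) -> Optional[str]:
    """Return the first error found in a map, or None if it looks valid."""
    page = "MediaWiki:Transliterator:" + mapname
    seen = set()
    for raw in page_text.splitlines():
        line = "".join(raw.split())  # whitespace is ignored
        if not line or line.startswith("#"):
            continue  # blank line (assumed harmless) or comment
        if "=>" not in line:
            return "Invalid syntax %s in %s" % (raw.strip(), page)
        left = line.split("=>", 1)[0]
        if left in seen:
            return "Ambiguous rule %s in %s" % (raw.strip(), page)
        if len(left) > max_left:
            return "Rule %s has more than %d characters on the left in %s" % (
                raw.strip(), max_left, page)
        seen.add(left)
        if len(seen) > max_rules:
            return "More than %d rules in %s" % (max_rules, page)
    return None


print(check_map("a => b\na => c", "test"))
print(check_map("# fine\na => b", "test"))
```

Because the maps are not parsed when they are saved, a check like this would only run when a page that uses the map is rendered.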

Advanced customisation
A synonym for the call can be added by editing [ [MediaWiki:transliterator-invoke]]. If you customize this message, the original name will still work.

The namespace in which maps are put can be customized. By default it is "Transliterator:", but if you'd prefer a different place, edit [ [MediaWiki:transliterator-prefix]]. It is not possible to move the maps outside of MediaWiki (and the chances are that doing so would be a bad idea anyway). NOTE: if you edit this message, all of your maps will need to be moved - so it is likely that once you have started using the extension you don't want to change it.

The global variable $wgTransliteratorRuleCount, by default 255, specifies the maximum number of entries in a mapping, while $wgTransliteratorRuleSize, by default 10, specifies the maximum length of the left-hand side of a rule. These are totally arbitrary limits, and it may be the case that different bounds work better for you.

For the interested: The absolute maximum size of the lookup table created for each map is bounded by O($wgTransliteratorRuleCount^2 * $wgTransliteratorRuleSize + the size of the map page). The absolute maximum number of operations to transliterate something is O($wgTransliteratorRuleSize * input length). These are worst case and unlikely to appear in practice, particularly as most transliteration schemes deal with individual letters or digraphs.