Localisation/Tutorial

Internationalisation/localisation (i18n/l10n) tutorial, for use at the Pune hackathon 2012 (to be synced eventually with https://www.mediawiki.org/wiki/Localisation/Tutorial ). First, we give a general introduction to the i18n problem and explain where we are. Then we go over each of the major extensions the i18n team has developed, breaking down where functionality lives for input, output, and search. Finally, we give the participants an exercise, probably "add a new keymapping to Narayam".

General principles and goals
We use Unicode, encoded as UTF-8, throughout. (Wikipedia was one of the first major websites to adopt Unicode for everything.)
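UTF-8 encodes each Unicode code point in one to four bytes, so text in non-Latin scripts usually takes more bytes than it has characters. A short Python illustration; the sample characters are arbitrary choices:

```python
# How many bytes a few characters take when encoded as UTF-8.
samples = {
    "A": "LATIN CAPITAL LETTER A (ASCII)",
    "\u00e9": "LATIN SMALL LETTER E WITH ACUTE",
    "\u0915": "DEVANAGARI LETTER KA",
    "\ud55c": "HANGUL SYLLABLE HAN",
    "\U0001F600": "GRINNING FACE (outside the Basic Multilingual Plane)",
}

for char, description in samples.items():
    encoded = char.encode("utf-8")
    print(f"U+{ord(char):04X} {description}: {len(encoded)} byte(s): {encoded.hex(' ')}")

# Decoding gives back exactly the same string: the round trip is lossless.
word = "\u0915\u092e\u0932"  # Devanagari "kamal": three code points
assert word.encode("utf-8").decode("utf-8") == word
print(len(word), "code points,", len(word.encode("utf-8")), "bytes")  # 3 code points, 9 bytes
```

Every Devanagari code point needs three bytes, so for Indic text a limit measured in bytes and a limit measured in characters are quite different things.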

We use other established standards wherever possible: CLDR (Unicode Common Locale Data Repository), ISO 639 language codes, etc.
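ISO 639 supplies the language part of a language tag; BCP 47 (RFC 5646) combines it with ISO 15924 script codes and ISO 3166-1 region codes and recommends a canonical letter case for each part. A small sketch of that casing convention, using a hypothetical `bcp47_case` helper that does not validate subtags against the IANA registry:

```python
def bcp47_case(tag: str) -> str:
    """Apply the casing recommended by BCP 47 (RFC 5646) to a language tag.

    Language subtag (ISO 639) -> lowercase, 4-letter script subtag (ISO 15924)
    -> title case, 2-letter region subtag (ISO 3166-1) -> uppercase.
    Everything from a singleton such as 'x' or 'u' onwards stays lowercase.
    Hypothetical helper: it does not check that the subtags actually exist.
    """
    subtags = tag.split("-")
    result = [subtags[0].lower()]
    after_singleton = False
    for sub in subtags[1:]:
        if len(sub) == 1:
            after_singleton = True
            result.append(sub.lower())
        elif after_singleton:
            result.append(sub.lower())
        elif len(sub) == 4 and sub.isalpha():
            result.append(sub.title())   # script, e.g. Latn, Hans
        elif len(sub) == 2 and sub.isalpha():
            result.append(sub.upper())   # region, e.g. GB, CN
        else:
            result.append(sub.lower())   # numeric regions (419), variants
    return "-".join(result)


print(bcp47_case("sr-latn"), bcp47_case("zh-hans-cn"), bcp47_case("EN-gb"))
```

Language tags are case-insensitive, so this casing is a display convention rather than a requirement; normalising it simply makes tags easier to compare and read.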

We always use open source software and open source fonts.

We're building a software stack which is open source and reusable for the web.

We need to localise the MediaWiki experience for all the ways people read and edit Wikimedia content. So we have to support the desktop web and the mobile web (tablets, feature phones, smartphones), and we should also be ready for offline use (PDF, Kiwix) and for print.

A technical overview of the different i18n/l10n capabilities of MediaWiki: http://www.mediawiki.org/wiki/Localisation

The concept of a message is central to MediaWiki. It matters not only for localisation work but also for understanding much of how MediaWiki works in general.
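Because every interface string is a message identified by a key, the interface can be shown in any language for which translations exist, and keys that are not yet translated can fall back to another language. A minimal sketch of that lookup, assuming hypothetical in-memory tables and a hypothetical `get_message` helper; MediaWiki's real implementation, in PHP, is considerably more involved:

```python
from typing import Dict, List, Optional

# Hypothetical message tables: language code -> {message key -> text}.
MESSAGES: Dict[str, Dict[str, str]] = {
    "en": {"mainpage": "Main Page", "search": "Search"},
    "de": {"mainpage": "Hauptseite"},
}

# Hypothetical fallback chains; English is always tried last.
FALLBACKS: Dict[str, List[str]] = {
    "de-formal": ["de"],
}


def get_message(key: str, lang: str) -> Optional[str]:
    """Look up a message key in `lang`, then in its fallbacks, then in English."""
    for code in [lang] + FALLBACKS.get(lang, []) + ["en"]:
        text = MESSAGES.get(code, {}).get(key)
        if text is not None:
            return text
    return None  # the key does not exist in any language on the chain


print(get_message("mainpage", "de-formal"))  # Hauptseite
print(get_message("search", "de-formal"))    # Search (falls back to English)
```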

The l10n team: http://www.mediawiki.org/wiki/Internationalization_and_localization_tools

If you want to help translate MediaWiki and its extensions, see https://translatewiki.net for more documentation, and our TODO list at http://translatewiki.net/wiki/Issues_and_features

Localising
Now we'll show you Wikimedia localisation from a tools perspective and from a user perspective, including overviews of the major extensions.

Input
Extension: Narayam (open source keymaps for different languages): http://www.mediawiki.org/wiki/Extension:Narayam Where typing in a language is not well supported by common operating systems, we provide an input method right in the website. For the key mapping we usually choose the standard keyboard for the language, but if there is demand, we find other keymaps that are in use and add support for those too. We identify the three most-used keymaps for a language and map each of them to an in-browser (on-screen) keymap. Currently supported: 14 Indic languages (Lohit family), Hebrew, Arabic (Urdu), Cyrillic (minority languages of Russia), some European languages, Berber and others. Future directions / external ideas:
 * also see https://www.mediawiki.org/wiki/Help:Extension:Narayam
 * (Comment from Amir: it doesn't really belong in the tutorial, it's probably the meeting summary. I kept the links as bookmarks.)

Arabic script (for Persian and Arabic): an on-screen keyboard at http://behdad.org/editor/

Korean: Hangul is a phonetic script in which 2 or 3 letters (jamo) combine into one syllable block, which the font renders as a single glyph.

An automaton that converts input typed on an English keymap into Korean, implemented in JavaScript: http://ohi.kr/
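Because a Hangul syllable is built from 2 or 3 jamo, an input automaton like the one at ohi.kr has to compose what is typed into a single precomposed character. Unicode defines that composition arithmetically: syllable = 0xAC00 + (L * 21 + V) * 28 + T, where L, V and T are the indices of the leading consonant, the vowel and the optional trailing consonant (T = 0 means none). A Python sketch of the formula and its inverse; it works on jamo indices, and the mapping from keyboard keys to those indices, which is the part an automaton handles, is left out:

```python
from typing import Tuple

# Constants of the Unicode Hangul syllable composition algorithm.
S_BASE = 0xAC00              # first precomposed syllable, U+AC00
L_COUNT = 19                 # leading consonants (choseong)
V_COUNT = 21                 # vowels (jungseong)
T_COUNT = 28                 # trailing consonants (jongseong); index 0 = none
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Combine 2 jamo (t_index == 0) or 3 jamo into one syllable block."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose(syllable: str) -> Tuple[int, int, int]:
    """Split a precomposed syllable back into (L, V, T) indices."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * N_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    return s_index // N_COUNT, (s_index % N_COUNT) // T_COUNT, s_index % T_COUNT


# HIEUH (L=18) + A (V=0) + NIEUN (T=4) gives U+D55C, the syllable "han".
print(compose(18, 0, 4), hex(ord(compose(18, 0, 4))))
```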

Output
Extension: WebFonts. CSS3-based web font technology lets us deliver the required font along with an HTML page. This is important because, for many non-Latin languages, we cannot assume that users have the required fonts installed, or that they know how to obtain and install them.
 * http://www.mediawiki.org/wiki/Extension:WebFonts
 * also see user documentation at https://www.mediawiki.org/wiki/Help:Extension:WebFonts
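The CSS3 mechanism behind this is the @font-face rule: the page declares a font family, points it at downloadable font files, and applies it to text in a given language. A sketch that generates such CSS, with a hypothetical `font_face_css` helper and hypothetical font file paths; it only illustrates the CSS and is not taken from the WebFonts extension:

```python
from pathlib import PurePosixPath
from typing import List

# File extension -> hint used in format() inside the src descriptor.
FORMATS = {".woff2": "woff2", ".woff": "woff", ".ttf": "truetype", ".eot": "embedded-opentype"}


def font_face_css(family: str, urls: List[str], lang: str) -> str:
    """Build an @font-face rule plus a :lang() rule that uses the font.

    Raises KeyError for font files whose extension is not in FORMATS.
    """
    sources = ",\n       ".join(
        f"url('{url}') format('{FORMATS[PurePosixPath(url).suffix.lower()]}')" for url in urls
    )
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: {sources};\n"
        "}\n"
        f":lang({lang}) {{ font-family: '{family}', sans-serif; }}\n"
    )


# Hypothetical paths: the browser downloads the first format it supports.
print(font_face_css("Lohit Devanagari",
                    ["fonts/LohitDevanagari.woff", "fonts/LohitDevanagari.ttf"], "hi"))
```

Listing several formats in one src descriptor lets each browser pick a file it can use, and the sans-serif fallback keeps text readable while the font is downloading or if it fails to load.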

Translation tools
Translate - for translation
 * http://www.mediawiki.org/wiki/Extension:Translate
 * also see https://www.mediawiki.org/wiki/Help:Extension:Translate

Babel
http://www.mediawiki.org/wiki/Extension:Babel We initially used this to indicate the languages a user knows. It now also tries to include data from the Unicode Common Locale Data Repository (CLDR).
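On a user page, Babel is given a list of language codes, each followed by a proficiency level, for example {{#babel:hi-N|en-4|fr-1}}. A minimal parsing sketch with a hypothetical `parse_babel` helper; it assumes every entry carries an explicit level from 0 to 5 or N (native), and it does not look anything up in CLDR:

```python
from typing import List, Tuple

# Proficiency levels assumed by this sketch: 0 to 5, plus N for native.
LEVELS = {"0", "1", "2", "3", "4", "5", "N"}


def parse_babel(args: str) -> List[Tuple[str, str]]:
    """Parse Babel arguments such as 'hi-N|en-4|pt-br-2' into (code, level) pairs."""
    result = []
    for item in args.split("|"):
        item = item.strip()
        # The level is whatever follows the last hyphen, so codes like pt-br keep theirs.
        code, sep, level = item.rpartition("-")
        if not sep or not code or level.upper() not in LEVELS:
            raise ValueError(f"expected 'code-level', got {item!r}")
        result.append((code.lower(), level.upper()))
    return result


print(parse_babel("hi-N|en-4|pt-br-2"))
```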

Exercise
Siebrand suggested that the exercise could be developing a key mapping for Narayam, per https://www.mediawiki.org/wiki/Extension:Narayam#Developing_a_key_mapping

Exercises/explorations from a past session of this workshop: One of Alolita's students was Korean, so she worked with him to show him which Korean fonts were available and how one would add them to WebFonts. She suggested that we could do the same with Translate. Then she and Jeremy wrote a simple unit test for one of the i18n extensions.

Possible hands-on exercise: something involving Narayam, taking a specific language as the example, and walking through WebFonts.

Step-by-step instructions for developing and adding a key mapping/scheme:
 * 1) Find a key mapping for the language you want to add. For many Indian scripts, an InScript key mapping is available.
 * 2) Add it to Narayam.php.
 * 3) In the array $wgNarayamSchemes, add your mapping in the same way as the existing ones. For a beta mapping:
 * 'xyz' => array( 'ext.narayam.rules.xyz', 'beta' ),
 * or, if you know the language well and have tested the mapping thoroughly:
 * 'xyz' => 'ext.narayam.rules.xyz',
 * 4) If there are already key mappings for the language, name yours 'ext.narayam.rules.xyz-name', where 'name' identifies the scheme, e.g. inscript.
 * 5) Register the interface message that will show up in the menu, in the 'messages
 * 6) If you have set it as a beta mapping and are testing it on your own wiki, make sure to set $wgNarayamUseBetaMapping to true.
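A key mapping is essentially a table of rules applied as the user types. The Python sketch below imitates that idea with a tiny, hypothetical Devanagari rule table: each rule is a regular expression matched against the end of the text typed so far, and the first matching rule rewrites that suffix, which is how a later keystroke (here 'a' or 'i') can revise the virama produced by an earlier one. The rule format, the rules and the helper names are illustrative assumptions, not Narayam's actual file format or a real InScript or transliteration scheme:

```python
import re

# Hypothetical rule table (not a real Narayam scheme). Each rule is a regex
# anchored at the end of the text typed so far, plus the text that replaces
# the matched suffix. Rules are tried in order and the first match wins,
# so the more specific virama rules must come first.
RULES = [
    (re.compile("\u094da$"), ""),        # virama + 'a': inherent vowel, drop the virama
    (re.compile("\u094di$"), "\u093f"),  # virama + 'i': replace with vowel sign I
    (re.compile("a$"), "\u0905"),        # 'a' on its own: independent vowel A
    (re.compile("i$"), "\u0907"),        # 'i' on its own: independent vowel I
    (re.compile("k$"), "\u0915\u094d"),  # 'k': KA + virama (dead consonant)
    (re.compile("m$"), "\u092e\u094d"),  # 'm': MA + virama
]


def press(text: str, key: str) -> str:
    """Apply one keystroke: append the key, then let the first matching rule rewrite the end."""
    candidate = text + key
    for pattern, replacement in RULES:
        if pattern.search(candidate):
            return pattern.sub(replacement, candidate, count=1)
    return candidate  # no rule for this key: keep it as typed


def type_keys(keys: str) -> str:
    """Feed a whole key sequence through press(), one key at a time."""
    text = ""
    for key in keys:
        text = press(text, key)
    return text


print(type_keys("kami"))  # कमि
```

Writing down expected input/output pairs for a mapping, as in the test above, is one practical way to exercise a beta mapping thoroughly before dropping the 'beta' flag.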

Antoine's exercise idea: i18n messages.
Basic API documentation is in the Message class docblock. Our documentation is regenerated daily from trunk at http://svn.wikimedia.org/doc/. The general documentation for the Message class is at https://svn.wikimedia.org/doc/classMessage.html#_details. Although it needs improvement, it is a good starting point for an overview of the messaging system.
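Message texts contain numbered placeholders ($1, $2, ...) that are filled in with parameters at runtime. The sketch below mimics only that substitution step, using a hypothetical `format_message` helper; the real Message class also handles constructs such as {{PLURAL:...}}, escaping and different output formats, none of which are modelled here:

```python
import re


def format_message(template: str, *params: str) -> str:
    """Replace $1, $2, ... in a message template with the given parameters."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1  # $1 refers to the first parameter
        if 0 <= index < len(params):
            return params[index]
        return match.group(0)  # leave placeholders without a parameter untouched

    return re.sub(r"\$(\d+)", substitute, template)


print(format_message("$1 edited $2", "Alice", "Main Page"))  # Alice edited Main Page
```

Matching the whole digit run with \d+ keeps $12 from being read as $1 followed by a literal 2.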

Presentation history
Alolita originally gave this on Jan 21, 2012 to Alolita Sharma, Chulki Lee and Shervin Afshar in San Francisco. All in all, it took about 75 minutes, including Q&A. To be presented in Pune on 10 Feb 2012.