Localisation/Tutorial

Internationali[sz]ation/locali[sz]ation (i18n/l10n) tutorial


 * For use in Pune Hackathon Feb 2012

First, we give a general introduction to the i18n problem and explain where we currently stand. We then go over each of the major extensions that the i18n team has developed, breaking down where functionality lives for input, output, and searching. Finally, we give the participants an exercise, probably "add a new keymapping to Narayam".

General principles and goals
How we do it:
 * We use the Unicode standard, encoded as UTF-8. (Wikipedia was one of the first major websites to adopt Unicode for everything.)
 * We are using other standards whenever possible: CLDR, ISO 639, etc.
 * We always use open source software and open source fonts.
 * We're building a software stack which is open source and reusable for the web.
 * We need to localise the MediaWiki experience for all the ways people edit and read Wikimedia content. So we have to support the desktop web and the mobile web (tablets, feature phones, smartphones), and we should also be ready for offline use (PDF, Kiwix) and print.
 * A technical overview of the different i18n/l10n capabilities of MediaWiki: http://www.mediawiki.org/wiki/Localisation
 * The concept of a message is central to MediaWiki. It matters not just for localisation work, but for understanding much of MediaWiki in general. See the Message class general documentation, available at https://svn.wikimedia.org/doc/classMessage.html#_details
 * The l10n tools team: http://www.mediawiki.org/wiki/Internationalization_and_localization_tools
 * The translation team: https://translatewiki.net
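
To make the idea of a message more concrete, here is a minimal JavaScript sketch. It is not the Message class; it only assumes that interface text is stored per language under a message key, with $1, $2 placeholders filled in at display time. The store, the key 'example-greeting', the function getMessage, the fallback to English, and the '<key>' marker for missing messages are all hypothetical choices made for illustration.

```javascript
// Hypothetical message store: language code -> message key -> text.
var messages = {
	en: { 'example-greeting': 'Hello, $1! You have $2 new messages.' },
	de: { 'example-greeting': 'Hallo, $1! Du hast $2 neue Nachrichten.' }
};

// Look up a message by key in the requested language, falling back to
// English (an assumption of this sketch), then substitute $1, $2, ...
// with the given parameters.
function getMessage( lang, key, params ) {
	var table = messages[ lang ] || {};
	var text = table[ key ] !== undefined ? table[ key ] : messages.en[ key ];
	params = params || [];
	if ( text === undefined ) {
		// Unknown key: return a visible marker instead of failing.
		return '<' + key + '>';
	}
	return text.replace( /\$(\d+)/g, function ( match, n ) {
		var value = params[ Number( n ) - 1 ];
		// Leave the placeholder untouched if no parameter was supplied.
		return value === undefined ? match : String( value );
	} );
}

console.log( getMessage( 'de', 'example-greeting', [ 'Anna', 3 ] ) );
```

The point of the sketch is that code refers only to the key; the text itself lives in per-language data that translators can change without touching the code.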

Localising
Now we'll show you Wikimedia localisation from a tools perspective and from a user perspective, including overviews of the major extensions.

Input
(open source keymaps for different languages)

Extension: Narayam -- http://www.mediawiki.org/wiki/Extension:Narayam
 * also see https://www.mediawiki.org/wiki/Help:Extension:Narayam
 * If typing in a language is not well supported in common operating systems, we try to provide an input method right in the website.
 * For the key mapping we usually choose the standard keyboard for a language, but if there is demand, we find other keymaps that are in use and add support for those too. The approach is to identify the three most-used keymaps and then map each to an in-browser (on-screen) keymap.
 * Currently we have: 14 Indic languages (Lohit family), Hebrew, Arabic (Urdu), Cyrillic (minority language of Russia), some European languages, Berber and others.

Future directions / external ideas:
 * (Comment from Amir: it doesn't really belong in the tutorial, it's probably the meeting summary. I kept the links as bookmarks.)
 * Arabic script (for Persian and Arabic) on-screen keyboard: http://behdad.org/editor/
 * Korean (Hangul): the script is phonetic, and fonts render each syllable as one glyph built from 2 or 3 characters.
 * Automaton for mapping an English keymap to Korean, implemented in JavaScript: http://ohi.kr/
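
As an illustration of how 2 or 3 Hangul characters become one glyph, here is a minimal sketch of the standard Unicode arithmetic for composing a precomposed syllable from a leading consonant, a vowel, and an optional final consonant. This is not how ohi.kr is implemented; the function name composeSyllable is hypothetical, and a real input method additionally needs an automaton to decide where one syllable ends and the next begins.

```javascript
// Unicode constants for precomposed Hangul syllables.
var SYLLABLE_BASE = 0xAC00; // code point of the first syllable (U+AC00)
var VOWEL_COUNT = 21;       // number of medial vowels
var TAIL_COUNT = 28;        // number of final consonants, including "none"

// Jamo in Unicode order; the index of a jamo in its list is its
// value in the composition formula below.
var LEADS = 'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ';
var VOWELS = 'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ';
var TAILS = [ '', 'ㄱ', 'ㄲ', 'ㄳ', 'ㄴ', 'ㄵ', 'ㄶ', 'ㄷ', 'ㄹ', 'ㄺ',
	'ㄻ', 'ㄼ', 'ㄽ', 'ㄾ', 'ㄿ', 'ㅀ', 'ㅁ', 'ㅂ', 'ㅄ', 'ㅅ',
	'ㅆ', 'ㅇ', 'ㅈ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ' ];

// Combine a leading consonant, a vowel, and an optional final consonant
// into one precomposed syllable. Returns null for invalid input.
function composeSyllable( lead, vowel, tail ) {
	var l = lead ? LEADS.indexOf( lead ) : -1;
	var v = vowel ? VOWELS.indexOf( vowel ) : -1;
	var t = TAILS.indexOf( tail || '' );
	if ( l < 0 || v < 0 || t < 0 ) {
		return null;
	}
	return String.fromCharCode(
		SYLLABLE_BASE + ( l * VOWEL_COUNT + v ) * TAIL_COUNT + t
	);
}

console.log( composeSyllable( 'ㅎ', 'ㅏ', 'ㄴ' ) + composeSyllable( 'ㄱ', 'ㅡ', 'ㄹ' ) );
```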

Output
Extension: WebFonts
This CSS3-based technology allows us to deliver the required font along with an HTML page. This is important because for many non-Latin languages we cannot assume that users have the required fonts installed, or that they know how to get and install them.
 * http://www.mediawiki.org/wiki/Extension:WebFonts
 * also see user documentation at https://www.mediawiki.org/wiki/Help:Extension:WebFonts
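
The CSS3 mechanism behind font delivery is the @font-face rule. The following JavaScript sketch builds such a rule as a string; it is an illustration of the technique, not code from the WebFonts extension. The function name buildFontFaceRule, the font family 'ExampleFont', and the font URLs are hypothetical.

```javascript
// Build a CSS3 @font-face rule as a string.
// family:       the name that stylesheets will use in font-family
// urlsByFormat: map of font format name -> URL of the font file
function buildFontFaceRule( family, urlsByFormat ) {
	var sources = Object.keys( urlsByFormat ).map( function ( format ) {
		return "url('" + urlsByFormat[ format ] + "') format('" + format + "')";
	} );
	return '@font-face {\n' +
		"\tfont-family: '" + family + "';\n" +
		'\tsrc: ' + sources.join( ', ' ) + ';\n' +
		'}';
}

// Hypothetical font name and file locations.
var exampleRule = buildFontFaceRule( 'ExampleFont', {
	woff: '/fonts/example.woff',
	truetype: '/fonts/example.ttf'
} );
console.log( exampleRule );
```

In a browser, a rule like this could be placed in a style element, after which any element styled with font-family: 'ExampleFont' makes the browser download the font as needed.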

Searching
https://www.mediawiki.org/wiki/Extension:Lucene-search is often used, as is https://www.mediawiki.org/wiki/Extension:MWSearch (here's the category: https://www.mediawiki.org/wiki/Category:Search_extensions )

Translation tools
Translate - for translation
If you want to help translate MediaWiki and its extensions, see https://translatewiki.net for more documentation and our TODO lists: http://translatewiki.net/wiki/Issues_and_features
 * http://www.mediawiki.org/wiki/Extension:Translate
 * also see https://www.mediawiki.org/wiki/Help:Extension:Translate

Babel
See http://www.mediawiki.org/wiki/Extension:Babel. We initially used this to indicate the languages that the user knows. Now it also tries to include CLDR data from the Unicode Common Locale Data Repository.

Exercises
Developing a key mapping for Narayam, per https://www.mediawiki.org/wiki/Extension:Narayam#Developing_a_key_mapping

Step-by-step instructions on how someone develops and adds a key mapping/scheme:
 * 1) Find a key mapping for the language you want to add. This is a keyboard layout or a list of Latin characters with their equivalents in the script of your language. For a lot of Indian scripts, an InScript key mapping is available. See for example these pages: https://fedoraproject.org/wiki/Special:PrefixIndex/I18N/Indic
 * 2) Add it to Narayam.php:
 * 3) In the array $wgNarayamSchemes, add your mapping similarly to the other ones, in one of two forms:
 * 'xyz' => array( 'ext.narayam.rules.xyz', 'beta' ), while the mapping is still in beta
 * 'xyz' => 'ext.narayam.rules.xyz', once you know the language well and have tested the mapping thoroughly
 * 4) If there are existing key mappings for the language, call it 'ext.narayam.rules.xyz-name', where 'name' is e.g. inscript or phonetic
 * 5) Register the interface message that will show up in the menu, in the 'messages' array in $wgResourceModules['ext.narayam.core']. It's best to use the same message key as the mapping name, 'narayam-xyz' or 'narayam-xyz-name'.
 * You can add the message key in Narayam.i18n.php
 * 6) Now add a $wgResourceModules array, similarly to the other arrays
 * 7) If you have set it as a beta mapping and are testing it on your own wiki, make sure to set $wgNarayamUseBetaMapping to true
 * 8) Now create the file resources/ext.narayam.rules.xyz.js (depending on the name, of course)
 * 9) If you have a key mapping, you can paste it there and convert it to a JavaScript array. At the bottom of the file you need to add a call to jQuery.narayam.addScheme. Both are explained on https://www.mediawiki.org/wiki/Extension:Narayam#Developing_a_key_mapping
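
As a rough sketch of what the rules file from the last two steps might contain, here is a toy mapping for the placeholder language code 'xyz', using a few Latin-to-Cyrillic rules. The rule shape (input pattern, key-buffer context, replacement) and the fields of the object passed to jQuery.narayam.addScheme are assumptions and should be checked against the page linked above and the existing rules files. The helper simulateTyping is hypothetical; it exists only so the rules can be tried outside a wiki.

```javascript
// Toy rules for the placeholder scheme 'xyz'.
// Assumed rule shape: [ input pattern, key-buffer context, replacement ].
// The pattern is matched against the end of the text typed so far.
var rules = [
	[ '\u0441h', '', '\u0448' ], // Cyrillic "s" followed by Latin "h" -> "sh" letter
	[ 's', '', '\u0441' ],       // Latin "s" -> Cyrillic "s"
	[ 'h', '', '\u0445' ],       // Latin "h" -> Cyrillic "kh"
	[ 'a', '', '\u0430' ]        // Latin "a" -> Cyrillic "a"
];

// Hypothetical helper: apply the rules one keystroke at a time.
// It ignores the key-buffer context (rule[ 1 ]) for simplicity.
function simulateTyping( keys, ruleList ) {
	var text = '';
	keys.split( '' ).forEach( function ( key ) {
		text += key;
		for ( var i = 0; i < ruleList.length; i++ ) {
			var pattern = new RegExp( ruleList[ i ][ 0 ] + '$' );
			if ( pattern.test( text ) ) {
				text = text.replace( pattern, ruleList[ i ][ 2 ] );
				break; // only the first matching rule applies
			}
		}
	} );
	return text;
}

// Registration, guarded so the file also runs outside MediaWiki.
// The field names below are assumptions, not confirmed API.
if ( typeof jQuery !== 'undefined' && jQuery.narayam ) {
	jQuery.narayam.addScheme( 'xyz', {
		namemsg: 'narayam-xyz',
		extended_keyboard: false,
		lookbackLength: 1,
		keyBufferLength: 0,
		rules: rules
	} );
}

console.log( simulateTyping( 'sha', rules ) );
```

The order of the rules matters in this sketch: the two-character rule comes first so that "s" followed by "h" produces a single letter rather than two separate ones.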

WebFonts for a specific language
Idea: something involving looking at Narayam, taking the example of a specific language, and walking through WebFonts and/or Translate. Example: look at what Korean fonts are available, then show how you would add them to WebFonts and Translate.

Antoine's exercise idea: i18n messages
Basic API documentation is in the Message class docblock. Our documentation is regenerated daily from trunk at http://svn.wikimedia.org/doc/. The Message class general documentation is available at https://svn.wikimedia.org/doc/classMessage.html#_details. Though it needs improvement, it is a good starting point for an overview of the messaging system.

That is reference documentation; another idea is to write some documentation at https://www.mediawiki.org/wiki/WfMessage%28%29 that helps the reader understand how to use this class to solve various problems.

This is a less likely exercise because it's ill-defined and because it doesn't directly help the student learn how to use important i18n tools.

Unit test for an i18n extension
Follow https://www.mediawiki.org/wiki/Manual:Unit_testing and write a unit test for an i18n extension. This is a less likely exercise because it requires that the student additionally learn how to use Selenium and the workshop may not have time for that.

Presentation history

 * Alolita originally gave this on Jan 21, 2012 to Alolita Sharma, Chulki Lee, Shervin Afshar in San Francisco. All in all, this took about 75 minutes, including Q&A.


 * To be presented in Pune on 10 Feb 2012. Edited by Siebrand, Sumana, SPQRobin, Antoine, and Niklas.