Wikimedia Language engineering/Pune LanguageSummit November 2013/Event Report

The Fall 2013 edition of the Open Source Language Summit was held in Pune, India on 18-19 November 2013. The event was organized by the Wikimedia Foundation’s Language Engineering team along with Red Hat at the Red Hat engineering center.

Participation
The Wikimedia Language Engineering, VisualEditor and Mobile teams took part in the work sprints at the 2-day summit, alongside language technology team members from Red Hat, Google, Microsoft Research, Adobe and Mifos. They were joined by developers from open source communities including Swathanthra Malayalam Computing, Ankur India, IndLinux, Fedora and Debian, by Wikipedians from various Indic language communities, and by Google Summer of Code students.

Sessions
During the 2 days of the event, collaborative work-sessions were conducted on improvements to cross-platform language support, desktop and web fonts, input methods, on-screen keyboards, content translation, and language aids such as dictionaries and glossaries. Methods and tools for testing internationalized web applications were also discussed. Extensive hands-on sessions were held to extend the FUEL terminology word-lists.

Session Details: Fonts
Sessions held included:
 * Indic Font Specification
 * Autonym Font
 * Packaging Fonts
 * Updating Lohit2 fonts to conform with the new OpenType spec for Indic scripts
 * Q & A with Behdad Esfahbod: State of the Union on Harfbuzz

Several work-sessions focused on improving coverage of available fonts across desktop, web and mobile platforms. The latest freely licensed Aksharyogini font for Devanagari was presented and technical improvements were discussed by participating font experts. Santhosh Thottingal presented the recently released Autonym font, which was created by the Wikimedia Language Engineering team to simplify the display of language names on Wikimedia websites. During the sessions, webfonts not available in Linux distributions like Fedora and Debian were identified and submitted for packaging. This would significantly improve native font support and complement webfonts for multilingual web content on Wikipedia pages. Kartik Mistry, Pravin Satpute, Vasudev Kamath and other participants will be following up on the bugs filed during the sessions.

At the Language Summit held earlier this year, Santhosh Thottingal and Pravin Satpute had initiated a project to document the technical specifications of fonts for India’s language scripts. The project, named Fontbook, is based on the OpenType font specification. The specification consists of sections common to all the scripts as well as sections specific to each script. These sections were expanded, and recent recommendations from organisations like W3C and TDIL were discussed for inclusion. The project has been moved to a public repository on GitHub, and participants from more Indian languages are being invited to contribute. Over the next few months, the specification will be extended to at least 8 Indian languages.

Pravin Satpute and Kartik Mistry led the work-sessions on applying the technical specifications of the Lohit font family to other Indian language fonts such as Samyak. The Lohit font family, used as the primary font for a large number of Indian language scripts in Fedora and Red Hat Enterprise Linux, has been significantly tweaked over the years to render the complex Indian scripts consistently across platforms. During the Language Summit, Pravin Satpute and Sneha Kore presented their work on the next version of the Lohit font family, which will comply with the latest OpenType 1.6 specification and with the Harfbuzz-ng rendering engine, making the modifications applied in previous versions redundant. This effort is expected to complement the extended specification being developed through the Fontbook project.

Behdad Esfahbod, lead developer of the Harfbuzz project, joined remotely and presented the current state of font rendering and support on Chrome, FirefoxOS and Android. Esfahbod envisions better support for webfonts and discussed cross-platform testing practices for Indian scripts, especially for large volumes of content such as Wikipedia pages.

Session Details: Input Methods & Onscreen Keyboards
Session List:
 * Input Methods on VisualEditor (includes jQuery.ime integration)
 * Keyboard layout Images for documentation of input methods
 * Onscreen Keyboards
 * ibus-typing-booster - predictive text typing system

Work-sessions on input methods focused on aspects such as onscreen keyboards, predictive typing, and input method help. A separate session focused on improving input in the VisualEditor for non-Latin scripts, drawing on feedback from the implementation so far.

Interaction Designer Pau Giner and Google Summer of Code student Praveen Singh hosted a 6-8-5 ideation session to gather ideas for making on-screen keyboards more useful. Suggestions from the audience included a transliteration typing tool and the addition of context menus to enhance spell checkers and dictionaries. Earlier, Praveen Singh showcased on-screen keyboards for the jquery.ime library, work that began as a Google Summer of Code project earlier this year and was mentored by Santhosh Thottingal. Ideas gathered during the ideation session are expected to guide new features for the jquery.ime on-screen keyboards currently in development.

Parag Nemade led the session to create images of input method layouts using a script written in Python. The script currently works only for layouts that follow a one-to-one character mapping of keys. During the session, participants identified input methods in the ibus and jquery.ime libraries that lack layout images and can be mapped through the script, and created images for them. Discussions during the retrospective revealed that users currently have no quick way to provide feedback about their experience with input methods. Over the next few weeks, more help images will be created and the Python script will be extended to cover input methods that do not use 1:1 key mappings.
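The one-to-one restriction described above can be illustrated with a minimal sketch. This is not the script used at the session: the function names, the QWERTY row definitions and the sample mapping are all hypothetical, and the output is a plain-text diagram rather than an image. It only shows why a layout in which each key yields exactly one character is straightforward to draw.

```python
# Hypothetical sketch, not the script used at the summit.
# Physical key rows of a QWERTY keyboard (letters only, for brevity).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def is_one_to_one(mapping):
    """True if every key yields exactly one code point and no two keys collide."""
    outputs = list(mapping.values())
    single = all(len(out) == 1 for out in outputs)
    return single and len(set(outputs)) == len(outputs)


def render_layout(mapping, rows=QWERTY_ROWS):
    """Return a text diagram with one 'key:character' cell per physical key."""
    if not is_one_to_one(mapping):
        raise ValueError("layout is not a one-to-one key mapping")
    lines = []
    for indent, row in enumerate(rows):
        # Keys without a mapped character are shown with a blank.
        cells = ["{}:{}".format(key, mapping.get(key, " ")) for key in row]
        lines.append(" " * (2 * indent) + " ".join(cells))
    return "\n".join(lines)


# A few keys of a made-up Devanagari layout, for illustration only.
SAMPLE = {"k": "क", "g": "ग", "m": "म", "n": "न"}

if __name__ == "__main__":
    print(render_layout(SAMPLE))
```

A layout in which a key produces a conjunct or a multi-key sequence produces a character fails the `is_one_to_one` check here, which mirrors the kind of input method the text says still needs to be supported.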

Anish Patil, from the Red Hat internationalization (i18n) team, showcased the indic-typing-booster, a predictive typing method developed and maintained by the i18n team. Several bugs were identified during the tool walkthrough. Anish also walked through the web-word-edit project, through which a list of words can be curated and validated for use in systems that rely heavily on suggestions from large word lists. The indic-typing-booster currently uses the word-lists of the Hunspell dictionaries, and web-word-edit is an effort to improve its typing predictions. The project is available at http://webwordedit-wwe.rhcloud.com/ and is open for participation.
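To make the dependence on word lists concrete, here is a minimal sketch of prefix-based prediction over a curated list. It is an assumption-laden illustration, not the algorithm used by the indic-typing-booster: the class name `PrefixPredictor`, the frequency-based ranking and the sample words are all hypothetical.

```python
import bisect
from collections import Counter


class PrefixPredictor:
    """Hypothetical word-list predictor: suggests words starting with a prefix."""

    def __init__(self, words):
        # How often each word occurs in the curated list.
        self.counts = Counter(words)
        # Sorted vocabulary, so words sharing a prefix sit next to each other.
        self.vocab = sorted(self.counts)

    def suggest(self, prefix, limit=3):
        """Return up to `limit` matches, most frequent first, ties alphabetical."""
        start = bisect.bisect_left(self.vocab, prefix)
        matches = []
        for word in self.vocab[start:]:
            if not word.startswith(prefix):
                break
            matches.append(word)
        matches.sort(key=lambda w: (-self.counts[w], w))
        return matches[:limit]


if __name__ == "__main__":
    predictor = PrefixPredictor(["नमस्ते", "नमक", "नमस्ते", "नगर", "कमल"])
    print(predictor.suggest("नम"))
```

In a sketch like this, the quality of suggestions depends entirely on which words are in the list and how they are weighted, which is the gap a curated and validated word list is meant to address.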

The session led by David Chan and Santhosh Thottingal on enabling more input methods in the VisualEditor was one of the highlights of the first day. The session brought together engineers from the Wikimedia Language Engineering and VisualEditor teams. Indian language Wikipedians, who have been providing significant feedback since the VisualEditor was enabled on Wikimedia websites, also contributed. David demonstrated the event logger system built for capturing IME input events. Available at http://tinyurl.com/imelog, it is being used as an automated IME testing framework to build a library of similar events across IMEs, operating systems and languages. Santhosh stepped through several complexities of handling input in the VisualEditor, which needs to provide its own, non-native handling of language content blocks within the contentEditable surface; these are difficult use cases. He also walked through how jQuery.ime can support VisualEditor’s needs, as it does not operate differently for each script, operating system or browser. This was followed by a brainstorming and ideation discussion that explored the possibilities of using onscreen keyboards, predictive typing, handwriting recognition, dictionary-based auto-completion and special support features for Indic scripts in the VisualEditor. Issues surrounding the use of Indic languages, such as bilingual use, were also highlighted. David made a call for more participation through the IME log form to collect special use cases, and will also gather more insight by learning from the ibus system with Parag Nemade’s help.
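One way a library of logged input events could serve as a test resource is by replaying each log and comparing the resulting text across environments. The sketch below assumes a simplified log format of `(event_type, data)` pairs using the DOM event names `compositionstart`, `compositionupdate`, `compositionend` and `keypress`. The actual format recorded by the IME log tool is not described in this report, so the function and the sample logs are purely illustrative.

```python
def replay(events):
    """Rebuild the text produced by a sequence of (event_type, data) pairs.

    Hypothetical log format: composition events carry the in-progress or
    final composed string, and keypress events carry one typed character.
    """
    committed = ""   # text already committed to the document
    composing = ""   # text still being composed by the IME
    for kind, data in events:
        if kind == "compositionstart":
            composing = ""
        elif kind == "compositionupdate":
            composing = data
        elif kind == "compositionend":
            committed += data
            composing = ""
        elif kind == "keypress":
            committed += data
        else:
            raise ValueError("unknown event type: {}".format(kind))
    return committed + composing


# Two made-up logs for the same intended input, from different environments.
LOG_WITH_COMPOSITION = [
    ("compositionstart", ""),
    ("compositionupdate", "n"),
    ("compositionupdate", "न"),
    ("compositionend", "न"),
    ("keypress", " "),
]
LOG_DIRECT = [("keypress", "न"), ("keypress", " ")]

if __name__ == "__main__":
    print(replay(LOG_WITH_COMPOSITION) == replay(LOG_DIRECT))
```

The two sample logs deliver events in different shapes but should yield the same text, which is the kind of cross-IME, cross-platform comparison a shared event library makes possible.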