Wikimedia Language engineering/Pune LanguageSummit November 2013/Event Notes

= Open Source Language Summit - November 2013 =


 * Schedule: http://open-source-language-summit-2013.shdlr.com/grid
 * Twitter: Hashtag #languagesummitpune
 * IRC: #mediawiki-i18n on FreeNode

Session: Input Methods on VisualEditor (includes jQuery.ime integration)
Notes 
 * David Chan leading; sets off introductions from everyone round-table
 * Santhosh introduced jQuery.IME and explained what it is for, why it was built
 * David outlined how bug filing helps: reports should give very specific version numbers, the exact keystrokes used to fire the IME, and the expected and observed behaviours. He also described the problems facing comprehensive IME support
 * David demonstrated the EventLogger system capturing IME input event streams, giving a detailed run-through of several IMEs and the events that they can create
 * David showed the draft automated IME testing framework he has built for VisualEditor and explained his intention to build a library of as many languages, IMEs and OSes as possible to test them.
 * Santhosh discussed how jQuery.IME can help simplify the needs in VisualEditor because it doesn't operate in a different way in each script/browser/OS
 * Santhosh demonstrated problems like multiple different conflicting numbers (e.g. cursor positions vs. key strokes vs. Unicode code points vs. backspace positions)
 * Santhosh returned to the reasons why IME difficulties are an issue for VisualEditor, due to the need to do non-native programmatic management of the contentEditable surface to support generated content blocks like images or templates
 * Pau asked about the relative value of on-screen keyboards, predictive typing, spell-checking, handwriting recognition etc.
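
The "conflicting numbers" problem Santhosh demonstrated can be illustrated with a small sketch. It is not taken from the session: the sample syllable (Devanagari क्षि) and the helper names `code_points` and `counts` are assumptions chosen for illustration. It shows that what a reader perceives as one syllable yields several different counts depending on what is measured.

```python
import unicodedata

# Sample syllable chosen for illustration (not from the session notes): क्षि
syllable = "\u0915\u094D\u0937\u093F"


def code_points(text):
    """Return (U+XXXX, Unicode name) pairs for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def counts(text):
    """Return the different 'lengths' that tools may report for the same text."""
    return {
        "code_points": len(text),
        # UTF-16 code units are what JavaScript string offsets count.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    for cp, name in code_points(syllable):
        print(cp, name)
    print(counts(syllable))
```

None of these counts equals the number of keystrokes, which depends on the IME in use, and a single backspace may remove one code point or a whole cluster depending on the platform. This is one reason bug reports need the exact keystrokes and the observed result.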

General discussion about possibilities and requirements from Indic scripts
 * Particular requests for VisualEditor
 * Support for native IMEs – especially for users with Windows as their OS
 * In-built IME in VE (e.g. expectations of auto-convert on space/save)
 * Auto-completion based on dictionaries

Volunteer language experts for Indic languages
 * Samyak Bhuta, for Gujarati, samyak.bhuta @ gmail dot com
 * Vijay, for Marathi, Hindi, Sanskrit, Nepali, Ahirani, mahitgar at yahoo dot co dot in

Retrospective: (David)
 * Good mix of participants (technical and non-technical Wikipedians, OSS contributors)
 * Brainstormed about handling complexities of input tools for Indic languages, trapping keystrokes, event ordering, DOM model, event logger tool
 * Log submission now available, please contribute! URL: http://tinyurl.com/imelogform
 * URL: https://bit.ly/ve-eventlogger, https://bit.ly/ve-imefeedback
 * Submissions for Indic language IMEs are especially welcome
 * OSKs vs Latin keyboards advantages/disadvantages
 * Learnt a lot about Indic languages - bilingual usage, code switching, switching across languages, issues around IME usage
 * Santhosh - identifying the problem definitions, patches in progress
 * Abhijit - highlighted cross-browser, cross-platform differences; working with original core developers
 * David - in developing for ibus - why are event sequences different? may not be possible for languages (HPN)
 * OSK - T9 input optimized for mobile usage - standardized (Hari)

Session: Cross project coverage for basic language support components
Notes:
 * Showcasing the Language Coverage Dashboard
 * http://tools.wmflabs.org/lcm-dashboard/lcmd/
 * https://github.com/wikimedia/lcm-dashboard


 * Desktop Support Requirements: (Pravin Satpute talking about the Fedora world)
 * Character Encoding
 * Fonts
 * Shaping Engines
 * Input Methods
 * OS Level Support
 * Locale Definition (CLDR)
 * Minimum Criteria for Language Support
 * If an ISO code does not exist, the language cannot be used on the desktop
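
The requirements above read naturally as a checklist. The following is a minimal sketch of such a check, assuming a hypothetical dictionary per language with one boolean per component; the key names and the functions `missing_components` and `is_usable_on_desktop` are illustrative and are not part of the Language Coverage Dashboard or of Fedora tooling.

```python
# Components taken from the "Desktop Support Requirements" list in these notes.
# The key names are hypothetical.
REQUIRED_COMPONENTS = (
    "iso_code",
    "character_encoding",
    "fonts",
    "shaping_engine",
    "input_method",
    "os_level_support",
    "locale_definition",
)


def missing_components(support):
    """Return the required components a language lacks, in checklist order."""
    return [name for name in REQUIRED_COMPONENTS if not support.get(name, False)]


def is_usable_on_desktop(support):
    """A language needs an ISO code first, then every other component."""
    return bool(support.get("iso_code")) and not missing_components(support)


if __name__ == "__main__":
    # Made-up example data for a language that has fonts but no input method.
    example = {name: True for name in REQUIRED_COMPONENTS}
    example["input_method"] = False
    print(missing_components(example))
    print(is_usable_on_desktop(example))
```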


 * Desktop Enhancements
 * Plan to check the language coverage in WMF projects for languages with a standardized ISO code and assess coverage for desktop language support

Retrospective (Runa)
 * Overview of what the GSoC team developed, features developed and demoed, plans for future visualizations and features
 * Fedora desktop support features as use case for LCMD (ISO-less languages are not handled for desktop)
 * Extending for Fedora desktop
 * Suggestion from Hari Nadig - data from LCMD can be used through a MediaWiki extension for Indic language Wiki projects to show some stats for Indic language projects (which is being developed)
 * Will evaluate for Fedora desktop and implement (next step)
 * Is there an option to contribute instead of forking CLDR?
 * CLDR - contributing to it instead of forking - experts should review

FUEL color module:
http://etherpad.wikimedia.org/p/osls-2013-fuel-colors

FUEL date and time module:
https://etherpad.wikimedia.org/p/osls-2013-fuel-date-and-time

FUEL number module:
https://etherpad.wikimedia.org/p/osls-2013-fuel-number

During today's Language Summit (18th November, 2013), we discussed the existing FUEL-colors module. It was observed that the current module is not very definitive, and we came up with the following points:
 * we will follow the list of colors given in http://www.w3.org/TR/css3-color/
 * we will be creating two modules, fuel-colors-basic, fuel-colors-extended.
 * fuel-colors-basic: http://www.w3.org/TR/css3-color/#html4
 * fuel-colors-extended: http://www.w3.org/TR/css3-color/#svg-color
 * This is just a proposal. If you have any issues or suggestion let us discuss here.
 * We will be closing this discussion probably by 30th of November and of course we can extend this date, if the discussion is prolonged.
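
As a sketch of what the proposed fuel-colors-basic module could contain, the snippet below lists the 16 basic color keywords from the CSS3 Color section linked above and builds an empty translation template from them. The template layout (`source`, `hex`, `translation`) and the function name `translation_template` are assumptions for illustration; the notes do not specify a file format for FUEL modules.

```python
# The 16 basic color keywords and their hex values from the CSS3 Color
# specification (section "#html4").
BASIC_COLORS = {
    "black": "#000000",
    "silver": "#C0C0C0",
    "gray": "#808080",
    "white": "#FFFFFF",
    "maroon": "#800000",
    "red": "#FF0000",
    "purple": "#800080",
    "fuchsia": "#FF00FF",
    "green": "#008000",
    "lime": "#00FF00",
    "olive": "#808000",
    "yellow": "#FFFF00",
    "navy": "#000080",
    "blue": "#0000FF",
    "teal": "#008080",
    "aqua": "#00FFFF",
}


def translation_template(colors):
    """Build one empty entry per English color name, sorted by name.

    The entry layout is hypothetical; translators would fill in 'translation'.
    """
    return [
        {"source": name, "hex": value, "translation": ""}
        for name, value in sorted(colors.items())
    ]


if __name__ == "__main__":
    for entry in translation_template(BASIC_COLORS):
        print(entry["source"], entry["hex"])
```

Keeping the hex value next to each name gives translators an unambiguous reference for the color being named, which fits the point that names should be localized rather than re-invented.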

Retrospective: (Siebrand) (on all 3 sessions on FUEL through the day)
 * What is FUEL - Siebrand provided a short blurb on what FUEL is
 * Objective to make localizations more consistent
 * There are 3 collections currently and 3 in progress (color, number, date/time)
 * Colors discussion:
 * 250 colors taken from Wikipedia categories were reviewed, (xkcd ref: 15 instead of 14 standards - yet another standard?)
 * Instead of trying to create a new collection, reusing is better - looked at W3C CSS standard - 131 colors
 * Cultural bias in defining collection colors? Do we need to remove this cultural bias?
 * Or get the standard changed?
 * Name of the color should be localized not re-invented


 * DTTM discussion:
 * Interesting discussion, CLDR has a few flaws - have to pay money to vote on what goes into the collection
 * Paid members choose from contributions
 * FUEL strategy is to create a new standard; inconclusive discussions
 * Few options on the table:
 * Fork CLDR
 * Work with CLDR and find ways to collaborate
 * Create a competing standard
 * Will be discussed on mailing list - progress - as to what to do next and how; Siebrand will send this email


 * Number discussion:
 * List was created with 1-100, ordinals
 * Part of this collection was out of scope for FUEL
 * Having translators localize numbers may not be useful
 * Ordinals can be used as adjectives, so this is not as easy as it looks
 * Rajesh will send an email to the FUEL mailing list and then decide how to fulfill those functional requirements

Session Name: Keyboard layout Images for documentation of input methods
Notes
 * Latest inscript2 keymap images are captured and saved at
 * WMF input method documentation
 * Fedora Keyboard Help documentation


 * Languages that Need Help Documents:
 * Image Generation script: Python script that takes keymap filename input and shows mappings in UI.
 * This works only for 1:1 mapping keymaps
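
The script itself is not reproduced in these notes. As a rough sketch of the idea, assuming a hypothetical plain-text keymap format with one `key character` pair per line (the real keymap file format is not described here), a 1:1 keymap can be parsed and laid out on QWERTY rows as follows. The function names and the two sample entries are illustrative only.

```python
# Physical key rows of a QWERTY keyboard (unshifted keys only).
QWERTY_ROWS = ["1234567890-=", "qwertyuiop[]", "asdfghjkl;'", "zxcvbnm,./"]


def parse_keymap(text):
    """Parse 'key character' lines into a dict; '#' lines are comments.

    This format is an assumption for illustration, not the real keymap format.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, char = line.split(None, 1)
        if len(key) != 1:
            # Only 1:1 mappings (one key to one output) are supported.
            raise ValueError(f"not a 1:1 mapping: {line!r}")
        mapping[key] = char
    return mapping


def render_layout(mapping, placeholder="."):
    """Return a text layout with one 'key:character' cell per physical key."""
    rows = []
    for row in QWERTY_ROWS:
        rows.append(" ".join(f"{key}:{mapping.get(key, placeholder)}" for key in row))
    return "\n".join(rows)


if __name__ == "__main__":
    # Illustrative sample entries, not a complete or verified keymap.
    sample = "# sample keymap\nk \u0915\nj \u0930\n"
    print(render_layout(parse_keymap(sample)))
```

Keymaps in which a sequence of keys produces one character, or one key produces several, need a different representation, which matches the 1:1 limitation noted above.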

Retrospective notes: (Parag)
 * Documentation done but to be uploaded
 * TWN - feedback can be provided through Ask a Question?
 * WMF - wikis - there is a huge problem directing these user questions, no centralized system to process comments from users (Siebrand)
 * We do not have a specific feedback method at the moment other than the talk page. (Pau)

Session Name: Leveraging content translation platforms for Indic languages
Notes
 * Microsoft Research
 * Translation platform demo
 * Discussion on various content translation components in MSFT and Google
 * Web based data is key to training MT engines

Session Name: Updating Lohit2 fonts to conform with the new Open Type spec for Indic scripts
Notes
 * Presentation on Idea behind lohit2 (http://pravin-s.blogspot.in/2013/08/project-creating-standard-and-reusable.html)
 * In-depth discussion on Adobe glyph naming guidelines and problems
 * Demonstration on Kannada Work done by Aravinda (https://aravindavk.in/blog/improving-kannada-fonts/)
 * Sneha presented on Process followed for Lohit2 Devanagari, Gujarati
 * Santhosh presented on the GSoC automated testing project.

Retrospective (Sneha)
 * Session on Lohit 2 improvements
 * Adobe glyph list - clarifying doubts
 * Aravinda - talked about Kannada block - script specializations
 * Sneha - Walkthrough of development process for Lohit
 * Santhosh - walked through automated testing process

Session Name: Packaging fonts
Notes
 * Fonts available in Debian
 * Fonts available in Fedora
 * Packaging as many fonts as possible in Debian, Fedora and other distributions so that they are not loaded as 'webfonts' (61 fonts in repository) when a user is accessing Wikipedia pages.
 * Debian/Fedora
 * Compare fonts in ULS, Debian and Fedora (see links above).
 * Package missing fonts for Debian/Fedora.
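
The comparison step amounts to a set difference per distribution. A minimal sketch, assuming the font lists have already been collected as plain lists of names (the function names and the sample data below are made up for illustration and do not reflect the actual state of any repository):

```python
def normalize(name):
    """Normalize a font name so that case and spacing differences do not matter."""
    return name.strip().lower().replace(" ", "-")


def missing_fonts(uls_fonts, distro_fonts):
    """Return the ULS fonts that are not packaged in a distribution, sorted."""
    packaged = {normalize(name) for name in distro_fonts}
    return sorted({normalize(name) for name in uls_fonts} - packaged)


def packaging_report(uls_fonts, distros):
    """Map each distribution name to the list of ULS fonts it is missing."""
    return {distro: missing_fonts(uls_fonts, fonts) for distro, fonts in distros.items()}


if __name__ == "__main__":
    # Hypothetical font names, for illustration only.
    uls = ["Font A", "font-b", "Font C"]
    distros = {"debian": ["font-a"], "fedora": ["Font A", "Font B"]}
    for distro, missing in packaging_report(uls, distros).items():
        print(distro, missing)
```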


 * ULS
 * Write automated 'New upstream' check for ULS.
 * Update to new upstreams: https://gerrit.wikimedia.org/r/#/c/96008/


 * Fedora bugs filed:
 * https://bugzilla.redhat.com/show_bug.cgi?id=1031587 (tharlon-fonts)
 * https://bugzilla.redhat.com/show_bug.cgi?id=1031588 (phetsarath-fonts)
 * https://bugzilla.redhat.com/show_bug.cgi?id=1031603 (tuladha-jejeg-fonts)
 * https://bugzilla.redhat.com/show_bug.cgi?id=1031569 (cdac-sakal-marathi-fonts)


 * Debian bugs filed:
 * http://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=languagesummitpune2013;users=debian-in-workers@lists.alioth.debian.org

Retrospective (Kartik)
 * Packaging consistencies across Fedora, Debian, Wikimedia
 * 61 fonts in ULS repo - checked in Fedora or Debian - if missing adding these fonts to Fedora and Debian - Vasudev
 * Aksharyogini and Sakal Marathi and Meera Tamil are getting added
 * Defining mechanisms to maintain fonts, and how the process can be automated (Kartik will work on this)
 * Fedora and Debian have mechanisms to check automatically

Session: Identify and document the sources of free licensed bilingual dictionaries
Notes


 * Mediawiki Page
 * Are there any freely licensed bilingual dictionaries?
 * freedict: uses the client/server 'dictd' protocol.
 * Among Indic languages, freedict is only available for Hindi.
 * Artha: http://artha.sourceforge.net/
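
For readers unfamiliar with the 'dictd' client/server model, the sketch below shows the shape of a lookup in the DICT protocol (RFC 2229): the client sends a `DEFINE` command and reads back status lines and dot-terminated definition text. It only builds the command and parses a response, without opening a network connection. The function names, the database name in the usage example and the sample response are assumptions for illustration.

```python
def define_command(word, database="*"):
    """Build a DICT DEFINE command line; '*' asks the server to search all databases."""
    return f'DEFINE {database} "{word}"\r\n'


def parse_definitions(lines):
    """Extract definition texts from the response lines of a DEFINE command.

    Each definition starts after a '151' status line and ends at a line
    containing only '.'; a leading '..' in the text is an escaped '.'.
    """
    definitions = []
    current = None  # None while outside a definition body
    for line in lines:
        if current is None:
            if line.startswith("151 "):
                current = []
            continue
        if line == ".":
            definitions.append("\n".join(current))
            current = None
        else:
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


if __name__ == "__main__":
    # "fd-eng-hin" is an assumed database name used only as an example.
    print(repr(define_command("water", "fd-eng-hin")))
    # Made-up sample response, with line endings already stripped.
    response = [
        "150 1 definitions retrieved",
        '151 "hello" example-db "Example database"',
        "hello",
        "  a greeting",
        ".",
        "250 ok",
    ]
    print(parse_definitions(response))
```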

APIs
 * No 'well defined' Wiktionary API: it will take many months to have one with Wikidata.
 * Write or use API where it can be available.
 * GujaratiLexicon.com API: Kartik/Samyak to work.

Retrospective
 * Created a useful document on mediawiki.org

Session: Q & A with Behdad Esfahbod: State of the Union: Harfbuzz - Font rendering for Chrome, Android

 * Santhosh: How to do testing better for Indic scripts
 * Windows - testing is key - Behdad tests all bug fixes on Windows
 * OpenType support on iOS
 * Apple has full OpenType support now (Google is more competitive with Microsoft than Apple is, since it wants feature compatibility)
 * Firefox has been shipping with Harfbuzz on every platform
 * Jonathan Kew is working on testing infrastructure for Windows
 * Webfonts - my vision for the web - a font file should run on every web browser
 * Mobile web fonts - Noto was designed to support this use case
 * Open Type Spec for Google Noto fonts