Wikimedia Language engineering/Reports/2015/May

Monthly updates from WMF Editing - Language team

Summary of WMF Language Engineering

Content Translation

Development Update

 * The reference tool card can copy a reference from the source article even when the translation is started from scratch.
 * Proper handling of wikis whose content language code differs from their domain name code.
 * Many right-to-left (RTL) fixes to get the tool ready for deployment on RTL wikis.
 * The links system is being rewritten to support upcoming complex use cases.
 * Improvements to the translation source selector: a page selector widget was developed that does prefix search and lists results with thumbnail images and brief descriptions.
 * One-click beta feature enablement campaigns in new article creation workflows, at both wikitext and VisualEditor (VE) based article creation points.
 * Entry points for beta feature users: contributions menu, contribution page links, article creation, and interlanguage links.
 * The stats page is getting a new design, since the number of languages is now too large to present in a table.
 * Switched to RESTBase to fetch HTML pages. The publishing API now uses ParsoidVirtualRESTService.
 * Automatic linking of source and target articles in Wikidata when a translation is published is coming soon.
 * Echo integration to notify translators of milestones and various other events is in progress.
 * CXServer logging was improved to log to Logstash.
 * Continuing collaboration with Apertium: more Apertium language pairs were packaged and enabled in the WMF Apertium instance.
 * Niklas, Santhosh and Pau wrote a paper on a computer-assisted translation (CAT) system for Wikipedia, which was accepted at the European Association for Machine Translation conference (eamt2015.org). They presented it in Antalya, Turkey.

Development Update

 * We became aware of pressing performance issues in the Translate translation memory at translatewiki.net (in addition to the known issue of missing suggestions on Wikimedia sites). Based on help from David Chan during the Lyon hackathon, Niklas will propose fixes that could resolve both issues.
 * Many bugs in the Translate extension and MediaWiki core i18n were fixed during the Lyon hackathon, in addition to other cleanups during the month.
 * The Translate web service framework was improved to support querying multiple services in parallel, to improve response times.
 * Some core Translate classes were slightly refactored to be more test-friendly, to stop Translate unit tests from failing intermittently.
 * TwnMainPage now shows "powered by" items.
 * The message group workflow selector on Special:Translate was briefly broken until the code was fixed.

Usage Data

 * The translation rally increased MediaWiki language coverage (covered in Niklas's blog and on the Wikimedia blog).
 * MLEB was not released this month. The latest release, 2015.04, has been downloaded 92 times so far, which may indicate a drop from the usual 150 downloads per release.

Deployments and other site related updates

 * Content Translation was deployed on the following Wikipedias during May:
 * Armenian (hy), Turkish (tr), Albanian (sq), Aromanian (roa-rup), Avar (av), Azerbaijani (az), Gagauz (gag), Kabardian (kbd), Karachay-Balkar (krc), Karakalpak (kaa), Maltese (mt), Ossetian (os), Abkhazian (ab), Ladino (lad), Mirandese (mwl), Romani (rmy), Crimean Tatar (crh), Tagalog (tl), Cebuano (ceb), Waray-Waray (war), Ilokano (ilo), Kapampangan (pam), Zamboanga Chavacano (cbk-zam), Central Bicolano (bcl), Pangasinan (pag), Georgian (ka), Kashubian (csb), Rusyn (rue), Belarusian (be), Belarusian Taraškievica (be-x-old), Latvian (lv), Limburgish (li), Latgalian (ltg), Bhojpuri (bh), Polish (pl), Hindi (hi), Aymara (ay), Guarani (gn), Extremaduran (ext), Papiamento (pap), Swahili (sw), Somali (so), Shona (sn), Yoruba (yo), Amharic (am), Kabyle (kab), Wolof (wo), Igbo (ig), Northern Sotho (nso), Quechua (qu), Nahuatl (nah), Lithuanian (lt), Slovak (sk), Estonian (et), Finnish (fi), Romanian (ro), Hungarian (hu), Serbian (sr), Croatian (hr), Bosnian (bs), Northern Sami (se), Samogitian (bat-smg), Veps (vep), Silesian (szl), Võro (fiu-vro), West Frisian (fy), Dutch Low Saxon (nds-nl) and Dutch (nl).


 * Apertium machine translation support was added for the following language pairs:
 * Basque -> Spanish
 * Catalan -> Occitan
 * English -> Galician
 * Portuguese -> Galician
 * Spanish <-> Aragonese
 * Spanish <-> Asturian
 * Spanish <-> French
 * Spanish <-> Galician
 * Spanish -> Occitan
 * Kazakh <-> Tatar


 * cxserver was updated to use the RESTBase API to fetch pages.


 * Campaigns: the newarticle and cxstats campaigns are enabled on all wikis where Content Translation is deployed.


 * The source language selector now offers all languages.

Cross team work/requirements

 * Pywikibot finished converting its i18n file format to JSON during the Lyon hackathon, with some assistance from translatewiki.net.