Wikimedia Language engineering/Reports/2015/May

Monthly updates from WMF Editing - Language team

Content Translation
Summary of WMF Language Engineering

Development Update

 * The reference tool card can copy a reference from the source article even if the translation was started from scratch.
 * Proper handling of wikis whose content language code differs from their domain name code.
 * Many RTL fixes to prepare the tool for deployment on RTL wikis.
 * The links system is being rewritten to support upcoming complex use cases.
 * Improvements to the translation source selector: a page selector widget was developed that does prefix search and lists results with thumbnail images and brief descriptions.
 * One-click beta feature enablement campaigns in new article creation workflows, covering both wikitext and VisualEditor-based article creation points.
 * Entry points for beta feature users: the contributions menu, contributions page links, article creation and interlanguage links.
 * The stats page is getting a new design, since the number of languages is now too large to represent in a table.
 * Switched to RESTBase to fetch HTML pages. The publishing API now uses ParsoidVirtualRESTService.
 * Automatic linking of source and target articles in Wikidata when a translation is published is coming soon.
 * Echo integration to notify translators about milestones and various other events is in progress.
 * CXServer logging was improved to send logs to Logstash.
 * Continuing collaboration with Apertium: more Apertium language pairs were packaged and enabled in the WMF Apertium instance.
 * Niklas, Santhosh and Pau wrote a paper on a machine-aided translation system for Wikipedia, which was selected for the European Association for Machine Translation conference (eamt2015.org). They presented it in Antalya, Turkey.
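Several wikis in the deployment list use a domain prefix that is not their content language code, for example be-x-old (content language be-tarask), bat-smg (sgs), fiu-vro (vro) and roa-rup (rup). The sketch below illustrates the kind of two-way lookup such handling needs. It is a minimal illustration assuming a small hand-maintained table; the dictionary and function names are hypothetical and this is not ContentTranslation's actual code.

```python
# Hypothetical lookup table, not ContentTranslation's actual code:
# Wikipedia domain prefixes that differ from the wiki's content language code.
DOMAIN_TO_LANGUAGE = {
    "be-x-old": "be-tarask",  # Belarusian (Taraškievica)
    "bat-smg": "sgs",         # Samogitian
    "fiu-vro": "vro",         # Võro
    "roa-rup": "rup",         # Aromanian
}

# Reverse mapping, built once from the table above.
LANGUAGE_TO_DOMAIN = {lang: dom for dom, lang in DOMAIN_TO_LANGUAGE.items()}


def content_language(domain_code: str) -> str:
    """Return the content language code for a domain prefix (unchanged if unmapped)."""
    return DOMAIN_TO_LANGUAGE.get(domain_code, domain_code)


def domain_code(language: str) -> str:
    """Return the domain prefix for a content language code (unchanged if unmapped)."""
    return LANGUAGE_TO_DOMAIN.get(language, language)


def wiki_host(language: str) -> str:
    """Build the Wikipedia host name for a content language code."""
    return f"{domain_code(language)}.wikipedia.org"
```

Codes whose domain and language match, such as tr or nl, pass through unchanged, so only the exceptions need to be listed.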

Deployments and other site related updates

 * Content Translation has been deployed in the following Wikipedias during May:
 * Armenian (hy), Turkish (tr), Albanian (sq), Aromanian (roa-rup), Avar (av), Azerbaijani (az), Gagauz (gag), Kabardian (kbd), Karachay-Balkar (krc), Karakalpak (kaa), Maltese (mt), Ossetian (os), Abkhazian (ab), Ladino (lad), Mirandese (mwl), Romani (rmy), Crimean Tatar (crh), Tagalog (tl), Cebuano (ceb), Waray-Waray (war), Ilokano (ilo), Kapampangan (pam), Zamboanga Chavacano (cbk-zam), Central Bicolano (bcl), Pangasinan (pag), Georgian (ka), Kashubian (csb), Rusyn (rue), Belarusian (be), Belarusian Taraškievica (be-x-old), Latvian (lv), Limburgish (li), Latgalian (ltg), Bhojpuri (bh), Polish (pl), Hindi (hi), Aymara (ay), Guarani (gn), Extremaduran (ext), Papiamento (pap), Swahili (sw), Somali (so), Shona (sn), Yoruba (yo), Amharic (am), Kabyle (kab), Wolof (wo), Igbo (ig), Northern Sotho (nso), Quechua (qu), Nahuatl (nah), Lithuanian (lt), Slovak (sk), Estonian (et), Finnish (fi), Romanian (ro), Hungarian (hu), Serbian (sr), Croatian (hr), Bosnian (bs), Northern Sami (se), Samogitian (bat-smg), Veps (vep), Silesian (szl), Võro (fiu-vro), West Frisian (fy), Dutch Low Saxon (nds-nl) and Dutch (nl).


 * Apertium machine translation support was added for the following language pairs:
 * Basque -> Spanish
 * Catalan -> Occitan
 * English -> Galician
 * Portuguese -> Galician
 * Spanish <-> Aragonese
 * Spanish <-> Asturian
 * Spanish <-> French
 * Spanish <-> Galician
 * Spanish -> Occitan
 * Kazakh <-> Tatar


 * cxserver was updated to use the RESTBase API to fetch pages.
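RESTBase serves the Parsoid HTML of a page over a plain REST URL. The sketch below builds such a URL for a given wiki and title. It is illustrative only: cxserver itself is a Node.js service and this is not its code, the function name is hypothetical, and it assumes the public endpoint layout https://<domain>.wikipedia.org/api/rest_v1/page/html/<title>. It only constructs the URL and performs no network request.

```python
from urllib.parse import quote


def restbase_html_url(domain_code: str, title: str) -> str:
    """Build a RESTBase page-HTML URL (hypothetical helper).

    Assumes the public layout https://<domain>.wikipedia.org/api/rest_v1/page/html/<title>.
    """
    # Titles use underscores instead of spaces in URLs; every other reserved
    # character, including "/", is percent-encoded (UTF-8) so the title stays
    # a single path segment.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{domain_code}.wikipedia.org/api/rest_v1/page/html/{encoded}"
```

Encoding "/" keeps a title such as "AC/DC" from being read as two path segments by the REST router.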


 * Campaigns: the newarticle and cxstats campaigns are enabled on all wikis where Content Translation is deployed.


 * All languages are now available in the 'source' selector.