The updates below are kept for historical reasons:
Last update on: 2014-12-monthly
The language engineering team kicked off development of a prototype version of a content translation workflow. This functionality aims to create a workspace for helping editors bootstrap new articles in non-Latin language Wikipedias. In the prototype, Russian and Welsh are being used for initial concept verification.
The prototype ContentTranslation server was created in Node.js, mostly by Santhosh Thottingal and David Chan. The server will be responsible for syncing the translations between all the languages, storing translated parallel texts (using Redis), and retrieving and caching the results of language tool queries (machine translation, translation memory, dictionaries, segmentation, etc.).
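The caching of language tool results described above can be illustrated with a minimal sketch. It is written in Python for brevity (the server itself is Node.js), a plain dictionary stands in for Redis, and every name in it (`ToolResultCache`, `make_key`, `fake_machine_translation`) is hypothetical rather than taken from the actual server code.

```python
import hashlib
from typing import Callable, Dict


class ToolResultCache:
    """Caches language-tool results keyed by tool, language pair and source text."""

    def __init__(self) -> None:
        # Assumption: a plain dict stands in for the Redis store.
        self._store: Dict[str, str] = {}
        self.misses = 0

    @staticmethod
    def make_key(tool: str, source_lang: str, target_lang: str, text: str) -> str:
        # Hash the text so that long segments still produce short keys.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{tool}:{source_lang}:{target_lang}:{digest}"

    def get_or_compute(
        self,
        tool: str,
        source_lang: str,
        target_lang: str,
        text: str,
        compute: Callable[[str], str],
    ) -> str:
        key = self.make_key(tool, source_lang, target_lang, text)
        if key not in self._store:
            # Cache miss: query the tool once and remember its answer.
            self.misses += 1
            self._store[key] = compute(text)
        return self._store[key]


def fake_machine_translation(text: str) -> str:
    # Hypothetical placeholder for a real tool call; it only upper-cases the input.
    return text.upper()
```

Repeating a query for the same tool, language pair and text is then served from the cache, while a different language pair triggers a new tool call.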
Some front-end components for the translation interface were made, mostly by Sucheta Ghoshal and Amir Aharoni.
Santhosh Thottingal and David Chan continued development and technology research on the Content Translation project. Development was focused specifically on updates to the side-by-side translation editor and section alignment of translated text. Kartik Mistry and Santhosh Thottingal worked on infrastructure for testing the Content Translation server. David Chan continued his technology research on sentence segmentation.
Pau Giner updated the Content Translation UI design specification incorporating review comments from UX and product reviews. The team also participated in a review of the Content Translation project with the product team leadership.
ContentTranslation was the team's main effort this month. Source text segmentation was further improved and stabilized. Other developed features include:
- A beta feature that shows a red interlanguage link when the article is not translated to the user's language;
- Basic handling of templates and images;
- Basic publishing of the translation as a formatted article;
- Testing infrastructure for the server.
Most of the team met in Valencia to complete the ContentTranslation architecture and roadmap. The dictionary feature is now up for limited testing.
The team added support for link adaptation, worked on the infrastructure for machine translation support using Apertium and on hiding templates, images and references that cannot be easily translated. They also prepared for deployment on beta wikis and made multiple bug fixes and design tweaks.
An initial version was released on Beta Labs; it supports machine translation between Spanish and Catalan. The machine translation API leverages open source machine translation with Apertium. The tool supports experimental template adaptation between languages. Numerous bug fixes were made based on testing and user feedback. We worked on matching the Apertium version to the cluster, and planning for the next round of development has started.
The algorithm for detecting machine translation abuse was reworked. The team also worked on reference adaptation improvements, refactoring the front-end event architecture and rewriting the cxserver registry to support multiple machine translation engines.
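As a rough illustration of what a registry supporting multiple machine translation engines might look like, the following Python sketch maps each language pair to the engines registered for it. The class and method names are hypothetical and do not reflect the actual cxserver registry format; "OtherEngine" in the usage note is a made-up engine name.

```python
from typing import Dict, List, Optional, Tuple


class EngineRegistry:
    """Maps a (source, target) language pair to the engines that support it."""

    def __init__(self) -> None:
        self._engines: Dict[Tuple[str, str], List[str]] = {}

    def register(self, engine: str, source: str, target: str) -> None:
        # Keep registration order; the first engine acts as the default.
        engines = self._engines.setdefault((source, target), [])
        if engine not in engines:
            engines.append(engine)

    def engines_for(self, source: str, target: str) -> List[str]:
        # Return a copy so callers cannot mutate the registry.
        return list(self._engines.get((source, target), []))

    def default_engine(self, source: str, target: str) -> Optional[str]:
        engines = self._engines.get((source, target), [])
        return engines[0] if engines else None
```

Registering "Apertium" and then "OtherEngine" for Spanish to Catalan would make Apertium the default for that pair while keeping the second engine available as an alternative.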
The second version of the tool was released. This version has not yet been deployed due to technical issues in the Labs setup, which are currently being resolved with the Ops team. Notable improvements include:
- a basic formatting toolbar (for Chrome);
- more accurate warnings for unchanged machine translated content;
- design improvements for the top bar and progress bar;
- bi-directional support for Spanish-Portuguese machine translation;
- link adaptation improvements.
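One way a warning for unchanged machine translated content could work is to compare the machine translation with the translator's edited text and warn when they are nearly identical. The sketch below is only an illustration under that assumption: the function names and the 0.9 threshold are hypothetical, and the tool's actual algorithm is not described here.

```python
from difflib import SequenceMatcher


def unchanged_ratio(machine_text: str, edited_text: str) -> float:
    """Word-level similarity between the machine output and the edited text."""
    machine_words = machine_text.split()
    edited_words = edited_text.split()
    if not machine_words and not edited_words:
        # Two empty texts are trivially identical.
        return 1.0
    return SequenceMatcher(None, machine_words, edited_words).ratio()


def should_warn(machine_text: str, edited_text: str, threshold: float = 0.9) -> bool:
    # Assumption: warn when at least `threshold` of the content is unmodified.
    return unchanged_ratio(machine_text, edited_text) >= threshold
```

A translation published exactly as the machine produced it has a ratio of 1.0 and triggers the warning; a substantially rewritten paragraph falls below the threshold and does not.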
The team is performing ongoing tests with users for Spanish-Portuguese and Portuguese-Spanish translations, and we started planning for the third release.
The second version was deployed, and Kartik worked with the Operations team and upstream developers of Apertium to prepare requisite packages for the machine translation service. Category adaptation was added, as well as bi-directional machine translation support between Catalan, Portuguese and Spanish. Language support was extended, and development started on the translation dashboard and the third version of Content Translation. The first draft of the graduating language support specification was completed; this specification will guide selection of further language pairs to be supported through the Content Translation tool.
The third version was released; it includes several enhancements and fixes. New features include a simple first version of the translation dashboard for viewing, loading and saving one's own translations, and the ability to save ongoing translations. The deployment of the Content Translation database is currently in progress. On completion, users will be able to use the newly added dashboard and save and resume translations for unfinished articles. Collaboration continues with the Tech Ops team on preparing the tool for deployment in a production Wikipedia as a beta feature.
The Machine Translation service code was refactored to make it more extensible for other languages and translation services. As an experiment, the Yandex machine translation service was tested. Several fixes related to template adaptation were done. The language selector and the top-bar in the editing interface have been redesigned.
The fourth release is currently underway, with the specific goal of preparing the tool for deployment as a beta feature in January. Work so far includes:
- Refactored and redesigned the article and language selector using ULS. (This required changes in the design of ULS itself.)
- Multiple fixes in section alignment in the translation interface.
- Warning about possible overwriting of the translated page (same title, same topic).
- Show license text on entry points.
- Disabled all ContentTranslation features unless the user enabled the beta feature.
- Automatic draft translation saving (one of the most requested features from user feedback sessions).
- Multiple other minor bug fixes.
- Preparation for deployment in January - configuration, puppetization, etc.