Content translation/Section translation

Content translation's Boost initiative aims to expand the use of translation to help more communities grow. By enabling new and more visible ways to contribute through translation, we expect communities to attract new editors and expand the knowledge available in their languages. Content translation has successfully supported the translation process in many Wikipedia communities. It has already helped create thousands of new articles while encouraging good-quality translations. However, we identified potential to expand its use to more contexts that can benefit from translation:


 * Translation can be used by more wikis. The adoption of Content translation varies significantly from wiki to wiki, and some wikis could benefit from using translation more.
 * Translation can be used in more ways. Currently, Content translation focuses on creating new articles on desktop. Supporting new kinds of contributions, such as expanding existing articles with new sections or translating on mobile devices, will create more opportunities to contribute.

Please provide any feedback about this initiative on the discussion page. We are interested in hearing your ideas on how to help communities grow by using translation.

This initiative is part of a long-term vision to support cross-wiki content propagation, and it is aligned with the Wikimedia Foundation's plans to "Grow participation globally, focusing on emerging markets" and increase "Worldwide readership". More details and updates are provided below.

Goals and impact
These are the main goals for the project and metrics to measure them:


 * Grow the community of translators. Attract more users to translate (on different devices), keep them active over time, and help recruit new translators.
 * Metric: Increase the percentage of new editors who complete their second translation within a month.
 * Grow the content available. Increase coverage, both in the range of topics available and in their depth (the information they contain).
 * Metric: Double the number of weekly translations on selected wikis (whether creating new content or extending existing content).
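The first metric can be read in more than one way. The sketch below assumes one reading: among new editors who have published at least one translation, the percentage whose second translation was published within 30 days of the first. The `second_translation_rate` function, its input shape, and the 30-day window are illustrative assumptions, not the project's actual measurement.

```python
from datetime import datetime, timedelta

# Assumption: "in a month" is read as "within 30 days of the first translation".
WINDOW = timedelta(days=30)


def second_translation_rate(publications: dict[str, list[datetime]]) -> float:
    """Percentage of new editors whose second translation followed the first
    within WINDOW. `publications` maps a hypothetical editor id to the publish
    times of that editor's translations (in any order)."""
    started = 0   # editors with at least one published translation
    returned = 0  # of those, editors with a second one inside the window
    for stamps in publications.values():
        if not stamps:
            continue  # no published translation yet; not counted
        started += 1
        ordered = sorted(stamps)
        if len(ordered) >= 2 and ordered[1] - ordered[0] <= WINDOW:
            returned += 1
    return 100.0 * returned / started if started else 0.0
```

Measuring the window from each editor's own first translation, rather than from a calendar month, avoids penalizing editors who start late in the month.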

Communities involved
We want to focus on Wikipedias with the potential to grow through translation: Wikipedias with fewer than 100K articles, significant editing activity for the size of the wiki (more than 70 active editors), and little current use of translation (fewer than 100 translations per month).

Given the above, we initially selected the following Wikipedias: Malayalam, Bengali, Tagalog, Javanese, and Mongolian.

As the project became more visible, other communities showed interest in it, and some initiatives will be applied there too. These are the Central Bikol, Albanian, and Arabic Wikipedias.
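As a minimal sketch, the initial selection criteria can be expressed as a filter over per-wiki statistics. The `WikiStats` record, the function names, and the wiki names and numbers in the example are hypothetical; the check uses only the absolute thresholds and leaves out the softer "for the size of their wiki" judgment.

```python
from dataclasses import dataclass

# Thresholds from the criteria above (strict comparisons, as in the text).
MAX_ARTICLES = 100_000          # "fewer than 100K articles"
MIN_ACTIVE_EDITORS = 70         # "more than 70 active editors"
MAX_MONTHLY_TRANSLATIONS = 100  # "fewer than 100 translations per month"


@dataclass
class WikiStats:
    """Hypothetical per-wiki statistics record."""
    name: str
    articles: int
    active_editors: int
    monthly_translations: int


def is_candidate(wiki: WikiStats) -> bool:
    """True when a wiki meets all three initial selection criteria."""
    return (
        wiki.articles < MAX_ARTICLES
        and wiki.active_editors > MIN_ACTIVE_EDITORS
        and wiki.monthly_translations < MAX_MONTHLY_TRANSLATIONS
    )


def select_candidates(wikis: list[WikiStats]) -> list[str]:
    return [w.name for w in wikis if is_candidate(w)]


# Illustrative, made-up numbers; not real statistics for any wiki.
example = [
    WikiStats("wiki-a", 50_000, 120, 40),   # meets all three criteria
    WikiStats("wiki-b", 250_000, 500, 20),  # too many articles
    WikiStats("wiki-c", 30_000, 50, 10),    # too few active editors
    WikiStats("wiki-d", 80_000, 90, 150),   # already translates a lot
]
print(select_candidates(example))  # ['wiki-a']
```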

Additional considerations
This small group of wikis represents a much larger group of communities that can grow with the proposed improvements. Selecting a small set allows us to focus and collaborate more closely with them. Nevertheless, we expect the improvements to benefit a larger number of wikis, and users from all wikis are definitely welcome to participate.

This list is a preliminary selection. Once we complete the initial communication, we will better understand how interested those communities are in getting involved in the project, and the final list may change.

Provide your feedback
Please provide any feedback about this initiative on the discussion page. We are interested in hearing your ideas on how to help communities grow by using translation.

You can also follow the project in Phabricator to track the progress of the different tasks and comment on them.

Scope
We want knowledge to propagate across languages more fluently. These are the initial work areas considered:


 * Make Content translation more visible. This involves making the tool available by default, exposing it more visibly in relevant places, surfacing relevant content gaps, and customizing the process to meet each community's needs. In this way, more editors will be able to find relevant content to translate.
 * Increase the coverage of content for existing articles. Explore ideas to expand existing articles by translating new sections. This will let users add new aspects to existing articles and cover their topics in more detail.
 * Support translation on more devices. Support translation from mobile devices to provide more opportunities for contribution on any device, enabling new editors to participate.

These areas are based on our experience working on Content translation, the user feedback received, and previous research. However, research specific to this project will determine which work areas to pursue.

Research
As part of the Boost Initiative, the Language Team is also conducting research to better understand the language-related needs of different wikis. This research also supports the design process by evaluating ideas around new ways of expanding existing articles and contributing from mobile devices.


 * Section Translation Research – The Section Translation Design Research project evaluated current mobile prototypes with two small wikis. The project evaluated not only initial prototypes, but also a number of design changes after each round of testing. The project also supported design exploration by gathering interview data around critical assumptions of Section Translation, including the role of mobile and the relevance of article sections as a meaningful unit of translation.

Status updates



 * Enabled Content Translation in Armenian Wikipedia as a default tool.
 * Content Translation was tested on multiple servers to verify how it functions out-of-beta.
 * At the community's request, adjusted the machine translation (MT) threshold for the Chinese Wikipedia to prevent publishing when unmodified MT content exceeds 70% overall.
 * Enabled Google Translate support in Content Translation for the following languages: Amharic, Kyrgyz, Luxembourgish, Scots Gaelic, and Xhosa.
 * Added the Sakha language to Content Translation’s list of languages supported by machine translation through Yandex Translate.
 * Migrated the CX abuse filter statistics script from MySQL to Hive.
 * Reverted machine translation limits for the Chinese Wikipedia as a response to the community’s feedback.
 * Completed design specifications for Section Translation and its features.
 * Adjusted the machine translation threshold for Chinese Wikipedia to be 5% more strict based on community feedback.
 * Tested a neural machine translation service, MarianNMT, to evaluate its hosting requirements and performance on our systems.
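Several of the updates above adjust a threshold on unmodified machine translation. A minimal sketch of such a check follows, assuming the translation is split into segments and that a segment counts as unmodified when its final text matches the MT output exactly (ignoring surrounding whitespace). The `Segment` model and the exact-match rule are simplifying assumptions; Content Translation's real measurement may differ. The 70% default only mirrors the value mentioned above for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One translated unit (e.g. a paragraph); this model is hypothetical."""
    mt_output: str   # text proposed by the machine translation service
    final_text: str  # text the translator is about to publish


def unmodified_mt_ratio(segments: list[Segment]) -> float:
    """Fraction of published characters left exactly as MT produced them."""
    total = sum(len(s.final_text) for s in segments)
    if total == 0:
        return 0.0
    unmodified = sum(
        len(s.final_text)
        for s in segments
        if s.final_text.strip() == s.mt_output.strip()
    )
    return unmodified / total


def can_publish(segments: list[Segment], threshold: float = 0.70) -> bool:
    """Block publishing only when unmodified MT content exceeds the threshold."""
    return unmodified_mt_ratio(segments) <= threshold
```

Keeping the threshold as a parameter reflects how the value was tuned per wiki in response to community feedback.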


 * Content translation enabled by default for Lithuanian.
 * Improved the testing environment by fixing an issue that resulted in a corrupt repository.
 * More reliable section mapping in Section translation by preventing API errors.
 * More reliable publishing of translations by preventing broken categories from stopping the whole process.
 * Content translation enabled by default for Slovenian Wikipedia.
 * Better support for translating in multiple sessions by restoring access to unfinished translations.
 * Updated the default machine translation tool to Google Translate for the Chinese Wikipedia at the community's request.
 * Expanded monthly reports to include a more convenient way to monitor changes in Content Translation use by language.
 * Completed a design research report on Section translation to validate the concept and design ideas.
 * Completed a full design research analysis of Section Translation sessions with Bengali, Javanese and Indonesian editors to help improve the user experience.
 * Investigated the cause of some missing interlanguage links for articles translated using Content Translation.
 * Improved monthly Content Translation analytics by fixing errors that led to incorrect change tags when articles are not published.


 * Improved consistency by fixing a regression that affected the visual styles of Content Translation.
 * Support for community events related to translation through a URL campaign for WikiGap, giving visibility to the translations created through it, and a survey to gather event participants' impressions of Content Translation (to be applied initially at an editathon on cybersecurity in India).
 * Completed the analysis for the second round of research on Section Translation with Bengali Wikipedia editors, and conducted research sessions for the next round with Javanese editors. This completes the sessions planned for this study; only the final analysis and results remain pending.
 * Improved template support by generating mappings for template parameters for our target languages using a machine learning approach.
 * Improved discovery with better detection of newcomers to show an invitation to translate.
 * Started evaluating ways to improve OpusMT performance by getting access to a server for performance measurements.
 * Content translation enabled by default for Estonian, Azerbaijani, and Malay.
 * Published initial design ideas for Section translation entry points as part of the design exploration.


 * Content translation enabled by default for Telugu, Kannada, Gujarati, Marathi, and Punjabi.
 * Technical exploration for the architecture to support Section Translation development (both on desktop and mobile) following the recommendations of the Frontend Architecture Working Group.
 * Better support for the Chuvash language by enabling machine translation support.
 * Content translation enabled by default for Bosnian and Macedonian.
 * Improvements in template support by setting up a server instance to generate the mappings for template parameters following the computationally intensive machine learning approach.
 * Integrated OpusMT into Content Translation, the first translation service in the tool that is both open source and based on neural machine translation. OpusMT has been enabled for Assamese to explore how community translations could help bootstrap an open machine translation service (announcement and feedback).
 * Translation limits in Content Translation have been adjusted for Assamese to prevent low quality translations, given the experimental nature of the OpusMT translation service used for the language.
 * Translation limits in Content Translation have been adjusted for Telugu based on feedback from the community and the measurement of the current deletion ratios.
 * Content translation enabled by default for Basque, Tamil, and Swahili.


 * Improved support for references by preventing duplicate references in the source article.
 * Update of testing servers to support the new parsing system.
 * Content translation enabled by default for Afrikaans, Icelandic, Latvian, and Nepali Wikipedias.


 * Updated test server instances with recent technology versions as required by recent MediaWiki updates.
 * Enabled server support for Content translation in two new Wikipedias: Sakizaya (szy) and Mon (mnw).
 * Support for the Wiki for Human Rights initiative by creating a URL campaign in Content translation.
 * More reliable support for highlighting sentences across original and translated content and preventing wrongly formatted duplicates of ISBN links.
 * Support for compatibility with the new version of the parsing system.
 * Content translation configured to be enabled as a default tool (out of beta) for newly created Wikipedias


 * Content translation enabled by default for Albanian Wikipedia at editors' request.
 * Better integrated invites for users to translate when creating a new article, showing a specific article to translate or a one-time general invite depending on the context.
 * Compilation of prolific translators in selected wikis to raise awareness of our initiatives and plan further research.
 * Infrastructure to set up a local instance of OpusMT based on Marian Neural Machine Translation.
 * Section translation research preparations: research brief, test protocol, and recruitment communications (participant screener, community messages, and direct messages)


 * Initial design ideas and prototypes for section translation.
 * Started conversations with all selected communities (Malayalam, Bengali, Tagalog, Javanese, and Mongolian) with potential to grow with translation.
 * Improved discoverability with basic support for surfacing a specific translation suggestion as an alternative to creating an article from scratch.
 * Content translation enabled by default for Tagalog, Central Bikol, Malayalam, Bengali, and Mongolian Wikipedias.
 * Better guidance by encouraging users to complete old translations with a notification after 3 months.
 * Expanded machine translation access by allowing external tools to use all language pairs, even those restricted by their Wikipedia communities.
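The reminder for old translations can be sketched as a periodic job that picks unfinished translations whose last edit is at least three months old. The 90-day approximation of "3 months", the input mapping, and the `already_notified` set used to avoid repeat notifications are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

# Assumption: "3 months" is approximated as 90 days.
REMINDER_AGE = timedelta(days=90)


def translations_to_remind(
    drafts: dict[str, datetime],
    now: datetime,
    already_notified: frozenset[str] = frozenset(),
) -> list[str]:
    """Return ids of unfinished translations that should get a reminder.

    `drafts` maps a hypothetical translation id to the time of its last edit.
    Translations already reminded about are skipped, so each gets one notice.
    """
    return sorted(
        tid
        for tid, last_edit in drafts.items()
        if now - last_edit >= REMINDER_AGE and tid not in already_notified
    )
```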


 * Content translation enabled by default for Javanese Wikipedia (announcement)
 * Clearer entry points by removing redundant invite to translate.
 * Improved entry points with more reliable placement for the interlanguage links dialog and reduced performance penalty for readers.


 * Improved discoverability by providing an updated invite to propose translating instead of creating an article from scratch, avoiding unnecessary steps when searching for the article to translate, and surfacing the persistent entry points in a more reliable way using global preferences.
 * Improved suggestions by surfacing suggestions as the default view when users are not working on any translation already, making it possible to link to any specific view in the dashboard, providing smarter services to find potential source articles for a given title in another language and potential translators, and using a clearer visual metaphor to keep suggestions for later.
 * Better control for users to keep suggestions for later by keeping the "for later" list visible for all languages.
 * Exposing better content gaps by showing the suggestions view by default when the user has no in-progress translations.


 * Better support for exposing the tool as default by supporting settings to disable the entry points for wikis where the tool is out of beta.
 * Better guidance for users by showing an invite to try the main entry point to those users that discovered the tool through a non-persistent invite so that they learn how to find the tool the next time.