Content translation/Section translation
Section translation expands the current capabilities of Content Translation by letting users extend existing Wikipedia articles with translated sections. It is also designed to work on mobile devices as well as desktop, using a modernized front-end architecture. This gives users opportunities to translate that were not possible with Content Translation before.
Section translation is the main project of the Boost initiative, which aims to expand the use of translation to help more communities grow. More information about the wikis the project focuses on is provided below.
Please provide any feedback about this initiative on the discussion page. We are interested in hearing your ideas on how to help communities grow by using translation. More details and updates are provided below.
Try the tool
Section translation is in early development. For now, it is only available on our test server.
It requires you to log in, but a new account can easily be created for testing purposes.
Goals and impact
These are the main goals for the project and metrics to measure them:
- Grow the community of translators. Attract more users to translate (on different devices) who remain active over time and help recruit new translators.
- Metric: Increase the percentage of new editors who complete their second translation within a month.
- Grow the content available. Increase coverage in both breadth (the topics available) and depth (the information they contain).
- Metric: Double the number of weekly translations on selected wikis (whether creating new content or extending existing content).
Boost initiative: communities with potential to grow with translation
Content translation has been successful in supporting the translation process in many Wikipedia communities. It has already helped create thousands of new articles while encouraging good-quality translations.
We want to focus on Wikipedias with potential to grow by using translation. Research on Section translation, along with other initiatives to make translation more visible in general (such as providing access to Content Translation by default, out of beta), will involve this particular group of wikis and focus on their needs.
This initiative is part of a long-term vision to support cross-wiki content propagation, and it is aligned with the Wikimedia Foundation's plans to "Grow participation globally, focusing on emerging markets" and increase "Worldwide readership".
Communities to focus on
Wikipedias with fewer than 100K articles, significant editing activity for the size of the wiki (more than 70 active editors), and little current use of translation (fewer than 100 translations per month).
This small group of wikis represents a much larger group of communities that can grow with the proposed improvements. Selecting a small set allows us to focus and collaborate more closely with them. Nevertheless, we expect the improvements to benefit a larger number of wikis, and users from all wikis are definitely welcome to participate.
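The three selection criteria combine as a simple filter. The sketch below shows that logic only as an illustration. The `WikiStats` record, its field names and the helper functions are hypothetical. The thresholds are the ones stated above, and "fewer than" and "more than" are read as strict comparisons.

```python
from dataclasses import dataclass

# Thresholds stated in the criteria above; all names here are hypothetical.
MAX_ARTICLES = 100_000           # fewer than 100K articles
MIN_ACTIVE_EDITORS = 70          # more than 70 active editors
MAX_MONTHLY_TRANSLATIONS = 100   # fewer than 100 translations per month


@dataclass
class WikiStats:
    """Hypothetical per-wiki statistics record."""
    code: str
    articles: int
    active_editors: int
    monthly_translations: int


def has_growth_potential(wiki: WikiStats) -> bool:
    """Return True if the wiki meets all three focus criteria (strict comparisons)."""
    return (
        wiki.articles < MAX_ARTICLES
        and wiki.active_editors > MIN_ACTIVE_EDITORS
        and wiki.monthly_translations < MAX_MONTHLY_TRANSLATIONS
    )


def select_focus_wikis(wikis: list[WikiStats]) -> list[str]:
    """Return the codes of wikis that meet the criteria, preserving input order."""
    return [wiki.code for wiki in wikis if has_growth_potential(wiki)]
```

The figures in a call such as `select_focus_wikis([WikiStats("example-a", 50_000, 80, 10)])` are placeholders, not real statistics for any wiki. As noted above, the actual selection was a small, curated set of communities rather than every wiki that passes such a filter.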
Provide your feedback
Please provide any feedback on the discussion page. We are interested in hearing your ideas on how to help communities grow by using translation, and Section translation in particular.
You can also check the project in Phabricator to track the progress of the different tasks and share comments on them.
The Language Team is also conducting research to better understand the language-related needs of different wikis. This research also supports the design process by evaluating ideas around new ways of expanding existing articles and contributing from mobile devices.
- Section Translation Research – The Section Translation Design Research project evaluated current mobile prototypes with two small wikis. The project evaluated not only initial prototypes, but also a number of design changes after each round of testing. The project also supported design exploration by gathering interview data around critical assumptions of Section Translation, including the role of mobile and the relevance of article sections as a meaningful unit of translation.
- As part of the "Edit a sentence" step for Section Translation, translations will be loaded using the Mobile Visual Editor.
- Analyzed roughly 45 hours of videos and transcripts from 21 different research sessions as part of the Multilingual Editor Experiences in Small Wikis research.
- As part of the "Edit a sentence" step for Section Translation, users will now be able to see the original sentence above the visual editor with options to paginate and expand.
- Investigated cases where translations present unusually high deletion ratios or confusing machine translation (MT) statuses in order to understand usage and needs surrounding MT services and the content translation tool.
- As part of the "Edit a sentence" step for Section Translation, a translation will be loaded with the Mobile Visual Editor, allowing users to edit contents in the usual way and leave the editor.
- As part of the "Pick a sentence" step for Section Translation, users will be able to choose a sentence to edit.
- As part of the "Edit a sentence" step for Section Translation, users will have access to the original sentence as they edit that sentence’s translation.
- Based on community feedback, adjusted the threshold for Vietnamese Wikipedia to prevent publishing when overall unmodified machine-translated content is higher than 95%.
- Corrected a language-data error that led to Compact Language Links not showing the Chavacano language as an option.
- Corrected errors that caused the "Section already present" banner to show up momentarily when the screen loads.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, users can interact with sentences by tapping on them and getting feedback.
- Enabled Central Bikol in the OpusMT labs instance so that editors can check it when we inform the community.
- Enabled Content Translation in Oriya Wikipedia as a default tool.
- Enabled Content Translation in Somali Wikipedia as a default tool.
- Enabled Content Translation in Irish Wikipedia as a default tool.
- Enabled Content Translation in Belarusian Wikipedia as a default tool.
- Enabled Content Translation in Esperanto Wikipedia as a default tool.
- Checked the Apertium configuration for Serbo-Croatian support, and instead moved forward by making Google Translate the default machine translation service.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, the selected sentence can be highlighted.
- Completed the Section Translation test server user feedback plan.
- Analyzed videos and transcripts from Multilingual Ed X research sessions with potential editors.
- Any project using GitLab can now opt-in to receive translation updates via Merge Requests.
- Fixed an issue where Content Translation failed to add sitelinks for translated pages to Wikidata.
- Researched and drafted brief for the Machine Translation and Human Interaction project.
- Created a document cataloging all Machine Translation configuration options to make them easier for users to understand.
- Integration of Visual Editor into Section Translation is ongoing after initial exploration.
- Added the Ladin Wikipedia (lldwiki) to cxserver.
- Scheduled and prepped executive summary discussions with the Language team as part of the Multilingual Editors Experience.
- Fixed a bug where adding categories was not possible when using the Content Translation tool.
- Fixed a UI glitch when users used the “compare contents” option in Content Translation.
- Completed the “Compare” step of the Section Translation workflow, allowing users to access the selected source section and the article in the target language.
- Identified the steps/blockers needed to support test automation using Continuous Integration (CI) infrastructure for Section Translation.
- Completed the workflow step for Section Translation that supports alternative machine translation selection options.
- Completed the workflow step for Section Translation that allows a translator to see a placeholder where their new translation will be in the existing article.
- Resolved errors that caused text to be cropped.
- A detailed specification document to facilitate discussing Translatable Modules was written and published on MediaWiki.
- Corrected errors that led to post-save operations for translation units failing on occasion.
- Enabled Content Translation in Urdu Wikipedia as a default tool.
- Enabled Content Translation in Welsh Wikipedia as a default tool.
- Enabled Content Translation in Bashkir Wikipedia as a default tool.
- Excluded testwikis and private wikis from ContentTranslation (CX) draft purge script and separated the CX database on testwiki.
- As part of the "Compare the contents before translating" step, completed support for opening articles.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, section contents will now display so that users can navigate on a sentence-by-sentence basis.
- Investigated how to best apply QA practices to Section Translation as it uses a new vue.js-based architecture.
- Enabled Machine Translation for closely-related languages based on community input.
- Enabled Content Translation in Sundanese Wikipedia as a default tool.
- Corrected errors that caused Cx2 to add unnecessary id attributes to elements that do not need them, such as numeric id values to tables.
- Corrected code that led to a syntax error while using Content Translation.
- Recruitment for Multilingual Editors Experience research was completed.
- As part of Phase 3 of Multilingual Ed X, many research sessions with new/potential editors were scheduled and completed.
- As part of Phase 3 of Multilingual Ed X, many research sessions with existing editors were scheduled and completed.
- Corrected code errors that led to a Content Translation publishing failure in certain cases.
- Provided section order information in the section suggestions API for Section Translation.
- Updated Content Translation highlight colors to be appropriate for the background colors.
- Machine Translation services supporting Chinese Mandarin were opened to closely related languages and variants to improve the users’ experience using Content Translation.
- As Section Translation development continues, all steps up to the editor have been completed.
- Google Translate is now the default machine translation for Ukrainian.
- Content Translation is now a default tool in the Assamese Wikipedia.
- Content Translation is now a default tool in the Burmese Wikipedia.
- Confirmed resolution of a console error that caused some issues when loading the Content Translation page.
- Corrected errors on Youdao MT that caused machine translation failure.
- Completed building the UI framework to support the Content Translation dashboard and Section Translation workflows.
- Created the API that identifies the missing sections between articles in different languages, a necessary component for Section Translation.
- Resolved issues with user-added categories being ignored in development environments of Content Translation.
- Completed the debrief guides for moderators to complete after each research session.
- Revised some code so that Content Translation produces less log spam.
- After code errors were resolved, users can once again change their interface language when using Content Translation.
- Research sessions have started for the Multilingual Editors Experience project, with 7 sessions completed this week. Ongoing recruitment tasks were also completed.
- Completed a review process for the main tickets for Section Translation, reporting missing/unclear aspects that need to be better detailed in order to make the development process more efficient.
- Content Translation is now out of beta for the Galician Wikipedia.
- Corrected default display setting so that article images in Content Translation dashboard are not blurred anymore.
- Improved Content Translation's automatic template parameter mapping across languages in small Wikipedias by altering template parameters.
- Content Translation is enabled for the newly created Awadhi Wikipedia.
- Corrected the issue causing Content Translation’s beta feature to not always be automatically enabled when starting a new translation.
- Re-aligned UI elements so that the Content Translation dashboard displays correctly.
- Researched and wrote a research brief for Multilingual Editor Experiences to help inform the Design and Development teams’ decisions.
- A paper co-authored by Santhosh Thottingal and Jörg Tiedemann (Department of Digital Humanities, University of Helsinki) titled "OPUS-MT – Building open translation services for the World" was accepted at the 22nd Annual Conference of the European Association for Machine Translation, and the proceedings are published (see pages 479-480). It outlines the project and the collaboration between WMF and the University of Helsinki.
- Created research protocol for phase 3 of our Multilingual Editor Experiences in Small Wikis research project.
- Resolved an issue with Content Translation that caused the console to not display certain types of errors.
- Set the dropdown list of the select component to be empty by default, improving the user experience.
- Restored Machine Translation to the Content Translation master server after a release caused it to malfunction.
- Created a URL campaign for African languages for a COVID-19 translation project.
- Content Translation now handles an edge case where a known user tries to translate a page that is already being translated by an unknown user.
- Enabled Google Translate support in Content Translation for Kinyarwanda, Odia, Tatar, Turkmen and Uyghur.
- Adjusted how Content Translation manages machine translation warnings with regards to mathematical equations.
- Enabled Content Translation in Armenian Wikipedia as a default tool.
- Content Translation was tested on multiple servers to verify how it functions out of beta.
- Adjusted the threshold for machine translation (MT) content for the Chinese Wikipedia to prevent publishing when overall unmodified MT content is higher than 70% at community request.
- Enabled Google Translate support in Content Translation for the following languages: Amharic, Kyrgyz, Luxembourgish, Scots Gaelic, and Xhosa.
- Added the Sakha language to Content Translation’s list of languages supported by Machine Translation, through Yandex Translation.
- Migrated the CX abuse filter statistics script from MySQL to Hive.
- Reverted machine translation limits for the Chinese Wikipedia as a response to the community’s feedback.
- Completed design specifications for Section Translation and its features.
- Adjusted the machine translation threshold for Chinese Wikipedia to be 5% more strict based on community feedback.
- Tested a neural machine translation service, MarianNMT, to evaluate its hosting requirements and performance on our systems.
- Content translation enabled by default for Lithuanian.
- Improved testing environment by fixing an issue that resulted in a corrupt repository.
- More reliable section mapping in Section translation by preventing API errors.
- More reliable publishing of translations by preventing broken categories from stopping the whole process.
- Content translation enabled by default for Slovenian Wikipedia.
- Better support for translating in multiple sessions by restoring access to unfinished translations.
- Updated the default machine translation tool to Google Translate for the Chinese Wikipedia at the community's request.
- Expanded monthly reports to include a more convenient way to monitor changes in Content Translation use by language.
- Completed the design research report on Section translation to validate the concept and design ideas.
- Completed a full design research analysis of Section Translation sessions with Bengali, Javanese and Indonesian editors to help improve the user experience.
- Investigated the cause of some missing interlanguage links for articles translated using Content Translation.
- Improved monthly content translation analytics support by fixing errors that led to incorrect change tags when articles are not published.
- Improved consistency by fixing a regression that affected the visual styles of Content Translation.
- Supported translation-related community events by creating a URL campaign for WikiGap to give visibility to the translations created through it, and a survey to gather event participants' impressions of Content Translation (to be applied initially at an editathon on cybersecurity in India).
- Completed the analysis for the second round of research on Section Translation with Bengali Wikipedia editors, and conducted research sessions for the next round with Javanese editors. This completes the sessions planned for this study; only the final analysis and results are pending.
- Improved template support by generating mappings for template parameters for our target languages using a machine learning approach.
- Improved discovery with better detection of newcomers to show an invitation to translate.
- Started the process to evaluate performance improvements for OpusMT by getting access to a server for performance measurements.
- Content translation enabled by default for Estonian, Azerbaijani, and Malay.
- Published initial design ideas for Section translation entry points as part of the design exploration.
- Content translation enabled by default for Telugu, Kannada, Gujarati, Marathi, and Punjabi.
- Technical exploration for the architecture to support Section Translation development (both on desktop and mobile) following the recommendations of the Frontend Architecture Working Group.
- Better support for the Chuvash language by enabling machine translation support.
- Content translation enabled by default for Bosnian and Macedonian.
- Improvements in template support by setting up a server instance to generate the mappings for template parameters following the computationally intensive machine learning approach.
- For the first time, integrated into Content Translation a translation service that is both open source and based on Neural Machine Translation (OpusMT). OpusMT has been enabled for Assamese to experiment with how community translations could help bootstrap an open machine translation service (announcement and feedback).
- Translation limits in Content Translation have been adjusted for Assamese to prevent low quality translations, given the experimental nature of the OpusMT translation service used for the language.
- Translation limits in Content Translation have been adjusted for Telugu based on feedback from the community and the measurement of the current deletion ratios.
- Content translation enabled by default for Basque, Tamil, and Swahili.
- Improved support for references by preventing duplicate references in the source article.
- Updated testing servers to support the new parsing system.
- Content translation enabled by default for Afrikaans, Icelandic, Latvian, and Nepali Wikipedias.
- Updated test server instances with recent technology versions as required by recent MediaWiki updates.
- Enabled server support for Content translation in two new Wikipedias: Sakizaya (szy) and Mon (mnw) Wikipedias.
- Support for the Wiki for Human Rights initiative by creating a URL campaign in Content translation.
- More reliable support for highlighting sentences across original and translated content and preventing wrongly formatted duplicates of ISBN links.
- Support for compatibility with the new version of the parsing system.
- Content translation configured to be enabled as a default tool (out of beta) for newly created Wikipedias.
- Content translation enabled by default for Albanian Wikipedia at editors' request.
- Better integrated invites for users to translate when creating a new article, showing a specific article to translate or a one-time general invite depending on the context.
- Compilation of prolific translators in selected wikis to raise awareness of our initiatives and plan further research.
- Set up infrastructure for a local instance of OpusMT based on Marian Neural Machine Translation.
- Section translation research preparations: research brief, test protocol, and recruitment communications (participant screener, community messages, and direct messages).
- Initial design ideas and prototypes for section translation.
- Started conversations with all selected communities (Malayalam, Bengali, Tagalog, Javanese, and Mongolian) with potential to grow with translation.
- Improved discoverability with basic support for surfacing a specific translation suggestion as an alternative to creating an article from scratch.
- Content translation enabled by default for Tagalog, Central Bikol, Malayalam, Bengali, and Mongolian Wikipedias.
- Better guidance by encouraging users to complete old translations with a notification after 3 months.
- Expanded machine translation by allowing external tools to use all language pairs, even those restricted by their Wikipedia communities.
- Content translation enabled by default for Javanese Wikipedia (announcement).
- Clearer entry points by removing redundant invite to translate.
- Improved entry points with more reliable placement for the interlanguage links dialog and reduced performance penalty for readers.
- Improved discoverability by providing an updated invite that proposes translating instead of creating an article from scratch, avoiding unnecessary steps when searching for the article to translate, and surfacing the persistent entry points more reliably using global preferences.
- Improved suggestions by surfacing suggestions as the default view when users are not already working on any translation, making it possible to link to any specific view in the dashboard, providing smarter services to find potential source articles for a given title in another language and potential translators, and using a clearer visual metaphor to keep suggestions for later.
- Better control for users to keep suggestions for later by keeping the "for later" list visible for all languages.
- Exposing better content gaps by showing the suggestions view by default when the user has no in-progress translations.
- Better support for exposing the tool as default by supporting settings to disable the entry points for wikis where the tool is out of beta.
- Better guidance for users by showing an invite to try the main entry point to those users that discovered the tool through a non-persistent invite so that they learn how to find the tool the next time.
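Several entries above adjust per-wiki limits that block publishing when too much of a translation is unmodified machine translation. Examples are a limit of 95% for Vietnamese and one of 70% for Chinese. The sketch below only illustrates the general idea of such a check. It is not Content Translation's actual implementation.

To keep it simple, the sketch assumes that the unmodified share is the fraction of final characters in paragraphs the user left identical to the raw MT output. Real measurements of modification are likely more fine-grained. The `MT_LIMITS` mapping and the function names are hypothetical.

```python
# Hypothetical per-wiki limits: maximum allowed share of unmodified MT content.
# Values are taken from the entries above; the mapping itself is illustrative.
MT_LIMITS = {
    "vi": 0.95,  # Vietnamese Wikipedia
    "zh": 0.70,  # Chinese Wikipedia (one of several adjustments listed above)
}


def unmodified_ratio(paragraphs: list[tuple[str, str]]) -> float:
    """Share of final characters in paragraphs left exactly as MT produced them.

    `paragraphs` holds (mt_output, final_text) pairs. Simplifying assumption:
    a paragraph counts as unmodified only if final_text == mt_output.
    """
    total = sum(len(final) for _, final in paragraphs)
    if total == 0:
        return 0.0
    unmodified = sum(len(final) for mt, final in paragraphs if final == mt)
    return unmodified / total


def can_publish(wiki: str, paragraphs: list[tuple[str, str]],
                default_limit: float = 1.0) -> bool:
    """Block publishing when the unmodified share is higher than the wiki's limit."""
    limit = MT_LIMITS.get(wiki, default_limit)
    return unmodified_ratio(paragraphs) <= limit
```

In this sketch, `default_limit=1.0` means that wikis without a configured limit are never blocked. Tightening the rule for a community then only requires adding or lowering its entry in the mapping.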