Content translation/Section translation
Section translation is an expansion of the current Content Translation capabilities. It enables users to expand existing Wikipedia articles by translating new sections. Section translation is also designed to work on mobile devices as well as desktop, using a modernized front-end architecture. This gives users opportunities to translate that were not possible with Content Translation before.
Section translation is the main project of the Boost initiative which is aimed at expanding the use of translation to help more communities to grow. Below you can find more information about the wikis the project will be focusing on.
Please provide any feedback about this initiative on the discussion page. We are interested in hearing your ideas on how to help communities grow by using translation. More details and updates are provided below.
Try the tool
Section translation is in early development. For now it is available on a test server and on Bengali Wikipedia.
Goals and impact
These are the main goals for the project and metrics to measure them:
- Grow the community of translators. Attract more users to translate (using different devices) who remain active over time and help recruit new translators.
- Metric: Increase the percentage of new editors that completed their second translation in a month.
- Grow the content available. Increase the coverage in both the topics available and their depth (information they contain).
- Metric: Double the number of weekly translations on selected wikis (to create new content or extend existing content).
Boost initiative: communities with potential to grow with translation
Content translation has been successful in supporting the translation process on many Wikipedia communities. It has already helped to create thousands of new articles while encouraging the creation of good quality translations.
We want to focus on Wikipedias with potential to grow by using translation. Research on Section translation, and other initiatives to make translation more visible in general (such as providing Content Translation access by default, out of beta), will involve and focus on the needs of this particular group of wikis.
This initiative is part of a long-term vision to support cross-wiki content propagation, and it is aligned with the Wikimedia Foundation plans to "Grow participation globally, focusing on emerging markets" and increasing "Worldwide readership".
Communities to focus on
We focus on Wikipedias with fewer than 100K articles, significant editing activity for the size of their wiki (more than 70 active editors), and little current use of translation (fewer than 100 translations per month).
Given the above, we initially selected the following Wikipedias: Malayalam, Bengali, Tagalog, Javanese, Albanian, and Mongolian.
As the project became more visible, other communities showed interest in it, and some initiatives will be applied there too. These Wikipedias are Central Bikol and Arabic.
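The selection criteria described in this section can be sketched as a simple filter. This is only an illustration: the `WikiStats` type, the `is_focus_candidate` function, and the sample figures are hypothetical and are not taken from the project's actual data or tooling. Only the three thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class WikiStats:
    """Hypothetical per-wiki figures; field names are illustrative."""
    name: str
    articles: int
    active_editors: int
    monthly_translations: int


def is_focus_candidate(wiki: WikiStats) -> bool:
    """Apply the three thresholds stated in the text."""
    return (
        wiki.articles < 100_000              # fewer than 100K articles
        and wiki.active_editors > 70         # more than 70 active editors
        and wiki.monthly_translations < 100  # fewer than 100 translations per month
    )


# Made-up sample data, not real statistics.
wikis = [
    WikiStats("example-small-active", 60_000, 120, 40),
    WikiStats("example-large", 900_000, 3_000, 40),
    WikiStats("example-already-translating", 60_000, 120, 450),
]

candidates = [w.name for w in wikis if is_focus_candidate(w)]
print(candidates)
```

Wikis added later because their communities showed interest would not necessarily pass such a filter, so the criteria are best read as a starting point rather than a strict rule.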
Additional considerations
This small group of wikis represents a much larger group of communities that can grow with the proposed improvements. Selecting a small set allows us to focus and collaborate more closely with them. Nevertheless, we expect the improvements to benefit a larger number of wikis, and users from all wikis are definitely welcome to participate.
Provide your feedback
Please provide any feedback on the discussion page. We are interested in hearing your ideas on how to help communities grow by using translation and Section translation in particular.
You can also check the project in Phabricator to track the progress of the different tasks and share any comments about them.
Research
The Language Team is also conducting research to better understand the language-related needs of different wikis. This research also supports the design process by evaluating ideas around new ways of expanding existing articles and contributing from mobile devices.
- Section Translation Research – The Section Translation Design Research project evaluated current mobile prototypes with two small wikis. The project evaluated not only initial prototypes, but also a number of design changes after each round of testing. The project also supported design exploration by gathering interview data around critical assumptions of Section Translation, including the role of mobile and the relevance of article sections as a meaningful unit of translation.
- Section Translation Usability Testing – The Section Translation Usability Testing (Bengali Wikipedia) project aimed to learn about the experiences of the first editors using Section Translation on the Bengali Wikipedia.
Status updates
April 2021
- Enabled Content Translation as a default tool in the Uzbek, Amharic, and Maltese Wikipedias.
- Content translation provides users with suggestions of articles to translate. Based on user feedback, it would be useful to provide some general control over the knowledge area these suggestions are about.
March 2021
- Enabled SectionTranslation in testwiki.
- Created and revised a research plan and brief for Section Translation Entry Points using feedback from Language Team.
- Recruited research participants from the first wiki that Section Translation is deployed to, Bengali, and scheduled research sessions.
- Conducted Section Translation research sessions.
- Suggestions are now showing properly after changing the language pair in Section Translation.
- Resolved ‘undefined index’ error that appeared for some users of the Section Translation tool.
- Updated Apertium to the 3.7 release from upstream.
- Simplified the Section Translation quick tutorial based on research findings to make the order of the steps more intuitive and text shorter so it’s easier for users to read.
- Updated the “suggest” API and app so that users only see 5 suggested articles, making the feature more accessible to users with slow phones.
- Updated the Special:CX menu so that each menu item appears on a separate line.
- Explored ideas for Section Translation entry points: opportunistic (users find opportunities as they navigate content), persistent (stable entry points users can always find), and proactive (the system invites users to translate in response to previous actions).
February 2021
- Created, revised, and tested protocols for Section Translation research on the first wiki.
- Temporarily hid parts of the Section Translation dashboard that are not yet supported on mobile so that users can only see fully functional parts of the workflow.
- As part of the "Pick a sentence" step, users are expected to see only one selected sentence at a time, which can be highlighted in either yellow (if it has not been translated) or blue (if it was translated already).
- Corrected code errors that caused Section Translation dashboard recommendations to not appear and the loading gif to remain indefinitely.
- Based on UX Writer input, "Proposed translation from [MT service]" will be changed to "Suggested translation from [MT service]" on the Section Translation tool.
- After successfully publishing the translation contents, users will no longer see a warning about leaving an in-progress translation to avoid confusion.
- When the user applies a proposed translation, the text should be black to allow users to distinguish it from the untranslated content (gray text).
- Bengali Wikipedia editors participated in previous research about Section Translation with positive results, and we want to enable an initial version of the tool in their wiki. This will allow users to experience how the tool works in a realistic environment without the limitations of the test instance and provide early feedback.
- Uzoma (our CRS) helped us make our announcement to the Bengali Wikipedia; we are monitoring the relevant talk pages for feedback.
- Investigated how easy it would be to add Microsoft Translator API support in cxserver.
- Users selecting a suggestion to translate may decide to go with a different one instead (e.g., after seeing the sections available to translate). It is now possible to return to the dashboard.
- We plan to enable the Section Translation tool in one wiki to learn from its use with real content. Since the tool was designed to select any target language, users may be able to select a different language to publish their translation on. We investigated what happens when someone selects a different language to publish in order to avoid problematic situations.
- Fixed an issue in the "Pick a sentence" step: when the user selects a sentence to translate, clicking on an image no longer causes the browser to navigate away from the page.
- Fixed bug that caused translations to not be published even after the user clicks the publish button.
- Deployed Section Translation on the Bengali Wikipedia on 22 February 2021!
- Adjusted the threshold for Vietnamese to prevent publishing when overall unmodified content is higher than 90%.
- Verified that MT is available in production for Section Translation.
- Enabled Section Translation on Bengali Wikipedia.
- As part of the "Show follow-up options after publishing" step of the Section Translation workflow, users now get an invite to translate another section when they get confirmation that their most recent translation was published.
- As part of the "Preview and publish" step of the mobile editor for Section Translation, users can preview their translation contents before publishing them.
- As part of the "Pick a sentence" step (T251551) of the Section Translation mobile editor, users can select sentences that have been already translated to see information on the translation and an option to edit it.
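Several updates on this page adjust a per-wiki threshold that prevents publishing when too much machine translation (MT) output is left unmodified, for example 90% for Vietnamese in the list above. A minimal sketch of such a check follows. This page does not describe how Content Translation actually measures unmodified content, so the token-level comparison with Python's `difflib`, and the function names `unmodified_ratio` and `can_publish`, are assumptions made for illustration only.

```python
import difflib


def unmodified_ratio(mt_text: str, published_text: str) -> float:
    """Share of MT tokens that survive in the published text.

    Assumption: whitespace tokens compared with difflib; the real tool
    may measure unmodified content differently.
    """
    mt_tokens = mt_text.split()
    if not mt_tokens:
        return 0.0
    matcher = difflib.SequenceMatcher(
        a=mt_tokens, b=published_text.split(), autojunk=False
    )
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(mt_tokens)


def can_publish(mt_text: str, published_text: str, threshold: float = 0.90) -> bool:
    """Block publishing when the unmodified share is higher than the threshold."""
    return unmodified_ratio(mt_text, published_text) <= threshold


mt = "one two three four five six seven eight nine ten"
edited = "one two THREE four five six seven EIGHT nine ten"

print(can_publish(mt, mt))      # all MT output kept unchanged
print(can_publish(mt, edited))  # two of ten tokens edited
```

Passing the threshold as a parameter mirrors the per-language adjustments mentioned in the updates, where the limit is tuned for a given wiki based on community feedback.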
January 2021
- After the publication of a proposal for translatable modules, we evaluated the implementation efforts required and developed a plan.
- Sent recruitment messages and communicated with possible participants to try to identify additional test users who match our criteria (general level of experience with CX, language pairs supported by Apertium, openness to testing on mobile).
- In order to analyze and visualize the Content Translation funnel, we collected the events that make up the funnel in collaboration with our Product Analyst.
- Analyzed feedback from test users, then prepared and presented the report to the Language team.
- Enabled Content Translation in the Tsonga (ts) Wikipedia as a default tool.
- Requested a campaign tag for Wikidocumentaries in Content Translation so that the link would work by bypassing default settings.
- Fixed an error that caused link and template adaptation to fail sometimes (which prevented translating a given section).
- Created a research plan and research brief for Section Translation’s first wiki. Also gathered feedback from the Language Team and revised the plan accordingly.
- Set user expectations by indicating that Section Translation mobile support is still experimental via a notice at the top of the translation dashboard on mobile.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, an option to select alternative machine translation (MT) options is provided.
- Based on the input from the technical exploration, we explored design options to find a way for users to select a specific topic area to filter the suggestions.
December 2020
- Updated the Wikipedia logo on the Section Translation testing server so that it is accurate.
- Enabled content translation in Breton (br) Wikipedia as a default tool.
- Corrected OpusMT errors that caused users to be unable to translate from English to Central Bikol.
- Since Content Translation can now use Parsoid directly, other tools have been deprecated.
- In Section Translation, VE is now preloaded which reduces delays for the user during translation editing.
- Enabled Content Translation in Igbo (ig) Wikipedia as a default tool.
- Enabled Content Translation in Sinhalese (si) Wikipedia as a default tool.
- Enabled Content Translation in Asturian (ast) Wikipedia as a default tool.
- Enabled Content Translation in Georgian (ka) Wikipedia as a default tool.
- Improved MT support for Central Bikol by implementing the use of OpusMT.
- Analyzed data and prepared a report on Multilingual Editors Experiences with Reporting.
- As part of the Multilingual Editors Experience research, journey maps and small wiki personas were drafted.
- Added validation for invalid titles in notifications, creating a workaround that logs errors.
November 2020
- Increased standard font sizes from 16 pt to 18 pt for Mobile VE in Section Translation.
- Created a responsive and mobile-friendly custom skin for Content Translation special pages.
- Fixed mobile screen height so that the skip tutorial button is visible for all users.
- As part of the "Preview and publish" step of the mobile editor for Section Translation, users can now publish their translation contents after previewing them.
- As part of the "Compare the contents before translating" step of the Section Translation workflow, a sticky header lets the user scroll through the contents while the options to switch between source and target are visible.
- Corrected a code error that prevented logged-out users from viewing some Translation stats pages.
- Created the Section Translation test server feedback collection tool which includes messaging and guidance for participants as well as multiple feedback mechanisms.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, translators can now also translate the section titles within an article.
- Language team’s OpusMT cloud instance was updated to the newest version.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, a card is now shown at the bottom of the viewport with the proposed Machine Translation.
- An error associated with changing the target language of a translation was rectified.
October 2020
- As part of the "Edit a sentence" step for Section Translation, translations will be loaded using the Mobile Visual Editor.
- Analyzed roughly 45 hours of videos and transcripts from 21 different research sessions as part of the Multilingual Editor Experiences in Small Wikis research.
- As part of the "Edit a sentence" step for Section Translation, users will now be able to see the original sentence above the visual editor with options to paginate and expand.
- Investigated cases where translations present unusually high deletion ratios or confusing machine translation (MT) statuses in order to understand usage and needs surrounding MT services and the content translation tool.
- As part of the "Edit a sentence" step for Section Translation, a translation will be loaded with the Mobile Visual Editor, allowing users to edit contents in the usual way and leave the editor.
- As part of the "Pick a sentence" step for Section Translation, users will be able to choose a sentence to edit.
- As part of the "Edit a sentence" step for Section Translation, users will have access to the original sentence as they edit that sentence’s translation.
- Adjusted the threshold for Vietnamese to prevent publishing when overall unmodified content is higher than 95% based on community feedback.
- Corrected a language-data error that led to Compact Language Links not showing the Chavacano language as an option.
- Corrected errors that caused "Section already present banner" to show up momentarily when the screen loads.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, users can interact with sentences by tapping on them and getting feedback.
- Enabled Central Bikol in OpusMT labs instance to allow editors to check when we inform the community.
- Enabled Content Translation in Oriya Wikipedia as a default tool.
- Enabled Content Translation in Somali Wikipedia as a default tool.
- Enabled Content Translation in Irish Wikipedia as a default tool.
- Enabled Content Translation in Belarusian Wikipedia as a default tool.
- Enabled Content Translation in Esperanto Wikipedia as a default tool.
- Checked the Apertium configuration for Serbo-Croatian support and moved forward by enabling Google as the default machine translation service instead.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, the selected sentence can be highlighted.
- Completed the Section Translation test server user feedback plan.
- Analyzed videos and transcripts for Multilingual Ed X research sessions - potential editors.
- Any project using GitLab can now opt-in to receive translation updates via Merge Requests.
September 2020
- Fixed an issue where Content Translation failed to add sitelinks for translated pages to Wikidata.
- Researched and drafted brief for the Machine Translation and Human Interaction project.
- A document cataloging all the Machine Translation configuration options was created to make it easier on our users.
- Visual Editor integration into the Section Translation project is ongoing after initial exploration.
- Ladin Wikipedia (lldwiki) was added to cxserver.
- Scheduled and prepped executive summary discussions with the Language team as part of the Multilingual Editors Experience.
- Fixed a bug where adding categories was not possible when using the Content Translation tool.
- Fixed a UI glitch when users used the “compare contents” option in Content Translation.
- Completed the “Compare” step of the Section Translation workflow, allowing users to access the selected source section and the article in the target language.
- Identified the steps/blockers needed to support test automation using Continuous Integration (CI) infrastructure for Section Translation.
- Completed the workflow step for Section Translation that supports alternative machine translation selection options.
- Completed the workflow step for Section Translation that allows a translator to see a placeholder where their new translation will be in the existing article.
- Resolved errors that caused cropped text instances.
- A detailed specification document to facilitate discussing Translatable Modules was written and published on MediaWiki.
- Corrected errors that led to post-save operations for translation units failing on occasion.
- Enabled Content Translation in Urdu Wikipedia as a default tool.
- Enabled Content Translation in Welsh Wikipedia as a default tool.
- Enabled Content Translation in Bashkir Wikipedia as a default tool.
- Excluded testwikis and private wikis from ContentTranslation (CX) draft purge script and separated the CX database on testwiki.
- As part of the "Compare the contents before translating" step, support for opening articles was completed.
- As part of the "Pick a sentence" step of the Section Translation mobile editor, section contents will now display so that users can navigate on a sentence-by-sentence basis.
August 2020
- Investigated how to best apply QA practices to Section Translation as it uses a new vue.js-based architecture.
- Enabled Machine Translation for closely-related languages based on community input.
- Enabled Content Translation in Sundanese Wikipedia as a default tool.
- Corrected errors that caused Cx2 to add unnecessary id attributes to elements that do not need them, such as numeric id values to tables.
- Corrected code that led to a syntax error while using Content Translation.
- Recruitment for Multilingual Editors Experience research was completed.
- As part of Phase 3 of Multilingual Ed X, many research sessions with new/potential editors were scheduled and completed.
- As part of Phase 3 of Multilingual Ed X, many research sessions with existing editors were scheduled and completed.
- Corrected code errors that led to a Content Translation publishing failure in certain cases.
- Provided section order information in the section suggestions API for Section Translation.
- Updated Content Translation highlight colors to be appropriate for the background colors.
- Machine Translation services supporting Chinese Mandarin were opened to closely related languages and variants to improve the users’ experience using Content Translation.
- All steps until the editor are completed as Section Translation development continues.
- Google Translate is now the default machine translation for Ukrainian.
- Content Translation is now a default tool in the Assamese Wikipedia.
- Content Translation is now a default tool in the Burmese Wikipedia.
July 2020
- Confirmed resolution of a console error that caused some issues when loading the Content Translation page.
- Corrected errors on Youdao MT that caused machine translation failure.
- Completed building the UI framework to support the Content Translation dashboard and Section Translation workflows.
- Created the API that identifies the missing sections between articles in different languages, a necessary component for Section Translation.
- Resolved issues with user-added categories being ignored in development environments of Content Translation.
- Completed the debrief guides for moderators to complete after each research session.
- Revised some code so that Content Translation produces less log spam.
- Users are able to change their interface language again when using Content Translation after resolving code errors.
- Research sessions have started for Multilingual Editors Experience project, 7 sessions completed this week. Ongoing recruitment tasks completed.
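One of the updates above mentions an API that identifies the sections missing between articles in different languages. The idea can be sketched as follows. This is not the actual API: the function `missing_sections`, the `title_map` of known section-title equivalents, and the sample titles are all hypothetical, and treating an unmapped title as missing is an assumption of this sketch.

```python
def missing_sections(source_sections, target_sections, title_map):
    """Return source section titles with no counterpart in the target article.

    source_sections: section titles of the source article, in article order.
    target_sections: section titles already present in the target article.
    title_map: hypothetical mapping from source titles to target-language titles.
    """
    present = set(target_sections)
    missing = []
    for title in source_sections:
        mapped = title_map.get(title)
        # Assumption: a title with no known equivalent is reported as missing.
        if mapped is None or mapped not in present:
            missing.append(title)
    return missing


# Made-up example: an English source article and a Spanish target article.
source = ["History", "Geography", "Economy"]
target = ["Historia"]
title_map = {"History": "Historia", "Geography": "Geografía"}

print(missing_sections(source, target, title_map))
```

Keeping the result in source order reflects the fact that section order matters when suggesting what to translate next.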
June 2020
- Completed a review process for the main tickets for Section Translation, reporting missing/unclear aspects that need to be better detailed in order to make the development process more efficient.
- Content Translation is now out of beta for the Galician Wikipedia.
- Corrected default display setting so that article images in Content Translation dashboard are not blurred anymore.
- Improved Content Translation's automatic template parameter mapping across languages in small Wikipedias by altering template parameters.
- Content Translation is enabled for the newly created Awadhi Wikipedia.
- Corrected the issue causing Content Translation’s beta feature to not always be automatically enabled when starting a new translation.
- Re-aligned UI elements so that the Content Translation dashboard displays correctly.
- Researched and wrote a research brief for Multilingual Editor Experiences to help inform the Design and Development teams’ decisions.
- A paper co-authored by Santhosh Thottingal and Jörg Tiedemann (Department of Digital Humanities, University of Helsinki), titled "OPUS-MT – Building open translation services for the World", was accepted for the 22nd Annual Conference of the European Association for Machine Translation, and the proceedings are published (see pages 479-480). It outlines the project and the collaboration between WMF and the University of Helsinki.
- Created research protocol for phase 3 of our Multilingual Editor Experiences in Small Wikis research project.
- Resolved an issue with Content Translation that caused the console to not display certain types of errors.
- Set the dropdown list from select component to be empty by default, improving the user experience.
- Restored Machine Translation to the Content Translation master server after a release caused it to malfunction.
- Created a URL campaign for African languages for a COVID-19 translation project.
- Content Translation handled an edge case where a known user was trying to translate a page which was already being translated by an unknown user.
- Enabled Google Translate support in Content Translation for Kinyarwanda, Odia, Tatar, Turkmen and Uyghur.
- Adjusted how Content Translation manages machine translation warnings with regards to mathematical equations.
May 2020
- Enabled Content Translation in Armenian Wikipedia as a default tool.
- Content Translation was tested on multiple servers to verify how it functions out of beta.
- Adjusted the threshold for machine translation (MT) content for the Chinese Wikipedia to prevent publishing when overall unmodified MT content is higher than 70% at community request.
- Enabled Google Translate support in Content Translation for the following languages: Amharic, Kyrgyz, Luxembourgish, Scots Gaelic, and Xhosa.
- Added the Sakha language to Content Translation’s list of Machine Translation supported languages through Yandex Translation.
- Updated the CX abuse filter statistics script from MySQL to Hive.
- Reverted machine translation limits for the Chinese Wikipedia as a response to the community’s feedback.
- Completed design specifications for Section Translation and its features.
- Adjusted the machine translation threshold for Chinese Wikipedia to be 5% more strict based on community feedback.
- Tested a neural machine translation service, MarianNMT, to evaluate its hosting requirements and performance on our systems.
April 2020
- Content translation enabled by default for Lithuanian.
- Improved testing environment by fixing an issue that resulted in a corrupt repository.
- More reliable section mapping in Section translation by preventing API errors.
- More reliable publishing of translations by preventing broken categories from stopping the whole process.
- Content translation enabled by default for Slovenian Wikipedia
- Better support for translating in multiple sessions by restoring access to unfinished translations
- Updated the default machine translation tool to Google Translate for the Chinese Wikipedia at community request
- Expanded monthly reports to include a more convenient way to monitor changes in Content Translation use by language
- Completed the design research report on Section translation to validate the concept and design ideas.
- Completed a full design research analysis on Section Translation Sessions with Bengali, Javanese and Indonesian editors to help improve user experience
- Investigated the cause of some missing interlanguage links for articles translated using Content Translation.
- Improved monthly Content Translation analytics support by fixing errors that led to incorrect change tags when articles are not published.
March 2020
- Improved consistency by fixing a regression that affected the visual styles of Content Translation.
- Supported community events related to translation by creating a URL campaign for WikiGap to give visibility to the translations created through it, and a survey to gather participants' impressions of Content Translation at events (initially to be applied at an editathon on cybersecurity in India).
- Completed the analysis for the second round of research on Section Translation with Bengali Wikipedia editors, and conducted research sessions for the next round with Javanese editors. This completes the sessions planned for this study; only the final analysis and results are pending.
- Improved template support by generating mappings for template parameters for our target languages using a machine learning approach.
- Improved discovery with better detection of newcomers to show an invitation to translate.
- Started the process to evaluate improving the performance of OpusMT by getting access to a server for performance measurements.
- Content translation enabled by default for Estonian, Azerbaijani, and Malay.
- Published initial design ideas for Section translation entry points as part of the design exploration.
February 2020
- Content translation enabled by default for Telugu, Kannada, Gujarati, Marathi, and Punjabi.
- Technical exploration for the architecture to support Section Translation development (both on desktop and mobile) following the recommendations of the Frontend Architecture Working Group.
- Better support for the Chuvash language by enabling machine translation support.
- Content translation enabled by default for Bosnian and Macedonian.
- Improvements in template support by setting-up a server instance to generate the mappings for template parameters following the computationally-intensive machine learning approach.
- Integrated for the first time a translation service that is both open source and based on Neural Machine Translation in Content Translation (OpusMT). OpusMT has been enabled for Assamese to experiment how community translations could help bootstrap an open machine translation service (announcement and feedback).
- Translation limits in Content Translation have been adjusted for Assamese to prevent low quality translations, given the experimental nature of the OpusMT translation service used for the language.
- Translation limits in Content Translation have been adjusted for Telugu based on feedback from the community and the measurement of the current deletion ratios.
- Content translation enabled by default for Basque, Tamil, and Swahili.
January 2020
- Improved support for references by preventing duplicate references in the source article.
- Update of testing servers to support the new parsing system.
- Content translation enabled by default for Afrikaans, Icelandic, Latvian, and Nepali Wikipedias.
December 2019
- Updated test server instances with recent technology versions as required by recent MediaWiki updates.
- Enabled server support for Content translation in two new Wikipedias: Sakizaya (szy) and Mon (mnw) Wikipedias.
- Support for the Wiki for Human Rights initiative by creating a URL campaign in Content translation.
- More reliable support for highlighting sentences across original and translated content and preventing wrongly formatted duplicates of ISBN links.
- Support for compatibility with the new version of the parsing system.
- Content translation configured to be enabled as a default tool (out of beta) for newly created Wikipedias
November 2019
- Content translation enabled by default for Albanian Wikipedia at editors' request.
- Better integrated invites for users to translate when creating a new article, showing a specific article to translate or a one-time general invite depending on the context.
- Compilation of prolific translators in selected wikis to raise awareness of our initiatives and plan further research.
- Infrastructure to set up a local instance of OpusMT based on Marian Neural Machine Translation.
- Section translation research preparations: research brief, test protocol, and recruitment communications (participant screener, community messages, and direct messages)
October 2019
- Initial design ideas and prototypes for section translation.
- Started conversations with all selected communities (Malayalam, Bengali, Tagalog, Javanese, and Mongolian) with potential to grow with translation.
- Improved discoverability with basic support for surfacing a specific translation suggestion as an alternative to creating an article from scratch.
- Content translation enabled by default for Tagalog, Central Bikol, Malayalam, Bengali, and Mongolian Wikipedias.
- Better guidance by encouraging users to complete old translations with a notification after 3 months.
- Expand machine translation by allowing external tools to use all language pairs even if restricted by their Wikipedia communities.
September 2019
- Content translation enabled by default for Javanese Wikipedia (announcement)
- Clearer entry points by removing redundant invite to translate.
- Improved entry points with more reliable placement for the interlanguage links dialog and reduced performance penalty for readers.
August 2019
- Improved discoverability by providing an updated invite to propose translating instead of creating an article from scratch, avoiding unnecessary steps when searching for the article to translate, and surfacing the persistent entry points in a more reliable way using global preferences.
- Improved suggestions by surfacing suggestions as the default view when users are not working on any translation already, making it possible to link to any specific view in the dashboard, providing smarter services to find potential source articles for a given title in another language and potential translators, and using a clearer visual metaphor to keep suggestions for later.
- Better control for users to keep suggestions for later by keeping the "for later" list visible for all languages.
- Exposing better content gaps by showing the suggestions view by default when the user has no in-progress translations.
July 2019
- Better support for exposing the tool as default by supporting settings to disable the entry points for wikis where the tool is out of beta.
- Better guidance for users by showing an invite to try the main entry point to those users that discovered the tool through a non-persistent invite so that they learn how to find the tool the next time.