Content translation/V2

Content translation version 2 (CX2) is a major refactoring and architectural update of Content translation (CX).

The goal is to provide a solid and reliable translation tool that is aligned with the Wikimedia standards in technology and design, and provides a great way to contribute for newcomers.

Version 2 uses the Visual Editor editing surface, has an OOUI-based front end, and follows the [https://design.wikimedia.org/style-guide/ Wikimedia design guidelines].

In addition, learnings from existing and new research on the experience of new editors will be used to identify improvements to make translation a great way to start contributing to Wikipedia.

The plan is to gradually replace version 1 with version 2.

A backwards compatibility plan will make sure that content created by users during the transition period won't be affected.

Try the new version
The new version is now the default: you can access the tool from Special:ContentTranslation on Wikipedia in any language. When you start a new translation with Content translation, you will be using the new version. Note that the previous version will still be used when you open old translations that you started with that version.

The new version is still in active development. Please use it to translate articles and report feedback. We love to hear what works for you and what still needs improvement.

Using version 2 in production creates content in real wikis when you publish your translation, so it is not suited for experiments that do not result in a quality translation. For more experimental testing, you can try the new version on our testing servers. They are a separate wiki (you need to create a new user account since the log-in service is not integrated with Wikimedia projects). Although the content you translate on the test servers comes from real Wikipedias, the published content will only be created on the test server. This lets you experiment without interfering with the work done in those projects.

Provide feedback
We are interested in hearing how well the new version works for both new and existing users of Content Translation. If you are providing specific feedback about version 2, make sure to mention it refers to CX2. Otherwise, it may be harder to identify the issue. You can use the CX2 discussion page for specific feedback on CX2, and report issues in Phabricator.

Track created articles
A dashboard shows the translations published with the new version and the number of users publishing them. In addition, articles published with the new version are marked with the #contenttranslation-v2 edit tag to make it easier to find them (e.g., using Recent Changes) and to evaluate the quality of the content.
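As an illustration of finding tagged edits programmatically, the following minimal sketch builds a MediaWiki Action API query for recent changes that carry the edit tag. It assumes the tag name is passed without the leading "#", and the helper name `recent_changes_url` and the chosen host are only examples; the sketch builds the URL and does not send any request.

```python
from urllib.parse import urlencode


def recent_changes_url(wiki_host, tag="contenttranslation-v2", limit=50):
    """Build an Action API URL listing recent changes that carry an edit tag.

    Assumption: the tag is registered on the wiki without the leading '#'.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,                          # filter by edit tag
        "rcprop": "title|user|timestamp|tags",  # fields to return per change
        "rclimit": limit,
        "format": "json",
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


# Example host; any Wikipedia language edition could be used instead.
print(recent_changes_url("es.wikipedia.org"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client to review the tagged edits.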

Features
The new version will include a more powerful editing surface, which will bring new possibilities that were repeatedly requested by translators using the tool. However, other features from version 1 won't be initially available with the new version.

Major features
A general CX2 roadmap describes the planned interventions in more detail. The main areas of intervention are:


 * Align with the Wikimedia standards in technology and design
 * Visual Editor's editing surface with more editing tools to insert and edit templates, tables, multimedia, categories, etc.
 * Reliable undo/redo support.
 * UI revamp based on UI Standardisation initiative and OOUI components
 * Quality control mechanisms. Control user modifications in more detail to encourage translators to create quality content.
 * A great way to contribute for newcomers
 * Machine translation support for template parameters, reference texts and practically all kinds of elements on screen. In version 1, machine translation was limited to paragraphs alone.
 * Better support for References and Templates
 * Ability to add and remove categories
 * Solid and reliable
 * Fixing lots of bugs that were too difficult to handle with the previous version

Missing features from the current version
The features listed above are possible with the new technical architecture. However, to deliver those improvements soon, we have to limit the effort spent on rewriting all the existing tools in CX1. Thus, some of the existing tools won't be available in CX2 initially. We selected those based on our observations of current use, the value they provide in version 1 and their complexity, but we are looking for your feedback during this process.

These are the tools currently in CX1 that will be missing initially for CX2:


 * Custom template translation editor. CX1 added support for a side-by-side editor of templates that allowed translators to map their parameters. The initial implementation made it possible to evaluate a promising concept, but it was far from complete, and rewriting it for CX2 would require significant effort. Initially, the standard template editor dialog provided by Visual Editor will be available in CX2 instead. Although it is not optimized for transferring information across languages, it provides basic support for editing all parameters of a template in the translation.
 * Dictionaries. CX1 had experimental support for dictionary information lookup for a few language pairs. Dictionaries are a very relevant tool for translators, and we'll keep track of the progress of Wikimedia projects in this area that will enable their integration in the future. However, providing support in CX2 makes more sense when there is a clear plan to integrate more dictionaries.
 * Progress indicator in the editor. A progress bar in CX1 showed how much of the article was translated and how much was missing. This information will still be visible from the dashboard, but not while editing the article. Based on our observations of users, having it in the editor was not providing much value.
 * Announcements of new machine translation services. The automatic translation card became highlighted when a new machine translation service was made available for the current language. This was especially useful in the initial stages of the tool where new services were added regularly. We can reconsider this feature once the migration to version 2 is completed and we plan to integrate new machine translation services in the future.




 * The new version is the only version available for new translations, even when started through the URL. Old translations started with version 1 will still use it for backwards compatibility.
 * Improved support for references by not translating reference contents automatically, since information such as book titles is often kept untranslated, and by avoiding warnings that insist on translating the contents of the reference list.
 * Support for standard visual components on the new translation dialog.
 * Better guidance by communicating that publishing is only allowed for experienced editors on wikis configured that way. This configuration has not been applied to any wiki yet, and should make obsolete the abuse filters that some communities implemented.
 * Better guidance by providing more detailed publish options in a dialog and more detailed information when users are not allowed to publish translations with too much unmodified contents.
 * Improved persistence for templates by preventing user changes from getting lost across sessions.
 * Better support for exploring links in the source document.
 * Performance improvements by reducing the number of code modules delivered to users, and regressions corrected for placeholder icons, and for event logging of suggestions.
 * Better user guidance by communicating the English Wikipedia limitation to only allow publishing to extended confirmed users in a more detailed way and before the user makes the translation effort.


 * Better guidance on how to get started by inviting users to add the first paragraph to the translation.
 * Support for the automatic removal of old translation drafts.
 * Better tracking of user modifications to the content by ignoring the list of references when computing unmodified machine translation.
 * Improved machine translation support by correcting extra spaces introduced by Google Translate, and preventing reference lists from being sent to translation services.
 * Better guidance by showing an option to reset the translation only when the user made modifications to the translation.
 * Improved access to data on published translations by fixing wrongly formatted JSON exports.
 * Better guidance by communicating when the session has expired.
 * Improved development environment by reducing the build time caused by unnecessary code dependencies, enabling testing for local wiki restrictions, and updating infrastructure (service runner).
 * More reliable rendering of the content by preventing the spell-checker and input methods from showing for the read-only source article.


 * Support to inspect elements in the source article such as references, templates and other content with non visible information.
 * More reliable generation of the published content by preventing useless divs and nowiki tags from being generated inside references.
 * Better support for galleries by showing the associated contextual tools, and correcting visual alignment for the new translation dialog language selector.
 * More scalable support for translation services by simplifying the default configuration, and prevention of translation errors with external services.
 * Better handling of translation contents by saving links reliably, dividing the content into sections without the "italic title" template getting in the way, avoiding extra spaces around headers, and improving support for identifying and highlighting corresponding sentences.
 * Corrected regressions in transferring templates across languages, which was not working after the integration of the new read-only mode, and style glitches in the article title.
 * Improved testing environment by correcting fatal errors when logging-in in the beta cluster.
 * Cleaner representation of the source article by preventing the input cursor from showing for read-only content, and preventing links from wrongly showing as red inside audio templates.
 * Clearer statistics, by clarifying the information shown and the description of the output.
 * Completed the clean-up of the backlog of old translation drafts, which allows the process to be automated next.
 * Better support for references by providing appropriate forms depending on the type of template they use.
 * More reliable navigation to the translation title and the associated information by focusing the input on the title when navigating to issues related to it, and preventing the instructions card from disappearing when completing the title edit.


 * Published deletion ratio comparison showing the percentage of deleted articles for those created with and without Content translation and how these evolved in time for the target wikis.
 * Published the percentage of translations compared to total article creations, showing how much translation has been used as a way to create articles since Content Translation was available in different communities.




 * Improved error communication by surfacing pending issues when publishing.
 * More reliable navigation in the new translation dialog by opening links in a new window.
 * More reliable control of unreviewed content by adjusting the thresholds of the warning, reducing false positives and more strictly checking content copied from the source article.
 * Version 2 is enabled by default, and migrated users are shown an explanatory message. It's still possible to switch to version 1.
 * Better support for inline templates, keeping their contents in the translation as plain-text when the corresponding template is missing in the target wiki.
 * Clearer language for the translation options, with a clarifying message when machine translation is not available for the selected languages.
 * More fluent experience by preventing the scroll position from jumping after adding a link.
 * Improved support for transferring templates across languages by mapping parameters based on their names when template metadata is lacking.
 * More reliable generation of the published content by preventing ID attributes from being included for tables, and keeping HTML entities (including spaces) when transferred into the translation.


 * Improved saving process by preventing saving from being triggered unnecessarily when content is transcluded.
 * More reliable notification for automatically deleted translations when the user is missing a local account in the current wiki to support backwards compatibility.
 * Better guidance when the original article has changed too much and cannot be updated without affecting the current translation, and facilitating access to the updated content in a separate tab/window.
 * An invitation is shown to encourage users to try the new version.
 * Improved support for transferring templates by communicating when mandatory parameters (or no parameter at all) could not be mapped.
 * More effective warnings for unreviewed content, not interfering with content that the user started from scratch.
 * More solid integration of translation services by supporting a rate-limiter to prevent excessive request peaks and avoiding issues when the rate-limiter is unconfigured.
 * Improved entry points by allowing url-based campaigns to configure an edit tag to be used when the resulting translation is published.
 * Improved navigation across different errors and warnings on the translation.


 * Measurements for performance improvements in the access to external translation services.
 * More reliable template processing to avoid issues where templates block paragraphs from being added to the translation.
 * Better control of content quality by adding published translations with unreviewed contents to a tracking category (you can check the tracking category list on any language for the category with the code "cx-unreviewed-translation-category").
 * Improved support for links by correcting a styling regression.
 * More reliable saving process by preventing errors in the initial save attempt.
 * Support for references added by name: references are correctly transferred to the translation even if their definition is in a different paragraph.
 * Improved support for Apertium by making tests independent of the testing servers infrastructure, and updated dependencies (lex-tools, separable, and others) that were needed for updating the French-Catalan support.
 * Support for blocked users, and partially-blocked users to access the entry points.
 * Performance improved for the API access to translation services.
 * Better monitoring of translation services for real-time tracking of errors and availability (view dashboard), and weekly use of translation services over time (view dashboard).
 * Improved support for links by updating the link information after a link is added.
 * Better guidance by allowing access to additional details from the issue summary.
 * More reliable approach to measure content modifications by preventing content typed by the user from being counted as machine-translated content when starting from scratch.


 * Improved performance by prefetching requests to translation services in advance to deliver results immediately when users add a new paragraph, caching of API requests and an optimization of the caching system.
 * More reliable saving process by retrying with an increased timeout after initial failure.
 * More solid approach to divide the content into sections for the tool to handle, and preventing errors when adding sections to the translation.
 * Improved support for inserting media, preventing the media dialog from blocking the article.
 * Better support for references solving issues when editing references and displaying them.
 * Improved control for excessive unmodified content by excluding types of content where the warning is not relevant in order to prevent false positives.
 * Better support for red links, generating red links only for the missing articles the user confirmed, and avoiding rendering them as regular links in the source document.
 * Better guidance by delaying content warnings only while the user is editing but showing previous warnings immediately when a translation is resumed.
 * Improved alignment on RTL languages and a cleaner source article by avoiding noise characters showing from Visual Editor elements.
 * More solid control of anonymous access by gracefully handling access errors.
 * More reliable saving and recovery process that persists the contents when the user switches between translation services, and avoids moving translations to in-progress when there are no further changes since they were published.
 * Better guidance by communicating when templates could not be adapted.
 * Better mapping references across languages by using Citoid metadata when mapping template parameters, avoiding issues with empty references and partially adapted ones.
 * More effective integration with translation services by sending a more compact version of the content HTML to prevent exceeding the translation limits, improving efficiency and security. A regression in this area was also fixed.
 * Facilitate the access to the new version by creating an outreach campaign link to be used in interactions with Wikipedia communities.
 * Instrumentation improvements to better measure the issues translators experience.
 * More reliable application of machine translation by preventing quick manual modifications from being counted as machine translation, improving support for quote templates by preventing blank text blocks from being sent for translation, avoiding the adaptation of transclusion fragments, improving logging for Apertium, and better handling of templates and link attributes.
 * More reliable visual alignment of paragraphs between original and translated content.




 * Basic control for too much unmodified content warning, showing a warning at the paragraph level when machine translation has not been edited enough.
 * Basic control to prevent publishing translations with very few modifications in the whole document, showing an error.
 * More reliable handling of failing Machine Translation services.
 * Improved behaviour for "undo" when marking links as missing, and avoiding interference with saving.
 * A #contenttranslation-v2 tag is added to published articles with version 2 to facilitate analytics. The usual #contenttranslation tag still applies to articles published with any version.
 * Improvements in the issue communication system to support general issues that are not attached to a specific paragraph, and code refactoring to reduce duplications with the progress tracking system.
 * More reliable application of machine translation by gracefully handling the failure of translation services, avoiding repeated cite tags and repeated reference contents when it is applied to references, and integrating more reliably with Apertium.
 * Communication of session expiration in the translation dashboard.
 * Dashboard to show the translations published with the new version and the number of users publishing these.
 * Support for switching between version 1 and 2 to allow users to test the new version, and provide access to information about the new version.
 * Graphs for measuring the number of translators and articles published with the new version.
 * Optimization to reduce the amount of data required to adapt templates improving the time to save contents, and better visual display of templates.
 * Improved support for templates by preventing infoboxes from being missing on source articles, better support for coordinate templates by preventing them from breaking the visual alignment of paragraphs, and support for adding the list of references without errors.
 * More reliable saving process by preventing blanked sections from remaining unsaved.
 * Improved visual alignment of paragraphs by preventing new templates added to the document from breaking it.
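
The two levels of control for unmodified content mentioned in the list above (a paragraph-level warning and a document-level error) could be sketched as follows. This is only an illustration: the source does not describe how the tool measures modifications or which thresholds it uses, so the similarity measure (Python's `difflib.SequenceMatcher`), both threshold values, and all function names are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; the real values used by the tool are not given here.
PARAGRAPH_WARNING_THRESHOLD = 0.85
DOCUMENT_ERROR_THRESHOLD = 0.95


def unmodified_ratio(machine_text, user_text):
    """Similarity between the machine translation and the user's text (0.0 to 1.0)."""
    return SequenceMatcher(None, machine_text, user_text).ratio()


def check_translation(paragraphs):
    """Return issues for a list of (machine_text, user_text) paragraph pairs.

    A ("warning", index) entry flags a paragraph that is mostly unmodified;
    a final ("error", None) entry flags a document that is mostly unmodified.
    """
    issues = []
    ratios = []
    for index, (machine_text, user_text) in enumerate(paragraphs):
        ratio = unmodified_ratio(machine_text, user_text)
        ratios.append(ratio)
        if ratio >= PARAGRAPH_WARNING_THRESHOLD:
            issues.append(("warning", index))
    # Document-level check: average similarity across all paragraphs.
    if ratios and sum(ratios) / len(ratios) >= DOCUMENT_ERROR_THRESHOLD:
        issues.append(("error", None))
    return issues
```

Keeping the paragraph check as a warning and the document check as an error mirrors the distinction made in the list: the first guides the translator while editing, and the second is what would block publishing.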


 * Warning for too much unmodified content per paragraph to prevent machine translation misuse.
 * Improved template translation by avoiding the mapping of parameters that do not exist in the target language.
 * Support for videos, and inline images.
 * Published a list of representative wikis that will facilitate focusing on the impact our work has towards the stated goals.
 * Improved support for references that appear in several paragraphs, and citation templates to avoid dirty markup.
 * Performance improvements to avoid delays when typing in translations of large articles.
 * More reliable application of initial translations by avoiding leaving the paragraph unusable when machine translation fails or when switching to not use machine translation.
 * Improved issue communication system by allowing users to mark issues as resolved for the current session, showing information about issues when they are relevant, and a clearer message when there is too much unmodified machine translation.
 * More reliable processing of the content to prevent some images from not being displayed in the source article, bullet points from being skipped, wide images from overflowing their column, and stub templates from being displayed.
 * Better support for content persistency when using the browser back button.
 * Adjusted look for link information to improve consistency with Visual Editor and avoid code duplication.
 * Support for keyboard navigation across paragraphs in the translation.
 * Support for calculating translation progress at section level.


 * Improved handling of sections with transcluded content, that were not recognised as separate paragraphs.
 * Enabling access to tools when editing rich content inside a dialog.
 * Communicate the default machine translation service and waiting due to changes of service more clearly.
 * Notifications for the deletion of old translation drafts, configurable period for deleting old drafts automatically and initial runs of the script to support backwards compatibility.
 * Adjustments in the dialog layout used to edit complex content to preserve the document metaphor, enabling the use of the tools column.
 * More reliable loading of articles avoiding instances of "page not found" error for existing articles.
 * Better support to transfer content to the translation, including references, templates, links pointing to the same target, and avoiding errors when adding new paragraphs.
 * Improved auto-save to better deal with sections that were added while the saving process was running.
 * More precise timestamps for translations to prevent discrepancies due to local timezone differences.
 * Linking automatically corresponding versions in other languages through Wikidata when the article is published.
 * Refactoring of styles to better anticipate the effect of changes in the styling code and better align to Wikimedia standards.




 * Script to clear very old translation drafts to facilitate future backwards compatibility, reduce conflicts due to outdated content and better use of database space.
 * Showing a cleaner source article by removing irrelevant templates, preventing links to existing pages from being shown as red links, and preventing navigation away when clicking images.
 * Optimization of the saving process to be applied only when changes occur, and more reliable category manipulation.
 * Improved machine translation use by showing the loading indicator persistently, not showing machine translation options on dialogs, fluently switching to an empty paragraph when machine translation is not used, and avoiding unexpected scrolling and technical errors when applying machine translation.
 * Communicate errors when the translation title contains invalid characters or is empty.
 * Communicate warnings when a page already exists with the same title as the translation.
 * Improved link card that allows marking links as missing and avoids duplicate cards when double-clicking a link.
 * Support for machine translation extended for Simple English.
 * Discourage publishing empty translations by showing the publish button initially disabled.


 * Support for Captcha confirmation when target wikis request it.
 * Layout adjustments to keep article titles visually aligned, make visual alignment reliable when resizing the browser window, and prevent menus from hiding behind the main content.
 * Showing a clean source article by removing irrelevant sections and hidden categories.
 * Definition of success metrics: number of newcomers, percentage of successful completion, number of errors, and translation survival.
 * Support for links with new link cards allowing to preview their content in both languages and manipulate links, support to search for links in the target wiki, and better control for text input cursor placement when editing links.
 * Use of updated API for short descriptions which allows local wikis to override Wikidata default descriptions.
 * Starting translations from a link is more fluent by automatically selecting the article to translate.
 * More reliable saving by including source information, and language selection to avoid duplicates.
 * Improved testability: editor version is persistent when navigating from dashboard, and captcha support added in the test environment.
 * Automatic translation card with options to select the Machine Translation provider from those available, copying the source text or starting from scratch. Modifications on the original automatic translations are kept persistent with an option to reset them.
 * Improving the persistence of categories and solving issues with missing categories.
 * Improvements in code modularization, API parameter consistency, and regressions in the visual alignment of paragraphs.




 * Improved approach to keep source and target paragraphs visually aligned and highlighting consistently represented and applied.
 * Improved support for galleries, images (to prevent accidental navigation), timelines, math formulas, and tables.
 * Polishing on category support, including layout adjustments and support for long labels.
 * Asking for confirmation when the translation is going to overwrite an existing article.
 * Loading process improvements to communicate the loading status.
 * Backwards compatibility support by opening translations with the version of the editor used to create them.
 * Tool support improvements to show tool cards in the tools column (including Visual Editor Inspectors) and show initial instructions only when relevant.




 * Infrastructure changes to improve links metadata (to facilitate adapting links across languages), and image support.
 * Support for basic publishing of translations (further work on communicating issues to be done).
 * Support for category adaptation. Categories get added to the translation automatically based on the existing ones, and users can remove them or add new ones.
 * Layout adjustments to customize the editing toolbar, the sticky headers and how both fit together.
 * Frequently requested features from VE are now available in CX2 with the new editing surface such as copy&paste links, edit link labels, reliable undo/redo, insert new templates, and converting wiki-syntax.
 * Clean-up of the source article to remove irrelevant sections for the translation such as hat notes, metadata or links to sister projects.
 * The version used remains persistent as part of each translation to support backwards compatibility, and as users navigate between the dashboard and the editor to facilitate testing.




 * A basic side-by-side editor that would allow users to do a very manual translation. Many regressions after integrating the Visual Editor editing surface have been solved.
 * Content persistency. The editor allows loading articles, automatic saving, and basic restoring of translations.
 * Work with source and translation next to each other: add content paragraph by paragraph to the translation and keep paragraphs aligned.
 * Layout reorganisation to align with the style guide.
 * Improved testability. CX2 Test servers working again, and plans to support backwards compatibility.

Plans
Content Translation was developed iteratively over more than two years. During that time, the focus was on evaluating the core ideas on how to improve the translation experience for Wikipedia editors. The architecture was a flexible one in which modules could be plugged in to try these concepts. This made it possible to move fast, but the approach and the corners that were cut affected the code organisation, maintainability and reliability of the tool. The proposed refactoring and architectural update will help provide a solid and reliable translation tool that is aligned with the Wikimedia standards in technology and design.

At the end of this intervention we want Content Translation to be a tool that:

 * Is aligned with the Wikimedia standards in technology and design. Uses the editing surface technology of Visual Editor (VE), and follows the Design style guide principles.
 * Is a great way to contribute for newcomers. The tool provides a quick and easy way for new editors to start contributing. Even if the tool does not support dealing with complex content or situations, it always provides a clear path forward for new editors.
 * Is solid and reliable. The tool is reliable enough to go out of beta for at least one community.

The way to get there is detailed in the different plans below.
Development plan
Starting in February 2018, the CX2 roadmap defines the incremental stages to complete the development of the tool.

Rollout plan
A rough plan is to enable version 2 in smaller wikis or a subset of wikis to do QA, and gradually roll out to more wikis. The list of representative wikis can be useful to identify candidates.

Backwards compatibility plan
Versions 1 and 2 will coexist during a transition period. Given that the translations each version produces are not expected to be compatible, the following steps are considered to avoid issues related to breaking backwards compatibility:


 * 1) Translations started with one version of the editor will always be opened with the same version, regardless of which is the current default editor. That is, when version 2 is the default, old translations started with version 1 will still be opened with version 1.
 * 2) Once version 2 is considered the stable default, creating new translations with older versions will be prevented. That is, version 1 will not be available to create new translations, but it will still be available to edit the old ones.
 * 3) With a process in place to automatically discard translations after one year, version 1 could be safely removed after that period passes, since no new articles can be started with it.
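
The three steps above can be sketched as a small decision procedure. This is a minimal illustration of the rules as stated, not the tool's actual code: the `Translation` record, the function names, and the exact 365-day period used for "one year" are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# "One year" from step 3, approximated as 365 days for this sketch.
DRAFT_MAX_AGE = timedelta(days=365)


@dataclass
class Translation:
    editor_version: int      # version of the editor used to start the translation
    last_updated: datetime   # last time the draft was saved


def editor_version_for(existing: Optional[Translation],
                       requested: Optional[int] = None,
                       stable_default: int = 2) -> int:
    """Pick the editor version to open.

    Step 1: an existing translation always reopens with its own version.
    Step 2: a new translation cannot use a version older than the stable default.
    """
    if existing is not None:
        return existing.editor_version
    if requested is not None and requested >= stable_default:
        return requested
    return stable_default


def is_expired_draft(translation: Translation, now: datetime) -> bool:
    """Step 3: drafts older than the maximum age are discarded automatically."""
    return now - translation.last_updated > DRAFT_MAX_AGE


def can_remove_version(version: int, drafts: list, now: datetime) -> bool:
    """A version can be removed once no live (non-expired) draft still uses it."""
    return not any(
        d.editor_version == version and not is_expired_draft(d, now)
        for d in drafts
    )
```

Because step 2 stops new version 1 translations from being created, the set of live version 1 drafts can only shrink, which is why `can_remove_version(1, ...)` eventually becomes true once the discard period has passed.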