Content translation/Documentation/FAQ

These are the frequently asked questions about the Content translation feature.

What is the Content Translation tool?
It's a tool that helps editors create a new article based on a corresponding article about the same topic in a different language.

What is CX?
"CX" is an abbreviation for "ContentTranslation". It couldn't be abbreviated as "CT" because these letters are already used for the CategoryTree extension.

Is there a user manual for this tool?
Yes! See the page Content translation user guide. You can also translate it into your language.

Is there interest in this feature?
Definitely! This was clear even before development started: there have been so many past attempts at making similar tools that it's impossible to count them. Some are listed at Machine translation (please add any others you know of there).

As of November 2015, over 7,000 people had created over 30,000 articles with it since it was enabled, so it is more certain than ever that there is demand for it.

How does the Content Translation tool differ from the Translate extension?
The Translate extension was initially built with a focus on translating software user interface messages for MediaWiki and other programs. As such, it's built to keep translations in sync with the source language, while each language edition of a Wikimedia project is supposed to be independent and not reflect a source language.

To provide its advanced features on MediaWiki wiki pages, Translate requires preparation of the source page for translation with some additional markup, which can't yet be handled visually. Said wikitext is distracting to most editors and we did not want to expose Content Translation translators to it.

Is it available for all users of a wiki?
It is available for logged-in users of a wiki where it's enabled, and it must be enabled as a beta feature in the preferences.

Can it be used only to translate Wikipedia articles?
The focus for initial development is articles in Wikipedia and possibly Wikivoyage. It may be extended to other sites and types of pages later.

What are the steps to create a new article with the Content Translation tool?
The main entry point to Content Translation is a button on your contributions page:
 * 1) Click "Contributions" in your personal bar (near "Log out").
 * 2) Click "New contribution" and select "Translation".
 * 3) Click "Create new translation".
 * 4) Select the language from which you want to translate in the "From:" field, and type the name of the article in that language.
 * 5) Select the language to which you want to translate and type the title of the new article.
 * 6) Click "Start translation". This will take you to the translation interface.
 * 7) Type the translation of each paragraph in the translation column. You don't have to translate all the paragraphs. Translate as much as needed for the wiki in your language.
 * 8) Until you publish, the translation is regularly saved automatically, so you don't have to worry that you'll lose it. To come back to an article that you started translating, repeat steps 1 and 2 and select the article from the list that you'll see.
 * 9) When you have written everything you want for the first version of the new translated article, click "Publish translation". Depending on the configuration on your wiki, this will create either a new article in the main space or a draft page under your user page.

How does ContentTranslation handle links?
Links will be automatically inserted when a corresponding link can be found using interlanguage links.

Other ways to add links are:
 * Selecting a word in the translation which is spelled the same as the target article, and clicking "Add link" in the sidebar.
 * Selecting a word in the translation, and clicking a link in the source column. If a corresponding article exists in the target language, a link to it will be added.
 * Selecting a word in the translation and adding a link to any page using the tool that appears in the sidebar. By default, a list of available adaptable links will be shown under the link tool.
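
As a rough illustration of the automatic case, link adaptation can be thought of as a lookup in a table of interlanguage links. The sketch below is in Python and is not the tool's actual code: the table `INTERLANGUAGE_LINKS` and the function `adapt_link` are hypothetical names, and the single sample entry is taken from the "link localization" example in the glossary.

```python
from typing import Optional

# Hypothetical sample data:
# (source language, source title) -> {target language: target title}
INTERLANGUAGE_LINKS = {
    ("en", "Sea"): {"es": "Mar"},
}


def adapt_link(source_lang: str, title: str, target_lang: str) -> Optional[str]:
    """Return the corresponding title in the target language.

    Returns None when no interlanguage link is known for the pair.
    """
    targets = INTERLANGUAGE_LINKS.get((source_lang, title), {})
    return targets.get(target_lang)
```

When the lookup finds nothing, no link is inserted automatically, and the translator can still add one by hand with the methods listed above.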

How does ContentTranslation handle references?
The tool will try to adapt references as much as possible between the source and target languages.

If you delete a reference from the translation, you can add it back by placing the caret where you want to add the reference, clicking the reference in the source column and then clicking "Add reference".

Adapting references may be challenging given that different languages use different citation formats. If there is a reference that you cannot adapt, or that is adapted incorrectly, please report a bug.

Is it possible to edit references in ContentTranslation?
No, only to copy them as-is. You can change them or add your own references after creating the article.

This may become possible in the future.

Can I copy images over from the source article?
Yes, images will be copied just like paragraphs - simply by clicking. The translator will have to type the caption, of course.

It works only if the image is stored in a common media repository (for Wikimedia projects this is Commons). It doesn't work for files stored in the projects locally. It also won't work if the image is a part of an infobox.

What does it mean to publish the article?
The publishing of the article is the same as creating a revision of a wiki page - it has a date, the revision author's username and an edit summary, it appears in Recent Changes, etc.

How are the writers of the source article credited?
The creators of the source article are credited in the edit summary of the published version, which links to the revision of the source article that was used when the translation started. This is compatible with the CC-BY-SA license and with Wikimedia's Terms of Use, which require you as the re-user to give attribution "through hyperlink [...] to the page or pages that you are re-using".

Can I continue translating after publishing?
At the moment Content Translation is focused on creating the first version of the article. After publishing, the translation cannot be loaded as such from the dashboard, and the published page must be edited as a usual wiki page.

There is a plan to make it possible to add translated paragraphs to already-published pages.

Will Content Translation use information from Wikidata?
Yes, in several ways:
 * Interlanguage links from Wikidata are used to auto-fill the links in the translated article.
 * When a translated article is published, a sitelink to it is added to the corresponding Wikidata item, so that an interlanguage link to it appears immediately.
 * Descriptions from Wikidata are shown in the linking tool and in the article selection tool.
 * The dashboard has a link to a Wikidata-based tool that helps you find articles that don't exist in your language.

As templates in different Wikipedias come to use more data from Wikidata, that data will simply be picked up by Content Translation without any effort from the translator.

There are also plans to use labels and aliases in smart ways in the future.

What are the translation aids that will be made available?
The current plan is:
 * Dictionaries: translation and definitions of words.
 * Link adaptation: Links will be adapted automatically when they are available as interlanguage links to the target language. It will be possible to manipulate them in basic ways: remove them or pick them from other sources.
 * Category adaptation: Categories that have a directly corresponding category page in the target language linked by an interlanguage link will be added to the translated page.
 * Image adaptation: Images are copied to the translated article in one click.
 * Machine translation and translation memory: These are similar to what is used in the Translate extension.
 * Automatic interlanguage link adding. An interlanguage link will appear immediately after publishing the article.

Can anybody read the text that is saved automatically while I am writing the translation?
No. It is accessible only to the translator until it is published.

Will you provide suggestions from translation memory?
This is planned as a future feature.

The data for translation memory will have to be filled from some initial translations, so it may take a while from the time that translation memory is enabled for Content Translation until it becomes useful.

How good are the articles created using Content Translation?
As good as any other articles created in Wikipedias in the respective languages.

Between the deployment of Content Translation as a beta feature in some languages in January 2015 and November 2015, about 30,000 articles were created. In July 2015 CX was enabled as a beta feature in all languages, and from then until November 2015 fewer than 10% of the articles created using CX were deleted. For comparison, the rate of deletion of articles that are created using the wiki syntax editor goes up to 50% in English.

The articles that were not deleted developed as usual Wikipedia articles: people fixed layout, added or edited paragraphs, added templates, improved references, and so on. Usually these improvements were made both by the person who created the first version and by other Wikipedians.

There is no machine translation for my language. How is Content Translation useful to me and my wiki?
By itself Content Translation is not a machine translation tool. Its primary focus is to help people to create translated wiki pages as efficiently as possible. It includes tools that are tightly integrated with MediaWiki and its usual content creation and editing workflow: display of the source and the translation side-by-side; adaptation of links, categories, images and text formatting; publishing to different namespaces; interlanguage links. These features are already supposed to make typing translated articles by hand easier.

This is not just theory. Content Translation was enabled in the French Wikipedia on March 31, 2015, and by June 7 it had been used to create 500 articles, even though machine translation was not available.

The fact is that machine translation is not available for the majority of languages in which there are Wikipedias, so most language pairs will only be able to use Content Translation as a tool to translate articles manually with the above adaptation tools. If you want to help create a machine translation engine for your language, see How can I improve machine translation support for my language?

Machine translation to my language is bad, and it's easier to translate manually. How is Content Translation useful to me and my wiki?
As written in the previous answer, Content Translation is not by itself a machine translation tool, but a tool to create translated wiki pages. It is designed to be useful even without machine translation.

Machine translation works quite well in some languages, and then it can make the translators' work even more efficient. Machine translation support for a language pair is enabled after testing and approval from people who know the language well.

If machine translation support for your language is enabled, but you don't want to use it, you can disable it and still enjoy the other tools, such as link, category, and image adaptation, as well as dictionaries (if available for your language).

How are you integrating machine translations?
For languages in which machine translation is supported in Content Translation, machine translation will be auto-filled upon clicking a paragraph in the translation area.

Initially we're using the Apertium engine, which is free software and can be installed and maintained on our own servers. At a later point we may use Moses and other engines. We have recently added Yandex for limited use from English to Russian.

What languages are being handled by Yandex? Are there plans to add more?
Yandex is available at present only for English to Russian translations, for users who will be creating pages for the Russian Wikipedia through Content Translation. Although Yandex provides translation capability for nearly 60 languages, we do not have any immediate plans to activate it for other language pairs. However, we are open to requests from the Wikipedia communities if they would like Yandex to be made available for their languages.

How is using Yandex different than using Apertium?
As a user of Content Translation you will not feel any difference on the translation interface as the machine translation system of Yandex will display the translated content in the same way Apertium currently does for the supported 45 language pairs.

How is the machine translation being done if I choose Yandex?
Yandex provides a free-to-use API key that allows websites and other services to use their translation system. Content Translation also uses a unique API key to access this service on Yandex’s server. When a user starts translating an article, the HTML content of each section of the source article is sent to the Yandex server, and a translated version is obtained and displayed in the respective translation column of Content Translation. Links and references are adapted as usual and users can modify the content as required.

This process continues for all the sections of the article being translated. For better performance, the translations for consecutive sections are pre-fetched. The user can save the unpublished translation (to work on it again at a later time) or publish the article in the usual manner. The article is published on Wikipedia like any other normal article with appropriate attribution and licenses.
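
The pre-fetching of consecutive sections can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the tool's code: `translate_with_prefetch`, its parameters, and the `lookahead` of one section are all hypothetical, and the `translate` argument stands in for whatever call sends a section's HTML to the machine translation service. A real implementation would likely fetch in the background; the sketch fetches synchronously to stay short.

```python
from typing import Callable, Dict, List


def translate_with_prefetch(
    sections: List[str],
    translate: Callable[[str], str],
    current: int,
    cache: Dict[int, str],
    lookahead: int = 1,
) -> str:
    """Return the translation of sections[current].

    Also translates up to `lookahead` following sections and stores
    them in `cache`, so they are ready when the translator gets there.
    `current` is assumed to be a valid index into `sections`.
    """
    last = min(current + lookahead, len(sections) - 1)
    for index in range(current, last + 1):
        if index not in cache:
            # Only sections not seen before are sent to the service.
            cache[index] = translate(sections[index])
    return cache[current]
```

With a `lookahead` of 1, asking for the first section also fetches the second, so a later request for the second section is answered from the cache and only triggers a fetch of the third.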

You can view a diagram of the process.

Yandex is not based on open source software. Why are we using it?
Content Translation evolved from a long-standing need to bridge the gap in the amount of content between Wikipedias in different languages. Like all other software used on Wikimedia sites, Content Translation is also open source. In this particular case as well, we are using an open source client to interact with the external service and import freely licensed content in order to help users expand our free knowledge.

To use Yandex’s machine translation system we are not adding any proprietary software in the Content Translation code, or on the Wikimedia websites and servers. The service is free of charge and available for everyone.

Only the freely available Wikipedia article content (in segments) is sent to the Yandex service and the obtained translated content is freely usable on Wikipedia pages. The translated content can be modified by users and this data is also available publicly under a free license through the Content Translation API. This is a valuable resource made available for the community to develop open source translation services for those languages where they don't exist yet.

After studying the implications carefully, we found that the fact that the content was stored previously in a closed source service does not limit the freedom of our knowledge or our software in the present or the future. We have taken special care to ensure that the content provided is freely licensed and complies with Wikipedia policies. This includes a long process for legal and technical evaluation and compliance. The summary of the terms of use is also available.

From user feedback we have seen that machine translation support is really helpful for users and we want to support all languages in the best way. Guided by the principles of Wikimedia Foundation’s resolution to support free and open source software, we will prioritise the integration of open source services whenever they are available for a language. Apertium has been a critical part of Content Translation since its inception, but currently it only provides machine translations for 45 of the numerous possible language combinations that Wikipedia can support.

Should I be worried about my personal information when using Yandex?
Irrespective of the service being used, you can be sure that only Wikipedia content from existing articles is sent and only freely licensed content will be added back to the translation. No personal information is sent, and communication with those services happens on the server side, so they are isolated from the user's device. Please refer to this diagram for more details.

What if Yandex is the only machine translation tool available and I don’t want to use it?
Machine Translation is an optional feature in Content Translation that you can easily disable at will. If more machine translation systems are added for your languages, you can choose to enable MT again and select the MT service of your choice.

Will the content translated by Yandex be free for use in Wikipedia?
Yes. The content received from Yandex is otherwise freely available on the Yandex web translation platform. Content Translation receives it via an API key to make it seamlessly available on the translation interface. This content can be modified by the users (if necessary) and used in Wikipedia articles under free licenses.

Can this content be used for improving machine translation systems in general?
Yes. Translations made in Content Translation are saved in our database. This information will be made publicly available for anyone to use as translation examples to improve their translation services (university research groups, open source projects, commercial companies, anyone!). The content can be accessed via the Content Translation API. Please note that only information related to translated text is publicly available. This includes the source and translated text, the source and target language information, and an identifier for the segment of text.
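
To give an idea of how such translation examples might be consumed, here is a minimal Python sketch. It does not show the real Content Translation API or its response format: the `TranslationExample` class, its field names and the `parallel_pairs` helper are hypothetical, and only mirror the pieces of information listed above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TranslationExample:
    # Hypothetical field names, not the API's actual keys.
    segment_id: str
    source_language: str
    target_language: str
    source_text: str
    translated_text: str


def parallel_pairs(
    examples: Iterable[TranslationExample],
    source_language: str,
    target_language: str,
) -> List[Tuple[str, str]]:
    """Collect (source, translation) text pairs for one language pair."""
    return [
        (example.source_text, example.translated_text)
        for example in examples
        if example.source_language == source_language
        and example.target_language == target_language
    ]
```

Pairs collected this way are the kind of parallel bilingual text that a machine translation project could use as training or evaluation material.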

How can I improve machine translation support for my language?
Contribute to an existing Apertium pair, or create a new one!

Get in contact with the Apertium community via IRC or in many other ways.

Why doesn't Content Translation use the wiki syntax editor?
Because it should be easier for translators who are beginners with Wikipedia editing, and because it was much easier to implement features like link adaptation, reference adaptation, image adaptation and machine translation integration in an HTML-based WYSIWYG editor. Content Translation is an article creation tool rather than an article editing tool. Because it is not supposed to be a full-fledged article editing environment, it only provides the most basic formatting tools. After an article is created, it can be edited in the VisualEditor or in the source editor, just like any other article.

In more technical terms, Content Translation uses a simple HTML "contenteditable" element that is available in modern browsers. It transforms the source article's HTML into the translation and, when publishing the article as a wiki page, converts the translation to wikitext using Parsoid. At the moment, Content Translation does not use the VisualEditor for editing the translation, though this may be done in the future.

Are you building on other efforts as well?
There has been a lot of research on the topic; see Machine translation. For instance: «The quantitative results show that the contributions can improve the accuracy of a combination of RBMT-SPE pipeline at around 10 %, after the post-edition of 50,000 words in the Computer Science domain. We believe that these conclusions can be extended to MT engines involving other less-resourced languages lacking big parallel corpora or frequently updated lexical knowledge» (doi:10.1007/978-3-642-35085-6_4).

Can the machine-translated content be edited manually?
Yes, and it should be!

We treat machine translation only as a tool that may help a human translator be faster. Publishing machine-translated articles is not the intention of Content Translation, and it is actively discouraged.

Will there be a feature to prevent bulk publishing of unedited machine translated text?
Yes!

We take article quality seriously. Machine translation is only a tool that helps the translator be more efficient, and the developers understand well that all translations must be edited by a human. The translation interface will show a warning if the translator tries to publish an article that contains only machine translation. The developers will work with the editing communities to adjust this for the needs of every language.
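
One simple way to picture such a check is to compare, section by section, what the machine translation engine returned with what the translator is about to publish. The Python sketch below is an assumption-laden illustration, not the tool's actual logic: the function names, the exact-match comparison and the default `threshold` of 1.0 (warn only when every section is unedited) are all hypothetical.

```python
from typing import List, Tuple


def unedited_fraction(sections: List[Tuple[str, str]]) -> float:
    """Share of sections published exactly as the machine produced them.

    Each item is a (machine_translation, text_to_publish) pair.
    """
    if not sections:
        return 0.0
    unedited = sum(
        1 for machine, final in sections if machine.strip() == final.strip()
    )
    return unedited / len(sections)


def should_warn(sections: List[Tuple[str, str]], threshold: float = 1.0) -> bool:
    """Warn when the unedited share reaches the threshold."""
    return bool(sections) and unedited_fraction(sections) >= threshold
```

A lower `threshold` would make the warning stricter, which is the kind of per-language adjustment that could be agreed with each editing community.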

What dictionaries will be available?
The dictionaries will initially be taken from the free dictionaries of the freedict project. Later other dictionaries may be added, such as Wiktionary, OmegaWiki, terminology collections, and possibly other open sites.

How will templates be handled? How are you handling infoboxes?
Initially, all block-level templates, such as infoboxes, will simply be blacklisted by default. They will not even be shown in the source column of the translation interface. Templates can be added after the first version of the translated article is created, just as they usually are.

A small number of templates in the Spanish Wikipedia are white-listed and their parameters are mapped to the corresponding templates in the Catalan Wikipedia, so they can be adapted automatically. However, this is only an experiment and the way to adapt infoboxes may change in the future.

Inline templates, such as IPA pronunciation, "citation needed", etc., will be auto-adapted if a corresponding template exists in both languages, or copied as substituted wiki syntax.

Smart and automatic ways to adapt templates are definitely on the roadmap for Content Translation.
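
The parameter mapping described above for white-listed templates can be sketched in Python as follows. This is an illustration only: `TEMPLATE_MAPPINGS`, `adapt_template`, and the template and parameter names in the sample entry are invented for the example and do not describe the real Spanish or Catalan templates or the tool's configuration format.

```python
from typing import Dict, Optional, Tuple

# Invented sample entry:
# source template -> (target template, {source parameter: target parameter})
TEMPLATE_MAPPINGS = {
    "ExampleInfoboxEs": ("ExampleInfoboxCa", {"nombre": "nom", "imagen": "imatge"}),
}


def adapt_template(
    name: str, params: Dict[str, str]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (target template name, adapted parameters).

    Returns None when the template is not white-listed. Parameters
    without a known counterpart are dropped.
    """
    mapping = TEMPLATE_MAPPINGS.get(name)
    if mapping is None:
        return None
    target_name, param_map = mapping
    adapted = {
        param_map[key]: value for key, value in params.items() if key in param_map
    }
    return target_name, adapted
```

Dropping unmapped parameters is one possible design choice; keeping them for the translator to review by hand would be another.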

Will I be able to use the ULS input methods?
Yes!

When will this be available on Wikipedia in my language?
All Wikipedias have been supported since July 8, 2015.

Where can I find more technical details about the tool?
Start from the following pages:
 * Extension:ContentTranslation
 * Content translation/Setup
 * Content translation/Technical Architecture

Can I set up the Content Translation extension on my local wiki?
Yes.

Just install the extension and follow the configuration guide. The default configuration has a bias for Wikipedia, so be sure to set it up correctly for your wiki.

What is cxserver?
ContentTranslation works from the outset with multiple wikis and it needs to synchronize information between them. To make this possible, it uses an additional component called "ContentTranslation server" or "cxserver" for short. It also optimizes much of the connection to translation tools, such as dictionaries, machine translation, etc.

Does it work in Microsoft Internet Explorer?
Similarly to VisualEditor, Content Translation works in Microsoft Internet Explorer 10 and newer versions. It doesn't work in version 9 or older, but support for them may be added in the future.

Glossary

 * annotation : Markup applied to some part of the text, essentially HTML tags such as anchor, bold, italic, underline, etc.
 * card : a box which appears in the tools column on the special page and provides translation tools for specific context, e.g. a box that allows editing links
 * columns : vertical areas in which Special:ContentTranslation is divided: there are currently three columns (source, translation, tools)
 * Content Translation (CX) : This tool, consisting of the ContentTranslation extension and the cxserver backend.
 * cxserver : Backend for CX written in Node.js, handling text segmentation and providing consistent API for services like machine translation, dictionaries and translation memories.
 * glossary : A list of terms with definitions or translations.
 * GWT (Given-When-Then): GWT is a semi-structured way to write down test cases. They can either be tested manually or automated as browser tests with Selenium.
 * lemmatization : also called stemming. Mapping multiple grammatical variants of the same word to a root form; e.g. (swim, swims, swimming, swam, swum) -> swim. Derivational variants are not usually mapped to the same form (so happiness !-> happy).
 * link localization : Converting a wiki article link from one language to another language with the help of Wikidata. Example: http://en.wikipedia.org/wiki/Sea becomes http://es.wikipedia.org/wiki/Mar
 * machine translation (MT) : Initial translation made by computer algorithms to help translate faster.
 * morphological analysis : mapping words into morphemes, e.g. swims -> swim/3rdperson_present
 * parallel bilingual text : two versions of the same content, each written in a different language.
 * segmented : divided into segments
 * segment : Smallest unit of text which is fairly self-contained grammatically. This usually means a sentence, a title, a phrase in a bulleted list, etc.
 * segmentation algorithm : rules to split a paragraph into segments. Weakly language-dependent (sensible default rules work quite well for many languages).
 * sentence alignment : matching corresponding sentences in parallel bilingual text. In general this is a many-many mapping, but it is approximately one-one if the texts are quite strict translations.
 * service : Things like MT, TM, Glossary
 * service providers : External systems which provide a service. Example: Google
 * source column : the column showing the segmented article in source language.
 * template destruction : inlining a template's contents when a suitable template does not exist in the target wiki
 * tools column : the column where cards appear
 * translation column : the column where the translation is done.
 * translation memory (TM) : A service which suggests translations based on previous translations.
 * translation tools (translation support tools, translation aids) : Context-aware translation tools like MT, Dictionary, link localization
 * word alignment : matching corresponding words in parallel bilingual text. This is strongly many-many.
 * translation dashboard: Listing of all translations of a user. A new translation can also be started from here.
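
To make the "segment" and "segmentation algorithm" entries above more concrete, here is a deliberately naive Python sketch of a default rule: split a paragraph after sentence-ending punctuation followed by whitespace. It is not the algorithm used by cxserver (which is written in Node.js); the function name `segment` and the rule itself are assumptions chosen for illustration.

```python
import re
from typing import List

# A boundary is any whitespace that follows ".", "!" or "?".
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(paragraph: str) -> List[str]:
    """Split a paragraph into sentence-like segments."""
    return [part for part in _BOUNDARY.split(paragraph.strip()) if part]
```

A rule this simple breaks on abbreviations such as "e.g." and does not suit languages with other sentence-ending punctuation, which is why real segmentation rules are described above as weakly language-dependent.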