Content translation/Documentation/FAQ

These are the frequently asked questions about the Content translation feature.

What is the Content Translation tool?
It's a tool that helps editors create a new article based on a corresponding article about the same topic in a different language.

What is CX?
"CX" is an abbreviation for "ContentTranslation". It couldn't be abbreviated as "CT" because these letters are already used for the CategoryTree extension.

Is there a user manual for this tool?
Yes! See the page Content translation user guide. You can also translate it to your language.

Is there interest in this feature?
Definitely! It was pretty clear even before the development started: In the past there were so many attempts at making similar tools that it's impossible to count them. Some are listed at Machine translation (please add there any you know of).

As of June 2015, over a thousand people had created more than 5000 articles with it since it was enabled, so it is more certain than ever that there is demand for it.

How does the Content Translation tool differ from the Translate extension?
The Translate extension was initially built with a focus on translating software user interface messages for MediaWiki and other programs. It can also translate MediaWiki pages, but experience shows that it's not so practical for translating articles of the kind that you can find in Wikipedia, Wikivoyage or similar sites: it requires adding markup to the source article to prepare it for translation, and it can mess things up if the source article changes drastically, as often happens in Wikipedia. This works fairly well for documentation in mediawiki.org, Meta and many other sites, but it doesn't scale for Wikipedia.

Is it available for all users of a wiki?
It is available for logged-in users of a wiki where it's enabled, and it must be enabled as a beta feature in the preferences.

Can it be used only to translate Wikipedia articles?
The focus for initial development is articles in Wikipedia and possibly Wikivoyage. It may be extended to other sites and types of pages later.

What are the steps to create a new article with the Content Translation tool?
The main entry point to Content Translation is a button on your contributions page:
 * 1) Click "Contributions" in your personal bar (near "Log out").
 * 2) Click "New contribution" and select "Translation".
 * 3) Click "Create new translation".
 * 4) Select the language from which you want to translate in the "From:" field, and type the name of the article in that language.
 * 5) Select the language to which you want to translate and type the title of the new article.
 * 6) Click "Start translation". This will take you to the translation interface.
 * 7) Type the translation of each paragraph in the translation column. You don't have to translate all the paragraphs. Translate as much as needed for the wiki in your language.
 * 8) Until you publish, the translation is regularly saved automatically, so you don't have to worry that you'll lose it. To come back to an article that you started translating, repeat steps 1 and 2 and select the article from the list that you'll see.
 * 9) When you have written everything you want for the first version of the new translated article, click "Publish translation". Depending on the configuration on your wiki, this will either create a new article in the main space or a draft page under your user page.

Will there be special features to insert links and references from the original article?
Links will be automatically inserted when a corresponding link can be found using interlanguage links.

The tool will try to adapt references as much as possible between the source and target languages. This may be challenging given that different languages use different citation formats.
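As a rough illustration of the link adaptation described above, here is a minimal Python sketch. It is not the tool's actual implementation: the function names and the sample data are hypothetical, and a real lookup would use the interlanguage links stored for each article rather than a hard-coded dictionary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample data standing in for interlanguage links:
# (source language, source title) -> {target language: target title}.
INTERLANGUAGE_LINKS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "Sea"): {"es": "Mar"},
    ("en", "Ocean"): {"es": "Océano"},
}


def adapt_link(source_lang: str, title: str, target_lang: str) -> Optional[str]:
    """Return the corresponding target-language title, or None if unknown."""
    return INTERLANGUAGE_LINKS.get((source_lang, title), {}).get(target_lang)


def adapt_links(
    source_lang: str, titles: List[str], target_lang: str
) -> Tuple[Dict[str, str], List[str]]:
    """Split link titles into adapted links and links with no counterpart."""
    adapted: Dict[str, str] = {}
    missing: List[str] = []
    for title in titles:
        target = adapt_link(source_lang, title, target_lang)
        if target is None:
            missing.append(title)  # No interlanguage link: nothing to insert.
        else:
            adapted[title] = target
    return adapted, missing
```

Calling adapt_links("en", ["Sea", "Tide pool"], "es") with this sample data adapts "Sea" to "Mar" and reports "Tide pool" as having no corresponding link, which matches the behaviour described above: a link is inserted only when a counterpart can be found.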

Can I copy images over from the source article?
Yes, images will be copied just like paragraphs - simply by clicking. The translator will have to type the caption, of course.

It works only if the image is stored in a common media repository (for Wikimedia projects this is Commons). It doesn't work for files stored in the projects locally. It also won't work if the image is a part of an infobox.

Can I continue translating after publishing?
At the moment Content Translation is focused on creating the first version of the article. After publishing, the translation cannot be loaded as such from the dashboard, and the published page must be edited as a usual wiki page.

There is a plan to make it possible to add translated paragraphs to already-published pages.

Will Content Translation use information from Wikidata?
Yes.

The earliest release will use interlanguage links from Wikidata to auto-fill the links in the translated article. Also, when a translated article is published, an interlanguage link is automatically added using Wikidata.

It is likely that as templates in different Wikipedias use more data from Wikidata, this data will simply be picked up by Content Translation.

There are plans to use labels and aliases in smart ways in the future.

What are the translation aids that will be made available?
The current plan is:
 * Dictionaries: translation and definitions of words.
 * Link adaptation: Links will be adapted automatically when they are available as interlanguage links to the target language. It will be possible to do basic manipulation on them: remove them or pick them from other sources.
 * Category adaptation: Categories that have a directly corresponding category page in the target language linked by an interlanguage link will be added to the translated page.
 * Image adaptation: Images are copied to the translated article in one click.
 * Machine translation and translation memory: These are similar to what is used in the Translate extension.

Will you provide suggestions from translation memory?
This is planned as a future feature.

The data for translation memory will have to be filled from some initial translations, so it may take a while from the time that translation memory is enabled for Content Translation until it becomes useful.

How good are the articles created using Content Translation?
As good as any other articles created in Wikipedias in the respective languages.

From the deployment of Content Translation for early testing in the summer of 2014 until early June 2015, more than 5000 articles were created using Content Translation, and a little over 200 of them were deleted as very bad translations. This is less than 7%. In comparison, about half of the articles created every day in the English Wikipedia are speedily deleted.

The articles that were not deleted developed as usual Wikipedia articles: people fixed layout, added or edited paragraphs, added templates, improved references, and so on. Usually these improvements were done both by the person who created the first version and by other wikipedians.

There is no machine translation for my language. How is Content Translation useful to me and my wiki?
By itself Content Translation is not a machine translation tool. Its primary focus is to help people to create translated wiki pages as efficiently as possible. It includes tools that are tightly integrated with MediaWiki and its usual content creation and editing workflow: display of the source and the translation side-by-side; adaptation of links, categories, images and text formatting; publishing to different namespaces; interlanguage links. These features are already supposed to make typing translated articles by hand easier.

This is not just theory. Content Translation was enabled in the French Wikipedia on March 31, 2015, and by June 7 it had been used to create 500 articles, even though machine translation was not available.

The fact is that machine translation is not available for the majority of languages in which there are Wikipedias, so most language pairs will only be able to use Content Translation as a tool to translate articles manually with the above adaptation tools. If you want to help create a machine translation engine for your language, see How can I improve machine translation support for my language?

Machine translation to my language is bad, and it's easier to translate manually. How is Content Translation useful to me and my wiki?
As written in the previous answer, Content Translation is not by itself a machine translation tool, but a tool to create translated wiki pages. It is designed to be useful even without machine translation.

Machine translation works quite well in some languages, and then it can make the translators' work even more efficient. Machine translation support for a language pair is enabled after testing and approval from people who know the language well.

If machine translation support for your language is enabled, but you don't want to use it, you can disable it and still enjoy the other tools, such as link, category, and image adaptation, as well as dictionaries (if available for your language).

How are you integrating machine translations?
For languages in which machine translation is supported in Content Translation, machine translation will be auto-filled upon clicking a paragraph in the translation area.

Initially we're using the Apertium engine, which is free software and can be installed and maintained on our own servers. At a later point we may use Moses and other engines.

How can I improve machine translation support for my language?
Contribute to an existing Apertium pair, or create a new one!

Get in contact with the Apertium community via IRC or in many other ways.

Why doesn't Content Translation use the wiki syntax editor?
Because it should be easier for translators who are beginners with Wikipedia editing, and because it was much easier to implement features like link adaptation, reference adaptation, image adaptation and machine translation integration in an HTML-based WYSIWYG editor. Content Translation is an article creation tool rather than an article editing tool. Because it is not supposed to be a full-fledged article editing environment, it only provides the most basic formatting tools. After an article is created, it can be edited in the VisualEditor or in the source editor, just like any other article.

In more technical terms, Content Translation uses a simple HTML "contenteditable" element, which is available in modern browsers. It transforms the source article's HTML into the translation and, when publishing the article as a wiki page, converts the translation to wikitext using Parsoid. At the moment, Content Translation does not use the VisualEditor for editing the translation, though this may be done in the future.

Are you building on other efforts as well?
There was a lot of research on the topic, see Machine translation. For instance: «The quantitative results show that the contributions can improve the accuracy of a combination of RBMT-SPE pipeline at around 10 %, after the post-edition of 50,000 words in the Computer Science domain. We believe that these conclusions can be extended to MT engines involving other less-resourced languages lacking big parallel corpora or frequently updated lexical knowledge» (10.1007/978-3-642-35085-6_4).

Can the machine-translated content be edited manually?
Yes.

We treat machine translation only as a tool that may help a human translator be faster. Publishing machine-translated articles is not the intention of Content Translation, and it is actively discouraged.

Will there be a feature to prevent bulk publishing of unedited machine translated text?
Yes!

We take article quality seriously. Machine translation is only a tool that helps the translator be more efficient, and the developers understand well that all translations must be edited by a human. The translation interface will show a warning if the translator tries to publish an article that only has machine translation. The developers will work with the editing communities to adjust this for the needs of every language.
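One way such a warning could be decided is sketched below in Python. This is only an illustration under stated assumptions, not the check that Content Translation actually performs: the function names, the per-section data layout and the 0.95 threshold are all hypothetical. It uses the standard library's difflib.SequenceMatcher to measure how much of the machine-translated text was left unchanged.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical threshold: a section counts as "unedited" when its text is at
# least this similar to the machine translation. The real rule may differ.
UNEDITED_MT_THRESHOLD = 0.95


def unmodified_ratio(machine_text: str, published_text: str) -> float:
    """Similarity in [0, 1] between the MT output and the text to publish."""
    return SequenceMatcher(None, machine_text, published_text).ratio()


def should_warn(sections: List[Tuple[str, str]]) -> bool:
    """Decide whether to warn before publishing.

    sections: (machine_text, published_text) pairs, one per translated
    section; machine_text is empty when no machine translation was used.
    Returns True only when every machine-translated section is almost
    unchanged.
    """
    with_mt = [(mt, text) for mt, text in sections if mt]
    if not with_mt:
        return False  # Nothing was machine translated, so nothing to warn about.
    return all(
        unmodified_ratio(mt, text) >= UNEDITED_MT_THRESHOLD
        for mt, text in with_mt
    )
```

A per-language threshold would be one way to reflect the point above that the rule may need adjusting for the needs of every language.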

What dictionaries will be available?
The dictionaries will initially be taken from the free dictionaries of the FreeDict project. Later other dictionaries may be added, such as Wiktionary, OmegaWiki, terminology collections, and possibly other open sites.

How will templates be handled? How are you handling infoboxes?
Initially, all block-level templates, such as infoboxes, will simply be blacklisted by default. They will not even be shown in the source column of the translation interface. Templates can be added after the first version of the translated article is created, just as they usually are.

A small number of templates in the Spanish Wikipedia are white-listed and their parameters are mapped to the corresponding templates in the Catalan Wikipedia, so they can be adapted automatically. However, this is only an experiment and the way to adapt infoboxes may change in the future.

Inline templates, such as IPA pronunciation, "citation needed", etc., will be auto-adapted if a corresponding template exists in both languages, or copied as substituted wiki syntax.

Smart and automatic ways to adapt templates are definitely on the roadmap for Content Translation.
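The handling of inline templates described above (adapt when a corresponding template exists, otherwise copy the substituted content) can be sketched in Python as follows. This is an illustrative sketch only: the function name, the mapping table and its sample entry are hypothetical, and parameters are passed through unchanged as a simplification, whereas real templates often use different parameter names in each wiki.

```python
from typing import Dict, Tuple

# Hypothetical mapping: (source language, target language, source template
# name) -> name of the corresponding template in the target wiki.
TEMPLATE_MAP: Dict[Tuple[str, str, str], str] = {
    ("en", "es", "Citation needed"): "Cita requerida",
}


def adapt_inline_template(
    source_lang: str,
    target_lang: str,
    name: str,
    params: Dict[str, str],
    expanded_text: str,
) -> str:
    """Return wikitext for an inline template in the target language.

    expanded_text is the template's already-expanded (substituted) content.
    It is used as the fallback when no corresponding template is known.
    """
    target_name = TEMPLATE_MAP.get((source_lang, target_lang, name))
    if target_name is None:
        return expanded_text  # No counterpart: copy the substituted content.
    # Simplification: parameter names and values are copied unchanged.
    parts = [target_name] + [f"{key}={value}" for key, value in params.items()]
    return "{{" + "|".join(parts) + "}}"
```

With the sample mapping, a "Citation needed" template with a date parameter becomes a "Cita requerida" template with the same parameter, while a template that has no entry in the mapping is replaced by its expanded text.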

Will I be able to use the ULS input methods?
Yes!

When will this be available on Wikipedia in my language?
See the current list at Content translation/Languages.

The plan is to enable the beta feature in Wikipedia in all languages during June 2015.

Where can I find more technical details about the tool?
Start from the following pages:
 * Extension:ContentTranslation
 * Content translation/Setup
 * Content translation/Technical Architecture

Can I set up the Content Translation extension on my local wiki?
Yes.

Just install the extension and follow the configuration guide. The default configuration has a bias for Wikipedia, so be sure to set it up correctly for your wiki.

What is cxserver?
ContentTranslation works from the outset with multiple wikis and it needs to synchronize information between them. To make this possible, it uses an additional component called "ContentTranslation server" or "cxserver" for short. It also optimizes much of the connection to translation tools, such as dictionaries, machine translation, etc.

Does it work in Microsoft Internet Explorer?
Similarly to VisualEditor, Content Translation works in Microsoft Internet Explorer 10 and newer versions. It doesn't work in version 9 or older, but support for them may be added in the future.

Glossary

 * annotation : Markup applied to some part of the text. Basically, these are HTML tags such as anchor, bold, italic, underline, etc.
 * card : a box which appears in the tools column on the special page and provides translation tools for specific context, e.g. a box that allows editing links
 * columns : vertical areas in which Special:ContentTranslation is divided: there are currently three columns (source, translation, tools)
 * Content Translation (CX) : This tool, consisting of the ContentTranslation extension and the cxserver backend.
 * cxserver : Backend for CX written in Node.js, handling text segmentation and providing consistent API for services like machine translation, dictionaries and translation memories.
 * glossary : A list of terms with definitions or translations.
 * GWT (Given-When-Then): GWT is a semi-structured way to write down test cases. They can either be tested manually or automated as browser tests with Selenium.
 * lemmatization : closely related to stemming. Mapping multiple grammatical variants of the same word to a root form; e.g. (swim, swims, swimming, swam, swum) -> swim. Derivational variants are not usually mapped to the same form (so happiness !-> happy).
 * link localization : Converting a wiki article link from one language to another language with the help of Wikidata. Example: http://en.wikipedia.org/wiki/Sea becomes http://es.wikipedia.org/wiki/Mar
 * machine translation (MT) : Initial translation made by computer algorithms to help translate faster.
 * morphological analysis : mapping words into morphemes, e.g. swims -> swim/3rdperson_present
 * parallel bilingual text : two versions of the same content, each written in a different language.
 * segmented : divided into segments
 * segment : Smallest unit of text which is fairly self-contained grammatically. This usually means a sentence, a title, a phrase in a bulleted list, etc.
 * segmentation algorithm : rules to split a paragraph into segments. Weakly language-dependent (sensible default rules work quite well for many languages).
 * sentence alignment : matching corresponding sentences in parallel bilingual text. In general this is a many-many mapping, but it is approximately one-one if the texts are quite strict translations.
 * service : Things like MT, TM, Glossary
 * service providers : External systems which provide a service. Example: Google
 * source column : the column showing the segmented article in source language.
 * template destruction : inlining a template's contents when a suitable template does not exist in the target wiki
 * tools column : the column where cards appear
 * translation column : the column where the translation is done.
 * translation memory (TM) : A service which suggests translations based on previous translations.
 * translation tools (translation support tools, translation aids) : Context-aware translation tools like MT, Dictionary, link localization
 * word alignment : matching corresponding words in parallel bilingual text. This is strongly many-many.
 * translation dashboard: Listing of all translations of a user. A new translation can also be started from here.
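As an illustration of the "segment" and "segmentation algorithm" entries above, here is a minimal Python sketch of a default rule that splits a paragraph into sentence-like segments. It is not cxserver's actual segmentation code: the function name and the single rule (split after ".", "!" or "?" followed by whitespace) are assumptions chosen for brevity.

```python
import re
from typing import List

# Assumed default rule: a segment ends at ".", "!" or "?" followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(paragraph: str) -> List[str]:
    """Split a paragraph into segments using the default rule above."""
    return [part for part in _BOUNDARY.split(paragraph.strip()) if part]
```

A rule this simple also splits after abbreviations such as "e.g." when they are followed by a space, which is one reason segmentation is only weakly, rather than not at all, language-dependent: each language needs a few extra rules or exception lists on top of the defaults.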