Content translation/Documentation/FAQ

What is the Content Translation tool?
It's a tool that helps editors create a new article based on a corresponding article about the same topic in a different language.

What is CX?
"CX" is an abbreviation for "ContentTranslation". See glossary below.

How does the Content Translation tool differ from the Translate extension?
The Translate extension was initially built with a focus on translating software user interface messages for MediaWiki and other programs. It can also translate MediaWiki pages, but experience shows that it is not practical for translating articles of the kind found in Wikipedia, Wikivoyage or similar sites: it requires adding markup to the source article to prepare it for translation, and it can break when the source article changes drastically, as often happens in Wikipedia. The Translate extension works fairly well for documentation on mediawiki.org, Meta and many other sites, but it doesn't scale for Wikipedia.

Is it available for all users of a wiki?
It is available for logged-in users of a wiki where it's enabled, and it must be enabled as a beta feature in the preferences.

Can it be used only to translate articles?
The focus for initial development is articles in Wikipedia and possibly Wikivoyage. It may later be extended to articles in the style of other sites.

Will there be special features to insert links and references from the original article?
Links will be automatically inserted when a corresponding link can be found using interlanguage links.

The tool will try to adapt references as much as possible between the source and target languages. This may be challenging given that different languages use different citation formats.

Will Content Translation use information from Wikidata?
Yes.

The earliest release will use interlanguage links from Wikidata to auto-fill the links in the translated article. There are plans to use labels and aliases soon afterwards.

It is likely that as templates in different Wikipedias use more data from Wikidata, that data will simply be picked up by ContentTranslation.

What are the translation aids that will be made available?
The current plan is:
 * Dictionaries: translation and definitions of words.
 * Link adaptation: Links will be adapted automatically when they are available as interlanguage links to the target language. It will be possible to make basic changes to them: remove them or pick them from other sources.
 * Machine translation and translation memory: These are similar to what is used in the Translate extension.
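
The link adaptation described in the list above can be sketched roughly as follows. This is only an illustration, not the actual ContentTranslation code: the `INTERLANGUAGE_LINKS` table and the function names `adapt_link` and `adapt_links` are hypothetical, and in practice the link data would come from Wikidata rather than from a hard-coded table.

```python
from typing import Optional

# Hypothetical interlanguage link table (illustration only):
# (source language, source title) -> {target language: target title}.
INTERLANGUAGE_LINKS = {
    ("en", "Sea"): {"es": "Mar", "ca": "Mar"},
    ("en", "Ocean"): {"es": "Océano"},
}


def adapt_link(source_lang: str, title: str, target_lang: str) -> Optional[str]:
    """Return the target-language title, or None if no interlanguage link exists."""
    return INTERLANGUAGE_LINKS.get((source_lang, title), {}).get(target_lang)


def adapt_links(source_lang, titles, target_lang):
    """Split links into those that were adapted and those left for the translator."""
    adapted, missing = {}, []
    for title in titles:
        target = adapt_link(source_lang, title, target_lang)
        if target is None:
            missing.append(title)  # the translator may remove it or pick another target
        else:
            adapted[title] = target
    return adapted, missing
```

Links that end up in `missing` are the ones a translator would need to remove or pick from another source by hand.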

Will you provide suggestions from translation memory?
Yes, in the future.

The data for translation memory will have to be filled from some initial translations, so it may take a while from the time that translation memory is enabled for ContentTranslation until it becomes useful.

There is no machine translation for my language. How is ContentTranslation useful to me and my wiki?
By itself ContentTranslation is not a machine translation tool. Its primary focus is to help people translate wiki pages as efficiently as possible. It includes tools that are tightly integrated with MediaWiki and its usual content editing workflow: display of the source and the translation side-by-side; adaptation of links, categories, images and text formatting; publishing to different namespaces; interlanguage links. These features alone are intended to make typing translated articles by hand easier.

Machine translation is not available for the majority of languages in which there are Wikipedias, so most language pairs will only be able to use ContentTranslation as a tool to translate articles manually with the above adaptation tools. If you want to help create a machine translation engine for your language, see How can I improve machine translation support for my language?

Machine translation to my language is bad, and it's easier to translate manually. How is ContentTranslation useful to me and my wiki?
As explained in the answer to the previous question, ContentTranslation is not by itself a machine translation tool, but a tool for translating wiki pages. It is designed to be useful even without machine translation.

Machine translation works well in some languages, and then it can make the translators' work even more efficient. Machine translation support for a language pair is enabled only after testing and approval from people who know the language well.

If machine translation support for your language is enabled, but you don't want to use it, you can disable it and still enjoy the other tools, such as link, category, and image adaptation, as well as dictionaries (if available for your language).

How are you integrating machine translations?
For languages in which machine translation is supported in ContentTranslation, machine translation will be auto-filled upon clicking a paragraph in the translation area.

Initially we're using the Apertium engine, which is free software and can be installed and maintained on our own servers. At a later point we may use Moses and other engines.

How can I improve machine translation support for my language?
Contribute to an existing Apertium pair, or create a new one!

Get in contact with the Apertium community via IRC or in many other ways.

Are you building on other efforts as well?
There has been a lot of research on the topic; see Machine translation. For instance: «The quantitative results show that the contributions can improve the accuracy of a combination of RBMT-SPE pipeline at around 10 %, after the post-edition of 50,000 words in the Computer Science domain. We believe that these conclusions can be extended to MT engines involving other less-resourced languages lacking big parallel corpora or frequently updated lexical knowledge» (10.1007/978-3-642-35085-6_4).

Can the machine-translated content be edited manually?
Yes.

We treat machine translation only as a tool that may help a human translator be faster. Publishing machine-translated articles is not the intention of ContentTranslation.

Will there be a feature to prevent bulk publishing of unedited machine translated text?
Yes!

We take article quality seriously. Machine translation is only a tool that helps the translator be more efficient, and the developers understand well that all translations must be edited by a human. The translation interface will show a warning if the translator tries to publish an article that contains only machine translation. The developers will work with the editing communities to adjust this to the needs of every language.
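
One possible way to detect an article that contains only machine translation is to compare each published segment with the machine translation it started from. The sketch below is an assumption about how such a check could work, not a description of the real implementation; the function names and the `threshold` parameter are hypothetical, with `threshold` standing in for the per-language adjustment mentioned above.

```python
def unedited_fraction(machine_segments, published_segments):
    """Fraction of segments published exactly as machine translation produced them."""
    if not machine_segments:
        return 0.0
    pairs = zip(machine_segments, published_segments)
    # Compare after stripping surrounding whitespace, so that adding a
    # trailing space does not count as an edit.
    unedited = sum(1 for mt, final in pairs if mt.strip() == final.strip())
    return unedited / len(machine_segments)


def should_warn(machine_segments, published_segments, threshold=1.0):
    """Warn when the unedited share reaches the threshold (1.0 = nothing was edited)."""
    return unedited_fraction(machine_segments, published_segments) >= threshold
```

A community that wants a stricter check could lower the hypothetical `threshold`, for example to warn when more than half of the segments are unedited.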

What dictionaries will be available?
The dictionaries will initially be taken from the free dictionaries of the FreeDict project. Later other dictionaries may be added, such as Wiktionary, OmegaWiki, terminology collections, and possibly other open sites.

Can I copy images over from the source article?
Yes, images will be copied just like paragraphs - simply by clicking.

How will templates be handled? How are you handling infoboxes?
Initially, all block-level templates, such as infoboxes, will simply be blacklisted by default. They will not even be shown in the source column of the translation interface. Templates can be added after the first version of the translated article is created, just as they usually are.

A small number of templates in the Spanish Wikipedia are white-listed and their parameters are mapped to the corresponding templates in the Catalan Wikipedia, so they can be adapted automatically. However, this is only an experiment and the way to adapt infoboxes may change in the future.
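
Mapping the parameters of a white-listed template to its counterpart could look roughly like the sketch below. The template names, parameter names and the `TEMPLATE_MAP` structure are invented for illustration and are not taken from the actual Spanish or Catalan Wikipedia configuration.

```python
# Hypothetical white-list (illustration only):
# (source language, target language, source template) -> target template and parameter map.
TEMPLATE_MAP = {
    ("es", "ca", "Ficha de ejemplo"): {
        "target": "Infotaula d'exemple",
        "params": {"nombre": "nom", "fecha": "data"},
    },
}


def adapt_template(source_lang, target_lang, name, params):
    """Return (target_name, target_params), or None if the template is not white-listed."""
    entry = TEMPLATE_MAP.get((source_lang, target_lang, name))
    if entry is None:
        return None  # not white-listed: the template stays hidden from the translator
    # Parameters without a known counterpart are dropped in this sketch.
    mapped = {
        entry["params"][key]: value
        for key, value in params.items()
        if key in entry["params"]
    }
    return entry["target"], mapped
```

Dropping unmapped parameters is a design choice made only to keep the sketch short; a real tool might instead show them to the translator.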

Inline templates, such as IPA pronunciation, "citation needed", etc., will be auto-adapted if a corresponding template exists in both languages, or copied as substituted wiki syntax.

Smart and automatic ways to adapt templates are definitely on the roadmap for ContentTranslation.

Will I be able to use the ULS input methods?
Not at the moment, because these input methods do not work well with the editing environment that ContentTranslation uses (a simple browser contenteditable element). There is a plan to fix this.

When will this be available on a Wikipedia and which one?
It is now available in the following Wikipedias:
 * Catalan
 * Danish
 * Esperanto
 * Indonesian
 * Malaysian
 * Norwegian (bokmål)
 * Portuguese
 * Spanish

Two more languages are currently enabled only as source languages: English and Swedish. The plan is to enable them as target languages as well.

Many more languages are planned to be enabled in the coming weeks, so watch this space.

Where can I find more technical details about the tool?
Start from the following pages:
 * Extension:ContentTranslation
 * Content translation/Setup
 * Content translation/Technical Architecture

Can I set up the Content Translation extension on my local wiki?
Yes.

Just install the extension and follow the configuration guide. The default configuration has a bias for Wikipedia, so be sure to set it up correctly for your wiki.

What is cxserver?
ContentTranslation by definition works with multiple wikis and it needs to synchronize information between them. To make this possible, it uses an additional component called "ContentTranslation server" or "cxserver" for short. It also optimizes much of the connection to translation tools, such as dictionaries, machine translation, etc.

Is there interest in this feature?
Definitely! In the past there were so many attempts at making similar tools that it's impossible to count them. Some are listed at Machine translation (please add there any you know of).

Glossary

 * annotation : Markup applied to some part of the text. Essentially, HTML tags such as anchor, bold, italic and underline.
 * card : a box which appears in the tools column on the special page and provides translation tools for specific context, e.g. a box that allows editing links
 * columns : vertical areas in which Special:ContentTranslation is divided: there are currently three columns (source, translation, tools)
 * Content Translation (CX) : This tool, consisting of the ContentTranslation extension and the cxserver backend. It could be more intuitive to abbreviate it as "CT", but this is already used for CategoryTree.
 * cxserver : Backend for CX written in Node.js, handling text segmentation and providing consistent API for services like machine translation, dictionaries and translation memories.
 * glossary : A list of terms with definitions or translations.
 * GWT (Given-When-Then): GWT is a semi-structured way to write down test cases. They can either be tested manually or automated as browser tests with Selenium.
 * lemmatization : closely related to stemming. Mapping multiple grammatical variants of the same word to a root form; e.g. (swim, swims, swimming, swam, swum) -> swim. Derivational variants are not usually mapped to the same form (so happiness !-> happy).
 * link localization : Converting a wiki article link from one language to another language with the help of Wikidata. Example: http://en.wikipedia.org/wiki/Sea becomes http://es.wikipedia.org/wiki/Mar
 * machine translation (MT) : Initial translation made by computer algorithms to help translate faster.
 * morphological analysis : mapping words into morphemes, e.g. swims -> swim/3rdperson_present
 * parallel bilingual text : two versions of the same content, each written in a different language.
 * segmented : split into segments
 * segment : Smallest unit of text which is fairly self-contained grammatically. This usually means a sentence, a title, a phrase in a bulleted list, etc.
 * segmentation algorithm : rules to split a paragraph into segments. Weakly language-dependent (sensible default rules work quite well for many languages).
 * sentence alignment : matching corresponding sentences in parallel bilingual text. In general this is a many-many mapping, but it is approximately one-one if the texts are quite strict translations.
 * service : Things like MT, TM, Glossary
 * service providers : External systems which provide a service. Example: Google
 * source column : the column showing the segmented article in source language.
 * template destruction : inlining a template's contents when a suitable template does not exist in the target wiki
 * tools column : the column where cards appear
 * translation column : the column where the translation is done.
 * translation memory (TM) : A service which suggests translations based on previous translations.
 * translation tools (translation support tools, translation aids) : Context-aware translation tools like MT, Dictionary, link localization
 * word alignment : matching corresponding words in parallel bilingual text. This is strongly many-many.
 * translation dashboard: Listing of all translations of a user. A new translation can also be started from here.
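
To illustrate the "segment" and "segmentation algorithm" entries in the glossary above, here is a minimal sketch of a rule-based sentence splitter. The rule used (split after ".", "!" or "?" when followed by whitespace and an ASCII uppercase letter or digit) is an assumed default for Latin-script languages, not the rule set that cxserver actually uses, and it mishandles abbreviations such as "Dr. Smith".

```python
import re

# Assumed default rule (illustration only): a segment ends at ., ! or ?
# when followed by whitespace and an ASCII uppercase letter or a digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def segment(paragraph):
    """Split a paragraph into sentence-like segments, dropping empty pieces."""
    return [piece for piece in _BOUNDARY.split(paragraph.strip()) if piece]
```

Because the rule only looks at punctuation and capitalization, it is weakly language-dependent in the sense described in the glossary: it gives sensible results for many languages, while languages with different punctuation or no letter case would need their own rules.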