Help:Content translation/Translating/Translation quality

When creating a translation, it is essential to review the content before publishing it. You need to make sure that the translation does not alter the original meaning and that it reads naturally in the target language. The initial machine translation speeds up the process by providing a useful starting point, but the tool encourages users to review that initial content and edit it significantly.

Several mechanisms help ensure that translators edit the initial translations appropriately. The translation editor tracks how much the user modifies the initial translation, and applies different limits that either prevent publishing or warn users to encourage them to review the content further.

In this way, the tool lets users benefit from machine translation when they use it well, while preventing the creation of low-quality translations that were only lightly reviewed. The sections below explain how these limits work, how they can be adjusted to the needs of each language, and how to measure the quality of the content produced with the tool.

Limits to encourage reviewing the translation
Content translation measures the percentage of modifications that users make to the initial automatic translation. This way, the system knows how many words have been added, removed, or modified. These measurements are made at two levels, for each paragraph and for the whole translation, and different limits apply at each level:

Limits for the whole translation
You cannot publish a translation if 99% or more of the whole document is unmodified. This limit is intended to block the most obvious vandalism: it prevents users from simply adding paragraphs to the translation and publishing them without any edits. As detailed below, this limit can be adjusted on a per-language basis.
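The two measurement levels and the whole-translation limit can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's implementation: it assumes that "unmodified" means words of the published text that still match the initial translation, compared word by word with Python's `difflib`. The names `count_unchanged`, `unmodified_share`, `document_share`, and `can_publish_document` are hypothetical.

```python
from difflib import SequenceMatcher

# Whole-translation limit from this page: 99% or more unmodified blocks publishing.
DOCUMENT_LIMIT = 0.99


def count_unchanged(initial: str, final: str) -> tuple[int, int]:
    """Return (unchanged words, total words) for one paragraph.

    Assumption: a word counts as unchanged when it belongs to a word
    sequence shared by the initial and the final text.
    """
    initial_words = initial.split()
    final_words = final.split()
    matcher = SequenceMatcher(a=initial_words, b=final_words, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return unchanged, len(final_words)


def unmodified_share(initial: str, final: str) -> float:
    """Paragraph level: share of the final words left unchanged."""
    unchanged, total = count_unchanged(initial, final)
    return unchanged / total if total else 0.0


def document_share(paragraphs: list[tuple[str, str]]) -> float:
    """Whole-translation level: unchanged words over all final words."""
    unchanged_sum = total_sum = 0
    for initial, final in paragraphs:
        unchanged, total = count_unchanged(initial, final)
        unchanged_sum += unchanged
        total_sum += total
    return unchanged_sum / total_sum if total_sum else 0.0


def can_publish_document(paragraphs: list[tuple[str, str]]) -> bool:
    """Allow publishing only when less than 99% of the document is unmodified."""
    return document_share(paragraphs) < DOCUMENT_LIMIT


if __name__ == "__main__":
    paragraphs = [("a b c d", "a b x d"), ("e f", "e f")]
    print(unmodified_share("a b c d", "a b x d"))  # 0.75
    print(can_publish_document(paragraphs))       # True
```

Counting whole-document words directly, rather than averaging paragraph percentages, keeps long paragraphs from being outweighed by short ones.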

Limits for each paragraph
For each paragraph, the percentage of user modifications is also measured. A paragraph is considered problematic when more than 80% of its content is left unmodified from the initial machine translation (or more than 60% when the content was copied from the source document).
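The paragraph-level rule can be expressed as a small classification function. This is a hedged sketch: the thresholds come from this page (80% and 60%, relaxed to 95% and 75% for paragraphs the user marked as resolved, as described in the factors below), while the function name `is_problematic`, its parameters, and the representation of the unmodified share as a number between 0 and 1 are assumptions.

```python
# Thresholds taken from this page, expressed as unmodified shares (0.0 to 1.0).
LIMITS = {
    # (started from machine translation?, marked as resolved?) -> limit
    (True, False): 0.80,
    (False, False): 0.60,
    (True, True): 0.95,
    (False, True): 0.75,
}


def is_problematic(share: float, from_machine_translation: bool,
                   resolved: bool = False) -> bool:
    """Return True when the unmodified share exceeds the applicable limit."""
    limit = LIMITS[(from_machine_translation, resolved)]
    # "More than" the limit, so a share exactly at the limit is not problematic.
    return share > limit


if __name__ == "__main__":
    print(is_problematic(0.85, from_machine_translation=True))                 # True
    print(is_problematic(0.85, from_machine_translation=True, resolved=True))  # False
    print(is_problematic(0.65, from_machine_translation=False))                # True
```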

The translation editor shows a warning for each problematic paragraph, encouraging the user to edit it further. In some cases, users can still publish, but the resulting page may be added to a tracking category of potentially unreviewed translations for the community to review. In other cases, users may not be allowed to publish.

These are some of the factors used to determine whether the user is allowed to publish (some of them are still in development):


 * The number of problematic paragraphs. Users are prevented from publishing translations with 10 or more problematic paragraphs. Translations with fewer than 10 problematic paragraphs can still be published, but those with 2 to 9 problematic paragraphs will be added to a tracking category of potentially unreviewed translations for the community to review.
 * Previously deleted translations. For users who had translations deleted in the last 30 days, the limits are stricter to prevent recurring problems. In those cases, translations with 5 or more problematic paragraphs cannot be published, while those with 4 or fewer problematic paragraphs will be added to a tracking category of potentially unreviewed translations for the community to review.
 * User confirmation. When the unmodified content warning was shown for a paragraph but the user marked it as resolved, this is taken as a signal that the user reviewed and confirmed the translation, and a less strict threshold applies (95% for machine translation or 75% for source content). This accommodates cases where the automatic translation was exceptionally good, while still avoiding potential abuse of the feature (that is, without blindly trusting the user's confirmation).
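The factors listed above can be combined into a single decision, sketched below, taking as input the number of problematic paragraphs already counted with the applicable thresholds. It reads the rules on this page literally, including that, for users with recently deleted translations, every translation with 4 or fewer problematic paragraphs is tracked. The names `Outcome` and `publish_outcome` are hypothetical, and the real tool may combine these factors differently, especially those still in development.

```python
from enum import Enum


class Outcome(Enum):
    PUBLISH = "publish"
    PUBLISH_AND_TRACK = "publish, added to tracking category"
    BLOCK = "publishing prevented"


def publish_outcome(problematic: int, recent_deletions: bool) -> Outcome:
    """Decide based on the number of problematic paragraphs.

    recent_deletions: the user had translations deleted in the last 30 days.
    """
    if recent_deletions:
        # Stricter limits: 5 or more blocks; 4 or fewer are tracked.
        if problematic >= 5:
            return Outcome.BLOCK
        return Outcome.PUBLISH_AND_TRACK
    # Default limits: 10 or more blocks; 2 to 9 are tracked.
    if problematic >= 10:
        return Outcome.BLOCK
    if problematic >= 2:
        return Outcome.PUBLISH_AND_TRACK
    return Outcome.PUBLISH


if __name__ == "__main__":
    print(publish_outcome(1, recent_deletions=False))  # Outcome.PUBLISH
    print(publish_outcome(3, recent_deletions=False))  # Outcome.PUBLISH_AND_TRACK
    print(publish_outcome(5, recent_deletions=True))   # Outcome.BLOCK
```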

Contents not affected by the limits
Some content is not expected to be edited significantly, so it is not considered when applying the limits described above: very short section titles, citations, and the list of references are excluded from the checks. Otherwise, users could get misleading warnings about content they were not expected to translate, such as the book titles in their references.

Adjusting the limits
The provided limits are intended as a useful set of defaults, but each language is different, and different wikis may need to customize these limits in different ways.

The quality of machine translation can vary greatly from language to language. While some language pairs may require at least 70% of the content to be modified for the final result to be useful, others may need only 10%.

Content in a wiki is also very diverse: content full of numeric data and technical names may require less editing.
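The relationship between required modifications and the unmodified-content limit is simple arithmetic: if a language pair needs at least 70% of the content modified, at most 30% may remain unmodified, while a pair that needs only 10% modified can allow up to 90% unmodified. The sketch below derives per-language paragraph limits this way. The override mapping, the placeholder language codes `xx` and `yy`, and the function name `paragraph_limit` are hypothetical; this page does not describe how wikis actually configure their limits.

```python
# Default paragraph limit from this page: more than 80% unmodified
# machine translation makes a paragraph problematic.
DEFAULT_LIMIT = 0.80

# Hypothetical per-language overrides, expressed as the minimum share of
# content that must be modified (placeholder codes, not real settings).
REQUIRED_MODIFICATIONS = {
    "xx": 0.70,  # weaker machine translation: heavy editing needed
    "yy": 0.10,  # stronger machine translation: light editing is enough
}


def paragraph_limit(language: str) -> float:
    """Return the maximum unmodified share allowed for a language."""
    required = REQUIRED_MODIFICATIONS.get(language)
    if required is None:
        return DEFAULT_LIMIT
    # Whatever does not need to be modified may stay unmodified;
    # rounding hides floating-point noise such as 0.30000000000000004.
    return round(1.0 - required, 2)


if __name__ == "__main__":
    for code in ("xx", "yy", "zz"):
        print(code, paragraph_limit(code))
```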

Tracking potentially unreviewed translations
A tracking category is provided for the community to easily find the articles that were published with some content exceeding the recommended limits.

Measuring the quality
Evaluating content quality automatically is not trivial.

Limits based on user expertise
Some wikis have implemented limits based on user rights.