Topic on Talk:Content translation

Presentation of language pairs

Trondtr (talk | contribs)

At the moment the tool presents the languages but not the language pairs. As a result, we have to find viable pairs (pairs for which an MT resource exists) by trial and error. The tool is good (when it works), so I can see the need for using it without the MT component too, but there should be a list of language pairs instead of (or in addition to?) the list of languages.

Also: I know the Apertium MT system, and there are language pairs there (e.g. sme-nob) that do not work in this tool. So the tool seems to present only a subset of the Apertium pairs (reasonably enough), but we want to know which ones.

Pginer-WMF (talk | contribs)

Thanks for the feedback, @Trondtr.

As part of making it easier to start a new article, we want to surface information that helps users make that choice (more details in this ticket). From the feedback we got, aspects such as the quality of the content seem relevant, but the availability of tools does not seem to play a big role in that process, since the languages users can select are limited to the languages they know. So we have not considered surfacing this information in the language selection process.

However, we have considered indicating more clearly when MT is not available once users land in the translation editor (ticket), to better set user expectations (some users assume that MT is available for all pairs). I'm interested in knowing more about the specific use case you have in mind, so feel free to add more details on the decision process and on how knowing about the availability of MT helps you make decisions.

If you are interested in the list of supported language pairs from a technical perspective, this configuration file shows the language pairs supported (in the format "source: [target languages]"). If you are interested in getting the list through an API for machine consumption, it is also available. We also have a wiki page, but the former resources are expected to be more up to date since they are part of the code.
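As a rough illustration of how a "source: [target languages]" mapping can be turned into an explicit list of pairs, here is a minimal Python sketch. The sample data and the function names are hypothetical and only mimic the shape described above; they are not the actual configuration file or API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sample in the "source: [target languages]" shape.
# This is NOT the real configuration, only an illustration of the format.
SAMPLE_CONFIG: Dict[str, List[str]] = {
    "nb": ["nn"],
    "nn": ["nb"],
    "es": ["ca", "pt"],
}


def language_pairs(config: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Flatten the mapping into a set of (source, target) pairs."""
    return {(source, target)
            for source, targets in config.items()
            for target in targets}


def sources_for(config: Dict[str, List[str]], target: str) -> List[str]:
    """List the source languages that have an entry for the given target."""
    return sorted(source
                  for source, targets in config.items()
                  if target in targets)


if __name__ == "__main__":
    for source, target in sorted(language_pairs(SAMPLE_CONFIG)):
        print(f"{source}-{target}")
    print(sources_for(SAMPLE_CONFIG, "nn"))
```

A lookup like `sources_for` answers the question "which languages can I translate from with MT help?" for a given target language, assuming the mapping has been loaded from whichever resource is used.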

Regarding Apertium, our goal is to expose all languages supported by Apertium. I reopened this ticket to capture the missing one you mentioned (sme-nob), but feel free to add more examples to the ticket if you find more missing pairs.

Trondtr (talk | contribs)

I made a suggestion for indicating pairs (in the bug database you refer to).

Use cases: People understand more languages than they are able to edit in. The use case is thus "I want more articles for my language; which languages can I get help translating from?" Going from bigger to smaller WP versions, I then go through categories (e.g. Members of Parliament across the centuries, geographical categories, ...) in the source language, check that the article is not present in the target language, and translate.

I have earlier used Apertium to translate from Norwegian Bokmål to Nynorsk, and the time saving was approx. 75% compared to manual translation (back then I had my own Perl script for pre- and postprocessing; it had its defects, e.g. inflected words whose base forms were hyperlinks came out wrong). With this new tool the time saving is more like 90-95%. The most annoying problem is that the target text has links to the source language, i.e. [[:nb:linkname|linkname]] instead of a (possibly red) [[linkname]]. We want the latter, not the former.
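To make the wanted link behaviour concrete, here is a minimal Python sketch of such a rewrite as a postprocessing step. It is only an illustration and not how the tool works: the function name and the regular expression are my own, and it assumes the article title is the same in the source and the target wiki, which will often not hold.

```python
import re

# Matches [[:lang:Title]] and [[:lang:Title|label]].
# Hypothetical pattern for illustration; real wikitext has more link forms.
INTERWIKI_LINK = re.compile(
    r"\[\[:(?P<lang>[a-z-]+):(?P<target>[^\]|]+)(?:\|(?P<label>[^\]]*))?\]\]"
)


def localize_links(wikitext: str, source_lang: str) -> str:
    """Turn links into the source wiki into plain (possibly red) local links."""
    def replace(match: "re.Match[str]") -> str:
        if match.group("lang") != source_lang:
            return match.group(0)  # leave links to other wikis untouched
        target = match.group("target")
        label = match.group("label")
        if label is None or label == target:
            return f"[[{target}]]"
        return f"[[{target}|{label}]]"

    return INTERWIKI_LINK.sub(replace, wikitext)


if __name__ == "__main__":
    print(localize_links("Han vart fødd i [[:nb:Oslo|Oslo]].", "nb"))
```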

More language pairs: From my perspective, Apertium pairs between the North Germanic languages come high on the list, as would fin-sme. Several of them are still poor, but work is underway, and including them here would also inspire more people to participate in improving those translation pairs. I myself work with translation between Saami languages (sme-smn, sme-smj, sme-sma). These are still in an embryonic phase, but a bigger problem is that the quality of the articles on the sme WP is not too good, and sma, smn and smj are in the incubator. So the principled question, both in this case and for the bulk of the Apertium pairs, is whether this software should be extended to make it possible to translate to incubator WPs. The argument in favour is that it would support the development of both the WPs and, not least, the Apertium pairs. The argument against is that it would make it easier for the "language collectors" (people writing on a large number of minority language WP versions although they neither know the language nor understand its structure) to make minority language articles, thereby effectively destroying both the WP and the internet literacy of the language in question. Not Google-indexing incubators plus a hard voting procedure could perhaps minimize the negative effects. Today's practice of adding a category for these articles will also make it easier for native speakers to patrol and minimize possible damage caused by writers who do not know the target language.

The biggest problem now (working on nb->nn) is that only 1 in 3 (just now it feels more like 1 in 10) of the articles I start translating gets a green PUBLISH TRANSLATION button. For the bulk of my attempts, the button just stays grey. It is imperative for me now to understand why, so that I can either avoid such articles beforehand or leave out the problematic part of the source article (if that helps). If the problem is my browser settings (I have tried turning AdBlock off and on, for example), I would have liked feedback on that. I intend to demo this tool in a course on Apertium MT next Monday, and would really not like to stand in front of the audience and test 8-10 articles for every article I can publish.
