Content translation/Machine Translation/MT Clients
Machine translation (MT) services are accessed through client modules in Content Translation. Apertium and Yandex clients are already implemented in the code. Any number of such MT service clients can be added and mapped to language pairs. This documentation explains the MT client architecture.
Technical requirements

A new MT client can be a locally hosted machine translation system or a remote one accessed through an API. API-based services are recommended because they allow the engine to be isolated as a service. If the engine is freely licensed and well packaged for Linux distributions, we can consider hosting it in the Wikimedia cluster. For example, Apertium is hosted inside wmflabs, whereas Yandex is not hosted by Wikimedia. Both Apertium and Yandex are accessed through their web APIs.
Translation API
A machine translation API takes the source language, the target language, and the source content, and returns the translated content.
- If the API is not public, it can accept an authentication token, typically a key.
- The output format can be JSON for convenience.
- The API should accept POST requests.
- The API should not require any user-identifiable information such as a user name. CXServer does not provide it to the MT client.
- The API should be capable of accepting a reasonable number of requests per minute.
- The API should accept a reasonable amount of content per request.
- A dashboard for analysing API usage is recommended, including the number of requests and the number of characters translated per day, week, and month.
- The API must be publicly documented, including its error codes.
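As a rough illustration of the requirements above, the following sketch builds a POST request for an imaginary MT API. The endpoint URL, the JSON field names (`source`, `target`, `content`) and the `Authorization` header scheme are all assumptions made for this example; a real service defines its own in its public documentation.

```javascript
'use strict';

// Hypothetical endpoint; replace with the documented URL of the real service.
const API_URL = 'https://babelfish.example.org/translate';

/**
 * Build the options for a POST request to a hypothetical MT API.
 * No user-identifiable information is included in the request.
 *
 * @param {string} sourceLang Source language code
 * @param {string} targetLang Target language code
 * @param {string} content Source language content
 * @param {string} [apiKey] Authentication key, for non-public APIs
 * @return {Object} Request description with url, method, headers and body
 */
function buildTranslationRequest( sourceLang, targetLang, content, apiKey ) {
    const headers = { 'Content-Type': 'application/json' };
    if ( apiKey ) {
        // Assumed scheme; a real API may expect the key elsewhere.
        headers.Authorization = 'Bearer ' + apiKey;
    }
    return {
        url: API_URL,
        method: 'POST',
        headers,
        body: JSON.stringify( { source: sourceLang, target: targetLang, content } )
    };
}

const request = buildTranslationRequest( 'es', 'ca', '<p>Hola</p>', 'secret-key' );
console.log( request.method, request.url );
```

Keeping the request construction separate from the network call makes it easy to check that only the language codes, the content and the key leave the server.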
Performance guidelines
Content Translation is still a beta feature, available only to logged-in users who opt in, so the current usage pattern may not reflect future demand. Moreover, when machine translation is expanded to more languages, there will be more users and more requests. Some baselines based on current usage are given below. Note that this is not a final assessment: APIs must be designed to accept more than this.
- At least 10,000 requests per day
- At least 10 million characters per day
- At least 5,000 characters per request
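When evaluating a candidate service, its documented limits can be compared against these baselines. The sketch below is one possible way to do that; the function name `findShortfalls` and the example limits are invented for illustration.

```javascript
'use strict';

// Baselines taken from the list above.
const BASELINES = {
    requestsPerDay: 10000,
    charactersPerDay: 10000000,
    charactersPerRequest: 5000
};

/**
 * List the baselines that a service's documented limits fail to meet.
 * A missing limit is treated as a shortfall.
 *
 * @param {Object} limits Documented limits, using the same keys as BASELINES
 * @return {string[]} Names of the limits that fall below the baseline
 */
function findShortfalls( limits ) {
    return Object.keys( BASELINES ).filter(
        ( name ) => !( limits[ name ] >= BASELINES[ name ] )
    );
}

// Hypothetical limits for an imaginary service.
const shortfalls = findShortfalls( {
    requestsPerDay: 50000,
    charactersPerDay: 5000000,
    charactersPerRequest: 10000
} );
console.log( shortfalls ); // [ 'charactersPerDay' ]
```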
Input format
The content that CX sends for translation is HTML. Translating HTML while preserving its markup is challenging, but some MT engines are capable of it (for example, Yandex). Apertium does not handle HTML markup. Depending on the engine's capability, CX can send either the HTML or a plain text version of the content.
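The choice between HTML and plain text can be pictured with the small sketch below. It is not how CXServer itself prepares content: the tag stripping here is a deliberately naive regular expression, and the names `toPlainText` and `prepareContent` are invented for the example.

```javascript
'use strict';

/**
 * Naive conversion of an HTML fragment to plain text.
 * Only a sketch: it drops tags with a regular expression and decodes
 * a few common entities. A real client should use an HTML parser.
 *
 * @param {string} html HTML fragment
 * @return {string} Plain text
 */
function toPlainText( html ) {
    return html
        .replace( /<[^>]*>/g, '' )
        .replace( /&lt;/g, '<' )
        .replace( /&gt;/g, '>' )
        // Decode &amp; last so that "&amp;lt;" is not decoded twice.
        .replace( /&amp;/g, '&' )
        .trim();
}

/**
 * Pick the payload to send, depending on what the engine can handle.
 *
 * @param {string} html HTML content from CX
 * @param {boolean} engineHandlesHtml Whether the engine preserves markup
 * @return {string} Content to send to the engine
 */
function prepareContent( html, engineHandlesHtml ) {
    return engineHandlesHtml ? html : toPlainText( html );
}

const section = '<p>Fish &amp; <b>chips</b></p>';
console.log( prepareContent( section, true ) );  // <p>Fish &amp; <b>chips</b></p>
console.log( prepareContent( section, false ) ); // Fish & chips
```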
Translation quality
We evaluate the quality of MT by requesting feedback from Wikipedia contributors who work in the language concerned. CX uses MT as an initial translation template and encourages translators to improve it. Because of that, an engine can be used unless the feedback we receive shows its quality to be quite bad.
Developing a new MT Client Module
The best way to learn this is to refer to an existing client module such as Yandex or Apertium. The client modules are located in cxserver's lib/mt folder. Let us call our client the BabelFish MT Client. Create a file named BabelFish.js in the lib/mt folder.
const MTClient = require( './MTClient.js' );

// Class definition for the BabelFish client
class BabelFish extends MTClient {
    /**
     * Translate content with BabelFish
     *
     * @param {string} sourceLang Source language code
     * @param {string} targetLang Target language code
     * @param {string} content Source language content
     * @return {Promise} Translated text
     */
    translate( sourceLang, targetLang, content ) {
        // Add your API request to the service here.
        // Return a Promise that resolves to the translated content.
    }
}

module.exports = BabelFish;
If your BabelFish service cannot translate HTML while keeping all markup in the appropriate position in the translation, you will have to write a translateText method instead of translate in the code above. Refer to Apertium.js for such an example. Yandex.js is an example of an MT client that can handle both HTML and text content.
You need to add an entry in lib/mt/index.js for your new client.
To map a language pair to this client, create a config file in the config folder. You may refer to the existing configuration files for examples. Then enable this MT engine in the cxserver config.yaml, again following the existing entries as examples.
Restart cxserver and test your client. You may want to read the existing unit tests for Apertium before writing your own tests.
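A text-only client might look roughly like the sketch below. To keep the example runnable on its own, it defines a minimal stand-in base class, `StubMTClient`, which is not the actual `MTClient` interface from cxserver. The injected `send` function and the `{ text }` response shape are likewise hypothetical; Apertium.js remains the authoritative example.

```javascript
'use strict';

// Stand-in for cxserver's MTClient so this sketch is self-contained.
// The real base class does much more; this is NOT its actual interface.
class StubMTClient {
    constructor( options ) {
        this.conf = options;
    }
}

class BabelFish extends StubMTClient {
    /**
     * Translate plain text with the (imaginary) BabelFish service.
     *
     * @param {string} sourceLang Source language code
     * @param {string} targetLang Target language code
     * @param {string} sourceText Plain text in the source language
     * @return {Promise<string>} Translated plain text
     */
    translateText( sourceLang, targetLang, sourceText ) {
        // `send` is a hypothetical function that performs the HTTP POST
        // and resolves to the parsed JSON response of the service.
        return this.conf.send( { source: sourceLang, target: targetLang, content: sourceText } )
            .then( ( response ) => {
                if ( typeof response.text !== 'string' ) {
                    throw new Error( 'Unexpected response from BabelFish' );
                }
                return response.text;
            } );
    }
}

// Fake transport for demonstration: "translates" by upper-casing.
const fakeSend = ( payload ) => Promise.resolve( { text: payload.content.toUpperCase() } );

const client = new BabelFish( { send: fakeSend } );
client.translateText( 'en', 'es', 'hello' ).then( ( text ) => console.log( text ) );
```

Passing the transport in through the options is a design choice made for this sketch, so that the client can be exercised without a network connection.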
Machine translation clients
The following machine translation clients support Content Translation in different languages:
- Apertium (supported languages)
- OpusMT (supported languages)
- LingoCloud (supported languages)
- Google Translate (supported languages)
- Yandex (supported languages)
- Youdao (supported languages)
- Elia (formerly known as Matxin) (supported languages)
- MinT (supported languages)