Help:Extension:Translate/Translation memories

The translation memory of the Translate extension supports ElasticSearch. This page guides you through installing ElasticSearch and describes its specifications in more detail.

Unlike other translation aids, for instance external machine translation services, the translation memory is constantly updated by new translations in your wiki. Advanced search across translations is also available at Special:SearchTranslations if you choose to use ElasticSearch.

Comparison

The database backend is used by default: it has no dependencies and needs no configuration. However, it cannot be shared among multiple wikis and it does not scale to large amounts of translated content. For that reason ElasticSearch is also supported as a backend. It is also possible to use another wiki's translation memory if its web API is open. Unlike ElasticSearch, remote backends are not updated with translations from the current wiki.

	Database	Remote API	ElasticSearch
Enabled by default	Yes	No	No
Can have multiple sources	No	Yes	Yes
Updated with local translations	Yes	No	Yes
Accesses the database directly	Yes	No	No
Access to source	Editor	Link	Editor if local, otherwise link
Can be shared as an API service	Yes	Yes	Yes
Performance	Degrades as volume grows	Unknown	Reasonable

Requirements

ElasticSearch backend

ElasticSearch is relatively easy to set up. If it is not available in your distribution packages, you can get it from their website. You will also need to get the Elastica extension. Finally, please see puppet/modules/elasticsearch/files/elasticsearch.yml for specific configuration needed by Translate.

The bootstrap script will create necessary schemas. If you are using ElasticSearch backend with multiple wikis, they will share the translation memory by default, unless you set the index parameter in the configuration.

When upgrading to the next major version of ElasticSearch (e.g. upgrading from 2.x to 5.x), it is highly recommended to read the release notes and the documentation regarding the upgrade process.

Installation

After putting the requirements in place, installation requires you to tweak the configuration and then execute the bootstrap.

Configuration

All translation aids including translation memories are configured with the $wgTranslateTranslationServices configuration variable.

The primary translation memory backend must use the key TTMServer. The primary backend receives translation updates and is used by Special:SearchTranslations.

Example TTMServer configurations:

Default configuration

$wgTranslateTranslationServices['TTMServer'] = array(
        'database' => false, // Passed to wfGetDB
        'cutoff' => 0.75,
        'type' => 'ttmserver',
        'public' => false,
);

Remote API configuration

$wgTranslateTranslationServices['example'] = array(
        'url' => 'http://example.com/w/api.php',
        'displayname' => 'example.com',
        'cutoff' => 0.75,
        'timeout' => 3,
        'type' => 'ttmserver',
        'class' => 'RemoteTTMServer',
);

ElasticSearch backend configuration

In this case the single back-end service will be used both for reads & writes.

$wgTranslateTranslationServices['TTMServer'] = array(
        'type' => 'ttmserver',
        'class' => 'ElasticSearchTTMServer',
        'cutoff' => 0.75,
        /*
         * See http://elastica.io/getting-started/installation.html
         * See https://github.com/ruflin/Elastica/blob/8.x/src/Client.php
        'config' => This will be passed to \Elastica\Client
         */
);

ElasticSearch multiple backends configuration (supported by MLEB 2017.04, dropped in MLEB 2023.10)

// Defines the default service used for read operations
// Allows to quickly switch to another backend
// 'mirrors' configuration option is no longer supported since MLEB 2023.10
$wgTranslateTranslationDefaultService = 'cluster1';
$wgTranslateTranslationServices['cluster1'] = array(
        'type' => 'ttmserver',
        'class' => 'ElasticSearchTTMServer',
        'cutoff' => 0.75,
        /*
         * Defines the list of services to replicate writes to.
         * Only "writable" services are allowed here.
         */
        'mirrors' => [ 'cluster2' ],
        'config' => [ 'servers' => [ 'host' => 'elastic1001.cluster1.mynet' ] ]
);
$wgTranslateTranslationServices['cluster2'] = array(
        'type' => 'ttmserver',
        'class' => 'ElasticSearchTTMServer',
        'cutoff' => 0.75,
        /*
         * if "cluster2" is defined as the default service it will start to replicate writes to "cluster1".
         */
        'mirrors' => [ 'cluster1' ],
        'config' => [ 'servers' => [ 'host' => 'elastic2001.cluster2.mynet' ] ]
);

ElasticSearch multiple services with single readable service using `writable` configuration (supported by MLEB 2023.04)

With `writable` configuration the following rules are enforced:

If `writable` is specified, services marked as `writable` are considered write only and others are considered read only.
If no service is specified as `writable` then services are considered both readable and writable.
The default service must always be readable.
If a service is marked as writable, the mirrors configuration will not be allowed.

// Three services configured with one being readable and the others being writable.
$wgTranslateTranslationServices['dc0'] = [
        'type' => 'ttmserver',
        'class' => 'ElasticSearchTTMServer',
        'cutoff' => 0.75,
        // Default service cannot be marked as write-only
];
$wgTranslateTranslationServices['dc1'] = [
        'type' => 'ttmserver',
        'class' => 'ElasticSearchTTMServer',
        'cutoff' => 0.75,
        // Marks this service as write-only
        'writable' => true,
];
$wgTranslateTranslationServices['dc2'] = [
        'type' => 'ttmserver',
        'class' => 'ElasticSearchTTMServer',
        'cutoff' => 0.75,
        'writable' => true
];
$wgTranslateTranslationDefaultService = 'dc0';
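
The `writable` rules can be checked mechanically. The following is a minimal Python sketch, not the extension's own validation code: the function name `classify_services`, the use of plain dictionaries in place of the PHP configuration arrays, and the check that the default service exists are all assumptions made for illustration.

```python
def classify_services(services, default_service):
    """Split services into (readable, writable) name sets per the `writable` rules.

    `services` maps a service name to its configuration dict (a stand-in for
    $wgTranslateTranslationServices); `default_service` stands in for
    $wgTranslateTranslationDefaultService. Both names are hypothetical.
    """
    if default_service not in services:
        # Assumption: an unknown default service is treated as an error.
        raise ValueError(f"default service {default_service!r} is not configured")

    write_only = {name for name, cfg in services.items() if cfg.get("writable")}

    # A service marked as writable may not also define mirrors.
    for name in sorted(write_only):
        if "mirrors" in services[name]:
            raise ValueError(f"{name!r}: 'mirrors' cannot be combined with 'writable'")

    # The default service must always be readable.
    if default_service in write_only:
        raise ValueError(f"default service {default_service!r} cannot be write-only")

    if not write_only:
        # No service is marked writable: every service is readable and writable.
        names = set(services)
        return names, set(names)

    # Otherwise marked services are write-only and the rest are read-only.
    return set(services) - write_only, write_only
```

With the three-service example (dc0 as default, dc1 and dc2 marked `writable`), this sketch would report dc0 as the only readable service and dc1 and dc2 as the writable ones.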

Possible keys and values are:

Key	Applies to	Description
`config`	ElasticSearch	Configuration passed to Elastica.
`cutoff`	All	Minimum threshold for a matching suggestion. Only a few of the best suggestions are shown, even if more are above the threshold.
`database`	Local	If you want to store the translation memory in a different location, you can specify the database name here. You also have to configure MediaWiki's load balancer to know how to connect to that database.
`displayname`	Remote	The text shown in the tooltip when hovering over the suggestion source link (the bullets).
`index`	ElasticSearch	The index to use in ElasticSearch. Default: ttmserver.
`public`	All	Whether this TTMServer can be queried through the api.php of this wiki.
`replicas`	ElasticSearch	If you are running a cluster, you can increase the number of replicas. Default: 0.
`shards`	ElasticSearch	How many shards to use. Default: 5.
`timeout`	Remote	How long, in seconds, to wait for an answer from the remote service.
`type`	All	Type of the TTMServer in terms of results format.
`url`	Remote	URL to api.php of the remote TTMServer.
`use_wikimedia_extra`	ElasticSearch	Boolean. When the extra plugin is deployed, you can disable dynamic scripting on Elastic v1.x. This plugin is now mandatory for Elastic 2.x clusters.
`mirrors` (DEPRECATED since MLEB 2023.04)	Writable services	Array of strings that defines the list of services to replicate writes to, which keeps multiple TTM services up to date. Useful for fast switch-overs or to reduce downtime during planned maintenance operations (added in MLEB 2017.04). Cannot be used along with the newly added `writable` configuration.
`writable` (added in MLEB 2023.04)	Write-only services	Boolean value, set for a service if that service is write-only. The default service (`wgTranslateTranslationDefaultService`) cannot be marked as write-only. If none of the configured translation memory services are marked as `writable`, all services are considered to be readable and writable. See task T322284.

You must use the key TTMServer as the array index to $wgTranslateTranslationServices if you want the translation memory to be updated with new translations. Remote TTMServers cannot be used for that, because they cannot be updated. As of MLEB 2017.04 the key TTMServer can be configured with the configuration variable $wgTranslateTranslationDefaultService. Support for Solr backend was dropped in MLEB-2019.10, in October, 2019.

Currently only MySQL is supported for the database backend.

Bootstrap

Once you have chosen ElasticSearch and set up the requirements and configuration, run ttmserver-export.php to bootstrap the translation memory. Bootstrapping is also required when changing translation memory backend. If you are using a shared translation memory backend for multiple wikis, you'll need to bootstrap each of them separately.

Sites with lots of translations should consider using multiple threads with the --thread parameter to speed up the process. The time depends heavily on how complete the message group completion stats are (incomplete ones will be calculated during the bootstrap). New translations are automatically added by a hook. New sources (message definitions) are added when the first translation is created.

Bootstrap does the following things, which don't happen otherwise:

adding and updating the translation memory schema;
populating the translation memory with existing translations;
cleaning up unused translation entries by emptying and re-populating the translation memory.

When the translation of a message is updated, the previous translation is removed from the translation memory. However, when translations are updated against a new definition, a new entry is added but the old definition and its old translations remain in the database until purged. When a message changes definition or is removed from all message groups, nothing happens immediately. Saving a translation as fuzzy does not add a new translation nor delete an old one in the translation memory.

TTMServer API

If you would like to implement your own TTMServer service, here are the specifications.

Query parameters:

Your service must accept the following parameters:

Key	Value
`format`	json
`action`	ttmserver
`service`	Optional service identifier if there are multiple shared translation memories. If not provided, the default service is assumed.
`sourcelanguage`	Language code as used in MediaWiki, see IETF language tags and ISO693?
`targetlanguage`	Language code as used in MediaWiki, see IETF language tags and ISO693?
`text`	Source text in the source language

Your service must return a JSON object that has the key ttmserver with an array of objects. Those objects must contain the following data:

Key	Value
`source`	The original source text.
`target`	Translation suggestion.
`context`	Local identifier for the source, optional.
`location`	URL to the page where the suggestion can be seen in use.
`quality`	Decimal number in the range [0..1] describing the quality of the suggestion. 1 means a perfect match.

Example:

{
        "ttmserver": [
                {
                        "source": "January",
                        "target": "tammikuu",
                        "context": "Wikimedia:Messages\\x5b'January'\\x5d\/en",
                        "location": "https:\/\/translatewiki.net\/wiki\/Wikimedia:Messages%5Cx5b%27January%27%5Cx5d\/fi",
                        "quality": 0.85714285714286
                },
                {
                        "source": "January",
                        "target": "tammikuu",
                        "context": "Mantis:S month january\/en",
                        "location": "https:\/\/translatewiki.net\/wiki\/Mantis:S_month_january\/fi",
                        "quality": 0.85714285714286
                },
                {
                        "source": "January",
                        "target": "Tammikuu",
                        "context": "FUDforum:Month 1\/en",
                        "location": "https:\/\/translatewiki.net\/wiki\/FUDforum:Month_1\/fi",
                        "quality": 0.85714285714286
                },
                {
                        "source": "January",
                        "target": "tammikuun",
                        "context": "MediaWiki:January-gen\/en",
                        "location": "https:\/\/translatewiki.net\/wiki\/MediaWiki:January-gen\/fi",
                        "quality": 0.85714285714286
                },
                {
                        "source": "January",
                        "target": "tammikuu",
                        "context": "MediaWiki:January\/en",
                        "location": "https:\/\/translatewiki.net\/wiki\/MediaWiki:January\/fi",
                        "quality": 0.85714285714286
                }
        ]
}
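
A client for such a service only needs to build the query string and read the ttmserver array from the response. The following Python sketch illustrates this under stated assumptions: the helper names `build_query_url` and `best_suggestions` are hypothetical, the source-text parameter is assumed to be named `text`, and applying a `cutoff` on the client side is an illustration only, not something this specification requires.

```python
import json
from urllib.parse import urlencode


def build_query_url(api_url, source_language, target_language, text, service=None):
    """Build a TTMServer query URL from the parameters listed above."""
    params = {
        "format": "json",
        "action": "ttmserver",
        "sourcelanguage": source_language,
        "targetlanguage": target_language,
        "text": text,  # assumed parameter name for the source text
    }
    if service is not None:
        # Optional: selects one of several shared translation memories.
        params["service"] = service
    return api_url + "?" + urlencode(params)


def best_suggestions(response_body, cutoff=0.75):
    """Parse a response body and return suggestions at or above `cutoff`,
    ordered from highest to lowest quality."""
    data = json.loads(response_body)
    suggestions = [s for s in data.get("ttmserver", []) if s["quality"] >= cutoff]
    return sorted(suggestions, key=lambda s: s["quality"], reverse=True)
```

The URL returned by `build_query_url` can be fetched with any HTTP client, and the response body passed to `best_suggestions`.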

Database backend

The backend contains three tables: translate_tms, translate_tmt and translate_tmf. Those correspond to sources, targets and fulltext. You can find the table definitions in sql/translate_tm.sql. The sources contain all the message definitions. Even though usually they are always in the same language, say, English, the language of the text is also stored for the rare cases this is not true.

Each entry has a unique id and two extra fields, length and context. Length is used as the first pass filter, so that when querying we don't need to compare the text we're searching with every entry in the database. The context stores the title of the page where the text comes from, for example "MediaWiki:Jan/en". From this information we can link the suggestions back to "MediaWiki:Jan/de", which makes it possible for translators to quickly fix things, or just to determine where that kind of translation was used.

The second pass of filtering comes from the fulltext search. The definitions are mingled with an ad hoc algorithm. First the text is segmented into segments (words) with MediaWiki's Language::segmentByWord. If there are enough segments, we strip basically everything that is not word letters and normalize the case. Then we take the first ten unique words, which are at least 5 bytes long (5 letters in English, but even shorter words for languages with multibyte code points). Those words are then stored in the fulltext index for further filtering for longer strings.
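
The word selection described above can be sketched in Python as follows. This is an illustration rather than the extension's code: whitespace splitting stands in for MediaWiki's Language::segmentByWord, the "enough segments" check is omitted because its threshold is not stated here, and the function name `fulltext_words` is hypothetical.

```python
import re


def fulltext_words(text, max_words=10, min_bytes=5):
    """Return the first `max_words` unique normalized words of at least
    `min_bytes` bytes (measured in UTF-8), in order of appearance."""
    words = []
    for segment in text.split():  # stand-in for Language::segmentByWord
        # Strip everything that is not a letter, then normalize the case.
        word = re.sub(r"[\W\d_]+", "", segment).lower()
        # The length check is in bytes, so multibyte letters count more.
        if len(word.encode("utf-8")) < min_bytes:
            continue
        if word not in words:
            words.append(word)
        if len(words) == max_words:
            break
    return words
```

Because the length is measured in bytes, a four-letter word such as "über" passes the filter (5 bytes in UTF-8), while the four-letter English word "over" does not.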

When we have filtered the list of candidates, we fetch the matching targets from the targets table. Then we apply the Levenshtein edit distance algorithm to do the final filtering and ranking. Let's define:

E: edit distance
S: the text we are searching suggestions for
Tc: the suggestion text
To: the original text which the Tc is translation of

The quality of suggestion Tc is calculated as E/min(length(Tc),length(To)). Depending on the length of the strings, we use either PHP's native levenshtein function or, if either of the strings is longer than 255 bytes, the PHP implementation of the Levenshtein algorithm.[1] It has not been tested whether the native implementation of levenshtein handles multibyte characters correctly. This might be another weak point when the source language is not English (the others being the fulltext search and segmentation).
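
The ratio above can be illustrated with a short Python sketch. It rests on several assumptions: E is taken to be the edit distance between S and To, lengths are counted in Python characters rather than bytes, and the function names are hypothetical. The sketch computes only the ratio exactly as written above; how that ratio is mapped onto the [0..1] quality score, where 1 means a perfect match, is not spelled out here and is left out.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def distance_ratio(search_text, original_text, suggestion_text):
    """E / min(length(Tc), length(To)), with E assumed to be the edit
    distance between S (search_text) and To (original_text)."""
    shortest = min(len(suggestion_text), len(original_text))
    if shortest == 0:
        raise ValueError("cannot compute the ratio for an empty string")
    return levenshtein(search_text, original_text) / shortest
```

For identical S and To the ratio is 0, and it grows as the searched text diverges from the stored definition.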
