Help:Extension:Translate/Translation memories/zh

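TTMServer is the translation memory server that ships with the Translate extension. It has no external dependencies and is enabled by default, replacing the tmserver support from translatetoolkit, which was hard to set up. TTMServer is a simple translation memory: it does not use advanced algorithms, but it takes advantage of MediaWiki's good language support and database abstraction.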

TTMServer can be used in three different ways:

Configuration
All translation aids, including translation memories, are configured through the  setting. Example configuration of TTMServer:

Possible keys and values are:

Currently only MySQL is supported as the database.

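TTMServer API
If you want to implement your own TTMServer service, the specification follows.

Query parameters:

Your service must accept the following parameters:

Your service must provide a JSON object that has the key  with an array of objects. These objects must contain the following data:

Example: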


 * URL: http://translatewiki.net/w/api.php?action=ttmserver&sourcelanguage=en&targetlanguage=fi&text=january&format=jsonfm
 * Response:
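
The example URL above can be assembled from its parts. The following is a minimal sketch in Python that only builds the query string; the function name `build_ttmserver_url` and its `fmt` default are assumptions made for illustration, and the parameter names are taken from the example URL itself.

```python
from urllib.parse import urlencode


def build_ttmserver_url(api_base, source_language, target_language, text, fmt="json"):
    """Build a TTMServer query URL (hypothetical helper, not part of Translate)."""
    # Parameter names mirror the ones visible in the example URL.
    params = {
        "action": "ttmserver",
        "sourcelanguage": source_language,
        "targetlanguage": target_language,
        "text": text,
        "format": fmt,
    }
    # urlencode percent-encodes the values, so arbitrary text is safe to pass.
    return api_base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_ttmserver_url(
        "http://translatewiki.net/w/api.php", "en", "fi", "january", fmt="jsonfm"
    ))
```

Called with the values shown, the helper reproduces the example URL; text containing spaces or non-ASCII characters is encoded automatically.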

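TTMServer architecture
The backend contains three tables: ,  and , which correspond to sources, targets and fulltext. You can find the table definitions in . The sources contain all the message definitions. Although the definitions are usually all in one language, say English, the language of the text is also stored for the rare cases where this is not true.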

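Each entry has a unique id and two extra fields: length and context. Length is used as the first filter, so that a query does not need to compare the search text against every entry in the database. The context stores the title of the page the text comes from, for example "MediaWiki:Jan/en". From this information we can link a suggestion back to "MediaWiki:Jan/de", which lets translators quickly fix a problem or check where a translation was used.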

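The second filter comes from the fulltext search. The definitions are mingled with an ad hoc algorithm. First the text is split into segments (words) with MediaWiki's . If there are enough segments, we strip basically everything that is not a word letter and normalize the case. Then we take the first ten unique words that are at least 5 bytes long (5 letters in English, fewer characters in scripts that use multibyte characters). Those words are then stored in the fulltext index for further filtering of longer strings.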
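
The word selection described above can be sketched as follows. This is a rough illustration under stated assumptions, not the extension's code: a regular expression stands in for MediaWiki's segmentation, lower-casing stands in for case normalization, and the "enough segments" check is left out because its threshold is not given here. The names `fulltext_words`, `MAX_WORDS` and `MIN_BYTES` are hypothetical.

```python
import re

MAX_WORDS = 10  # "the first ten unique words"
MIN_BYTES = 5   # "at least 5 bytes long"


def fulltext_words(text):
    """Pick the words that would be stored in the fulltext index (sketch)."""
    # Assumption: \w+ approximates segmentation; note it also keeps digits
    # and underscores, which the real segmenter may treat differently.
    segments = re.findall(r"\w+", text.lower())
    words = []
    for segment in segments:
        # Length is measured in UTF-8 bytes, so short words in multibyte
        # scripts can still qualify.
        if len(segment.encode("utf-8")) < MIN_BYTES:
            continue
        if segment in words:
            continue  # keep unique words only, in order of first appearance
        words.append(segment)
        if len(words) == MAX_WORDS:
            break
    return words


if __name__ == "__main__":
    print(fulltext_words("Translation memories store translation pairs for later reuse"))
```

In this sketch a two-character word such as "日本" passes the length check because it occupies six bytes in UTF-8, while the English word "for" is dropped.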

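Once the list of candidates has been filtered, the matching targets are fetched from the targets table. The Levenshtein edit distance algorithm is then applied for the final filtering and ranking. Let's define: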


 * E : the edit distance
 * S : the text we are searching suggestions for
 * Tc : the suggestion text
 * To : the original text that Tc is a translation of

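The quality of suggestion Tc is calculated as E/min(length(Tc),length(To)). We use PHP's native levenshtein function, but when either string is longer than 255 bytes we fall back to a PHP implementation of the Levenshtein algorithm. It has not been tested whether the native levenshtein handles multibyte characters correctly. This may be another weak point when the source language is not English (the others being the fulltext search and segmentation).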
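
As a worked illustration of the formula, here is a small Python sketch. It makes two assumptions that the text does not state: that E is the edit distance between S and To, and that lengths are counted in characters rather than bytes. The function names are hypothetical and the code does not reproduce the extension's PHP implementation.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggestion_ratio(search_text, original_text, suggestion_text):
    """E/min(length(Tc), length(To)), read literally from the formula."""
    shortest = min(len(suggestion_text), len(original_text))
    if shortest == 0:
        raise ValueError("empty original or suggestion text")
    # Assumption: E compares the searched text S with the stored original To.
    distance = levenshtein(search_text, original_text)
    return distance / shortest


if __name__ == "__main__":
    print(suggestion_ratio("Januar", "January", "tammikuu"))
```

Under this literal reading a value of 0 means that S and To are identical, and larger values mean a weaker match.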

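There is a script that fills the translation memory with translations from active message groups. Even on big sites the script can bootstrap the memory in about half an hour when it runs with multiple threads via the  parameter. The time depends heavily on how complete the message group completion statistics are (incomplete ones are calculated during the bootstrap). New translations are added automatically by a hook. New sources (message definitions) are added when the first translation is created.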

Old translations that are no longer used and do not belong to any message group are not purged automatically; they are removed only when you rerun the bootstrap script. When the translation of a message is updated, the previous translation is removed from the memory. When the definition is updated, nothing happens immediately. When translations are updated against the new definition, a new entry is added. The old definition and its old translations remain in the database until they are purged by rerunning the bootstrap script. Fuzzy translations are not added to the translation memory, but translations already in the memory are not removed when they are marked fuzzy.

Solr backend
Much of the above also applies to the TTMServer using the Solr search platform as backend, except the details on database layout and queries. The results are by default ranked with the levenshtein algorithm on the Solr side, but other available string matching algorithms can also be used, like ngram matching for example.

In Solr there are no tables. Instead we have documents with fields. Here is an example document:

Each translation has its own document and message documentation has one too. To actually get suggestions, we first search all documents in the source language, sorted by the string similarity algorithm. Then we do another query to fetch the translations, if any, for those messages.

We use many hooks to keep the translation memory database updated in almost real time. If a user translates similar messages one after another, the previous translation can (in the best case) be displayed as a suggestion for the next message.

Initial import
 * 1) Execute ttmserver-export.php command line script for each wiki using the shared translation memory.

New translation (if not fuzzy)
 * 1) Create document

Updated translation (if not fuzzy)
 * 1) Delete wiki:X language:Y message:Z
 * 2) Create document

Updated message definition
All existing documents for the message stay around because globalid is different.
 * 1) Create new document

Translation is fuzzied
 * 1) Delete wiki:X language:Y message:Z

Message changes group membership
 * 1) Delete wiki:X message:Z
 * 2) Create document (for all languages)

Message goes out of use
Any further changes to definitions or translations are not updated to TM.
 * 1) Delete wiki:X message:Z
 * 2) Create document (for all languages)

Translation memory query
 * 1) Collect similar messages with strdist("message definition",content)
 * 2) Collect translation with globalid:[A,B,C]

Search query
Can be narrowed further by facets on language or group field.
 * 1) Find all matches with text:"search query"

Identifier fields
Field  uniquely identifies the translation or message definition by combining the following fields: The used format is .
 * wiki identifier (MediaWiki database id)
 * message identifier (Title of the base page)
 * message version identifier (Revision id of the message definition page)
 * message language

In addition we have separate fields for wiki id, message id and language to make the delete queries listed above possible.
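
To illustrate why the separate wiki id, message id and language fields matter, here is a minimal in-memory sketch of the delete and create operations listed above. The class `MemoryIndex`, its method names and the layout of the combined identifier are assumptions made for this example; the actual identifier format is not reproduced here.

```python
class MemoryIndex:
    """Toy stand-in for the Solr index (hypothetical, for illustration)."""

    def __init__(self):
        self.documents = []

    def create(self, wiki, message, revision, language, content):
        # Assumption: the combined identifier simply joins the four parts.
        globalid = f"{wiki}-{message}-{revision}/{language}"
        self.documents.append({
            "globalid": globalid,
            "wiki": wiki,
            "message": message,
            "language": language,
            "content": content,
        })
        return globalid

    def delete(self, wiki, message, language=None):
        """Delete wiki:X message:Z, optionally restricted to language:Y."""
        def matches(doc):
            if doc["wiki"] != wiki or doc["message"] != message:
                return False
            return language is None or doc["language"] == language

        before = len(self.documents)
        self.documents = [doc for doc in self.documents if not matches(doc)]
        return before - len(self.documents)

    def update_translation(self, wiki, message, revision, language, content):
        # "Updated translation": delete the old document, then create a new one.
        self.delete(wiki, message, language)
        return self.create(wiki, message, revision, language, content)


if __name__ == "__main__":
    index = MemoryIndex()
    index.create("wikiX", "MediaWiki:Jan", 100, "en", "January")
    index.create("wikiX", "MediaWiki:Jan", 100, "fi", "tammikuu")
    index.update_translation("wikiX", "MediaWiki:Jan", 100, "fi", "tammi")
    print([doc["content"] for doc in index.documents])
```

Because the delete operations filter on the separate fields, they do not need to know the revision that is baked into the combined identifier.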

Installation
Here are the general quick steps for installing and configuring Solr for TTMServer. You should adapt them to your situation. To use the Solr backend you also need the Solarium library. The easiest way is to install the Solarium MediaWiki extension. See the example configuration for the Solr backend in the configuration section of this page. You can pass extra configuration to Solarium via the  key as done for example in the .

And finally we can populate the translation memory with content.