Toolserver:Translatewiki.net interface

translatewiki.net is a wiki and community for translating user interfaces. This page describes how the toolserver could use the services of translatewiki.net to translate the user interfaces of web-based tools.

This proposal aims to establish a minimal standard as a starting point, to be extended later. It consists of several parts, described in separate sections below.

This proposal covers, for now, only tools written in PHP and message files represented in YAML.

Flow of Information
The flow of information is as follows:


 * 1) messages are defined by a tool's maintainer as entries in message files
 * 2) messages get imported into translatewiki.net
 * 3) messages get translated at translatewiki.net
 * 4) messages get exported from translatewiki.net to the toolserver, into an intermediate database layer
 * 5) messages are used by the tool by looking them up in that intermediate database layer. If not found there, they are taken from the message file
 * 6) messages get exported to message files and merged with the tool's repository version

Client Library
The client library provides a standardized way for tools to access localized messages. When the localized text of a message is requested, it is first looked up in the database. If it does not exist in the database in the requested language, it is taken from a message file. If it does not exist in the message file either, the process is repeated along a fallback chain of languages or language variants.
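The lookup order described above can be sketched as follows. This is a minimal illustration in Python, not the TSMessages.php prototype; the function name `get_message`, the two lookup callbacks and the sample fallback chains are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A lookup takes (language, name) and returns the text, or None if absent.
Lookup = Callable[[str, str], Optional[str]]


def get_message(name: str, language: str, fallbacks: Dict[str, List[str]],
                db_lookup: Lookup, file_lookup: Lookup) -> Optional[str]:
    """Resolve a message: database first, then message file, per language."""
    chain = [language] + fallbacks.get(language, [])
    for lang in chain:
        text = db_lookup(lang, name)
        if text is None:
            text = file_lookup(lang, name)
        if text is not None:
            return text
    return None  # not found anywhere along the fallback chain


# Illustrative data only (assumed, not taken from a real tool).
db = {("de", "login"): "Anmelden"}
files = {("en", "login"): "Login", ("en", "welcome_msg"): "Welcome!"}
fallbacks = {"de-at": ["de", "en"], "de": ["en"]}


def db_lookup(lang: str, name: str) -> Optional[str]:
    return db.get((lang, name))


def file_lookup(lang: str, name: str) -> Optional[str]:
    return files.get((lang, name))


print(get_message("login", "de-at", fallbacks, db_lookup, file_lookup))
print(get_message("welcome_msg", "de-at", fallbacks, db_lookup, file_lookup))
```

In this sketch the first call finds the German text in the database one step down the chain, and the second falls through to the English message file.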

A prototype implementation for this client library written in PHP is presented in TSMessages.php.

Message Files
Message files contain mappings from message keys to message texts in different languages. For a start, the following format will be supported for message files:


 * Message files use the YAML format
 * There is one file per language
 * The name of each message file is the ISO 639 code for the language the file contains, with the extension ".yml" appended.
   * Pending: specification for language variants. Allow IANA codes.
   * See MediaWiki's Names.php. A code listed there is supported; a code not listed there is not.
 * All message files are in one directory, preferably called "i18n".
 * All files in that directory that have the extension ".yml" and do not start with "." are considered message files.
 * The YAML structure inside the file is as follows:
   * The root node uses the language code (see above) as a key.
   * The value assigned to the root key is a map that assigns values to keys, where each value is either a string (a message text) or itself a map of the kind described here.
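The file naming rules above could be applied as in the following sketch. It is written in Python for illustration only; the function name `find_message_files` is hypothetical, and the language code is simply taken to be the file name without its ".yml" extension.

```python
from pathlib import Path
from typing import Dict


def find_message_files(i18n_dir: str) -> Dict[str, Path]:
    """Map language code -> message file for one i18n directory."""
    found: Dict[str, Path] = {}
    for path in sorted(Path(i18n_dir).iterdir()):
        name = path.name
        # Only regular files ending in ".yml" that do not start with ".".
        if path.is_file() and name.endswith(".yml") and not name.startswith("."):
            found[name[:-len(".yml")]] = path
    return found
```

A call such as `find_message_files("i18n")` would then return entries like `"en"` pointing at `i18n/en.yml`, skipping hidden files and files with other extensions.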

Example of a message file for English (en), called i18n/en.yml:

 en:
   login: "Login"
   welcome_msg: "Welcome!"
   colors:
     red: "red"
     green: "green"
     blue: "blue"

When accessed through the TSMessages interface, the messages on the top level would be called "login" and "welcome_msg". The nested messages would have their full "path" as a name, using a dot (".") as a separator, e.g. "colors.red", etc.
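The naming of nested messages can be illustrated with a short sketch. It assumes the YAML file has already been parsed into nested dictionaries (the parser is left out), and the function name `flatten_messages` is hypothetical rather than part of TSMessages.

```python
from typing import Any, Dict


def flatten_messages(tree: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Turn a nested message map into dotted message names."""
    flat: Dict[str, str] = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_messages(value, name))  # descend into sub-map
        else:
            flat[name] = str(value)  # leaf: a message text
    return flat


# The parsed content of the example file i18n/en.yml.
parsed = {
    "en": {
        "login": "Login",
        "welcome_msg": "Welcome!",
        "colors": {"red": "red", "green": "green", "blue": "blue"},
    }
}

messages = flatten_messages(parsed["en"])
print(sorted(messages))
```

The root key (the language code) is stripped before flattening, so the names carry no language prefix.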

Message Database
Messages will be stored in the database in a table with the following structure:

 language   VARCHAR(12)  NOT NULL
 component  VARCHAR(64)  NOT NULL
 name       VARCHAR(64)  NOT NULL
 message    MEDIUMTEXT   DEFAULT NULL

The combination of language, component and name identifies the message unambiguously. Note that for "personal" tools (as opposed to multi-maintainer-tools), the component name should contain the user name. So, the message for the color red in john's cool tool would have component = "~john/cool" and name = "color.red".
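As a rough illustration of this table, the sketch below creates it in an in-memory SQLite database, used here only as a stand-in for MySQL. The table name `messages` and the primary key are assumptions; the key is inferred from the statement that language, component and name together identify a message.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE messages (
    language  VARCHAR(12) NOT NULL,
    component VARCHAR(64) NOT NULL,
    name      VARCHAR(64) NOT NULL,
    message   MEDIUMTEXT  DEFAULT NULL,
    PRIMARY KEY (language, component, name)  -- assumed, not in the proposal
)
"""


def lookup(conn: sqlite3.Connection, language: str, component: str,
           name: str) -> Optional[str]:
    """Fetch one message text, or None if there is no such row."""
    row = conn.execute(
        "SELECT message FROM messages"
        " WHERE language = ? AND component = ? AND name = ?",
        (language, component, name),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    ("en", "~john/cool", "color.red", "red"),
)
print(lookup(conn, "en", "~john/cool", "color.red"))
```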

The database may later be extended with flags marking messages as fuzzy, or with fields for versioning.

Message Import
translatewiki.net will import messages from the toolserver at regular intervals (probably daily). Files are loaded from the toolserver using SVN checkout/update via HTTP(S). All messages from the toolserver will be part of one meta-project ("out-toolserver-org"), and each project (tool) will be handled as a sub-module. Whether a toolserver user maintains all of their tools as a single project or as multiple projects is left to the user.

The list of files or directories to check out is read from a file located at a canonical URL, namely . This file is compiled automatically by a script on the toolserver, see below.

The translatewiki.yml file contains a list of modules (projects) and, for each project, the SVN URL of its localization directory, among other information. Example:

 "~john/cool":
   i18n_svn: "https://svn.toolserver.org/svnroot/john/cool/trunk/i18n"
   contact: "john"
   format: "yaml"
 "awesome":
   i18n_svn: "https://svn.toolserver.org/svnroot/awesome/trunk/messages"
   contact: "sam"
   format: "yaml"

This defines two projects, one multi-maintainer ("awesome") and one user-specific ("~john/cool"). For each project, there's an SVN URL in the i18n_svn field, specifying where the message files for that project are located. The "contact" field specifies a user to contact, either as a full email address, or as a toolserver user name (which can be expanded to an email address by appending "@toolserver.org"). The "format" field specifies how the message files are to be read. This will always be "yaml" for now, but other options may become available in the future.
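The contact rule can be written as a one-line helper. The sketch below is illustrative; the function name `contact_email` is hypothetical, and the module list is the example above, assumed to be already parsed from YAML.

```python
def contact_email(contact: str) -> str:
    """Expand a toolserver user name to an email address if needed."""
    contact = contact.strip()
    return contact if "@" in contact else f"{contact}@toolserver.org"


# The example module list, as it would look after YAML parsing.
modules = {
    "~john/cool": {
        "i18n_svn": "https://svn.toolserver.org/svnroot/john/cool/trunk/i18n",
        "contact": "john",
        "format": "yaml",
    },
    "awesome": {
        "i18n_svn": "https://svn.toolserver.org/svnroot/awesome/trunk/messages",
        "contact": "sam",
        "format": "yaml",
    },
}

for name, info in modules.items():
    print(name, contact_email(info["contact"]))
```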

In addition, a proof-of-concept PHP script has been written for the Citations template generator and is available at http://toolserver.org/~holek/test/yaml.php. The script is very simple: it takes the whole messages array and produces output in YAML format for import. In the future, it will be expanded to import the i18n files of toolserver users and will be run at regular intervals to update imported messages.

Message Export
translatewiki.net will export all messages for toolserver projects at regular intervals (perhaps daily). The export will be done as a single file in TSV format (more precisely, the variant used by MySQL's LOAD DATA INFILE command). The columns in this file correspond to the columns in the database table described above.
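A row in this TSV variant could be produced as sketched below. The sketch assumes the default LOAD DATA INFILE settings (tab-separated fields, newline-terminated lines, backslash as escape character, `\N` for NULL); the function names are hypothetical and this is not the actual export code of translatewiki.net.

```python
from typing import Iterable, Optional, Tuple

# (language, component, name, message), message may be NULL.
Row = Tuple[str, str, str, Optional[str]]


def escape_field(value: Optional[str]) -> str:
    """Escape one field for MySQL's default LOAD DATA INFILE format."""
    if value is None:
        return r"\N"  # how NULL is written with the default escape character
    return (value.replace("\\", "\\\\")  # escape the escape character first
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_tsv(rows: Iterable[Row]) -> str:
    """Serialize rows, one line per message."""
    return "".join(
        "\t".join(escape_field(field) for field in row) + "\n" for row in rows
    )


print(to_tsv([("en", "~john/cool", "color.red", "red")]), end="")
```

Escaping tabs and newlines inside message texts keeps every message on exactly one line with exactly four fields, which makes the file easy to check before import.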

The exported message table will be made available for download via HTTP in compressed form, at a well known URL, perhaps . It will be downloaded automatically by a cron job running on the toolserver (see below).

Building the repository list
The file translatewiki.yml is used by translatewiki.net to determine the location of the localization files for the individual toolserver projects. It is built automatically by combining the information from yml files in individual user directories: each user may create a file called .translatewiki.yml in their home directory, containing a list of projects and the corresponding information for these projects, as specified in the section describing the message import above.

Before merging these files into a single, global translatewiki.yml, the collecting script shall apply the following to each per-user .translatewiki.yml:
 * check yml syntax. Try to parse the file.
 * add missing information, e.g. the user's name in the contact field.
 * apply sanity checks - perhaps prohibit references outside the user's own SVN repo.

This process for building the global translatewiki.yml shall be run from a cron job on a regular basis, perhaps daily.
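The merge step could look roughly like the sketch below. It assumes the per-user files have already been parsed from YAML (the syntax check) into dictionaries, and that a user's own SVN repository lives under `https://svn.toolserver.org/svnroot/<user>/`, a layout inferred from the example URLs and not confirmed by this proposal. Function names are hypothetical, and multi-maintainer projects would need a different ownership rule.

```python
from typing import Any, Dict, List, Tuple

# Assumed repository layout, inferred from the example URLs.
SVN_ROOT = "https://svn.toolserver.org/svnroot/"

Modules = Dict[str, Dict[str, Any]]


def check_user_modules(user: str, modules: Modules) -> Tuple[Modules, List[str]]:
    """Apply defaults and sanity checks to one user's module list."""
    accepted: Modules = {}
    problems: List[str] = []
    own_prefix = f"{SVN_ROOT}{user}/"
    for name, info in modules.items():
        if not isinstance(info, dict) or "i18n_svn" not in info:
            problems.append(f"{user}: {name}: missing i18n_svn")
            continue
        if not str(info["i18n_svn"]).startswith(own_prefix):
            problems.append(f"{user}: {name}: i18n_svn outside {own_prefix}")
            continue
        entry = dict(info)
        entry.setdefault("contact", user)  # add missing information
        accepted[name] = entry
    return accepted, problems


def build_global_list(per_user: Dict[str, Modules]) -> Tuple[Modules, List[str]]:
    """Merge all per-user module lists into one global list."""
    merged: Modules = {}
    problems: List[str] = []
    for user in sorted(per_user):
        accepted, errors = check_user_modules(user, per_user[user])
        merged.update(accepted)
        problems.extend(errors)
    return merged, problems
```

Rejected entries are collected rather than raised, so that one faulty per-user file does not block the rest of the list.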

Importing translated messages
translatewiki.net exports all the messages for toolserver projects on a regular basis, into a (compressed) TSV file. This file will be downloaded to the toolserver via HTTP, and then imported directly into MySQL, into the table used by TSMessages for looking up messages. Safeguards shall be applied to protect against importing empty or damaged export files.
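One possible shape for such safeguards is sketched below: the downloaded file is checked before anything is loaded into MySQL. The compression format is not specified in this proposal, so gzip is an assumption, as are the function name `validate_export` and the `min_rows` threshold.

```python
import gzip
import zlib


def validate_export(data: bytes, min_rows: int = 1) -> int:
    """Check a downloaded export; return the row count or raise ValueError."""
    try:
        text = gzip.decompress(data).decode("utf-8")
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        raise ValueError(f"export is damaged: {exc}") from exc

    # Split on "\n" only: embedded newlines are escaped in this TSV variant.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # trailing newline after the last row

    if len(lines) < min_rows:
        raise ValueError(f"export has {len(lines)} rows, expected >= {min_rows}")
    for number, line in enumerate(lines, start=1):
        if len(line.split("\t")) != 4:  # language, component, name, message
            raise ValueError(f"line {number}: expected 4 columns")
    return len(lines)
```

An import script could call this first and keep the existing table contents whenever a `ValueError` is raised.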

This import script shall be run from cron on a regular basis, perhaps daily.

Outlook

 * Support more formats: Java .properties, gettext, ...
 * Support export to message file (on translatewiki, or on the toolserver?)