Help:Extension:Translate/File format support

From mediawiki.org

FFS stands for file format support. It is the general name for the group of classes in the Translate extension that read and write translated messages in different file formats.

Software developers have reinvented the wheel with localization technologies many times, so there are many different formats for storing translatable software messages. These formats fall into two main groups.

  • Key-based formats: each message has a key, usually a more-or-less meaningful string. The translation into each language is a map from keys to values. Most formats fall into this group, including DTD, JSON, and MediaWiki's old format, used until 2014 (essentially a PHP array).
  • Gettext-like formats: the message in the original language of the program, usually English, is itself used as the key that points to translations into other languages. Storing such messages in a different format requires generating pseudo-keys, which are inherently unstable.
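
The difference between the two groups can be sketched in a few lines of Python. The message texts, the key name, and the hash-based `pseudo_key` helper are hypothetical illustrations, not the scheme the Translate extension uses. The point is only that a key derived from the source text changes whenever that text is edited.

```python
import hashlib

# Key-based: a stable key maps to the text in each language.
key_based = {
    "en": {"save-button": "Save"},
    "fi": {"save-button": "Tallenna"},
}

# Gettext-like: the English source text itself is the lookup key.
gettext_like_fi = {"Save": "Tallenna"}


def pseudo_key(source_text: str) -> str:
    """Derive a key from the source text (hypothetical scheme)."""
    digest = hashlib.sha1(source_text.encode("utf-8")).hexdigest()
    return digest[:8]


# Editing the source text yields a different pseudo-key, so the old
# translation no longer lines up with the reworded message.
old_key = pseudo_key("Save")
new_key = pseudo_key("Save changes")
print(old_key != new_key)
```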

Introduction to the FFS classes

With some exceptions, FFS classes derive from the FFS interface, which defines the basic methods every class must implement:

setWritePath( $target )
Setter for the file name.
getWritePath()
Getter for the file name.
read( $code )
Read the messages from the file and parse them.
readFromVariable( $data )
Read the messages from a string variable that has the same format as the file and return them as an array of AUTHORS and MESSAGES. This is where the actual parsing of the file's text is supposed to happen.
write( MessageCollection $collection )
Write the messages to the file.
writeIntoVariable( MessageCollection $collection )
Write the messages to a string variable that has the same format as the file. This is where the careful construction of the resulting messages file is supposed to happen.

MediaWiki translations

MediaWiki translations are currently not handled by classes derived from the FFS interface, but by standalone classes, for which separate configuration instructions exist. New FFS-like classes are being developed.

class SimpleFFS

The class SimpleFFS is the ancestor of all the other FFS classes, and it is also a simple example of how an FFS class should be written. It implements a simplistic key-based format:

  • each file has two sections, separated by "\0\0\0\0";
  • one section has the translators' names separated by "\0";
  • the other has the translations in "key=value" format, also separated by "\0".
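
As a rough illustration, the layout described above can be read and written as follows. This is a Python sketch, not the extension's PHP code: the function names mirror the interface methods but are hypothetical, and putting the authors section first is an assumption, since the description does not fix the order.

```python
SECTION_SEP = "\0\0\0\0"
ITEM_SEP = "\0"


def write_into_variable(authors, messages):
    """Serialize authors and messages into one string."""
    for key, value in messages.items():
        # Nothing is escaped, so "=" or NUL inside the data would
        # corrupt the file; this sketch refuses such input.
        if "=" in key or "=" in value or "\0" in key + value:
            raise ValueError(f"cannot store message {key!r}")
    author_part = ITEM_SEP.join(authors)
    message_part = ITEM_SEP.join(f"{k}={v}" for k, v in messages.items())
    # Assumption: the authors section comes first.
    return author_part + SECTION_SEP + message_part


def read_from_variable(data):
    """Parse a string back into AUTHORS and MESSAGES."""
    author_part, message_part = data.split(SECTION_SEP, 1)
    authors = author_part.split(ITEM_SEP) if author_part else []
    messages = {}
    if message_part:
        for item in message_part.split(ITEM_SEP):
            key, _, value = item.partition("=")
            messages[key] = value
    return {"AUTHORS": authors, "MESSAGES": messages}


data = write_into_variable(["Alice", "Bob"], {"greeting": "Hello", "farewell": "Bye"})
parsed = read_from_variable(data)
print(parsed["AUTHORS"])
print(parsed["MESSAGES"])
```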

Since SimpleFFS is intentionally simplistic, it demonstrates possible bugs and complications. For example, the "=" character is not escaped, so neither the key nor the value may contain it. This makes the format unsuitable for real-world programs. SimpleFFS also implements useful utility methods:

exists( $code )
Tests whether the file exists.
writeReal( $collection )
Implements internals of file format writing, apart from the more generic writeIntoVariable.
filterAuthors
Filter some defined authors from the file according to a custom blacklist. This is useful for filtering usernames of bots, developers and translation administrators, for example.
fixNewLines( $data )
Fix all line endings to Unix-style.
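
Two of these helpers are simple enough to sketch. The Python functions below only illustrate the behavior described above. Their names and parameters are hypothetical, and they do not reproduce the real methods' signatures or the way the extension looks up its list of filtered authors.

```python
def fix_newlines(data: str) -> str:
    """Normalize Windows (CRLF) and bare CR line endings to LF."""
    # Replace CRLF first so that it does not become two newlines.
    return data.replace("\r\n", "\n").replace("\r", "\n")


def filter_authors(authors, ignored):
    """Drop authors that appear in the ignore list, keeping order."""
    ignored_set = set(ignored)
    return [name for name in authors if name not in ignored_set]


print(repr(fix_newlines("first\r\nsecond\rthird\n")))
print(filter_authors(["Alice", "ExampleBot", "Bob"], ["ExampleBot"]))
```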

Writing new FFS classes

All the above SimpleFFS methods can be overridden. Most implementations, however, only need to implement writeIntoVariable and readFromVariable.

General tips when writing new classes:

  • Avoid executing executable file formats. Parse them.
  • Remember to mangle and unmangle message keys.
  • Do not assume message keys don't include problematic characters. They will.
  • The output is usually expected to be pretty and readable, because some people like to edit the files manually.
  • Most formats don't support fuzzy markers; some add them as comments on export only and ignore them on import.

Supported file formats

The existing FFS classes are:

  • AndroidXml – for use in Android apps
  • AppleFFS – for iOS/Mac OS X Localizable.strings
  • Dtd – for DTD-based projects, like Okawix and Mozilla
  • FlatPhpFFS – for future use in MediaWiki
  • Gettext – for Gettext-based projects
  • Ini – for INI-based projects
  • JavaScript – for all JavaScript-formatted files
  • Java properties – for *.properties files, used in some Java and JavaScript projects, often along with Dtd
  • Json – used in jquery.i18n-based projects, such as the portable Universal Language Selector library
  • Yaml – used in Waymarked Trails
  • Ruby (Yaml) – used in OpenStreetMap and Shapado
  • AMD i18n bundle

Examples of Translate exports in those formats are available at translatewiki.net.

Figure: FFS classes hierarchy

Mangling the message keys to ensure correct title handling

The Translate extension is MediaWiki-based and every message is stored as a MediaWiki page, so the key must be a valid MediaWiki page title. Mangling takes care of this: before the message is stored as a wiki page, the key name is escaped in a manner similar to the quoted-printable encoding, but with some modifications. Before the message is written back to the file, its key is unmangled.

When an FFS class overrides the functions that call the mangling routines, it must make sure the roundtrip is done correctly – that is, that the key is mangled before writing to MediaWiki and unmangled before writing the translation back to the file.

Mangling is done in the StringMatcher class.
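
The roundtrip can be illustrated with a minimal quoted-printable-style escaper in Python. This is not the algorithm that StringMatcher implements: the set of characters treated as safe and the `=XX` byte encoding are assumptions made for the sketch. What it shows is the property that matters, namely that unmangling a mangled key returns the original key exactly.

```python
import re
import string

# Characters assumed safe in a page title, for this sketch only.
SAFE = set(string.ascii_letters + string.digits + "-_.")


def mangle(key: str) -> str:
    """Escape every unsafe character as =XX per UTF-8 byte."""
    out = []
    for ch in key:
        if ch in SAFE:
            out.append(ch)
        else:
            # "=" itself is not in SAFE, so it becomes "=3D" and the
            # encoding stays unambiguous.
            out.extend(f"={byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def unmangle(mangled: str) -> str:
    """Reverse mangle(): turn each =XX back into its byte."""
    raw = re.sub(
        rb"=([0-9A-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        mangled.encode("ascii"),
    )
    return raw.decode("utf-8")


key = "a[b]=c"
print(mangle(key))
print(unmangle(mangle(key)) == key)
```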

Testing FFS classes

If you create a new FFS class, create a corresponding test file in the tests directory. The important things to test are:

  • Parsing of the format: Essentially testing that the readFromVariable function returns the right keys and values for AUTHORS and MESSAGES.
  • Roundtrip: Test that the keys and the messages are written and read correctly.

You can use existing test routines, such as JavaFFSTest, as examples.
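
The shape of the two checks listed above is sketched here with Python's unittest module. It is an illustration only: the extension's own tests are written in PHP, and JSON is used as a stand-in format, so the two helper functions and the sample data are hypothetical.

```python
import json
import unittest


# Stand-ins for a format's writeIntoVariable / readFromVariable.
def write_into_variable(parsed):
    return json.dumps(parsed, ensure_ascii=False, indent=4)


def read_from_variable(data):
    return json.loads(data)


class FormatTest(unittest.TestCase):
    def test_parsing(self):
        # Parsing: known input must yield the expected keys and values.
        data = '{"AUTHORS": ["Alice"], "MESSAGES": {"greeting": "Hello"}}'
        parsed = read_from_variable(data)
        self.assertEqual(parsed["AUTHORS"], ["Alice"])
        self.assertEqual(parsed["MESSAGES"], {"greeting": "Hello"})

    def test_roundtrip(self):
        # Roundtrip: writing and reading back must not change anything,
        # including awkward keys, newlines, and empty values.
        original = {
            "AUTHORS": ["Alice", "Bob"],
            "MESSAGES": {"tricky key=[1]": "Line one\nLine two", "empty": ""},
        }
        written = write_into_variable(original)
        self.assertEqual(read_from_variable(written), original)
```

Saved as a module, the sketch can be run with `python -m unittest`.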