Extension:LocalisationUpdate/LUv2

Project Outline
Wikimedia wikis use the LocalisationUpdate extension to get updated localised messages from translatewiki.net. To accomplish this, the extension downloads the translation files for MediaWiki and the installed extensions, stores them in a local cache, and then processes them to check which messages can be updated.

This flow is inefficient: not all wikis require support for ~300 languages, and the extension downloads complete files instead of only the deltas, which makes each update slow. The extension also requires setting up cron jobs and other manual configuration, which could be avoided.

LUv2 aims to fix this by creating a new service that keeps track of the updates and makes them accessible via a push interface and a RESTful API.


 * Bugzilla report: #46653
 * Announcement: to be made
 * Mentors: Kartik Mistry, Niklas Laxström

Approach
The main component of the project is the LUv2 service.

Service
The service comprises a database update mechanism and an API. It will store the latest version of the translation messages and make them available to clients via the API. The service will be written in node.js and will use Redis as its database.

API
There will be two API endpoints. Clients will use the first endpoint to fetch updates; the second will be used to trigger the database update. The first endpoint will require clients to provide the project_id, the language_id, and their current message_id->message pairs. The service will determine which messages the client needs to update and respond with the corresponding message_id->message pairs.
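The comparison performed by the first endpoint can be sketched as follows. This is a minimal illustration in Python (the service itself is planned for node.js); the function name `messages_to_update`, the sample data, and the choice to ignore message ids the service does not know are assumptions made for this sketch, not part of the proposal.

```python
from typing import Dict


def messages_to_update(stored: Dict[str, str], client: Dict[str, str]) -> Dict[str, str]:
    """Return message_id -> message pairs whose stored text differs from the client's copy."""
    updates = {}
    for message_id, client_text in client.items():
        stored_text = stored.get(message_id)
        # Assumption: ids unknown to the service are skipped, as are ids
        # whose text is already current on the client.
        if stored_text is not None and stored_text != client_text:
            updates[message_id] = stored_text
    return updates


# Hypothetical messages for one (project_id, language_id) pair.
stored = {"search": "Suchen", "save": "Speichern", "edit": "Bearbeiten"}
client = {"search": "Suche", "save": "Speichern"}

print(messages_to_update(stored, client))
```

Only the pairs that actually changed travel back to the client, which is the delta behaviour the outline asks for.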

Database Update
To automate database updates, the post-receive hook on each project's remote repository will be configured to hit an API endpoint, which will trigger the database update process. The update script will download the modified and new files to a temporary directory, parse them, compare them with the old files, and insert or update the messages in the database. The script will ideally support all the formats currently supported by the Translate extension.
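The parse-and-compare step could look roughly like the sketch below, shown for JSON files only. The helper names `load_messages` and `diff_messages`, the sample file contents, and the rule that keys starting with "@" (such as "@metadata") are not messages are assumptions made for illustration.

```python
import json
from typing import Dict, Tuple


def load_messages(raw_json: str) -> Dict[str, str]:
    """Parse one JSON translation file into a message_id -> message mapping."""
    data = json.loads(raw_json)
    # Assumption: keys starting with "@" hold file metadata, not messages.
    return {key: value for key, value in data.items() if not key.startswith("@")}


def diff_messages(old: Dict[str, str], new: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split the new file's messages into those to insert and those to update."""
    inserted = {key: text for key, text in new.items() if key not in old}
    updated = {key: text for key, text in new.items() if key in old and old[key] != text}
    return inserted, updated


# Hypothetical old and new versions of the same translation file.
old_file = '{"@metadata": {"authors": []}, "save": "Save", "edit": "Edit"}'
new_file = '{"@metadata": {"authors": []}, "save": "Save page", "edit": "Edit", "undo": "Undo"}'

inserted, updated = diff_messages(load_messages(old_file), load_messages(new_file))
print("insert:", inserted)
print("update:", updated)
```

Other formats (PHP arrays, YAML) would only need their own loader that returns the same mapping; the comparison itself stays unchanged.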

If one server goes down, clients should have a secondary source that they can query. A read-only replica would solve this.
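On the client side, falling back to the replica might be handled as in this sketch. The host names, the `fetch` callable, and the use of `ConnectionError` to signal an unreachable server are all hypothetical; the proposal does not specify how clients discover or query the replica.

```python
from typing import Callable, Dict, Sequence


def fetch_with_fallback(
    sources: Sequence[str],
    fetch: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Try each source in order and return the first successful response."""
    last_error = None
    for source in sources:
        try:
            return fetch(source)
        except ConnectionError as error:
            # Remember the failure and move on to the next source.
            last_error = error
    raise ConnectionError("no source reachable") from last_error


def fake_fetch(source: str) -> Dict[str, str]:
    """Stand-in for a real HTTP request; the primary is simulated as down."""
    if source == "primary.example":
        raise ConnectionError("primary is down")
    return {"save": "Speichern"}


result = fetch_with_fallback(["primary.example", "replica.example"], fake_fetch)
print(result)
```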

The database will primarily store the following data:

 * component_id - project identifier
 * lang - language code
 * hash - hash of the message
 * id - message identifier
 * message - the message
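One possible in-memory shape for such a record is sketched below. The field names follow the list above; the SHA-1 hash function and the colon-separated `storage_key` layout are assumptions chosen for illustration, since the proposal does not fix either.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageRecord:
    component_id: str  # project identifier
    lang: str          # language code
    id: str            # message identifier
    message: str       # the message text

    @property
    def hash(self) -> str:
        # Assumption: SHA-1 over the UTF-8 encoded message text.
        return hashlib.sha1(self.message.encode("utf-8")).hexdigest()

    def storage_key(self) -> str:
        # Assumption: a colon-separated key, as is common in key-value stores.
        return f"{self.component_id}:{self.lang}:{self.id}"


# Hypothetical record for illustration.
record = MessageRecord("mediawiki-core", "de", "save", "Speichern")
print(record.storage_key(), record.hash)
```

Storing a hash next to each message would let two copies be compared without transferring the full text.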

This service will ideally work with all the projects hosted on translatewiki.net. It will primarily benefit web applications, which will be able to serve the latest translation messages instead of depending on software updates.

Extension
The LocalisationUpdate extension will be rewritten to fetch the updates from the new service.

Future
I plan to maintain the service and the extension after the GSoC period ends, and to work with other projects willing to use this service.

The proposal doesn't address the requirement to set up cron jobs. Eliminating this requirement would benefit other MediaWiki users, since not all shared hosting providers let users set up cron jobs. A push-based solution would be best suited for this, but it would have to be heavily scrutinized in terms of security.

Deliverables

 * The LUv2 service (API + database update mechanism)
 * New LocalisationUpdate extension

Tentative timeline

 * May 12 to May 17: Finalize and document the flow for the database update mechanism.


 * May 19 to May 24: Code the initial update mechanism with support for PHP arrays, YAML and JSON.


 * May 26 to May 31: Code the service APIs.


 * Jun 02 to Jun 07: Test the code in a sandbox for MediaWiki, rails-port and jquery.uls. Set up Redis and import translations.


 * Jun 09 to Jun 14: Code the new LocalisationUpdate extension.


 * Jun 14 to Jun 19: Test all the components in the sandbox. Gather bugs and errors.


 * Jun 23 to Jun 28: Mid-term evaluations. Fixing bugs.
 * Jun 30 to Jul 05: Write missing tests and documentation. Gather and fix bugs.


 * Jul 07 to Jul 12: Discuss the possible solutions to tackle the cron dependency issue.


 * Jul 14 to Aug 02: Write and test the code according to the consensus reached in the discussion.


 * Aug 04 to Aug 09: Clean up; improve the code and documentation if needed.


 * Aug 11: Pencils down.