Extension:LocalisationUpdate/LUv2

Code: https://github.com/konarak/LUv2/tree/epicmess (please bear with me, it'll have much nicer, proper commits in about a week)

LUv2: A generic, efficient localisation update service

 * Public URL: https://www.mediawiki.org/wiki/Extension:LocalisationUpdate/LUv2
 * Bugzilla report:
 * Announcement:
 * Updates: /Updates/

Identity and Contact

 * Name: Konarak Ratnakar
 * Email: konarak.11 at google dot com
 * Typical working hours: 0930 to 2130 UTC
 * IRC: kondi
 * Location: Ahmedabad, India

Project Outline
Wikimedia wikis use the LocalisationUpdate extension to get updated localised messages from translatewiki.net. To accomplish this, the extension downloads the translation files for MediaWiki and installed extensions, stores them in a local cache and then processes them to check which messages can be updated.

The described flow is not very efficient: not all wikis require support for ~300 languages, and the extension downloads complete files instead of only the deltas. This process consumes considerable time. In addition, the extension requires setting up cron jobs and other manual configuration that could be avoided.

LUv2 aims to fix this by creating a new service which will keep track of the updates and make them accessible via a push interface and a RESTful API.


 * Assigned mentors: Kartik Mistry, Nikerabbit, Amir Aharoni

Deliverables
The main component of the project is the LUv2 service.

Service
The service comprises a database update mechanism and an API. It will store the latest version of the translation messages and make them available to clients via the API. The service will be written in Node.js and will use Redis as its database.

API
There will be two API endpoints: one with which clients will communicate to fetch updates, and a second which will be used to trigger the database update.

There are multiple ways to determine whether updates are available for a client. The basic option would be to have the client send the project_id, language_id and message_id->message pairs, and compare them with the ones in the database. As Niklas hinted, this option is not the most efficient.

Another option would be to generate hashes of the localisation files on the server on every database update and require the client to send its hash, comparing it with the server's. Yet another option would be to have clients send the timestamp of their last update and compare it with the messages' timestamps.

(Open question: should multiple projects be allowed in a single query?)


 * The client would make the following POST request:

api.translatewiki.net ? project_id=mediawiki & language_ids=gu|hi & lastupdated=1396682587
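As a sketch of how the timestamp-based option could work on the server side (the data shape and function name here are hypothetical, not a decided design; the real service would read these records from Redis):

```javascript
// Each record: message key, language code, message string, last-modified timestamp.
function getUpdatedMessages(messages, languageIds, lastUpdated) {
  // Return only messages in the requested languages that changed
  // after the client's last update.
  return messages.filter(function (m) {
    return languageIds.indexOf(m.language) !== -1 && m.updated > lastUpdated;
  });
}

// Example: a client that last updated at 1396682587 asks for gu and hi.
var store = [
  { key: 'mainpage', language: 'gu', value: 'મુખપૃષ્ઠ', updated: 1396700000 },
  { key: 'mainpage', language: 'hi', value: 'मुखपृष्ठ', updated: 1396600000 },
  { key: 'mainpage', language: 'ta', value: 'முதற்பக்கம்', updated: 1396700000 }
];
var delta = getUpdatedMessages(store, ['gu', 'hi'], 1396682587);
// Only the Gujarati message is newer than the client's timestamp,
// so only that message would be sent back in the response.
```

This keeps the request tiny (one timestamp instead of full message lists), at the cost of trusting the client's clock bookkeeping.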


 * The git hook would make the following POST request:

api.translatewiki.net ?

Database Update
To automate database updates, the post-receive hook on each project's remote repository will be configured to hit an API endpoint, triggering the database update process. The update script will download the modified/new files to a temporary directory, parse and compare them with the old files, and insert/update the messages in the database. This script will ideally support all formats currently supported by the Translate extension.
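The parse-and-compare step could look roughly like this (a minimal sketch; the function name and the flat key-to-string shape of a parsed message file are assumptions):

```javascript
// Given two parsed message files (plain key -> string objects),
// return the keys that are new or whose translation changed.
function diffMessages(oldMessages, newMessages) {
  var changed = {};
  Object.keys(newMessages).forEach(function (key) {
    if (oldMessages[key] !== newMessages[key]) {
      changed[key] = newMessages[key];
    }
  });
  return changed;
}

var previous = { mainpage: 'મુખપૃષ્ઠ', search: 'શોધ' };
var incoming = { mainpage: 'મુખપૃષ્ઠ', search: 'શોધો', logout: 'બહાર નીકળો' };
// Only `search` (changed) and `logout` (new) would be written to the database.
var changedMessages = diffMessages(previous, incoming);
```

Writing only the diff, rather than the whole file, keeps each update cheap and also gives every changed message a fresh timestamp for the clients' delta queries.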

(Alternatively, we could get the messages directly from the translatewiki.net Elasticsearch store; for that I need to think of a way to notify the service that updates are available. Since the same messages can be obtained from the Elasticsearch store in a standard format, the different file format parsers would not be required.)

In case one server goes down, clients should have a secondary source that they can query. Having a read-only replica would solve this.
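Redis's built-in replication could cover this: a secondary instance pointed at the primary keeps a read-only copy that clients can fall back to (hostname and port below are placeholders):

```
# redis.conf on the secondary instance (hypothetical host/port)
slaveof primary.example.org 6379
slave-read-only yes
```

The secondary stays read-only, so only the update script ever writes, and it writes only to the primary.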

The database will primarily store the following data:


 * last modified timestamp
 * project identifier
 * message identifier
 * language code of the message
 * message string
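One possible way to lay this data out in Redis (the key scheme below is an illustration, not a decided design): a hash per project/language pair holding the message strings, with a parallel hash for the last-modified timestamps.

```javascript
// Hypothetical key scheme for the data listed above.
function messageKey(projectId, languageCode) {
  // e.g. 'lu:mediawiki:gu' -> hash of message id -> message string
  return 'lu:' + projectId + ':' + languageCode;
}

function timestampKey(projectId, languageCode) {
  // e.g. 'lu:mediawiki:gu:ts' -> hash of message id -> last-modified timestamp
  return messageKey(projectId, languageCode) + ':ts';
}

// With a Redis client, storing one message would translate to commands like:
//   HSET lu:mediawiki:gu mainpage "મુખપૃષ્ઠ"
//   HSET lu:mediawiki:gu:ts mainpage 1396682587
```

Grouping by project and language keeps a client's typical query (one project, a few languages) to a handful of hash reads.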

This service will ideally work with all the projects hosted on translatewiki.net. It will primarily benefit web applications, which will be able to serve the latest translation messages instead of depending on software updates.

Cron dependency
I have left a significant part of the schedule open to research, discuss and implement the best possible solution for eliminating the cron requirement. A push-based solution would be best suited for this, though it will have to be heavily scrutinized in terms of security. One possible approach is event-based: the client registers an endpoint with the service, and the service notifies that endpoint when updates are available. Another possible approach is to use the PubSubHubbub protocol for pushing the updates. This functionality will benefit MediaWiki users on shared hosting (many hosts don't let users set up cron jobs) and those who are not used to setting up cron jobs.
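The event-based option could be sketched as a simple subscriber registry (everything here, names and shapes included, is illustrative; a real implementation would persist subscriptions and verify endpoints before pushing anything to them):

```javascript
// In-memory subscriber registry for the hypothetical push interface.
var subscribers = [];

// A client registers the URL it wants notified, plus what it cares about.
function subscribe(callbackUrl, projectId, languageIds) {
  subscribers.push({ url: callbackUrl, project: projectId, languages: languageIds });
}

// After a database update, collect the endpoints to notify for a project/language.
function endpointsToNotify(projectId, languageId) {
  return subscribers.filter(function (s) {
    return s.project === projectId && s.languages.indexOf(languageId) !== -1;
  }).map(function (s) {
    return s.url;
  });
}

subscribe('https://wiki.example.org/luv2-callback', 'mediawiki', ['gu', 'hi']);
// An update to a Gujarati message would match this wiki; the service would
// then POST the delta (or just a ping) to each URL returned.
```

Pushing only a ping, with the client then fetching the delta over the normal API, keeps the security surface small: a forged notification can at worst trigger one ordinary pull.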

Extension
The Localisation Update extension will be rewritten to fetch the updates from the new service.

Future
I plan on maintaining the service and the extension after the gsoc period ends. I'd work with other projects willing to use this service.

About you
I'm based in Ahmedabad and currently pursuing a diploma in engineering in information technology. I've been involved in various Wikimedia activities since early 2012, mostly outreach. I've contributed to the English and Gujarati Wikipedias, Wikidata and Commons.

 * How did you hear about this program?
Read somewhere on the internet.

 * Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?
I'll have exams in late May. Apart from that I'll be mostly working on the project itself, if selected.

Participation
 * Progress tracking
I keep a log of `in progress` and `done` tasks, and I plan on doing the same for this project. I'll publish these updates on mediawiki.org.

 * Source code
I'll regularly publish the source code in a public repository, either on Gerrit or GitHub.

 * Communication
I'm almost always online on IRC, particularly active on #wikipedia-en and #mediawiki-i18n. Emailing me would be the second best way to get a quick response.

Past experience
 * Please describe your experience with any other FOSS projects as a user and as a contributor:
I've mostly been a consumer of open source software to this day, and I wish to contribute code now, hopefully starting with this project. I've recently fixed a Parsoid bug and familiarised myself with MediaWiki's development process.

 * What project(s) are you interested in (these can be in the same or different organizations)?
Other than this project, I'm very interested in the MediaWiki API and Wikidata.