Kiwix/ZIM incremental updates

ZIM incremental updates for Kiwix (Offline Wikipedia)

 * Public URL: https://www.mediawiki.org/wiki/User:Kiran_mathew_1993/ZIM_incremental_updates_for_Kiwix
 * Bugzilla report: https://bugzilla.wikimedia.org/show_bug.cgi?id=47406
 * Announcement: http://lists.wikimedia.org/pipermail/wikitech-l/2013-May/069006.html (tech details: http://lists.wikimedia.org/pipermail/wikitech-l/2013-April/068931.html)

Name and contact information
 * Name: Kiran Mathew Koshy
 * Email: kiranmathewkoshy@gmail.com, kiran.ee11@iitp.ac.in
 * IRC or IM networks/handle(s): Kiran_
 * Location: Kerala, India
 * Typical working hours: 11:00 A.M. to 6:00 P.M. and 9:00 P.M. to 3:00 A.M.

Tutors

 * Mentor 1: Emmanuel 'Kelson' Engelhart IRC: Kelson on #kiwix (Freenode)
 * Mentor 2: Tommi 'tntnet' Mäkitalo IRC: tntnet on #openzim (Freenode)

Synopsis
Wikipedia has played a tremendous role in making the world's information freely available, but accessing the latest information still requires an active internet connection. This project aims to make Wikipedia available in remote places that lack a reliable internet connection.

Kiwix already makes it possible to keep a local copy of Wikipedia. What it lacks is a proper update mechanism that refreshes the data from time to time. At present, users must download the full database every time they want to update, which is cumbersome or outright impractical on a slow internet connection.

Once finished, the project would greatly benefit schools and other institutes in developing regions of the world by letting them keep a local cache of the data that updates itself automatically.

Deliverables
1. Two tools, zimdiff and zimpatch, will be implemented in C++:

a. zimdiff: Computes the difference between two ZIM files. It will run on the server every time a new release is available and compute the changes made to the ZIM file. The result is a ZIM diff file, which the client then downloads.

b. zimpatch: Runs on the client and patches an existing ZIM file, using the ZIM diff file as input. There are two ways to implement zimpatch, and both will be implemented:
 * Method 1: simple merge of the file and rewriting of the index (fast, requires more storage).
 * Method 2: recompute a new file (slow, requires less storage).

c. Integration of zimpatch and zimdiff into the existing Kiwix code. The server will generate the ZIM diff file automatically. Once the client has downloaded the ZIM diff file, the client-side Kiwix code will apply it to the existing ZIM file automatically.

Two update modes will be implemented: the program either downloads the diff file automatically or updates from a file provided by the user.

d. Email notification: users who opt for manual updates will be notified by email when a new diff file is available.
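
The relationship between zimdiff and zimpatch described above can be illustrated with a minimal sketch. It is not the proposed implementation: it models a ZIM file as an in-memory map from article URL to content, whereas the real tools would read and write ZIM files through zimlib. The names Archive, Diff, computeDiff and applyDiff are hypothetical, and the patch step rebuilds a whole new archive, which is closest in spirit to Method 2.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a ZIM file: article URL -> article content.
// A real tool would iterate over articles with zimlib instead.
using Archive = std::map<std::string, std::string>;

// Hypothetical diff record: roughly what a ZIM diff file would need to carry.
struct Diff {
    Archive upserts;                // articles added or changed in the new release
    std::set<std::string> removed;  // URLs that exist only in the old release
};

// zimdiff idea (server side): keep only what differs between two releases.
Diff computeDiff(const Archive& oldRelease, const Archive& newRelease) {
    Diff diff;
    for (const auto& entry : newRelease) {
        auto it = oldRelease.find(entry.first);
        if (it == oldRelease.end() || it->second != entry.second) {
            diff.upserts.insert(entry);  // new or modified article
        }
    }
    for (const auto& entry : oldRelease) {
        if (newRelease.find(entry.first) == newRelease.end()) {
            diff.removed.insert(entry.first);  // article was deleted
        }
    }
    return diff;
}

// zimpatch idea (client side): rebuild the new release from old release + diff.
Archive applyDiff(const Archive& oldRelease, const Diff& diff) {
    // A well-formed diff never both removes and upserts the same URL.
    for (const auto& url : diff.removed) {
        (void)url;
        assert(diff.upserts.count(url) == 0);
    }
    Archive result;
    for (const auto& entry : oldRelease) {
        if (diff.removed.count(entry.first) == 0) {
            result.insert(entry);  // keep articles that were not deleted
        }
    }
    for (const auto& entry : diff.upserts) {
        result[entry.first] = entry.second;  // add or overwrite
    }
    return result;
}
```

The property worth testing in any real implementation is the round trip: for any two releases, applyDiff(oldRelease, computeDiff(oldRelease, newRelease)) should equal newRelease, while the diff contains only the articles that actually changed.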

Timeline

Total duration: 3 months/13 weeks (excluding the community bonding period).

Community Bonding period: Study the existing ZIM file format, the zimlib library and the Kiwix source code.

Phase 1 (coding): Implement zimdiff, the server-side tool, as a separate C++ program. Duration: 1.5-2 weeks

Phase 2 (coding): Implement zimpatch, the client-side tool, also as a separate C++ program. At this stage it will not be integrated into the Kiwix source code. Duration: 1.5-2 weeks

Phase 3 (bug hunt): Tests, bug fixes and optimizations for the two tools. Duration: 2 weeks

Phase 4 (coding): Integrate zimdiff and zimpatch with the server-side and client-side code. Duration: 1-1.5 weeks

Phase 5 (testing, bug fixes): Extensive testing, and bug fixes if any. A full set of tests will be run on a ZIM copy of Wikipedia. Documentation will be written. Duration: 3 weeks

Phase 6 (email notification feature): Expected to be simple to implement. Duration: 1 week

Phase 7 (deployment): The final code is deployed to the Kiwix project. Duration: 1.5 weeks

About you
By the time you evaluate this application, I will have completed two years of my undergraduate studies at IIT Patna, India. Programming has been my passion for the last six years. Languages: C/C++, Python, PHP, etc. Hobbies: CUDA programming, robotics, etc. I'm a big fan of FOSS. I came up with this idea because completing this project would let me play a part in bringing information to less privileged people around the globe. The amount of knowledge in Wikipedia is so vast that I'm sure this project would help a lot of people. I have participated in a few FOSS activities on our campus in the past.

Participation
We don't just want to know what you plan to accomplish; we want to know how. Briefly describe your work style: how you plan to communicate progress, where you plan to publish your source code while you're working, how and where you plan to ask for help. (We will tend to favor applicants that demonstrate a clear vision for what it means to be an active participant in our development community.)

Past open source experience
Do you have any past experience working in open source projects (MediaWiki or otherwise)? If so, tell us about it! If you have already written a feature or bugfix in a Wikimedia technology such as MediaWiki, link to it here; we will give strong preference to candidates who have done so.

Any other info
Please add any other relevant information -- UI mockups, references to related projects, a link to your proof of concept code, whatever. There are no specific requirements, but we love to see people who love what they're doing. Show us you're excited about this project and have an interest in the background and are considering how best to make your idea work.