Kiwix/ZIM incremental updates

From mediawiki.org

ZIM incremental updates for Kiwix (Offline Wikipedia)

Name and contact information

Name: Kiran Mathew Koshy
Email: kiranmathewkoshy@gmail.com, kiran.ee11@iitp.ac.in
IRC or IM networks/handle(s): Kiran_
Location: Kerala, India
Typical working hours: 11:00 A.M. to 6:00 P.M. and 9:00 P.M. to 3:00 A.M.

Tutors

  • Mentor 1: Emmanuel 'Kelson' Engelhart, IRC: Kelson on #kiwix (Freenode)
  • Mentor 2: Tommi 'tntnet' Mäkitalo, IRC: tntnet on #openzim (Freenode)

Synopsis

Wikipedia has played a tremendous role in making the world's information available for free. Using Kiwix, with files in the ZIM format, it is possible to have a local copy of Wikipedia.

Unfortunately, Kiwix lacks a proper update mechanism. At present, users must download the full database every time they want to update, which is cumbersome or impractical on a slow Internet connection.

This project aims to provide an efficient and easy way to update a local copy of Wikipedia. Once the project is finished, users will be able to receive incremental updates to their local copy automatically. This will greatly benefit schools and other institutions in developing regions.

To achieve this, we want to allow incremental updates of ZIM files. The incremental update feature will be developed in a generic manner and then integrated specifically into Kiwix. This improvement is expected to save about 80% of a user's bandwidth during a Wikipedia update.

Deliverables

The incremental update system consists of the following components:

  • zimdiff, a command line tool that computes the difference between two ZIM files. zimdiff must produce a ZIM diff file that is itself based on the ZIM format. zimdiff will mostly be run by ZIM file providers, on a server, every time a new release is available. The generated ZIM diff will then be downloaded by the end user, either manually or directly by Kiwix (or any other ZIM reader). zimdiff must be developed in C++, be portable (Windows, OSX, Linux) and be based on zimlib, where all of zimdiff's internal logic must be implemented.
  • zimpatch, a command line tool that merges a ZIM diff file with its corresponding ZIM file. It patches an existing ZIM file using the ZIM diff file as input. zimpatch will provide two patching methods: a simple merge that rewrites the index (fast, requires more storage) and a real merge that recomputes a new ZIM file (slow, requires less storage).
  • Kiwix integration, the last part of the work, which will bring this new feature to the end user. This includes the creation of:
    • server side, a script to keep an up-to-date database of ZIM diff files,
    • server side, a script to update the library.xml Kiwix catalog file when new ZIM diff files are available,
    • server side, a web-based solution that lets people be notified by email about updates,
    • client side, a dialog box offering to download, open and merge updates when new ones are available,
    • client side, support for manual updates when ZIM diff files are provided separately.
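
The diff-then-patch idea behind zimdiff and zimpatch can be illustrated with a minimal sketch. It does not use zimlib or the real ZIM on-disk format: it assumes an archive can be modelled as an in-memory map from article URL to content, and the names Archive, ArchiveDiff, computeDiff and applyDiff are hypothetical, introduced only for illustration. The actual tools would have to work on ZIM structures and produce a diff that is itself a ZIM file.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for a ZIM file: article URL -> content.
// This is an assumption for illustration, not the zimlib data model.
using Archive = std::map<std::string, std::string>;

// Hypothetical diff record: what must change to go from old to new.
struct ArchiveDiff {
    Archive upserts;                 // articles that are new or modified
    std::set<std::string> removals;  // URLs that exist only in the old archive
};

// zimdiff-like step: compare two archives and record the differences.
ArchiveDiff computeDiff(const Archive& oldArchive, const Archive& newArchive) {
    ArchiveDiff diff;
    // Articles added or changed in the new archive.
    for (const auto& entry : newArchive) {
        auto it = oldArchive.find(entry.first);
        if (it == oldArchive.end() || it->second != entry.second) {
            diff.upserts[entry.first] = entry.second;
        }
    }
    // Articles that disappeared from the new archive.
    for (const auto& entry : oldArchive) {
        if (newArchive.find(entry.first) == newArchive.end()) {
            diff.removals.insert(entry.first);
        }
    }
    return diff;
}

// zimpatch-like step: apply a diff to a copy of the old archive.
Archive applyDiff(Archive archive, const ArchiveDiff& diff) {
    for (const auto& url : diff.removals) {
        archive.erase(url);
    }
    for (const auto& entry : diff.upserts) {
        archive[entry.first] = entry.second;
    }
    return archive;
}
```

In this model, applying computeDiff(oldArchive, newArchive) to oldArchive with applyDiff reproduces newArchive, and unchanged articles never appear in the diff, which is where the bandwidth saving would come from.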

Timeline

The whole project is planned for 13 weeks (3 months) and will start after the Community Bonding period, which is needed to study the existing ZIM file format, the zimlib library and the Kiwix source code.

  1. Coding zimdiff: zimlib improvements and creation of the binary (1 week)
  2. Coding zimpatch: zimlib improvements and creation of the binary (3 weeks)
  3. Bug hunt: tests, bug fixes and optimizations (2 weeks)
  4. Integrating into Kiwix: coding the server-side scripts and modifying Kiwix (3 weeks)
  5. Bug hunt: extensive testing and bug fixes (2 weeks)
  6. Email notification: a small solution that lets people receive an email when a ZIM file is updated (1 week)
  7. Deployment: the final code is deployed to the Kiwix project (1 week)

About Me

By the time you evaluate this application, I will have completed 2 years of my undergraduate studies at IIT Patna, India. Programming has been my passion for the last 6 years.

Languages: C/C++, Python, PHP, etc.

Hobbies: CUDA programming, Robotics, etc.

By completing this project, I would be playing a part in providing information to less privileged people around the globe, which is the reason I came up with this idea. The amount of knowledge in Wikipedia is so vast that I'm sure this project would help a lot of people. I have participated in a few FOSS activities on our campus in the past.

Participation

I'm online on IRC during my work hours, and can be found on #wikimedia-dev, #wikimedia-wikidata, #kiwix and #openzim.

For discussions, I will be using the wikitech-l mailing list.

There is only a 3-hour time difference between my mentors and me, so communication should be quick and easy.

The project will require a lot of documentation and testing, so I will have something to do when the code is up for review.

Past open source experience

In the past, I have contributed to a few open source programs at our college, mostly in C++ and Python, along with a couple of Android hacks. These include a network simulation for optical networks, an implementation for using the MediaFire API, and a Python script for controlling an Arduino microcontroller. These are hosted on my GitHub profile.

Monthly reports

Reports

See also