User:Sanja pavlovic/GSOC/OPW application

Contact info

 * Name: Sanja Pavlovic
 * E-mail: sanja.pavlovic@vikimedija.org, sanjoo911@gmail.com, pavlovic.sanja.91@gmail.com
 * IRC: sanjup
 * Location: Belgrade, Serbia

Project synopsis
Incremental data dumps


 * We offer data dumps of Wikipedia and other Wikimedia projects, allowing people to access this knowledge where an Internet connection is missing, slow or expensive, to research edit patterns, and to data-mine our vast knowledge base. The dumps for the larger projects keep growing, e.g. 40 GB for English Wikipedia. What is more, the update a month later will be another 40 GB or more, even though only a small subset of that information has actually changed, in the form of new pages, new revisions, or deleted revisions. Imagine if users of these files could download just the changes, plus a script that applies them. Imagine if the dumps could be written out using the previous month's dumps with such a scheme. Imagine running the German language Wikipedia dumps in 3 days instead of the current 16. This could be achieved by designing the right output format for the XML files containing text for all revisions.
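
As a rough illustration of the idea (not the actual dump format or the existing dump code), the sketch below models a dump as a mapping from revision id to revision text. The names `Delta`, `compute_delta` and `apply_delta` are hypothetical and exist only for this example; a real solution would work on the XML files rather than on in-memory dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumption: a dump is modelled as {revision id: revision text}.
Dump = Dict[int, str]


@dataclass
class Delta:
    """The changes between two dumps (hypothetical structure)."""
    added: Dict[int, str] = field(default_factory=dict)  # new or changed revisions
    deleted: Set[int] = field(default_factory=set)       # revisions that were removed


def compute_delta(old: Dump, new: Dump) -> Delta:
    """Collect only what differs between last month's dump and this month's."""
    added = {rev_id: text for rev_id, text in new.items()
             if old.get(rev_id) != text}
    deleted = set(old) - set(new)
    return Delta(added=added, deleted=deleted)


def apply_delta(old: Dump, delta: Delta) -> Dump:
    """Rebuild the new dump from the old dump plus the downloaded changes."""
    result = {rev_id: text for rev_id, text in old.items()
              if rev_id not in delta.deleted}
    result.update(delta.added)
    return result


# Small made-up example: revision 2 was deleted, revision 4 was added.
last_month = {1: "first revision", 2: "vandalism", 3: "third revision"}
this_month = {1: "first revision", 3: "third revision", 4: "fourth revision"}

delta = compute_delta(last_month, this_month)
rebuilt = apply_delta(last_month, delta)
```

In this toy case the user would only need to download one added revision and one deletion marker instead of the whole dump again.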

Possible mentor
Ariel Glenn

The timeline of the project

 * First 2 weeks:
 * Getting familiar with the existing code for database dumps.


 * Week 3:
 * Thinking about possible solutions and testing them.


 * From week 4 to week 7:
 * Implementing the best solution


 * From week 8 to week 10:
 * Testing the code

About me
Hi, all!

My name is Sanja Pavlovic. I live in Belgrade, Serbia. I am currently in my third year of Journalism and Communicology studies at the Faculty of Political Sciences, University of Belgrade. I volunteer at Serbia's weekly magazine "Time".

Last year I started contributing to Wikinews, and soon afterwards I wanted to become a member of Wikimedia Serbia. Within a few months I was elected president of Wikimedia Serbia's Media board. I held that position for about 7 months, after which I became a member of the Wikimedia Serbia Board. In that capacity, I actively participate in Wikimedia Serbia's decision-making process and help with the implementation of its projects and ideas.

Over the last few months I have become interested in programming, so I started teaching myself HTML, Python, PHP, and administration skills. Shortly afterwards I started attending workshops at Belgrade's hackerspace, Hacklab Belgrade, where I am learning a lot.