User:Nasirkhan/Summer of Code 2012

Identity
Name: Nasir Khan Saikat
Email: nasir8891@gmail.com
Project title: Improve the PDF Download tool

Contact/working info
Timezone: UTC+6
Typical working hours: 8pm - 1am (weekdays)
IRC or IM networks/handle(s): nasir8891

Project summary
Wikipedia has a very good tool for downloading Wikipedia articles as PDF. With the Collection extension, a number of articles can be merged together to create a PDF book. It works perfectly for many of the Wikipedia language versions. Unfortunately, support for Bengali and other Indic languages is not complete: the existing tool can format the page properly, but the text is not rendered properly. The project idea is to develop a complete solution with which users can download PDFs of Indic-language Wikipedia articles.

Required deliverables

 * Improve the standalone PDF Download tool
 * Integrate this tool with the MediaWiki Collection extension

If time permits

 * Integrate the new tool into the Indic-language Wikipedias
 * Improve the presentation of the PDF format

Project schedule

 * Community Bonding Period: Study and discuss with the mentor to select the library to extend
 * Milestone 1 (3 weeks): Complete the PDF download tool
   * Improve the page formatting
   * Improve the text wrapping
   * Fix the Unicode URL parsing
 * Milestone 2 (3-4 weeks): PDF Collection tool
   * Integrate with the existing Collection extension or build a similar tool
   * Make a complete solution for exporting PDF in Indic languages
 * Milestone 3 (2 weeks): Test the tool
   * Test the features thoroughly
 * 1 week: Wikipedia integration test
 * 3 weeks: Testing and documentation
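
The Unicode URL parsing item in Milestone 1 can be illustrated with a small sketch. It assumes that article titles arrive percent-encoded as UTF-8 in the last path segment of the URL and that underscores stand for spaces; the function names `title_from_url` and `url_from_title` are hypothetical and do not come from any existing tool.

```python
from urllib.parse import quote, unquote, urlsplit


def title_from_url(url):
    """Recover a readable article title from a wiki-style URL (hypothetical helper)."""
    path = urlsplit(url).path
    last_segment = path.rsplit("/", 1)[-1]
    # unquote() decodes percent-escapes as UTF-8 by default.
    return unquote(last_segment).replace("_", " ")


def url_from_title(base, title):
    """Build a URL for a title by percent-encoding its UTF-8 bytes (hypothetical helper)."""
    return base + quote(title.replace(" ", "_"))


base = "https://bn.wikipedia.org/wiki/"
title = "\u09ac\u09be\u0982\u09b2\u09be"  # the word "Bangla" in Bengali script

url = url_from_title(base, title)
print(url)
print(title_from_url(url) == title)  # the round trip should give back the title
```

Keeping the decoded title as a Unicode string inside the tool, and encoding only when a request is made, is one way to avoid double-encoding errors.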

About you
I am studying Computer Science and Engineering at United International University, Dhaka, Bangladesh. I have done development in Java and have recently been studying PHP and Python. Fortunately, the project idea and GSoC are aligned to the same goal.

I have been contributing to Wikipedia for the last 4 years. I am active in Bengali Wikipedia and Wikimedia Commons. Recently we formed Wikimedia Bangladesh, the local chapter of the Wikimedia Foundation. I am one of the founding members of the chapter.

We were planning offline outreach activities, and I found that PDF download can help us a lot. But due to some bugs it cannot serve the purpose. I think I have the ability to study and solve the issues, and I have to do that: if I fail to resolve the problems, I will not be able to execute the next plans.

Participation
The first thing is that I am building a completely new tool. From my initial search I found two or three partially completed tools. At the very beginning of the project I will select the one that is best to use. Wikipedia extensions are hosted at the MediaWiki extension site, and other tools are hosted at the SourceForge and nongnu sites. To work with these tools I have to use a common space, which will be either Google Code or GitHub.

I have to fix some of the issues of the PDF rendering library. There are a few libraries available that can render Unicode Indic text. Wikipedia is using the ReportLab library, but it has some issues with Indic text; PyPDFLib is good for Indic text, but it is integrated with the Wikipedia Collection extension; and TCPDF claims to be able to render Unicode text properly. So the initial task is to select one tool and start improving it.
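
As general background on why Indic text is demanding for a PDF library (not a diagnosis of any of the libraries named above), the sketch below shows that one visible unit of Bengali text is often several code points that must be drawn and wrapped together. The helper `rough_clusters` is hypothetical and deliberately simplified; it is not the full Unicode text segmentation algorithm.

```python
import unicodedata

VIRAMA_CLASS = 9  # canonical combining class of virama signs such as U+09CD


def rough_clusters(text):
    """Group code points into approximate display units (hypothetical helper).

    Combining marks attach to the preceding unit, and a character that
    follows a virama joins the same unit, as in a conjunct consonant.
    """
    clusters = []
    join_next = False
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and (is_mark or join_next):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = unicodedata.combining(ch) == VIRAMA_CLASS
    return clusters


word = "\u09ac\u09be\u0982\u09b2\u09be"  # the word "Bangla" in Bengali script
conjunct = "\u0995\u09cd\u09b7"          # ka + virama + ssa, drawn as one glyph

print(len(word), len(rough_clusters(word)))          # 5 code points, 2 units
print(len(conjunct), len(rough_clusters(conjunct)))  # 3 code points, 1 unit
```

A renderer that places glyphs one code point at a time, or a wrapping routine that breaks a line inside such a unit, would produce visibly wrong output, so both text rendering and text wrapping need to treat these units as indivisible.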

I am planning to start with PyPDFLib, but before that I will study a little to be sure that the selection is right. Here I have to work on page formatting. The next step is to prepare the tool to combine multiple articles into one file. This could be done by integrating with the Collection extension or by building the feature.

After this stage, integration with Wikipedia has to be tested.

Past open source experience
Do you have any past experience working in open source projects (MediaWiki or otherwise)? If so, tell us about it! If you have already written a feature or bugfix in a Wikimedia technology such as MediaWiki, link to it here; we will give strong preference to candidates who have done so.

Any other info
Please add any other relevant information -- UI mockups, references to related projects, a link to your proof of concept code, whatever. There are no specific requirements, but we love to see people who love what they're doing. Show us you're excited about this project and have an interest in the background and are considering how best to make your idea work.