User:Nasirkhan/Summer of Code 2012

Identity
Name: Nasir Khan Saikat
Email: nasir8891@gmail.com
Project title: Improve the PDF Download tool

Contact/working info
Timezone: UTC+6
Typical working hours: 8pm - 1am
IRC or IM networks/handle(s): nasir8891

Project summary
Wikipedia has a great tool for downloading Wikipedia articles as PDF. Together with the Collection extension, multiple articles can be merged into a PDF book. It works well for many Wikipedia language versions, but unfortunately support for Bengali and other Indic languages is incomplete. The existing tool formats the page properly, but the text is not rendered properly: characters are misplaced, and complex characters are displayed as multiple single characters. The idea of this project is to improve or develop a complete solution that lets users download PDFs of articles from the Indic-language Wikipedias.
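The "multiple single characters" symptom can be illustrated with a small sketch. A Bengali conjunct is stored as several Unicode code points joined by a virama, and a renderer that does not apply script shaping may draw each code point on its own. The example below uses only Python's standard library; the function name `describe_code_points` and the sample conjunct are chosen for illustration and are not part of any existing tool.

```python
import unicodedata


def describe_code_points(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Sample conjunct (an assumption for illustration): KA + VIRAMA + SSA.
# A shaping-aware renderer draws this as one combined glyph.
conjunct = "\u0995\u09cd\u09b7"

for code_point, name in describe_code_points(conjunct):
    print(code_point, name)
```

What looks like one letter on the article page is three code points in the underlying string, so a PDF library has to shape the sequence before placing glyphs.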


 * Screenshot of the original article page and the PDF file created by the Wikipedia PDF Download Tool

Required deliverables

 * Improve the standalone PDF Download tool
 * Integrate this tool with the MediaWiki Collection extension

If time permits

 * Integrate the new tool into the Indic-language Wikipedias
 * Improve the presentation of the PDF format

Project schedule

 * Community Bonding Period: Study and discuss with the mentor to select the library to extend
 * Milestone 1: 3 weeks: Complete the PDF download tool
    * Improve the page formatting
    * Improve the text wrapping
    * Fix the Unicode URL parsing
 * Milestone 2: 3-4 weeks: PDF Collection tool
    * Integrate with the existing Collection extension or build a similar tool
    * Make a complete solution for exporting PDF in Indic languages
 * Milestone 3: 2 weeks: Test the tool
    * Test the features thoroughly
 * 1 week: Wikipedia integration test
 * 3 weeks: Test and documentation
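As a sketch of what the "Unicode URL parsing" item in Milestone 1 could involve: article URLs on Indic-language wikis carry the title as percent-encoded UTF-8, which has to be decoded before the title is used. The example below uses only Python's standard library. The helper name `article_title_from_url` and the `/wiki/` path prefix are assumptions for illustration, not the behaviour of the existing tool.

```python
from urllib.parse import quote, unquote, urlsplit


def article_title_from_url(url, prefix="/wiki/"):
    """Extract a readable article title from a wiki article URL.

    Assumes (hypothetically) that the title follows `prefix` in the path
    and is percent-encoded UTF-8 with underscores standing in for spaces.
    """
    path = urlsplit(url).path
    if not path.startswith(prefix):
        raise ValueError(f"URL path does not start with {prefix!r}: {url}")
    # unquote() decodes percent-escapes as UTF-8 by default.
    return unquote(path[len(prefix):]).replace("_", " ")


# Build an encoded URL from a Bengali title and decode it again.
title = "\u09ac\u09be\u0982\u09b2\u09be \u09ad\u09be\u09b7\u09be"
url = "https://bn.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))
print(url)
print(article_title_from_url(url))
```

Decoding the whole path segment at once, rather than byte by byte, keeps multi-byte characters intact.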

About you
I am studying Computer Science and Engineering at United International University, Dhaka, Bangladesh. I have worked on development in Java and have recently been studying PHP and Python. Fortunately, this project idea and GSoC align with the same goal.

I have been contributing to Wikipedia for the last 4 years. I am active on Bengali Wikipedia and Wikimedia Commons. Recently we formed Wikimedia Bangladesh, the local chapter of the Wikimedia Foundation. I am one of the founding members of the chapter.

We were planning offline outreach activities, and I found that PDF download could help us a lot. But because of some bugs it cannot serve the purpose. I think I have the ability to study and solve these issues, and I have to do so: if I fail to resolve the problems, I will not be able to carry out our next plans.

Participation
First of all, I am not going to build a completely new tool. From my initial search I found two or three partially completed tools. At the very beginning of the project I will select the one that is best to build on. Wikipedia extensions are hosted on the MediaWiki extension site, and the other tools are hosted on SourceForge and the non-GNU site. To work with these tools I will need a common space, which will be either Google Code or GitHub.

I have to fix some of the issues in the PDF rendering library. There are a few libraries available that can render Unicode Indic text. Wikipedia uses the ReportLab library, but it has some issues with Indic text; PyPDFLib is good for Indic text, but it is integrated with the Wikipedia Collection extension; and TCPDF claims that it can render Unicode text properly. So the initial task is to select one tool and start improving it.

I am planning to start with PyPDFLib, but before that I will study it a little to be sure that the selection is right. Here I have to work on page formatting. The next step is to prepare the tool to combine multiple articles into one file. This could be done by integrating with the Collection extension or by building the feature.

After this stage, integration with Wikipedia has to be tested.

Past open source experience
I mainly develop modules for OpenERP and Joomla. I developed a churn management application which is open source and is used at our university for study purposes. I am the coordinator of our university's open source network and an executive committee member of Bangladesh Open Source Network. There I am responsible for arranging outreach events and maintaining a free online support mailing list for open source software.

Any other info
I am a coder, and at the same time I organize and conduct outreach events for Wikipedia. I am active on Bengali Wikipedia and a member of the Wikimedia Bangladesh executive committee. The only reason for this project proposal is that, if I can complete the project successfully, it will help me continue my outreach events properly.