User:Nasirkhan/Summer of Code 2012



Name: Nasir Khan Saikat
Project title: Improve the PDF Download tool

Contact/working info

Timezone: UTC+6
Typical working hours: 8pm - 1am
IRC or IM networks/handle(s): nasir8891

Project summary

Wikipedia has a useful tool for downloading articles as PDF. Together with the Collection extension, multiple articles can be merged into a single PDF book. It works well for many Wikipedia language versions, but support for Bengali and other Indic languages is incomplete. The existing tool formats the page properly, but the text is not rendered correctly: characters are misplaced, and complex characters are displayed as multiple single characters. The idea of this project is to improve or develop a complete solution that lets users download PDFs of Indic-language Wikipedia articles.
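To illustrate why complex characters can fall apart into "multiple single characters", here is a minimal sketch using only the Python standard library. It shows that one Bengali conjunct, which readers see as a single glyph, is stored as three Unicode code points; a PDF renderer that draws each code point separately, without text shaping, produces the broken output described above. The helper name `describe_code_points` is hypothetical and is not part of any existing PDF tool.

```python
import unicodedata


def describe_code_points(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# A Bengali conjunct: displayed as one glyph, stored as KA + VIRAMA + SSA.
conjunct = "\u0995\u09cd\u09b7"

for code_point, name in describe_code_points(conjunct):
    print(code_point, name)
```

A renderer has to combine these three code points into one shaped glyph, which is the part that the existing tool appears to miss for Indic scripts.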

Screenshot of the original article page and the PDF file created by the Wikipedia PDF Download Tool


Required deliverables

  • Improve the standalone PDF Download tool
  • Integrate this tool with the MediaWiki Collection extension

If time permits

  • Integrate the new tool into the Indic-language Wikipedias
  • Improve the presentation of the PDF format

Project schedule

  • Community Bonding Period: Study the library and tools, discuss the issues of the existing tools with the mentor, and confirm the priorities of the task list and milestones
  • Milestone 1: 3 weeks: Complete the PDF download tool
    • Improve the page formatting
    • Improve the text wrapping
    • Fix the Unicode URL parsing
  • Milestone 2: 3-4 weeks: PDF Collection tool
    • Integrate with the existing Collection Extension or build a similar tool
    • Make a complete solution for exporting PDF in Indic languages
  • Milestone 3: 2 weeks: Test the tool
    • Test the features thoroughly
    • 1 week: Wikipedia integration test
    • 3 week: Test and documentation
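As a rough illustration of the "Unicode URL parsing" item in Milestone 1, the sketch below converts between a percent-encoded article URL and a readable Indic title using Python's standard `urllib.parse` module. The function names `title_from_url` and `url_from_title` are hypothetical, and the `/wiki/` path layout is an assumption about how article URLs are structured; this is not the actual code of the PDF tool.

```python
from urllib.parse import quote, unquote, urlsplit

WIKI_PREFIX = "/wiki/"  # assumed article path layout


def title_from_url(url):
    """Extract a human-readable article title from a percent-encoded URL."""
    path = urlsplit(url).path
    if not path.startswith(WIKI_PREFIX):
        raise ValueError("not an article URL: " + url)
    # Decode the UTF-8 percent-escapes, then turn underscores back into spaces.
    return unquote(path[len(WIKI_PREFIX):]).replace("_", " ")


def url_from_title(base, title):
    """Build a percent-encoded article URL from a Unicode title."""
    return base + WIKI_PREFIX + quote(title.replace(" ", "_"))


# Bengali for "Bengali language", written with escapes to stay encoding-safe.
title = "\u09ac\u09be\u0982\u09b2\u09be \u09ad\u09be\u09b7\u09be"
url = url_from_title("https://bn.wikipedia.org", title)
print(url)
print(title_from_url(url) == title)
```

The point of the round trip is that the title must survive encoding and decoding unchanged; a tool that treats the escaped bytes as Latin-1 or truncates at a multi-byte boundary would fail this check.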

About you

I am studying Computer Science and Engineering at United International University, Dhaka, Bangladesh. I have worked on development in Java and have recently been studying PHP and Python. Fortunately, this project idea and GSoC align with the same goal.

I have been contributing to Wikipedia for the last 4 years. I am active on Bengali Wikipedia and Wikimedia Commons. Recently we formed Wikimedia Bangladesh, the local chapter of the Wikimedia Foundation, and I am one of its founding members.

We were planning offline outreach activities, and I found that PDF download could help us a lot. But because of some bugs it cannot serve the purpose. I think I have the ability to study and solve these issues, and I have to do so: if I fail to resolve the problems, I will not be able to carry out our next plans.


First of all, I am not going to build a completely new tool. From my initial search I found two or three partially completed tools. At the moment, PyPDFLib is the only one among them that can render Indic text properly, so I think it is better to start from there. The source is available on GitHub, and I can clone the project as the initial code base.

PyPDFLib is not complete yet. It has some bugs, and there are opportunities to integrate more features. For example, the existing Wikipedia PDF tool can combine and export multiple articles in a single PDF file. But before that, I want to make sure that it can properly export PDF files for Indic-language articles.

When the PDF export is complete, I will start integrating this tool with the Wikipedia Collection extension. Finally, if I can finish these tasks successfully, I will apply to the Wikimedia Foundation to include and enable this tool for Indic-language wikis.

Past open source experience

I mainly develop modules for OpenERP and Joomla. I have deployed the DHIS-2 software for a Bangladesh Government project. I developed churn management software that was open sourced and is used at our university for study purposes. I am the coordinator of our university's open source network and an executive committee member of Bangladesh Open Source Network, where I am responsible for arranging outreach events and maintaining a free online support mailing list for open source software.

I have a few projects hosted on GitHub and Bitbucket.

Any other info

I have programming knowledge, and at the same time I organize and conduct outreach events for Wikipedia. I am active on Bengali Wikipedia and a member of the Wikimedia Bangladesh executive committee. The only reason for this project proposal is that completing the project successfully will help me continue my outreach events properly.

See also