Google Books, Internet Archive, Commons upload cycle

Google Books > Internet Archive > Commons upload cycle

 * Public URL: https://www.mediawiki.org/wiki/User:8ohit.dua/GSoC_proposal_2014
 * Bugzilla report: Bug - 57813
 * Announcement: https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Google_Books_.3E_Internet_Archive_.3E_Commons_upload_cycle

Name and contact information
 * Name: Rohit Dua
 * Email: 8ohit.dua@gmail.com
 * IRC or IM networks/handle(s): rohit-dua
 * Location: New Delhi, India
 * Time-zone: UTC+5:30
 * Typical working hours: 12:00 pm to 5:00 pm and 8:00 pm to 3:00 am (IST)

Synopsis
Wikisources around the world rely heavily on Google Books (GB) digitizations for transcription and proofreading. Books often disappear from the GB database. Currently, users have to download a book from GB manually, then either upload it to the Internet Archive (IA) if they want to preserve it, or upload it directly to Wikimedia Commons (again a manual task) with appropriate metadata.

This project focuses on automating all three steps together. The user will only have to give an appropriate description (or identifiers) for the book(s) they wish to upload; every other task is automated, and the user is notified only when their intervention is needed.

Core Libraries/tools used:
 * internetarchive
 * smtplib
 * urllib2
 * IA-Upload
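
As a rough illustration of how smtplib could serve the e-mail notification, the sketch below builds and sends an "OCR finished" message. It is written for Python 3 (the proposal lists urllib2, which is Python 2), and the function names, the addresses, and the SMTP host and port are all assumptions made for the example, not part of the proposed tool.

```python
import smtplib
from email.message import EmailMessage


def build_ocr_notice(sender, recipient, ia_identifier):
    """Build the 'OCR finished' e-mail for one uploaded book.

    All three arguments are plain strings; ia_identifier is the
    Internet Archive identifier of the uploaded item.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "OCR finished for " + ia_identifier
    msg.set_content(
        "The Internet Archive item '%s' has finished OCR.\n"
        "It is ready to be uploaded to Commons with IA-Upload.\n" % ia_identifier
    )
    return msg


def send_notice(msg, host="localhost", port=25):
    """Send the message through an SMTP server.

    The default host and port are assumptions; a real deployment
    would use whatever mail relay the hosting environment provides.
    """
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Keeping message construction separate from sending means the text of the notification can be checked without a mail server.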

Deliverables
Goals of this project
Required goals:
 * Tool hosted on Tool-Labs with a JavaScript front-end and a Python core.
 * Check whether a book is available on IA.
 * If not, search for it on GB and check whether it is in the public domain.
 * Download all its pages and convert them to PDF/ZIP.
 * Upload to IA with appropriate metadata.
 * Wait for its OCR to complete, then notify the user via email.
 * Upload to Commons using the IA-Upload tool.
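
The required goals above can be read as one control flow. The following is a minimal sketch of that flow, not the tool's actual design: every step is passed in as a callable, and the step names (`on_ia`, `is_public_domain`, and so on) are hypothetical. It also assumes that a book already present on IA skips straight to the Commons upload, which the list does not state explicitly.

```python
def run_cycle(book_id, steps):
    """Run the upload cycle for one book.

    `steps` maps hypothetical step names to callables that take the
    book id. Returns the list of stages that were completed, so the
    caller can see where the cycle stopped.
    """
    done = []
    if steps["on_ia"](book_id):
        # Assumption: a book already on IA needs no download or OCR wait.
        done.append("found_on_ia")
    else:
        if not steps["is_public_domain"](book_id):
            # Only public-domain books may be copied from GB.
            done.append("rejected_not_public_domain")
            return done
        steps["download"](book_id)        # fetch pages, build PDF/ZIP
        done.append("downloaded")
        steps["upload_to_ia"](book_id)    # upload with metadata
        done.append("uploaded_to_ia")
        steps["wait_for_ocr"](book_id)    # block until IA has run OCR
        done.append("ocr_ready")
        steps["notify_user"](book_id)     # e-mail the requester
        done.append("user_notified")
    steps["upload_to_commons"](book_id)   # hand over to IA-Upload
    done.append("uploaded_to_commons")
    return done
```

Passing the steps in as callables keeps the flow testable with stubs before any of the real GB, IA, or Commons integrations exist.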

Project schedule
The above plan may go as expected, or time may have to be redistributed among the tasks.

Participation
During my working hours, I will always be logged in to IRC (channels: #mediawiki, #wikimedia-dev, #mediawiki-labs) and can always be reached by email. I'm a computer addict and have a hard time staying off of it. All source code I write will be published to my GitHub repo, although the tool itself will be hosted on Tool-Labs. At each stage of development I would like to discuss implementation details with the mentors so that there are no delays or issues later on. If I have other doubts or need feedback, I will head over to the mailing list (wikitech-l).

About you
My name is Rohit Dua, and I'm currently pursuing my B.Tech in Electronics and Communication at the Jaypee Institute of Information Technology, Noida, India. My hometown is New Delhi, India. I code in Python, JavaScript, C, and C++. I'm passionate about computer security and automation, and coding gets me high! I am new to the world of open source and its community. When I first heard about open source at a Linux User Group meetup at my university, I went crazy about it: I had always thought there was no such thing as free bread, but it turned out there had always been free knowledge. Before that, I never went to anyone with my programming issues or bugs, online or offline. Now I feel I can grow and learn much faster by taking part in the open-source community. This project is my first opportunity to work with an open-source organization. GSoC will be my bridge to the open-source community.

Past open source experience
GitHub profile: rohit-dua

Proof of concept code
For the sake of demonstration, since I don't have much past open-source experience (being new!), I have written a script that downloads any public-domain book from GB: here (Python).
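
The linked script is not reproduced here. As a rough illustration of the public-domain check that any such downloader needs, the sketch below reads the `accessInfo.publicDomain` flag from a Google Books API v1 volume record. This is an assumption about one possible approach in Python 3, not a description of how the script above works; the helper names are hypothetical, and the flag can depend on the country the request comes from.

```python
import json
from urllib.request import urlopen

# Google Books API v1 endpoint for a single volume.
API_URL = "https://www.googleapis.com/books/v1/volumes/{volume_id}"


def is_public_domain(volume):
    """Return True if a volume record is flagged as public domain.

    `volume` is the decoded JSON of one volume; a record without the
    flag is treated as not public domain.
    """
    access = volume.get("accessInfo", {})
    return bool(access.get("publicDomain", False))


def fetch_volume(volume_id):
    """Fetch one volume record from the API (needs network access)."""
    with urlopen(API_URL.format(volume_id=volume_id)) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would combine the two, for example `is_public_domain(fetch_volume(volume_id))`, and only proceed to the page download when the result is true.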