Google Books, Internet Archive, Commons upload cycle/Progress

(Automation Tool) Google Books > Internet Archive > Commons upload cycle

Public URL: //www.mediawiki.org/wiki/Google_Books,_Internet_Archive,_Commons_upload_cycle
Bugzilla report: Bug 57813
Hosted on tools-lab: http://tools.wmflabs.org/bub/
Maintained on github: https://github.com/rohit-dua/bub

Create the front-end for the web tool to be hosted on the tools-lab server.
Develop a bot that handles queries in the database (time-out deletion, queue handling, IPC messages).
Extract metadata from Google Books and introduce a system to check whether a book is already present in IA (Internet Archive).
Create a script to download books from Google Books.
- This will be done by extracting each page image and then converting the images to PDF.
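The page-images-to-PDF step can be illustrated with a standard-library-only sketch. It is not the tool's actual code: it assumes each page has already been fetched and decoded into raw 8-bit RGB pixels (real scans would first need decoding from JPEG/PNG, for example with an image library), and the function name `images_to_pdf` is hypothetical. Each image becomes one page whose size matches the image, stored as a Flate-compressed image XObject.

```python
import zlib


def images_to_pdf(pages):
    """Build a PDF with one page per image.

    `pages` is a list of (width, height, rgb) tuples, where `rgb` holds
    width * height * 3 bytes of 8-bit RGB pixel data (assumption: the page
    images were already decoded to raw pixels).
    """
    n = len(pages)
    # Object numbers: 1 = catalog, 2 = page tree, then 3 objects per page
    # (page dictionary, content stream, image XObject).
    page_nums = [3 + 3 * i for i in range(n)]
    kids = " ".join(f"{num} 0 R" for num in page_nums)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {n} >>".encode(),
    ]
    for num, (w, h, rgb) in zip(page_nums, pages):
        if len(rgb) != w * h * 3:
            raise ValueError(f"expected {w * h * 3} bytes of RGB data, got {len(rgb)}")
        content_num, image_num = num + 1, num + 2
        # Page sized to the image (1 pixel = 1 point) that draws the image once.
        objects.append(
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w} {h}] "
            f"/Resources << /XObject << /Im0 {image_num} 0 R >> >> "
            f"/Contents {content_num} 0 R >>".encode()
        )
        draw = f"q {w} 0 0 {h} 0 0 cm /Im0 Do Q".encode()
        objects.append(b"<< /Length %d >>\nstream\n" % len(draw) + draw + b"\nendstream")
        data = zlib.compress(rgb)
        objects.append(
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /FlateDecode "
            b"/Length %d >>\nstream\n" % (w, h, len(data)) + data + b"\nendstream"
        )
    # Serialize the objects, remembering each byte offset for the xref table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)
```

In practice a library would handle JPEG pass-through and page sizing; the sketch only shows the structure involved (one page object per scanned page, plus a cross-reference table pointing at each object).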

I find IRC a quick way to contact my mentors.
Email will be used when mentors are not available.
I plan to involve interested parties in testing and suggestions.
- To this end, announcements will be made on wikitech-l, wikisource-l and commons-l.

Every task becomes a piece of cake if you love doing it.
For questions, a Google search is no match for a real-time chat or email with someone experienced.
Before the core coding, the set-up work takes a lot of time and many edits.
Discussions and feedback make things better.

Worked on the back-end Python script.
- Added a script to verify the Commons name and the Google Books ID.
- Added cookie/session handling.
Linked the database to the tool.
Set up a cron job to delete unconfirmed requests.
The tool can now be tested on tools-lab (front-end only).
Studied the Redis queue implementation.
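The cron clean-up of unconfirmed requests can be sketched as a function that a periodically scheduled script calls. This is a minimal sketch assuming an SQLite database with a hypothetical `requests` table (a `confirmed` flag and a `created_at` Unix timestamp); the tool's real database, schema and expiry time are not documented here.

```python
import sqlite3
import time

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS requests (
    id INTEGER PRIMARY KEY,
    book_id TEXT NOT NULL,
    confirmed INTEGER NOT NULL DEFAULT 0,
    created_at REAL NOT NULL
)
"""


def delete_unconfirmed(conn, max_age_seconds, now=None):
    """Delete requests never confirmed and older than max_age_seconds.

    Returns the number of rows removed.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    with conn:  # commits on success, rolls back if the DELETE raises
        cur = conn.execute(
            "DELETE FROM requests WHERE confirmed = 0 AND created_at < ?",
            (cutoff,),
        )
    return cur.rowcount
```

A crontab entry such as `*/15 * * * * python3 cleanup.py` (hypothetical script name and interval) could open the database and call `delete_unconfirmed(conn, max_age_seconds=3600)`; passing `now` explicitly keeps the function easy to test.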

Deployed the following to tools-lab:
- IA upload verification.
  - Used a score-based system (with thresholds) to check whether a book is already present in IA.
- Google Books download and PDF conversion.
- IA upload with metadata.
  - IA upload uses the internetarchive Python module.
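A score system with thresholds for the "already in IA?" check might look like the sketch below. The weights, thresholds, field names and the choice of `difflib` string similarity are all assumptions made for illustration; the tool's actual scoring is not described here. Candidates are assumed to be metadata dicts already retrieved from an IA search.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds, for illustration only.
WEIGHTS = {"title": 0.6, "creator": 0.3, "date": 0.1}
DUPLICATE_THRESHOLD = 0.85  # at or above: treat the book as already in IA
REVIEW_THRESHOLD = 0.6      # between the thresholds: flag for manual review


def _norm(text):
    """Lower-case and collapse whitespace so formatting differences don't count."""
    return " ".join(str(text or "").lower().split())


def _similarity(a, b):
    a, b = _norm(a), _norm(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(book, candidate):
    """Weighted similarity in [0, 1] between a Google Books record and an IA item."""
    title = _similarity(book.get("title"), candidate.get("title"))
    creator = _similarity(book.get("creator"), candidate.get("creator"))
    year_a = str(book.get("date", ""))[:4]
    year_b = str(candidate.get("date", ""))[:4]
    date = 1.0 if year_a and year_a == year_b else 0.0
    return (WEIGHTS["title"] * title
            + WEIGHTS["creator"] * creator
            + WEIGHTS["date"] * date)


def classify(book, candidates):
    """Return ('duplicate' | 'review' | 'new', best_score) over the IA candidates."""
    best = max((match_score(book, c) for c in candidates), default=0.0)
    if best >= DUPLICATE_THRESHOLD:
        return "duplicate", best
    if best >= REVIEW_THRESHOLD:
        return "review", best
    return "new", best
```

Using two thresholds rather than one leaves a middle band where an automatic decision would be unreliable, so borderline matches can be checked by a person instead of being silently skipped or re-uploaded.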
Improved the database management.
- Migrated from two databases (sessions and requests) to a single main database.
Learned how to use the grid, and deployed two continuous jobs to it (worker.py and upload-checker.py).
Worked on email notifications (using Exim).
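An email notification sent through Exim could be sketched as follows. The message wording, the `build_notification`/`send_via_exim` names and the `/usr/sbin/exim` path are hypothetical; the `-t` flag (take recipients from the To/Cc/Bcc headers) is Exim's sendmail-compatible option, and `https://archive.org/details/<identifier>` is the usual public URL pattern for an IA item.

```python
import subprocess
from email.message import EmailMessage


def build_notification(to_addr, book_title, ia_identifier, sender):
    """Compose the 'upload finished' email (hypothetical wording)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to_addr
    msg["Subject"] = f"Upload complete: {book_title}"
    msg.set_content(
        f"Your requested book '{book_title}' has been uploaded to the Internet Archive:\n"
        f"https://archive.org/details/{ia_identifier}\n"
    )
    return msg


def send_via_exim(msg, exim_path="/usr/sbin/exim"):
    """Hand the message to the local Exim binary (assumed path).

    -t makes Exim read the recipients from the message headers.
    """
    subprocess.run([exim_path, "-t"], input=msg.as_bytes(), check=True)
```

Keeping message construction separate from delivery means the content can be checked without a mail server, and the delivery function can be swapped (for example for `smtplib`) if the host's mail setup differs.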

Improved the code structure.
- Removed all global variables in favor of classes.
Resolved a bug in the internetarchive Python module related to metadata being overwritten.
Resolved the Google ID and Commons name parsing bug.
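Parsing of the Google ID and Commons name can be sketched with the standard library. Two assumptions are labelled here: that a Google Books volume ID is 12 characters of letters, digits, `_` and `-` (a commonly observed format, not a documented guarantee), and that checking the characters MediaWiki forbids in titles (`#<>[]|{}`) is enough for an illustration; real Commons filename rules are stricter. The function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: Google Books IDs are 12 chars of [A-Za-z0-9_-].
_GB_ID = re.compile(r"^[A-Za-z0-9_-]{12}$")
# Characters MediaWiki does not allow in page titles (subset of the full rules).
_ILLEGAL_TITLE_CHARS = set("#<>[]|{}")


def parse_google_id(value):
    """Accept a bare ID or a books.google.* URL with an ?id= parameter.

    Returns the ID, or None if nothing valid is found.
    """
    value = value.strip()
    if _GB_ID.match(value):
        return value
    parsed = urlparse(value)
    if not parsed.netloc.startswith("books.google."):
        return None
    ids = parse_qs(parsed.query).get("id")
    if ids and _GB_ID.match(ids[0]):
        return ids[0]
    return None


def normalize_commons_name(name):
    """Strip an optional 'File:' prefix, turn underscores into spaces,
    collapse whitespace and capitalize the first letter (as Commons does).

    Returns None if the name is empty or contains illegal title characters.
    """
    name = name.strip()
    if name.lower().startswith("file:"):
        name = name[5:]
    name = " ".join(name.replace("_", " ").split())
    if not name or any(ch in _ILLEGAL_TITLE_CHARS for ch in name):
        return None
    return name[0].upper() + name[1:]
```

Accepting both a bare ID and a full URL avoids a common source of parsing errors: users tend to paste whatever is in the address bar, including extra query parameters such as `printsec`.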

Some minor changes:
- Added retry/off-line checker wrappers to the code.
- Added an admin login page for administrative tasks such as mass uploads.
- Improved the error/job logging mechanism.
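A retry wrapper of the kind mentioned above is often written as a decorator. The sketch below is illustrative, not the tool's code: the parameter names, the exponential back-off and the injectable `sleep` function (which makes it testable without real waiting) are assumptions.

```python
import functools
import time


def retry(times=3, delay=1.0, backoff=2.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped call up to `times` attempts, waiting between attempts.

    The wait starts at `delay` seconds and is multiplied by `backoff` after
    each failure; the last exception is re-raised if every attempt fails.
    """
    if times < 1:
        raise ValueError("times must be >= 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise
                    sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator
```

Such a decorator could wrap network steps like page downloads or IA uploads (for example `@retry(times=5, exceptions=(IOError,))`), so that a temporary outage delays a job instead of failing it outright.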