Talk:Google Books, Internet Archive, Commons upload cycle

Any comments / suggestions are welcome.

Digital Library of India and West Bengal Public Library Network
As I mentioned on the mailing list, please consider developing this as a general tool, not one specific to Google Books and the Internet Archive. We depend more on the Digital Library of India [2] and the West Bengal Public Library Network [1].


 * 1. http://dspace.wbpublibnet.gov.in:8080/jspui/
 * 2. http://www.dli.ernet.in/
 * Jayantanth (talk) 15:57, 12 March 2014 (UTC)

--- Hi Jayantanth, your suggestion is worth noting. The tool will not be exclusive to Google Books. Since each library has its own API, libraries will be added individually. I will add support for more libraries over time.--8ohit.dua (talk) 16:00, 13 March 2014 (UTC)

urllib2
Python experts usually recommend python requests nowadays. --Nemo 18:31, 12 March 2014 (UTC)

---Thank you Nemo, python-requests seems promising.--8ohit.dua (talk) 16:05, 13 March 2014 (UTC)

DSpace
Just a side note: the software that runs the West Bengal Public Library Network is DSpace, which is widely used by digital libraries. Focusing on the general behaviour of the software would be a better option than focusing on one very specific instance.

Some examples from my country: Brasiliana USP, Senado Federal rare works collection, UNESP.

DSpace relies on OAI-PMH (API URLs for those I mentioned: , , ; Brasiliana USP still doesn't have a public API URL, but harvesting directly from the description pages shouldn't be very difficult; see the HTML source for as an example).

If you choose to harvest metadata as Simple Dublin Core you will get only the metadata description, although linking those records to binary files isn't hard. If you choose to harvest metadata as Qualified Dublin Core you will be able to get metadata plus binary files, except if the digital repository is running the buggy v3.1. The vast majority of repositories in Brazil are, because it was the most recent version at the time of implementation. Upgrading to 3.2 at these institutions isn't easy because of the bureaucracy needed to poke the firm that implemented DSpace on a given institution's IT infrastructure. So beware of this when coding: if you get an error, it is possible that it isn't due to your code.

BTW Google Book Search is really our priority. Many thanks for adopting this project. I hope you learn a lot while doing it =) Lugusto (talk) 05:11, 17 March 2014 (UTC)


 * Good point. We adore DSpace, and the smart folks (who hopefully have the best content) usually use it. --Nemo 08:13, 17 March 2014 (UTC)
 * This is now filed as . --Nemo 08:35, 26 July 2014 (UTC)
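
A minimal sketch of the generic OAI-PMH harvesting approach discussed above, not the tool's actual code. It builds a standard `ListRecords` request URL and parses Dublin Core titles from a response, and it also surfaces `<error>` elements returned by the repository so that server-side problems can be told apart from bugs in the harvester. The base URL, the sample response, and all function names are illustrative assumptions; the metadata prefix for Qualified Dublin Core varies by installation, so only the standard `oai_dc` prefix is used here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query

def parse_records(xml_text):
    """Return one dict per record with its identifier and dc:title values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [el.text for el in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records

def oai_errors(xml_text):
    """Return (code, message) pairs for any <error> elements in a response."""
    root = ET.fromstring(xml_text)
    return [(el.get("code"), el.text) for el in root.iterfind("oai:error", NS)]

# Hypothetical response, trimmed to the parts the parser reads.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123/456</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Book</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# Hypothetical error response, as a repository might return it.
SAMPLE_ERROR = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="cannotDisseminateFormat">Unsupported metadata format</error>
</OAI-PMH>
"""
```

Because the request and parsing logic depend only on the protocol, the same functions should work against any DSpace instance that exposes an OAI-PMH endpoint; only the base URL changes per repository.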

i18n
Should the project deal with internationalization and localization, i.e. should its interface be translated into different languages?
 * Our mantra is that i18n must be done immediately as soon as software is born, not as an afterthought. Consider this in your plan. Of course if you develop a library for other software to use it's another matter, but if you make a Tool Labs tool you must use Intuition etc. --Nemo 08:13, 17 March 2014 (UTC)

SQLite or MySQL
Which database should we choose, SQLite or MySQL (for the Python controller)? ''SQLite is better suited to read-heavy workloads because it locks the entire database while writing, but it is easy to set up.''--8ohit.dua (talk) 11:39, 17 March 2014 (UTC)
 * Maybe SQLite, as the db use will be extremely limited if I have understood the proposal correctly (only a list of emails) and  is installed on labs. But MySQL/MariaDB is fine too. Tpt (talk) 17:57, 17 March 2014 (UTC)
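
To illustrate how small the SQLite option could be for the "list of emails" use case, here is a rough sketch using Python's built-in `sqlite3` module. The table name, the columns, and the idea of keying emails by a book identifier are assumptions made for illustration, not part of the proposal.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Open (or create) the database; ':memory:' keeps this sketch self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        " book_id TEXT NOT NULL,"
        " email TEXT NOT NULL,"
        " UNIQUE (book_id, email))"
    )
    return conn

def add_request(conn, book_id, email):
    """Record that `email` wants to be notified about `book_id`."""
    # 'with conn' commits on success and rolls back on error, so the
    # write lock is held only for this single short transaction.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO requests (book_id, email) VALUES (?, ?)",
            (book_id, email),
        )

def emails_for(conn, book_id):
    """Return the emails registered for a book, in alphabetical order."""
    rows = conn.execute(
        "SELECT email FROM requests WHERE book_id = ? ORDER BY email",
        (book_id,),
    )
    return [row[0] for row in rows]
```

With writes this short and infrequent, the whole-database write lock mentioned above is unlikely to matter in practice; it would become a concern only under many concurrent writers.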

UI and Beginning code
I've created some UI and other verification snippets for the project. Please have a look at the repo (project BUB!!): https://github.com/rohit-dua/BUB. It is fully functional with index.cgi as the homepage.

I'm coding in such a way that more libraries can be easily added. Screenshot: https://upload.wikimedia.org/wikipedia/commons/5/59/Bub-ui.png --8ohit.dua (talk) 13:41, 21 March 2014 (UTC)
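
One possible way to keep libraries pluggable is a small adapter registry, sketched below. This is not taken from the BUB repository: the class names, the `key` attribute, and the `ExampleDSpace` adapter with its example.org URL are hypothetical. Only the Google Books volumes endpoint is a real URL pattern.

```python
from abc import ABC, abstractmethod
from urllib.parse import quote

class Library(ABC):
    """Interface each supported digital library implements."""
    key = ""  # short identifier used to look the adapter up

    @abstractmethod
    def metadata_url(self, book_id):
        """Return the URL that serves metadata for `book_id`."""

LIBRARIES = {}

def register(cls):
    """Class decorator: instantiate the adapter and index it by its key."""
    LIBRARIES[cls.key] = cls()
    return cls

@register
class GoogleBooks(Library):
    key = "gb"

    def metadata_url(self, book_id):
        return "https://www.googleapis.com/books/v1/volumes/" + quote(book_id, safe="")

@register
class ExampleDSpace(Library):
    # Hypothetical adapter for a DSpace repository; the host is a placeholder.
    key = "example"

    def metadata_url(self, book_id):
        return ("http://example.org/oai/request?verb=GetRecord"
                "&metadataPrefix=oai_dc&identifier=" + quote(book_id, safe=""))

def get_library(key):
    """Look up an adapter, with a readable error for unsupported libraries."""
    try:
        return LIBRARIES[key]
    except KeyError:
        raise ValueError(
            "unsupported library %r; known: %s" % (key, sorted(LIBRARIES))
        ) from None
```

Under this layout, supporting a new library means adding one decorated class; the UI and the upload pipeline would only ever call `get_library(key)`.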

Code repository?
Where will the code repository of this project be hosted? In our Gerrit instance or somewhere else? Will this project have a code review process? Thank you.--Qgil (talk) 20:57, 25 March 2014 (UTC)

Communication channels
In addition to the announcement on wikitech-l, you may want to send announcement mails to non-tech users on wikisource-l, commons-l, and/or post on the respective Village Pumps (commons:Commons:Village_pump, wikisource:Wikisource:Scriptorium).

As a side note, there is also the Glamtools project (a general-purpose mass uploader for Commons), which may interest you.

I wish you great success in your project!

~ Seb35 [^_^]  13:01, 26 March 2014 (UTC)