Extension:BookManager/Improve support for book structures

'''As this project has been accepted, I have created Book management to organize this project. I plan to keep this proposal here for reference, but will not be reusing the page for the rest of the project.'''

Hi! I'm Molly White, known as GorillaWarfare on the various Wikimedia projects. I am applying for Google Summer of Code 2013 and the Outreach Program for Women. I would love to hear any feedback you have, and I can easily be contacted on IRC, by email, or by talk page message (see below for details).

Identity
Name: Molly White
Email: gorillawarfarewikipedia@gmail.com or molly.white5@gmail.com
Project title: Improve support for book structures

Contact/working info
Timezone: EDT (UTC−4:00)
Typical working hours: Very flexible. I can adjust my hours to any time between 13:00 and 07:00 UTC (09:00–03:00 Eastern), but I anticipate working from 15:00 to 23:00 UTC (11:00–19:00 Eastern).
IRC or IM networks/handle(s): GorillaWarfare (Freenode)
Time constraints: I want to be clear up front that I have a few time constraints to work around. I will be working a full-time job until June 21, and I'm also in college, where my classes start on September 4. Although I realize the overlap is somewhat significant, I'm fully prepared to dedicate most of my evenings and weekends to the project while I'm working or in classes. Per Sumana's and Quim's suggestions, I've planned my schedule so that the main part of this project will be complete before September 4. Any remaining time will go to some of the many "if time permits" deliverables.

Project summary
I am interested in improving support for wikis like Wikisource and Wikibooks, whose content is structured as books. I intend to work on Extension:BookManager to allow these wikis to collect the pages of a book into a single unit that can then be easily navigated, exported or printed, and acted upon as a whole.

Wikisource and Wikibooks both provide freely available books, which are quite different from the article-style content of most wikis. Despite this very different content type, both projects are forced to adapt the wiki article structure to organize their books. They do so by using subpages, which are then grouped together and made navigable in some way. Wikisource usually provides navigation through header templates and a main table of contents; Wikibooks sometimes does the same, and sometimes simply requires readers to return to a main table of contents before moving to another chapter. Navigating these books online can be challenging, but the main issues arise when a user wants to perform other actions on the entire book. Printing a book, for example, is next to impossible: the Collection extension, which provides the "Create a collection", "Create a book", and "Download as PDF" links in the Print/export sidebar group, does not work with these types of wikis. Each page must be added to the collection or book manually, which is a tedious process for a book of any length. Additionally, it is not possible to watchlist, move, delete, or protect an entire book; these actions too must be done page by page.

These problems can be solved by providing a simple, standard way to store the structure of a book. I plan to accomplish this by modifying the BookManager extension, an existing (but preliminary and currently unstable) attempt to address the issue. A user will be able to use a form (see right for a mockup) to organize the book into parts. If the book has an Index page for its scans (as Wikisource works with scans do), its pages can be organized into chapters by specifying page ranges. These chapters can then be ordered and arranged into the book. Once the book is organized in this way, I hope to add an option to automatically create a table of contents, a navigation bar, and/or a print version.
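As a minimal sketch of the page-range idea, the Python below expands a chapter's range of scan pages into page titles and rejects chapters whose ranges overlap. It assumes the Proofread Page naming scheme used on Wikisource, `Page:<scan file>/<page number>`; the `Chapter` class, the function names, and the file name `Example.djvu` are hypothetical and not part of BookManager.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    """A chapter defined by an inclusive range of scan page numbers (hypothetical)."""
    title: str
    first_page: int
    last_page: int


def chapter_pages(scan: str, chapter: Chapter) -> list[str]:
    """Expand a chapter's page range into Page: titles (assumed naming scheme)."""
    if chapter.first_page > chapter.last_page:
        raise ValueError(f"empty page range for chapter {chapter.title!r}")
    return [f"Page:{scan}/{n}" for n in range(chapter.first_page, chapter.last_page + 1)]


def check_no_overlap(chapters: list[Chapter]) -> None:
    """Raise ValueError if any two chapters claim the same scan page."""
    # After sorting by first page, any overlap shows up between neighbors.
    ordered = sorted(chapters, key=lambda c: c.first_page)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.first_page <= prev.last_page:
            raise ValueError(f"chapters {prev.title!r} and {cur.title!r} overlap")
```

For example, `chapter_pages("Example.djvu", Chapter("Chapter I", 5, 7))` yields the titles for scan pages 5 through 7, which the form could then file under that chapter.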

To organize the book, the extension could be modified to use a JSON structure, which would neatly collect all the organizational information as well as any relevant book metadata. I have created an example at User:GorillaWarfare/Proposal/JSON. This data would be editable via a form (see right for a mockup), so users would not need to manipulate the raw JSON. Each book would have a single main page that could be used to interact with the book as a whole. These interactions would include improved support for export and printing, as well as administrative actions such as deletion or protection. Quite a few requested enhancements depend on this organizational structure (see Bugzilla), and I hope to tackle some of them as part of the project.
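To make the idea concrete, here is a hypothetical book structure and a small Python sketch that parses it and flattens it into reading order. The field names (`title`, `metadata`, `chapters`, `pages`) are illustrative only and are not the schema drafted at User:GorillaWarfare/Proposal/JSON; BookManager itself is written in PHP, so this shows the data flow rather than an implementation.

```python
import json

# Hypothetical stored structure; field names are illustrative, not a final schema.
BOOK_JSON = """
{
  "title": "Example Book",
  "metadata": {"author": "Jane Doe", "language": "en"},
  "chapters": [
    {"title": "Introduction", "pages": ["Example Book/Introduction"]},
    {"title": "Part One", "pages": ["Example Book/Chapter 1", "Example Book/Chapter 2"]}
  ]
}
"""


def load_book(text: str) -> dict:
    """Parse the stored JSON and check the minimal fields this sketch relies on."""
    book = json.loads(text)
    if not isinstance(book.get("title"), str):
        raise ValueError("book needs a string 'title'")
    for chapter in book.get("chapters", []):
        if not isinstance(chapter.get("pages"), list):
            raise ValueError(f"chapter {chapter.get('title')!r} needs a 'pages' list")
    return book


def pages_in_order(book: dict) -> list[str]:
    """Flatten the chapters into the book's reading order of page titles."""
    return [page for chapter in book.get("chapters", []) for page in chapter["pages"]]
```

A flattened page list like this is what whole-book actions (export, watchlisting, deletion, protection) would iterate over, while the nested form keeps the chapter grouping the form editor needs.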

Mentors
User:Raylton P. Sousa (maintainer of BookManager) and User:Mwalker (WMF) have offered to mentor me. User:Tpt has offered to co-mentor if a Proofread Page GSoC project does not materialize. At present, I plan to work with at least Raylton on this project.

Required deliverables

 * Stabilize Extension:BookManager
 * Design a final JSON schema to represent each book (see the beginnings of this at User:GorillaWarfare/Proposal/JSON)
 * Modify the BookManager code to create and interact with a JSON representation of the book
 * Create a user-friendly form to allow a user to easily adjust the book structure without editing the JSON directly
 * Add functionality to automatically generate navigation bars similar to those produced by Wikisource's header template. These could offer previous/next chapter navigation as well as a link to the main landing page, and could include the same kind of information as that header template (for example, author, categories, portal...). Raylton has pointed out Módulo:Nav, a Lua module on Portuguese Wikibooks that creates semi-automatic navigation bars. He has also mentioned that BookManager already has some functionality along these lines, and after some discussion we agreed that it would be wise to approach this by modifying the existing navigation bars to work with the JSON structure.
 * Write documentation for the BookManager extension
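The previous/next lookup behind such a navigation bar can be sketched as follows, assuming the book's pages are already available as an ordered list of titles (for instance, flattened from the JSON structure). This is a hypothetical Python illustration of the logic only; the planned approach is to adapt BookManager's existing navigation bars, and the names `NavLinks`, `nav_links`, and `nav_wikitext` are invented here.

```python
from typing import NamedTuple, Optional


class NavLinks(NamedTuple):
    """Links shown on one page's navigation bar (hypothetical)."""
    prev_page: Optional[str]
    next_page: Optional[str]
    landing: str


def nav_links(order: list[str], current: str, landing: str) -> NavLinks:
    """Find the neighbors of `current` in the book's reading order."""
    i = order.index(current)  # raises ValueError if the page is not in the book
    prev_page = order[i - 1] if i > 0 else None
    next_page = order[i + 1] if i + 1 < len(order) else None
    return NavLinks(prev_page, next_page, landing)


def nav_wikitext(links: NavLinks) -> str:
    """Render the links as a one-line wikitext navigation bar."""
    parts = []
    if links.prev_page:
        parts.append(f"← [[{links.prev_page}]]")
    parts.append(f"[[{links.landing}|Contents]]")
    if links.next_page:
        parts.append(f"[[{links.next_page}]] →")
    return " | ".join(parts)
```

For a book whose reading order is `["B/1", "B/2", "B/3"]`, the bar for `B/2` renders as `← [[B/1]] | [[B|Contents]] | [[B/3]] →`, and the first and last pages simply omit the missing neighbor.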

If time permits

 * Add functionality to automatically generate a table of contents on a separate page, which can then be transcluded.
 * Add functionality to create a simple print version of the book. This would be similar to the "print version" of articles (see for example the print version of the Wikipedia article "Book"): a simplified page that is printer-friendly. Eventually the functionality added by this extension should be used with Extension:Collection, but that is not something I plan to tackle in the main part of this project.
 * One-click events that handle an entire book:
    * Watchlist
    * Delete
    * Move
    * Protect
    * View recent changes
 * Add an extension or patch to Extension:Collection that will allow it to print the entire book at once
 * Allow books to be collected on a "bookshelf" for later use
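For the table-of-contents item, a minimal sketch under the same hypothetical structure (chapters, each with an ordered `pages` list) renders the book as a nested wikitext list. If the result were saved to its own page, it could be transcluded elsewhere with the usual `{{:Page name}}` syntax. The function name and input shape are assumptions for illustration.

```python
def toc_wikitext(book: dict) -> str:
    """Render a book's chapters and pages as a nested wikitext list (hypothetical input shape)."""
    lines = []
    for chapter in book.get("chapters", []):
        lines.append(f"* {chapter['title']}")
        for page in chapter.get("pages", []):
            label = page.rsplit("/", 1)[-1]  # show only the subpage name as link text
            lines.append(f"** [[{page}|{label}]]")
    return "\n".join(lines)
```

Given `{"chapters": [{"title": "Part One", "pages": ["Example Book/Chapter 1"]}]}`, it returns `* Part One` followed by `** [[Example Book/Chapter 1|Chapter 1]]`.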

Pre-May 27
Familiarize myself with MediaWiki core, BookManager, and possibly the Collection extension. Work on stabilizing the BookManager extension. I will also try to replace all deprecated functions with up-to-date ones and improve the inline documentation.

May 27–June 17
Google allocates this time to the "community bonding period". I am already quite involved with the Wikimedia communities, so I will not need to spend much of this time familiarizing myself with them. I will, however, use this time to ensure that my project has support from the communities it will most dramatically affect (primarily Wikisource and Wikibooks). I will also use this time to become more familiar with the development processes, MediaWiki core, and the BookManager extension. During this time, I will also work with my mentor(s) to draft a very specific plan for the rest of the summer, create design documents, and begin working on the code.

Week 1: June 17–June 23
Finalize JSON schema, with feedback from the Wikisource, Wikibooks, and Wikidata communities on additional metadata they would like to include. Plan for the metadata to be configurable per-wiki, as fields like ISBN would not be as useful for a wiki like Wikibooks.
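Per-wiki metadata could work roughly like this hypothetical Python sketch: each wiki declares which fields it accepts, and anything else is reported back rather than silently stored. The wiki keys and field names are illustrative assumptions; in the extension itself this would presumably be a configuration setting rather than a hard-coded dictionary.

```python
# Hypothetical per-wiki configuration of accepted metadata fields.
ALLOWED_FIELDS: dict[str, set[str]] = {
    "wikisource": {"author", "year", "publisher", "isbn", "language"},
    "wikibooks": {"authors", "subject", "language"},
}


def split_metadata(wiki: str, metadata: dict) -> tuple[dict, set[str]]:
    """Keep the fields this wiki accepts and return the names of the rejected ones."""
    allowed = ALLOWED_FIELDS.get(wiki)
    if allowed is None:
        raise KeyError(f"no metadata configuration for {wiki!r}")
    accepted = {key: value for key, value in metadata.items() if key in allowed}
    rejected = set(metadata) - allowed
    return accepted, rejected
```

With this configuration, an `isbn` field would be accepted on Wikisource but reported as unsupported on Wikibooks, so the form could warn the editor instead of storing a field the wiki does not use.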

Weeks 2–4: June 24–July 14
Modify the BookManager backend so that it can read and store JSON data about each book. Add support for the automatic generation of navigation bars.

Weeks 5–6: July 15–July 28
Create the frontend for the extension. This includes both the form to create and modify the stored data, as well as the landing page for viewing the book. Continue working on the navigation bars backend, if necessary.

Week 7: July 29–August 4
Modify the existing navigation bar code so that it uses the JSON data.

Weeks 8–11½: August 5–September 4
Clean-up stage. This involves polishing the code, finishing up any documentation, testing and bug fixing, and deployment. I will aim to have the complete, stable extension reviewed, merged, and deployed by September 4.

I am aiming to complete the main portion of the project by September 4 because, as I mentioned above, that is the beginning of my school year. I intend to continue contributing more or less full-time to this project until the official end of the GSoC period, but aiming to deploy by September 4 will ensure that I don't end the summer with a half-complete project.

Weeks 11½–15: September 5–September 27
By now, the main project should be deployed. This period will be dedicated to any further bug fixing, and the smaller "if time permits" improvements. I will begin with the generation of a table of contents and print version.

About you
I am just completing my second year at Northeastern University, where I was studying computer engineering. I have just switched my major to computer science, as I've found I'm much more interested in writing code than in working with hardware. My programming language of choice is Python, although I also use C, C++, and JavaScript regularly. I am working on becoming more proficient with PHP to prepare for this project.

I have been a Wikimedian for almost seven years. I am most active on English Wikipedia, where I am an administrator. I mainly do maintenance work there, reverting vandalism and cleaning up articles. I also help out with outreach; I've worked on the Wikipedia Education Program (hey, that's even me in the photo!), and I was one of the appeals featured during the 2011 fundraising drive. I also very much enjoy editing the English Wikisource, where I completed the initial proofreading of Sigmund Freud's The Interpretation of Dreams and am now working on proofreading the Pentagon Papers.

I enjoy programming in my free time. My current projects are a Python parser to convert the wikimarkup of the Pentagon Papers to LaTeX, and a plain text-to-wikitext parser for Supreme Court cases. When it's at all possible, I make all code I write freely available on GitHub.

Participation
I will work hard to communicate well, whether it be with my mentor, other developers, or the community. I already make a habit of ensuring that I am very easy to contact. While I'm awake, I respond to emails almost immediately and talk page messages within the day. While I'm at my home computer, I am always logged on to IRC and can be easily reached at #wikipedia-en, #mediawiki, #wikisource, or by private message. In terms of my coding style, I commit frequently (no, really, just look at my GitHub commit log), and plan to continue this habit while I work on this project. I will keep a repository for this project on GitHub both for my own use and so that my progress will be easily trackable—this way you will not have to wait for me to submit a finished patch to see what I'm up to. I also plan to blog about this project, probably weekly.

Regarding my interaction with my mentor, Raylton and I have agreed to communicate by email at least once daily so that he can keep up with my progress and give feedback. If I run into questions, I can also email him as needed. We will use Book management as a planning page, and I will turn to IRC for smaller problems that may not require his specific expertise with BookManager.

Past open source experience
As I mentioned in the "About you" section above, I've been contributing to the Wikimedia projects as an editor for almost seven years. I'm very familiar with the various communities, particularly on English Wikipedia. This project is one of my first forays into contributing code to a large open source project. I've been working on familiarizing myself with the code base and beginning to contribute; I submitted my first patch at the beginning of April! Since then I've submitted a few more. I've also been communicating a lot with MarkTraceur, who has been very helpful in introducing me to the code, and an exceptional resource when I have questions. I do make my personal code freely available, but I tend to be the only contributor to these projects.

Submitted patches

 * LinkHandler doesn't understand $wgCapitalLinks, patch
 * grammatical error(s) in message MediaWiki:Permissionserrors, patch
 * $wgBookManagerNamespaces should default to $wgContentNamespaces instead of NS_MAIN, patch
 * Remove the obsolete "/client/jquery.hotkeys.js" file, patch

Bugs
There are quite a few bug reports and other various discussions that are related to this project:
 * Dependency tree for Bug 15071
 * Make Collection extension to automatically create collections for existing books on Wikibooks/Wikisources
 * Wikibooks/Wikisource needs means to associate separate pages with books
 * Protect, watchlist or delete a whole book at once
 * Create a set of special pages for handling meta-organization of books
 * List, count and search all books
 * BookManager design (automatic translation)

Feedback
I have requested input on this project in several places:
 * Multilingual Wikisource Scriptorium (also linked from English Wikisource Scriptorium)
 * Thread:User talk:Qgil/Summer of Code proposal feedback
 * Comment on bug 15071
 * On wikitech-l
 * English Wikibooks reading room

Endorsements

 * support. Excellent idea for an under-supported set of Wikimedia projects. Okeyes (WMF) (talk) 13:42, 3 May 2013 (UTC)
 * I've been hacking around with Molly for a bit on my lochner project and her brandeis project, and she's impressed me with her willingness to bounce ideas around and her enthusiasm for the projects (and programming in general). And this project, as Oliver said, is much needed, and focused at a site we don't normally see represented. --MarkTraceur (talk) 15:33, 3 May 2013 (UTC)
 * This area of MW is in heavy need of some serious code scrubbing and feature addition! Mwalker (WMF) (talk) 15:55, 3 May 2013 (UTC)
 * MediaWiki really needs support of the book notion and this proposal looks a good way to add this support. Tpt (talk) 19:32, 3 May 2013 (UTC)
 * Very important, because it reduces the learning time! - Raylton P. Sousa (talk) 14:14, 7 May 2013 (UTC)