User:Ckepper/Collection Extension 2

This document describes a new version of the "collection extension", a.k.a. the "book creator".

In a recent survey among Wikipedia readers, a large percentage of respondents expressed interest in using Wikipedia offline (41%) and in exporting articles to PDF (40%). Technically, this functionality is already provided by the collection extension, which has been active in most major language Wikipedias for quite some time. Evidently, however, many users do not know about this feature and do not use it.

The goal of this effort is to make the collection and export functionality available to more Wikipedia users. We want to fundamentally question our assumptions about the way the collection extension works and come up with something that is significantly easier to use. This naturally entails changes in the labeling, functionality and placement of the collection extension in MediaWiki.

This document is a work in progress. Feedback is highly welcome.

Statistics for the collection extension
The collection extension was enabled in the English Wikipedia in May 2010. To understand the usage patterns, we analyzed HTTP logs covering 252 days, from Feb. 16 to Oct. 25, 2011. The log files contained anonymized session keys that made it possible to follow user behavior within a session.

Data Source
The log files included only collection extension requests (Special:Book). Page views of "regular" Wikipedia articles were not included. The findings were normalized per session: for example, multiple PDF downloads of the same article in one session are counted only once. All numbers presented in the next section are averages per day over the 252 days of our analysis. We analyzed only data from the English Wikipedia.
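
The per-session normalization can be sketched as follows. This is a minimal illustration, not the script used for the analysis: the record layout (session key, article, format) and the function name `downloads_per_session` are assumptions made for the example.

```python
from collections import defaultdict

def downloads_per_session(records):
    """Count each (article, format) download at most once per session.

    `records` is an iterable of (session_key, article, fmt) tuples.
    This layout is a hypothetical stand-in for the real log format.
    """
    seen = defaultdict(set)
    for session_key, article, fmt in records:
        # A set drops repeated downloads of the same item in one session.
        seen[session_key].add((article, fmt))
    return {session: len(items) for session, items in seen.items()}

# Made-up sample records, for illustration only.
records = [
    ("s1", "Berlin", "pdf"),
    ("s1", "Berlin", "pdf"),  # repeated download, counted once
    ("s1", "Paris", "pdf"),
    ("s2", "Berlin", "pdf"),
]
counts = downloads_per_session(records)
print(counts)
```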

A word of warning on the numbers: A cross-check of the number of downloads from HTTP logs with the logs of the PDF render servers (pdf[1-3].wikimedia.org) revealed significant differences. The PDF render servers reported about five times as many downloads. A plausible explanation might be that the HTTP logs captured only a fraction of the actual traffic.

The data on uploads to PediaPress and orders of PediaPress books are taken from PediaPress logs.

Results
The collection extension is used by only a tiny fraction of Wikipedia users. Wikipedia receives about 8 billion page views and 400 million unique users per month. Dividing page views by unique users gives a rough estimate of 20 page views per session. Approximately 250 million page views per day would therefore translate to about 12.5 million user sessions per day.
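
The estimate can be reproduced with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in this section. Treating page views per unique user as page views per session is the rough assumption made in the text, not a measured value.

```python
# Rounded figures quoted in the text.
page_views_per_month = 8_000_000_000
unique_users_per_month = 400_000_000
page_views_per_day = 250_000_000

# Assumption: page views per unique user stand in for page views per session.
views_per_session = page_views_per_month / unique_users_per_month
sessions_per_day = page_views_per_day / views_per_session

print(views_per_session)  # 20.0
print(sessions_per_day)   # 12500000.0
```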



Compared to these 12.5 million sessions, the number of PDF download sessions is tiny: only 12,698 sessions (0.1%) interacted with the collection extension and only 7,450 sessions contained downloads (0.06%).
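
The percentages follow directly from the daily averages. A minimal sketch, in which the helper name `share` is illustrative and the session total is the rough estimate from above:

```python
sessions_per_day = 12_500_000  # rough estimate from the text
extension_sessions = 12_698    # sessions that interacted with the extension
download_sessions = 7_450      # sessions that contained downloads

def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole

print(f"{share(extension_sessions, sessions_per_day):.2f}%")  # 0.10%
print(f"{share(download_sessions, sessions_per_day):.2f}%")   # 0.06%
```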



In the 7,450 download sessions the users downloaded 15,309 files. 15,230 of these files were PDF files, 66 were in OpenDocument format, and 13 in ZIM format. Only 462 of the PDF files contained a collection of more than one article.



Most of the users download single articles in PDF format. The book creator is used only by very few users. Our logfile recorded 2,958 clicks on the "Create a book" link in the left navigation column. Only 966 proceeded past the "Start book creator" page and actually activated the book creator toolbar. 232 uploaded their collection to PediaPress and 4 actually ordered a book.
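
The drop-off between these steps can be expressed as conversion rates. The sketch below uses the daily averages quoted above. The step labels and the function name `step_rates` are illustrative.

```python
# Daily averages quoted in the text, in the order of the user flow.
funnel = [
    ("clicked 'Create a book'", 2958),
    ("activated the book creator toolbar", 966),
    ("uploaded a collection to PediaPress", 232),
    ("ordered a book", 4),
]

def step_rates(steps):
    """Return (label, count, share of previous step, share of first step)."""
    first = steps[0][1]
    previous = first
    rows = []
    for label, count in steps:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows

rows = step_rates(funnel)
for label, count, of_previous, of_first in rows:
    print(f"{label}: {count} "
          f"({of_previous:.1%} of previous step, {of_first:.1%} of all clicks)")
```

With these figures, about a third of the clicks on "Create a book" led to an activated toolbar, and roughly one in a thousand ended in a book order.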

Metaphor
The collection extension allows Wikipedia users to collect articles and store them for later. Currently, the collection extension uses a book metaphor to present its functionality to users ("create a book", "insert chapter"). The results from the user survey and the log file analysis show that this metaphor does not work well. Wikipedia users said they wanted to "save articles for offline reading" (41%) or "bookmark articles for later viewing or repeated viewings" (36%). So it is no surprise that people do not look for this functionality under "Create a book".

The goal of this rebranding effort is to search for a metaphor that better relates to the mental model of users.

The concept or "interaction pattern" of collections is very well known on the web, but it is often disguised in different metaphors. Erin Malone, for example, divides collecting into the related activities "Saving", "Favorites", "Tagging" and "Displaying". These activities only partially match what a user can do with the collection extension:


 * The articles themselves are not "saved" by the collection extension: the user saves only a pointer to the original item, not a copy.
 * "Favorites" are a closer match in terms of functionality, but they imply a rating of the article that might not be appropriate for our collections.
 * "Tagging" is a completely different activity.
 * "Displaying" is implemented in the "Manage your book" page.

Another solution might be the "shopping cart" metaphor. It fits pretty well in terms of functionality, but implies commercial activities that are not required and not intended by most users.

Only recently, a new variation of the collection pattern, "Read later", has emerged, popularized by tools and companies such as Instapaper, Readability and ReadItLater. "Read later" seems to be the best match for the user needs expressed in the survey. Although the collection extension offers slightly different functionality, we will explore this concept further.

Information architecture
The following images show the current user flow when interacting with the collection extension.



When a user clicks on "create a book", the extension checks (via a cookie) whether the user already has an existing collection. If so, it displays a JavaScript popup asking whether they want to continue their previous collection. If the user clicks "OK", they are taken to the "Manage Collection" page. Otherwise, the "Start Book Creator" page, which explains the functionality, is shown. When the user clicks on "Start Book Creator", they are taken back to the previous article and the Collection Toolbar is enabled. The whole starting process seems overly complicated and could be greatly simplified.
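
The branching in the current flow can be summarized in a small sketch. It is a model of the behavior described above, not code from the extension: the names `Page` and `page_after_create_a_book` are hypothetical, and the sketch assumes that "otherwise" covers both a missing collection and a declined popup.

```python
from enum import Enum

class Page(Enum):
    MANAGE_COLLECTION = "Manage Collection"
    START_BOOK_CREATOR = "Start Book Creator"

def page_after_create_a_book(has_existing_collection, user_confirms_continue):
    """Return the page shown after a click on "create a book".

    `has_existing_collection` models the cookie check and
    `user_confirms_continue` models clicking "OK" in the popup.
    Both parameter names are illustrative.
    """
    if has_existing_collection and user_confirms_continue:
        return Page.MANAGE_COLLECTION
    # Assumed: no collection, or popup declined, leads to the start page.
    return Page.START_BOOK_CREATOR

print(page_after_create_a_book(True, True).value)
print(page_after_create_a_book(False, False).value)
```

Written this way, the flow shows two decisions and an intermediate page before the user can add a single article, which is the complexity that an always-active design would remove.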





The new collection extension should always be active and operate modelessly. This means that the controls currently presented in the toolbar should be visible all the time and integrated into the layout. That way, articles can be added to and removed from the collection at any time.

The "Suggest Articles" feature should be integrated into the "Manage Collection" page. There is no need for a separate page.



Website Placement and Layout
Commercial websites often use a toolbar or toolbox to link to print versions and display social media features. This functionality is most often placed "above the fold" so that it is instantly accessible to readers. Wikipedia uses a top navigation area to display a search box and various buttons that are useful mainly for editors. Therefore, print and collection functionality should be placed inside the main content of the page.

Wireframe for the "Manage collection" page


