Reading/Web/PDF Functionality

From MediaWiki.org

Please read the plans for the PDF service below and tell us on the discussion page if we are missing anything, or if there's anything that we plan to work on that is actually unnecessary.

Introduction

Our current PDF rendering service, the Offline Content Generator (OCG), is no longer maintainable. Simply put, it's breaking down. Originally created by a third party, it currently runs on outdated code, which may introduce security vulnerabilities and other major issues in the future. If we are to keep PDF functionality, we unfortunately have to replace it; otherwise we might suddenly find ourselves having to take it down without having planned to do so.

Additionally, OCG does not support a number of rendering features requested by the community, the main one being the ability to render tables. We have selected a new service, the Electron rendering service, as a suitable replacement. Our next step is to duplicate the functionality provided by OCG using the Electron rendering service. Below, we describe the main pieces of functionality we have identified as necessary. We invite discussion of what is missing from, or superfluous in, this list. We also outline our future plans for PDF rendering in order to gather initial feedback.

History

  • Rendering PDF articles and books from Wikipedia pages is handled by a service called OCG. "Books" created through the book creator are rendered by OCG via the Collection extension. OCG has multiple issues, especially with tables.
  • Multiple issues with OCG are identified, including complaints from the community about OCG's inability to render tables.
  • Rendering of tables ranks as number 9 on the German-speaking Community Technical Wishlist.
  • Wikimedia Deutschland begins working on a solution for rendering tables in PDFs and introduces Electron. The plan is to run it alongside OCG, not to replace it.
  • While Wikimedia Deutschland is working on the Electron service, the maintainers of the OCG service at the Wikimedia Foundation conclude that OCG has to be replaced.
  • The WMF Reading team takes over responsibility for the long-term maintenance of PDF rendering and begins planning to implement table rendering across all projects.
  • The Reading team launches a community consultation to gather feedback on Electron.
  • The Reading Infrastructure and Web teams begin scoping the work necessary to port OCG functionality over to the Electron service.

Update After Consultation

Proposed PDF and print styles based on feedback from consultation

We launched a consultation on the current implementation of the PDF renderer in early June, 2017. After reviewing the consultation responses, we have made the following observations:

  • More users preferred the single-column format than the double-column format
  • Users who preferred the double-column format said that their preference was based on the styling and look and feel of double columns. Some users also expressed concerns about font size and wasted paper when printing PDFs in the single-column format
  • The following feature requests were made:
    • Functional hyperlinks
    • Date and URL, 'this page downloaded [date] from [URL]'
    • Customizable CSS for layout, title, and table of contents
    • Option for a 2-column format
    • Versions with and without images
    • Modifiable margins
    • Print by section, which allows you to remove references, paragraphs you don't want, the index, etc.
    • Configurable text size

Based on the feedback, we have incorporated the following into our new print styles:

  • hyperlinks
  • article information
  • smaller font and book-like styling

The remainder of the requests above will be postponed until the second iteration of the PDF renderer, in which we plan to build a settings mode that will allow for customization of the available options.

Proposal

The following is a proposal for the scope of functionality necessary for PDF rendering:

  • Individual articles will be rendered to PDF using the "Download as PDF" link in the sidebar
  • Multiple articles will be rendered to PDF using the Book Creator tool
  • All articles will contain attribution for text and images
  • All rendered PDFs will include tables
  • Users will be able to customize the layout of their PDF (optional)

Differences between current and future implementation

Feature                                            | OCG             | New service | Notes
Rendering individual articles                      | Yes             | Yes         |
Rendering multiple articles using the book creator | Yes             | Yes         |
Contains table of contents for multiple articles   | Yes             | Yes         |
Renders tables                                     | No              | Yes         |
Attribution                                        | Yes             | Yes         | Open question: location of attribution within the new service
Styling                                            | LaTeX           | New styles  |
N-column layout                                    | Yes             | No          |
Default 2-column layout                            | Yes             | Tentative   | Default one-column or two-column layout will be chosen based on feedback and quantitative and/or qualitative testing
Output format                                      | PDF, plain text | PDF only    |

Design

The new PDF styles will be designed for increased readability. Based on community feedback and qualitative or quantitative testing, support for a 2-column layout may be built for the book creator and/or for individual PDFs.

Development and Deployment Roadmap

The following is a rough outline of the development and deployment roadmap. It is subject to change.

  1. April – May 2017:
    1. The Reading team builds back-end support for functionality identified above
    2. Communities are consulted on expanding or shrinking proposed functionality
    3. Qualitative test performed for styling
  2. June – July 2017:
    1. New styles implemented
    2. First iteration is launched alongside OCG on all projects and performance is compared
    3. Iterations based on consultations and identified edge cases
  3. August – September 2017:
    1. Additional changes made if necessary
  4. October 2017:
    1. Second iteration launched without OCG on all projects


Current Functionality Requirements

The following is a list of the current requirements for PDF rendering, both for single-article PDFs and for books. Requirements that differ from the current implementation are displayed in bold.

Single Articles

  • A PDF for a single article will be created by selecting the "Download as PDF" link
  • Upon selecting "Download as PDF", the PDF file will be generated. To download the file, users will select the "Download the file" link
  • Each PDF file will contain the following:
    • Article title and text
    • Infobox (if any)
    • Tables (if any)
    • Single-column layout
    • Page number
    • All article images and captions
    • Links to pages linked from the article (blue links and external links)
    • Text and image sources, contributors, and licenses

Books

Note: no changes will be made to the current book creator workflow at this time

  • Users will launch the book creator by selecting "Create a book"
  • This will navigate to the current book creation page
  • To download a book, users will select the "download" link from the books page
  • Users may only download books in PDF format
  • Books will contain all elements from single article format as well as:
    • Book title page
    • Table of contents with page numbers
      • Selecting a section from the table of contents will navigate the user to the corresponding section within the book
    • The references for each article in the book will appear at the end of that article
    • Each article will begin on a new page
    • A single section for text and image sources, contributors, and licenses, that contains the collected contributions from all articles
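
The book layout described above can be illustrated with a minimal sketch. It is not the Electron service's or the Collection extension's actual implementation. The `Article` class, the `plan_book` function, the one-page title page, and the one-page table of contents are all assumptions made for illustration. The sketch computes the page number each table-of-contents entry would point to when every article starts on a new page, and collects contributors from all articles into one attribution list.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical stand-in for one rendered article (not a real API)."""
    title: str
    page_count: int  # pages used by the article, its references included
    contributors: list = field(default_factory=list)


def plan_book(articles, toc_pages=1):
    """Return (toc, contributors) for a book.

    Assumed page order: one title page, the table of contents, the
    articles (each starting on a new page), then a single attribution
    section for the whole book.
    """
    next_page = 1 + toc_pages + 1  # skip title page and TOC pages
    toc = []
    contributors = []
    for article in articles:
        toc.append((article.title, next_page))
        next_page += article.page_count  # next article starts on a fresh page
        for name in article.contributors:
            if name not in contributors:  # list each contributor only once
                contributors.append(name)
    toc.append(("Text and image sources, contributors, and licenses", next_page))
    return toc, contributors


# Example with made-up articles and contributor names.
articles = [
    Article("Alpha", 3, ["Ann", "Bob"]),
    Article("Beta", 2, ["Bob", "Cy"]),
]
toc, contributors = plan_book(articles)
for entry, page in toc:
    print(f"{entry} ... {page}")
```

In a real renderer the page counts would only be known after layout, so the table of contents would likely need a second pass. The sketch only shows how the requirements fit together.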