Talk:Reading/Web/PDF Functionality

About this board

About giving feedback

Please read Reading/Web/PDF Functionality and comment on the plans we lay out there, to tell us what you need from the PDF service. We're especially interested in what you will need in the future that isn't covered by those plans. If there's a bug with something that should work right now (e.g. you get an error message when you try to create a PDF), we need to fix it, of course, but that would already be on the agenda.

Updates: (24 February 2018)

- Kerning and spacing issues: there have been a few reports of spacing issues in PDF rendering. The Readers Web team is currently looking into a solution. We will first update the fonts for PDFs over the week of November 27. This will resolve some but not all of the spacing issues. We'll look further into the remaining issues after the initial fix.

- Update on the book creator: we're still in the process of performance testing the new renderer. Once this stage is complete, we will be able to provide more details on its capacity to render books.

Debenben (talkcontribs)
Quiddity (WMF) (talkcontribs)

There is an occasional caching bug that the devs are looking at (phab:T190429). I posted a reply to that thread, which fixed this instance.

Re: the incorrect display of the diff, I've filed phab:T190466. Thanks for the bug-report. :)

Reply to "What happened to my post???"

out of the frying pan and into the fire

BerlinSight (talkcontribs)

While I appreciate that WP got rid of the two-column layout and the lack of tables in PDF output, I miss the quality of the LaTeX-generated PDFs. The typographical problems have already been discussed, but there is another issue. Even vector images (like the one used on the page "Spiral model"; inserting the link doesn't work here: "page does not exist") are included in a rasterized version. The resolution is adequate only for screen display and generally too low to, e.g., read contained text in a printout. Such images are therefore often useless.

TheDJ (talkcontribs)

This is a known problem, tracked as T178664

BerlinSight (talkcontribs)

OK, I see. Yet I think WP is wasting time reinventing the wheel. TeX/LaTeX is a professional-quality typesetting system that is Free / Open Source Software and embodies decades of work. My prediction is that Electron will never reach the point where TeX already is WRT typographic quality. IMHO, leaving the user with two options for generating PDFs (one with high-quality typesetting and one with tables) would be a better choice than the current status.

TheDJ (talkcontribs)

I think the problem here is that the systems are fundamentally different. HTML is made for flexible and dynamic layout, adapting to any situation where it is asked to render, and (these days) for a lot of interactivity.

LaTeX is fundamentally designed for very reproducible and specific layout in controlled circumstances, mostly for non-interactive situations. You can't make websites with LaTeX (it's hard to put jello into a straitjacket), and therefore you cannot print them with it either. And HTML cannot do what LaTeX can do.

BUT HTML is catching up. There are specs for adding print-specific context (page size, page-break info, etc.) to HTML, for instance, but they are not yet supported. It's also a technology that is closer to what we are used to within our own ecosystem, making it easier to support for the engineers who have to do the incidental work, and we have to duplicate less work across both stacks, since most of the time the easy stuff will just work.

Neither is perfect, neither will be perfect, but one is sustainable for us, and the other is not.

BerlinSight (talkcontribs)

Sorry, but it looks like you are missing the point completely. I did not ask to rewrite WP in LaTeX. The former PDF engine used LaTeX as a backend to create high-quality PDFs, alas without tables. As the new engine has tables but awful typographic and image quality, and quite likely will never match the output quality of the old PDF engine, I would prefer to have the choice of which one to use (or, better, the old one with tables and single-column layout, but that does not seem possible).

TheDJ (talkcontribs)

I was just talking about the technology stack:

  • normal: wikicode -> html
  • old engine: wikicode -> LaTeX -> PDF
  • new engine: wikicode -> html -> PDF
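
As a rough illustration of the last step in the new-engine line above (html -> PDF), here is a minimal Python sketch that builds a headless-browser command line. It is not the code of the actual PDF service: the function name `build_print_command` and the binary name `chromium` are assumptions made for this example, and the example URL is just the "Spiral model" article mentioned earlier in this thread. Only the `--headless` and `--print-to-pdf` switches are taken from Chrome/Chromium itself.

```python
import shlex


def build_print_command(url, output_pdf, browser="chromium"):
    """Return the argv list for a headless-browser HTML -> PDF conversion.

    `browser` is an assumed binary name; it differs between systems
    (for example "chromium", "chromium-browser" or "google-chrome").
    """
    return [
        browser,
        "--headless",                    # run without a visible window
        f"--print-to-pdf={output_pdf}",  # write the rendered page to this file
        url,
    ]


cmd = build_print_command(
    "https://en.wikipedia.org/wiki/Spiral_model", "spiral_model.pdf"
)
# Show the command; to actually run it, pass `cmd` to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

The point of the sketch is only that the browser does the layout work, so no separate LaTeX translation step is involved.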

We removed one very expensive translation step from the system, one that had no maintainers and no available experts able to keep it online.

THAT is the only thing that matters. It's a resourcing decision. If you want to quit your existing job and improve the old system for free, then that's fine.

Dirk Hünniger (talkcontribs)
Debenben (talkcontribs)

@Dirk Hünniger great work!

I am also disappointed by the typographic quality of the Chromium rendering engine. Mathematical formulas in particular look horrible. I did not know about mediawiki2latex. Why don't we mention it as an alternative and let users decide what they prefer?

Debenben (talkcontribs)

I tested the claim that it can handle tables on the article schwarzschild-metric which was mentioned somewhere below:

mediawiki2latex -m -g -u -o "schwarzschild.pdf"

Result: all tables are rendered perfectly and mathematical formulas look perfect. The only drawback: some URLs don't get any line breaks, so they sometimes extend beyond the page margins.

Quiddity (WMF) (talkcontribs)

Posting to bump cache, and hopefully fix missing comments.

Reply to "out of the frying pan and into the fire"

(talkcontribs)

When producing the file title, please use an initial capital and no underscores. Thank you.

(talkcontribs)

Thank you for this file.

Reply to "File Title"

Workaround for Adding Pages to Book Creator Project

Markhalsey (talkcontribs)

If you have been experiencing problems when trying to add pages to a book, as I have, I believe I have found a consistent workaround that makes this process work correctly on every attempt. Although this advice may be redundant, I thought I would share my experience, so that the Wikipedia community can continue to use the Create Book function while the project is being fine-tuned.

I noticed that I was only able to add certain pages to my book, so after much trial and error I found a workaround that allows a person to add any page to their book. First, when you are on a page that you would like to add to your book, click the “Create Book” link at the left of the page in the desktop version of Wikipedia. It will then ask whether you would like to continue with your current book, with “x” pages, at which point you select the “Ok” radio button. You will be brought to the “Book Creator” page, which describes the Create Book function and contains a list of all of the saved pages for your book.

Once you are on this page, enter the topic of the previous page, the one you wanted to add to your book, into the search box at the top right, again in the desktop version. You will see a list of topics; if you have typed it correctly, the page you are looking for should be the top result. Once you click on that topic, you will be brought back to the page you were attempting to add to your book. This time, however, the “Add This Page to My Book” link will appear at the top middle of the page. Once you see the link, you can add that page, and others going forward.

I have found that this somehow restores the Book Creator to its original form. Should you have trouble getting this function to keep working, you will need to repeat these steps, using the desktop version if you are on mobile, until the process resets itself. This has worked flawlessly for me and should work for you as well, until this excellent feature is up and working again full time.

Please don’t hesitate to contact me if you would like further explanation or assistance with this matter, as I am more than willing to help out where I can. I would also like to take this opportunity to give thanks and praise to the members of the group responsible for the Book and PDF Creation Project, as I believe this capability is by far one of the most exciting achievements of the Wikimedia Foundation. It has not only changed the landscape for learning but has also reached new heights for Internet-based encyclopedias. I would be extremely interested in being part of the team assisting with this project; please let me know how I can help.


Mark Halsey

(talkcontribs)

It would be very helpful if the "Printable version", "Download as PDF", and simply printing to a color printer would all support color. Very many Wikipedia pages use colors in very important ways, especially to highlight different meanings in tables. I could be wrong, but it seems to me that pages used to print in color and no longer do. A related, or maybe even more useful, feature would be a link to export to Excel (including color formatting). I think I've copied and pasted color tables via HTML into Excel, then printed them, so that's a reasonable workaround. Thanks for everybody's very helpful efforts!

TheDJ (talkcontribs)

Unfortunately, this is a limitation of browser-based printing (and Download as PDF uses a browser on the server side to generate the PDF). The browser vendors do not give us options to fix this. Also, color should never be a critical information element, as that would mean that the information is inaccessible to those with color blindness or those using screen reader software.

Korriskoso-vnt (talkcontribs)

Using a browser on the server lets us solve PDF problems. That is interesting.

Reply to "Workaround for Adding Pages to Book Creator Project"
Steelpillow (talkcontribs)

This recent edit to the page suggests that there will be an option to export Wiki markdown instead of PDF. Is this correct? Steelpillow (talk) 17:16, 26 February 2018 (UTC)

Bert Niehaus (talkcontribs)
Steelpillow (talkcontribs)

So this appears to be about an alternative way for a client to import and convert raw wikitext from individual articles, one that is wholly unrelated to the PDF export service and, as far as I can tell, to markdown as well. Steelpillow (talk) 09:43, 28 February 2018 (UTC)

Bert Niehaus (talkcontribs)

It just shows an option to create the PDF on the client side, given the problems with PDF generation on the server side. Of course, this workaround also enables export to even more formats. If that is not appropriate as a recommendation in this discussion, excuse me for being off track.

Korriskoso-vnt (talkcontribs)

Thanks! Understood.

Reply to "Markdown?"

Wikipedia makes it possible to educate the world

Korriskoso-vnt (talkcontribs)

Wikipedia makes it possible to educate the world, from early childhood to the highest intellectual level.

Reply to "Wikipedia makes it possible to educate the world"

(talkcontribs)

The message "We're currently preparing performance tests of the PDF to book function. We should know more in early February. " is still there. Can you give us an estimated date for when the PDF functionality will start working again?

Johan (WMF) (talkcontribs)

We'll update as soon as we know more!

2600:8800:7B05:F900:1884:58A7:3A22:8A35 (talkcontribs)

I am looking forward to the updated functionality. I'll just be patient another few weeks. Thank you!

Reply to "Any news about PDF Functionality?"

No PDF Available

Videlmo Núñez Tarrillo (talkcontribs)
(talkcontribs)

I had no trouble with the PDF download. It was complete and of good quality.

(talkcontribs)

Hello, 10 March 2018, 10:10

VIDIANI, Fontaine-lès-Dijon

I also tried, because the wiki page as it stands refuses to PRINT.

Steelpillow (talkcontribs)

When posting to this topic, please specify whether you mean no PDF for a single article or no PDF for a whole book. You should be able to download an individual article. You cannot download a whole book at the moment because the software is disabled. This is expected. Please only post here if your experience differs. Steelpillow (talk) 11:59, 11 March 2018 (UTC)

Reply to "No PDF Available"

... the solution could be based on Add-on/in for .pdf

Reply to "... the solution could be based on Add-on/in for .pdf"
Steelpillow (talkcontribs)

The old OCG could output in a variety of formats. Is it correct to assume that headless Chrome must first have all the wikitext pages, copyright small print, etc. pre-processed into HTML+CSS before rendering? If so, is it possible to intercept the intermediate HTML/CSS format and offer an HTML download option? That might be, say, CHTM or ePub or just a raw zip. Or, if the book is assembled in the DOM or whatever, could that be persuaded to spit out the HTML? This would then allow client-side conversion to other formats, which is next to impossible from PDF.
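
To make the "just a raw zip" idea concrete, here is a minimal Python sketch that packs already-rendered HTML pages into a zip archive. It does not tap into the PDF service's intermediate HTML. As an assumption for this example, the HTML would instead come from the public Wikimedia REST endpoint `page/html/{title}`. The names `html_url` and `bundle_html` are hypothetical helpers, and the download step itself is left out so that the sketch runs offline.

```python
import io
import zipfile
from urllib.parse import quote

# Assumed source of rendered HTML for this sketch (Wikimedia REST API).
REST_HTML = "https://en.wikipedia.org/api/rest_v1/page/html/{title}"


def html_url(title):
    """Build the URL from which the rendered HTML of one article could be fetched."""
    return REST_HTML.format(title=quote(title.replace(" ", "_"), safe=""))


def bundle_html(pages):
    """Pack a {title: html} mapping into an in-memory zip, one .html file per page."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for title, html in pages.items():
            # Keep file names flat: no spaces, and no "/" that would create folders.
            safe_name = title.replace(" ", "_").replace("/", "_")
            zf.writestr(safe_name + ".html", html)
    return buf.getvalue()


archive = bundle_html({"Spiral model": "<p>placeholder body</p>"})
print(html_url("Spiral model"), len(archive), "bytes")
```

A client could then unpack the archive and convert the HTML to other formats locally, which is the use case described above.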

Bert Niehaus (talkcontribs)

Yes, any format that can be parsed and post-processed would be a great help for generating derived products such as w:Open Educational Resources. Spencer Kelly is currently doing a great job of developing wtf_wikipedia.js further. It allows the generation of JSON for a MediaWiki article (see the demo) and conversion to plain text, and other formats may follow. The conversion can be done on the client side, even in a browser, just by contacting the MediaWiki API, downloading the wiki source of the article, and parsing it into whatever content product is needed. Nevertheless, the PDF generation is great and very much appreciated.
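
The client-side route described above (contact the MediaWiki API, download the wiki source, parse it) can be sketched in Python as follows. This is not wtf_wikipedia.js and not a real wikitext parser: `wikitext_api_url` and `wikitext_to_plain` are hypothetical helpers, the English Wikipedia endpoint is an assumption, and the regex-based stripping only handles simple links, bold/italic markers and headings. The network request itself is omitted so that the sketch runs offline.

```python
import re
from urllib.parse import urlencode


def wikitext_api_url(title, api="https://en.wikipedia.org/w/api.php"):
    """Build an Action API URL that returns the wiki source of one article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return api + "?" + urlencode(params)


def wikitext_to_plain(wikitext):
    """Very rough wikitext -> plain text conversion (a stand-in for a real parser)."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", wikitext)
    # Drop the ''italic'' and '''bold''' quote markers
    text = re.sub(r"'{2,}", "", text)
    # == Heading == -> Heading
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.M)
    return text


sample = "'''Spiral model''' is a [[software development process|process]] model."
print(wikitext_api_url("Spiral model"))
print(wikitext_to_plain(sample))
```

Templates, tables and references are deliberately left untouched here; handling them is exactly the work a dedicated parser does.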

Reply to "HTML output"