Talk:Reading/Web/PDF Functionality


About this board

About giving feedback

Please read Reading/Web/PDF Functionality and comment on the plans we lay out there, to tell us what you need from the PDF service. We're especially interested in what you need in the future that doesn't exist in the plans laid out there – if there's a bug with something that should work right now (e.g. you get an error message when you try to create a PDF), we need to fix it, of course, but that would have been on the agenda.

Update: (23 April 2018) PediaPress will take over the development of the books-to-PDF functionality. See Reading/Web/PDF Functionality for more information.

Updates: (24 February 2018)

- Kerning and spacing issues: there have been a few reports of spacing issues within PDF rendering. The Readers Web team is currently looking into a solution. We will first be updating the fonts for PDFs over the week of November 27. This will resolve some but not all of the spacing issues. We'll be looking further into the remaining issues after the initial fix.

Krauss (talkcontribs)

Nowadays, with CSS3-break and stable implementations, we can use the full ecosystem of open standards, centered on HTML5 and CSS3, to produce high-quality PDF (also EPUB and others) with professional layout. It is important to consider this modern ecosystem of open standards when expressing the stylesheets and templates of the PDF Functionality.

Steelpillow (talkcontribs)

If it is HTML5 or ePub then it is not PDF, while PDF itself has been an open standard for a long time now. But yes, the original plan had been to provide a choice of output formats. As far as I know this is not forgotten, but it is best to take one step at a time and PDF remains the most widespread standard in use.

Krauss (talkcontribs)

Hi Steelpillow, sorry for my English... The focus is not the final format (PDF, of course, in a "PDF Functionality"), but the process: not a specific professional tool like LaTeX (niche-specific, non-standard and limited) or Adobe InDesign (patented system). The ideal is to state here that it can be any tool that understands the CSS3 and HTML5 specifications.

Steelpillow (talkcontribs)

Ah, I see what you mean now. I am not sure that the CSS3+HTML5 ecology is complete and stable enough yet. For example MathML has proved poor as an intermediate format for PDF rendering. There must also be a question over the maturity of tools and libraries for CSS3 break implementation.

Krauss (talkcontribs)

Hmm... Have you tested with a professional tool like Prince v12?

Please show me a Wikipedia article with an equation that we can't translate (to MathML or MathJax) and render with Prince... And I will show that it is used in only one Wikipedia article ;-)

To compare LaTeX with "the open ecosystem" we need to compare all the potential pros and cons, not only exotic equations... in the context of all featured articles in all languages.

Reply to "PDF Functionality use open standards?"
Johan (WMF) (talkcontribs)

There's a new update for PDF-to-books now. Most importantly, it shows a sample of how it currently looks, which is much like what the final version will be. Take a look, and comment if you have feedback.

Alangi Derick (talkcontribs)

Wow, I just had a quick look at the sample PDF and it looks clean. If this is what the final version will look like, then I think it's of very good quality. Thanks for the update @Johan (WMF).

Gpc62 (talkcontribs)

It looks like none of the articles in the sample book involve equations, either in-line or displayed.

Gpc62 (talkcontribs)

Using download-as-pdf on Maxwell's Equations, the result still has equations in a very heavy style that looks as if everything is boldface. Even characters like numbers, epsilon and mu appear to be bold in many places, but not everywhere. The characters that are supposed to be bold (e.g., capital B, D, E, H as vectors) are sort of doubly bold.

There are also major kerning problems in the text in general. On page 2, I see the word "by" rendered with the two characters completely overlapping.

Johan (WMF) (talkcontribs)

I'll make sure PediaPress are aware of the discussion here – but please be aware that the single-page PDF (currently working) and PDF-to-books (what we're working on here) won't be rendered using the same technology.

Gpc62 (talkcontribs)

OK. I checked the download-as-pdf version of Maxwell's eqns since it's not possible to check the PDF-to-books version.

Gpc62 (talkcontribs)
Steelpillow (talkcontribs)

I have replied in a separate thread, because it is so long and pedantic.

Johan (WMF) (talkcontribs)

On the topic of equations: it's still suboptimal because it uses MathML instead of LaTeX, but this will be fixed.

Simon Villeneuve (talkcontribs)

Is it possible to have an updated roadmap on the page?

Reply to "Sample"
(talkcontribs)

PDF export: Please include the widows and orphan rule.

TheDJ (talkcontribs)

As far as I know, that rule is included (as it is in normal print). However, the behaviour of widows and orphans only works WITHIN a single paragraph. As Wikipedia pages often have lots of paragraphs and headers on the page as well as many floating images and other elements, the behaviour you expect might not be possible to achieve for the renderer.

If you have specific examples that go wrong, you are welcome to upload and share them with us, for evaluation.
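
As a rough illustration of why the rule only acts within a single paragraph, here is a minimal sketch of how a renderer might choose a page break inside one paragraph. The function name and the default values of 2 are assumptions made for illustration; this is not the code of any actual PDF renderer.

```python
def lines_before_break(paragraph_lines, lines_left_on_page, orphans=2, widows=2):
    """Return how many lines of one paragraph stay on the current page.

    A result of 0 means the whole paragraph moves to the next page.
    Hypothetical helper: it only models the widows/orphans constraint
    and ignores headers, floats and images.
    """
    if paragraph_lines <= lines_left_on_page:
        return paragraph_lines  # the paragraph fits, no break needed
    # Leave at least `widows` lines for the top of the next page.
    keep = min(lines_left_on_page, paragraph_lines - widows)
    # Fewer than `orphans` lines at the bottom of the page is not allowed.
    return keep if keep >= orphans else 0


print(lines_before_break(5, 4))  # 3: break early so two lines start the next page
print(lines_before_break(5, 1))  # 0: one lonely line is not allowed, move everything
```

The sketch only ever sees one paragraph at a time, so it cannot stop a header from being stranded at the bottom of a page or an image from pushing text around.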

Dirk Hünniger (talkcontribs)

mediawiki2latex does respect the widows and orphan rule

(talkcontribs)

@Dirk Hünniger thank you for this, but it appears that the mediawiki2latex demo site does not work. It does not produce the PDF when the process is finished.

Dirk Hünniger (talkcontribs)

Hi, I do get a result for [...]. The problem is likely that the page you are trying to process is too large. The server cancels every process after one hour. This is done to allow others to use the server too. The solution is to install the current version of Ubuntu (possibly in a virtual machine) and run the mediawiki2latex command line application, available from the Ubuntu package repository via apt-get install mediawiki2latex. If you want to process a large page you might need a lot of RAM, possibly something like 32 GByte, but you will see for yourself. Yours, Dirk

This post was hidden by Johan (WMF) (history)
Reply to "Widows and orphans rule"
Steelpillow (talkcontribs)

The old book format allowed articles to be organised into Chapters. The current Book Creator tool provides an interactive GUI for doing this:

The current PediaPress print-on-demand book tool for Wikipedia books on their own web site also supports Chapters.

Yet we are told that chapters will not be implemented in the forthcoming Wikipedia PDF builder from PediaPress.

I believe that this is a grave mistake, as it will bork a great many existing books, users accustomed to downloading books in the old format will become very confused, and the Book Creator will need rewriting to remove the feature.

I do not believe that a tool which plays so badly with the legacy base should be rolled out live. I believe very strongly that this feature should be reinstated before the new PDF tool goes live on Wikipedia.

Johan (WMF) (talkcontribs)

(This is just to say that the feedback has been noted and is not being ignored.)

Reply to "Chapters"
(talkcontribs)

I don't have any problem... What I want is just the writing in an ordered way, and I hope the present PDF file format does this job.

I would just like to cheer Wikipedia for presenting collections of information on different subjects and fields :)

Please keep doing this, Wiki :)

Reply to ":)"

Table background color is omitted from PDFs

3 (talkcontribs)

Many technical articles on Wikipedia include background colors in tables to differentiate groups of cells in the tables. An example is the web page for "UTF-8". The table background color is not included in the PDF created for the UTF-8 page. This is a severe defect for technical articles.

Johan (WMF) (talkcontribs)

Yeah, I get this is a problem for some articles. It's a conscious decision to make it possible to print the PDFs (one of the main use cases) in black and white (which is a common way to print things).

Steelpillow (talkcontribs)

Printing a colour pdf in black and white will only be a problem if the background and text tones are close. That should never be done anyway, as the eye is much less sensitive to colour differences. Is pandering to awful content styling really a better idea than throwing away deliberate visual cues?
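
The point about tones can be checked numerically. The following is a minimal sketch, assuming the WCAG-style relative luminance formula is an acceptable stand-in for how light or dark a colour appears once printed in greyscale. The function names and the example colours are assumptions for illustration, not values taken from the UTF-8 article.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel to linear light (WCAG-style formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(rgb):
    """Relative luminance of an (r, g, b) colour, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast between two colours, from 1 (same tone) up to 21 (black on white)."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


# Hypothetical colours: black text on a pale yellow cell keeps a high ratio,
# while red text on a mid green cell ends up with nearly the same tone.
print(contrast_ratio((0, 0, 0), (255, 255, 204)))
print(contrast_ratio((255, 0, 0), (0, 128, 0)))
```

A table that only uses light background tints behind dark text stays readable by this measure, while combinations whose ratio falls close to 1 would be hard to read on a monochrome printer.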

Reply to "Table background color is omitted from PDFs"
Salino01 (talkcontribs)

It is very nice to get new feedback about the book function. The example already looks quite good. Here are some comments about Amphibious aircrafts.pdf from 08/17/2018:

1. the image size is too small for printing as a book (probably about 15x20cm book size). In the past, images took up the entire page width or half the page height and are therefore better suited for illustration.

2. the font size also seemed too small to me.

3. the link to Wikimedia Commons is too prominent.

4. the links are missing in the references. Without them, many references are useless. This applies in particular to the section "External links".

5. the articles are grouped in the preview. This results in "Article 5 of 4" in the footer on page 36.

Johan (WMF) (talkcontribs)


Reply to "Book printing function of August 8th"
(talkcontribs)

Based on a previous discussion, Johan noted:

> I must admit I'm not quite certain I even understand the main point of this message, but the sad truth is that not knowing when something will be fully solved one year later is not very uncommon in software development, because you never know exactly how to solve something beforehand.

That might be true. But it is better to admit failure than to leave a message like that up indefinitely. Three or six months might be reasonable, but years is not. Aside from the fact that it is a bad software development approach to create a link saying "create a book" only to tell the user that they can't create it anyway, in this case there are two such links.

That's like creating a sign leading people to a "working toilet" only to have another sign on its door indicating that the toilet is no longer functional. In such a case a person might have a desperate need to relieve themselves, but might find another place to do so. In this case it just makes the site look unprofessional by needlessly wasting people's time.

My suggestion is to simply remove the link from the sidebar, disable the special page, and restore it if the book feature ever returns.

Gpc62 (talkcontribs)

One benefit of retaining the link and message: People with an interest in making these books learn that this process is underway and can provide input. Whether that benefit still outweighs the negatives you point out is another question, of course.

Johan (WMF) (talkcontribs)

The books-to-PDF function is also used by a group of users who have occasionally used it for years. Removing the link would confuse them, and have them spend a large amount of time trying to find it or figure out what's happened, which could waste far more of their time.

Johan (WMF) (talkcontribs)

(This is unprofessional, by the way, in the sense that it's largely done in people's spare time, unpaid. The Wikimedia wikis are, for the most part, a volunteer effort. Including some of the software development.)

Reply to "Remove create a book link from sidebar"
(talkcontribs)

The term "infortunadamente" does not exist in Spanish. A suitable form would be "desafortunadamente".

Johan (WMF) (talkcontribs)
Reply to "About the Spanish translation"

All PDFs are generated very slowly or not at all

1 (talkcontribs)

All PDFs are generated very slowly or not at all (02-03.09.18)

Reply to "All PDFs are generated very slowly or not at all"