Talk:Reading/Web/PDF Functionality/2018/08
This page used the Structured Discussions extension to give structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.
About giving feedback
Please read Reading/Web/PDF Functionality and comment on the plans we lay out there, to tell us what you need from the PDF service. We're especially interested in what you need in the future that doesn't exist in the plans laid out there – if there's a bug with something that should work right now (e.g. you get an error message when you try to create a PDF), we need to fix it, of course, but that would have been on the agenda.
Update: (23 April 2018) PediaPress will take over the development of the books-to-PDF functionality. See Reading/Web/PDF Functionality for more information.
Updates: (24 February 2018)
- Kerning and spacing issues (https://phabricator.wikimedia.org/T178665): there have been a few reports of spacing issues within PDF rendering. The readers web team is currently looking into a solution. We will first be updating the fonts for PDFs (https://phabricator.wikimedia.org/T181200) over the week of November 27. This will resolve some but not all of the spacing issues. We'll be looking further into the remaining issues after the initial fix.
About the Spanish translation
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
The term "infortunadamente" does not exist in Spanish. A suitable form would be "desafortunadamente". 83.36.104.125 (talk) 08:20, 1 August 2018 (UTC)
Portrait v Landscape OR Shrink to fit
I tried the Periodic Table article, which has several wide-format tables. These tables are cut off. Thank you for the PDF version of other items. Perhaps the PDF parameters could be set based on the page?
Thanks 134.56.49.180 (talk) 01:05, 2 August 2018 (UTC)
- Thanks for reporting. Johan (WMF) (talk) 13:28, 2 August 2018 (UTC)
Proton
Apologies for my ignorance. Is the Proton Chromium rendering still a live option? There is a lot of ongoing activity at https://phabricator.wikimedia.org/project/profile/2960/ which, given the announced handover to PediaPress, confuses me. Steelpillow (talk) 20:49, 3 August 2018 (UTC)
- PediaPress will only handle the books-to-PDF function (an update on that soon), the reason being that the planned solution did not work for books for performance reasons. The WMF are still working on the single page PDF rendering. Johan (WMF) (talk) 23:30, 3 August 2018 (UTC)
bad Kerning, Spacing and Overlaps
In the PDF download of de:Pipeline-Hazard as well as en:Hazard_(computer_architecture), there are multiple instances where letters run into each other or have (very) bad kerning. This seems to be especially the case with 'W'.
Interestingly this is only the case at few locations, so it is probably not a general issue with the font. Nerilex (talk) 08:44, 8 August 2018 (UTC)
- Thank you for reporting. Johan (WMF) (talk) 15:43, 8 August 2018 (UTC)
- In the de:Motoröl PDF download, both serif and sans-serif fonts suffer from glyphs running into each other. 80.171.202.143 (talk) 10:19, 20 August 2018 (UTC)
- Noted. Johan (WMF) (talk) 11:17, 20 August 2018 (UTC)
Sample
There's a new update for PDF-to-books now. Most importantly, it shows a sample of how it currently looks, which is much like what the final version will be. Take a look, and comment if you have feedback. Johan (WMF) (talk) 14:58, 8 August 2018 (UTC)
- Wow, I just had a quick look at the sample PDF and it looks clean. If this is what the final version will look like, then I think it's of very good quality. Thanks for the update @Johan (WMF). X-Savitar (talk) 15:40, 8 August 2018 (UTC)
- It looks like none of the articles in the sample book involve equations, either in-line or displayed. Gpc62 (talk) 16:58, 8 August 2018 (UTC)
- Using download-as-pdf of Maxwell's Equations the result still has equations in a very heavy style that looks as if everything is boldface. Even characters like numbers, epsilon and mu appear to be bold in many places, but not everywhere. The characters that are supposed to be bold (eg, capital B, D, E, H as vectors) are sort of doubly bold.
- There are also major kerning problems in the text in general. On page 2, I see the word "by" rendered with the two characters completely overlapping. Gpc62 (talk) 17:31, 8 August 2018 (UTC)
- I'll make sure PediaPress are aware of the discussion here – but please be aware that the single-page PDF (currently working) and PDF-to-books (what we're working on here) won't be rendered using the same technology. Johan (WMF) (talk) 17:55, 8 August 2018 (UTC)
- OK. I checked the download-as-pdf version of Maxwell's eqns since it's not possible to check the PDF-to-books version. Gpc62 (talk) 18:04, 8 August 2018 (UTC)
- If PediaPress would like a shorter page than Maxwell's Equations to include in their test book, Surface integral and Divergence theorem have the same issues. Gpc62 (talk) 18:41, 8 August 2018 (UTC)
- I have replied in a separate thread, because it is so long and pedantic. Steelpillow (talk) 14:49, 9 August 2018 (UTC)
- On the topic of equations: it's still suboptimal because it uses MathML instead of LaTeX, but this will be fixed. Johan (WMF) (talk) 15:07, 9 August 2018 (UTC)
- Is it possible to have an updated roadmap on the page? Simon Villeneuve (talk) 18:57, 18 September 2018 (UTC)
Ecology
Please provide a PDF version of Ecology. Naushad960 (talk) 12:23, 9 August 2018 (UTC)
- You do that yourself: go to the Ecology article and, in the right hand menu, select Print/export > Download as PDF. Steelpillow (talk) 14:00, 9 August 2018 (UTC)
Feedback on sample of August 8 2018
First, thank you all and well done! This is already better than nothing.
The rest of this note highlights problems and makes some suggestions. Some of these are obvious, some fixes will not be practicable, but never mind. I have checked every article against its online source and recorded everything I noticed, in order. I have also checked against an old-style PDF. I can post a copy of it if people need to see what some of my comments mean.
1. Chapter headings (to group related articles together) are not used. If this is not implemented, then a lot of current wikibooks will not render intelligibly.
2. The table of contents does not list sections within articles. It would be good if =H1= and ==H2== levels could be listed, though not any lower. Note that H1 levels should not normally be used in article content, but they are not impossible and need to be dealt with. I would suggest that they be dropped to H2, to keep H1 for main titles and avoid corrupting the table of contents.
3. The body text font size is too small, either to read comfortably or to keep your place in a full-width A4 page. Conversely the line spacing is too large for a fixed-size page, it looks as if it were optimised for web browsing. But don't go too small, or the reader will lose the line they are on again.
4. Paragraph spacing is too similar to the line spacing. This is partly due to point 3. but needs reviewing once that is addressed.
5. New sections should not be thrown clear of right-aligned images or boxouts. This wastes a lot of space and looks bad.
6. Sections with a standard heading of "External links" appear at the end of many articles (e.g. Pages 4, 8, 17, etc). Since the links are stripped off in the PDF, the remaining text is often worthless. The whole section is best removed.
7. Wikimedia Commons boxouts using {{Commons category| ... }} (e.g. Pages 9, 17) should not be included.
8. The article on Elysian Valley, Los Angeles has thrown a whole page to make room for the long boxout on Page 15. This not only looks awful but would have probably borked or crashed if the boxout had been longer than a whole page. May I suggest that the default behaviour should be either to leave out the whole boxout or to split it across pages?
9. The Wikipedia online article on Elysian Valley, Los Angeles has a scrolling image. This has been dropped from the PDF (page 16, Section on Commercial Corridor) but its caption "Storefronts on Riverside" has been kept. Either the image should be re-scaled to fit or the caption should also be dropped.
10. The article on Voxtrot (album) shows a similar issue (pages 18-19), which is perhaps a mix of points 5. and 8. I am not sure if fixes to those would also fix this one.
11. A chapter-level heading for the Appendix would be appropriate.
12. None of the following are present in any of the pages sampled: in-text tables, mathematical equations, or non-Roman alphabets (I have not checked for animated GIFs). I do not know if this was deliberate, but even if it was, a more representative follow-on sample might be useful, if only to check for glitch-free handling. For example, a typical problem is the presence of an obscure Unicode character for which the PDF rendering server has no font installed. Such issues should be dealt with gracefully, without the process or the remaining presentation failing. Steelpillow (talk) 14:15, 9 August 2018 (UTC)
- Thank you! Johan (WMF) (talk) 15:08, 9 August 2018 (UTC)
- Thanks for your feedback, @Steelpillow. The sample presented consisted only of random articles and reflects the status of rendering at Wikimania (July 20). Given the heterogeneity of Wikipedia's markup, only an iterative approach is possible. My expectation is that the results will always be suboptimal, as many elements (animations etc.) can never be adequately transferred to print. Having said that, let me briefly comment on your points:
- Chapter Headings will come soon.
- Article Sections in TOC are not going to happen. Given the size and structure differences of articles they will just bloat the TOC.
- Body font size: The problem is rather that the content is too wide than that the font is too small. After doing some experiments with different sizes, my current solution is to use only 2/3 of the page for body copy. Eventually, we could think about a more accessible version with large print, but increasing the body font size will significantly increase the page count.
- Paragraph spacing will be addressed shortly.
- Clearing floated elements before headlines creates (wastes) more space. Floats are currently only allowed next to h3 - h6. I can test allowing floats only next to h4-h6.
- External Links are preserved in the PDF so the references still work on the screen. I was reluctant to remove those sections altogether as some users might find those valuable.
- Commons boxouts can be easily removed.
- Infobox elements weren't part of the original PDF output. As they often contain valuable information, I tried to include them. The odd page breaks in the "Elysian Valley" article are a bug that has since been fixed. Nevertheless, similar odd breaks can occur in other places as well.
- The leftover caption is a bug.
- Voxtrot also contains a few bugs.
- As I said, chapter level headings are work-in-progress.
- Math formulas will come as well. Currently, I am working with a MathML conversion which leads to unsatisfactory results in a number of cases. Going forward, I am planning to use LaTeX rendering for formulas.
- Animated Gifs will only show the first frame. Please let me know if there is a better way to handle this.
- Unicode support is a goal. The current version attempts to match Wikipedia's visual identity. Non-western characters haven't been tested yet. Suggestions for test articles with mixed-language characters are highly appreciated. Ckepper (talk) 15:55, 9 August 2018 (UTC)
- Thank you for the quick reply. Most of it is excellent news, but I have a couple of concerns:
- The body font size was bigger in the old renderer and it makes a big difference for me. I know it does for others too; I have 20 years' experience as a technical author backing that. Please do reconsider.
- I think my comment on clearing floats came across the wrong way. I am advocating that you allow floats next to h2 as well.
- Also, yes I agree that the first frame of an animated GIF is the right thing to do.
- I will try to find some good test articles. Would you like one or two for in-flow tables as well? Steelpillow (talk) 16:12, 9 August 2018 (UTC)
- Here are some suggested test articles:
- Chinese script styles: short samples of several variants.
- Iotation: A short but busy test for sophisticated phonetic symbols and oddities within the Cyrillic script.
- Bengali alphabet: Bengali script, IPA phonetic symbols (with warning boxes for both, implemented in different ways, making it hard to remove them). Also nested lists and some unusual in-text table layouts, including one of impossibly wide fixed width. A long-ish article but an excellent challenge to any rendering engine. Steelpillow (talk) 19:35, 9 August 2018 (UTC)
- Thank you, Steelpillow for these detailed comments, and Ckepper for a quick and responsive reply.
- A couple of comments...
- 3. The font size for the main body text -- 8 pt -- is definitely too small. It ought to be in the range 11 to 12 pt., maybe 10 pt at a minimum.
- I disagree with this: "The problem is rather that the content is too wide than that the font is too small. [...] my current solution is to use only 2/3 of the page for body copy"
- In this context, that argument would equally justify using, say, 4 pt text with a column 1/3 of the page wide.
- For a print publication with 2- or 3-column format, you could pick a font size smaller than 10 pt (e.g., at print pubs where I worked for decades, one used 9 pt serif and the other currently has 8.75 pt serif/8.25 pt sans serif).
- But for pdfs that are not tied to a print publication format -- and which will routinely be viewed onscreen, including on small tablet or phone screens -- it's better not to go so small.
- A larger font size will also help ameliorate the problem of whitespace next to tall blocks of graphics.
- 6. I agree, keep the external links section. I suggest printing the url on the line below the link text, in addition to having the url actively linked. This makes the information usable for people who print the page onto paper. Gpc62 (talk) 19:49, 9 August 2018 (UTC)
- Thanks for your feedback. Here are links to the test articles @Steelpillow mentioned: https://www.dropbox.com/sh/ybypkj9du9cesbe/AAA5IJIeM7ts3vdfVHhAUrbba?dl=0
- They still have quite a few issues but international characters don't seem to be a great problem.
- I hear you on the font size. It's a balancing act between aesthetics, page count, and usability. Adjusting the font-size is a larger effort however as several elements have to be changed accordingly (and I might have to rethink the overall layout). Ckepper (talk) 20:52, 9 August 2018 (UTC)
- One new issue in particular:
- 13. In Bengali alphabet, the lead infobox text content runs off to the right. These infoboxes are common, so this needs to be fixed.
- Several other issues with this page are caused by bad markup and are problems in the original too. I am impressed that your renderer got past them all so gracefully, congratulations.
- On font size, these single-article PDFs seem to be rendered differently: the fonts are not the same. In the original sample, if the line spacing were reduced a little, this would enable a larger font without eating up so much extra space. Steelpillow (talk) 07:59, 10 August 2018 (UTC)
- @Ckepper --
- I think worrying about the difference in page count caused by 8 pt versus 10 pt main text on these pdfs is a false economy.
- On aesthetics: at the size I routinely read pdfs on screen, the sample's article titles and heds look unusually large, and the main text looks unusually small to me. The same is true on a printed page.
- Usability: to read it comfortably on screen, I have to zoom to a larger magnification than I usually do, and then the lines are too wide, as you said. It is just about unreadable down at "100%" size. In print, it would be readable as a 2-column format. The full columns are too wide.
- Larger margins might make it readable in print, but I expect the aesthetics of that will look pretty weird to me.
- What elements other than font sizes and line spacing would need to be changed if you increased the body text to 10 pt? Most elements currently at 7 pt through 10 or 12 pt could be bumped up 1 or 2 pt. Elements currently at 16 pt and up don't need to change. None of this requires changing the overall layout -- just tweaks to spacing. Gpc62 (talk) 19:24, 12 August 2018 (UTC)
- One other note: On p. 7 at present, looking at just the heds "Animals" and "Tourism", it's hard to tell which is the higher-level hed based on their appearance. Gpc62 (talk) 19:25, 12 August 2018 (UTC)
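Point 2 of the feedback in this thread suggests dropping any =H1= headings found in article content to ==H2==, so that H1 stays reserved for main titles and the table of contents is not corrupted. A minimal Python sketch of that idea follows. It assumes the renderer can see the raw wikitext; the function name `demote_h1_headings` is hypothetical and is not taken from PediaPress or MediaWiki code.

```python
import re

# A level-1 wikitext heading is a line with exactly one "=" on each side.
# The character right after the opening "=" and right before the closing "="
# must not be "=", so ==H2== and deeper headings do not match.
_H1_LINE = re.compile(r"^=([^=\n](?:.*?[^=\n])?)=[ \t]*$", re.MULTILINE)


def demote_h1_headings(wikitext: str) -> str:
    """Rewrite =Heading= lines as ==Heading==, leaving other lines untouched."""
    return _H1_LINE.sub(r"==\1==", wikitext)


if __name__ == "__main__":
    sample = "=History=\n==Origins==\nSome text with a = b in it.\n"
    print(demote_h1_headings(sample))
```

Only lines that consist entirely of a level-1 heading are rewritten; body text that merely contains an equals sign is left alone.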
Table background color is omitted from PDFs
Many technical articles on Wikipedia include background colors in tables to differentiate groups of cells in the tables. An example is the web page for "UTF-8". The table background color is not included in the PDF created for the UTF-8 page. This is a severe defect for technical articles. 24.214.27.117 (talk) 14:19, 11 August 2018 (UTC)
- Yeah, I get this is a problem for some articles. It's a conscious decision to make it possible to print the PDFs (one of the main use cases) in black and white (which is a common way to print things). Johan (WMF) (talk) 14:25, 22 August 2018 (UTC)
- Printing a colour pdf in black and white will only be a problem if the background and text tones are close. That should never be done anyway, as the eye is much less sensitive to colour differences. Is pandering to awful content styling really a better idea than throwing away deliberate visual cues? Steelpillow (talk) 01:33, 11 September 2018 (UTC)
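The point in this thread that black-and-white printing only suffers when background and text tones are close can be made concrete. The sketch below uses the WCAG relative-luminance and contrast-ratio formulas as one possible measure of tonal closeness. Nobody in the discussion proposed this; the helper name `survives_grayscale` and the 4.5 threshold (borrowed from WCAG AA for normal text) are assumptions made for illustration.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB colour: 0.0 is black, 1.0 is white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio, from 1.0 (identical tones) up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def survives_grayscale(fg, bg, minimum: float = 4.5) -> bool:
    """Hypothetical check: is text still legible once only tone is left?"""
    return contrast_ratio(fg, bg) >= minimum


if __name__ == "__main__":
    black, pale_yellow = (0, 0, 0), (255, 255, 204)
    print(survives_grayscale(black, pale_yellow))
```

Under this measure, dark text on a pale table-cell tint keeps a high ratio, so a shaded cell would remain readable in grayscale; only pairs with similar tones would fail the check.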
Remove create a book link from sidebar
Based on a previous discussion Johan noted:
> > must admit I'm not quite certain I even understand the main point of this message, but the sad truth is that not knowing when something will be fully solved one year later is not very uncommon in software development, because you never know exactly how to solve something in beforehand.
That might be true. But it is better to admit failure than to leave a message like that indefinitely. Three or six months might be reasonable, but years is not. Aside from the fact that it is a bad software development approach to create a link saying "create a book" only to tell the user that they can't create it anyway, in this case there are two such links.
That's like creating a sign leading people to a "working toilet" only to have another sign on its door indicating that the toilet is no longer functional. In such a case a person might have a desperate need to relieve themselves, but might find another place to do so. In this case it just makes the site look unprofessional by needlessly wasting people's time.
My suggestion is to simply remove the link from the sidebar, disable the special page, and restore it if the book feature ever returns. 197.235.97.232 (talk) 12:03, 12 August 2018 (UTC)
- One benefit of retaining the link and message: People with an interest in making these books learn that this process is underway and can provide input. Whether that benefit still outweighs the negatives you point out is another question, of course. Gpc62 (talk) 19:15, 12 August 2018 (UTC)
- The books-to-PDF function is also used by a group of users who have occasionally used it for years. Removing the link would confuse them, and have them spend a large amount of time trying to find it or figure out what's happened, which could waste far more of their time. Johan (WMF) (talk) 22:35, 12 August 2018 (UTC)
- (This is unprofessional, by the way, in the sense that it's largely done in people's spare time, unpaid. The Wikimedia wikis are, for most part, a volunteer effort. Including some of the software development.) Johan (WMF) (talk) 22:38, 12 August 2018 (UTC)
Remove the words "Problèmes" and "Malheureusement" from the French version
Hello,
Thank you for all these great articles.
I wanted to point out that it is tiresome to read this sentence in French
every time one downloads an article as a PDF:
(Nous avons quelques problèmes techniques avec la fonction que nous utilisons pour générer les PDF. Malheureusement nous devons la remplacer.) [We are having some technical problems with the function we use to generate PDFs. Unfortunately, we have to replace it.]
Reading these two words each time, "Problèmes" ("problems") and "Malheureusement" ("unfortunately"),
on the Wikipedia site repeatedly produces a heavy, sad, negative
and pointless feeling, given that the PDF function still works very well.
Why not simply write a sentence without negative terms?
Thank you for your attention, and all the best to everyone.
Marc G 81.244.90.237 (talk) 19:53, 12 August 2018 (UTC)
- Thank you, and thank you for your comment. Johan (WMF) (talk) 22:26, 12 August 2018 (UTC)
[[de:Vorlage:Sitzverteilung|]]
I wanted to download [[de:Bundestagswahl 2017|]], but the seats-template wasn't there. ~ Habitator terrae (talk) 15:43, 17 August 2018 (UTC)
Book printing function of August 8th
- It is very nice to get new feedback about the book function. The example already looks quite good. Here are some comments about Amphibious aircrafts.pdf from 08/17/2018:
- 1. the image size is too small for printing as a book (probably about 15x20cm book size). In the past, images took up the entire page width or half the page height and are therefore better suited for illustration.
- 2. the font size also seemed too small to me.
- 3. the link to Wikimedia Commons is too prominent.
- 4. the links are missing in the references. Without them, many references are useless. This applies in particular to the section "External links".
- 5. the articles are grouped in the preview. This results in "Article 5 of 4" in the footer on page 36. Salino01 (talk) 09:16, 26 August 2018 (UTC)
- Yes, I agree, and the text alignment is also a little messy. The text should be justified. Also, the two-column document format, with text on the left and images on the right, is very boring and cold. BluAlien (talk) 19:43, 4 October 2018 (UTC)
- Thanks. Johan (WMF) (talk) 10:20, 26 August 2018 (UTC)