Talk:Reading/Web/PDF Functionality


About this board

About giving feedback

Please read Reading/Web/PDF Functionality and comment on the plans we lay out there, to tell us what you need from the PDF service. We're especially interested in needs you will have in the future that the plans there don't cover. If there's a bug with something that should work right now (e.g. you get an error message when you try to create a PDF), we need to fix it, of course, but that would already be on our agenda.

Update: (23 April 2018) PediaPress will take over the development of the books-to-PDF functionality. See Reading/Web/PDF Functionality for more information.

Updates: (24 February 2018)

- Kerning and spacing issues (https://phabricator.wikimedia.org/T178665): there have been a few reports of spacing issues in PDF rendering. The Readers Web team is currently looking into a solution. We will first update the fonts for PDFs (https://phabricator.wikimedia.org/T181200) during the week of November 27. This will resolve some, but not all, of the spacing issues. We'll look further into the remaining issues after the initial fix.

Python 2 end of life: 1 January 2020

Dirk Hünniger (talkcontribs)

see https://python3statement.org/

This is nine months from now. Please make sure that the new renderer and all required libraries work with Python 3. In particular, make sure this also holds for mwlib, or ensure that mwlib is not used as part of the rendering process.

Yours Dirk

Johan (WMF) (talkcontribs)

This goes beyond my familiarity with the technical specs and planned future work, but I've pinged folks to give you a proper answer.

Dirk Hünniger (talkcontribs)

Hi, I have now been informed about what is going on. Indeed, mwlib is not involved. The new renderer is going to be Proton: https://www.mediawiki.org/wiki/Proton. It would be nice if someone with better English skills than mine could write that up as an update on the wiki page this discussion page belongs to (that is, the page you see when you click the page tab at the top of this discussion page).

Yours Dirk

Johan (WMF) (talkcontribs)
Johan (WMF) (talkcontribs)
Efes34 (talkcontribs)

PDF download isn't working!


Dirk Hünniger (talkcontribs)
Kghbln (talkcontribs)

While MediaWiki2Latex is a pretty cool solution (kudos), it does not address the issue reported here. It does, however, appear to be a great alternative to what the WMF should probably provide, as it used to in the past. Keeping my fingers crossed for a working solution.

Anyway, I have tried it, and it tells me that the conversion completed, but it does not let me retrieve the PDF document. So for me this alternative was a dead end, too.

Kghbln (talkcontribs)

Ah, probably a Firefox issue. I just tried with Chrome, and there I am presented with a file.

Dirk Hünniger (talkcontribs)

Yes, you have to consult the Firefox documentation. As soon as the PDF has been downloaded, the downward-pointing arrow in the upper right corner turns light blue. You have to click that arrow to access the downloaded PDF file. You are already the second user to report this issue to me, but I think I am the wrong person to fix it; it should be solved by the Firefox team.

Yours Dirk

Dirk Hünniger (talkcontribs)

Hi, as soon as the conversion is finished, the following message is displayed:

"Conversion Finished. Click on the arrow in the right upper corner of your browser in order to view the result."

This has been implemented and deployed on the servers.

Yours Dirk

Kghbln (talkcontribs)

Indeed. Firefox downloads the file silently here. Somehow I have the feeling that it behaves differently from other downloads in Firefox.

> "Conversion Finished. Click on the arrow in the right upper corner of your browser in order to view the result."

Yeah, this will help a lot!

Reply to "pdf download isn' working!"
Johan (WMF) (talkcontribs)

We're seeing some problems with the rendering of single-page PDFs. We're aware of this problem. We're launching the new renderer any day now(tm), which will hopefully solve them.

Steelpillow (talkcontribs)

Can you please clarify what the currently deployed renderer is and what the new one will be? The current one on en.Wikipedia appears to be produced by Skia/PDF m58 and created by (presumably headless) Chromium. I have found a page on this wiki for Proton, but it says nothing about its rollout status. Also, what is Skia/PDF m58, and is it about to change?

Johan (WMF) (talkcontribs)

Hi!

Skia is an open-source graphics library. It has been relevant for us for quite some time, so it's not a new thing: https://en.wikipedia.org/wiki/Skia_Graphics_Engine

https://www.mediawiki.org/wiki/Proton is indeed the new renderer. It has been running in the background for testing for weeks, but we haven't actually made the switch yet. (I thought we would have by now. My apologies for saying "soon!" a bit too often. This is one of those things that always seem to be two weeks away.)

Johan (WMF) (talkcontribs)
Steelpillow (talkcontribs)

OK, thanks. Further to that, I have now found out that the current renderer is Electron, which also uses Chromium (though whether headless or not, I don't know). I take it that it will remain unchanged until the relevant two weeks for Proton finally arrive.

Johan (WMF) (talkcontribs)

Yes, sorry, should have clarified that as well.

Reply to "Single-page PDFs"

Upgrade 1.20->1.31.1 Create a Book says "Book Creator is undergoing changes" - Confused

96.3.195.68 (talkcontribs)

I posted this to the project support desk page and did not get a reply so I'm trying here.

I am upgrading my MediaWiki to:

- MediaWiki 1.31.1
- PHP 7.2.10 (apache2handler)
- MariaDB 10.3.11-MariaDB

When I try to start the book creator, I get a page that says:

"Book Creator is undergoing changes"

However, that page links here (https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality) for details. This page seems to indicate that the book generator is supposed to be operational, but I cannot tell, so I don't know what should work and what should not.

What is the current status? Is the issue that "Download as PDF" is not working, while creating a PDF via PediaPress should work? Neither is working for me, but I am familiar with the process, as I have used book creation via "Download as PDF" extensively in the past.

Thank you.  Brent

Johan (WMF) (talkcontribs)

Uh, good question. I don't think book-to-PDF creation should have been included in 1.31.1, but normal PDF creation should have been. I'm guessing here, though. Do you know, @OVasileva (WMF)?

Steelpillow (talkcontribs)

The information in the article does seem to be unclear. PediaPress are currently involved in two different ways:

1. You can create a book, upload it to the PediaPress web site and order print-on-demand physical copies.

2. PediaPress are also rewriting Wikipedia's own PDF book renderer, and while they are doing this it is not possible to create or print a PDF softcopy wikibook. This is the main subject of the recent update posts.

But I do not know what functionality is included in the 1.31.1 build.

Hope this helps a little.

Brentl999 (talkcontribs)

The "Download as PDF" is greyed out (it shows on the page but unavailable for use).

The "Preview with PediaPress" is "available". So I gather it is supposed to work?

If there is a way for me to get on the inside track for enabling/testing an alpha or beta "Download as PDF", please let me know.

Thank you for your replies. Brent


P.S. I would suggest making www.mediawiki.org/wiki/Reading/Web/PDF_Functionality clearer about what users can expect to work or not work as of a specific MediaWiki release.

Steelpillow (talkcontribs)

The "Preview with PediaPress" is the proprietary print-on-demand option, which is fully working and so is not greyed out.

There is an "alpha"-ish test build of the open-source "Download as PDF" book creator at https://pediapress.com/collector, though it does not yet have the chapter headings wrapper or anything like that.

Johan (WMF) (talkcontribs)

That's very much aimed at users of the Wikimedia wikis, yes. I'll see what we can figure out.

Brentl999 (talkcontribs)

I see there is more discussion about this at Topic:Uqod5bg3xaswxjn9. (www.mediawiki.org/wiki/Topic:Uqod5bg3xaswxjn9)

I'm just pasting this here so if someone is reading this thread in the future, hopefully, it will save them time.

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

Yes I found it! Thank you Dirk :)

Brentl999 (talkcontribs)

Just to close the loop on this thread in case someone else stumbles across it: my solution in the end was to get MediaWiki 1.27 up and running with a version of the Collection extension that supports book creation. Versions of MediaWiki after 1.27 do not appear to be compatible with the Collection extension version that supports book creation.

Dirk Hünniger (talkcontribs)
This post was hidden by Dirk Hünniger (history)
OVasileva (WMF) (talkcontribs)

Hi all, apologies for the late reply. Unfortunately, due to low usage, in the future we will only be supporting PDF book creation for Wikimedia projects via PediaPress, using the renderer that PediaPress are currently working on. That said, the book creation process of the Collection extension (everything apart from the actual PDF download) is still supported and functional, as is PDF creation from individual articles. It was a difficult decision for us to make, but fixing the book creator after retiring the OCG rendering service proved to be very complex and would have required a lot of technical support in the future. As usage of the feature was very low, we decided to continue providing the functionality on a smaller scale and to focus on rendering PDFs of individual articles instead.

Dirk Hünniger (talkcontribs)

Hi, thanks for the update. I feel a bit honoured that, in my free time, I provide a feature (http://mediawiki2latex.wmflabs.org/) that needs so much programmer time that the WMF decided it cannot afford to implement it, especially since the WMF has a budget of more than $90 million. Furthermore, I implemented significant parts of the software while I was shaving cows in a cowshed for approximately 5 EUR per hour and could not find any other job.

As a physicist, I find this laughable, like looking at measured data that surely does not resemble reality. If you see something like that, something has gone terribly wrong with the way you set up the experiment or with the way you analysed the data. So yes, the prophecy of the sect of purely functional programmers is true.

Quoting from the 1984 paper http://www.cse.chalmers.se/~rjmh/Papers/whyfp.pdf: "Functional programmers argue that there are great material benefits - that a functional programmer is an order of magnitude more productive than his conventional counterpart, because functional programs are an order of magnitude shorter."

Yours Dirk

Steelpillow (talkcontribs)

When I learn that a book conversion which used to take five minutes on Wikipedia takes Dirk's system a time measured in hours, I wonder how much of that is down to hardware, how much to the chosen programming language, and how much to the code design. Maybe one day Dirk can install the forthcoming Wikimedia solution and see how fast it runs?

Dirk Hünniger (talkcontribs)

I think it's because I am using HTTP to get the data, and I only make one request at a time, since the servers might not respond otherwise. So you could get a significant speedup if you connected to the database directly. But one large cost remains: the compilation with LaTeX, which has to be run four times to get the references right. That takes at least 20% of the runtime, likely more, so it will still take hours for large books, independent of the programming language and hardware used; in particular, you cannot parallelize the LaTeX run. You can get the LaTeX source from the mediawiki2latex web server and do your own measurements. You could do it without LaTeX, but you would not get the same typographic quality that way.
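The argument that the LaTeX passes put a floor under the runtime can be illustrated with Amdahl's law. This framing is an illustration, not something stated in the thread: if a fraction s of the runtime cannot be sped up, then even making everything else infinitely fast gives at most a 1/s overall speedup. The function name `max_speedup` is hypothetical, and the only inputs are the 20% and 31% LaTeX shares quoted in this thread.

```python
def max_speedup(serial_fraction: float, other_speedup: float = float("inf")) -> float:
    """Amdahl's law: overall speedup when only the non-serial part is accelerated.

    serial_fraction -- share of the runtime that cannot be sped up
                       (assumed here to be the repeated LaTeX passes)
    other_speedup   -- factor by which the rest (e.g. fetching pages over HTTP)
                       is accelerated; infinity gives the theoretical upper bound
    """
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    rest = 1.0 - serial_fraction
    # rest / inf evaluates to 0.0, so the default case reduces to 1 / serial_fraction
    return 1.0 / (serial_fraction + rest / other_speedup)


# Illustrative inputs: the "at least 20%" and "31% on average" shares from this thread.
for share in (0.20, 0.31):
    print(f"LaTeX share {share:.0%}: whole job at most {max_speedup(share):.1f}x faster")
```

With a 31% share, even an infinitely fast data-fetching step would only make the whole job about 3.2 times faster, which is in line with the "hours for large books" estimate.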

Dirk Hünniger (talkcontribs)

So I consulted some documentation. The largest book I compiled was roughly 5000 pages; it is here:

https://de.wikipedia.org/wiki/Benutzer:M2k~dewiki/B%C3%BCcher/Ausgew%C3%A4hlte_Beitr%C3%A4ge_und_Bearbeitungen

and here

https://drive.google.com/file/d/1SA6TEKWrdpXAxDyHZe-umBa2cJ5Ya77X/view?usp=sharing

It took about 9 hours to compile. From other documentation I found that, on average, 31% of the runtime is due to the LaTeX compile step.

So such a book will take about 3 hours to compile at minimum, independent of any software used to prepare the LaTeX source.
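Assuming the 31% average share also holds for this particular 5000-page book, the lower bound works out as:

```latex
t_{\text{LaTeX}} \approx 0.31 \times 9\,\text{h} = 2.79\,\text{h} \approx 3\,\text{h}
```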

Steelpillow (talkcontribs)

Perhaps that LaTeX processing overhead is why the Proton project tried to do it with HTML/MathML via headless Chrome? That was the mistake: it proved to be the wrong core to build books on. Even the single-article renderer still warns of compositing problems. With hindsight, refreshing OCG would have been a faster and lower-risk strategy (evolution, not revolution, as they say). It will be interesting to see how the PediaPress code performs.

This post was hidden by Steelpillow (history)
Reply to "Download PDF"
Dirk Hünniger (talkcontribs)

My brother just sold me his old gaming machine. It scores about 7000 PassMark points, has 4 cores, takes up to 32 GB of RAM, and is quiet enough that I could run it 24 hours a day. mediawiki2latex currently needs 12 GB for a 5000-page book, so it could handle 3 or 4 books in parallel. So if you need to convert anything, or want another mediawiki2latex server to be made available, just let me know. Yours Dirk

Reply to "computer available"
2003:D8:B3DD:4500:5054:1DAC:A895:20F6 (talkcontribs)

Dear Sir or Madam,

The following error message appears when I try to download the PDF:

C:\Users\INSPIR~2\AppData\Local\Temp\OtwFWVVK.pdf.part could not be saved, because the source file could not be read.

Try again later, or contact the server administrator.

Any advice?


Dirk Hünniger (talkcontribs)
109.214.160.197 (talkcontribs)

Thank you very much for this article, it helped me a lot with my presentation. Thank you so much!

Reply to "Kein Download möglich"
90.255.95.165 (talkcontribs)

I am trying to download PDFs which are probably several pages long. There is absolutely no chance of doing so. Instead, I receive a Word download which will not open properly and produces streams of code.

I also receive an on-screen note saying "network error".

As this download problem has existed for some time, could someone please find a solution as soon as possible?

Johan (WMF) (talkcontribs)

The new PDF renderer (for single articles) will hopefully be in production within a couple of weeks; I'll write an update here then.

Dirk Hünniger (talkcontribs)
Reply to "Poor download service"
2001:579:811C:322:D9E9:5FD4:E555:C45E (talkcontribs)

You can download and use LibreOffice. As the Latin name implies, it is free. It is available for the trash (aka Microsoft), Linux, Unix, Mac, VMS and all other systems.

Johan (WMF) (talkcontribs)

They're not really the same use cases, though. Part of the point of the PDF is reliable, consistent presentation in a reasonably aesthetically pleasing manner.

Reply to "Replace PDF System"
PhotographerTom (talkcontribs)
Johan (WMF) (talkcontribs)

We have been putting it off because we don't have that much new to say (although some comments have been left here on the talk page), but we should probably post an update just to give an idea of what's happening, even if it's not much. This should come once https://phabricator.wikimedia.org/T186748 has properly taken over single-page PDF rendering (which should be soon), so that we can cover both at the same time. (:

Reply to "Update on books"