Talk:Reading/Web/PDF Functionality


About this board

About giving feedback

Please read Reading/Web/PDF Functionality and comment on the plans we lay out there, to tell us what you need from the PDF service. We're especially interested in what you need in the future that doesn't exist in the plans laid out there – if there's a bug with something that should work right now (e.g. you get an error message when you try to create a PDF), we need to fix it, of course, but that would have been on the agenda.

Update: (23 April 2018) PediaPress will take over the development of the books-to-PDF functionality. See Reading/Web/PDF Functionality for more information.

Updates: (24 February 2018)

- Kerning and spacing issues (https://phabricator.wikimedia.org/T178665): there have been a few reports of spacing issues within PDF rendering. The readers web team is currently looking into a solution. We will first be updating the fonts for PDFs (https://phabricator.wikimedia.org/T181200) over the week of November 27. This will resolve some but not all of the spacing issues. We'll be looking further into the remaining issues after the initial fix.

85.155.60.126 (talkcontribs)

Nothing downloads for me. What a scam, and besides, when it "downloads", an image of an "i" appears and the information does not show up. I am going to report you, and I am not joking.

Johan (WMF) (talkcontribs)

I'm not really sure what you refer to here, I'm afraid.

Dirk Hünniger (talkcontribs)
Reply to "Nada"

Ready for single-page PDF Render Function

ChEbama87 (talkcontribs)

I am very interested in the single-page PDF render function!! This is of great importance to me and several of my friends. We have been reading about this and trying the button offered on the main pages. I love Open Source!! I have been using Open Source operating systems and tools since 1995. I helped the "fight" for the opening-up of what is now Firefox. This is directed toward PediaPress: There are many advantages to going Open Source. I know RedHat has made a LOT of money, even before the $34 billion IBM purchase.

Steelpillow (talkcontribs)

I have found that my Firefox browser makes a better job of rendering single articles than the current Wikipedia renderer. The content is *exactly* the same, with the extraneous wrapper and other in-page unprintables stripped out, but it is more cleanly laid out by Firefox. (Be warned that at the time of writing, many Firefox add-ons have been disabled for the last couple of days, while a fix for a security certificate blunder is sought).

I begin to wonder whether adopting and supporting their rendering engine might not be a better route than constantly reinventing unsatisfactory ones.

Dirk Hünniger (talkcontribs)

Yes, might be a good idea. Especially since it is an open source one and the Mozilla Foundation tries to keep it open source. But if you prefer something commercial you could also use http://www.unipublishing.com/wb2pdf/ . But of course it is much more fun if you develop one yourself.

Reply to "Ready for single-page PDF Render Function"

The benefits to PediaPress of Going Open Source

MJL (talkcontribs)

@Ckepper asked about the potential benefits of PediaPress going open source with this project in this thread. I wanted to give them some good takeaways to bring back to their company as it relates to this specific project.

Commercialization of a given project is an important reason to want to keep it closed source. However, I believe it would impede the success of the renderer long term were it to remain closed source. As things stand, I do not think it will be as successful as Extension:Collection because, for starters, it would not be free to install for most. Small wikis would not be able to afford much in licensing. Open source also instills a kind of trust that any large company, nonprofit, or single individual can rely on. It shows you are so confident with your product that you will extend that to showing it for the world to see in its most basic form: code.

On a different note, as is reported on the company's website, "[you] offer consulting, customization, and support for advanced document transformation solutions." This is nothing small right here, and I am confident in that business model. If, however, you believe otherwise, there are still ways to protect copyright without going closed source. In this case, I would look to Chromium for guidance in what potential path you can take. Not every one of your ideas needs to be included in an open source repository, so you can still maintain the parts you want secret or just to yourselves.

The principal rendering service should, however, be available to the public to do bug tests and the like. It's a win-win. Consulting and customization are where the real money is anyways. You could also branch into hosting this rendering service for others, similar to how you already offer print-on-demand books to any MediaWiki wiki. Wikis will always need to pay for this if they want the product beyond what is already offered out there.

Finally, it is a strong selling point for a company with such strong ties to the open-source movement! I hope this helps you make the right decision on this matter.

Ckepper (talkcontribs)

Thank you for your comment. After talking with colleagues and other stakeholders, we have made the decision to release mwlib.html as open source when the project is sufficiently mature. This should help to ensure its long-term viability.

Also, I enabled the new render server so that rendering on https://pediapress.com/collector should work again (and be more stable).

MJL (talkcontribs)

That's awesome news! Major thanks goes out to your organisation for its willingness to do that. If there is anything you all need from the community (like press releases*, bug testing, etc.) please reach out! I just tried the collector on Simple:Spooky Scary Skeletons, and I think it really looks great!! Very elegant! :D

*I run Wikisource News (en) now, so I can help with publishing and writing it!

Ckepper (talkcontribs)

I have added Wikibooks (en) and Wikisource (en) to the test renderer. The output is still far from perfect but PediaPress was never able to generate PDFs from those sites before.

Helmoony (talkcontribs)

Hi @Ckepper, is it possible for you to add Wikipedia (ar), Wikibooks (ar) and Wikisource (ar)? It's going to be a good test for right-to-left issues.


Ckepper (talkcontribs)

Hi @Helmoony, I have added Wikipedia (ar). A few years ago (for Wikimania Haifa 2011) we created an RTL export with our old PDF renderer and that was really painful - especially since no one on our team knew Hebrew. You can start playing around with the export, but this is definitely not a priority for us right now.

Helmoony (talkcontribs)

Thank you Ckepper, I tested the version, it's not working great. When it doesn't show "Failed to load PDF document.", errors are mainly: text format should start from right, wikidata-based infoboxes are not showing wikidata data including OSM-based map, some terms need to be translated (e.g. Image Sources, Licenses and Contributors). But at least we know what we need to do now.

Reply to "The benefits to PediaPress of Going Open Source"
2806:10AE:9:911F:DC71:B46F:825E:2C66 (talkcontribs)

What do you mean, it doesn't work?

Johan (WMF) (talkcontribs)

What doesn't work?

Reply to "como que no funciona"

Long times or timeout with dl as pdf...

68.98.170.156 (talkcontribs)

I've complained often that dl as pdf either times out or takes up to 5 minutes from the time I click dl as pdf to the save dialog appearing. It's doing it again.

It's great that you folks are working on books and so on but why not fix - once and for all - the basic dl as pdf so it works quickly and it works every time.

As a matter of priorities it seems to me a bugless "dl as pdf" should be at the top of the list. If this were a commercial site it would be. If Amazon had a bug in such a basic functionality Walmart would get a big bump in business...

Reply to "Long times or timeout with dl as pdf..."
Dirk Hünniger (talkcontribs)

My brother just sold me his old gaming machine. It got about 7000 Passmark points, 4 cores, max. 32 GByte of RAM and it's silent enough so I could run it 24 hours a day. mediawiki2latex currently needs 12 GByte for a 5000-page book. So it could do 3 or 4 books in parallel. So if you need to convert anything or want another mediawiki2latex server to be made available just let me know. Yours Dirk

Br shadow (talkcontribs)

Thank you Dirk for your work and dedication.

Reply to "computer available"
151.100.135.42 (talkcontribs)

I tried to convert the "network analyzer (electrical)" English Wikipedia page to PDF. The resulting PDF is written in German and is about "Aloisius von Gonzaga".

Johan (WMF) (talkcontribs)

Uh. That sounds really strange. Thank you for reporting. Do you speak German, so that it is in any way possible that you could have visited the Aloisius von Gonzaga article?

Steelpillow (talkcontribs)

I just saw this post so I downloaded a PDF from the same page. It came through as the correct one. Steelpillow (talk) 14:16, 21 November 2018 (UTC)

Reply to "Wrong page is converted to PDF"

When will this be ready?

Summary by MJL

Answered

ARZ100 (talkcontribs)

I'm just wondering if anyone knows about when this will be ready for use on Wikipedia?

Steelpillow (talkcontribs)

Mañana

ARZ100 (talkcontribs)

Mañana? Google Translate says that means "morning" in Spanish. If that is true what do you mean by "morning"?

Steelpillow (talkcontribs)

"Mañana" is a traditional reply when meaning to imply "I have no idea but probably not for a long time, if ever."

ARZ100 (talkcontribs)

is a what?

Steelpillow (talkcontribs)
Johan (WMF) (talkcontribs)

Depends on whether you mean the book-to-PDF function or the single-page PDF renderer, @ARZ100. The book renderer? Depends entirely on the volunteer developers who took over. The Foundation looked at the number of daily downloads and pretty much decided that it couldn't defend assigning the resources necessary at the expense of other projects when the solution for single-page PDFs didn't work, which meant that PediaPress took over, and they're handling it in their spare time. The single-page PDF renderer? That's a different thing.

ARZ100 (talkcontribs)
Johan (WMF) (talkcontribs)

@ARZ100, honestly, I have told you nothing of substance. But are you interested in the single-page PDF renderer (that is, just for one article) or the books-to-PDF renderer (that is, when you take more articles and put them together and then make a PDF out of them)?

MJL (talkcontribs)

@Johan (WMF) I am not @ARZ100, but I will say that I am exclusively interested in the books-to-PDF renderer. I've assumed watchlisting Reading/Web/PDF Functionality would be the best way to get updates on that. Am I right about that?

ARZ100 (talkcontribs)

@Johan (WMF) you have answered my question. I don't need any more information.

Johan (WMF) (talkcontribs)
Steelpillow (talkcontribs)

This is getting truly ridiculous. I am embarrassed for this Foundation from Jimbo down. Once again we have heard absolutely nothing - nothing - save a few false promises of announcements that never get made, of deadlines that are as believable as the flying spaghetti monster and pass into history quicker than a mayfly in summer. What is going on with this PDF book renderer, then? Where is the publicly accessible repos of this supposedly open-source code? Where is the opportunity for other developers to do the open-source thing and help the code along? Will somebody please, please, please tell us WTFlip!

Jdforrester (WMF) (talkcontribs)
Steelpillow (talkcontribs)

Thanks for the kind thought but the Chromium solution has been abandoned for books (being kept only for articles) in favour of the PediaPress solution. If you can find a link for that, you will indeed be my hero.

Jdforrester (WMF) (talkcontribs)

That's interesting, and the first I've heard of this. Can you provide a link?

TheDJ (talkcontribs)
Jdforrester (WMF) (talkcontribs)

Oh, curious.

Dirk Hünniger (talkcontribs)

As detailed on the above linked page, a new open source PDF renderer is going to be provided by PediaPress. I would like to look at the source code to see if there is a dependency on the mwlib library (which is also developed by PediaPress). Such a dependency would cause a stoppage of security updates due to the decommissioning of Python 2 on 1st January 2020, rendering the system undeployable.
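A minimal sketch of one such check, assuming the package in question is installed locally and publishes standard trove classifiers in its metadata. The function names are hypothetical, the package name is left as a parameter, and classifiers are only self-declared, so a result here is a hint rather than proof; reading the source remains the real test.

```python
from importlib import metadata  # standard library, Python 3.8+


def declares_python3(classifiers):
    """Return True if any trove classifier declares Python 3 support."""
    prefix = "Programming Language :: Python :: 3"
    return any(c.strip().startswith(prefix) for c in classifiers)


def installed_package_declares_python3(package_name):
    """Check the metadata of a locally installed distribution.

    Returns None when the package is not installed, so the caller can
    tell "unknown" apart from "no Python 3 classifier declared".
    """
    try:
        meta = metadata.metadata(package_name)
    except metadata.PackageNotFoundError:
        return None
    # get_all returns None when the field is absent from the metadata.
    return declares_python3(meta.get_all("Classifier") or [])
```

Calling installed_package_declares_python3 with the name of a dependency gives True, False, or None; it does not inspect transitive dependencies.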

Johan (WMF) (talkcontribs)

Last we spoke to PediaPress, they were aiming for end of the calendar year. I've pinged them in email to let them know questions are being asked about the books-to-PDF functionality.

Steelpillow (talkcontribs)

Thank you Johan. However this is not my first request, or your first acknowledgement, since the end of that calendar year. Could you also ask them to give details of the repos for their new renderer, so that we can confirm it is open source and can see what is going on without pestering the developer unduly? Dirk has already asked in another thread below here, but has not been answered.

Johan (WMF) (talkcontribs)

Yes, I mentioned that too.

Johan (WMF) (talkcontribs)

But I'd like to remind everyone that if you've got a beef with how long this is taking, the only sensible target of that is our (the WMF) priorities in not assigning resources to this particular functionality, not volunteer developers.

(We think it makes sense, of course, looking at what we'd have had to defund otherwise, or we wouldn't have prioritised that way. But still.)

Steelpillow (talkcontribs)

My beef with the current developer is not the length of time, which they give freely, but the lack of visibility of what is supposed to be an open-source initiative. They really ought to be giving that visibility freely, too. Both periodic communications, even if only "Sorry, I've been busy elsewhere this quarter", and sight of the repository would help a lot. For example a developer going quiet usually means a developer not making progress. And sight of the architecture is pretty darn important if the WMF want to de-risk yet a third fiasco in a row. Do you?

Ckepper (talkcontribs)

I am not sure if I missed a previous post but I don't have any intention to be secretive on purpose. The past quarter was indeed really busy and the next two will most likely be as well. Nevertheless, I intend to continue and finish the project. Part of the project relies on 10+ year old PediaPress infrastructure that needs to be upgraded before doing the next steps. One of the servers was already upgraded two weeks ago but we need to set up an additional new render server for the project and stabilize the whole render process. This needs time for investigation and fixing and I don't know when I will find this time. I am sorry about the delays.

Steelpillow (talkcontribs)

@ckepper Thank you for the update. Is the code that you have written open-source? Is the repository accessible to third parties? If not, are there any plans for that? Steelpillow (talk) 07:42, 12 April 2019 (UTC)

Ckepper (talkcontribs)

No, the code is not open source. Since we are using a commercial rendering component and plan to run the service on our own infrastructure, we didn't really see a need to open source it. But this might change of course - if people have a long-term interest in contributing to the project or if PediaPress could no longer operate it...

Also, PediaPress potentially might offer customized PDF rendering for non-WMF projects (think enterprise wikis) as a commercial product. Open sourcing this project would expose many of our ideas and eventually let us lose our edge in this field. However, open source projects in particular often work very successfully with such a model. As you can hear, this is not an easy decision for me and I have to think about it.

What are your reasons for asking to open source the project?

Steelpillow (talkcontribs)

This on the home page of the Wikimedia Foundation: "From site reliability to machine learning, our open-source technology makes Wikipedia faster, more reliable, and more accessible worldwide." and on the linked Technology page: "We keep Wikimedia projects fast, reliable, and available to all." I do not understand how this vow can be honoured if the book creation function is closed source.

Also of course, if you guys shut up shop for any reason (these things happen), then who is to maintain our book renderer, and how?

Perhaps one of our WMF participants such as Jdforrester (WMF) or Johan (WMF) could answer these points? Steelpillow (talk) 12:46, 12 April 2019 (UTC)

Jdforrester (WMF) (talkcontribs)

I'm entirely uninvolved in this project. You showed up spreading profanity and I tried to help you, but I failed because you hadn't explained your concern and I was answering the wrong question. Sorry.

Steelpillow (talkcontribs)

It seems that you have no intention of clarifying the WMF's position. Let us hope that somebody a little more civil will be able to.

Johan (WMF) (talkcontribs)

@Steelpillow, as @Jdforrester (WMF) says, he doesn't work on this project, and that means he's got about the same level of access to information as you have. He's not obfuscating. We're not a monolith. There's no internal collection of decisions.

I've pinged the person best suited to reply to you and pointed them to this thread.

Johan (WMF) (talkcontribs)
Dirk Hünniger (talkcontribs)

Hi,

To my understanding, the task of creating PDF versions (as well as a few other file formats) of Wikipedia Books is currently solved by mediawiki2latex. It is currently functional, it is open source, and it depends only on open source components, as can be seen from the fact that it is part of Debian. It may be used online, but may also be installed locally on the most common operating systems, and it can easily be verified by looking at the source code that it does not depend on Python 2.

Furthermore, it is not clear whether a functional alternative will be developed, whether it is going to be open sourced, whether it will be possible to install it locally, whether it will depend on any non open source components, or whether it will depend on Python 2, since the source code is currently not available.

So to me it is clear that it is necessary for me to keep on developing mediawiki2latex, although I know about other things I could do in my free time.

Yours Dirk

Steelpillow (talkcontribs)

Thank you Dirk. When we consider that your Haskell implementation was refused due to the suggested problem of finding programmers able to provide alternative support, the adoption by WMF of a closed-source core for which alternative support is genuinely impossible, becomes less easy to understand. Could the WMF please explain their thinking on this a little more fully than they have done in the past? Steelpillow (talk) 17:38, 12 April 2019 (UTC)

TheDJ (talkcontribs)

Causation and correlation. Both are expensive for the foundation. One just a little more so.

I think that is pretty much the point here. Rendering HTML to books at the level of quality our community would expect is just not something the foundation wishes to spend its money on. That's a valid pov, even if a few people disagree about it. Considering PediaPress also doesn't seem too keen to throw lots of money at it again, that isn't totally crazy.

Steelpillow (talkcontribs)

OK, so we now know that the books project is on a very slow train to a proprietary solution that cannot be maintained by the community, and the WMF are happier with that than any other option open to them.

I am surprised to find that the maintainability issue, which killed both the old OCG and adoption of Dirk's Haskell solution, is no longer seen as an issue worth addressing. Thank you for making that plain.

My thanks too to all those who have given constructive replies to the questions I raised.

TheDJ (talkcontribs)

"the WMF are happier with that than any other option open to them" well happy is overstating I think. The foundation has limitations and needs to make choices. See also why they halted much of the work going into Maps and Graphs at some point. There is a considerable over ask and an eternal backlog of problems and wishes that could be worked on. To handle all of it, you could probably hire 2,500 people and still only barely keep up. Contrast this with the fact that people already complain that WMF spends too much money on developers and programs and you have identified the primary problem in my point of view.

MJL (talkcontribs)

@Ckepper, hey listen. It's important you try your best. Thank you for the good work you do, and please don't be discouraged if people get frustrated. We're all only human.

I appreciate this response! :)

Reply to "WTFlip?"
Summary by MJL

Just a pleasant IP comment.

223.231.76.176 (talkcontribs)

I don't have any problem.... What I want is just the writing in an Ordered way..and I hope the present pdf file format does this Job....

I would just Cheer up wikipedia for Presenting Collections of information on different Subjects/ fields :)

Plz keep doing this ..Wiki ")