About this board
About giving feedback
Update: (15 July 2019) We’ve launched the new PDF renderer. We’re looking at feedback, but haven't so far seen any significant issues. We might incorporate some suggestions, but want to note that this is not an ongoing project with continuous development. In other words, now that it's deployed and proven to work, the new renderer is entering maintenance mode. This page won’t be abandoned, but it could take a while before anyone reacts, simply because everyone's got so much else to do.
In terms of books, we've left it in the hands of volunteer developers and PediaPress. We'll be glad to reach out to them with questions, but we're not planning any involvement in terms of the technical implementation.
For Wiki authors who want to create tailored Open Educational Resources, especially ones generated from Wikiversity or Wikipedia, cross-compilation can be very helpful for producing learning resources (see https://pandoc.org/try for how the cross-compilation works). Client-side conversion can be based on wtf_wikipedia.js.
So this appears to be about an alternative way for a client to import and convert raw wikitext from individual articles, one that is wholly unrelated to the PDF export service and, as far as I can tell, to markdown as well. Steelpillow (talk) 09:43, 28 February 2018 (UTC)
It just shows an option to create the PDF on the client side, due to problems with PDF generation on the server side. Of course, this workaround enables the export of even more formats. If that is not appropriate as a recommendation in this discussion, excuse me for being off track.
With Wiki2Reveal there is a rapid prototype for converting the wiki markdown on the client side. See an example:
If you want to create the PDF on the client side, you can read the wiki markdown and convert it in the browser as the runtime environment, using existing libraries like https://github.com/MrRio/jsPDF This reduces the load on the server, because only the wiki markdown and the embedded media must be transferred to the client. A server-side implementation by Dirk Hünniger is available on wmflabs at http://mediawiki2latex.wmflabs.org/ ; it generates the PDF on the server and delivers the generated PDF to the user. The wiki markdown is converted into LaTeX (which can be done even in the browser); what is costly in terms of performance is the LaTeX conversion into PDF. So why not allow the user to perform that final step, if they really want a PDF document, e.g. when an online wikibook is not usable due to constraints of internet availability in remote areas and a humanitarian organisation wants to create a tailored WikiBook for capacity building and needs to deploy it offline (see tailored WikiBooks for Risk Mitigation). Best regards and many thanks for discussing this topic and allowing offline use of Wikipedia and Wikiversity content under the CC BY-SA 3.0 license.
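As a concrete illustration of the first step of such a client-side converter, here is a minimal Python sketch (the function name and defaults are my own, not part of any tool mentioned here) that builds the MediaWiki action API request for fetching an article's raw wikitext, which any converter would then feed into the LaTeX or PDF stage:

```python
# Sketch: build the MediaWiki action API URL that returns an article's raw
# wikitext (the input to any client-side converter). Names are illustrative.
from urllib.parse import urlencode

def wikitext_api_url(base="https://en.wikipedia.org/w/api.php", title="PDF"):
    """Return the action=parse URL that yields the raw wikitext of `title`."""
    params = {
        "action": "parse",        # parse module of the action API
        "page": title,
        "prop": "wikitext",       # ask only for the raw wikitext
        "format": "json",
        "formatversion": 2,
    }
    return base + "?" + urlencode(params)

print(wikitext_api_url(title="LaTeX"))
```

The client would fetch this URL, extract the wikitext from the JSON response, and hand it to a converter such as wtf_wikipedia or a wikitext-to-LaTeX stage running in the browser.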
The benefits to PediaPress of Going Open Source
@Ckepper asked about the potential benefits of PediaPress going open source with this project in this thread. I wanted to give them some good takeaways to bring back to their company as it relates to this specific project.
Commercialization of a given project is an important reason to want to keep it closed source. However, I believe it would impede the success of the renderer long term were it to remain closed source. As things stand, I do not think it will be as successful as Extension:Collection because, for starters, it would not be free to install for most. Small wikis would not be able to afford much in licensing. Open source also instills a kind of trust that any large company, nonprofit, or single individual can rely on. It shows you are so confident with your product that you will extend that to showing it for the world to see in its most basic form: code.
On a different note, as reported on the company's website, "[you] offer consulting, customization, and support for advanced document transformation solutions." This is nothing small, and I am confident in that business model. If, however, you believe otherwise, there are still ways to protect copyright without going closed source. In this case, I would look to Chromium for guidance on what path you could take. Not every one of your ideas needs to be included in an open source repository, so you can still keep the parts you want secret to yourselves.
The principal rendering service should, however, be available to the public for bug testing and the like. It's a win-win. Consulting and customization are where the real money is anyway. You could also branch into hosting this rendering service for others, similar to how you already offer print-on-demand books to any MediaWiki wiki. Wikis will always need to pay for this if they want the product beyond what is already offered out there.
Finally, it is a strong selling point for a company with such strong ties to the open-source movement! I hope this helps you make the right decision on this matter.
Thank you for your comment. After talking with colleagues and other stakeholders, we have made the decision to release mwlib.html as open source when the project is sufficiently mature. This should help to ensure its long-term viability.
Also, I enabled the new render server so that rendering on https://pediapress.com/collector should work again (and be more stable).
That's awesome news! Major thanks goes out to your organisation for its willingness to do that. If there is anything you all need from the community (like press releases*, bug testing, etc.) please reach out! I just tried the collector on Simple:Spooky Scary Skeletons, and I think it really looks great!! Very elegant! :D
*I run Wikisource News (en) now, so I can help with publishing and writing it!
I have added Wikibooks (en) and Wikisource (en) to the test renderer. The output is still far from perfect but PediaPress was never able to generate PDFs from those sites before.
Hi @Ckepper, is it possible for you to add Wikipedia (ar), Wikibooks (ar) and Wikisource (ar). It's going to be a good test for right-to-left issues.
Hi @Helmoony, I have added Wikipedia (ar). A few years ago (for Wikimania Haifa 2011) we created an RTL export with our old PDF renderer, and that was really painful - especially since no one on our team knew Hebrew. You can start playing around with the export, but this is definitely not a priority for us right now.
Thank you Ckepper, I tested the version; it's not working great. When it doesn't show ''Failed to load PDF document.'', the errors are mainly: text should start from the right, Wikidata-based infoboxes are not showing Wikidata data (including the OSM-based map), and some terms need to be translated (e.g. Image Sources, Licenses and Contributors). But at least we know what we need to do now.
The render worked for me just now for arwiki. There are still some RTL issues, but I didn't see the "Failed to load" issue.
Is the source available so that we can contribute?
Not yet, I'd like to clean it up a little bit before making it available.
I hope you can release it soon. The book functionality is needed! Thank you for your quick reply!
I hear you. Maybe I'll skip the full cleanup so I can publish it sooner.
That would be awesome! Ugly code that works is better than no code.
There are, among those of us who use MediaWiki to run KM systems outside of Wikipedia, some absolutely essential extensions whose code is hideous.
I'm glad you want clean code, but I would hope that you can release the code as soon as possible and then clean up the code later.
Yes, absolutely. A buggy alpha release v0.01 is better than no release at all. Thank you so much for keeping on with this work.
@Steelpillow, I agree completely. Is there any progress? As of April 11, 2020, opening the Book Creator shows: "Due to severe issues with our existing system, the Book Creator will no longer support saving a book as a PDF." A collaborative work "will always remain freely distributable and reproducible" only if I can export it into another free file format, like the most common book formats PDF or ODT.
@Charis I have not been following progress lately. There is a test server at https://pediapress.com/collector/ which you can try. Otherwise, Ckepper is the best one to ask, as they have been the voice of PediaPress here.
I also posted this on Extension talk:Collection but the failure page when trying to Download to PDF on my wiki lands here, so cross-posting.
Our wiki is running on MediaWiki 1.31.7 and using Collection 1.7.0 (af3a0b8) 14:23, 15 April 2018. Download as PDF is constantly failing and directing the user to Reading/Web/PDF Functionality, which doesn't specifically address the reason for the "Book rendering failed" error. Reading through Talk:Reading/Web/PDF Functionality doesn't clear up the situation much either. It does seem to indicate there is a new render server available at https://pediapress.com/collector but that doesn't seem to work for non-Wikipedia sites. The existing render server https://tools.pediapress.com/mw-serve/ does still seem to be active.
Is the functionality via this extension dead for low-traffic sites that don't need, or cannot install (i.e. shared hosting), their own PDF server?
My mediawiki2latex package is in Ubuntu 20.04 (GPL). PDF generation seems to work fine. Furthermore, I run my own rendering server, which also works with non-Wikimedia sites.
The last LTS version of MediaWiki that supported the Collection extension was 1.30.x, which has since been decommissioned according to https://www.mediawiki.org/wiki/Version_lifecycle WMF has stopped all development of the Collection extension according to https://www.mediawiki.org/w/index.php?title=Topic:Uxkv0ib36m3i8vol&topic_showPostId=uxsjbpkqfmgq1jyx#flow-post-uxsjbpkqfmgq1jyx
I'm still confused, sorry, because that doesn't seem to agree with Extension:Collection which shows
as does Special:Version both here and on Wikipedia, both of which are running on
12:06, 4 May 2020
Reading through this discussion, Extension:Collection, and its associated talk page doesn't clarify the status of the extension, nor, more importantly, whether there is a render server that low-traffic wiki sites can use so that the Download to PDF functionality works.
As ever, there is confusion between the collection extension or Book Creator and the rendering service. The old rendering service, the Offline Content Generator, has been pulled and the promised PediaPress replacement interminably delayed. Development of the collection extension/Book Creator also stopped, but it remains in use. It still generates a trickle of bug reports and issues, so periodically gets looked at to see if anything can be fixed. But this is pure volunteer effort and there seem to be no low-hanging fruit any more. Hope this helps.
what else is there
pandoc: also GPL, but might require some Lua or Haskell programming to make it work for your case
bluespice: from 2900 EUR per year.
mediawiki2latex server now parallel
The two mediawiki2latex servers are now able to serve requests in parallel. Furthermore, mediawiki2latex uses significantly fewer resources, so even very large books can now be compiled successfully.
Can you clarify, do you mean that the two servers can run in parallel or that each server can run multiple conversion requests in parallel?
The large server can run up to two requests in parallel. The normal one can run up to four requests in parallel.
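To make those per-server limits concrete, here is a minimal Python sketch (purely illustrative; the actual servers are written in Haskell and do not work this way) of capping concurrent render jobs with a bounded semaphore, using the two- and four-slot figures above:

```python
# Sketch: cap the number of conversions a render server accepts at once.
# The slot counts mirror the figures in this thread; the code is illustrative.
import threading

class RenderPool:
    def __init__(self, max_parallel):
        # BoundedSemaphore blocks acquirers once all slots are taken
        self._slots = threading.BoundedSemaphore(max_parallel)

    def render(self, job):
        with self._slots:  # waits here when max_parallel jobs are running
            return f"rendered {job}"

large_server = RenderPool(max_parallel=2)   # "large" server: 2 parallel jobs
normal_server = RenderPool(max_parallel=4)  # "normal" server: 4 parallel jobs
print(large_server.render("book-1"))
```

Bounding parallelism this way trades throughput for predictable memory use, which matters when single books can run to thousands of pages.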
Including capability for right-to-left languages (Arabic)
I tried to create a book by exporting Arabic wiki pages using the PediaPress preview, as it was not possible to download directly as PDF at the moment due to a bug. However, I noticed that the whole content is aligned left-to-right, although it comes from a right-to-left language (Arabic). Would it be possible to consider that and use the "open-right" function as well (if not done yet)?
book generator removed on wikipedia
The book generator has been removed from the English Wikipedia.
It has not been removed, but user interface links to it have been and notices with incorrect statements added to many pages.
mediawiki2latex mass production / testing
I am currently running a test on all community-maintained books on the English Wikipedia, approx. 5000 in total. So far I have 283 PDF files. In 20 cases no PDF was produced; some PDFs are more than 4000 pages long. If anybody can provide webspace, I will happily upload them. We could later link to them from the Book namespace in Wikipedia.
I uploaded the first 100 resulting pdfs.
More will not work due to a lack of webspace.
I uploaded more than 500 pdfs with images stripped to work around the limited webspace. See here:
I currently have more than 1000 PDFs on my local disk. Those also contain images. The chance that mediawiki2latex fails on a community-maintained book is currently about 6.5%. Merry Xmas!
From the statistics gained from this experiment, I deduced that a full rebuild of all community-maintained books would take less than a month and cause server costs of 320 EUR. https://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf/manual#Wikipedia_Books
A request for storage to upload the PDFs has been filed.
I followed up on the idea of looking into possibilities for speeding up mediawiki2latex. I came up with C code based on HTML Tidy, which needs more than a factor of 14 less wall-clock time than mediawiki2latex.
The resulting file is here:
Thank you very much for your work on mediawiki2latex. I really like it, and the output looks amazing.
This looks promising. Could you automate the whole algorithm: build the book in C/Tachyon, then generate the contributions list and append it with Haskell/mw2latex? That is, use modules in C as separate accelerators for the overall Haskell process?
No, that's not the route to go. 50% of the runtime of mediawiki2latex is used for the generation of the lists of contributors and figures. This is because the information is extracted from the page histories of all pages and all images in the book.
The way to do this right is to query the SQL database directly. As far as I understand, WMF allows that. But there is a problem: my surname is Hünniger, which contains the letter ü, which the Horizon web interface of WMF cannot handle. So I am sorry to say that this is not possible until new software gets installed on Horizon, which might take years, since the problem has already existed for years.
Another 30% of the runtime of mediawiki2latex is needed for the images. In mediawiki2latex I download the images at the maximum possible resolution, scale them down to 300 dpi, and include these still rather large images in LaTeX. This causes the LaTeX compiler to spend more than half of its total runtime on images, since it performs a time-consuming recompression when embedding them, which cannot be changed according to the German LaTeX mailing list.
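To make the 300 dpi point concrete, here is a small illustrative sketch (my own, not mediawiki2latex's actual code) of the arithmetic behind print-resolution scaling, i.e. how many pixels an image actually needs at its printed size:

```python
# Sketch: pixel dimensions an image needs at print resolution.
# A simplified model of the scaling step described above; numbers are
# illustrative, not taken from mediawiki2latex.
def target_pixels(print_width_in, print_height_in, dpi=300):
    """Pixel dimensions needed so the printed image reaches `dpi`."""
    return (round(print_width_in * dpi), round(print_height_in * dpi))

# A figure printed at 4x3 inches needs 1200x900 pixels at 300 dpi;
# downloading or embedding anything larger only slows the LaTeX run.
print(target_pixels(4, 3))  # (1200, 900)
```

Anything above the target resolution is wasted work twice: once in the download and once in LaTeX's recompression of the oversized image.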
In tachyon I download images at a much lower resolution and don't do any image processing, which speeds things up significantly. So a lot of the speedup in tachyon actually comes from leaving out the contributor information and using lower-resolution images.
The actual runtime of the tachyon C code accounts for less than one percent of the total runtime of a single tachyon run. The rest of the time is needed by wget to download the images and HTML pages, and by xelatex to create the PDF. So tachyon is more an experiment to measure a lower bound on the runtime of such a conversion than the future route for mediawiki2latex.
So to wrap it up: moving from Haskell to C can make that part of the program significantly faster, but even if we did that, we would only affect 20% of the total runtime, since the rest of the runtime is spent in auxiliary programs not under my control. So the most we can get by porting to C is a 20% speedup. If we did direct database queries, we would get a 50% speedup for free. Another option is using lower-resolution images, but I doubt many people would be happy with that. But yes, if all these issues are solved, we could still go to C.
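The 20% and 50% figures above follow from Amdahl's-law-style accounting; here is a small illustrative sketch (my own, using the runtime fractions quoted in this thread) of that calculation:

```python
# Sketch: Amdahl's-law accounting for the speedups discussed above.
# The runtime fractions come from this thread; the code is illustrative.
def overall_speedup(fraction_affected, local_speedup):
    """Whole-program speedup when only `fraction_affected` of the
    runtime is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / local_speedup)

# Eliminating the contributor-list phase (~50% of runtime) entirely,
# e.g. via direct database queries:
print(overall_speedup(0.5, float("inf")))  # 2.0  (runtime halved)
# Porting the ~20% Haskell share to C, even if C were infinitely fast:
print(overall_speedup(0.2, float("inf")))  # 1.25 (runtime cut by 20%)
```

This is why the database-query route dominates: the serial 80% spent in wget and xelatex caps any benefit from rewriting the Haskell part in C.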
Thank you for the explanation.
The umlaut issue is just the kind of reason why alias accounts are sometimes allowed. Would it help if you created a second account using the "Huenniger" spelling?
Yeah, this is likely to help. But it would require some administrative artistry to get the right privileges for that new account. I have not yet decided if I really want to do that, but basically it is a possible solution.
mediawiki2latex now with sidebar integration
mediawiki2latex can now easily be integrated into the sidebar on any MediaWiki installation. Just copy the code in https://en.wikibooks.org/wiki/User:Dirk_H%C3%BCnniger/common.js to the common.js in your user namespace on your wiki. Of course, you may also integrate it globally by modifying MediaWiki:Common.js on your site.
mwlib.pdf renderer available on Github
After some cleanup, we have just put the `mwlib.pdf` MediaWiki-to-PDF renderer on our GitHub account: https://github.com/pediapress/mwlib.pdf
Unfortunately, the renderer still requires Python 2.7 because it depends on mwlib. Once mwlib has been upgraded to Python 3, it won't be difficult to upgrade the renderer as well. But as you will see, the renderer still requires substantial work.
It's been a very long time since PediaPress released a new renderer so it's very possible that some elements might not work. Please file a bug or share your feedback here if you run into problems.
PDF download is greyed out
I did read the warning at the top, but the update said it was working and deployed, so I went ahead and made a book.
I can't choose a download format (greyed out), so I can't d/l a pdf (also greyed out).
And I can't save all my work to my user location of User:Gemlog/Books/ nor to https://en.wikipedia.org/w/index.php?title=Special:PrefixIndex&prefix=Book:
Both produce an API error.
[Xb@-swpAIC4AALGGnJMAAAAS] 2019-11-04 06:05:39: Fatal exception of type "ApiUsageException"
I can, of course, give money to PediaPress. That link works perfectly and the books look amazing.
It would be a wonderful thing if the pdf worked like the July 2019 note says though...
Any rendering functionality of books or collections to any downloadable format has been decommissioned. Any funds for development of a replacement or repair of any such functionality have been withdrawn. To put it in the German dialect used by the miners in the area where I live: "Et is im Aasch" (roughly: "it's all gone down the drain"). I am trying to develop a free alternative in my free time without any funding. https://mediawiki2latex-large.wmflabs.org/ Good luck!
Thank you very much for replying to me!
The note to the right of this page is extremely misleading to say the least. Well. Now I know not to bother.
However, I may have just learned of a new tool! So there's that :-)
KDE Neon can't find wb2pdf with apt, but I'll find it.
The page pdf renderer has been updated and deployed, the Book pdf renderer has been decommissioned. On a Book page this can be misleading, as the "Download as PDF" link only downloads the page and not the whole book. On the other hand, it should not be greyed out and you should also be able to save your new page to your user pages or the Book: namespace as desired.
If your experience differs from this, can you give more precise details?
Another volunteer is writing a new Book pdf renderer and says they will release it as open source for us, but we have been waiting a long time.
I pasted the errors I received into the first post I made ;-)
Also, I see that the misleading box on the right of this page that I was referring to is now gone, so... yay :-)
Still we need more precise information. I cannot find a book "PDF Download" option you say is greyed out. Can you give the url of the page you see it on? Or, is it the "Download as PDF" Print/export option in the lefthand menu (which is for article download, not whole books)? Was it perhaps in the strange misleading dialog that vanished? If you do not tell us accurately where it is, we cannot diagnose it for you!
Again, when you received the error message you pasted, was this in the Book Creator when you tried to save the book? I just created and saved a new book and it all worked fine. Did you add any extra code to your book, such as chapter headings or meta-information? If you post a list of the articles in your book, I can try to see if it will work for me.
In Book Creator, there is a "PDF Download" option in a box to the lower right that is greyed out and cannot be used. There is really no simpler way to explain it.
Do you mean the "Download" box which offers several formats besides PDF? In English, quote marks indicate exact wording. Yes, as I explained above, that is meant to be greyed out.
Otherwise, please post or email me a screenshot to show the option I am not seeing on my PC.
In English, superfluous pedantry is insulting. Please insert that in your "Download" box. Thank you.
My apologies, no insult is intended. I suppose that my approach to problem diagnosis is highly pedantic, but I get better results that way. May I take it that you have no remaining problem with this software that needs to be diagnosed?