Talk:Reading/Web/PDF Functionality


About this board

About giving feedback

Please read Reading/Web/PDF Functionality and comment on the plans we lay out there, to tell us what you need from the PDF service. We're especially interested in what you need in the future that doesn't exist in the plans laid out there – if there's a bug with something that should work right now (e.g. you get an error message when you try to create a PDF), we need to fix it, of course, but that would be on the agenda anyway.

Update: (23 April 2018) PediaPress will take over the development of the books-to-PDF functionality. See Reading/Web/PDF Functionality for more information.

Updates: (24 February 2018)

- Kerning and spacing issues (https://phabricator.wikimedia.org/T178665): there have been a few reports of spacing issues within PDF rendering. The Readers Web team is currently looking into a solution. We will first update the fonts for PDFs (https://phabricator.wikimedia.org/T181200) over the week of November 27. This will resolve some, but not all, of the spacing issues. We'll look further into the remaining issues after the initial fix.

New Render Server for PDF generation

Ckepper (talkcontribs)

Please check out a new test server for PDF generation hosted by PediaPress. It uses a new renderer that we have been working on for quite some time and allows you to create multi-article PDFs. Currently, you can create a collection from one of the following projects: Simple English Wikipedia and the EN, DE, ES, FR, IT, NL, PT, and SV Wikipedias. Other projects can be added on a per-request basis. After selecting a wiki you can start adding articles. Open https://pediapress.com/collector/ again to select a different wiki.

Please note that this project is still in alpha-state. This means that we haven't tested too many articles and the rendering of an article might fail for non-obvious reasons with no clear error messages.

Also, rendering is far from perfect. You will very likely find lots of things you don't like (but maybe also a few things you do). Please share your experiences and let us know about the most glaring problems you encounter.

Br shadow (talkcontribs)

I still cannot understand why you guys would spend so much time creating this functionality when I could simply print the article page as a PDF using my browser... The only thing that would be useful (and actually VERY useful, but you chose to remove it) is a double-column PDF rendering service...

Brentl999 (talkcontribs)

To your question about why time is being spent on rendering versus printing from a browser tab: printing from a browser to PDF does not generate a "book" (cover, table of contents, paging, etc.). I depend on book generation to produce print-ready documents/manuals from my wiki. Therefore, I sit on the opposite end of the question: I can't figure out why such a critical component has not been provided for 2 years.

M2k~dewiki (talkcontribs)

Hello Ckepper, great job! Thanks for all the effort and hard work you put into this; I really appreciate it. Especially since I understand how difficult it must be to find a general approach to render and display all possible kinds of layouts. One problem I find in this alpha version is that some images are not displayed, for example when trying to create a PDF for de:Landesregierung Mikl-Leitner I or de:Verena Altenberger. Also, some rows of the table in the first example are omitted/missing in the PDF. In the final version, I would like to be able to create a PDF for a predefined book, like it was possible in the previous version, for example for de:Benutzer:M2k~dewiki/Bücher/Ausgewählte_Beiträge_und_Bearbeitungen respectively de:Kategorie:Wikipedia:Bücher.

Hello Br shadow, in previous versions it was possible to generate and download a whole book consisting of several articles grouped into chapters, not only one single article. For examples see de:Wikipedia:Bücher / de:Kategorie:Wikipedia:Bücher.

Ckepper (talkcontribs)

Thank you for your feedback M2k~dewiki, just a few notes:

  • The renderer respects the license information in the images as accurately as possible. Version 4.0 of the CC licenses is not yet listed in mwlib and therefore those images are rejected. It should be fairly easy to add those licenses.
  • Chapters will be fully supported. It's currently just not possible to add them in the prototype frontend. Here is an example of chapters. Chapters are currently shown only in the TOC and not in the rendered article pages. They will be presented there as well, once the design for chapters has been finalized.
  • The renderer is built on the existing format of Wikipedia Books (Collection Extension). All existing collections (books) should work without modification in the new renderer. If you are interested in a particular sample, I can render it offline and share a link.
  • Our goal is a direct integration of the renderer in the collection extension so that you can choose "Download PDF" from the "Manage your book" page again and don't have to go through any PediaPress pages.
Dirk Hünniger (talkcontribs)

Standard announcement: all rows and all images in both articles work with http://mediawiki2latex.wmflabs.org/ . For Ausgewählte_Beiträge_und_Bearbeitungen you won't be able to get any result with the mediawiki2latex web interface because of the time limit of one hour, but I started the command line version like this.

mediawiki2latex -k -u https://de.wikipedia.org/wiki/Benutzer:M2k~dewiki/B%C3%BCcher/Ausgew%C3%A4hlte_Beitr%C3%A4ge_und_Bearbeitungen -o dirk.pdf

We will see what comes out. Yours Dirk

Brentl999 (talkcontribs)

Hello Dirk and Thank you. I'm getting ready to install your project and test it.

I have a couple of questions:

- Is this project to be the official backend for "Download to PDF" book generation that ships with MediaWiki at some point in the future? Or is this your effort to fill the current gap in MediaWiki's book generation system?

- Does the project support the historical book, chapter, etc. collection format saved by previously working versions of MediaWiki? Or do I need to recreate my book structures?

Thank you. Brent

Dirk Hünniger (talkcontribs)

Hi,

I think it will never be the official backend, and I was hoping that an official backend might be developed. But for many years that hope has not turned into reality, although there were two or three official attempts to redevelop an official backend, with different technologies each time. I don't know the future, but it is very well possible that mediawiki2latex will fill the gap for many years to come. Nevertheless, mediawiki2latex is fully open source and it could in the end turn into the official backend.

You can run mediawiki2latex on a collection like this:

mediawiki2latex -u https://en.wikipedia.org/wiki/Book:River_martin -o output.pdf -k

The -k switch is for collections. It essentially follows all links in a non-recursive manner. I hope this is enough for your case, but I could still extend that if you need more. Still, I have a full-time job and won't be able to react quickly.

Yours Dirk

Brentl999 (talkcontribs)

Thank you. I am learning more as I dig into the install. If I can contribute to the effort, I will (it has only been about 30 years since I have seriously used LaTeX). Is this a good forum for asking questions? Currently I'm working through LaTeX dependencies for CentOS 7, which I hope to post for anyone else doing a CentOS 7 compile. I just want to be sure I'm placing my comments, feedback, etc. in an appropriate place. Brent

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

Thank you. I'm working through it. The ghc and cabal dependencies are taking me some time. Most components are available in the EPEL repository, but some I'm having to download and build.

Dirk Hünniger (talkcontribs)

Hi,

you can also build with "cabal install". This should install all build time dependencies automatically.

Yours Dirk
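A minimal build sequence along the lines Dirk suggests, assuming you are in a checkout of the mediawiki2latex source tree (exact steps vary with the cabal-install version):

```
cd mediawiki2latex   # directory containing mediawiki2latex.cabal
cabal update         # refresh the package index
cabal install        # resolve and build all build-time dependencies
```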

Brentl999 (talkcontribs)

Yes, I figured that out. The version of cabal that is in the CentOS 7 EPEL repository precedes 1.18.2 and that side-tracked me for a while. Currently I'm stuck with http-client-0.5.12 not installing:

[ 7 of 19] Compiling Network.HTTP.Client.Types ( Network/HTTP/Client/Types.hs, dist/build/Network/HTTP/Client/Types.o )

Network/HTTP/Client/Types.hs:339:73:

    No instance for (Semigroup Builder) arising from a use of `<>'

    Possible fix: add an instance declaration for (Semigroup Builder)

    In the second argument of `RequestBodyBuilder', namely `(x <> y)'

    In the expression: RequestBodyBuilder (i + j) (x <> y)

    In a case alternative:

        (Left (i, x), Left (j, y)) -> RequestBodyBuilder (i + j) (x <> y)

Failed to install http-client-0.5.12

Brentl999 (talkcontribs)

ghc --version :

The Glorious Glasgow Haskell Compilation System, version 7.6.3


cabal --version:

cabal-install version 1.16.1.0

using version 1.16.0 of the Cabal library

Dirk Hünniger (talkcontribs)

Hi,

this is a problem in a library that I have as a dependency. You could ask its maintainer to resolve the issue. Another thing you can try is to modify the mediawiki2latex.cabal file and try to force it to use another version of http-client. The dependency might also be transitive; that is, mediawiki2latex depends on something that depends on http-client. Here on Debian everything worked fine with cabal.

Yours Dirk
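Dirk's suggestion can also be tried without editing the .cabal file, via cabal-install's --constraint flag (the version bound here is illustrative; pick one that supports your GHC):

```
cabal install --constraint="http-client < 0.5"
```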

Brentl999 (talkcontribs)

What versions of ghc and cabal-install are provided on your version of Debian? Thank you. Brent

Brentl999 (talkcontribs)

I ended up pulling ghc 8.0.2 and cabal-install 1.24.2.0 from this repository: petersen/ghc-8.0.2 Copr repo (EL7)

And I did get mediawiki2latex to compile.

There is some incompatibility between several mediawiki2latex dependent components and the ghc available via the CentOS 7 repository.

Dirk Hünniger (talkcontribs)

Hi,

Congratulations. Now you just need to install the runtime dependencies. I got cabal --version => cabal-install version 1.24.0.2 and ghc --version => The Glorious Glasgow Haskell Compilation System, version 8.0.2. So that seems to be just the same as what you used now.

Yours Dirk

Dirk Hünniger (talkcontribs)

Hi,

runtime dependencies are: librsvg2-bin, imagemagick, fonts-freefont-ttf, texlive-xetex, texlive-latex-recommended, texlive-latex-extra, texlive-fonts-recommended, texlive-fonts-extra, cm-super-minimal, texlive-lang-all, poppler-utils, lmodern, texlive-generic-recommended, latex-cjk-common, fonts-cmu, ttf-unifont, fonts-wqy-zenhei, calibre, latex2rtf, libreoffice

Yours Dirk
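On Debian/Ubuntu, the list above can be installed in one go (these are Debian package names; on CentOS the names and availability differ):

```
sudo apt-get install librsvg2-bin imagemagick fonts-freefont-ttf \
  texlive-xetex texlive-latex-recommended texlive-latex-extra \
  texlive-fonts-recommended texlive-fonts-extra cm-super-minimal \
  texlive-lang-all poppler-utils lmodern texlive-generic-recommended \
  latex-cjk-common fonts-cmu ttf-unifont fonts-wqy-zenhei \
  calibre latex2rtf libreoffice
```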

Dirk Hünniger (talkcontribs)

Hi,

If mediawiki2latex already compiles, then you should be able to get a LaTeX document using the -c command line option, even if the runtime dependencies are not yet fulfilled. You can then try to make that LaTeX document compile. In this process you will install most runtime dependencies.

Yours Dirk

Brentl999 (talkcontribs)

Hello Dirk,

I am running the test case:

mediawiki2latex -u https://en.wikipedia.org/wiki/Book:River_martin -o output.pdf -k

I am getting an error, here is the last few lines of the messaging provided by mediawiki2latex:

mediawiki2latex (1547481318.678343827s): generating PDF file. LaTeX run 1 of 4

mediawiki2latex (1547481319.864715348s): generating PDF file. LaTeX run 2 of 4

mediawiki2latex (1547481321.058693473s): generating PDF file. LaTeX run 3 of 4

mediawiki2latex (1547481322.189036007s): generating PDF file. LaTeX run 4 of 4

mediawiki2latex: main.pdf: openBinaryFile: does not exist (No such file or directory)

Any suggestion on how to fix this?  The process runs for some time before getting this error.

Thank you.  Brent

Dirk Hünniger (talkcontribs)

Hi,

apparently no PDF file was produced. Possibly xelatex is not installed, or there is something wrong with the fonts (not installed, or not in the same paths as on Ubuntu).

You could try:

 mkdir test
 mediawiki2latex -u https://en.wikipedia.org/wiki/Book:River_martin -o output.pdf -k -c test

and then have a look if any files were created in the test directory.

During the weekend I was able to reduce memory consumption. So possibly I can set up a test server for up to 800 pages later this week.

Yours Dirk

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

xelatex is installed. When I run this in test mode, I get a directory structure within "test" that includes some 41 files. Is there anything I should specifically be looking for?


CentOS fonts go into /usr/share/fonts. Specifically, unifont goes into /usr/share/fonts/truetype, which appears to be the same path that BaseFont.hs requires. Clearly I could have missed something here, but I did review/follow the font install procedure.


Ultimately I need to have my own instance of this working. The PDFs I generate contain intellectual property and therefore cannot be rendered by a 3rd party. I have had the process working on MediaWiki 1.15.1 for many years, but an upgrade to that environment is long overdue. So that is the background on why I'm pursuing this.


Thank you

Dirk Hünniger (talkcontribs)

Hi,

In the created directory there should be a file called main.tex. Go to the directory where it resides and run

 xelatex main.tex. 

I suspect that xelatex will show some error messages.

Yours Dirk

Dirk Hünniger (talkcontribs)

Oops, the command is

 xelatex main.tex
Brentl999 (talkcontribs)

Okay thanks. I'm on my way again. It appears that there are a few LaTeX styles that are not packaged with CentOS 7. I'm installing them one at a time as I rerun xelatex. Brent

Brentl999 (talkcontribs)

After several iterations of dealing with xelatex errors, I removed the CentOS 7 texlive package and ran the install via http://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz.


The result is that I was able to generate the "River Martin" test case of 83 pages. However, I did have to bypass a number of errors which I do not believe relate to my install, so I'm not sure what the optimal way is to deal with them (e.g. adding the enabledeprecatedfontcommands option to the documentclass in main.tex). Is there a version of this book published that I can validate my output against?


Overall, thank you. I will create a CentOS install section when I feel like I have mediawiki2latex working optimally. In general, I could have saved a lot of time by ignoring the CentOS repositories for dependencies and installing the appropriate components from source (e.g. ghc, cabal-install, texlive).


Brent

Dirk Hünniger (talkcontribs)

Hi, you can try with the web interface and compare with that; you can also get a TeX source zip file from there. Otherwise you could try with current Ubuntu. Some errors will remain in the LaTeX document (I did not fix all of them), but it should still work with --interaction=nonstopmode.

Yours Dirk

Brentl999 (talkcontribs)

Thank you Dirk. My table of contents was not displaying because I was missing GNU FreeFont. With the addition of that font, I have generated a matching River martin PDF.


If it is okay with you, I'll add a CentOS 7 install section to: https://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf


It won't be "complete" because I would have to start with a fresh CentOS 7 install and do a reinstall to formally prove the steps, but it will be better than having no reference steps at all. I might also suggest a section for diagnostic tips and a validation section using the River martin collection.


Brent

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

Hello Dirk. I'm testing one of my books with mediawiki2latex. Although I can get the template to load, mediawiki2latex is making some assumption about the URL prefix to the page that isn't working in my environment. I think I can work around this, but it would be helpful if you could explain how mediawiki2latex builds the URL of each page request from the template. Thank you, Brent.

Dirk Hünniger (talkcontribs)

Hi,

If you are converting the page

wiki.org/foo/bar/MyPage

then the MyPage part is called a lemma. If MyPage includes a subpage or image called MySubPage, then MySubPage is a lemma too. It will be looked up at the following URLs:

wiki.org/foo/bar/MySubPage
wiki.org/foo/MySubPage
wiki.org/MySubPage

For images, Wikimedia Commons will also be considered.

Yours Dirk
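The lookup order Dirk describes can be sketched as a small function that tries the sub-lemma at successively shorter prefixes of the original page path (the function name and structure are illustrative, not mediawiki2latex's actual code):

```haskell
import Data.List (inits, intercalate)

-- Candidate URLs for a sub-lemma, longest path prefix first, e.g.
-- wiki.org/foo/bar/MySubPage, wiki.org/foo/MySubPage, wiki.org/MySubPage
candidateUrls :: String -> [String] -> String -> [String]
candidateUrls host pathParts lemma =
  [ intercalate "/" (host : prefix ++ [lemma])
  | prefix <- reverse (inits pathParts) ]
```

For images, a real implementation would additionally append Wikimedia Commons to the candidate list, as Dirk notes.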

Brentl999 (talkcontribs)

My issue is with the URL prefix "http://myabc123.com/wikilocation". mediawiki2latex gets my template, but is incorrectly generating URL requests for the pages because it is prepending the wrong HTTP prefix. If you can direct me to the source for this part of mediawiki2latex, I can probably figure it out.


To be specific, my template is: http://127.0.0.1/index.php/MyBookTemplate

The template lists page names in the "old" book format:

   [[PageRef1]]
   [[PageRef2]]
   [[PageRef3]]

For each page in my template, mediawiki2latex seems to make two attempts to get the page.

  1. /wiki/http://127.0.0.1/[page-ref1-from-template]
  2. /index.phpwiki/http://127.0.0.1/[page-ref1-from-template]

In order for the requests to succeed they need to be:

/index.php/[page-ref-from-template]

I have tried the "--bookmode" command line option and this does not change the page requests.

Dirk Hünniger (talkcontribs)
This post was hidden by 105.66.129.53 (history)
Brentl999 (talkcontribs)

Hello Dirk. I did get a test to run. I do have a couple of issues.

- It takes quite some time to run. When I tried to run it in the background I got an error. Here is the tail of log:

mediawiki2latex (1547934849.177406998s):(547,"File:Publication_tab.png|90px")

mediawiki2latex (1547934871.066911492s):(548,"File:ServiceReport_05.png|617px")

mediawiki2latex (1547934893.076331817s):(549,"File:Change_Agency.png|293px")

mediawiki2latex (1547934914.940887587s): precompiling table columns

mediawiki2latex (1547934914.940955355s): number of columns to be compiled: 25

mediawiki2latex (1547934914.941277373s): precompiling column number 1

mediawiki2latex: <stdin>: hWaitForInput: invalid argument (Bad file descriptor)

- When I run it in the foreground output ends with:

mediawiki2latex (1547987410.636314299s): generating PDF file. LaTeX run 1 of 4

Since I am running in test mode, I went into the testdir/document/main and generated a pdf via xelatex.

Thoughts? I'm not seeing any resource issues on the server that would otherwise explain the process ending.

Can you tell me how to disable lists of figures?

Thank you. Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

I just tested in background with

 mediawiki2latex -u https://de.wikibooks.org/wiki/Handbuch_Open_Science/_Rechtswissenschaft -o dirk.pdf &

so the & character did make it run in the background, and everything went fine here.

For the abort in the foreground run I have no explanation. LaTeX should be running and you should see it with ps -xaww. It is strange to me that you could get a PDF with xelatex in testdir/document/main; essentially, mediawiki2latex just runs xelatex in this particular step. I had some weird issues that caused similar problems with pipes on Windows, but that is really far-fetched. So for now I suggest that you run mediawiki2latex with the -c option and then run xelatex in that directory to get the PDF, and possibly write a script to automate that process.

It is normal that it takes quite some time to run. I disabled multithreading since I found that MediaWiki does not like too many requests too fast. Also, the xelatex compile step accounts for a considerable amount of runtime, which is irreducible, so there is not much I can do about the runtime. Also, determining the list of contributors and the list of figures for proper attribution takes quite some runtime. This is necessary for open-source-licensed content because of the terms of the licenses. You could try to remove that; it will be a solvable but not an easy task, and it will require skill in functional programming in Haskell.

For the list of figures I suggest editing a function in All.hs:

 jjoin :: String -> String -> String
 jjoin theBody listOfFiguers
   = ((toString (latexHeader)) ++
        theBody ++ listOfFiguers ++ (toString latexFooter))

Yours Dirk
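Presumably the edit implied is to drop the listOfFiguers argument from the concatenation. A self-contained sketch with stand-in header/footer strings (in mediawiki2latex these come from latexHeader and latexFooter in All.hs, which are not reproduced here):

```haskell
-- Variant of jjoin that omits the list of figures from the output.
-- latexHeader/latexFooter below are stand-ins for the real values in All.hs.
jjoinNoFigures :: String -> String -> String
jjoinNoFigures theBody _listOfFiguers =
  latexHeader ++ theBody ++ latexFooter
  where
    latexHeader = "\\begin{document}\n"  -- stand-in
    latexFooter = "\n\\end{document}\n"  -- stand-in
```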

Dirk Hünniger (talkcontribs)

Hi Brent,

if you need more complex changes made to mediawiki2latex, I suggest contacting Henning Thielemann (PhD); he has already worked on mediawiki2latex and has more time available than me. As I said, I have a full-time job and can only support you in the way I currently do.

Yours Dirk

Brentl999 (talkcontribs)

Okay I understand Dirk. I can run mediawiki2latex in the background for the River_martin case that you gave me. But for my real book, I get that error as I mentioned. I understand that you have limited time and I do appreciate your help. Thank you again. Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

you could try something in All.hs

replace

mysystem :: String -> IO ()
mysystem x
 = if os == "linux" then
     do (_, o, e, h) <- runInteractiveCommand x
        ex h o e
        return ()
     else
     do _ <- system x
        return ()

with

mysystem :: String -> IO ()
mysystem x = do
  _ <- system x
  return ()

Yours Dirk

Brentl999 (talkcontribs)

Thank you. I am doing further testing; the background issue seems to be tied to some combination of command line options. When I have narrowed the issue down I can pass that on, and can give feedback on your suggested code change.

With respect to the list of figures, I didn't explain this very well. I'm looking to eliminate the footnotes/references that are automatically populated from the bottom up on each page. The automated footnoting feature is using a quarter to half of each page.

Again, I understand your limited availability. I am certainly willing to dig into code, I appreciate being pointed in the right direction.

Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

for the list of figures, you can look at makeImgList in All.hs.

Here, look at the part with the string "\\href{". That way you should be able to remove the links.

Yours Dirk

Brentl999 (talkcontribs)

Dirk, is there a means to specify chapters? That is to say, a way to:

- group Wiki page references in a way that provides sequential chapters in the table of contents

- provide a chapter heading prior to the pages beginning each chapter

- outline numbered Wiki sections with the chapter number as a prefix.

For example, the book generation specification that I have been using in MediaWiki (prior to it being removed from MediaWiki) provided for a syntax:

;Chapter Heading A

:[[Page link 1]]

:[[Page link 2]]

;Chapter Heading B

:[[Page link 3]]

This would result in two chapter headings with section references 1.1, 1.2 and then 2.1.

Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

this feature has not been implemented yet. A starting point is the function runBookActions in the module Load.hs.

Currently I don't have a timeslot available to do this task, but you are welcome to do that yourself or let it be done by Henning.

Yours Dirk

Dirk Hünniger (talkcontribs)

Furthermore, the paths to the fonts are hardcoded in the module BaseFont.hs. They might be different on CentOS, so be sure to adjust them.

M2k~dewiki (talkcontribs)

Hello Dirk, thanks for the information. I tried to use the above command (current Ubuntu release, mediawiki2latex version 7.30), but it seemed to use up all CPU/memory, so I had to reset my computer in order to be able to work again. I also tried "mediawiki2latex -u https://de.wikipedia.org/wiki/Landesregierung_Mikl-Leitner_I -o wiki.pdf" to create a PDF for only that single article, but the generated PDF did not include the table and included only one single image. With the additional option "-k" (which is not documented in the man page installed on my computer, but only in the command-line help/usage message) the program also seems to follow all categories, the articles in those categories, and links included in the initial document/URL, so it seems it does not terminate. The online version http://mediawiki2latex.wmflabs.org/ includes the table and the images for de:Landesregierung Mikl-Leitner I, but they are quite small (the whole table is scaled down to less than half a page), so it is hard to read the text or to recognize anything in the images.

Dirk Hünniger (talkcontribs)
Dirk Hünniger (talkcontribs)

Hi M2k, indeed the tables don't work properly in 7.30 any more. This is because in default mode it processes the HTML generated by MediaWiki, which changes frequently, so I have to fix it in a new version of mediawiki2latex, which will only arrive in the next release of the operating system. So the only thing you can do about it is to update mediawiki2latex locally by following the instructions at https://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf for Debian. The -k command line option indeed follows all links, but not recursively. It is only useful with collections like the ones found in the book namespace. The table created by the online version when processing de:Landesregierung Mikl-Leitner is indeed too small. This happens when trying to fit the table onto the page. You can still use mediawiki2latex to get a zip archive of the LaTeX code and change that manually. For de:Benutzer:M2k~dewiki/Bücher/Ausgewählte_Beiträge_und_Bearbeitungen I had to stop the processing since I sleep in the same room where my computer is located and I cannot sleep while it is running because of the noise. So I restarted today and currently it's using 23 GByte of RAM. We will see. It already told me that it is going to download 1380 images.

Steelpillow (talkcontribs)

Thank you!

I created Collection ID: 6997073ff4f8d484 containing two articles; Wing configuration, and Topology.

The big, BIG problem is that I cannot add Chapter headings. This means that almost all books already created on Wikipedia will not render in usable form. This absolutely MUST be put right before the new builder can go live.

Other significant issues include:

  • Image sizes vary greatly, for example the lead image for Wing configuration is rendered huge but the one in Topology is much smaller and more sensibly sized. If you want to keep all the text away from the right hand side to make room for images and boxouts, then you should also narrow all the right-aligned images and boxouts to keep them out of the text column too.
  • Font size for the body text is still too small for a page which does not use columns, while heading font sizes are too large, creating much too wide a range of sizes for easy readability. It looks more like a cheap newspaper than an encyclopedia.
  • Tables which are centered in the Wiki page should also be centered in the rendered text column. A lot of them, though not all, appear to have been left-aligned.
  • Template:Commons is still being included. It should be left out.

Once the Chapter headings problem is fixed and the code is stable enough not to crash too often, then I think it should be made live as soon as possible; we are just so desperate to get our old books updated and available again. The other points can come later.

Your test service user interface seems to have changed and no longer allows me to build a fully custom collection; I trust that is deliberate.

One other thing. If we editors add bad markup code to our pages, your only problem should be to keep the software running. Fixing unprintable pages is our problem, not yours! Do not waste your valuable time working around our carelessness. We have help forum pages and such here to help us do that.

Ckepper (talkcontribs)

Thanks for your feedback, @Steelpillow!

  • As I noted earlier, chapter headings are still supported, they are just missing from the test frontend and not yet rendered in the article output.
  • Image sizes are approximated based on the size and resolution of the images in the article. The goal is to show images at roughly the same size they appear in the article, but the algorithm obviously needs more tweaking. Images should not be confined to just the right column.
  • Body font size is currently at 9pt with 16pt line-spacing which is a pretty common size for textbooks. It's easy to adjust but so far you are the only person who complained about this. Maybe other people can chime in on this issue.
  • I can check centered tables again. It might also depend on the markup being used. Do you have a particular example in mind?
  • It shouldn't be too hard to remove Template:Commons. I will look into that.
  • The test UI has NOT changed and certainly not been deliberately limited. If this occurs again, please try to open a fresh collection by opening https://pediapress.com/collector again.
  • Bad markup is indeed a problem. If something looks wrong or broken on a particular page, it's not trivial for me to distinguish between broken markup or broken renderer. It helps to get reports of good markup that is not rendered correctly. If I could wish for one thing it would be more semantic markup...
Steelpillow (talkcontribs)

Thank you too for the quick response.

  • I missed your earlier post about chapter headings, I must pay more attention.
  • But also, I think you may have missed several posts on this page about font size; I am not the only one who has pointed it out. I do not know how to link to their posts, but at least two others - Szvacek1 and Salino01 - have made the same comment. Yes, 9pt is often used, but only in smaller page sizes than A4 or in a narrower multi-column format. I have never seen it in wide columns like these ones here. Perhaps I do not read the right textbooks, but I read plenty of others.
  • In the article on Wing configuration, many of the images are grouped in tables which are centre-aligned. However in the pdf they are all right-aligned.
  • Ah, my problem with the test UI was that I had selected the Simple English Wikipedia by mistake and it could not find my articles! But now I have another problem. I did not save a copy of Collection ID: 6997073ff4f8d484 which comprises Wing Configuration and Topology. When I try to recreate it, your server gives me the same collection ID as before but then gives an Error 500 Internal Server Error, "Sorry, the requested URL 'http://tools.gke.pediapress.com/?command=download&writer=html&collection_id=6997073ff4f8d484' caused an error:"
  • Do you take the article as raw Wikitext or as HTML output? If it is HTML, could you pass it through HTML Tidy or something before rendering? ISTR that used to be a config option for the mediawiki server.
Steelpillow (talkcontribs)

To add to my previous comment on concerns about font size, Gpc62 also made this point in a short conversation with Ckepper on this talk page about 4 months ago. Given the limited overall number of comments made on these drafts, the four of us who have complained about the small font size do represent a significant level of unhappiness. In many areas of life, 10pt is regarded as the minimum readable for full-width text of A4 or US letter pages: there is a reason for that view, there really is.

Dirk Hünniger (talkcontribs)

The calculation of mediawiki2latex to generate a PDF of de:Benutzer:M2k~dewiki/Bücher/Ausgewählte_Beiträge_und_Bearbeitungen has terminated successfully. I used the command line given above. I put the PDF on Google Drive; it's here:

https://drive.google.com/file/d/1SA6TEKWrdpXAxDyHZe-umBa2cJ5Ya77X/view?usp=sharing

It took a bit less than 9 hours and about 24 GByte of RAM on an i3-4330. It's a bit more than 5000 pages long, and about 600 MByte in size. To me it is an interesting test case showing that mediawiki2latex can handle books of very large size. I will soon upload the LaTeX source so you can split it into volumes in case you want to print and bind it. Yours Dirk

M2k~dewiki (talkcontribs)
Dirk Hünniger (talkcontribs)
Straw17 (talkcontribs)

One minor problem I've noticed with the PDF generator is that some text tends to go over the edge of the page.

For example: https://imgur.com/gRzj891 (the right side of the image is the right side of the page).

Ckepper (talkcontribs)

Yes, it seems like the problem stems from the infobox width being given in `em` units. I will investigate further and try to fix it.

Theklan (talkcontribs)

Thanks for the update. This service should be available for ALL the languages, automatically. That's the purpose of the WMF.

Reply to "New Render Server for PDF generation"
Steelpillow (talkcontribs)

The headline task was closed last summer. Is this issue fully resolved, or is some other task still open for it?

Reply to "Kerning and spacing update"
Bestoernesto (talkcontribs)

Now, for the third time, I have downloaded and printed a DeWP article as PDF without any problems. The download felt a bit faster than the average for other PDF files of the same size. The page formatting leaves nothing to be desired. (However, it would be an advantage to print a page number on each page.) The marked links in the text also work. When printing, I get exactly what I see on the screen. (Win7 Prof. 64-bit, Firefox 56.0.2 64-bit, Adobe Acrobat Reader DC)


Reply to "What problems??"
Aliasabuhanifah (talkcontribs)

These 'developments' are scams. The head developer is affiliated with pediapress.com (a for profit company).

The longer these "developments" take place, the more people will use their paid services!

Conflict of interest. No free stuff, people will pay!

Dear pediapress.com, tell me that your sales have not quadrupled since the PDF service shut down.

Corrupt.

Steelpillow (talkcontribs)

That is childishly silly. The project spent several years failing badly, before PediaPress offered to help us out. The new code is open licensed. By doing this they are actually reducing their potential to make money off their PoD service. They are good people. To be perfectly clear to you, I have no business relationship with PediaPress, other than buying a few printed books off them. I hope they did make a little beer money!

Dirk Hünniger (talkcontribs)

Hi,

I kind of see that there is a conflict-of-interest problem with that too. And mixing computer science with w:Token economy seems like a generally bad choice to me. So I keep http://mediawiki2latex.wmflabs.org/ updated to provide an alternative export to PDF, EPUB and ODT, fully open source and available on Debian. It will be interesting to see how both projects evolve in the future.

Yours Dirk

Johan (WMF) (talkcontribs)

a. Concepts like "assume good faith" and "civility" are at the core of what we do for a reason. Please don't poison the discussion climate. It is perfectly possible to discuss potential conflict of interest in a way that is in line with Wikimedia norms and expected behavior.

b. A fair chunk of the development time lies at the feet of the WMF: when the old solution was breaking down and the new renderer couldn't effectively handle collections, the Foundation, looking at the number of people who were using the books-to-PDF solution, couldn't justify taking people away from other projects to work on it.

c. I suspect you vastly overestimate the long-term financial viability of discouraging the use of collections if one's business model is printing collections of articles. The typical reaction to not being able to generate a PDF in the way one had hoped is to not generate the PDF. Printing a book is rarely a reasonable alternative to downloading a PDF. PediaPress stepped in because they want this to work.

d. The developers have to do other work that's actually putting food on the table.

Johan (WMF) (talkcontribs)

TL;DR: This isn't a scam; it's the result of the WMF prioritising work on more widely used functions, leaving this to volunteer developers related to PediaPress.

Reply to "Scam."
85.194.79.90 (talkcontribs)

Hi, I have an idea for making PDFs easily: use JavaScript rather than PHP on the server side.

Using JS to make the PDF is a great idea because there is no load on the server (I am a programmer, I know: to make a file server-side you have to save it and then send it),

but in JS you can build it as an object and then render the file,

so take a look at the URL[1].

Goodbye.
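The client-side idea above can be sketched as follows. This is a hand-rolled, minimal one-page PDF built entirely in memory as a hypothetical illustration; a real implementation would use a library such as jsPDF to handle fonts, text encoding and layout.

```javascript
// Build a minimal one-page PDF entirely in memory, with no server
// round-trip. Sketch only: ASCII text, a single page, one built-in font.
function makeMinimalPdf(text) {
  const objects = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] " +
      "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
    null, // content stream, filled in below
    "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
  ];
  const stream = `BT /F1 12 Tf 72 720 Td (${text}) Tj ET`;
  objects[3] = `<< /Length ${stream.length} >>\nstream\n${stream}\nendstream`;

  // Serialize the objects, recording byte offsets for the xref table.
  let pdf = "%PDF-1.4\n";
  const offsets = [];
  objects.forEach((body, i) => {
    offsets.push(pdf.length);
    pdf += `${i + 1} 0 obj\n${body}\nendobj\n`;
  });
  const xref = pdf.length;
  pdf += `xref\n0 ${objects.length + 1}\n0000000000 65535 f \n`;
  for (const off of offsets) {
    pdf += String(off).padStart(10, "0") + " 00000 n \n";
  }
  pdf += `trailer\n<< /Size ${objects.length + 1} /Root 1 0 R >>\n` +
         `startxref\n${xref}\n%%EOF`;
  return pdf;
}
// In a browser, the string could be offered as a download without any
// server involvement, e.g.:
//   const blob = new Blob([makeMinimalPdf("Hello")], { type: "application/pdf" });
//   const url = URL.createObjectURL(blob);
```

This only demonstrates the "build in an object, then render the file" idea; rendering a full wiki article (images, tables, Unicode) is where the real complexity lies.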

Steelpillow (talkcontribs)

The codebase is too big to download and run in a web browser. It is probably bigger than most books. Creating a whole book could cripple the user's device for an hour or more.

This post was hidden by Steelpillow (history)
Reply to "some good to make pdf"

MediaWiki2LaTeX rendering Server for large documents

1
Dirk Hünniger (talkcontribs)

Hi,

during the weekend I was able to reduce the memory consumption of mediawiki2latex significantly. I set up a new server with a maximum runtime of 4 hours per request. Now nearly every community-maintained book on the English Wikipedia should compile.

http://mediawiki2latex-large.wmflabs.org

Yours Dirk

Reply to "MediaWiki2LaTeX rendering Server for large documents"

My book has disappeared. I now have zero pages. anybody know what happened? Can I retrieve my pages?

9
Ascjames (talkcontribs)

My book has disappeared. I now have zero pages. anybody know what happened? Can I retrieve my pages?

Johan (WMF) (talkcontribs)

Hi @Ascjames, where did you previously save the book?

Ascjames (talkcontribs)

Thanks Johan -

I saved the book in Wikipedia. I also, fortunately, downloaded most of the pages to this computer.

Sometimes, but not always, there is a banner near the top of the page that asks "add this page to your book?" I add the page and the counter rises by one to show how many pages are in my book. In this case, the book, titled something like "FRENETTER - (a tale of Mississippi)", disappeared after I added 10 or more pages. I saved it here in Wikipedia. I saved and had a similarly titled book a few years ago. This one is new.

Steelpillow (talkcontribs)

Do you have a link or the page name you can give? We need to know which book we are looking for. Steelpillow (talk) 11:12, 17 October 2018 (UTC)

Ascjames (talkcontribs)

In this case, the book, titled something like "FRENETTER - (a tale of Mississippi)", disappeared after I added 10 or more pages.


Steelpillow (talkcontribs)

I am finding it hard to find any record. Without the exact page name I cannot check if it has been deleted. Also, there has never been a user called Ascjames on the English Wikipedia. Do you have another user name there, or are you editing a different language Wikipedia? Steelpillow (talk) 19:40, 18 October 2018 (UTC)

Ascjames (talkcontribs)

Hello Steelpillow.

I'm going to try harder to find the name I used before.

Thanks for your care.

James

Steelpillow (talkcontribs)

If you can tell us which language Wikipedia it was and what your user name there is, we can search for it and someone may be able to retrieve it. Steelpillow (talk) 08:48, 26 October 2018 (UTC)

Ascjames (talkcontribs)

Thank you. I think I have saved all the articles elsewhere.

Let's stop searching, for now. I don't remember which name I signed in under.

I've started a new book under Ascjames.

Your work is greatly appreciated by me.

James

Reply to "My book has disappeared. I now have zero pages. anybody know what happened? Can I retrieve my pages?"

Upgrade 1.20->1.31.1 Create a Book says "Book Creator is undergoing changes" - Confused

9
96.3.195.68 (talkcontribs)

I posted this to the project support desk page and did not get a reply so I'm trying here.

I am upgrading my MediaWiki to

MediaWiki  1.31.1

PHP        7.2.10 (apache2handler)

MariaDB    10.3.11-MariaDB

When I try and start the book creator I get a page that says:

"Book Creator is undergoing changes"

However, that page links to (here):

https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality

for details.  This page seems to indicate that the book generator is supposed to be operational, but I cannot tell, so I don't know what should work and what does not.

What is the current status?  Is the issue that "Download as PDF" is not working, while creating a PDF via PediaPress should work?  Neither is working for me, but I am familiar with the process, as I have extensively used book creation via "Download as PDF" in the past.

Thank you.  Brent

Johan (WMF) (talkcontribs)

Uh, good question. I don't think the book-to-PDF creation should have been included in 1.31.1 but normal PDF creation should, but I'm guessing here. Do you know, @OVasileva (WMF)?

Steelpillow (talkcontribs)

The information in the article does seem to be unclear. PediaPress are currently involved in two different ways:

1. You can create a book, upload it to the PediaPress web site and order print-on-demand physical copies.

2. PediaPress are also rewriting Wikipedia's own PDF book renderer, and while they are doing this it is not possible to create or print a PDF softcopy wikibook. This is the main subject of the recent update posts.

But I do not know what functionality is included in the 1.31.1 build.

Hope this helps a little.

Brentl999 (talkcontribs)

The "Download as PDF" is greyed out (it shows on the page but unavailable for use).

The "Preview with PediaPress" is "available". So I gather it is supposed to work?

If there is a way for me to get on the inside track for enabling/testing an alpha or beta "Download as PDF", please let me know.

Thank you for your replies. Brent


p.s. I might suggest that the www.mediawiki.org/wiki/Reading/Web/PDF_Functionality be more clear about what users can expect to work/not-work in the application as-of a specific MediaWiki release.

Steelpillow (talkcontribs)

The "Preview with PediaPress" is the proprietary print-on-demand option, which is fully working and so is not greyed out.

There is an "alpha"-ish test build of the open-source "Download as PDF" book creator, though it does not yet have the Chapter headings wrapper or anything, at https://pediapress.com/collector

Johan (WMF) (talkcontribs)

That's very much aimed at users of the Wikimedia wikis, yes. I'll see what we can figure out.

Brentl999 (talkcontribs)

I see there is more discussion about this at Topic:Uqod5bg3xaswxjn9. (www.mediawiki.org/wiki/Topic:Uqod5bg3xaswxjn9)

I'm just pasting this here so if someone is reading this thread in the future, hopefully, it will save them time.

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

Yes I found it! Thank you Dirk :)

Reply to "Upgrade 1.20->1.31.1 Create a Book says "Book Creator is undergoing changes" - Confused"
Dirk Hünniger (talkcontribs)

Hi,

I could generate PDF versions of all community-maintained books in the English Wikipedia and store them in a cloud accessible via SFTP. I could update each PDF once a year. We could link from the book template to the cloud with Lua. Are you interested?

Yours Dirk

Reply to "Book Creation -> Cached Books"
117.204.124.110 (talkcontribs)

It'd be helpful if some sections like "References" could be removed from all the pages

Steelpillow (talkcontribs)

References are necessary for readers to check the facts for themselves. It would be unacceptable to remove them.

Reply to "Exculsion of certain links"