Topic on Talk:Reading/Web/PDF Functionality/Flow

New Render Server for PDF generation

Ckepper (talkcontribs)

Please check out a new test server for PDF generation hosted by PediaPress. It uses a new renderer that we have been working on for quite some time, and it lets you create multi-article PDFs. Currently, you can create a collection from one of the following projects: Simple English Wikipedia, and the EN, DE, ES, FR, IT, NL, PT, and SV Wikipedias. Other projects can be added on request. After selecting a wiki, you can start adding articles. To select a different wiki, open https://pediapress.com/collector/ again.

Please note that this project is still in an alpha state. This means that we have not yet tested many articles, and rendering an article might fail for non-obvious reasons without a clear error message.

Also, rendering is far from perfect. You will very likely find lots of things you don't like (but maybe also a few things you do). Please share your experiences and let us know about the most glaring problems you encounter.

Br shadow (talkcontribs)

I still cannot understand why you would spend so much time creating this functionality when I could simply print the article page as a PDF from my browser... The only thing that would be useful (actually VERY useful, but you chose to remove it) is a double-column PDF rendering service...

Brentl999 (talkcontribs)

As for your question about why time is being spent on a renderer rather than on printing from a browser tab: printing a browser tab to PDF does not generate a "book" (cover, table of contents, pagination, etc.). I depend on book generation to produce print-ready documents and manuals from my wiki. So I sit at the opposite end of the question: I can't figure out why such a critical component has been missing for two years.

M2k~dewiki (talkcontribs)

Hello Ckepper, great job! Thanks for all the effort and hard work you put into this; I really appreciate it, especially since I understand how difficult it must be to find a general approach to rendering and displaying every possible kind of layout. One problem I find in this alpha version is that some images are not displayed, for example when trying to create a PDF for de:Landesregierung Mikl-Leitner I or de:Verena Altenberger. Also, some rows of the table in the first example are missing from the PDF. In the final version, I would like to be able to create a PDF for a predefined book, as was possible in the previous version, for example de:Benutzer:M2k~dewiki/Bücher/Ausgewählte_Beiträge_und_Bearbeitungen or the books in de:Kategorie:Wikipedia:Bücher.

Hello Br shadow, in previous versions it was possible to generate and download a whole book consisting of several articles grouped into chapters, not only a single article. For examples, see de:Wikipedia:Bücher / de:Kategorie:Wikipedia:Bücher.

Ckepper (talkcontribs)

Thank you for your feedback, M2k~dewiki. Just a few notes:

  • The renderer respects the license information of the images as accurately as possible. Version 4.0 of the CC licenses is not yet listed in mwlib, and therefore those images are rejected. It should be fairly easy to add those licenses.
  • Chapters will be fully supported. It's currently just not possible to add them in the prototype frontend. Here is an example of chapters. Chapters are currently shown only in the TOC and not in the rendered article pages. They will be presented there as well, once the design for chapters has been finalized.
  • The renderer is built on the existing format of Wikipedia Books (Collection Extension). All existing collections (books) should work without modification in the new renderer. If you are interested in a particular sample, I can render it offline and share a link.
  • Our goal is a direct integration of the renderer in the collection extension so that you can choose "Download PDF" from the "Manage your book" page again and don't have to go through any PediaPress pages.
Dirk Hünniger (talkcontribs)

Standard announcement: all rows and all images in both articles work with http://mediawiki2latex.wmflabs.org/ . For Ausgewählte_Beiträge_und_Bearbeitungen you won't be able to get any result from the mediawiki2latex web interface because of its one-hour time limit, so I started the command-line version like this:

mediawiki2latex -k -u https://de.wikipedia.org/wiki/Benutzer:M2k~dewiki/B%C3%BCcher/Ausgew%C3%A4hlte_Beitr%C3%A4ge_und_Bearbeitungen -o dirk.pdf

We will see what comes out. Yours Dirk

Brentl999 (talkcontribs)

Hello Dirk, and thank you. I'm getting ready to install your project and test it.

I have a couple of questions:

- Is this project intended to be the official backend for "Download to PDF" book generation that ships with MediaWiki at some point in the future? Or is this your effort to fill the current gap in MediaWiki's book generation system?

- Does the project support the historical collection format (books, chapters, etc.) saved by previously working versions of MediaWiki? Or do I need to recreate my book structures?

Thank you. Brent

Dirk Hünniger (talkcontribs)

Hi,

I think it will never be the official backend, and I was hoping that an official backend would be developed. But for many years that hope has not turned into reality, although there were two or three official attempts to redevelop an official backend, each with a different technology. I don't know the future, but it is very possible that mediawiki2latex will fill the gap for many years to come. Nevertheless, mediawiki2latex is fully open source, and it could in the end turn into the official backend.

You can run mediawiki2latex on a collection like this:

mediawiki2latex -u https://en.wikipedia.org/wiki/Book:River_martin -o output.pdf -k

The -k switch is for collections. It follows all links, but not recursively. I hope this is enough for your case, but I could still extend it if you need more. Still, I have a full-time job and won't be able to react quickly.

Yours Dirk

Brentl999 (talkcontribs)

Thank you. I am learning more as I dig into the install. If I can contribute to the effort, I will (it has only been about 30 years since I have seriously used LaTeX). Is this a good forum for asking questions? I'm currently working through the LaTeX dependencies for CentOS 7, which I hope to post for anyone else doing a CentOS 7 compile. I just want to be sure I'm placing my comments, feedback, etc. in an appropriate place. Brent

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

Thank you. I'm working through it. The ghc and cabal dependencies are taking me some time. Most components are available in the EPEL repository, but some I have to download and build.

Dirk Hünniger (talkcontribs)

Hi,

You can also build with "cabal install". This should install all build-time dependencies automatically.

Yours Dirk

Brentl999 (talkcontribs)

Yes, I figured that out. The version of cabal in the CentOS 7 EPEL repository predates 1.18.2, and that sidetracked me for a while. Currently I'm stuck with http-client-0.5.12 failing to install:

[ 7 of 19] Compiling Network.HTTP.Client.Types ( Network/HTTP/Client/Types.hs, dist/build/Network/HTTP/Client/Types.o )

Network/HTTP/Client/Types.hs:339:73:

    No instance for (Semigroup Builder) arising from a use of `<>'

    Possible fix: add an instance declaration for (Semigroup Builder)

    In the second argument of `RequestBodyBuilder', namely `(x <> y)'

    In the expression: RequestBodyBuilder (i + j) (x <> y)

    In a case alternative:

        (Left (i, x), Left (j, y)) -> RequestBodyBuilder (i + j) (x <> y)

Failed to install http-client-0.5.12

Brentl999 (talkcontribs)

ghc --version :

The Glorious Glasgow Haskell Compilation System, version 7.6.3


cabal --version:

cabal-install version 1.16.1.0

using version 1.16.0 of the Cabal library

Dirk Hünniger (talkcontribs)

Hi,

This is a problem in a library that I have as a dependency. You could ask its maintainer to resolve the issue. Another thing you can try is to modify the mediawiki2latex.cabal file and force it to use another version of http-client. The dependency might also be transitive, that is, mediawiki2latex depends on something that depends on http-client. Here on Debian everything worked fine with cabal.

Yours Dirk

Brentl999 (talkcontribs)

What versions of ghc and cabal-install are provided on your version of Debian? Thank you. Brent

Brentl999 (talkcontribs)

I ended up pulling ghc 8.0.2 and cabal-install 1.24.2.0 from this repository: petersen/ghc-8.0.2 Copr repo (EL7)

And I did get mediawiki2latex to compile.

There is some incompatibility between several components that mediawiki2latex depends on and the ghc available via the CentOS 7 repository.
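Brent's build failed with GHC 7.6.3 and cabal-install 1.16.1.0 from the CentOS 7 repositories and succeeded with GHC 8.0.2 and cabal-install 1.24.2.0. A quick check before building could catch this early. The script below is a sketch: the minimum versions are assumptions drawn from this thread, not documented requirements of mediawiki2latex, and `version_ge` relies on the `-V` option of GNU sort.

```sh
#!/bin/sh
# Sketch: check the Haskell toolchain before running "cabal install".
# Minimum versions are assumptions taken from this thread (cabal-install
# at least 1.18.2; ghc 8.0.2 worked), not an official requirement.
MIN_GHC=8.0.2
MIN_CABAL=1.18.2

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_toolchain: print the installed ghc and cabal-install versions and
# fail if either is missing or older than the assumed minimum.
check_toolchain() {
  ghc_v=$(ghc --numeric-version 2>/dev/null) || { echo "ghc not found" >&2; return 1; }
  # The first line looks like "cabal-install version 1.16.1.0"; take the last field.
  cabal_v=$(cabal --version 2>/dev/null | head -n 1 | awk '{print $NF}')
  [ -n "$cabal_v" ] || { echo "cabal not found" >&2; return 1; }
  echo "ghc $ghc_v, cabal-install $cabal_v"
  version_ge "$ghc_v" "$MIN_GHC" || { echo "ghc older than $MIN_GHC" >&2; return 1; }
  version_ge "$cabal_v" "$MIN_CABAL" || { echo "cabal-install older than $MIN_CABAL" >&2; return 1; }
}
```

After sourcing the script, running `check_toolchain` before `cabal install` should, with the stock CentOS 7 packages, report that ghc 7.6.3 is older than 8.0.2.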

Dirk Hünniger (talkcontribs)

Hi,

Congratulations. Now you just need to install the runtime dependencies. I have cabal --version => cabal-install version 1.24.0.2 and ghc --version => The Glorious Glasgow Haskell Compilation System, version 8.0.2, so that seems to be essentially the same as what you are using now.

Yours Dirk

Dirk Hünniger (talkcontribs)

Hi,

The runtime dependencies are: librsvg2-bin, imagemagick, fonts-freefont-ttf, texlive-xetex, texlive-latex-recommended, texlive-latex-extra, texlive-fonts-recommended, texlive-fonts-extra, cm-super-minimal, texlive-lang-all, poppler-utils, lmodern, texlive-generic-recommended, latex-cjk-common, fonts-cmu, ttf-unifont, fonts-wqy-zenhei, calibre, latex2rtf, libreoffice
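Before a first run, it may save time to check which of the command-line tools from this list are on the PATH. The package-to-command mapping in the comments is an assumption (for example, that librsvg2-bin provides `rsvg-convert` and calibre provides `ebook-convert`). The names above are Debian package names, so on CentOS the packages differ, and fonts and LaTeX packages are not covered by this check.

```sh
#!/bin/sh
# Sketch: report which executables from the runtime dependency list are missing.
# Assumed mapping of Debian packages to the commands they install:
#   texlive-xetex -> xelatex, librsvg2-bin -> rsvg-convert, imagemagick -> convert,
#   poppler-utils -> pdftotext, calibre -> ebook-convert, latex2rtf -> latex2rtf,
#   libreoffice -> libreoffice
# Fonts and LaTeX packages are not checked here.

# missing_commands CMD...: print each CMD not found on PATH; fail if any is missing.
missing_commands() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return $status
}
```

For example, `missing_commands xelatex rsvg-convert convert pdftotext ebook-convert latex2rtf libreoffice` prints nothing and succeeds when all of them are installed.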

Yours Dirk

Dirk Hünniger (talkcontribs)

Hi,

If mediawiki2latex already compiles, you should be able to get a LaTeX document using the -c command-line option, even if the runtime dependencies are not yet fulfilled. You can then try to make that LaTeX document compile; in this process you will install most of the runtime dependencies.

Yours Dirk

Brentl999 (talkcontribs)

Hello Dirk,

I am running the test case:

mediawiki2latex -u https://en.wikipedia.org/wiki/Book:River_martin -o output.pdf -k

I am getting an error. Here are the last few lines of the output from mediawiki2latex:

mediawiki2latex (1547481318.678343827s): generating PDF file. LaTeX run 1 of 4

mediawiki2latex (1547481319.864715348s): generating PDF file. LaTeX run 2 of 4

mediawiki2latex (1547481321.058693473s): generating PDF file. LaTeX run 3 of 4

mediawiki2latex (1547481322.189036007s): generating PDF file. LaTeX run 4 of 4

mediawiki2latex: main.pdf: openBinaryFile: does not exist (No such file or directory)

Any suggestion on how to fix this?  The process runs for some time before getting this error.

Thank you.  Brent

Dirk Hünniger (talkcontribs)

Hi,

Apparently no PDF file was produced. Possibly xelatex is not installed, or there is something wrong with the fonts (not installed, or not in the same paths as on Ubuntu).

You could try:

 mkdir test
 mediawiki2latex -u https://en.wikipedia.org/wiki/Book:River_martin -o output.pdf -k -c test

and then have a look if any files were created in the test directory.

During the weekend I was able to reduce memory consumption. So possibly I can set up a test server for up to 800 pages later this week.

Yours Dirk

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

xelatex is installed. When I run this in test mode, I get a directory structure within "test" that includes some 41 files. Is there anything I should specifically be looking for?


CentOS fonts go into /usr/share/fonts. Specifically, unifont is in /usr/share/fonts/truetype, which appears to be the path BaseFont.hs requires. Clearly I could have missed something here, but I did review and follow the font install procedure.


Ultimately I need to have my own instance of this working. The PDFs I generate contain intellectual property and therefore cannot be rendered by a third party. I have had the process working on MediaWiki 1.15.1 for many years, but an upgrade of that environment is long overdue. That is the background on why I'm pursuing this.


Thank you

Dirk Hünniger (talkcontribs)

Hi,

In the created directory there should be a file called main.tex. Go to the directory where it resides and run

 xelatex main.tex. 

I suspect that xelatex will show some error messages.

Yours Dirk

Dirk Hünniger (talkcontribs)

Oops, the command is

 xelatex main.tex
Brentl999 (talkcontribs)

Okay, thanks. I'm on my way again. It appears that there are a few LaTeX styles that are not packaged with CentOS 7. I'm installing them one at a time as I rerun xelatex. Brent

Brentl999 (talkcontribs)

After several iterations of dealing with xelatex errors, I removed the CentOS 7 texlive package and ran the install via http://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz.


The result is that I was able to generate the "River Martin" test case of 83 pages. However, I did have to bypass a number of errors which I do not believe relate to my install, so I'm not sure what the best way to deal with them is (e.g., adding the enabledeprecatedfontcommands option to the documentclass in main.tex). Is there a published version of this book that I can validate my output against?


Overall, thank you. I will write up a CentOS install when I feel I have mediawiki2latex working optimally. In general, I could have saved a lot of time by ignoring the CentOS repositories for dependencies and installing the appropriate components from source (e.g., ghc, cabal-install, texlive).


Brent

Dirk Hünniger (talkcontribs)

Hi, you can try the web interface and compare against its output; you can also get a zip file of the TeX source from it. Otherwise you could try with a current Ubuntu. Some errors will remain in the LaTeX document, since I did not fix all of them, but it should still work with --interaction=nonstopmode

Yours Dirk

Brentl999 (talkcontribs)

Thank you Dirk. My table of contents was not displaying because I was missing GNU FreeFont. With the addition of that font, I have generated a matching River martin PDF.


If it is okay with you, I'll add a CentOS 7 install section to: https://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf


It won't be "complete" because I would have to start with a fresh CentOS 7 install and do a reinstall to formally prove the steps, but it will be better than having no reference steps at all. I might also suggest a section for diagnostic tips and a validation section using the River martin collection.


Brent

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

Hello Dirk. I'm testing one of my books with mediawiki2latex. Although I can get the template to load, mediawiki2latex is making some assumption about the URL prefix of each page that isn't working in my environment. I think I can work around this, but it would be helpful if you could explain how mediawiki2latex builds the URL of each page request from the template. Thank you, Brent.

Dirk Hünniger (talkcontribs)

Hi,

If you are converting the page

wiki.org/foo/bar/MyPage

MyPage is called a lemma. If MyPage includes a subpage or image called MySubPage, then MySubPage is a lemma too. It will be looked up at the following URLs:

wiki.org/foo/bar/MySubPage
wiki.org/foo/MySubPage
wiki.org/MySubPage

For images, Wikimedia Commons will also be considered.
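The lookup order Dirk describes can be written down as a small function. This is a sketch that reproduces the three URLs listed above; the function name is hypothetical, and the logic is reconstructed from this description rather than taken from the mediawiki2latex source. The Wikimedia Commons fallback for images is not modelled.

```sh
#!/bin/sh
# Sketch of the described lookup order: for a page URL and a lemma (subpage or
# image name), print candidate URLs from the deepest path upward.
# Hypothetical helper, not part of mediawiki2latex.

candidate_urls() {
  page_url=$1
  lemma=$2
  # Keep an optional scheme ("https://") aside so it is never split.
  case $page_url in
    *://*) scheme=${page_url%%://*}://; rest=${page_url#*://} ;;
    *)     scheme=; rest=$page_url ;;
  esac
  dir=${rest%/*}   # drop the page name itself
  while :; do
    printf '%s%s/%s\n' "$scheme" "$dir" "$lemma"
    case $dir in
      */*) dir=${dir%/*} ;;   # move one path level up
      *)   break ;;           # reached the host
    esac
  done
}
```

For a template URL such as http://127.0.0.1/index.php/MyBookTemplate, this sketch yields http://127.0.0.1/index.php/PageRef1 as its first candidate.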

Yours Dirk

Brentl999 (talkcontribs)

My issue is with the URL prefix "http://myabc123.com/wikilocation". mediawiki2latex gets my template, but generates incorrect URL requests for the pages because it prepends the wrong http prefix. If you can direct me to the source for this part of mediawiki2latex, I can probably figure it out.


To be specific, my template is: http://127.0.0.1/index.php/MyBookTemplate

The template lists page names in the "old" book format:

   [[PageRef1]]
   [[PageRef2]]
   [[PageRef2]]

For each page in my template, mediawiki2latex seems to make two attempts to get the page:

  1. /wiki/http://127.0.0.1/[page-ref1-from-template]
  2. /index.phpwiki/http://127.0.0.1/[page-ref1-from-template]

In order for the requests to succeed they need to be:

/index.php/[page-ref-from-template]

I have tried the "--bookmode" command line option and this does not change the page requests.

Dirk Hünniger (talkcontribs)
Brentl999 (talkcontribs)

Thank you Dirk. I am testing it. Brent

Brentl999 (talkcontribs)

Hello Dirk. I did get a test to run. I do have a couple of issues.

- It takes quite some time to run. When I tried to run it in the background, I got an error. Here is the tail of the log:

mediawiki2latex (1547934849.177406998s):(547,"File:Publication_tab.png|90px")

mediawiki2latex (1547934871.066911492s):(548,"File:ServiceReport_05.png|617px")

mediawiki2latex (1547934893.076331817s):(549,"File:Change_Agency.png|293px")

mediawiki2latex (1547934914.940887587s): precompiling table columns

mediawiki2latex (1547934914.940955355s): number of columns to be compiled: 25

mediawiki2latex (1547934914.941277373s): precompiling column number 1

mediawiki2latex: <stdin>: hWaitForInput: invalid argument (Bad file descriptor)

- When I run it in the foreground output ends with:

mediawiki2latex (1547987410.636314299s): generating PDF file. LaTeX run 1 of 4

Since I am running in test mode, I went into testdir/document/main and generated a PDF via xelatex.

Thoughts? I'm not seeing any resource issues on the server that would otherwise explain the process ending.

Can you tell me how to disable lists of figures?

Thank you. Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

I just tested in background with

 mediawiki2latex -u https://de.wikibooks.org/wiki/Handbuch_Open_Science/_Rechtswissenschaft -o dirk.pdf &

so the & character made it a background run, and everything went fine here.
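Brent's background run ends with `<stdin>: hWaitForInput: invalid argument (Bad file descriptor)`, which suggests mediawiki2latex is polling standard input at that point. One thing worth trying, on the untested assumption that stdin is the culprit, is to give the background job a valid but empty stdin and send all output to a log file. The helper name is hypothetical.

```sh
#!/bin/sh
# Sketch: start a long-running command detached from the terminal.
# stdin comes from /dev/null (assumption: this avoids the hWaitForInput error),
# stdout and stderr go to a log file, and nohup keeps it running after logout.

# run_detached LOGFILE CMD [ARGS...]
run_detached() {
  log=$1
  shift
  nohup "$@" </dev/null >"$log" 2>&1 &
  echo "started PID $! (log: $log)"
}
```

Usage would look like `run_detached m2l.log mediawiki2latex -u https://de.wikibooks.org/wiki/Handbuch_Open_Science/_Rechtswissenschaft -o dirk.pdf`, followed by `tail -f m2l.log` to watch progress.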

I have no explanation for the abort in the foreground run. LaTeX should be running, and you should see it with ps -xaww. It is strange to me that you could get a PDF with xelatex in testdir/document/main, since in this particular step mediawiki2latex does exactly that: it just runs xelatex. I had some weird issues that caused similar problems with pipes on Windows, but that is really far-fetched. So for now I suggest that you run mediawiki2latex with the -c option and then run xelatex in that directory to get the PDF, and possibly write a script to automate that process.
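The workaround suggested here (let mediawiki2latex write the LaTeX tree with -c, then run xelatex yourself) could be automated along these lines. This is a sketch: the script and function names are hypothetical, the four xelatex passes mirror the "LaTeX run 1 of 4" messages in the log above, main.tex is located with `find` rather than assumed to be at a fixed path, and -interaction=nonstopmode keeps remaining LaTeX errors from stopping the run.

```sh
#!/bin/sh
# Sketch: generate LaTeX with mediawiki2latex -c, then compile it with xelatex.
# Usage (hypothetical script name): m2l-build.sh URL OUTPUT.pdf [extra mediawiki2latex options]
set -e

# find_main_tex DIR: print the path of the first main.tex below DIR.
find_main_tex() {
  find "$1" -name main.tex -type f | head -n 1
}

# compile_tex TEXFILE [PASSES]: run xelatex PASSES times (default 4) in the file's directory.
compile_tex() {
  tex=$1
  passes=${2:-4}
  (
    cd "$(dirname "$tex")"
    i=1
    while [ "$i" -le "$passes" ]; do
      # nonstopmode: continue past errors instead of waiting for input
      xelatex -interaction=nonstopmode "$(basename "$tex")" >/dev/null || true
      i=$((i + 1))
    done
  )
}

main() {
  url=$1; out=$2; shift 2
  work=$(mktemp -d)
  # mediawiki2latex may itself fail at the PDF step; keep its LaTeX output anyway.
  mediawiki2latex -u "$url" -o "$out" -c "$work" "$@" ||
    echo "mediawiki2latex reported an error; trying xelatex on its output" >&2
  tex=$(find_main_tex "$work")
  [ -n "$tex" ] || { echo "no main.tex under $work" >&2; exit 1; }
  compile_tex "$tex"
  pdf=${tex%.tex}.pdf
  [ -f "$pdf" ] || { echo "xelatex produced no PDF; see ${tex%.tex}.log" >&2; exit 1; }
  cp "$pdf" "$out"
  echo "wrote $out (LaTeX sources kept in $work)"
}

# Run main only when arguments are given, so the functions can be sourced on their own.
if [ "$#" -ge 2 ]; then main "$@"; fi
```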

It is normal that it takes quite some time to run. I disabled multithreading since I found that MediaWiki does not like too many requests too fast. Also, the xelatex compile step accounts for a considerable amount of runtime, which is irreducible. So there is not much I can do about the runtime. Determining the list of contributors and the list of figures for proper attribution also takes quite some runtime. This is necessary for openly licensed content because of the terms of the licenses. You could try to remove that; it is solvable but not an easy task, and it will require skill in functional programming in Haskell.

For the list of figures, I suggest editing this function in All.hs, which appends listOfFiguers after the document body:

 jjoin :: String -> String -> String
 jjoin theBody listOfFiguers
   = ((toString (latexHeader)) ++
        theBody ++ listOfFiguers ++ (toString latexFooter))

Yours Dirk

Dirk Hünniger (talkcontribs)

Hi Brent,

If you need more complex changes made to mediawiki2latex, I suggest contacting Henning Thielemann (PhD); he has already worked on mediawiki2latex and has more time available than I do. As I said, I have a full-time job and can only support you in the way I am doing now.

Yours Dirk

Brentl999 (talkcontribs)

Okay, I understand, Dirk. I can run mediawiki2latex in the background for the River_martin case that you gave me, but for my real book I get the error I mentioned. I understand that you have limited time, and I do appreciate your help. Thank you again. Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

you could try something in All.hs

replace

mysystem :: String -> IO ()
mysystem x
 = if os == "linux" then
     do (_, o, e, h) <- runInteractiveCommand x
        ex h o e
        return ()
     else
     do _ <- system x
        return ()

with

-- system returns IO ExitCode; discard the result so the type matches IO ().
mysystem :: String -> IO ()
mysystem x = system x >> return ()

Yours Dirk

Brentl999 (talkcontribs)

Thank you. I am doing further testing; the background issue seems to be tied to some combination of command-line options. When I have narrowed the issue down, I can pass that on, and I can give feedback on your suggested code change.

With respect to the list of figures, I didn't explain this very well. I'm looking to eliminate the footnotes/references that are automatically populated from the bottom up on each page. The automated footnotes are using a quarter to half of each page.

Again, I understand your limited availability. I am certainly willing to dig into code, I appreciate being pointed in the right direction.

Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

For the list of figures, you can look at makeImgList in All.hs.

Look at the part with the string "\\href{". That way you should be able to remove the links.

Yours Dirk

Brentl999 (talkcontribs)

Dirk, Is there a means to specify chapters? That is to say, a way to:

- group wiki page references in a way that provides sequential chapters in the table of contents

- provide a chapter heading before the pages of each chapter

- number the wiki sections in outline style, with the chapter number as a prefix.

For example, the book generation specification that I have been using in MediaWiki (prior to its removal from MediaWiki) provided this syntax:

 ;Chapter Heading A
 :[[Page link 1]]
 :[[Page link 2]]
 ;Chapter Heading B
 :[[Page link 3]]

This would result in two chapter headings with section references 1.1, 1.2 and then 2.1.
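The numbering described here can be stated precisely: each `;` line opens a new chapter, and each `:[[...]]` line under it becomes section chapter.n. A small awk sketch of that mapping follows. It is a hypothetical helper, not part of mediawiki2latex, and could serve as a reference for the expected result when implementing chapter support.

```sh
#!/bin/sh
# Sketch: number the entries of an old-style collection book page.
#   ";Title"      starts chapter N
#   ":[[Page]]"   becomes section N.M of the current chapter
# Link labels ("[[Page|label]]") are dropped; other lines (e.g. blank ones) are ignored.

number_chapters() {
  awk '
    /^;/ {
      chapter++; section = 0
      title = substr($0, 2)
      sub(/^[ \t]+/, "", title)
      printf "%d %s\n", chapter, title
      next
    }
    /^:[ \t]*\[\[/ {
      section++
      page = $0
      sub(/^:[ \t]*\[\[/, "", page)
      stop = index(page, "]]"); if (stop) page = substr(page, 1, stop - 1)
      bar = index(page, "|");   if (bar)  page = substr(page, 1, bar - 1)
      printf "%d.%d %s\n", chapter, section, page
    }
  '
}
```

Fed the example above (without the leading display space), this prints `1 Chapter Heading A`, `1.1 Page link 1`, `1.2 Page link 2`, `2 Chapter Heading B`, `2.1 Page link 3`, matching the expected 1.1, 1.2, 2.1 numbering.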

Brent

Dirk Hünniger (talkcontribs)

Hi Brent,

This feature has not been implemented yet. A starting point is the function runBookActions in the module Load.hs.

Currently I don't have a timeslot available to do this task, but you are welcome to do that yourself or let it be done by Henning.

Yours Dirk

Dirk Hünniger (talkcontribs)

Furthermore, the paths to the fonts are hardcoded in the module BaseFont.hs. They might be different on CentOS, so be sure to adjust them.
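To compare the hardcoded paths in BaseFont.hs with what is actually installed, it can help to list where the relevant font files live. A sketch, assuming the fonts sit below /usr/share/fonts (the CentOS location Brent mentions); the name patterns in the usage line are examples, not the exact file names BaseFont.hs expects, and `-iname` is a common find extension rather than strict POSIX.

```sh
#!/bin/sh
# Sketch: list font files matching name patterns under a font directory,
# so the paths can be compared with those hardcoded in BaseFont.hs.

# find_fonts DIR PATTERN...: print matching font files (case-insensitive), sorted.
find_fonts() {
  dir=$1
  shift
  for pattern in "$@"; do
    find "$dir" -type f -iname "$pattern" 2>/dev/null
  done | sort
}
```

For example, `find_fonts /usr/share/fonts 'FreeSerif*' 'unifont*'` would show whether GNU FreeFont (whose absence broke Brent's table of contents) and unifont are installed, and where.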

M2k~dewiki (talkcontribs)

Hello Dirk, thanks for the information. I tried to use the above command (current Ubuntu release, mediawiki2latex version 7.30), but it seemed to use up all CPU and memory, so I had to reset my computer in order to be able to work again. I also tried "mediawiki2latex -u https://de.wikipedia.org/wiki/Landesregierung_Mikl-Leitner_I -o wiki.pdf" to create a PDF for only that single article, but the generated PDF did not include the table and included only a single image. With the additional option "-k" (which is not documented in the man page installed on my computer, only in the command-line help/usage message), the program seems to also follow all categories, the articles in those categories, and the links included in the initial document/URL, so it seems not to terminate. The online version http://mediawiki2latex.wmflabs.org/ includes the table and the images for de:Landesregierung Mikl-Leitner I, but it is quite small (the whole table is scaled down to less than half a page), so it is hard to read the text or to recognize anything in the images.

Dirk Hünniger (talkcontribs)
Dirk Hünniger (talkcontribs)

Hi M2k, indeed the tables don't work properly in 7.30 any more. This is because in default mode it processes the HTML generated by MediaWiki, which changes frequently, so I have to fix it in a new version of mediawiki2latex, which will only be included in the next release of the operating system. So the only thing you can do about it is to update mediawiki2latex locally by following the instructions at https://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf for Debian. The -k command-line option indeed follows all links, but not recursively. It is only useful with collections like the ones found in the Book namespace. The table created by the online version when processing de:Landesregierung Mikl-Leitner is indeed too small. This happens when trying to fit the table onto the page. You can still use mediawiki2latex to get a zip archive of the LaTeX code and change that manually. For de:Benutzer:M2k~dewiki/Bücher/Ausgewählte_Beiträge_und_Bearbeitungen I had to stop the processing, since I sleep in the same room as my computer and cannot sleep while it is running because of the noise. So I restarted it today, and it is currently using 23 GByte of RAM. We will see. It already told me that it is going to download 1380 images.

Steelpillow (talkcontribs)

Thank you!

I created Collection ID: 6997073ff4f8d484 containing two articles; Wing configuration, and Topology.

The big, BIG problem is that I cannot add Chapter headings. This means that almost all books already created on Wikipedia will not render in usable form. This absolutely MUST be put right before the new builder can go live.

Other significant issues include:

  • Image sizes vary greatly, for example the lead image for Wing configuration is rendered huge but the one in Topology is much smaller and more sensibly sized. If you want to keep all the text away from the right hand side to make room for images and boxouts, then you should also narrow all the right-aligned images and boxouts to keep them out of the text column too.
  • Font size for the body text is still too small for a page which does not use columns, while heading font sizes are too large, creating much too wide a range of sizes for easy readability. It looks more like a cheap newspaper than an encyclopedia.
  • Tables which are centered in the Wiki page should also be centered in the rendered text column. A lot of them, though not all, appear to have been left-aligned.
  • Template:Commons is still being included. It should be left out.

Once the chapter headings problem is fixed and the code is stable enough not to crash too often, I think it should go live as soon as possible; we are just so desperate to get our old books updated and available again. The other points can come later.

Your test service user interface seems to have changed and no longer allows me to build a fully custom collection; I trust that is deliberate.

One other thing. If we editors add bad markup code to our pages, your only problem should be to keep the software running on. Fixing unprintable pages is our problem not yours! Do not waste your valuable time working around our carelessness. We have help forum pages and stuff here to help us do that.

Ckepper (talkcontribs)

Thanks for your feedback, @Steelpillow!

  • As I noted earlier, chapter headings are still supported, they are just missing from the test frontend and not yet rendered in the article output.
  • Image sizes are approximated based on the size of images in the article and resolution. The goal is to show images at roughly the same size they appear in the article. But the algorithm obviously needs more tweaking. Images should not be confined to just the right column.
  • Body font size is currently at 9pt with 16pt line-spacing which is a pretty common size for textbooks. It's easy to adjust but so far you are the only person who complained about this. Maybe other people can chime in on this issue.
  • I can check centered tables again. It might also depend on the markup being used. Do you have a particular example in mind?
  • It shouldn't be too hard to remove Template:Commons. I will look into that.
  • The test UI has NOT changed and certainly not been deliberately limited. If this occurs again, please try to open a fresh collection by opening https://pediapress.com/collector again.
  • Bad markup is indeed a problem. If something looks wrong or broken on a particular page, it's not trivial for me to distinguish between broken markup or broken renderer. It helps to get reports of good markup that is not rendered correctly. If I could wish for one thing it would be more semantic markup...
Steelpillow (talkcontribs)

Thank you too for the quick response.

  • I missed your earlier post about chapter headings, I must pay more attention.
  • But also, I think you may have missed several posts on this page about font size; I am not the only one who has pointed it out. I do not know how to link to their posts, but at least two others (Szvacek1 and Salino01) have made the same comment. Yes, 9pt is often used, but only on page sizes smaller than A4 or in a narrower multi-column format. I have never seen it in wide columns like these. Perhaps I do not read the right textbooks, but I read plenty of others.
  • In the article on Wing configuration, many of the images are grouped in tables which are centre-aligned. However in the pdf they are all right-aligned.
  • Ah, my problem with the test UI was that I had selected the Simple English Wikipedia by mistake and it could not find my articles! But now I have another problem. I did not save a copy of Collection ID: 6997073ff4f8d484 which comprises Wing Configuration and Topology. When I try to recreate it, your server gives me the same collection ID as before but then gives an Error 500 Internal Server Error, "Sorry, the requested URL 'http://tools.gke.pediapress.com/?command=download&writer=html&collection_id=6997073ff4f8d484' caused an error:"
  • Do you take the article as raw Wikitext or as HTML output? If it is HTML, could you pass it through HTML Tidy or something before rendering? ISTR that used to be a config option for the mediawiki server.
Steelpillow (talkcontribs)

To add to my previous comment on concerns about font size, Gpc62 also made this point in a short conversation with Ckepper on this talk page about 4 months ago. Given the limited overall number of comments made on these drafts, the four of us who have complained about the small font size do represent a significant level of unhappiness. In many areas of life, 10pt is regarded as the minimum readable for full-width text of A4 or US letter pages: there is a reason for that view, there really is.

Dirk Hünniger (talkcontribs)

The mediawiki2latex run generating a PDF of de:Benutzer:M2k~dewiki/Bücher/Ausgewählte_Beiträge_und_Bearbeitungen has terminated successfully. I used the command line given above. I put the PDF on Google Drive. It's here:

https://drive.google.com/file/d/1SA6TEKWrdpXAxDyHZe-umBa2cJ5Ya77X/view?usp=sharing

It took a bit less than 9 hours and about 24 GByte of RAM on an i3-4330. It is a bit more than 5000 pages long and about 600 MByte in size. To me it is an interesting test case showing that mediawiki2latex can handle very large books. I will soon upload the LaTeX source so you can split it into volumes in case you want to print and bind it. Yours Dirk
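For a book of this size, dividing it into volumes by page range is simple arithmetic. The approach suggested here is to split the LaTeX source, which gives each volume its own page numbering; as a rougher alternative that works on the finished PDF, here is a sketch using qpdf. This is an assumption on my part: qpdf is not among the mediawiki2latex dependencies and must be installed separately, and the volume size is only an example.

```sh
#!/bin/sh
# Sketch: split a large PDF into volumes of at most PER pages each.

# volume_ranges TOTAL PER: print page ranges "first-last", one per volume.
volume_ranges() {
  total=$1
  per=$2
  first=1
  while [ "$first" -le "$total" ]; do
    last=$((first + per - 1))
    if [ "$last" -gt "$total" ]; then last=$total; fi
    echo "$first-$last"
    first=$((last + 1))
  done
}

# split_pdf IN.pdf PER: write IN-vol1.pdf, IN-vol2.pdf, ... using qpdf.
split_pdf() {
  src=$1
  per=$2
  total=$(qpdf --show-npages "$src")
  n=1
  for range in $(volume_ranges "$total" "$per"); do
    qpdf --empty --pages "$src" "$range" -- "${src%.pdf}-vol$n.pdf"
    n=$((n + 1))
  done
}
```

For example, `split_pdf dirk.pdf 1000` would write five or six volumes for a book of a bit more than 5000 pages. Page numbers printed in the PDF stay continuous across volumes.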

M2k~dewiki (talkcontribs)
Dirk Hünniger (talkcontribs)
Straw17 (talkcontribs)

One minor problem I've noticed with the PDF generator is that some text tends to go over the edge of the page.

For example: https://imgur.com/gRzj891 (the right side of the image is the right side of the page).

Ckepper (talkcontribs)

Yes, it seems like the problem stems from the infobox width given in `em` units. I will investigate further and try to fix it.

Theklan (talkcontribs)

Thanks for the update. This service should be available for ALL languages, automatically. That's the purpose of the WMF.
