Topic on Talk:Reading/Web/PDF Functionality/Flow

Dirk Hünniger
MavropaliasG

Thank you very much for your work on mediawiki2latex. I really like it, and the output looks amazing.

Steelpillow

This looks promising. Could you automate the whole algorithm: build the book in C/Tachyon, then generate the contributors list and append it with Haskell/mw2latex? That is, use C modules as separate accelerators for the overall Haskell process?

Dirk Hünniger

No, that's not the way to go. 50% of the runtime of mediawiki2latex is spent generating the lists of contributors and figures, because this information is extracted from the page histories of all pages and all images in the book.

The right way to do this is to query the SQL database directly. As far as I understand, WMF allows that. But there is a problem: my surname, Hünniger, contains the letter ü, which the WMF Horizon web interface cannot handle. So I am sorry to say that this is not possible until new software is installed on Horizon, which might take years, since the problem has already existed for years.
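To illustrate why a direct database query would help: instead of downloading the revision history of every page and image one at a time, a single SQL query can return the distinct contributors for all pages in the book. The sketch below is a minimal, self-contained illustration using an in-memory SQLite database. The tables `page`, `revision` and `actor` and their columns are a simplified assumption loosely modeled on MediaWiki's schema (not the actual WMF replica layout, which also uses namespaces and other columns), and the helper names are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a simplified, HYPOTHETICAL schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER REFERENCES page(page_id),
            rev_actor INTEGER REFERENCES actor(actor_id)
        );
        INSERT INTO page VALUES (1, 'Haskell'), (2, 'File:Logo.png'), (3, 'Unrelated');
        INSERT INTO actor VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO revision VALUES
            (10, 1, 1), (11, 1, 2), (12, 1, 1),
            (13, 2, 3),
            (14, 3, 2);
        """
    )
    return conn


def contributors_for_pages(conn: sqlite3.Connection, titles: list[str]) -> list[str]:
    """Return the sorted, distinct contributor names for all given titles in ONE query,
    instead of fetching each page history separately."""
    if not titles:
        return []
    placeholders = ",".join("?" for _ in titles)
    query = f"""
        SELECT DISTINCT a.actor_name
        FROM revision r
        JOIN page p ON p.page_id = r.rev_page
        JOIN actor a ON a.actor_id = r.rev_actor
        WHERE p.page_title IN ({placeholders})
        ORDER BY a.actor_name
    """
    return [row[0] for row in conn.execute(query, titles)]
```

For example, `contributors_for_pages(build_demo_db(), ["Haskell", "File:Logo.png"])` returns `['Alice', 'Bob', 'Carol']`: one round trip covers both the article and the image, which is where the saving over per-page history downloads would come from.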

Another 30% of the runtime of mediawiki2latex is needed for the images. In mediawiki2latex I download the images at the maximum possible resolution, scale them down to 300 dpi and include these rather large images in LaTeX. This causes the LaTeX compiler to spend more than half of its total runtime on images, since it performs a time-consuming recompression when embedding them, which according to the German LaTeX mailing list cannot be changed.
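To make the 300 dpi figure concrete: the number of pixels an image needs is its printed width in inches multiplied by the resolution. The sketch below shows that arithmetic with hypothetical helper names and an assumed printed width; it is an illustration only, not mediawiki2latex's actual image code.

```python
CM_PER_INCH = 2.54


def target_width_px(print_width_cm: float, dpi: int = 300) -> int:
    """Pixel width needed to print an image print_width_cm wide at the given dpi."""
    return round(print_width_cm / CM_PER_INCH * dpi)


def downscaled_width_px(original_px: int, print_width_cm: float, dpi: int = 300) -> int:
    """Scale down to the target width, but never upscale a smaller original."""
    return min(original_px, target_width_px(print_width_cm, dpi))
```

Assuming an image printed 12 cm wide, `target_width_px(12)` gives 1417 pixels, so a 4000-pixel-wide original would be reduced to 1417 pixels, while an 800-pixel original would be left as it is.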

In tachyon I download images at a much lower resolution and don't do any image processing, which speeds things up significantly. So a lot of tachyon's speedup actually comes from leaving out the contributor information and using lower-resolution images.

The tachyon C code itself accounts for less than one percent of the runtime of a single tachyon run. The rest of the time is taken by wget to download the images and HTML pages and by xelatex to create the PDF. So tachyon is more an experiment to measure a lower bound on the runtime of such a conversion than the future route of mediawiki2latex.

So, to wrap it up: moving from Haskell to C can make that part of the program significantly faster, but even then we would only affect 20% of the total runtime, since the rest of the runtime is spent in auxiliary programs that are not under my control. So porting to C can cut the runtime by at most 20%. With direct database queries we would cut it by 50% for free. Another option is using lower-resolution images, but I doubt many people would be happy with that. But yes, once all these issues are solved, we could still move to C.
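The estimate above is an instance of Amdahl's law: accelerating a part that takes a fraction p of the runtime by a factor s leaves (1 - p) + p/s of the original runtime. The sketch below plugs in the shares quoted in this thread (about 20% for the mediawiki2latex code itself, about 50% for the contributor lists). Treating s as infinite, i.e. that part's cost disappearing entirely, is an idealized best case, not a measurement.

```python
def remaining_runtime(fraction: float, speedup: float) -> float:
    """Amdahl's law: share of the original runtime left after speeding up a part
    that takes `fraction` of the runtime by `speedup`.
    speedup=float('inf') models removing that part's cost entirely."""
    return (1.0 - fraction) + fraction / speedup


def overall_speedup(fraction: float, speedup: float) -> float:
    """Overall speedup factor of the whole run (original time / new time)."""
    return 1.0 / remaining_runtime(fraction, speedup)


INF = float("inf")
# Porting the ~20% Haskell share to infinitely fast C: 80% of the runtime remains.
print(remaining_runtime(0.2, INF), overall_speedup(0.2, INF))
# Replacing the ~50% contributor-list share with near-instant queries: 50% remains.
print(remaining_runtime(0.5, INF), overall_speedup(0.5, INF))
```

Even a perfect port to C only brings the run down to 80% of its original time (a factor of 1.25), whereas eliminating the contributor-list cost halves it (a factor of 2); a more realistic 10x faster C port would leave 0.8 + 0.02 = 82%.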

Steelpillow

Thank you for the explanation.

The umlaut issue is just the kind of reason why alias accounts are sometimes allowed. Would it help if you created a second account using the "Huenniger" spelling?

Dirk Hünniger

Yeah, this is likely to help. But it would require some administrative artistry to get the right privileges for the new account, and I have not yet decided whether I really want to do that. But basically, that is a possible solution.

Reply to "tachyon continues"