User:GorillaWarfare/pandoc

Per discussion at bug 46517, I've been testing out pandoc as an option for converting wiki pages to other formats. Pandoc can convert text to and from several potentially useful formats, including wikimarkup, HTML, LaTeX, markdown, and plaintext. It has serious difficulty in some areas, which I'm documenting here for later reference.

Wikimarkup → LaTeX
Running pandoc using the following command:

General issues
Pandoc has some issues that affect almost every article that it tries to parse:
 * Images aren't included in articles. The documentation suggests that images will be downloaded if the standalone flag is set, but they are not. LaTeX attempts to find them in the directory from which it's building, and when it's unable to do so, the build fails.
 * Many accented/special characters aren't recognized.
 * Templates are either omitted completely or leave fragments of template syntax in the output.
 * No attribution is included.
 * Footnote superscripts appear in the text, and the numbers appear at the bottom of the page, but the footnotes themselves are almost always incomplete or empty.
 * Piped and non-piped links are formatted differently from each other.
 * Links are rendered as relative filepaths, which is not feasible considering how many links occur in each article, each linked article, and so on.
 * Endnotes might be more appropriate than footnotes, considering that footnotes can often take up the better part of a page.
 * Categories are displayed as raw text at the end of the page.
 * There are lots of issues with tables:
   * They run off the edge of the page if they're too wide.
   * Their LaTeX formatting is often broken, causing the document to fail to build.
   * They're centered, which is more of a stylistic choice I disagree with.
   * They have no borders dividing cells, which is really important for some tables (particularly those with cells that span multiple rows or columns).
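
The missing-image failure described above could be caught before the LaTeX build by scanning the generated source for image references and checking whether each file exists. What follows is a minimal sketch, not part of pandoc or of the tests recorded here: the function name `find_missing_images` and the regular expression are assumptions made for illustration. It only recognizes the plain `\includegraphics{path}` and `\includegraphics[options]{path}` forms, and it assumes each path includes its file extension.

```python
import re
from pathlib import Path

# Matches \includegraphics{path} and \includegraphics[options]{path}.
# Hypothetical helper pattern; other ways of including images are ignored.
INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")


def find_missing_images(latex_source, build_dir):
    """Return referenced image paths that are not files under build_dir."""
    base = Path(build_dir)
    missing = []
    for match in INCLUDEGRAPHICS.finditer(latex_source):
        target = match.group(1).strip()
        # Paths are resolved relative to the build directory, as the
        # LaTeX build described above does.
        if not (base / target).is_file():
            missing.append(target)
    return missing
```

Calling `find_missing_images(Path("article.tex").read_text(), ".")` on a converted article (the filename is a placeholder) would list the images that need to be fetched or removed before building.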

Specific page tests
Bolded issues in the "results" column prevented a successful build. Word count is prose size only—text in tables, templates, etc. is omitted from this count.
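
The method used to obtain the prose word counts is not recorded here. As a rough illustration only, a prose-only count could be approximated from the wikitext by dropping templates and tables before counting words. The function names `strip_delimited` and `prose_word_count` are hypothetical, and the sketch ignores edge cases such as triple-brace parameters, references, and link markup.

```python
def strip_delimited(text, opener, closer):
    """Remove every region between opener and closer, including nested ones."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            depth += 1
            i += len(opener)
        elif depth > 0 and text.startswith(closer, i):
            depth -= 1
            i += len(closer)
        else:
            # Keep characters only when outside every delimited region.
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def prose_word_count(wikitext):
    """Count whitespace-separated words after dropping templates and tables."""
    text = strip_delimited(wikitext, "{{", "}}")  # templates
    text = strip_delimited(text, "{|", "|}")      # tables
    return len(text.split())
```

Templates are stripped before tables so that a pipe directly before a template's closing braces is not mistaken for the end of a table.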