Extension talk:Wiki2LaTeX

Math tags added
I needed to convert the <math> tag so that formulas work when exporting LaTeX documents, so I added some very simple code:

in w2lConfig.php I added the line

$w2l_tags['math']        = array('w2l_mwext_support', "math");

and, in w2LatexUtil.php, I added this function to the w2l_mwext_support class (just copied and modified from the "pre" function):

function math($input, $argv, &$parser, $mode = 'wiki') {
    // Backslashes are doubled so that PHP does not read "\e" as an escape sequence.
    $output  = "\n\\begin{math}\n";
    $output .= trim($input) . "\n";
    $output .= "\\end{math}\n";
    return $output;
}

and it worked! Hope it can be useful for others. :)

(and I hope this is the correct place to post my hints...)
 * I added your code to the core files, so the next version will support <math> natively. Thanks! HG 16:41, 20 August 2007 (UTC)

Image Processing
I have implemented a simple solution for processing of internal images. It searches for the filename in the images/ directory and copies it to an image directory under the tmp/tmp-123... directory. You can add my little helper class with the following steps:

First of all, you need to add a few lines to function internalLinkHelper in w2lParser.php, before the line "// First, check for |:":

if ( (stripos($link, "Bild:") === 0) or (stripos($link, "Image:") === 0) ) {
    $link = str_replace('Bild:', '', $link);
    $link = str_replace('Image:', '', $link);
    return "<IMAGE>" . $link . "</IMAGE>";
}

Then you need to include my class file at the top of w2lExporter.php: require_once('w2lImages.php');

In w2lExporter.php you need to edit function w2l_unknown_action to process the images. I added the following line

$parsed = w2lImages::processImages($parsed, $mytemp);

to the sections where $action is w2lpdf or w2ltex, right behind these lines:

$parsed = $parser->parse($to_parse);
$mytemp = $helper->path;

And this is my little class file w2lImages.php:

<?php
define('W2L_ImageDir', "Bilder");      // subdirectory of the temp dir that receives the images
define('W2L_ImageTitle', "Abbildung"); // prefix for default captions and labels

class w2lImages {

    public static function processImages($parsed, $mytemp) {
        $matches = array();
        // Non-greedy match, so several image tags on one line are found separately.
        $matchCount = preg_match_all('/<IMAGE>(.*?)<\/IMAGE>/', $parsed, $matches);
        if ($matchCount > 0) {
            $cntr = 0;
            foreach ($matches[1] as $link) {
                $imgTag = $matches[0][$cntr];
                $links = explode("|", $link);
                $imgFileName = $links[0];
                if (strpos($imgFileName, 'jpg') !== false) {
                    // Escape the file name before handing it to the shell.
                    $imgFile = shell_exec("find ./images -name " . escapeshellarg($imgFileName));
                    if ($imgFile) {
                        // Use the last hit that is not a thumbnail.
                        $imgFiles = explode("\n", $imgFile);
                        foreach ($imgFiles as $if) {
                            if (strlen($if) > 0 && strpos($if, "thumb") === false) $imgFile = $if;
                        }
                        if (!file_exists($mytemp . "/" . W2L_ImageDir)) {
                            mkdir($mytemp . "/" . W2L_ImageDir, 0777);
                        }
                        $copied = copy($imgFile, $mytemp . "/" . W2L_ImageDir . "/" . $imgFileName);
                        if ($copied) {
                            $imgCaption = (isset($links[1])) ? $links[1] : W2L_ImageTitle . " " . $cntr;
                            $imgLatex  = '\begin{figure}[htb]' . "\n";
                            $imgLatex .= '\centering' . "\n";
                            $imgLatex .= '\includegraphics[width=\textwidth]{' . $imgFileName . '}' . "\n";
                            $imgLatex .= '\caption{' . $imgCaption . '}' . "\n";
                            $imgLatex .= '\label{fig:' . W2L_ImageTitle . $cntr . '}' . "\n";
                            $imgLatex .= '\end{figure}' . "\n";
                            $parsed = str_replace($imgTag, $imgLatex, $parsed);
                        }
                        else {
                            $parsed = str_replace($imgTag, "Image could not be copied: " . $imgFileName, $parsed);
                        }
                    }
                }
                else {
                    $parsed = str_replace($imgTag, "Image is not a JPG: " . $imgFileName, $parsed);
                }
                $cntr++;
            }
            return $parsed;
        }
        else {
            return $parsed;
        }
    }

}
?>

This will result in images included like this:

\begin{figure}[htb]
\centering
\includegraphics[width=\textwidth]{ImageName.jpg}
\caption{Abbildung 1}
\label{fig:Abbildung1}
\end{figure}

Any comments are welcome ;) http://blog.stefan-macke.de

-

Hi Stefan,

I've cut down your code to this snippet, which matches image: tags case-insensitively and supports an image size in centimeters. Inside internalLinkHelper in w2lParser.php:

if (preg_match("/(?i)image:/", $link, $matches)) {
    $parts = preg_split("/\|/", $link);
    $imagename = str_replace($matches[0], '', array_shift($parts));
    $imgwidth = "10cm";   // default width
    $caption = '';        // stays empty if the tag has no caption
    foreach ($parts as $part) {
        if (preg_match("/\d+px/", $part)) continue;   // pixel sizes are ignored
        if (preg_match("/(\d+cm)/", $part, $widthmatch)) {
            $imgwidth = $widthmatch[1];
            continue;
        }
        if (preg_match("/thumb|thumbnail|frame/", $part)) continue;
        if (preg_match("/left|right|center|none/", $part)) continue;
        $caption = trim($part);
    }
    $title = Title::makeTitleSafe( NS_IMAGE, $imagename );
    $file = new Image( $title );
    $file->loadFromFile();
    $imagepath = $file->getImagePath();
    $title = $file->getTitle()->getText();
    return "\\begin{center} \\resizebox{".$imgwidth."}{!}{\\includegraphics{".$imagepath."}}\\\\ \\textit{".$caption."}\\end{center}\n";
}

Given a tag like [[Image:Colombes.jpg|20cm|Colombes, France]], it will output LaTeX like:

\begin{center} \resizebox{20cm}{!}{\includegraphics{/var/www/mediawiki/images/a/a1/Colombes.jpg}}\\ \textit{Colombes, France}\end{center}
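For anyone who wants to try the option handling outside MediaWiki, here is a minimal sketch of just the parsing step. The function name w2l_parse_image_link and the returned array layout are made up for illustration, and the file lookup via the Image class is left out. Unlike the snippet above, the patterns are anchored, so a caption that merely contains a word such as "right" is still kept as the caption.

```php
<?php
// Illustrative sketch only: parse "Image:Name.jpg|opt|opt|caption" into its parts.
// w2l_parse_image_link is a hypothetical name, not a function of Wiki2LaTeX.
function w2l_parse_image_link($link) {
    // Case-insensitive check for the "image:" prefix.
    if (!preg_match('/^image:/i', $link, $m)) {
        return null;
    }
    $parts = explode('|', $link);
    $name = substr(array_shift($parts), strlen($m[0]));
    $width = '10cm';   // default width, as in the snippet above
    $caption = '';
    foreach ($parts as $part) {
        $part = trim($part);
        if (preg_match('/^\d+px$/', $part)) continue;          // pixel sizes are ignored
        if (preg_match('/^(\d+cm)$/', $part, $w)) {            // width in centimeters
            $width = $w[1];
            continue;
        }
        if (preg_match('/^(thumb|thumbnail|frame|left|right|center|none)$/', $part)) continue;
        $caption = $part;                                      // anything else is the caption
    }
    return array('name' => $name, 'width' => $width, 'caption' => $caption);
}
```

Calling it with 'Image:Colombes.jpg|20cm|Colombes, France' should give the name, the width 20cm and the caption as separate array entries, which can then be put into the \resizebox line.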

I think there are a lot of good solutions on this page; now we only need Hans-Georg to choose one for the next release.

Cheers, Ole Dahle

namespace problems
If you're not using custom namespaces the extension will stop since it tries to search $wgExtraNamespaces. Is that LaTeX namespace required? If so - for what? --Flominator 10:32, 14 August 2007 (UTC)
 * Solved, see front page. Still, it would be better to print what's wrong instead of simply crashing. --Flominator 10:49, 14 August 2007 (UTC)
 * Yeah, the extension should print an error. It's a small mistake: the extra namespace is only required if you are using the PDF export feature. Will be fixed in v.0.5, which is not too far away. Sorry for the inconvenience. HG 12:16, 14 August 2007 (UTC)
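A minimal sketch of what such a check could look like, assuming the namespace is looked up by name in $wgExtraNamespaces. The helper name w2l_find_namespace, the default name 'LaTeX' and the error texts are made up for illustration; this is not the code that went into the extension.

```php
<?php
// Hypothetical helper: look up a custom namespace by name and return a
// readable error instead of failing when it is not configured.
function w2l_find_namespace($extraNamespaces, $name = 'LaTeX') {
    // $wgExtraNamespaces may be unset on wikis without custom namespaces.
    if (!is_array($extraNamespaces)) {
        return array('id' => null, 'error' => 'No custom namespaces are defined.');
    }
    $id = array_search($name, $extraNamespaces, true);
    if ($id === false) {
        return array('id' => null, 'error' => "Namespace '$name' is not defined.");
    }
    return array('id' => $id, 'error' => null);
}
```

The caller would pass in $wgExtraNamespaces and show the error text on the export page when 'id' is null.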

Problem with 0.6.1
Both the dev version and the one from Google produce this message:

Fatal error: Call to undefined method Image::getPath() in C:\XAMPP\htdocs\itswiki\extensions\w2l\w2lParser.php on line 1025

Regards, --Flominator 05:52, 14 September 2007 (UTC)
 * Works fine for me. It might be an issue with the filename, or you're not using MediaWiki 1.11, which is required to run W2L v.0.6 and above because of several changes to local files that were introduced in MediaWiki versions since 1.9. --HG 09:48, 14 September 2007 (UTC)
 * I got this same error with MW1.10, so I upgraded to 1.11 as suggested. Now there is a new error when trying to export pdf:

Fatal error: Call to a member function getTimestamp() on a non-object in /var/lib/mediawiki-1.11.0/extensions/wiki2latex/w2lExporter.php on line 365
 * The relevant section of w2lExporter.php is

357    // Get Template-Vars...
358
359    $template_vars = $w2l_vars;
360    $template_vars = $helper->getTemplateVars($to_parse);
361    //If title was not set by a tag, use page title
362    // Put some special variables in the template vars
363
364    $rev = Revision::newFromTitle($wgTitle);
365    $date = $wgLang->timeanddate( wfTimestamp(TS_MW, $rev->getTimestamp()), true );
366    if(!in_array("Title", $template_vars)) {
367            $template_vars['Title']  = $title;
368    }
369    $template_vars['revision timestamp'] = $date;
370    $template_vars['revision user'] = $rev->getUserText();
371    $template_vars['revision id'] = $rev->getId();
372    $template_vars['page id'] = $helper->getArticleId();
 * So $rev must be null. Is it allowed to be, and if so, does this just need an exception test? Any suggestions? I had no problem running update.php on the database. Hoogs 15:54, 21 September 2007 (UTC)
 * That's odd. I can't reproduce that error. In case you don't use the template tags, which are defined by these lines, you can comment out lines 364-372. Might not be the best way, but it should work. I have also added a null check to the current code, so it should be fixed in w2l 0.6.2. --HG 17:36, 21 September 2007 (UTC)
 * Thanks. Actually there is another problem that may be the cause.  I'd set up address redirection in apache2 for MW1.10, but now for some reason although articles are displayed properly in MW1.11, when I go to the edit tab, the edit page for "Index.php" comes up, for any page.  This probably affects your code too, I'll post when resolved. Hoogs 01:57, 22 September 2007 (UTC)
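A minimal sketch of what a null check around the revision fields could look like. The function name w2l_revision_vars and the FakeRevision stand-in are made up so that the example runs outside MediaWiki; $rev plays the role of the result of Revision::newFromTitle, which is assumed to be null when no revision can be loaded. This is not the actual 0.6.2 change.

```php
<?php
// Hypothetical sketch: build the revision-related template vars, skipping
// them entirely when no revision object is available.
function w2l_revision_vars($rev, $formatDate) {
    $vars = array();
    if ($rev === null) {
        return $vars; // avoid calling methods on a non-object
    }
    // $formatDate stands in for the $wgLang->timeanddate(...) call.
    $vars['revision timestamp'] = call_user_func($formatDate, $rev->getTimestamp());
    $vars['revision user'] = $rev->getUserText();
    $vars['revision id'] = $rev->getId();
    return $vars;
}

// Stand-in for MediaWiki's Revision class, for illustration only.
class FakeRevision {
    public function getTimestamp() { return '20070921155400'; }
    public function getUserText() { return 'Hoogs'; }
    public function getId() { return 42; }
}
```

With a null revision the export simply goes without the three revision variables instead of stopping with a fatal error.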

0.6.1 on Ubuntu server
Running Ubuntu 6.06.1 LTS LAMP server running MW1.11 (upgraded from MW1.10) with the following extensions:


 * Pdf, prints a page as pdf, Thomas Hempel
 * Cite, adds tags for citations, Ævar Arnfjörð Bjarmason
 * ParserFunctions, enhance parser with logical functions, Tim Starling
 * StringFunctions (version 1.9.3)

Editing Index.php
Pages were displaying ok, but upon editing, a generic "Editing Index.php" page would come up. Turns out there is a simple fix, adding:

$wgUsePathInfo = false;

to LocalSettings.php

Wiki2LaTeX extension installation and testing
The following problems/fixes relate to the pdf export functionality only which was the quickest "so what can this extension do?" test.

Originally, I had put the extension in a directory named extensions/wiki2latex, but the path extensions/w2l is hardwired into the PHP (it may be worthwhile emphasising this a bit more), so I corrected that.

These warnings were displayed:

Warning: mkdir [function.mkdir]: Permission denied in /var/lib/mediawiki-1.11.0/extensions/w2l/w2lExporter.php on line 691

Warning: chmod [function.chmod]: No such file or directory in /var/lib/mediawiki-1.11.0/extensions/w2l/w2lExporter.php on line 692

Warning: file_put_contents(extensions/w2l/tmp/tmp-1190454669-678688419/Main.tex) [function.file-put-contents]: failed to open stream: No such file or directory in /var/lib/mediawiki-1.11.0/extensions/w2l/w2lExporter.php on line 402

Warning: chdir [function.chdir]: No such file or directory (errno 2) in /var/lib/mediawiki-1.11.0/extensions/w2l/w2lExporter.php on line 413

The directory extensions/w2l must be writeable, so this was fixed (lazily) with

# chmod -R a+w /extensions/w2l
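The mkdir/chmod warnings could be turned into one readable message by checking the directory first. A minimal sketch, assuming a helper that receives the tmp base directory and the name of the per-export folder; w2l_make_tmp_dir and its messages are hypothetical and not taken from w2lExporter.php.

```php
<?php
// Hypothetical helper: create the per-export temp directory and return
// array(path, error) instead of emitting raw PHP warnings.
function w2l_make_tmp_dir($base, $name) {
    if (!is_dir($base) || !is_writable($base)) {
        return array(null, "Directory '$base' is missing or not writable by the web server.");
    }
    $path = $base . '/' . $name;
    // The @ suppresses the warning; the return value is checked instead.
    if (!is_dir($path) && !@mkdir($path, 0755)) {
        return array(null, "Could not create '$path'.");
    }
    return array($path, null);
}
```

The exporter could then print the error string and stop before trying to write Main.tex.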

Maybe worth mentioning permissions on the extension directory tree. This error came up in the latex log:

LaTeX Error: File `utf8x.def' not found.

This was fixed in Ubuntu with

# aptitude install latex-ucs

It may be worth starting a list of dependencies. For some wiki pages, even though there was no fatal error in the latex log and Main.{tex,pdf} etc. seemed ok, the extension would display the message

w2lParser::getContentByTitle: Artikel existiert nicht: Vorlage:Date

Translated:

w2lParser::getContentByTitle: Article does not exist: Template:Date

I couldn't resolve this. The generated pdf seemed to pick up <math>...</math> perfectly. However, 0.6.1 did not display these correctly:

Basic stuff I really need
 * <tt> ... </tt> (have used these to centre equations)
 * tables

Nice-to-haves
 * (currently nil, I haven't tested sufficiently yet)

I have never really found a decent converter and this looks promising so congratulations and many thanks!! I now use my mediawiki as a primary documentation system at work, so if we can build the bridge wiki -> latex/pdf this will be an amazing tool. Hoogs 11:25, 22 September 2007 (UTC)
 * The error saying "Artikel existiert nicht: Vorlage:Date" ("Article does not exist: Template:Date") means that a template could not be found. Will be fixed soon. Regarding the issue with <math>, I'd be interested in what exactly does not display right, as that should have been fine for a long time now. <tt> ... </tt> support is on the todo-list. --HG 17:57, 23 September 2007 (UTC)
 * Sorry, misunderstanding, math tags work fine, I have changed grammar above to make that clearer. What about tables? I think you would need table conversion to be able to say you have complete basic-level functionality. Hoogs 23:32, 23 September 2007 (UTC)
 * Tables are supported. At least simple ones without 'rowspan'-ing or 'colspan'-ing. You just need to add a small attribute, which is documented at Extension talk:Wiki2LaTeX/Development/w2lParser.php. I hope it works for you.--HG 07:03, 24 September 2007 (UTC)
 * Awesome, thanks for the heads up! Hoogs 11:00, 24 September 2007 (UTC)

file path
First off, this is one of the coolest, most useful extensions and almost everything is working great (it's very fool-proof, I suppose). But I can't get it to output the pdf in the temp folder. I ran the (pdflatex -interaction=nonstopmode Main.tex) line from the w2lConfig file in cmd, and all the tasks got completed one after another (I have MiKTeX installed on my XP, of course), except at the end it says:

No pages of output.
Transcript written on pdflatex.log.
entering extended mode
! I can't find file `Main.tex'.
<*> Main.tex
Please type another input file name
! Emergency stop.
<*> Main.tex
! ==> Fatal error occurred, no output PDF file produced!
Transcript written on texput.log.

It seems that everything's fine except that it can't find the Main.tex

I actually tried running the pdflatex -interaction=nonstopmode Main.tex line with a full path to Main.tex like C:\server\..\temp\temp-123\Main.tex and the cmd stopped because the MiKTeX wanted to download an additional package, it actually never downloaded the package but that's irrelevant.

Basically, is there any documentation I could find, or anything I should know, about setting up the TeX software for w2l and vice versa? So far the only mention of TeX I've seen is on the latex tab saying: "needs working tex-installation on server!". Where can I find more? 354d 09:51, 28 September 2007 (UTC)
 * You could try to install the missing package via the package manager of MiKTeX. Basically, the TeX installation does not need to be configured, as most things should run out of the box. But there are some things you can do to check if there are problems with w2l:
 * Locate the tmp/some_strange_numbers/-folder and check if the file <tt>Main.tex</tt> exists.
 * If the file exists, change to the directory in the command line and run <tt>pdflatex Main.tex</tt>. (If it doesn't exist, somehow w2l can't write to the tmp-folder)
 * Check if pdflatex prints out any warning or error-message while compiling the texfile.
 * If you can't locate the error, please post the complete log here. Maybe something else is causing the problem.
 * --HG 14:32, 28 September 2007 (UTC)
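The first checks in this list can also be scripted. A minimal sketch, assuming pdflatex has already been run once in the export folder so that a Main.log may exist next to Main.tex; w2l_check_export_dir is a hypothetical helper, not part of w2l. It relies on TeX starting error lines in the log with "!".

```php
<?php
// Hypothetical diagnostic: inspect one tmp export folder and list problems.
function w2l_check_export_dir($dir) {
    $problems = array();
    if (!is_dir($dir)) {
        $problems[] = "Directory not found: $dir";
        return $problems;
    }
    if (!file_exists($dir . '/Main.tex')) {
        $problems[] = 'Main.tex is missing; w2l probably cannot write to the tmp folder.';
    }
    if (file_exists($dir . '/Main.log')) {
        // Collect every log line that TeX marked as an error.
        foreach (file($dir . '/Main.log') as $line) {
            if (strpos($line, '!') === 0) {
                $problems[] = 'LaTeX error: ' . trim($line);
            }
        }
    }
    return $problems;
}
```

An empty result does not prove that the PDF was built, it only means that none of these particular problems were found; the complete log is still the best thing to post.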

\multirow, tabulary instead of tabularx
Hi HG +Co., thanks for this extremely promising extension. I found out (still using w2l-0.6.2 on a Suse 10.2) that for my purposes the package "tabulary" (in combination with package "multirow") gives nicer tables than "tabularx". If you want to test my version (not elegant at all and way too complicated, but I am learning), do the following:
 * in w2lParser.php:

after

if ( !defined('MEDIAWIKI') ) die();

add the line

require_once('doTabulary.inc.php');

add an "if" condition in maskMwSpecialChars($str):

if(!preg_match("/}{\*}{/", $str))
    $str = strtr($str, $chars);

and cut externalTableHelper down to

private function externalTableHelper( $matches ) {
    $t = trim($matches[1]);
    $t = doTabulary($t);
    return $t;
}
 * in the preamble of your LaTeX-template add:

\usepackage{tabulary}
\usepackage{multirow}
 * and finally save the following lines as "doTabulary.inc.php" (without my test-table, of course): !changes needed for w2l-0.7.0 as comments!

<?php
function getSpanDetails($spantype, $array_with_span_arguments){
    //for $spantype use "rowspan" or "colspan"
    //this function finds the lines containing a rowspan resp. a colspan argument.
    //$span_details[0] is a subarray with the span containing line numbers (the last line of the array first) as keys
    //and the size of the "span" as its value
    //$span_details[1] is a subarray with the span containing line numbers again as keys and the content of the "spanned" cell as value
    $grep_string = "/.*".$spantype."=\"[0-9]+\"/";
    $match_string = "/".$spantype."=\"([0-9]+)\"\s\|\s(.*)/";
    $idx = preg_grep($grep_string, $array_with_span_arguments);
    $content = array();
    foreach($idx as $key => $value){
        preg_match_all($match_string, $value, $spanline);
        $idx[$key] = $spanline[1][0];
        $content[$key] = $spanline[2][0];
    }
    $span_details= array($idx, $content);
    return $span_details;
}

function evalRowkeys($exploded_str){
    //array rowkeys contains position "{|" (!not quite right, but necessary), row separators "|-", table end of exploded string (= array)
    $rowkeys = array_keys($exploded_str, '|-');
    //w2l-0.7.0: $table_end = array_search('|}', $exploded_str);
    $table_end = array_search('|\\}', $exploded_str);
    $rowkeys[] = $table_end;
    $table_begin = preg_grep("/^(!.*)|(^\|[^+].*)/", $exploded_str);
    reset($table_begin);
    array_unshift($rowkeys, key($table_begin));
    $rowkeys[0] -= 1; //hack: all the others start BEFORE the amino acid
    return $rowkeys;
}

function actualRowsize($rowkeys){
    $actual_rowsize = array();
    for ( $i = 1; $i < count($rowkeys); ++$i){
        $actual_rowsize[$i-1] = ($rowkeys[$i] - $rowkeys[$i-1]-1);
    }
    return $actual_rowsize;
}

function doTabulary($t) {
    $splitstr = explode("\n", $t);
    foreach($splitstr as $key => $cell)
        $splitstr[$key] = trim($cell);

    //after each colspan="x"-argument: insert (x-1) colspandummies
    $colspan = getSpanDetails("colspan", $splitstr);
    $colspan_idx = $colspan[0];
    $colspan_content = $colspan[1];
    krsort($colspan_idx);
    krsort($colspan_content);
    foreach($colspan_idx as $key => $value){
        $temp_fill = array_fill(0, ($value-1), '| §?colspandummy?§ ');
        array_splice($splitstr, $key + 1, 0, $temp_fill);
        if(strpos($splitstr[$key], '| ') === 0)
            $splitstr[$key] = '| \multicolumn{'.$value.'}{|c|}{'.$colspan_content[$key].'}';
        if(strpos($splitstr[$key], '! ') === 0)
            $splitstr[$key] = '| \multicolumn{'.$value.'}{|c|}{\textbf{'.$colspan_content[$key].'}}';
    }

    //in each row that is in reach of an upstream rowspan-argument: insert a rowdummy
    // in case there are not enough column-entries in a row up to the rowdummy, fill it up with "row_more_dummies"
    $rowspan = getSpanDetails("rowspan", $splitstr);
    $rowspan_idx = $rowspan[0];
    $numrowspans = count($rowspan_idx);
    for($x = 0; $x < $numrowspans; ++$x){
        $rowkeys = evalRowkeys($splitstr);
        $actual_rowsize = actualRowsize($rowkeys);
        $rowspan = getSpanDetails("rowspan", $splitstr);
        $rowspan_idx = $rowspan[0];
        $rowspan_idx_keys = array_keys($rowspan_idx);
        $i = 0;
        while ($rowspan_idx_keys[0] > $rowkeys[$i+1]) ++$i;
        $rowspan_position_in_row = $rowspan_idx_keys[0] - $rowkeys[$i];
        $fill_one = array_fill(0, 1, '| ?§rowdummy§? ');
        for($k = ($rowspan_idx[$rowspan_idx_keys[0]]-1); $k > 0; --$k){
            if($actual_rowsize[$i+$k] >= $rowspan_position_in_row)
                array_splice($splitstr, ($rowkeys[$i+$k] + $rowspan_position_in_row), 0, $fill_one);
            else{
                $fill_more = array_fill(0, $rowspan_position_in_row - $actual_rowsize[$i+$k]-1, '| ?§row_more_dummies§? ');
                $fill_more = array_merge($fill_more, $fill_one);
                array_splice($splitstr, $rowkeys[$i+$k] + $actual_rowsize[$i+$k] + 1, 0, $fill_more);
            }
            $splitstr[$rowspan_idx_keys[0]] = str_replace('rowspan', 'r_span', $splitstr[$rowspan_idx_keys[0]]);
        }
    }
    $rowkeys = evalRowkeys($splitstr);
    $actual_rowsize = actualRowsize($rowkeys);

    //how many cols does the largest row have?
    $temp = ($actual_rowsize);
    rsort($temp);
    $maxcolrow = $temp[0]; //now we know, also needed later for the "|Y|Y|..." in the LaTeX table declaration

    //Now fill up the table, every row will have $maxcolrow colums then:
    krsort($actual_rowsize);
    for ($i = count($actual_rowsize)-1; $i >= 0; --$i){
        $tofillupmax = $maxcolrow - $actual_rowsize[$i];
        if($tofillupmax > 0){
            $fillup_array = array_fill(0, $tofillupmax, '| ');
            array_splice($splitstr, $rowkeys[$i+1], 0, $fillup_array);
        }
        else {}
    }

    $rowspan = getSpanDetails("r_span", $splitstr);
    $rowspan_idx = $rowspan[0];
    $rowspan_content = $rowspan[1];
    foreach($rowspan_idx as $key => $value){
        if(strpos($splitstr[$key], '| ') === 0)
            $splitstr[$key] = '| \multirow{'.$value.'}{*}{'.$rowspan_content[$key].'}';
        if(strpos($splitstr[$key], '! ') === 0)
            $splitstr[$key] = '| \multirow{'.$value.'}{*}{\textbf{'.$rowspan_content[$key].'}}';
    }

    $rowkeys = evalRowkeys($splitstr);
    //end of row: -> \cline :
    for($j = 1; $j < count($rowkeys)-1; ++$j){
        $clinestr = '\\\\ ';
        for($i = $rowkeys[$j-1]+1; $i < $rowkeys[$j]; ++$i){
            $clinecount = $i - $rowkeys[$j-1];
            if(!preg_match("/(rowdummy)|(multirow)/", $splitstr[$i]))
                $clinestr = $clinestr.'\\cline{'.$clinecount.'-'.$clinecount.'} ';
            else{}
        }
        $splitstr[$rowkeys[$j]] = $clinestr;
    }

    //remove colspandummies
    $rem_colspandummy = preg_grep("/colspandummy/", $splitstr);
    krsort($rem_colspandummy);
    foreach($rem_colspandummy as $key => $value)
        array_splice($splitstr, $key, 1);

    //some conversions etc. ...
    //(backslashes in double-quoted strings are doubled, so that "\e" is not read as an escape sequence)
    foreach($splitstr as $key => $cell){
        if(strpos($cell, '! ') === 0)
            $splitstr[$key] = str_replace('! ', "& \\textbf{", $cell).'}';
        if(strpos($cell, '|+ ') === 0)
            $splitstr[$key] = str_replace('|+ ', "\\multicolumn{".$maxcolrow."}{|c|}{ ", $cell)." }\\\\ \\hline ";
        if(strpos($cell, '| ') === 0)
            $splitstr[$key] = str_replace('| ', '& ', $cell);
        //w2l-0.7.0: if(strpos($cell, '|}') === 0)
        if(strpos($cell, '|\\}') === 0)
            //w2l-0.7.0: $splitstr[$key] = str_replace('|}', "\\\\ \\hline \\end{tabulary} \\\\", $cell);
            $splitstr[$key] = str_replace('|\\}', "\\\\ \\hline \\end{tabulary} \\\\", $cell);
        if(strpos($cell, '{|') === 0)
            $splitstr[$key] = "\\newline \\\\ \\newcolumntype{Y}{>{\\centering\\arraybackslash}C} \\setlength\\extrarowheight{4pt} \\setlength\\tabcolsep{2pt} \\setlength\\tymin{30pt} \\begin{tabulary}{\\textwidth}{".str_repeat('|Y', $maxcolrow)."|} \\hline";
        else {}
    }
    $newline = preg_grep("/\\\\\s/", $splitstr);
    foreach($newline as $key => $value){
        $splitstr[$key + 1] = str_replace('& ', '', $splitstr[$key + 1]);
    }
    $splitstr = str_replace('?§rowdummy§? ', '', $splitstr);
    $splitstr = str_replace('?§row_more_dummies§? ', '', $splitstr);

    //here is a hack needed with "vspace", because in "tabulary" colored fonts somehow produce a vertical offset,
    //thereby enlarging rowheight unneccessarily:
    $splitstr = str_replace('<span style="color:', '\vspace{-8pt}{\color{', $splitstr);
    $splitstr = str_replace('">', '}', $splitstr);
    $splitstr = str_replace('</span>', '}', $splitstr);
    $splitstr = preg_replace('/\'\'\'(.*)\'\'\'/', '\\textbf{$1}', $splitstr);
    foreach($splitstr as $key => $cell){
        if(strpos($cell, '|') === 0)
            $splitstr[$key] = str_replace('|', '& ', $cell);
        else{}
    }
    $t = implode("\n", $splitstr);
    return $t;
}

/* My test table (copy & paste into your wiki): */
?>

...what do you think about it?

Cheers --Michael 15:23, 30 October 2007 (UTC)
 * Hi Michael! I tried to activate your table code but somehow it does not work. I will look into it, but it could take a while because I'm quite busy at the moment. Maybe your code can optionally be integrated into W2L, since there are several ways to write LaTeX tables. We will see. --HG 18:33, 30 October 2007 (UTC)
 * Hi HG! But it's NOT because "matches" in externalTableHelper( $matches ) from 0.6.2 (the version I used in my description) is an array whereas in 0.7 a string is handed over, right? I will check the posted code again, maybe I created a copy/paste error? --Michael 19:14, 30 October 2007 (UTC)
 * OOPS, a bit more serious than a simple copy/paste mistake: It couldn't work because the "*" in \multirow{x}{*}{text} was not "protected". For a fix, function maskMwSpecialChars in w2lParser.php has to be modified, see description above. Sorry! --Michael 11:13, 31 October 2007 (UTC)
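To show the effect of that guard in isolation, here is a minimal sketch. The function name w2l_mask_special_chars and the one-entry $chars map are made up for illustration; the real maskMwSpecialChars in w2lParser.php uses its own character table.

```php
<?php
// Hypothetical stand-in for maskMwSpecialChars: skip the masking for strings
// that contain the "}{*}{" sequence of a \multirow{n}{*}{...} call.
function w2l_mask_special_chars($str, $chars) {
    if (preg_match('/\}\{\*\}\{/', $str)) {
        return $str; // leave the "*" width placeholder untouched
    }
    return strtr($str, $chars);
}

// Illustrative map only: replace "*" by a LaTeX text command.
$demo_chars = array('*' => '\\textasteriskcentered{}');
echo w2l_mask_special_chars('a*b', $demo_chars), "\n";
echo w2l_mask_special_chars('\\multirow{2}{*}{text}', $demo_chars), "\n";
```

Note the trade-off: the whole string is skipped, so any other special characters in a cell that uses \multirow go unmasked as well.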