Talk:Blahtex/Archive 1

The following discussion has been transferred from Meta-Wiki.
Any user names refer to users of that site, who are not necessarily users of MediaWiki.org (even if they share the same username).

http://www.mozilla.org/projects/mathml/ has some TeX-to-MathML converters listed, too. - Omegatron 00:45, 4 August 2005 (UTC)

Actually I believe all the ones listed there (at least the server-side converters) are already listed on this page. Dmharvey 00:51, 4 August 2005 (UTC)

Compiling blahtex on Linux

The following discussion has been implemented as of blahtex version 0.2.1.
Please see the download page.
Thanks very much Jitse! Dmharvey 01:44, 9 August 2005 (UTC)

I needed to make some changes to compile blahtex on Linux (Debian testing, to be more specific). Firstly, the following patch has to be applied to keep the compiler happy:

diff -c blahtex-0.2/main.cpp blahtex-0.2-new/main.cpp
*** blahtex-0.2/main.cpp        2005-08-02 21:29:33.000000000 +0100
--- blahtex-0.2-new/main.cpp    2005-08-05 23:50:32.000000000 +0100
***************
*** 106,112 ****
        wchar_t* outputbuf = new wchar_t[input.size()];
        char* dest = reinterpret_cast<char*>(outputbuf);
        char* inputbuf = new char[bufsize];
!       const char* source = inputbuf;
        strcpy(inputbuf, input.c_str());
        size_t inbytesleft = sizeof(char) * input.size();
        size_t outbytesleft = bufsize;
--- 106,112 ----
        wchar_t* outputbuf = new wchar_t[input.size()];
        char* dest = reinterpret_cast<char*>(outputbuf);
        char* inputbuf = new char[bufsize];
!       char* source = inputbuf;
        strcpy(inputbuf, input.c_str());
        size_t inbytesleft = sizeof(char) * input.size();
        size_t outbytesleft = bufsize;
diff -c blahtex-0.2/parser.ypp blahtex-0.2-new/parser.ypp
*** blahtex-0.2/parser.ypp      2005-08-02 21:30:12.000000000 +0100
--- blahtex-0.2-new/parser.ypp  2005-08-05 23:46:51.000000000 +0100
***************
*** 36,41 ****
--- 36,43 ----
        throw user_error(L"Parsing error: " + convert_utf8_to_ucs4(s));
  }

+ int yylex();
+
  %}

  %union {

Secondly, the iconv function is in the standard C library, so you need to remove the -liconv flag from the makefile. After these changes, it compiles. -- Jitse Niesen 23:13, 5 August 2005 (UTC)

Hey Jitse, thanks for pointing that out, but I gotta say it's really weird. My iconv.h header file defines the second parameter of the iconv function to be const char*, so when I make your change, I get a warning about an invalid conversion from char* to const char*. I'm guessing your header has just char*. Mine is a GNU header but probably a little old. Yours seems more standard from what I read on the web. Are you running g++? Which version? Does this mean it's time for me to upgrade? Unfortunately I don't think there is any clever way to use const_cast which will satisfy both compilers. Dmharvey 00:33, 6 August 2005 (UTC)

My header indeed has just char*:

extern size_t iconv (iconv_t __cd, char **__restrict __inbuf,
                     size_t *__restrict __inbytesleft,
                     char **__restrict __outbuf,
                     size_t *__restrict __outbytesleft);

I'm running g++ 3.3.5 and libc version 2.3.2 -- Jitse Niesen 00:47, 6 August 2005 (UTC)
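One commonly used workaround for the two header variants is a small adapter object that converts implicitly to whichever pointer type the local iconv declaration expects. The sketch below is only an illustration: the class IconvInbuf and the two consume_* stand-in functions are hypothetical names, not part of blahtex or of any iconv header, and the stand-ins merely mimic the shape of iconv's second parameter so the idea can be shown without linking against iconv.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical adapter (not part of blahtex): it converts implicitly to either
// pointer type, so the compiler picks whichever one the called function wants.
class IconvInbuf {
public:
    explicit IconvInbuf(const char** p) : p_(p) {}
    operator const char**() const { return p_; }
    operator char**() const { return const_cast<char**>(p_); }
private:
    const char** p_;
};

// Stand-ins for the two declarations discussed above. Each one consumes the
// whole NUL-terminated input, advances *inbuf, and returns the byte count.
std::size_t consume_glibc_style(char** inbuf) {
    std::size_t n = std::strlen(*inbuf);
    *inbuf += n;
    return n;
}

std::size_t consume_const_style(const char** inbuf) {
    std::size_t n = std::strlen(*inbuf);
    *inbuf += n;
    return n;
}
```

With such an adapter, the call in main.cpp could keep const char* source and be written as iconv(cd, IconvInbuf(&source), &inbytesleft, &dest, &outbytesleft), assuming the rest of the call is unchanged.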

Also, I'm pretty hopeless when it comes to things like makefiles. Do you know how to write a makefile which will select the right -liconv option automatically? Dmharvey 00:33, 6 August 2005 (UTC)

The easiest way is to change the command in the makefile to g++ -O3 -o blahtex $(LIBS) $(SOURCES). Then you can use make if you don't need to include extra libraries, and make LIBS=-liconv if you do need to link the library. It is possible to select the right option automatically by using autoconf.

Actually, it does not quite work yet. Blahtex does run and produce some output, but not the output that I want:

jitse@belfeld:~/wikipedia/phase3-cvs/blahtex-0.2$ cat foo.txt
1+2

jitse@belfeld:~/wikipedia/phase3-cvs/blahtex-0.2$ ./blahtex foo.txt
<mstyle displaystyle="true"><mrow><mi mathvariant="normal">&#x31000000;</mi><mo>
⁢</mo><mi mathvariant="normal">&#x2b000000;</mi><mo>⁢</mo><mi mathvariant=
"normal">&#x32000000;</mi><mo>⁢</mo><mi mathvariant="normal">&#xa000000;</mi>
</mrow></mstyle>

Perhaps something to do with Unicode; does the file have to be in some kind of encoding? By the way, why is the output on your interactive test page so nicely formatted, while my output is just a long string? -- Jitse Niesen 01:30, 6 August 2005 (UTC)

I gather from the source that the input file should be in the UTF-8 encoding, which I guess means that there should not be any problems if only ASCII characters are used. However, I'm still at a loss why it does not work. Oh, and I found the -indented option, which answers my second question. -- Jitse Niesen 01:57, 6 August 2005 (UTC)
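One way to read the output above: if each printed code point is taken as a 32-bit value and its bytes are reversed, the results are 0x31, 0x2b, 0x32 and 0x0a, which are the ASCII codes for 1, +, 2 and a newline. A small sketch to check that arithmetic follows; the function name swap_bytes32 is made up for illustration and does not appear in the blahtex source.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit value (illustrative helper only).
std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For example, swap_bytes32(0x31000000u) yields 0x31, the code for the character 1 in the input file.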

Dmharvey told me that the problem is caused by Linux using a different endianness for variables of type wchar_t. This led me to the following patch:

diff -x CVS -c blahtex-0.2-old/main.cpp blahtex-0.2/main.cpp
*** blahtex-0.2-old/main.cpp    2005-08-02 21:29:33.000000000 +0100
--- blahtex-0.2/main.cpp        2005-08-06 15:54:32.000000000 +0100
***************
*** 98,104 ****
  #if USING_ICONV
  wstring convert_utf8_to_ucs4(const string& input) {
        iconv_t utf8_to_ucs4;
!       utf8_to_ucs4 = iconv_open("UCS-4", "UTF-8");
        if (utf8_to_ucs4 == (iconv_t)(-1))
                throw internal_error(L"Could not create UTF-8 to UCS-4 conversion table.");

--- 98,104 ----
  #if USING_ICONV
  wstring convert_utf8_to_ucs4(const string& input) {
        iconv_t utf8_to_ucs4;
!       utf8_to_ucs4 = iconv_open("WCHAR_T", "UTF-8");
        if (utf8_to_ucs4 == (iconv_t)(-1))
                throw internal_error(L"Could not create UTF-8 to UCS-4 conversion table.");

After applying this patch, and the other patch mentioned above, blahtex compiles and works. -- Jitse Niesen 14:59, 6 August 2005 (UTC)
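For readers who want to try the idea behind the second patch in isolation, here is a minimal, self-contained sketch of a UTF-8 to wchar_t conversion that asks iconv for the "WCHAR_T" target, so the result matches the platform's own wchar_t layout. It is not the blahtex code: the function name utf8_to_wide and the error handling are assumptions made for this example, and it assumes the glibc-style declaration in which the second parameter of iconv is char**. As far as I understand, the fixed "UCS-4" target name produces big-endian output in glibc, which would explain the byte-swapped values seen earlier on a little-endian machine.

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch only: convert UTF-8 to std::wstring with iconv's "WCHAR_T" target.
// Assumes the glibc-style signature (second parameter is char**).
std::wstring utf8_to_wide(const std::string& input) {
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open failed");

    // Mutable copy of the input; the output needs at most one wchar_t
    // per input byte, plus one spare slot so the buffer is never empty.
    std::vector<char> in(input.begin(), input.end());
    std::vector<wchar_t> out(input.size() + 1);

    char* src = in.data();
    char* dst = reinterpret_cast<char*>(out.data());
    std::size_t inleft = in.size();
    std::size_t outleft = out.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &src, &inleft, &dst, &outleft);
    iconv_close(cd);
    if (rc == (std::size_t)(-1))
        throw std::runtime_error("iconv conversion failed");

    // Number of wide characters written = capacity minus what is left over.
    std::size_t produced = out.size() - outleft / sizeof(wchar_t);
    return std::wstring(out.data(), produced);
}
```

On systems where iconv lives in a separate library, this still has to be linked with -liconv, as discussed above.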

Getting it to work with MediaWiki

I'm now trying to force MediaWiki to give the correct headers. What do I need? I got it to invoke blahtex and to say that the page is of MIME type text/xml, but I still do not get the maths even though the <math> tags appear in the source. Do I need to set the doctype or something like that? -- Jitse Niesen 00:58, 6 August 2005 (UTC)

I ended up using the MIME type "application/xhtml+xml". I know nothing about MIME types, so maybe others would work too. You do need to change the doctype in the skin files (e.g. skins/MonoBook.php). I believe that's the only extra thing you need to do. There are also issues with MediaWiki caching the results in its database somewhere, which I don't really understand. When I started playing around with it, I started getting odd behaviour when I tried to save edits and things like that. Let me know how it goes. Dmharvey 01:15, 6 August 2005 (UTC)
I had problems writing an extension which modifies the copyright footer based on the presence of a tag in the wikitext. The page would be cached as an object by MediaWiki, but the footer is regenerated every time, meaning I got stale information. A dirty hack, which works, was to store my changes in an object and have MediaWiki cache that object too (ParserOutput class, __sleep() and __wakeup()). I'm not sure if this idea helps here as I'm not familiar with the extension in question, but I suspect it might. --Kingboyk 19:02, 18 December 2005 (UTC)

Ah, that's where I have to change it. There is a configuration file in which you can put the doctype, but this is apparently never used. Great! -- Jitse Niesen 01:26, 6 August 2005 (UTC)

Different Display Options

Hello guys, great job so far! Recently I decided to create a wiki for a specific community of applied mathematicians (data mining) and found out that math on the web is still in its infancy. Here are some improvements and questions I have come up with so far:

Displaying MathML with Java Applet

We could add an option to blahtex that would wrap the resulting MathML in a Java applet <object> tag, which would render the formula. There are some Java applets that can render MathML; one of them is included in the WebEQ suite (http://www.dessci.com/en/products/webeq/). I'm not sure whether the applet itself is free, but the suite as a whole is definitely not free.

Advantages:

  • Cross-platform formulae display (definitely more browsers support Java than MathML)
  • Ability to select between plain MathML and Java applet rendering via blahtex configuration

Disadvantages:

  • Fixed size of the applet (I dunno if it can be adjusted according to the formula size)
  • There are probably no free Java applets for displaying MathML... As far as I know, MediaWiki sticks to open source software, so should we write such an applet if we are able to overcome the sizing issue mentioned in the previous item? :)

Limiting MediaWiki Math Options

My hosting does not have LaTeX etc. installed, so I would like to completely disallow selecting "display formulae as images" in user settings. Actually, I would like to leave only the blahtex option. Is there a way to configure that?

Future MediaWiki Integration

When do you estimate blahtex will be included in a standard distribution of MediaWiki? Having to reapply all the hacks after installing each new version is not very pleasant...

Andrey Kuzmenko 15:42, 16 August 2005 (UTC)

Hi there Andrey, thanks for your interest and enthusiasm.
Regarding the Java applet: I think you're right that it would not be possible for us to use non-free software in MediaWiki. Any such non-free software would have to be something the user obtained themselves. If you know how to write a Java applet that displays MathML, please go right ahead! (I'm sure it would only take a few hours ;-))
Regarding your other questions: keep in mind that the first release of blahtex was only 14 days ago!! We are a long way from MediaWiki integration, and in fact blahtex itself still needs a lot of work. At the moment we are still at the "proof-of-concept" stage. If we can demonstrate that MathML in Wikipedia is (a) possible and (b) a Good Thing, then it might become a standard part of MediaWiki. That is of course our eventual intention, but there are still plenty of hurdles.
I advise you to stop by the blahtex pages every couple of weeks to see what's going on.
I also invite you to help us make progress, by (a) finding bugs in blahtex and reporting them, and/or (b) helping out Jitse with MediaWiki integration issues.
By the way, good luck with your wiki, I think it sounds like an excellent idea. Dmharvey 18:39, 16 August 2005 (UTC)

TeX and MathML structure differences

(I'm still thinking about the superscript thing.) :-)

"We already have a huge collection of equations written in something that is not quite standard LaTeX."

Ah, yes. The TeX we use isn't quite standard anyway. What kind of changes are there besides the \Omega-type special characters?

Apart from the special characters you mention, the other main difference is the grouping rules, e.g. "\sqrt \sqrt x" is not legal LaTeX. There are others. Dmharvey 15:20, 29 August 2005 (UTC)
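For comparison, the explicitly braced form below is what standard LaTeX accepts for a nested square root. This is only an illustration of the grouping point made above, not a statement about what blahtex itself accepts.

```latex
% Nested square root with explicit grouping, as standard LaTeX expects:
\sqrt{\sqrt{x}}
```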

Can't it be made to assume that a string of numbers with no spaces is meant to be a single number? Or should we just say "always put brackets around the number that is the base of the superscript, like {123}^4"?

I don't like the idea of generating MathML that looks correct, but isn't structurally what the editor intended, especially since we have to tweak the output in a non-standard way to get it to look like that... User:Omegatron/sig 14:17, 29 August 2005 (UTC)

I'm working on it :-) Blahtex 0.3 will be quite different from 0.2.1 in all of these regards. I think you'll find some things will make you happier and there may be other things you won't like too much. To answer your specific questions:
Q: Can't it be made to assume that a string of numbers with no spaces is meant to be a single number?
A: Yes, 0.3 will do this, but perhaps not exactly the way you expect. The question is not whether the input has any spaces in the number, but whether or not the output that TeX normally generates has any spaces in it.
Q: I don't like the idea of generating MathML that looks correct, but isn't structurally what the editor intended.
A: I don't either, but unfortunately when you look at the nitty-gritty details, LaTeX syntax simply doesn't provide enough information for a stupid computer to get it right in all cases. Simply put, MathML is more structured than LaTeX, so we have to make some guesses. I think version 0.3 will do a better job.
To summarise my philosophy on what I want blahtex to do: When someone types X into LaTeX, they expect a certain visual representation Y to come out. My primary goal is for blahtex to produce a result that is visually as close to Y as possible. I accept that in certain contexts this is not the most desirable goal; but in the context of Wikipedia, where MathML will be living alongside PNGs for the foreseeable future, I think it's important that people get something visually similar. I'm trying as hard as possible not to compromise on semantic intentions, but visual equivalence is my main goal. I recommend waiting for 0.3 before getting too worried about all this.
In any case, if it turns out that people don't like my code, it shouldn't be too hard to plug in a different converter! I won't take it personally, I promise :-) Dmharvey
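To make the digit-grouping question above concrete, here is a deliberately simplified sketch that treats a run of consecutive digits in the input as a single number token. It is not how blahtex works: as noted above, version 0.3 is described as looking at the spacing of TeX's output rather than at the input, and the function name group_digit_runs is invented for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tokenizer (not blahtex's algorithm): consecutive digits become
// one token, whitespace is skipped, every other character is its own token.
std::vector<std::string> group_digit_runs(const std::string& input) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isdigit(c)) {
            // Collect the whole run of digits starting at position i.
            std::size_t start = i;
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
            tokens.push_back(input.substr(start, i - start));
        } else {
            if (!std::isspace(c))
                tokens.push_back(std::string(1, input[i]));
            ++i;
        }
    }
    return tokens;
}
```

Under this rule the input 123^4 splits into the tokens 123, ^ and 4, so the whole number 123 would become the base of the superscript, whereas 1 2 stays as two separate numbers.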

A joke

"Its output is only Presentation MathML, not Content MathML."

User:Omegatron/sig 16:10, 5 September 2005 (UTC)
I think you mean . And no, blahtex will not have any idea how to deal with such markup. Dmharvey 01:44, 7 September 2005 (UTC)
What's this real life the author refers to? Does not compute! --Kingboyk 18:53, 18 December 2005 (UTC)

From wikipedia

You should copy the description from Wikipedia:Wikipedia_talk:WikiProject_Mathematics#blahtex:_a_LaTeX_to_MathML_converter to here and make this the central site for it. - Omegatron 21:24, 29 July 2005 (UTC)


See here for a proposal to use... "advanced characters" directly in the markup. User:Omegatron/sig 21:22, 21 October 2005 (UTC)