Talk:Markup spec/OCaml

BTW, someone should fix backslash handling in current parser. Taw 03:28 3 Aug 2003 (UTC)

This page has some commentary about why this might be tough for Wikis. Is this relevant? I don't know enough to judge for myself. Matthewsim 21:16, 20 Aug 2004 (UTC)


 * My understanding of it is that it brings up exactly the problems I've encountered. Consider "This is my dogs bone" vs. "This is my dog's bone". Wikitax is not context-free. I would suspect resolving this would require having separate start and end tags. Anthony DiPierro 14:13, 17 Nov 2004 (UTC)


 * Yes. Replacing ticks with pipes: at the end of a document, "dog|||s" displays as "dogs", while "dog|||s||" parses as "dog|s". So the lexing of a token can require looking arbitrarily far ahead, across other tokens. en:User:ciphergoth 82.70.194.38 17:14, 24 September 2006 (UTC)


 * The arguments in that article are not formally valid; an explanation is given in this response. The example "This is my dogs bone" can be handled with operator precedence, and "This is my dog's bone" is so ambiguous that even a human could not tell what it means (I can't), or it may even contain mismatched tags. Please do not confuse language ambiguity with context-sensitivity. --Kanor 19:17, 26 November 2009 (UTC)
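The lookahead problem discussed in this thread can be sketched in a few lines of OCaml (a toy illustration, not code from the spec page): split a line into plain text and apostrophe runs, the tokens whose interpretation cannot be decided locally.

```ocaml
(* Toy sketch: a run of apostrophes is one token whose meaning
   (literal apostrophe, italic toggle, bold toggle) is decided later. *)
type tok = Text of string | Quotes of int  (* run length *)

let tokenize s =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else if s.[i] = '\'' then begin
      let j = ref i in
      while !j < n && s.[!j] = '\'' do incr j done;
      go !j (Quotes (!j - i) :: acc)
    end
    else begin
      let j = ref i in
      while !j < n && s.[!j] <> '\'' do incr j done;
      go !j (Text (String.sub s i (!j - i)) :: acc)
    end
  in
  go 0 []

let () =
  List.iter
    (function
      | Text t -> Printf.printf "Text %S " t
      | Quotes k -> Printf.printf "Quotes %d " k)
    (tokenize "This is my dog'''s bone");
  print_newline ()
```

Whether a `Quotes 3` run opens bold, closes bold, or is partly literal text depends on runs that may appear arbitrarily later in the line, which is exactly the unbounded lookahead ciphergoth describes above.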

Thanks for your work!

With a real parser of a clean syntax, a Wikipedia DTD will be no problem! :-) And there will be a Wikipedia in RDF, too... (just dreaming :-) --Nichtich 00:19 29 May 2003 (UTC)

Taw 08:49 29 May 2003 (UTC)
 * 1) It isn't going to implement any "clean syntax"; rather, it tries to stay compatible with the current one, fixing only things that break generation of correct XHTML.
 * 2) It's not finished, and I'm not completely sure if it will work that way.

What does this mean? -Smack

let anything = ['\000'-'\255'] -- we use Unicode, don't we?

btw: any parser will define a cleaner syntax than the current one (which is no parser at all). If you are able to generate correct XHTML, you are also able to generate other XML formats. --Nichtich 12:42 12 Jun 2003 (UTC)

['\000'-'\255'] - any byte from 0 to 255 (decimal, yeah, ocaml rox0rz here !!!). It will work with any ASCII-compatible encoding (ISO 8859, UTF-8, ISO 2022, EUC etc.) Taw 03:27 3 Aug 2003 (UTC)
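A small sanity check of that claim (plain OCaml, not ocamllex): every byte of a multi-byte UTF-8 sequence falls in the 0-255 range, so a byte-level "anything" pattern passes UTF-8 text through untouched.

```ocaml
let () =
  let utf8 = "caf\xc3\xa9" in  (* "café" encoded as UTF-8 bytes *)
  (* Char.code is always in 0..255, i.e. every byte matches ['\000'-'\255'] *)
  String.iter (fun c -> assert (0 <= Char.code c && Char.code c <= 255)) utf8;
  Printf.printf "all %d bytes match the anything pattern\n" (String.length utf8)
```

This is why the lexer works with any ASCII-compatible encoding: it never needs to decode characters, only to pass bytes through.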

anything_but_close_ ...
Are these "anything_but_close_math" etc. really necessary? In most regexp implementations, there exist "nongreedy" wildcards.

E.g. <math>.*</math> will consume all the text it can get, but <math>.*?</math> will only consume up to the first </math>.

-- Stw 14:19, 25 May 2004 (UTC)
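The non-greedy behaviour Stw describes can also be emulated by hand, which is roughly what the `anything_but_close_*` rules encode. A minimal plain-OCaml sketch (the function name is invented for illustration): scan for the *first* occurrence of the closing tag and keep everything before it.

```ocaml
(* Hypothetical helper: return the text before the FIRST occurrence of
   [close] in [s], i.e. the non-greedy match, or None if it never closes. *)
let take_until_close ~close s =
  let n = String.length s and m = String.length close in
  let rec find i =
    if i + m > n then None
    else if String.sub s i m = close then Some (String.sub s 0 i)
    else find (i + 1)
  in
  find 0

let () =
  match take_until_close ~close:"</math>" "x^2</math> tail </math>" with
  | Some body -> print_endline body   (* the non-greedy result: "x^2" *)
  | None -> print_endline "unterminated"
```

A greedy `.*` would swallow everything up to the *last* `</math>`; stopping at the first occurrence gives the `.*?` behaviour without needing non-greedy support in the regexp engine.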

What about ANTLR?
Is there anybody out there who is using ANTLR as lexer and parser? I am trying to parse MediaWiki syntax into a Java structure. The long-term goal would be to make it possible to generate different output formats (DocBook, HTML (as well), LaTeX, ordinary ASCII text, and so on). But it is really hard to build a grammar file for MediaWiki. At the moment I have a (who guessed it?) incomplete version based on the EBNF from these pages. But there are some difficulties in translating EBNF into an ANTLR .g file. I guess that EBNF would not be so helpful in practice.
 * It seems to me (right now; I will probably change my point of view sometime later) that LaTeX generation does not require a full-featured parser.
 * In any case, current lexer description looks very nice, great work. --VictorAnyakin 09:23, 24 November 2006 (UTC)
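For readers without ANTLR, hand-translating an EBNF rule into the spec's own language is also an option. A toy sketch (both the rule and the function are invented for illustration): the rule `bold ::= "'''" text "'''"` read as a direct recursive-descent function in OCaml.

```ocaml
(* Hypothetical hand translation of one EBNF rule:
     bold ::= "'''" text "'''"
   Returns the inner text if [s] starts with the opening run and a
   closing run appears later; None otherwise. *)
let parse_bold s =
  let tag = "'''" in
  let tl = String.length tag and n = String.length s in
  let starts_with i = i + tl <= n && String.sub s i tl = tag in
  if not (starts_with 0) then None
  else
    let rec find i =
      if i + tl > n then None
      else if starts_with i then Some i
      else find (i + 1)
    in
    match find tl with
    | Some j -> Some (String.sub s tl (j - tl))  (* the inner text *)
    | None -> None
```

Each EBNF rule becomes one function, which is the same translation an ANTLR grammar performs mechanically; the hard part, as the thread notes, is that MediaWiki syntax resists a clean grammar in the first place.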