BTW, someone should fix backslash handling in the current parser. Taw 03:28 3 Aug 2003 (UTC)
- My understanding of it is that it brings up exactly the problems I've encountered. Consider This is my ''dog'''s '''bone''' vs. ''This is my '''dog'''s bone''. Wikitax is not context-free. I would suspect resolving this would require having separate start and end tags. Anthony DiPierro 14:13, 17 Nov 2004 (UTC)
- The arguments in that article are not formally valid. An explanation is given in this response. The example This is my ''dog'''s '''bone''' can be solved with operator precedence, and ''This is my '''dog'''s bone'' is so ambiguous that even a human would not know what it means (I don't), or it may even contain mismatched tags. Please do not confuse an ambiguous language with a context-sensitive language. --Kanor 19:17, 26 November 2009 (UTC)
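To make the two examples above easier to compare, here is a minimal OCaml sketch that lists the lengths of the apostrophe runs in a line. The function name apostrophe_runs is made up for this illustration and is not part of any existing parser. It only shows what a lexer sees: a run of three apostrophes carries no marker saying whether it opens or closes anything, which is the point both posters are arguing about.

```ocaml
(* Hypothetical helper for illustration only: lengths of the maximal
   runs of apostrophes in [s], left to right. *)
let apostrophe_runs (s : string) : int list =
  let n = String.length s in
  let rec go i run acc =
    (* [run] is the length of the apostrophe run currently being read. *)
    if i = n then List.rev (if run > 0 then run :: acc else acc)
    else if s.[i] = '\'' then go (i + 1) (run + 1) acc
    else go (i + 1) 0 (if run > 0 then run :: acc else acc)
  in
  go 0 0 []

let () =
  let show s =
    let runs = List.map string_of_int (apostrophe_runs s) in
    Printf.printf "%s -> [%s]\n" s (String.concat "; " runs)
  in
  show "This is my ''dog'''s '''bone'''";   (* runs of 2, 3, 3, 3 *)
  show "''This is my '''dog'''s bone''"     (* runs of 2, 3, 3, 2 *)
```

In the first line the run after "dog" can be read either as a closing '' followed by a literal apostrophe or as an opening '''. How to choose between those readings is exactly what a precedence rule would have to settle; the sketch does not attempt that.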
Thanks for your work!
- It isn't going to implement any "clean syntax"; rather, it tries to stay compatible with the current one, fixing only the things that break generation of correct XHTML.
- It's not finished, and I'm not completely sure if it will work that way.
Taw 08:49 29 May 2003 (UTC)
What does this mean? -Smack
let anything = ['\000'-'\255']
We use Unicode, don't we?
BTW: any parser will define a more "clean syntax" than the current one (which is no parser at all). If you are able to generate correct XHTML, you are also able to generate other XML formats. --Nichtich 12:42 12 Jun 2003 (UTC)
['\000'-'\255'] - any byte from 0 to 255 (decimal, yeah, ocaml rox0rz here !!!). It will work with any ASCII-compatible encoding (ISO 8859, UTF-8, ISO 2022, EUC etc.) Taw 03:27 3 Aug 2003 (UTC)
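A small OCaml sketch of why a byte-level lexer can still find ASCII delimiters in UTF-8 text: every byte of a multi-byte UTF-8 character is 0x80 or above, so it can never be mistaken for '<' (0x3C). The helper name ascii_offsets and the sample string are invented for this illustration. The argument relies on the encoding never reusing ASCII byte values inside multi-byte characters; that holds for UTF-8 and the ISO 8859 family, while stateful 7-bit encodings such as ISO-2022-JP would need a separate check.

```ocaml
(* Hypothetical helper, not taken from the lexer under discussion:
   byte offsets at which the ASCII character [c] occurs in [s]. *)
let ascii_offsets (c : char) (s : string) : int list =
  let acc = ref [] in
  String.iteri (fun i b -> if b = c then acc := i :: !acc) s;
  List.rev !acc

(* "é" is the two bytes 0xC3 0xA9 in UTF-8; both are above 0x7F. *)
let sample = "caf\xC3\xA9 <math>x</math>"

let () =
  (* The two '<' bytes sit at byte offsets 6 and 13. *)
  List.iter (Printf.printf "%d ") (ascii_offsets '<' sample);
  print_newline ()
```

The offsets are byte offsets, not character offsets, which is all the lexer needs as long as it copies the bytes between delimiters through unchanged.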
Are these "anything_but_close_math" etc. really necessary? Most regexp implementations have "nongreedy" wildcards.
E.g. <math>.*</math> will consume all the text it can get (up to the last </math>), but <math>.*?</math> will only consume up to the first </math>.
-- Stw 14:19, 25 May 2004 (UTC)
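The difference between the two patterns can be sketched in OCaml. This is written with plain string search rather than a regex library, so it only imitates what the greedy and nongreedy patterns would select on a single line; all function names here (occurs_at, find_first, find_last, math_body) are made up for the illustration.

```ocaml
(* Does [sub] occur in [s] starting at byte offset [i]? *)
let occurs_at (s : string) (sub : string) (i : int) : bool =
  i >= 0
  && i + String.length sub <= String.length s
  && String.sub s i (String.length sub) = sub

(* Offset of the first occurrence of [sub] at or after [from], if any. *)
let find_first (s : string) (sub : string) (from : int) : int option =
  let rec go i =
    if i + String.length sub > String.length s then None
    else if occurs_at s sub i then Some i
    else go (i + 1)
  in
  go from

(* Offset of the last occurrence of [sub] at or after [from], if any. *)
let find_last (s : string) (sub : string) (from : int) : int option =
  let rec go i =
    if i < from then None
    else if occurs_at s sub i then Some i
    else go (i - 1)
  in
  go (String.length s - String.length sub)

let open_tag = "<math>"
let close_tag = "</math>"

(* Text between the first <math> and the closing tag chosen by [find_close]. *)
let math_body find_close (s : string) : string option =
  match find_first s open_tag 0 with
  | None -> None
  | Some o ->
    let start = o + String.length open_tag in
    (match find_close s close_tag start with
     | None -> None
     | Some c -> Some (String.sub s start (c - start)))

let lazy_body = math_body find_first    (* imitates <math>.*?</math> *)
let greedy_body = math_body find_last   (* imitates <math>.*</math> *)
```

On the input "a <math>x</math> b <math>y</math> c", lazy_body yields the body "x", while greedy_body yields "x</math> b <math>y", swallowing the text between the two formulas.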
What about ANTLR?
Is anybody out there using ANTLR as a lexer and parser? I am trying to parse MediaWiki syntax into a Java structure. The long-term goal is to make it possible to generate different output formats (DocBook, HTML as well, LaTeX, plain ASCII text and so on). But it is really hard to build a grammar file for MediaWiki. At the moment I have a (who would have guessed?) incomplete version based on the EBNF from these pages. But there are some difficulties in translating EBNF into an ANTLR .g file. I suspect that the EBNF would not be very helpful in practice.
- It seems to me (right now; I will probably change my point of view later) that LaTeX generation does not require a full-featured parser.
- In any case, the current lexer description looks very nice, great work. --VictorAnyakin 09:23, 24 November 2006 (UTC)