Talk:Markup spec/OCaml

BTW, someone should fix backslash handling in the current parser. Taw 03:28 3 Aug 2003 (UTC)

This page has some commentary about why this might be tough for Wikis. Is this relevant? I don't know enough to judge for myself. Matthewsim 21:16, 20 Aug 2004 (UTC)

My understanding of it is that it brings up exactly the problems I've encountered. Consider This is my ''dog'''s '''bone''' vs. ''This is my '''dog'''s bone''. Wikitax is not context-free. I would suspect resolving this would require having separate start and end tags. Anthony DiPierro 14:13, 17 Nov 2004 (UTC)
Yes. Replacing ticks with pipes for readability: at the end of a document, dog|||s displays as dogs, while dog|||s|| parses as dog|s. So the lexing of a token can look arbitrarily far into the future, across other tokens. en:User:ciphergoth 82.70.194.38 17:14, 24 September 2006 (UTC)
The arguments in that article are not formally valid. An explanation is given in this response. The example This is my ''dog'''s '''bone''' can be solved with operator precedence, and ''This is my '''dog'''s bone'' is so ambiguous that even a human would not know what it means (I don't), or may even contain mismatched tags. Please do not confuse language ambiguity with a context-sensitive language. --Kanor 19:17, 26 November 2009 (UTC)
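To illustrate the lookahead point above: in a toy model (not MediaWiki's actual algorithm, and the function names below are made up), whether a run of ticks opens emphasis depends on whether a matching run appears anywhere later in the input, so classifying it needs unbounded lookahead across other tokens:

(* Toy illustration only: treat a "''" as an emphasis toggle only if it is closed later. *)
let contains_from s pos sub =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go pos

let classify_italics s pos =
  if contains_from s (pos + 2) "''" then `Toggle else `Literal

let () =
  match classify_italics "''x''" 0, classify_italics "''x" 0 with
  | `Toggle, `Literal -> print_endline "classifying the ticks required lookahead"
  | _ -> print_endline "unexpected"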

Thanks for your work!

With a real parser of a clean syntax a Wikipedia DTD will be no problem! :-) And there will be wikipedia in RDF, too... (just dreaming :-) --Nichtich 00:19 29 May 2003 (UTC)

  1. It isn't going to implement any "clean syntax"; rather, it tries to stay compatible with the current one, fixing only things that break generation of correct XHTML.
  2. It's not finished, and I'm not completely sure if it will work that way.

Taw 08:49 29 May 2003 (UTC)


What does this mean? -Smack


let anything = ['\000'-'\255']

We use Unicode, don't we?


btw: any parser will define a cleaner syntax than the current one (which is no parser at all). If you are able to generate correct XHTML, you are also able to generate other XML formats. --Nichtich 12:42 12 Jun 2003 (UTC)

['\000'-'\255'] means any byte from 0 to 255 (decimal, yeah, ocaml rox0rz here !!!). It will work with any ASCII-compatible encoding (ISO 8859, UTF-8, ISO 2022, EUC, etc.). Taw 03:27 3 Aug 2003 (UTC)
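For illustration only, a minimal ocamllex file using such a byte-range class could look like the sketch below; the token type and rule name are invented here, not taken from the spec.

{ type token = Byte of char | Eof }

let anything = ['\000'-'\255']

rule next = parse
  | anything as c { Byte c }   (* any single byte, so multi-byte UTF-8 sequences pass through untouched *)
  | eof           { Eof }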


anything_but_close_ ...

Are these "anything_but_close_math" etc. really necessary? In most regexp implementations, there exist "nongreedy" wildcards.

E.g. <math>.*</math> will consume all the text it can get, but <math>.*?</math> will only consume up to the first </math>.

-- Stw 14:19, 25 May 2004 (UTC)
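For context: ocamllex always takes the longest match and has no non-greedy operator, which is presumably why rules like anything_but_close_math appear in the spec. A common alternative is a dedicated rule that accumulates characters until the first closing tag; the sketch below only illustrates that idea and is not the spec's actual code.

rule math_body buf = parse
  | "</math>"  { Buffer.contents buf }                          (* stop at the first closing tag *)
  | _ as c     { Buffer.add_char buf c; math_body buf lexbuf }  (* otherwise keep accumulating *)
  | eof        { failwith "unterminated <math>" }

{
  let () =
    let lexbuf = Lexing.from_string "x^2 + y^2</math> rest of page" in
    print_endline (math_body (Buffer.create 16) lexbuf)   (* prints: x^2 + y^2 *)
}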

What about ANTLR?

Is there anybody out there who is using ANTLR as lexer and parser? I am trying to parse MediaWiki syntax into a Java structure. The long-term goal would be to make it possible to generate different output formats (DocBook, HTML (as well), LaTeX, ordinary ASCII text, and so on). But it is really hard to build a grammar file for MediaWiki. At the moment I have a (who guessed it?) incomplete version based on the EBNF from these pages. But there are some difficulties in translating the EBNF into an ANTLR .g file. I guess that the EBNF would not be so helpful in practice.

It seems to me (right now; I will probably change my point of view sometime later) that LaTeX generation does not require a full-featured parser.
In any case, the current lexer description looks very nice. Great work. --VictorAnyakin 09:23, 24 November 2006 (UTC)