Talk:Markup spec/ANTLR/draft

From mediawiki.org

Extension Tags

Missing the <source> tag from the spec is not your fault - nor any surprise. Due to MediaWiki's modularity, any extension can register any (unclaimed) tag and implement its own semantic meaning. Even acceptable HTML tags like <div> can be hijacked.

I noticed that you included the tags belonging to the Cite extension in the draft. This leads to the question of whether the grammar is meant to address native, vanilla MediaWiki, or whether a certain set of extensions is to be considered canonical. The former means that the grammar will not encompass all of the markup in, say, the English Wikipedia, while the latter means that the grammar will not correctly reflect a pristine MW install. --Jimbojw (talk | blog) 17:00, 11 February 2008 (UTC)
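
To make the trade-off concrete, here is a minimal sketch of a tag registry, assuming the grammar asks at parse time whether a tag name has been claimed by an extension. The class `TagRegistry`, its methods, and the two handlers are hypothetical names for illustration only, not MediaWiki's actual API; case-insensitive tag names are also an assumption of the sketch.

```python
class TagRegistry:
    """Hypothetical registry of extension tags (illustration only)."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        # Assumption: tag names are matched case-insensitively and a tag
        # can be claimed only once.
        key = name.lower()
        if key in self._handlers:
            raise ValueError(f"tag <{key}> is already claimed")
        self._handlers[key] = handler

    def is_extension_tag(self, name):
        return name.lower() in self._handlers


# A pristine install knows no extension tags at all.
vanilla = TagRegistry()

# An install with a Cite-like extension claims <ref> and <references>.
with_cite = TagRegistry()
with_cite.register("ref", lambda body: f"[note: {body}]")
with_cite.register("references", lambda body: "")
```

With something like this, the same input `<ref>x</ref>` is an extension tag under `with_cite` and plain text under `vanilla`, so the grammar itself would not need to hard-code either choice.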

Compile Warnings?

I gave this a try using Antlr 3.0.1 and it compiled, but 60 or so warnings were generated. The most common was: "Decision can match input such as ... using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input". Is this to be expected or am I doing something odd? 08:51, 17 March 2008 (UTC)

namespaces and localizations

Namespaces are localizable, and they have synonyms in various localizations. In addition, the "image:" namespace is now an alias of "file:" (also localized, e.g. "Fichier:" in the French wiki), and namespace names are not case-sensitive. When parsing links, you should include the COLON as part of the namespace conditional.

My opinion is that images, categories and so on should first be parsed simply as [[target|optional parameters]], by just splitting the content at pipes and parsing the optional parameters recursively for brackets. Only when the final bracket is reached can you decide that the target is a link and parse its namespace up to the first colon, before deciding what to do with the parameters. Namespace names should be parsed after hooking to find translations and map them to namespace ids: if the namespace is not recognized, the whole syntax becomes void, the first opening bracket is returned as a literal, and parsing restarts from it.

Note that the target may contain a path (subpages): see Wikisource, which uses page numbers after a slash to designate a specific page in the file. It may also contain area selection parameters using the query syntax (used by the image thumbnail generator). This is not technically a path like in a filesystem (image names can't contain slashes), but it looks like a path because the referenced file may contain a directory-like structure (for example: multipage PDF/.djvu files, videos with frames indexed by timestamp, audio track selection...), plus rendering options (in query parameters) for the image/video itself, separately from the target HTML rendering around the generated thumbnail (after pipes: thumb, border, frame, caption...).

Note also that optional parameters may contain literal pipes: e.g. in the "category:" namespace, or its translations like "catégorie:", this parameter is taken literally and not split further at pipes. All this means that you cannot parse the optional parameters before fully recognizing the namespace name up to the first colon.

Note also that the first colon will delimit the interwiki id (which must be distinct from namespaces in the local wiki). You can't do that without hooking because all wikis (including standard Wikimedia projects such as Wikipedia, Wiktionary, Wikisource, Commons...) can have their own sets of namespace names and aliases, or interwiki ids. Verdy p 15:05, 2 October 2010 (UTC)
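
The approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tables `NAMESPACES` and `INTERWIKI` are hypothetical stand-ins for whatever a per-wiki hook would return, and the function names are invented for this example. The sketch also treats an unrecognized prefix as part of an ordinary page title instead of voiding the whole link, which is a simplification.

```python
# Hypothetical lookup tables; a real wiki would supply these through a hook,
# since namespace names, aliases and interwiki ids differ per wiki.
NAMESPACES = {
    "file": "File", "image": "File", "fichier": "File",
    "category": "Category", "catégorie": "Category",
}
INTERWIKI = {"commons", "wikt"}


def split_top_level(text):
    """Split on pipes that are not nested inside [[...]] or {{...}}."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            i += 2
            continue
        if pair in ("]]", "}}"):
            depth = max(depth - 1, 0)
            i += 2
            continue
        if text[i] == "|" and depth == 0:
            parts.append(text[start:i])
            start = i + 1
        i += 1
    parts.append(text[start:])
    return parts


def classify_link(inner):
    """Classify the text between [[ and ]] once the closing bracket is known."""
    target, sep, rest = inner.partition("|")
    prefix, colon, title = target.partition(":")
    key = prefix.strip().lower()  # namespace names are not case-sensitive
    if colon and key in NAMESPACES:
        kind, ns = "namespace", NAMESPACES[key]
    elif colon and key in INTERWIKI:
        kind, ns = "interwiki", key
    else:
        # Simplification: unknown prefix, keep the colon in the page title.
        kind, ns, title = "page", None, target
    if not sep:
        params = []
    elif ns == "Category":
        params = [rest]  # taken literally, pipes included
    else:
        params = split_top_level(rest)
    return {"kind": kind, "namespace": ns,
            "title": title.strip(), "params": params}
```

For instance, `classify_link("Image:Foo.png|thumb|A [[link|label]] here")` resolves the alias to `File` and keeps the nested link inside the second parameter, while `classify_link("Catégorie:Foo|a|b")` leaves `a|b` unsplit. The parameters are only split after the prefix before the first colon has been resolved, which is the ordering argued for in the comment.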

GC overhead limit exceeded

Running:

$ antlr3 -version
ANTLR Parser Generator  Version 3.5.2
$ antlr3 mediawiki10.g3
warning(105): mediawiki10.g3:916:9: no lexer rule corresponding to token: LT
warning(200): mediawiki10.g3:205:11: 
Decision can match input such as "NL" using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): mediawiki10.g3:208:22: 
Decision can match input such as "NL" using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.antlr.analysis.NFAToDFAConverter.closure(NFAToDFAConverter.java:612)
	at org.antlr.analysis.NFAToDFAConverter.closure(NFAToDFAConverter.java:755)
...etc.

Jabowery (talk) 20:45, 15 June 2016 (UTC)