User:OrenBochman/ParserNG/antlr

The preprocessor is tricky to code.
 * It has recursive rules and a format that is complicated to parse.
 * Double-curly expressions can have different semantics.
 * So can triple curlies.
 * On occasion, so can single curlies.

What follows are some advanced ANTLR-specific features, each illustrated by a problem it could help to fix.

Recursive Lexer / Parser Rules
When defining a recursive expression grammar, the nesting level of a rule sets the precedence of the operator it matches: the outermost rule handles the lowest-precedence operator, and each rule nested below it binds more tightly.

Associativity
 expr: mult ('+' mult)* ;   // left-associative via (...)*
 mult: pow ('*' pow)* ;
 pow : atom ('^' pow)? ;    // right-associative via tail recursion
 atom: ID | INT ;
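To make the grouping visible, here is a minimal hand-written Java sketch of the same four rules. The class name AssocDemo, the pre-split token list, and the parenthesised string output are assumptions made for illustration; the code ANTLR generates looks different.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written equivalent of the expr/mult/pow/atom rules.
// Each method returns a fully parenthesised string so the grouping shows.
class AssocDemo {
    private final List<String> tokens;
    private int pos = 0;

    AssocDemo(String... tokens) { this.tokens = Arrays.asList(tokens); }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    // expr: mult ('+' mult)* ;  the loop folds results to the left
    String expr() {
        String left = mult();
        while (peek().equals("+")) {
            pos++;
            left = "(" + left + "+" + mult() + ")";
        }
        return left;
    }

    // mult: pow ('*' pow)* ;  same loop shape, so also left-associative
    String mult() {
        String left = pow();
        while (peek().equals("*")) {
            pos++;
            left = "(" + left + "*" + pow() + ")";
        }
        return left;
    }

    // pow: atom ('^' pow)? ;  the rule calls itself on the right-hand side
    String pow() {
        String base = atom();
        if (peek().equals("^")) {
            pos++;
            return "(" + base + "^" + pow() + ")";
        }
        return base;
    }

    // atom: ID | INT ;  this sketch accepts any single token here
    String atom() { return tokens.get(pos++); }

    static String parse(String... tokens) { return new AssocDemo(tokens).expr(); }
}
```

With this sketch, the tokens a + b + c group as ((a+b)+c), while a ^ b ^ c group as (a^(b^c)).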

In the Preprocessor
Applying the same layering to the preprocessor gives:

 grammar PreProcessor;

 @members {
     String myName;
     boolean isTemplate(String name){...}
     String getTemplate(String name, List args){...}
     boolean isTemplateArg(String name){...}
     String getTemplateArg(String name){...}
     boolean isMagicWord(String name){...}
     String getMagicWord(String name, List args){...}
     boolean isParserFunction(String name){...}
     String getParserFunction(String name, List args){...}
 }

 wikiExpr: curlyExpr;
 curlyExpr: parserFunction | magicWord | templateVar | template ;
 angleExpr: ext | nowiki | noInclude | includeOnly | comment | pre ;

 template: {isTemplate(input.LT(2).getText())}?=> '' ;               // right-associative via tail recursion
 templateVar: {isTemplateArg(input.LT(2).getText())}?=> '' ;         // right-associative via tail recursion
 parserFunction : {isParserFunction(input.LT(2).getText())}?=> '' ;  // curly atom
 magicWord : {isMagicWord(input.LT(2).getText())}?=> '' ;            // curly atom

 // lexer rules
 PF_ARGS: ... ;
 MW_ARGS: ... ;
 TEMPLATE_ARGS: ... ;

Semantic and Syntactic Predicates
Two advanced ideas are syntactic predicates and semantic predicates.

Syntactic Predicates
Parts of a template look like:

Using Semantics To Simplify Syntax Parsing
Does a literal '...' refer to a parser function, a magic word, or a template?

Since the parser should be privy to a symbol table of parser_functions, magic_words, templates, variables, etc., that table can be used to simplify parsing (read: to speed it up) by replacing a deep syntax scan with a simple semantic check such as isMagicWord. In ANTLR-speak, lookahead is said to be analogous to exploring every path of a maze, while using predicates is like sending trained monkeys to stand at the required junction points.
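A minimal Java sketch of such a symbol table follows. The class name SymbolTable, the Kind enum, and the sample entries are assumptions for illustration only; the method names isParserFunction, isMagicWord, and isTemplate echo the helpers declared in the grammar's members block, whose real bodies are not given here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table: one hash lookup stands in for a deep syntax scan.
class SymbolTable {
    enum Kind { PARSER_FUNCTION, MAGIC_WORD, TEMPLATE, UNKNOWN }

    private final Map<String, Kind> symbols = new HashMap<>();

    void define(String name, Kind kind) { symbols.put(name, kind); }

    // Surrounding whitespace is ignored; unknown names map to UNKNOWN.
    Kind classify(String name) {
        return symbols.getOrDefault(name.trim(), Kind.UNKNOWN);
    }

    // Checks of the kind a semantic predicate could call.
    boolean isParserFunction(String name) { return classify(name) == Kind.PARSER_FUNCTION; }
    boolean isMagicWord(String name) { return classify(name) == Kind.MAGIC_WORD; }
    boolean isTemplate(String name) { return classify(name) == Kind.TEMPLATE; }

    // Sample entries, chosen only for this illustration.
    static SymbolTable sample() {
        SymbolTable table = new SymbolTable();
        table.define("#if", Kind.PARSER_FUNCTION);
        table.define("PAGENAME", Kind.MAGIC_WORD);
        table.define("Infobox", Kind.TEMPLATE);
        return table;
    }
}
```

Each check costs a single map lookup on the name token, which is the point of the monkey at the junction: the parser does not need to scan the rest of the expression to decide which alternative to take.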

IncludeOnly and NoInclude semantics

 * IncludeOnly means include the enclosed text only within the scope of the transcluded page (the expanded page).
 * NoInclude means include the enclosed text only within the scope of the template page itself, not where the template is transcluded.
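The two rules above can be sketched in Java as a single pass that keeps one tag's body and drops the other's, depending on whether the page is being transcluded. The class name IncludeTags and the regex approach are assumptions for illustration; this sketch does not handle nested or unclosed tags, which a real preprocessor would have to.

```java
import java.util.regex.Pattern;

// Hypothetical helper: resolves includeonly/noinclude sections with regexes.
class IncludeTags {
    private static final Pattern INCLUDE_ONLY =
        Pattern.compile("<includeonly>(.*?)</includeonly>", Pattern.DOTALL);
    private static final Pattern NO_INCLUDE =
        Pattern.compile("<noinclude>(.*?)</noinclude>", Pattern.DOTALL);

    // transcluded = true: the text is being expanded inside another page.
    // transcluded = false: the template page itself is being rendered.
    static String resolve(String text, boolean transcluded) {
        // "$1" keeps the tag body, "" drops it together with the tags.
        String includeOnlyBody = transcluded ? "$1" : "";
        String noIncludeBody = transcluded ? "" : "$1";
        String result = INCLUDE_ONLY.matcher(text).replaceAll(includeOnlyBody);
        return NO_INCLUDE.matcher(result).replaceAll(noIncludeBody);
    }
}
```

For the input A<includeonly>B</includeonly><noinclude>C</noinclude>D, the sketch yields ABD when transcluded and ACD when the template page is viewed directly.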

Lookahead
The input.LT(2) checks in the grammar above are an example of using lookahead in the parser. There is also lookahead in the lexer, but it does not return strings, only ints (character codes).
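The difference between the two kinds of lookahead can be sketched in plain Java. The class LookaheadDemo and its static methods are hand-written stand-ins, loosely modelled on ANTLR's LT(k) and LA(k) naming; they are not the ANTLR runtime classes, and the assumption that the opening curlies arrive as a single token is made only for this example.

```java
import java.util.List;

// Hypothetical stand-ins for parser-side and lexer-side lookahead.
class LookaheadDemo {
    static final int EOF = -1;

    // Parser side: text of the k-th token ahead (k is 1-based), null past the end.
    static String LT(List<String> tokens, int pos, int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : null;
    }

    // Lexer side: the k-th character ahead as an int (k is 1-based), EOF past the end.
    static int LA(String chars, int pos, int k) {
        int i = pos + k - 1;
        return i < chars.length() ? chars.charAt(i) : EOF;
    }
}
```

In this sketch, with the tokens "{{", "PAGENAME", "}}", a call to LT(tokens, 0, 2) gives the string PAGENAME, which is what a predicate can hand to a symbol-table check. The lexer-side LA only ever yields one character code at a time, so a name has to be assembled before it can be looked up.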

Implementing some parser functions
Parser functions such as if, switch, etc. are where parsers shine. The parser rules below use syntactic predicates:
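Grammar rules aside, the evaluation step for #if and #switch can be sketched in Java. This sketch assumes the arguments have already been split on the pipe character and fully expanded; the class name ParserFunctions and its method names are hypothetical. It is deliberately simplified: fall-through cases and a bare trailing default in #switch are not handled.

```java
import java.util.List;

// Hypothetical evaluator for two parser functions, given pre-split arguments.
class ParserFunctions {
    // #if: the test counts as true when it is non-empty after trimming.
    // args = [test, valueIfTrue, valueIfFalse]; missing arguments count as "".
    static String ifFn(List<String> args) {
        boolean test = !arg(args, 0).trim().isEmpty();
        return arg(args, test ? 1 : 2).trim();
    }

    // #switch: args = [value, "case=result", ..., "#default=result"].
    // Returns the result of the first matching case, else the #default result.
    static String switchFn(List<String> args) {
        String value = arg(args, 0).trim();
        String fallback = "";
        for (int i = 1; i < args.size(); i++) {
            String a = args.get(i);
            int eq = a.indexOf('=');
            if (eq < 0) continue; // cases without '=' are skipped in this sketch
            String key = a.substring(0, eq).trim();
            String result = a.substring(eq + 1).trim();
            if (key.equals(value)) return result;
            if (key.equals("#default")) fallback = result;
        }
        return fallback;
    }

    private static String arg(List<String> args, int i) {
        return i < args.size() ? args.get(i) : "";
    }
}
```

Keeping evaluation separate from recognition in this way means the grammar only has to decide which function it is looking at and where its arguments end.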

Implementing some Magic Words
Magic words can be trickier to take care of. One idea is to keep them in a data structure such as a hash table or a trie.
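As a sketch of the trie option, the Java class below stores words character by character and can report the longest stored word starting at a given position in the input. The class name MagicWordTrie and its methods are assumptions for illustration, and the words used in the usage note are only sample entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie of magic words supporting longest-match lookup.
class MagicWordTrie {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean terminal; // true when a stored word ends at this node
    }

    private final Node root = new Node();

    void add(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new Node());
        }
        n.terminal = true;
    }

    // Length of the longest stored word that starts at text[from], or 0 if none.
    int longestMatch(String text, int from) {
        Node n = root;
        int best = 0;
        for (int i = from; i < text.length(); i++) {
            n = n.next.get(text.charAt(i));
            if (n == null) break;
            if (n.terminal) best = i - from + 1;
        }
        return best;
    }

    // Exact membership test, built on the longest-match walk.
    boolean contains(String word) {
        return !word.isEmpty() && longestMatch(word, 0) == word.length();
    }
}
```

A hash table answers only "is this exact string a magic word?", so the name must be delimited first. The trie can be walked directly over the input, which helps when one word is a prefix of another: after adding PAGENAME and PAGENAMEE, longestMatch on the text PAGENAMEE}} returns 9 rather than 8.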