User:OrenBochman/ParserNG/Preprocessor Antlr

The Preprocessor
This is a lexer for the preprocessor, written in ANTLR.

//based on

EOF and broken xxx rules
ANTLR offers some remedies; however, the issue is not well explained.

The problem with ]] is a poor choice since it looks like a syntax error. If we consider it is easier to see that the innermost element should be evaluated before the outer ones. The innermost element also starts furthest to the right. If we consider it seems that the parse tree should become something like:

Parse Tree

 * where:
 * | are pipes
 * /\ are branches
 * null are empty placeholders
 * once parsed, it should then be processed (and inverted) into an AST by further tree-building rules.
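
The idea that inner elements are evaluated before outer ones can be sketched with a small stack-based scanner. This is only an illustration in Python, not the ANTLR grammar itself: the names build_tree and evaluation_order are hypothetical, only the [[ ]] and {{ }} pairs are handled, and an unmatched closer is assumed to fall back to plain text.

```python
# Hypothetical sketch: build a nested tree for [[...]] and {{...}} in one pass.
OPENERS = {"[[": "]]", "{{": "}}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def build_tree(text):
    """Return nested lists: strings are text, lists are [opener, *children]."""
    root = ["root"]
    stack = [root]  # the innermost open node is always stack[-1]
    buf = []        # pending plain-text characters

    def flush():
        if buf:
            stack[-1].append("".join(buf))
            buf.clear()

    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in OPENERS:
            flush()
            node = [pair]
            stack[-1].append(node)
            stack.append(node)
            i += 2
        elif pair in CLOSERS and len(stack) > 1 and stack[-1][0] == CLOSERS[pair]:
            flush()
            stack.pop()
            i += 2
        else:
            # Ordinary character, or a closer with nothing to close (assumed literal).
            buf.append(text[i])
            i += 1
    flush()
    return root


def evaluation_order(node, out=None):
    """List openers in post-order, so inner elements come before outer ones."""
    out = [] if out is None else out
    for child in node[1:]:
        if isinstance(child, list):
            evaluation_order(child, out)
    out.append(node[0])
    return out


tree = build_tree("a{{b[[c]]d}}e")
print(tree)
print(evaluation_order(tree))
```

The stack keeps the scan to a single left-to-right pass; the post-order walk is one possible way to get the "innermost first" ordering described above.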

EOF rules
As mentioned in the EBNF spec, it is inefficient to run a test until the end of file for unclosed tokens. However, if the parse tree is built as above, it should take a single pass.

Premature EOF

 * In the case of ]]EOF, the EOF is premature and the {{ should either be treated as a syntax error or as a literal (the former makes more sense).
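
The two options for a premature EOF can be compared in a short sketch. This is Python rather than ANTLR, and the names scan, PrematureEOF and on_unclosed are hypothetical; it assumes only {{ and }} matter and emits one TEXT token per character for simplicity. Unclosed openers are tracked on a stack during the single pass, so nothing has to be re-scanned at the end of input.

```python
# Hypothetical sketch: decide at EOF what to do with "{{" that was never closed.
class PrematureEOF(Exception):
    """Input ended while at least one '{{' was still open."""


def scan(text, on_unclosed="error"):
    tokens = []      # (kind, value) pairs
    open_stack = []  # indexes into tokens of "{{" that are still open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            open_stack.append(len(tokens))
            tokens.append(("OPEN", pair))
            i += 2
        elif pair == "}}" and open_stack:
            open_stack.pop()
            tokens.append(("CLOSE", pair))
            i += 2
        else:
            tokens.append(("TEXT", text[i]))
            i += 1
    if open_stack:
        if on_unclosed == "error":
            raise PrematureEOF(f"{len(open_stack)} unclosed '{{{{' at end of input")
        # Literal policy: demote only the openers that never closed.
        for index in open_stack:
            tokens[index] = ("TEXT", "{{")
    return tokens


print(scan("{{a]]", on_unclosed="literal"))
```

With on_unclosed="error" the same input raises PrematureEOF, which corresponds to the syntax-error option preferred above.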

The End Of File Condition
A method is available for reacting to the end of file condition as if it were an event; e.g., you might want to pop the lexer state at the end of an include file. This method, CharScanner.uponEOF, is called from nextToken right before the scanner returns an EOF_TYPE token object to the parser. This event is not generated during a syntactic predicate evaluation (i.e., when the parser is guessing) nor in the middle of the recognition of a lexical rule (that would be an IO exception). This event is generated only after the complete evaluation of the last token and upon the next request from the parser for a token.

You can throw exceptions from this method like "Heh, premature eof" or a retry stream exception. See the includeFile/P.g for an example usage.
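
The pop-at-end-of-include idea can be sketched as follows. This is a Python stand-in and not ANTLR's API: IncludeScanner, upon_eof, next_token and RetryToken are hypothetical names chosen to mirror the description above, and the token streams are plain lists. The hook runs only when a token is requested and the current stream has nothing left, never in the middle of a token.

```python
# Hypothetical sketch of an EOF hook that pops back out of an included stream.
class RetryToken(Exception):
    """Raised by upon_eof to ask next_token to try again on the outer stream."""


class IncludeScanner:
    def __init__(self, main_tokens):
        self.streams = [iter(main_tokens)]  # the innermost stream is last
        self.log = []

    def include(self, tokens):
        """Start reading from an included stream."""
        self.streams.append(iter(tokens))

    def upon_eof(self):
        """Called when the current stream is exhausted."""
        if len(self.streams) > 1:
            self.streams.pop()  # return to the including stream
            raise RetryToken()  # do not report EOF yet
        self.log.append("EOF of main input")

    def next_token(self):
        while True:
            token = next(self.streams[-1], None)
            if token is not None:
                return token
            try:
                self.upon_eof()
            except RetryToken:
                continue
            return "EOF"


scanner = IncludeScanner(["a", "b"])
print(scanner.next_token())
scanner.include(["x"])
print([scanner.next_token() for _ in range(3)])
```

A premature-EOF check could live in the same hook, raising an error instead of logging when the scanner still has open constructs.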

Tests
Parser tests are in the maintenance tests
 * empty syntax
 * [[ is parsed as [[
 * ]] is parsed as ]]
 * ad]]e is parsed as ad]]e
 * using real link and template