User:OrenBochman/ParserNG/WikiTable
An Antlr Spec for the WikiTable Markup
ANTLR spec
grammar wikiTable;

@header {
package p;
}
//@header {import org.antlr.test;}

// not auto-copied to lexer
@lexer::header{
package p;
//import org.antlr.test;
//
}

@lexer::members {
    // state check: how deeply nested in a table are we?
    int inTable=0;
    List tokens = new ArrayList();
    public void emit(Token token) {
        state.token = token;
        tokens.add(token);
    }
    public Token nextToken() {
        super.nextToken();
        if ( tokens.size()==0 ) {
            return Token.EOF_TOKEN;
        }
        return (Token)tokens.remove(0);
    }
}

@members{
    //int inTable=0;
    //public void foo(){};
    //int rows=0;
}

//Parser Rules
wikiTable : TBL_START xml_attributes? caption? head? rows TBL_END;
caption : CAPTION_START HS xml_attributes? captionText=TEXT+;
fragment head : (hCell hCellInLine*)+;
rows : (firstRow|row) row*;
firstRow : cells;
row : rowStart xml_attributes? cells;
rowStart : ROW_START;
cells :((cell|hCell) (cellInline|hCellInLine)*)+;
cell : CELL_START xml_attributes? text=TEXT*;
cellInline : CELL_INLINE_STRT xml_attributes? text=TEXT*;
hCell : HEAD_START xml_attributes? text=TEXT*;
hCellInLine : HEAD_INLINE_STRT xml_attributes? text=TEXT*;

//this is the recursive definition allowing table nesting
//cells :( {input.LT(0)==CELL_START||input.LT(0)==HEAD_START}?=>(HEAD_START | CELL_START) XHTML_ATTRIBUTES? (TEXT|wikiTable)+ (CELL_INLINE_STRT XHTML_ATTRIBUTES? (TEXT|wikiTable)+)* )+ ;

//this needs to be in the parser for LT(2) to mean the second parser token
xml_attributes: {input.LT(2).getText().equals("=")}? xml_attribute+ PIPE?
              ;
xml_attribute: name=TEXT EQ DQUOTE value=TEXT* DQUOTE ;

//Lexer Rules
TBL_START : {getCharPositionInLine()==0}?=> '{|'{inTable++; } ;
TBL_END : {getCharPositionInLine()==0&&inTable>0}?=> '|}'{inTable--;} ;
HEAD_START : {getCharPositionInLine()==0&&inTable>0}?=> '!';
HEAD_INLINE_STRT: {inTable>0}?=> '!!';
CELL_START : {getCharPositionInLine()==0&&inTable>0}?=> '|'; //this should only be recognised within a table
PIPE : {getCharPositionInLine()>0||inTable==0}?=> '|'; //outside table or not at start of line
CELL_INLINE_STRT: {inTable>0}?=> '||'; //this should only be recognised within a table
ROW_START : {getCharPositionInLine()==0&&inTable>0}?=> '|-' ;
CAPTION_START : {getCharPositionInLine()==0&&inTable>0}?=> '|+' ;
TEXT : ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|'-'|';'|':'|',')+; //simplified
DQUOTE : '"';
//WS : (HS | VS) ; //{ $channel = HIDDEN; } ;
HS : ( ' ' | '\t' )+ { $channel = HIDDEN; } ;
VS : ( '\r' | '\n' )+ { $channel = HIDDEN; } ;
EQ : '=';
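The gated lexer rules above decide what a '|' or '!' means from two pieces of context: the column position and the inTable counter. The following is a minimal hand-written Java sketch of that decision, not code generated by ANTLR. The class name TableLexerSketch and the method tokenize are hypothetical, and TEXT, EQ, DQUOTE and whitespace are simply skipped to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration of the grammar's gated lexer rules (an assumption-laden sketch). */
class TableLexerSketch {

    /** Returns the names of the table-markup tokens found in the input, in order. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int inTable = 0; // same role as inTable in @lexer::members
        int col = 0;     // stands in for getCharPositionInLine()
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n') { i++; col = 0; continue; }
            boolean bol = (col == 0); // "beginning of line"
            String two = input.substring(i, Math.min(i + 2, input.length()));
            String tok = null;
            int len = 1;
            if (bol && two.equals("{|")) { tok = "TBL_START"; len = 2; inTable++; }
            else if (bol && inTable > 0 && two.equals("|}")) { tok = "TBL_END"; len = 2; inTable--; }
            else if (bol && inTable > 0 && two.equals("|-")) { tok = "ROW_START"; len = 2; }
            else if (bol && inTable > 0 && two.equals("|+")) { tok = "CAPTION_START"; len = 2; }
            else if (inTable > 0 && two.equals("||")) { tok = "CELL_INLINE_STRT"; len = 2; }
            else if (inTable > 0 && two.equals("!!")) { tok = "HEAD_INLINE_STRT"; len = 2; }
            else if (bol && inTable > 0 && c == '|') { tok = "CELL_START"; }
            else if (bol && inTable > 0 && c == '!') { tok = "HEAD_START"; }
            else if (c == '|') { tok = "PIPE"; } // outside a table, or not at start of line
            if (tok != null) out.add(tok);
            i += len;
            col += len;
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "{| border=\"1\"\n| Orange || Apple || align=\"right\" | 12,333.00\n|-\n| Bread\n|}\n";
        System.out.println(tokenize(sample));
    }
}
```

The order of the checks matters: the two-character markers are tried before the single '|' and '!', so that "|-" at the start of a line is not read as CELL_START followed by text.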
Status
- This is a lexer + a parser.
- Tested against the examples in table.
- A tree grammar or a string template could be used to transform the parse result into XHTML etc.
- To simplify development, full Unicode is not supported, but the character set in the TEXT rule could be changed with minimal impact.
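As a rough illustration of the XHTML transformation mentioned above, the sketch below renders rows of cell text with a plain StringBuilder. It uses neither an ANTLR tree grammar nor StringTemplate. The class XhtmlTableSketch and the List<List<String>> row model are assumptions made for the example, not part of the spec.

```java
import java.util.List;

/** Hypothetical renderer: turns rows of cell text into an XHTML table string. */
class XhtmlTableSketch {

    /** Escapes the characters that are unsafe in XHTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders each inner list as one <tr> and each string as one <td>. */
    static String render(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table>");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell)).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(render(List.of(
                List.of("Orange", "Apple", "12,333.00"),
                List.of("Bread", "Pie", "500.00"))));
    }
}
```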
Problems
The spec has a recognizer nondeterminism [1]:
- ANTLR is unable to decide which path to take when it meets a HEAD_START symbol, since the symbol could belong to either of:
  - The optional header.
  - The body, when there is no optional header but the body starts with a header cell. (This is a mistake.)
- ANTLR reports this as a warning and discards the second option. How could this nondeterminism be removed?
  - Add a variable with table-wide scope:
    boolean hasHead = true;
  - Use it in a predicate on the optional header:
    {hasHead}?
  - Add an action after the optional header to flip it:
    {hasHead = false;}
- ANTLR also complains that the first non-header cell might belong to either of:
  - The (optional) first row, i.e. the one without a |- indicator.
  - The optional other rows after it.
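The flag-based fix proposed above can be sketched in plain Java as a hand-written analogue of the gating predicate and the flipping action. This is only an illustration of the idea, not ANTLR output and not a tested change to the grammar. The class HeadFlagSketch and the method assign are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical analogue of gating the optional header with a table-wide flag. */
class HeadFlagSketch {

    /** Labels each token name as belonging to the optional header ("head") or the rows ("body"). */
    static List<String> assign(List<String> tokens) {
        List<String> owner = new ArrayList<>();
        boolean hasHead = true; // table-wide flag, true until the header is over
        for (String t : tokens) {
            boolean headToken = t.equals("HEAD_START") || t.equals("HEAD_INLINE_STRT");
            if (hasHead && headToken) {
                owner.add("head"); // plays the role of the {hasHead}? predicate
            } else {
                hasHead = false;   // plays the role of the {hasHead = false;} action
                owner.add("body");
            }
        }
        return owner;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of(
                "HEAD_START", "HEAD_INLINE_STRT", "ROW_START", "HEAD_START", "CELL_INLINE_STRT")));
    }
}
```

Once the flag is flipped, a later HEAD_START can only be read as a header cell inside a body row, so each token has exactly one owner.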
Table in Table Test
You type:
<!-- outer -->
{| border="1"
| Orange || Apple || align="right" | 12,333.00
|-
| Bread || Pie || align="right" | 500.00
|-
| Butter || Ice cream || align="right" | 1.00
<!-- inner -->
{| border="1"
| Orange || Apple || align="right" | 12,333.00
|-
| Bread || Pie || align="right" | 500.00
|-
| Butter || Ice cream || align="right" | 1.00
|}
|}
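A quick way to sanity-check the nesting in a test input like the one above is to replay the inTable counter from the lexer by hand. The sketch below only tracks '{|' and '|}' at the start of a line. It does not exercise the parser, whose nesting alternative for cells is commented out in the spec. The class NestingDepthSketch is a hypothetical helper.

```java
/** Hypothetical helper that replays the lexer's inTable counter over wikitext lines. */
class NestingDepthSketch {

    /** Returns {maximum depth reached, depth left open at the end of input}. */
    static int[] depth(String wikitext) {
        int inTable = 0;
        int max = 0;
        for (String line : wikitext.split("\n")) {
            if (line.startsWith("{|")) {
                inTable++;                       // mirrors the TBL_START action
                max = Math.max(max, inTable);
            } else if (line.startsWith("|}") && inTable > 0) {
                inTable--;                       // mirrors the TBL_END action
            }
        }
        return new int[] { max, inTable };
    }

    public static void main(String[] args) {
        String sample = "{| border=\"1\"\n| Butter || Ice cream\n{| border=\"1\"\n| Orange || Apple\n|}\n|}\n";
        int[] d = depth(sample);
        System.out.println("max depth " + d[0] + ", unclosed " + d[1]);
    }
}
```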
References
- [1] The Definitive ANTLR Reference: Building Domain-Specific Languages; Terence Parr; 2007; ISBN 0-9787392-5-6; p. 127