User:OrenBochman/ParserNG/WikiTable

From MediaWiki.org

An ANTLR spec for the WikiTable markup

ANTLR spec

grammar wikiTable;
 
@header {
package p;
 
}
 
//@header {import org.antlr.test;} // not auto-copied to lexer
@lexer::header{
package p;
//import org.antlr.test;
//
}
 
@lexer::members {
//state check: how deeply nested in a table are we?
int inTable=0;
List tokens = new ArrayList();
public void emit(Token token) {
        state.token = token;
        tokens.add(token);
}
public Token nextToken() {
        super.nextToken();
        if ( tokens.size()==0 ) {
            return Token.EOF_TOKEN;
        }
        return (Token)tokens.remove(0);
}
}
 
@members{
//int inTable=0;
//public void foo(){};
//int rows=0;
}
 
//Parser Rules
wikiTable       : TBL_START  xml_attributes?  caption? head?  rows TBL_END;
caption         : CAPTION_START HS xml_attributes? captionText=TEXT+;
fragment
head            : (hCell hCellInLine*)+;        
rows            : (firstRow|row) row*;
firstRow        : cells;
row             : rowStart xml_attributes? cells;
rowStart        : ROW_START;
 
cells           :((cell|hCell) (cellInline|hCellInLine)*)+;
cell            : CELL_START xml_attributes? text=TEXT*;
cellInline      : CELL_INLINE_STRT xml_attributes? text=TEXT*;
hCell           : HEAD_START xml_attributes? text=TEXT*;
hCellInLine     : HEAD_INLINE_STRT xml_attributes? text=TEXT*;
 
 
//this is the recursive definition allowing table nesting
//cells         :( {input.LT(0)==CELL_START||input.LT(0)==HEAD_START}?=>(HEAD_START | CELL_START) XHTML_ATTRIBUTES? (TEXT|wikiTable)+ (CELL_INLINE_STRT XHTML_ATTRIBUTES? (TEXT|wikiTable)+)* )+  ;
 
//this needs to be in the parser for LT(2) to mean the second parser token
xml_attributes: {input.LT(2).getText().equals("=")}? xml_attribute+ PIPE? ;
xml_attribute: name=TEXT EQ DQUOTE value=TEXT* DQUOTE ;
//Lexer Rules
TBL_START       : {getCharPositionInLine()==0}?=> '{|'{inTable++; }     ;
TBL_END         : {getCharPositionInLine()==0&&inTable>0}?=> '|}'{inTable--;}   ;
HEAD_START      : {getCharPositionInLine()==0&&inTable>0}?=> '!';
HEAD_INLINE_STRT: {inTable>0}?=> '!!';
 
CELL_START      : {getCharPositionInLine()==0&&inTable>0}?=> '|';       //this should only be recognised within a table
PIPE            : {getCharPositionInLine()>0||inTable==0}?=> '|';       //outside a table or not at the start of a line
 
CELL_INLINE_STRT: {inTable>0}?=> '||';                                  //this should only be recognised within a table
ROW_START       : {getCharPositionInLine()==0&&inTable>0}?=> '|-' ;
CAPTION_START   : {getCharPositionInLine()==0&&inTable>0}?=> '|+'       ;
 
 
TEXT            : ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|'-'|';'|':'|',')+;                                    //simplified
 
DQUOTE          : '"';
//WS            :  (HS | VS)  ; //{ $channel = HIDDEN; } ;
HS              : ( ' ' | '\t'  )+ { $channel = HIDDEN; } ;
VS              : ( '\r' | '\n' )+ { $channel = HIDDEN; } ;
EQ              : '=';
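
The gated predicates in the lexer rules above decide what a '|' means from two pieces of state: the column and the table nesting depth. The following is a minimal hand-written Java sketch of that decision, not ANTLR-generated code. The class name WikiTableLexerSketch and the method delimiters are hypothetical, and the sketch reports only the delimiter tokens, ignoring TEXT, whitespace and attributes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of the gated lexer rules; not ANTLR-generated code. */
class WikiTableLexerSketch {

    /** Returns the names of the table delimiter tokens found in the input, in order. */
    static List<String> delimiters(String input) {
        List<String> out = new ArrayList<>();
        int inTable = 0; // nesting depth, like the inTable field in @lexer::members
        int col = 0;     // stands in for getCharPositionInLine()
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n' || c == '\r') { // a line break resets the column
                i++;
                col = 0;
                continue;
            }
            String two = input.substring(i, Math.min(i + 2, input.length()));
            boolean gated = col == 0 && inTable > 0; // line start, inside a table
            String type = null;
            if (col == 0 && two.equals("{|")) { type = "TBL_START"; inTable++; }
            else if (gated && two.equals("|}")) { type = "TBL_END"; inTable--; }
            else if (gated && two.equals("|-")) { type = "ROW_START"; }
            else if (gated && two.equals("|+")) { type = "CAPTION_START"; }
            else if (inTable > 0 && two.equals("||")) { type = "CELL_INLINE_STRT"; }
            else if (inTable > 0 && two.equals("!!")) { type = "HEAD_INLINE_STRT"; }
            else if (gated && c == '|') { type = "CELL_START"; }
            else if (gated && c == '!') { type = "HEAD_START"; }
            else if (c == '|') { type = "PIPE"; } // col > 0 or outside any table
            int len = 1;
            if (type != null) {
                out.add(type);
                // every delimiter except these three is two characters long
                boolean single = type.equals("CELL_START")
                        || type.equals("HEAD_START") || type.equals("PIPE");
                if (!single) len = 2;
            }
            i += len;
            col += len;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(delimiters("{| border=\"1\"\n| a || b | c\n|}\n| d"));
    }
}
```

The same '|' character comes out as CELL_START at the start of a line inside a table, and as PIPE everywhere else, which is the distinction the CELL_START and PIPE rules encode.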

Status

  • This is a lexer + a parser.
  • Tested against the examples in table.
  • A tree grammar or a StringTemplate template could be used to transform the parse into XHTML etc.
  • Does not support full Unicode, to simplify development, but the character set in the TEXT rule could be widened with minimal impact.
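
As an illustration of the last point, here is a small Java sketch comparing the ASCII-only character set of the TEXT rule with one possible widened variant. The class TextCharSketch and its methods are hypothetical, and using Character.isLetterOrDigit is an assumption about what "full Unicode" support could mean here, not something the spec prescribes.

```java
/** Sketch comparing the TEXT rule's character set with a hypothetical Unicode variant. */
class TextCharSketch {

    private static final String PUNCTUATION = ".-;:,";

    /** Mirrors the TEXT rule as written: ASCII letters, digits and . - ; : , */
    static boolean isAsciiTextChar(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                || (cp >= '0' && cp <= '9') || PUNCTUATION.indexOf(cp) >= 0;
    }

    /** Hypothetical widened variant: any Unicode letter or digit, plus the same punctuation. */
    static boolean isUnicodeTextChar(int cp) {
        return Character.isLetterOrDigit(cp) || PUNCTUATION.indexOf(cp) >= 0;
    }

    /** True if s is non-empty and consists only of characters the chosen variant accepts. */
    static boolean isText(String s, boolean unicode) {
        return !s.isEmpty() && s.codePoints()
                .allMatch(cp -> unicode ? isUnicodeTextChar(cp) : isAsciiTextChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(isText("12,333.00", false));
        System.out.println(isText("Cr\u00e8me", false));
        System.out.println(isText("Cr\u00e8me", true));
    }
}
```

Only the character predicate changes between the two variants, which is consistent with the claim that the change would have little impact on the rest of the grammar.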

Problems

The spec has a recognizer nondeterminism [1]

  1. ANTLR is unable to decide which path to take when it meets a HEAD_START symbol, since the symbol could belong to either of:
      • the optional header;
      • the body, when there is no header but the body starts with a header cell (this is a mistake).
  2. This is only a warning, and option #2 is discarded. How could this nondeterminism be removed?
      1. Add a variable with table-wide scope:
          boolean hasHead=true;
      2. Use it in a predicate on the optional header:
          {hasHead}?
      3. Add an action after the optional header to flip it:
          {hasHead=false;}
  3. ANTLR also complains that the first non-header cell might belong to either of:
      • the (optional) first row, i.e. the one without a |- indicator;
      • the optional other rows that follow.
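
The flag-based fix proposed for the header ambiguity can be sketched outside ANTLR as a hand-written decision over a flat list of token names. This is only an illustration of the idea under that simplification: the class HeadGateSketch and the method split are hypothetical, and whether the predicate actually silences the ANTLR warning is not verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of the hasHead gate: header cells go to the head only while the gate is open. */
class HeadGateSketch {

    /** Splits a flat list of cell-start token names into two lists: head, then body. */
    static List<List<String>> split(List<String> tokens) {
        boolean hasHead = true; // table-wide flag: the header slot is still open
        List<String> head = new ArrayList<>();
        List<String> body = new ArrayList<>();
        for (String t : tokens) {
            boolean headerToken = t.equals("HEAD_START") || t.equals("HEAD_INLINE_STRT");
            if (hasHead && headerToken) {
                head.add(t);     // plays the role of the {hasHead}? predicate
            } else {
                hasHead = false; // plays the role of the {hasHead=false;} action
                body.add(t);
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(head);
        result.add(body);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList(
                "HEAD_START", "HEAD_INLINE_STRT", "ROW_START", "HEAD_START", "CELL_INLINE_STRT")));
    }
}
```

Once the first non-header token has been seen, the flag closes the header slot, so a later HEAD_START is always treated as part of the body.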

Table in Table Test

You type:

<!-- outer -->
{| border="1"
| Orange || Apple     || align="right" | 12,333.00
|-
| Bread  || Pie       || align="right" | 500.00
|-
| Butter || Ice cream || align="right" | 1.00
<!-- inner -->
{| border="1"
| Orange || Apple     || align="right" | 12,333.00
|-
| Bread  || Pie       || align="right" | 500.00
|-
| Butter || Ice cream || align="right" | 1.00
|}
|}

You get:

Orange  Apple      12,333.00
Bread   Pie           500.00
Butter  Ice cream       1.00
Orange  Apple      12,333.00
Bread   Pie           500.00
Butter  Ice cream       1.00

References

  1. The Definitive ANTLR Reference: Building Domain-Specific Languages; Terence Parr; 2007; ISBN 0-9787392-5-6 p.127