PEG tokenizer

The PEG tokenizer is a component of Parsoid. It is a PEG-based wiki tokenizer that produces a single combined token stream from wiki and HTML syntax. The PEG grammar is a context-free grammar that can be ported to different parser generators, mostly by adapting the parser actions to the target language. Currently we use pegjs to build the actual JavaScript tokenizer. We try to do as much work as possible in the grammar-based tokenizer, so that the emitted tokens are already mostly syntax-independent.
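
To make "syntax-independent tokens" concrete, here is a minimal hand-written sketch in plain JavaScript. It is not Parsoid's grammar and does not use pegjs: the function name `tokenize`, the token shapes (`Tag`, `EndTag`, `Text`) and the simplistic handling of `'''` are all assumptions made for illustration. It only mimics the PEG idea of ordered choice, where the first alternative that matches at the current position wins, and shows wiki bold and HTML bold producing the same tokens.

```javascript
// Hypothetical illustration only: names and token shapes are NOT Parsoid's.
function tokenize(input) {
  const tokens = [];
  let pos = 0;
  let boldOpen = false; // whether the next ''' opens or closes bold

  // Ordered choice, as in a PEG: alternatives are tried top to bottom.
  // The sticky flag (y) makes each regex match only at lastIndex.
  const rules = [
    // HTML syntax: <b> and </b>
    {
      re: /<(\/?)b>/y,
      action: (m) => ({ type: m[1] ? 'EndTag' : 'Tag', name: 'b' }),
    },
    // Wiki syntax: ''' toggles bold (a simplification assumed for this sketch)
    {
      re: /'''/y,
      action: () => {
        boldOpen = !boldOpen;
        return { type: boldOpen ? 'Tag' : 'EndTag', name: 'b' };
      },
    },
    // Plain text up to the next character that could start markup
    { re: /[^<']+/y, action: (m) => ({ type: 'Text', value: m[0] }) },
    // Fallback: a lone < or ' that did not start any markup
    { re: /[\s\S]/y, action: (m) => ({ type: 'Text', value: m[0] }) },
  ];

  while (pos < input.length) {
    for (const rule of rules) {
      rule.re.lastIndex = pos;
      const m = rule.re.exec(input);
      if (m) {
        tokens.push(rule.action(m));
        pos = rule.re.lastIndex; // advance past the consumed input
        break;
      }
    }
  }
  return tokens;
}

// Both inputs yield a Tag, a Text and an EndTag token for "b".
console.log(tokenize("'''x'''"));
console.log(tokenize("<b>x</b>"));
```

In this sketch the consumer of the token stream never sees whether the bold came from `'''` or from `<b>`, which is the sense in which later stages can stay independent of the surface syntax. A generated tokenizer would express the same alternatives as grammar rules with parser actions rather than as a hand-written loop.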

Source code: https://phabricator.wikimedia.org/diffusion/GPAR/browse/master/lib/wt2html/pegTokenizer.pegjs.txt