Parsoid/Debugging

Debugging tips
This section assumes you are in the tests/ directory.

Debugging the wt2html mode
node parse --help is a useful command to remember: it lists all available options, a few of which are described below. Since Parsoid processes wikitext in a pipeline composed of synchronous and asynchronous phases, it is sometimes useful to know how to examine the contents of the pipeline at various stages.

1. If you want to debug the tokenizer, node parse --trace peg-tokens is useful. Each time the tokenizer emits a token array to the next stage in the pipeline, this option prints out the token array.

The end of output is signalled by the EOFTk. Also note that multiple tokenizers might be active at the same time because of concurrent template expansions. A future enhancement of this debugging output would assign a debug id to every tokenizer and use that id to tell the output of different tokenizers apart.

2. If you want to look at the fully expanded and in-order token stream, node parse --trace tsp is your friend. This emits the tokens as seen by the TokenStreamPatcher handler, which is the very first handler in the in-order, synchronous transformation passes of the third phase. So, it is a good proxy for the in-order token stream of the top-level document.

3. If you want to look at the fully processed and transformed token stream (after all transformations), node parse --trace html is a good proxy. The output is a little bit noisier than it needs to be. Refining it and making it more useful is left as an enhancement.

4. If you want to look at the DOM at different stages of transformation, --dump dom:post-builder, --dump dom:pre-dsr, and --dump dom:pre-encap are useful DOM debug options. They can be combined in a single comma-separated list, as in --dump dom:pre-dsr,dom:pre-encap.

5. Sometimes, it is useful to look at the preprocessed template source that Parsoid then tokenizes. --dump tplsrc is useful in those scenarios.

6. There are several other handler-specific tracing flags. node parse --help should tell you what they do. There are tracers for the PreHandler, ListHandler, and ParagraphWrapper. There is currently no tracer for the QuoteTransformer.
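The wt2html options above can be collected into a small dry-run helper. This is only a sketch, not part of Parsoid: the function names join_dump_targets and show_cmd and the sample wikitext are hypothetical, and it assumes that parse reads wikitext from standard input, which this page does not state. It prints the commands rather than running them, so they can be reviewed and then pasted into a shell in the tests/ directory.

```shell
#!/bin/sh
# Dry-run sketch: prints debugging commands instead of running them.
# Assumption (not stated on this page): parse reads wikitext from stdin.

# Hypothetical sample input.
SAMPLE='[[Foo|bar]]'

# Join several --dump targets into the comma-separated form,
# e.g. "dom:pre-dsr,dom:pre-encap". The function body is a subshell,
# so the IFS change does not leak into the caller.
join_dump_targets() (
  IFS=,
  printf '%s\n' "$*"
)

# Print one pipeline: the sample wikitext piped into parse with the given options.
show_cmd() {
  printf "echo '%s' | node parse %s\n" "$SAMPLE" "$*"
}

show_cmd --trace peg-tokens   # token arrays emitted by the tokenizer
show_cmd --trace tsp          # in-order token stream seen by TokenStreamPatcher
show_cmd --trace html         # token stream after all transformations
show_cmd --dump "$(join_dump_targets dom:pre-dsr dom:pre-encap)"
show_cmd --dump tplsrc        # preprocessed template source
```

Printing the commands first keeps the helper safe to run anywhere; once a command looks right, it can be run directly against real input.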

Debugging the html2wt mode
node parse --help has a few options to help debug the serializer (this converts HTML to Wikitext).

1. If you want to trace the actions of the regular serializer, --trace wts is what you want.

2. If you want to debug the wikitext escaping behavior of the serializer, --trace wt-escape is what you want.
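The two serializer traces can be sketched the same way. Again this is a dry run under stated assumptions: the helper name show_html2wt_cmd and the sample HTML are hypothetical, and it assumes that parse reads HTML from standard input when --html2wt is given. The sample paragraph contains text that looks like a wikilink, which is the kind of content the escaping code has to handle.

```shell
#!/bin/sh
# Dry-run sketch: prints html2wt debugging commands instead of running them.
# Assumption (not stated on this page): with --html2wt, parse reads HTML from stdin.

# Hypothetical sample: plain text that resembles wikitext link syntax.
SAMPLE_HTML='<p>[[not a link]]</p>'

# Print a pipeline that feeds the sample HTML to the serializer
# with a single --trace flag ($1 is the trace name).
show_html2wt_cmd() {
  printf "echo '%s' | node parse --html2wt --trace %s\n" "$SAMPLE_HTML" "$1"
}

show_html2wt_cmd wts        # actions of the regular serializer
show_html2wt_cmd wt-escape  # wikitext escaping decisions
```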

Debugging the selective serializer (selser)
In order to test the selective serializer, you need (a) the original wikitext, (b) the original HTML, and (c) the modified HTML. Strictly speaking, (b) is not necessary, since selser reparses (a) to generate (b) when needed. However, in certain cases where you want to control testing conditions, it is useful to provide the original HTML as well.

Running selser
Let us first look at ways to test the selective serializer on the command line.

For very simple examples, selser can be run entirely through command-line options. Check node parse --help to find out more.

FIXME: --selser does not seem to automatically enable --html2wt. Both options are required.

Debugging DOMDiff
Selser first compares the old and new HTML and generates a diff-marked DOM. This is done by the DOMDiff class. There is a command-line script to test and debug this functionality in isolation.

You can look at (currently very) verbose output of DOM-diffing by turning on the --debug option.

Debugging selser
FIXME: --trace selser does not seem to automatically enable tracing of wts. Both options are required for output to be useful.