Parsoid/Roundtrip testpages

= Kinds of RT problems =


 * Wikitext escaping diffs: Lots of pages have wikitext escaping of text in brackets ( [..] ). This piece of wikitext escaping code needs fixing.
 * Whitespace/quote diffs for ref and other extension tags: Lots of pages have whitespace/quote diffs around extension tags ( vs,  vs )
 * Template diffs: Missing template output / duplicate template output. Most of these are probably because of incorrect DSR (DOM wikitext source range) information.
 * Unbalanced quote diffs: Some wikipages dont close quotes and Parsoid adds the missing balancing quotes which shows up as RT diffs.
 * Table wikitext diffs: Missing "-" or Missing "|" chars or some such -- maybe a serializer or html/missing-rt-data issue -- to be investigaged.

Zero diffs

 * Medha_Patkar
 * John_McCain
 * Political_science
 * Hindu_reform_movements
 * PHP
 * Middle_Way
 * Substitution Cipher

Extension (ref, source, code, etc.) whitespace and quote diffs

 * No-self
 * Mahatma_Gandhi

Crashes

 * Advaita_Vedanta

When I rescue errors and continue, the real problem emerges -- the parsed output for this page is mangled up. The output from templates are interlaced rather than being serialized correctly. This messes up the template encapsulation code -- something is up with the async pipeline that needs fixing.

This page now parses properly with a few different fixes. There are still a few semantic round-trip failures though which are due to unbalanced quotes in tpl-args. -- this fix would require treating tpl-arg positions to be a self-contained context for the purpose of quote-balancing rather than extending it to end-of-line.


 * Help:Templates parser crashes in template encapsulation code.

Few major template diffs

 * Buddha

Several template diffs

 * Sanskrit -- some code committed in the recent past leads to this page being "stuck" parsing. Yet to investigate.
 * Anna_Hazare -- " " is the reduced test case that causes most of the RT problems on this page because of nested pre-tags showing up in the token stream. Yet to fix.
 * Adi_Shankara -- some newline/whitespace issues in one rt-ed cite template use. Yet to investigate.
 * Theravada -- now mostly whitespace, nowiki, and mismatched quote diffs.
 * Dependent_origination
 * Yoga
 * Vijayanagar Empire -- excess RT output in two cases, lost RT output in one.

Lots of Whitespace/quote diffs

 * Pāli
 * Sanskrit
 * Node.js
 * Paul Ryan
 * Bellary -- cite template uses are losing newlines in RT-ing

Unbalanced quote diffs

 * Theravada

Table wikitext diffs

 * Dont_Ask, Dont_Tell Repeal Act of 2010 -- templates used to set td attributes are not RT-ing.
 * Nagarjuna scroll down to find the table char diffs.
 * Enigma Machine

Other RT diffs

 * JRuby