Parsoid/Round-trip testing/Diffs

From mediawiki.org

False positive reports (mostly fixed with a better rt-testing diffing strategy)

In many cases, this seems to happen because the double-rt diffing compares mismatched sections, probably because of the diff ranges that the wt-diff algorithm returns. In some cases, it could be because of DSR (DOM source range) inaccuracies.
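
As an illustration of where the section mismatch can come from, here is a minimal sketch of extracting changed line ranges from a wikitext diff. It uses Python's standard difflib rather than the wt-diff algorithm used by rt-testing, and the function name changed_ranges is hypothetical. Each returned entry pairs a range in the original with a range in the round-tripped text; if those ranges are later mapped onto the wrong sections of the two documents, unrelated content gets compared and a false positive can result.

```python
import difflib


def changed_ranges(original: str, round_tripped: str):
    """Return (tag, a_start, a_end, b_start, b_end) for every non-equal hunk.

    Indices are line numbers (0-based, end-exclusive) into each text.
    This is an illustrative stand-in, not the rt-testing wt-diff code.
    """
    a = original.splitlines(keepends=True)
    b = round_tripped.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    before = "== A ==\nfoo\n== B ==\nbar\n"
    after = "== A ==\nfoo\n== B ==\nBAR\n"
    for tag, a1, a2, b1, b2 in changed_ranges(before, after):
        print(tag, (a1, a2), (b1, b2))
```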

Auto <references /> insertion (most false reports now fixed with a better rt-testing diffing strategy)

Lots of pages where <references /> is missing, and the references section is therefore auto-generated, show RT diffs when the <references /> tag is serialized. This is being classified (incorrectly) as a semantic diff.
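
A minimal sketch of how such a diff could be recognized as syntactic, assuming the only change is a <references /> tag appended at the end of the page. The helper names are hypothetical and are not part of Parsoid or its rt-testing code.

```python
import re

# Hypothetical pattern: a bare <references /> tag, with optional surrounding
# whitespace, at the very end of the wikitext.
TRAILING_REFERENCES = re.compile(r"\s*<references\s*/>\s*\Z", re.IGNORECASE)


def strip_trailing_references(wikitext: str) -> str:
    """Drop a trailing <references /> tag and any trailing whitespace."""
    return TRAILING_REFERENCES.sub("", wikitext).rstrip()


def only_differs_by_auto_references(original: str, round_tripped: str) -> bool:
    """True when the texts differ, but agree once a trailing
    <references /> tag is ignored on both sides."""
    if original == round_tripped:
        return False
    return (strip_trailing_references(original)
            == strip_trailing_references(round_tripped))
```

With this check, "Foo<ref>bar</ref>" and the same text followed by a serialized <references /> tag would be treated as a syntactic difference rather than a semantic one.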

Link in links

{{lang|..}} template in plwiki

Several pages on plwiki seem to be affected by the use of this template inside external links, for example: [http://google.com Foo {{lang|en}}]

[subbu@earth lib] echo "[http://google.com foo {{lang|en}}]" | node parse --normalize --prefix plwiki --dump tplsrc
=================================
Szablon:Lang
---------------------------------
<span style="color:#009">([[język angielski|<span style="color:#005" title="Treść w języku angielskim (English)">ang.</span>]])</span>
---------------------------------
<p><a href="http://google.com">foo (</a><a href="Język_angielski" title="Język angielski"><span style="color:#005" title="Treść w języku angielskim (English)">ang.</span></a>)</p>

This should now be fixed: MatmaRex used a bot to repair 1000+ plwiki pages that had this broken wikitext.
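
For illustration, a minimal sketch of how pages with this pattern could be detected: it flags external links whose link text transcludes a template whose source contains a wikilink. This is not the bot that was actually used. The function name, the regexes, and the template_sources mapping (lower-cased template name to template wikitext, similar to what --dump tplsrc prints) are assumptions.

```python
import re

# External link with link text: [http://example.org some text]
EXT_LINK = re.compile(r"\[(?:https?:)?//[^\s\]]+[ \t]+([^\]\n]*)\]")
# Name of a transclusion: {{name|...}} or {{name}}
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def find_link_in_link(wikitext: str, template_sources: dict) -> list:
    """Return (external_link, template_name) pairs where the template's
    source contains a wikilink, so the output would nest links."""
    hits = []
    for link in EXT_LINK.finditer(wikitext):
        for tpl in TEMPLATE_NAME.finditer(link.group(1)):
            name = tpl.group(1)
            source = template_sources.get(name.lower(), "")
            if "[[" in source:
                hits.append((link.group(0), name))
    return hits


if __name__ == "__main__":
    sources = {"lang": "([[język angielski|ang.]])"}
    print(find_link_in_link("[http://google.com foo {{lang|en}}]", sources))
```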

Empty list items lost in RTing

Fostered content from tables

Fostering of lists from tables

Loss of duplicate transclusion params

This seems to show up on multiple pages in rt-testing.

After the fixes to mimic newline suppression before categories, these are now properly recognized as syntactic diffs.
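
To illustrate why losing a duplicate param changes the wikitext but not the rendering: when a template call repeats a parameter name, MediaWiki uses the last value. The sketch below models that with a dictionary. It is a simplified stand-in, not Parsoid's transclusion handling, and it assumes there are no nested templates, links, or literal pipes inside values. Both function names are hypothetical.

```python
def parse_template_params(transclusion: str) -> dict:
    """Parse '{{name|k=v|...}}' into a dict of params.

    A later duplicate overwrites an earlier one, mirroring
    "last value wins". Positional params get keys "1", "2", ...
    """
    inner = transclusion.strip()[2:-2]
    parts = inner.split("|")
    params = {}
    positional = 0
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
        else:
            positional += 1
            params[str(positional)] = part
    return params


def serialize_template(name: str, params: dict) -> str:
    """Serialize back to wikitext; every param is written as key=value."""
    return "{{" + name + "".join(f"|{k}={v}" for k, v in params.items()) + "}}"


if __name__ == "__main__":
    params = parse_template_params("{{foo|a=1|b=2|a=3}}")
    print(serialize_template("foo", params))
```

In this model the first a=1 is dropped on serialization, so the round-tripped wikitext differs from the source even though the template receives the same effective arguments.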

Paragraph-wrapping related false-positive semantic error reports (see https://phabricator.wikimedia.org/T89628) (now fixed with rt-diff fixes)

Lots of reports that should really be classified as syntactic diffs.

Block-tag generating transclusions with leading whitespace introduce conservative nowiki protection around whitespace during RTing

Weird partial {{! output when round-tripping.

This turned out to be a bug in DSR computation. A patch is now in Gerrit.

Implicit <td> insertion

Nowiki-ing of bad transclusion

Bad tokenization of !! in <td> (https://phabricator.wikimedia.org/T91411)

Multi-line xml tag parsing

Other

http://localhost:8000/_rt/enwiki/Markus_Fagervall -- seems to be fixed

http://localhost:8000/_rt/kowiki/%ED%8C%A8%EB%B9%84%EC%BD%98 -- the following snippet demonstrates the issue

[subbu@earth tests] echo '<link rel="shortcut icon" href="<nowiki>http://www.example.com/myicon.ico</nowiki>" />' | node parse --wt2wt
<link rel="shortcut icon" href="&lt;nowiki&gt;http://www.example.com/myicon.ico&lt;/nowiki&gt;" />
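
The output above is consistent with the attribute value being entity-escaped on serialization. The sketch below reproduces the same string using Python's standard html.escape. It only illustrates the escaping and is not Parsoid's serializer; serialize_link_tag is a hypothetical name.

```python
import html


def serialize_link_tag(href: str) -> str:
    """Build a <link> tag, entity-escaping the href value (illustrative only)."""
    escaped = html.escape(href, quote=True)
    return f'<link rel="shortcut icon" href="{escaped}" />'


if __name__ == "__main__":
    original_href = "<nowiki>http://www.example.com/myicon.ico</nowiki>"
    print(serialize_link_tag(original_href))
```

Unescaping the attribute value with html.unescape gives back the original <nowiki>...</nowiki> string, so the information is preserved, but the wikitext no longer matches the source byte for byte.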

Bad quoting (<ref name="foo'>..</ref>)

Bad rt-ing of chess table

Semantic errors now fixed -- these are all syntactic errors now.

Bad tr attribute (filed bug report)