Topic on Talk:Parsing/Replacing Tidy

Dipsacus fullonum (talkcontribs)

Hi,

At dawiki we found that some pages have changed appearance because Tidy apparently used to exchange div and span HTML elements so that a div wasn't placed inside a span. "<span><div>Text here</div></span>" was changed to "<div><span>Text here</span></div>". Would it be possible for Special:LintErrors to find all such cases?

SSastry (WMF) (talkcontribs)

For God's sake, Tidy!! :-(

Yes, we can find them. But, it will take a few days to get it deployed. Do you have a sense of how many pages are affected? Is it from a template? If it is coming from a template, perhaps you can fix those right away and see what happens?

197.218.81.173 (talkcontribs)

It does find it in Special:LintErrors/html5-misnesting, although it seems to ignore certain cases like the one you noted. Perhaps it was ignored because it didn't affect how the final page looked. Other cases like the one below are detected properly.

<span>bb<div>Text here</div></span>
SSastry (WMF) (talkcontribs)

Yes, Parsoid doesn't modify <span><div>foo</div></span> because paragraph tags aren't added around it. So, it doesn't trigger an html5-misnesting error.

But <span>x<div>foo</div></span> has p-tags added because of the text node in the span tag. The p-tag is then broken up by the HTML5 parser, which triggers an html5-misnesting error.
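
To make the two cases above concrete, here is a minimal sketch in Python that flags a div opened inside a span and notes whether text came before it in that span. This is not how Parsoid or the Linter extension detects anything; the class and function names are made up for illustration, and it only looks at text placed directly inside the span.

```python
from html.parser import HTMLParser


class SpanDivFinder(HTMLParser):
    """Record every <div> that is opened while a <span> is still open."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as [tag, saw_text_directly_inside]
        self.hits = []   # one bool per <div> in <span>: did text precede it?

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Look for the nearest enclosing <span>, innermost first.
            for open_tag, saw_text in reversed(self.stack):
                if open_tag == "span":
                    self.hits.append(saw_text)
                    break
        self.stack.append([tag, False])

    def handle_endtag(self, tag):
        # Pop up to and including the nearest matching open tag.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Non-whitespace text marks the innermost open element.
        if data.strip() and self.stack:
            self.stack[-1][1] = True


def find_div_in_span(html):
    """Return one bool per <div> nested in a <span> (True = text before it)."""
    finder = SpanDivFinder()
    finder.feed(html)
    finder.close()
    return finder.hits
```

With this sketch, "<span><div>Text here</div></span>" gives one hit without preceding text (the case that is not reported), while "<span>bb<div>Text here</div></span>" gives one hit with preceding text (the case that is reported as html5-misnesting).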

Dipsacus fullonum (talkcontribs)

Hi, all known occurrences at dawiki were from one template which contained '<span style="style 1">{{{Text}}}</span>'.

That template was then used in other templates like '{{foo|Text=<div style="style 2">bar</div>}}'.

So when the span and div tags were exchanged by Tidy, it changed the order of the style attributes, and thus the appearance. I guess it may have affected around 30,000 pages at dawiki, mostly user talk pages.

The template with the span tag is already changed. But there may be unknown cases and similar cases at other wikis.
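
For anyone who wants to see the effect on the nesting order, here is a rough sketch in Python. It is only an imitation of the exchange described in this thread, not Tidy's actual algorithm; the regex, the function names and the simple string substitution standing in for template expansion are all assumptions made for illustration.

```python
import re

# The template body quoted above.
TEMPLATE = '<span style="style 1">{{{Text}}}</span>'


def expand(text):
    """Substitute the Text parameter (a crude stand-in for template expansion)."""
    return TEMPLATE.replace("{{{Text}}}", text)


# Matches a <span> whose only content is a single <div>.
SPAN_AROUND_DIV = re.compile(
    r"<span([^>]*)><div([^>]*)>(.*?)</div></span>", re.DOTALL
)


def swap_like_tidy(html):
    """Rewrite <span><div>X</div></span> as <div><span>X</span></div>."""
    return SPAN_AROUND_DIV.sub(r"<div\2><span\1>\3</span></div>", html)


expanded = expand('<div style="style 2">bar</div>')
swapped = swap_like_tidy(expanded)
```

Here `expanded` has "style 1" on the outer element, while `swapped` has "style 2" on the outer element and "style 1" on the inner one, which is why pages look different now that the exchange no longer happens.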

197.218.91.75 (talkcontribs)

This seems to occur in several cases outside divs or spans. It might be best to develop a general solution, but whitelist tags for such reports only when someone reports occurrences. That way it becomes a simple configuration change rather than coding.

See :

If one adds something like style="background-color:green" to the parent tag, it does show rendering differences. It might be good to go over https://phabricator.wikimedia.org/tag/tidy/ and close all tasks that became obsolete, or decline them as Tidy is disabled.

The labels of categories in the linter could probably be changed too, since eventually the concept of Tidy will become irrelevant. Maybe it should instead focus on invalid HTML vs. wrong / undesirable parser output.

SSastry (WMF) (talkcontribs)

This linter category is now live, but it probably has a number of false positives since rendering is affected in only a small number of cases. We'll take a look at that. It shows up in the Miscellaneous-Tidy-Replacement-Issues category.