Topic on Help talk:Extension:Linter

Do we need one more high priority Linter category?

Summary by SSastry (WMF)

We have the unclosed-quotes-in-headings linter category now.

IKhitron (talkcontribs)

Hello. I just noticed something bad. The linter categories that the page was listed in were about other things, and those categories did not change after the fix. I am terrified that we'll need one more high-priority category, now that 170 wikis have already left Tidy. Please tell me I'm stupid and it's nothing. The problem is unclosed italic/bold wiki markup in headers. Thousands of lines become unreadable. See my last edit in w:he:ויקיפדיה:דלפק ייעוץ/ארכיון118. Thank you.

SSastry (WMF) (talkcontribs)

Yes, it is a problem that is not caught by anything right now. I am not convinced this is a very common error. But, since the breakage is so evident, the fix should be easy even after RemexHTML is deployed. In any case, we'll check if we need to introduce a new category.

Thanks for the report! :)

IKhitron (talkcontribs)

No!!!!!!!!

Thank you, SSastry (WMF). You are welcome. Could you please cc me on phab, if a task gets filed there? I'd like to know what's happening with it.

SSastry (WMF) (talkcontribs)

Will do.

We identified the high-priority linter categories by running visual diff tests on a random sample of 50K-70K pages from lots of wikis. We had to introduce the tidy-font-bug category based on enwiki reports. That is most commonly seen on non-article pages (talk pages primarily) and that didn't show up in our visual diff tests since those tests didn't sample from non-article pages. Anyway, that is the reason why I think it is unlikely to be seen on too many pages.

http://mw-expt-tests.wmflabs.org/topfails/2 is the latest set of results. If you click on the remote link there, it will show you screenshots for tidy and for remex to compare. You don't need to do anything with this -- but that is just an FYI.

MarcoSwart (talkcontribs)

I agree the fix is probably easy and the error might not be very frequent. However, Dutch Wiktionary has over 600,000 pages. If there is no Linter category, how are we to find out which pages we need to fix?

IKhitron (talkcontribs)

By the way, could you check, please, if the problem is also in other html tags? For example, div ... '' ... /div, and others? If it looks fine in Tidy, and wrong in HTML5, the definition should be expanded.

SSastry (WMF) (talkcontribs)

Indeed. I am doing all kinds of tests right now to see what kind of weird things Tidy does ... I cannot wait to see Tidy gone. :)

IKhitron (talkcontribs)

"Indeed" you can check, or "indeed" it happens in more tags?

SSastry (WMF) (talkcontribs)

Indeed, I am checking for other tags. So far as I can tell, this only affects '' and ''' markup, not HTML tags, because of interactions between the parser, Tidy/Remex, and the table of contents. Will continue investigating later. Time for a break. :)

IKhitron (talkcontribs)

No, SSastry (WMF), I'm talking about the same markup, but in different tags, for example div instead of h2.

SSastry (WMF) (talkcontribs)

I am testing those combinations as well, yes.

SSastry (WMF) (talkcontribs)

So, based on all my testing, this will only be a problem when there is an unclosed '' or ''' in headings. Since the PHP parser closes the '' and ''' at the end of the line, Tidy will simply fix the misnesting and the effect is limited to the heading. However, in the case of Remex, the unclosed i/b tag leaks over to the TOC and affects the entire page.

For regular tags, Tidy and Remex are similarly affected (with some minor caveats which can be ignored).
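For anyone who wants to look for affected pages in a wikitext dump before a linter category exists, here is a minimal sketch of the check described above. It is only a rough heuristic and not how Parsoid or the Linter extension detects the problem: the function names are made up for illustration, and the apostrophe counting is a simplification of the real parser's quote handling, so mixed cases such as '''text'' may be reported even though the parser balances them.

```python
import re

# A heading line: the same run of 1-6 "=" on both sides of the title.
HEADING_RE = re.compile(r"^(={1,6})(.+?)\1\s*$")
# Any run of two or more apostrophes is treated as quote markup.
QUOTE_RUN_RE = re.compile(r"'{2,}")


def has_unclosed_quotes(heading_text):
    """Return True if italic or bold is still open at the end of the text.

    Simplified rules (an assumption, not the real parser logic):
    2 apostrophes toggle italic, 3 or 4 toggle bold,
    5 or more toggle both.
    """
    italic = False
    bold = False
    for run in QUOTE_RUN_RE.findall(heading_text):
        n = len(run)
        if n == 2:
            italic = not italic
        elif n in (3, 4):
            bold = not bold
        else:
            italic = not italic
            bold = not bold
    return italic or bold


def find_suspect_headings(wikitext):
    """Return (line number, line) pairs for headings with unclosed '' or '''."""
    hits = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        match = HEADING_RE.match(line)
        if match and has_unclosed_quotes(match.group(2)):
            hits.append((lineno, line))
    return hits


if __name__ == "__main__":
    sample = "== ''Broken heading ==\nSome text\n== '''Fine''' ==\n"
    for lineno, line in find_suspect_headings(sample):
        print(lineno, line)
```

Run over the text of each page, this lists candidate headings to review by hand. It deliberately ignores templates, nowiki sections, and HTML heading tags.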

IKhitron (talkcontribs)

Thank Cat.

SSastry (WMF) (talkcontribs)

The Parsoid and Linter patches have been merged and will be deployed next week. So, by Thursday, these pages should be live on wikis. The corresponding help pages are now linked from Help:Extension:Linter.