The Linter extension identifies wikitext patterns in pages that must or can be fixed, and provides guidance about what is wrong with those patterns and how to fix them.
The Special:LintErrors page groups the errors by type. Some of these issues may be easier to find with Special:Expandtemplates. On this page, we classify lint issues by severity, according to the goals that those issues block. More information and discussion about this is provided further below.
We will continue to improve functionality to eliminate noise, fix bugs, and make the linter output more actionable, but the current output is ready to use and act on.
Documentation of lint issues
Why and what to fix
Going forward, the parsing team plans to leverage the Linter extension to identify wikitext patterns:
- that are erroneous (e.g. bogus image options, usually caused by typos or by the fragility of media option parsing in MediaWiki)
- that are deprecated (e.g. self-closing tags)
- that can break because of changes to the parsing pipeline (e.g. replacing Tidy with RemexHTML)
- that are no longer valid in HTML5 (e.g. obsolete tags like center, font)
- that are potentially broken and that the parser may interpret differently from what the editor intended (e.g. unclosed HTML tags, misnested HTML tags)
Not all of them need to be fixed promptly or even ever (depending on your tolerance for lint). Different goals are advanced by fixing different subsets of the above lint issues. We (the parsing team) will try to be transparent about these goals and will provide guidance about which goals are advanced by fixing which issues.
Simplified instructions are provided in the FAQ page.
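To make two of the pattern types above concrete, here is a minimal Python sketch that flags obsolete tags and self-closed non-void tags in a piece of wikitext. It is only an illustration: the Linter extension relies on Parsoid's parse of the page, not on regular expressions, and the names `OBSOLETE_TAGS`, `NON_VOID_TAGS`, and `find_simple_lint` are hypothetical. The tag sets are deliberately tiny, and the labels reuse category names mentioned on this page.

```python
import re

# Assumption: a tiny illustrative subset; the text names only center and font.
OBSOLETE_TAGS = {"center", "font"}

# Assumption: a few HTML tags that are not void elements, so "<div/>" is suspect.
# Extension tags such as <ref name="a" /> are deliberately left out.
NON_VOID_TAGS = {"b", "i", "div", "span", "p", "small"}

# Matches an opening tag, capturing its name and an optional trailing slash.
OPEN_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*?(/?)>")


def find_simple_lint(wikitext):
    """Return (label, tag name, offset) tuples for two simple patterns."""
    findings = []
    for match in OPEN_TAG.finditer(wikitext):
        name = match.group(1).lower()
        self_closed = match.group(2) == "/"
        if name in OBSOLETE_TAGS:
            findings.append(("obsolete-tag", name, match.start()))
        if self_closed and name in NON_VOID_TAGS:
            findings.append(("self-closed-tag", name, match.start()))
    return findings
```

Called on a string such as `'<center>title</center> and <div/>'`, the function reports the `center` opening tag and the self-closed `div`, while leaving `<br />` and `<ref name="a" />` alone. A regex scan like this cannot see tags produced by templates, which is one reason the real results come from a full parse.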
Goal: Replacing Tidy
As part of addressing technical debt in the parsing pipeline of MediaWiki, we replaced Tidy with an HTML5-based tool. However, doing so would have broken the rendering of a small subset of pages unless certain wikitext patterns were fixed. Specifically, these were issues found in the tidy-font-bug categories. To replace Tidy in a timely manner, we classified all these issues as high priority.
Right now, the HTML generated by the PHP parser is used for read views, and the HTML generated by Parsoid is used by editing tools and the Android app, among others. As one of its long-term objectives, the parsing team wants to enable the use of Parsoid's output for both read views and editing. Since Parsoid and RemexHTML are both HTML5-based tools, the lint categories that affect RemexHTML's rendering also affect Parsoid's rendering. We have not yet identified any newer lint issues that affect Parsoid's rendering, but will update this list as we identify them.
Goal: HTML5 output compliance
This is a somewhat complex goal, and we haven't yet arrived at an understanding of how important it is to pursue or how far we should go with it. Additionally, it is not yet clear what mechanisms we wish to leverage towards this goal. For example, based on discussions in several venues, User:Legoktm/HTML+MediaWiki outlines a proposal for handling the html5-deprecated big tag. In any case, fixing issues in the self-closed-tag categories advances this goal. Given the lack of clarity around this goal, we have accordingly marked the obsolete-tag category as low priority.
Goal: Clarifying editor intent
Getting markup right is hard.
Errors inadvertently creep through.
While the parser does its best in recovering from these errors, in many cases, what the parser does might not truly reflect the editor's original intent.
Given that, we recommend fixing the issues identified here to clarify the editor's intention.
Issues in the missing-end-tag categories seem to affect this goal.
Since this is a fairly important goal, we have marked most of them as medium priority.
However, we have marked the missing-end-tag category as low priority since, in the vast majority of cases, the parser does seem to recover fairly accurately.
Nevertheless, we recommend fixing whatever can be fixed without too much effort, if only to assist comprehension by other human editors and tools.
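As an illustration of how an unclosed tag leaves the intended scope ambiguous, the following Python sketch tracks open tags on a stack and reports those that never receive a matching close. This is a simplified assumption about how such detection could work, not Parsoid's actual algorithm; `TRACKED_TAGS` and `find_unclosed_tags` are hypothetical names, and only a handful of tags are tracked.

```python
import re

# Matches opening and closing tags: optional leading slash, name, optional trailing slash.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*?(/?)>")

# Assumption: a small illustrative set of paired HTML tags.
TRACKED_TAGS = {"b", "i", "u", "s", "div", "span", "small"}


def find_unclosed_tags(wikitext):
    """Return (tag name, offset) for every tracked opening tag left unclosed."""
    open_stack = []  # openers still waiting for a close, innermost last
    unclosed = []
    for match in TAG.finditer(wikitext):
        is_closing = match.group(1) == "/"
        name = match.group(2).lower()
        is_self_closed = match.group(3) == "/"
        if name not in TRACKED_TAGS or is_self_closed:
            continue
        if not is_closing:
            open_stack.append((name, match.start()))
            continue
        names = [entry[0] for entry in open_stack]
        if name in names:
            # Find the nearest matching opener; anything opened after it
            # was never closed, so record those openers as unclosed.
            idx = len(names) - 1 - names[::-1].index(name)
            unclosed.extend(open_stack[idx + 1:])
            del open_stack[idx:]
        # A closing tag without any opener is ignored in this sketch.
    unclosed.extend(open_stack)
    return sorted(unclosed, key=lambda entry: entry[1])
```

For input like `<div><b>bold text</div> tail <i>italic`, the sketch reports both `b` and `i`. A parser has to guess where each was meant to end, and an explicit `</b>` or `</i>` removes that guess for the parser as well as for human editors and tools.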
Goal: Clean markup
Getting markup right is hard. Even in the presence of errors, the parser does a fairly decent job in most cases of working out how a piece of markup is supposed to render. But, in much the same way that typos, punctuation slips, and minor grammatical errors can feel unsettling, some editors, or those with a developer mindset, might find lint issues in these categories unsettling. We don't recommend spending an inordinate amount of time fixing these issues and, in many scenarios, bots might be able to fix them as well.
Issues in the stripped-tag lint categories affect this goal.
When are lint errors for a page updated?
Currently, all lint categories are populated by errors identified by Parsoid while parsing a page. When a page (or a template transcluded on it) is edited, ChangeProp requests a re-parse of that page from Parsoid, which then sends the fresh results to the Linter extension.
This means that when a new category is introduced (or an existing category is corrected), it may take a while for all the results to be updated, and pages that are rarely touched may never be updated. Making a null edit speeds up the process for an individual page. In phab:T161556, we're exploring ways to reprocess all pages.
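Because results are refreshed lazily, it can be useful to check what is currently recorded for a category before and after a null edit. The Python sketch below only builds a query URL for the `linterrors` list module of the MediaWiki Action API; it performs no network request. The parameter names `lntcategories` and `lntlimit` are assumptions that should be verified against the API help of the target wiki, and `build_linterrors_url` and the example endpoint are hypothetical.

```python
from urllib.parse import urlencode


def build_linterrors_url(api_url, category, limit=10):
    """Build an Action API URL listing recorded lint errors in one category.

    Assumption: the wiki exposes list=linterrors with the "lnt" parameter
    prefix; check the wiki's api.php help before relying on these names.
    """
    params = {
        "action": "query",
        "list": "linterrors",
        "lntcategories": category,
        "lntlimit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"
```

For example, `build_linterrors_url("https://example.org/w/api.php", "missing-end-tag", 5)` yields a URL that can be opened in a browser or fetched with any HTTP client. If a page that was just fixed still appears in the results, the re-parse has probably not reached the Linter extension yet.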
Should pages in X namespace (e.g. talk) be fixed?
- WPCleaner – a Java program that interfaces with Linter and can also detect some of the errors
- ja:User:MawaruNeko/ShowPageLintError.js – a user script that shows all lint errors on a page
- Bot by User:星耀晨曦 that can fix multiple-unclosed-formatting-tags errors.