Topic on Extension talk:Linter

Suggestion: Suggest autofixes for simple lint errors

Summary by SSastry (WMF)

Autofixing is tricky, especially when wikis might have different policies around that. Bots are still the go-to solution for this at this point.

197.218.90.50 (talkcontribs)

Issue

It can be quite time-consuming to come up with fixes for lint errors.

Proposed solution

Use the parser or cleanup tool to suggest fixes to content. This could be triggered from Special:LintErrors.

The simplest solution is to only suggest an autofix when:

There is simple markup (no templates, parser functions / tag extensions within the snippet);

Simple hacky example for a lint error found here (https://www.mediawiki.org/w/index.php?title=User:SPage_(WMF)/VE_span_bug&oldid=1163248):

    // Deliberately broken markup copied from the example page linked above.
    var sText = '<p style="font-size: 1.2em; margin-top: 1.2em;">When Vagrant is done configuring your machine, browse to <span class=plainlinks>http://127.0.0.1:8080/</span> to find your MediaWiki instance. The admin password is <code>vagrant<code>.</span>';

    // Run the snippet through the parser so its cleanup pass can repair the markup.
    $.post(window.location.origin + '/w/api.php?action=parse', {
        format: 'json',
        text: sText,
        disableeditsection: '1',
        disablelimitreport: '1',
        contentmodel: 'wikitext',
        formatversion: 2,
        wrapoutputclass: ''
    }).then(function (data) {
        // Only suggest a fix when the snippet does not transclude any templates.
        if (data && data.parse && data.parse.templates.length === 0) {
            // Convert the cleaned-up HTML back to wikitext.
            $.post(window.location.origin + '/api/rest_v1/transform/html/to/wikitext', {
                html: data.parse.text
            }).then(function (wikitext) {
                console.log("Suggested fix:\n\n" + wikitext);
            });
        }
    });

Output:

When Vagrant is done configuring your machine, browse to <span class="plainlinks">http://127.0.0.1:8080/</span> to find your MediaWiki instance. The admin password is <code>vagrant</code>.

Currently there doesn't seem to be a way to check if the content contains parser functions or tag extensions. It might be possible to manually strip these before testing, or potentially make that into its own feature request for the API.
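
Until the API exposes that information, a crude client-side check is conceivable. The sketch below is only an assumption of how such a filter could look; the containsComplexMarkup name and the list of extension tags are made up for illustration and not exhaustive:

    // Hedged sketch: a rough regex-based check for parser functions and a few
    // common tag extensions before attempting an autofix.
    function containsComplexMarkup(snippet) {
        // Parser functions look like {{#if: ...}}, {{#switch: ...}}, etc.
        var hasParserFunction = /\{\{\s*#\w+\s*:/.test(snippet);
        // A small, deliberately incomplete sample of extension tags.
        var extensionTag = /<\s*(ref|references|gallery|nowiki|pre|syntaxhighlight|source|math|templatedata)\b/i;
        return hasParserFunction || extensionTag.test(snippet);
    }

    // Example: skip the autofix for anything that looks too complex.
    if (!containsComplexMarkup(sText)) {
        // ... run the parse + html-to-wikitext round trip from the example above
    }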

It might also be interesting to isolate this cleanup tool (tidy, etc) and expose it as a separate API, assuming it doesn't already exist somewhere.

SSastry (WMF) (talkcontribs)

Interesting idea. I have to ponder that for a bit -- in this form, this will only be usable on a very small set of pages, but I wonder if we can build a variant that will work in the presence of templates, since pretty much every page will have templates / extensions.

SSastry (WMF) (talkcontribs)

Extracting "atomic" snippets reliably from a wikitext page is hard -- consider the simplest case where the entire page is wrapped in an unclosed <div> tag -- but there is something here worth experimenting with.

197.218.88.106 (talkcontribs)

I'd say start small and build up from there, rather than get stuck in analysis paralysis.

Anyway, considering that Parsoid seems to keep track of template parameters, it can probably detect whether a particular argument contains a lint error, and if so suggest a localized fix. Of course, wikitext in general is very messy and complicated; changing a single token in a template can completely alter the whole page and trigger even more lint errors. That could be detected using a basic diff tool, e.g. if the proposed automated change significantly alters the page rendering, then it can either not suggest the fix, or provide a serious warning.
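
One hedged way to approximate such a check with the existing action=parse API would be to render both versions and compare the output sizes as a crude stand-in for a real diff. The renderingChangesTooMuch helper and the 10% threshold below are purely illustrative assumptions:

    // Hedged sketch: render the original and the proposed wikitext and refuse
    // the suggestion if the rendered HTML changes too much in size.
    function renderingChangesTooMuch(originalWikitext, fixedWikitext) {
        // Render a wikitext snippet to HTML via the existing parse API.
        function render(text) {
            return $.post(window.location.origin + '/w/api.php?action=parse', {
                format: 'json',
                formatversion: 2,
                contentmodel: 'wikitext',
                text: text
            }).then(function (data) { return data.parse.text; });
        }
        // Compare the two renderings; a large size difference suggests the
        // "fix" altered the page more than it should have.
        return $.when(render(originalWikitext), render(fixedWikitext))
            .then(function (oldHtml, newHtml) {
                var delta = Math.abs(oldHtml.length - newHtml.length);
                return delta / Math.max(oldHtml.length, 1) > 0.1; // arbitrary threshold
            });
    }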

- consider the simplest case where the entire page is wrapped in an unclosed <div> tag

Well, for the simple case the tool still works. For better or worse the linter highlights the whole page in some of those cases, so it will close the snippet at the end (of the page). It is also prudent to set a limit to the snippet size (in bytes) to reduce destructive "fixes".

It might be worth making the linter smarter by providing the offset of the closest HTML parent element. For a table cell, that might be the whole table, or the topmost table in case it is a nested table.

Generally, the exact solution depends on the exact problem:

  • Invalid self-closed tags - simply strip them, e.g. <span/>. That's what the parser does anyway and that's what the code above will do.
  • Table tag that should be deleted - same as above, do what the parser does (but it needs the topmost parent of the nested table).
  • Bogus file option - the simplest fix here is to just comment out the bogus option (e.g. [[File:boo.svg|1|2|3]] -> [[File:boo.svg|<!-- Parameter error here 1|2|-->3]]) and leave a note (see the sketch after this list). In many cases these are typos.
  • Misnested tags - pure wikitext / HTML cases (stuff like <b>\n * list \n * list</b>) can easily be solved by the code above. It might be better to give up when templates are involved.
  • Stripped tags - the code above fixes that too by removing them from the markup, although that might not always be the right fix.
  • Some of them can be fixed with a reversion, as described here (https://www.mediawiki.org/w/index.php?title=Topic:U0snrs4zticv0uxs).
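
As a concrete illustration of the bogus file option case, here is a rough sketch of how the comment-out transformation could look. The function name, the tiny list of recognised options, and the assumption that the last parameter is the caption are all made up for illustration:

    // Hedged sketch: comment out unrecognised options in a file link,
    // keeping the last (caption) parameter visible.
    function commentOutBogusFileOptions(fileLink) {
        // Deliberately tiny, illustrative list of valid options.
        var validOption = /^(thumb|frame|frameless|border|left|right|center|none|\d+px|upright.*|alt=.*|link=.*)$/;
        var inner = fileLink.replace(/^\[\[|\]\]$/g, '');
        var parts = inner.split('|');
        var target = parts.shift();                  // e.g. "File:boo.svg"
        var caption = parts.length ? parts.pop() : null;
        var bogus = parts.filter(function (p) { return !validOption.test(p.trim()); });
        if (!bogus.length) {
            return fileLink;                         // nothing to fix
        }
        var kept = parts.filter(function (p) { return validOption.test(p.trim()); });
        return '[[' + [target].concat(kept).join('|') +
            '|<!-- Parameter error here ' + bogus.join('|') + '|-->' +
            (caption !== null ? caption : '') + ']]';
    }

    // commentOutBogusFileOptions('[[File:boo.svg|1|2|3]]')
    // -> '[[File:boo.svg|<!-- Parameter error here 1|2|-->3]]'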

----

In many cases there is no need to reinvent the wheel; editors have created tools to identify and fix some errors automatically with a high degree of accuracy.

197.218.88.106 (talkcontribs)
Extracting "atomic" snippets reliably from a wikitext page is hard -- consider the simplest case where the entire page is wrapped in an unclosed <div> tag -- but there is something here worth experimenting with.

Personally, it seems to me that the parser really facilitates this kind of "misbehaviour". A better approach would be for it to transform the tag into a simple <div/> on output. That would get editors to fix it without any linter, and if they don't, it won't cause problems for anyone else who doesn't understand HTML but still edits the page.

PerfektesChaos (talkcontribs)

I have been working on automatic syntax fixes with WikiSyntaxTextMod for 8 years now.

I strongly discourage such an approach as an automatic measure triggered by MediaWiki on all wikis for all users.

Dealing with the strange things that happen here requires editors who know the Linter methodology and have very good wikisyntax skills. Less than 1 % of the editors on German Wikipedia could deal with such suggestions. Newbies would be completely overwhelmed.

Error causes are ambiguous. It is necessary to understand the intention of the author before an appropriate remedy can be chosen. The idea might have been to put some words into an empty element, like <small><small>minor, or to create an anchor with <span id="here" /> and make a mess of it. An auto-correction will silently remove that and leave no trace and no hint.

And popping up a <div/> somewhere in the output is a very bad idea. That will be presented to all readers on all such pages, even though the page worked for a decade with that error and looked pretty. The only problem an unclosed div causes is that a validator complains; it does not affect the readability of the page.

Leaving short comments in the syntax would be understood by only a few people, causing more confusion among the wide majority of editors rather than helping to solve problems. Leave such things to syntax experts.