Extension talk:Linter

He7d3r (talkcontribs)

Why is this considered a "Fostered content" error?

{| class="wikitable"
{{{{{subst|}}}!}}<div>a</div>
|}
This post was hidden by Bdijkstra (history)
Arlolra (talkcontribs)

It's sort of a bug in Parsoid: it's struggling with the templated template name.

The PHP parser sees this as {{{subst|}}}, a top-level template argument whose default value is the empty string. So {{{{{subst|}}}!}} gets expanded to {{!}}<div>a</div>, which gives the desired wikitext table syntax.

Substitution (subst) is supposed to occur when saving a page, and shouldn't show up in wikitext.
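
The expansion described above can be illustrated with a toy sketch. It only models the single case of a brace-free {{{name|default}}} on a page that receives no arguments; the real preprocessor is far more involved, and expandTopLevelArgs is a hypothetical helper, not a MediaWiki function.

```javascript
// Toy model only: replaces {{{name|default}}} with its default value,
// as happens for a top-level page that receives no template arguments.
// It ignores nesting, escaping and everything else the real parser handles.
function expandTopLevelArgs(wikitext) {
    return wikitext.replace(
        /\{\{\{([^{}|]*)\|([^{}|]*)\}\}\}/g,
        function (match, name, defaultValue) {
            return defaultValue;
        }
    );
}

console.log(expandTopLevelArgs('{{{{{subst|}}}!}}<div>a</div>'));
// {{!}}<div>a</div>
```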

Arlolra (talkcontribs)
SSastry (WMF) (talkcontribs)

I've responded there, but why is the subst showing up in saved source? Or is the complaint that Parsoid isn't handling PST before saving?

Arlolra (talkcontribs)

The author used the wrong syntax, three { instead of two.

SSastry (WMF) (talkcontribs)

But, {{{subst}}} substs .. i just tested ...

SSastry (WMF) (talkcontribs)

oh .. never mind ... i see what you are saying about top-level argument.

Reply to "Fostered content"
Prh47bridge (talkcontribs)

I've installed the extension and modified the config.yaml for Parsoid as per the instructions. VisualEditor is working fine so I know Parsoid and Restbase are both working. However, I am not getting any lint errors at all. This seems unlikely. I've tried deliberately putting an error on a page but it isn't picked up. Any ideas? Is there anything I can do to test that everything is working correctly?

Arlolra (talkcontribs)
Prh47bridge (talkcontribs)

Thanks for your advice. I think I've solved it. It seems that the version of Parsoid I'm using is expecting the config variable linterSendAPI rather than the current variable. So lint errors were being detected but were going straight to the log, not to the API. However, when I put "linterSendAPI: true" in the config Parsoid won't work at all! Very frustrating! Would I be correct in thinking that this means I'm using an older version of Parsoid and need to update?

Prh47bridge (talkcontribs)

I've now established that, for some reason, my server was no longer picking up Parsoid updates from the repository. Fixed that and will be upgrading from the currently installed version (0.7.1) to the latest in the repository. Hopefully that will fix this issue.

Arlolra (talkcontribs)
Prh47bridge (talkcontribs)

Thanks for the pointer. I have now upgraded to 0.8.0. However, it still isn't working. I'm getting the following in the Parsoid log on startup:

WARNING: For config property linter, required a value of type: object
Found null; Resetting it to: {"sendAPI":false,"apiSampling":1,"tidyWhitespaceBugMaxLength":100}

A diff comparing my config.yaml with the distribution version comes up with the following (ignoring domain name stuff):

83,87c83,86
<         linting: true
<         # Send lint errors to MW API instead of to the log
<         linter:
<               sendAPI: true
<               apiSampling: 10
---
>         #linting: true
>         #linter:
>         #  sendAPI: false # Send lint errors to MW API instead of to the log
>         #  apiSampling: 10 # Sampling rate (1 / 10)
97,99d95
<
<       # Enable the batch API
<       useBatchAPI: true

Any ideas? I'm assuming there is something I need to change in config.yaml to make this work?

Arlolra (talkcontribs)

Maybe a whitespace issue? Do you have any tabs hanging around?

Prh47bridge (talkcontribs)

Yes, it was a whitespace issue. I had tabs rather than spaces on the sendAPI and apiSampling lines of config.yaml. I am no longer getting lint errors in the Parsoid log. I can't see anything relevant in the API debug log but I suspect that is because the page I've been using no longer has any lint errors! I'll have to try and provoke one to make sure it is working.
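
For anyone hitting the same problem, a small check along these lines can point out tab-indented lines in a YAML file. This is only a sketch: findTabIndentedLines is a hypothetical helper, and the sample text is made up to mirror the settings discussed above.

```javascript
// Hypothetical helper: returns the 1-based numbers of lines whose
// leading whitespace contains a tab character.
function findTabIndentedLines(yamlText) {
    const offending = [];
    yamlText.split(/\r?\n/).forEach(function (line, index) {
        const indent = line.match(/^[ \t]*/)[0];
        if (indent.includes('\t')) {
            offending.push(index + 1);
        }
    });
    return offending;
}

// Made-up sample: the sendAPI line is indented with a tab.
const sampleConfig = [
    'linting: true',
    'linter:',
    '\tsendAPI: true',
    '  apiSampling: 10'
].join('\n');

console.log(findTabIndentedLines(sampleConfig)); // line 3 is reported
```

In practice the text would come from reading config.yaml from disk; the check itself stays the same.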

Reply to "No lint errors"
星耀晨曦 (talkcontribs)

See this query. The error location is wrong, because 278-341 is

<code>」字號以利排版,要複製再利用文獻時,不用複製「<code>*<code>」字號。

the error text should be

<code>*<code>」字號以利排版,要複製再利用文獻時,不用複製「*</code>」字號。

This location caused my bot to make a wrong repair.

Reply to "Wrong location"
MarcoSwart (talkcontribs)

Our special page suggests a missing end tag on this page, which we use to access an archive by year. The page is mainly a transclusion of this single archive page, which contains no errors. Using lintHint, 0 errors are reported for the original page and 4 for the transcluding page! As far as I can see, 3 error messages are generated by the "{{qtu}}" in section 66.1 and 1 by section 79, a WMF message. As there seems to be no problem rendering the page as intended, might this be a bug in Linter?

Reply to "False errors on transcluded page"
星耀晨曦 (talkcontribs)

Assuming there are multiple errors on the same page, will the location of other errors be updated after an error is fixed?

SSastry (WMF) (talkcontribs)

It will get updated once the page is reparsed.

星耀晨曦 (talkcontribs)

Will only the location change, while other properties (e.g. error id) stay the same?

SSastry (WMF) (talkcontribs)

Error id also changes.

星耀晨曦 (talkcontribs)

That's bad news for me. Suppose I query error ids 1, 2, 3 and 4 before fixing anything, and all of these errors are on the same page. Once I fix error No. 1, the fixes for the remaining errors will be wrong, because their locations are based on the earlier text (before No. 1 was fixed). Do you have any good ideas?

For now I plan to fix only the first error and give up on the ones after it. If the error id did not change, I could add an API query to update the locations.

SSastry (WMF) (talkcontribs)

Fix it in the reverse order of offsets .. i.e. fix the error with the largest offset first .. that way, offsets for errors earlier in the page won't change.
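
A minimal sketch of that reverse-order approach follows. It assumes each fix is described by a hypothetical { start, end, replacement } object, that the ranges do not overlap, and that the offsets index into the string in the same units as JavaScript string positions; applyFixesInReverseOrder is an illustrative name, not part of Linter or Parsoid.

```javascript
// Apply fixes from the largest start offset to the smallest, so that
// earlier offsets are still valid when their turn comes.
function applyFixesInReverseOrder(wikitext, fixes) {
    // Copy before sorting so the caller's array is left untouched.
    const sorted = fixes.slice().sort(function (a, b) {
        return b.start - a.start;
    });
    let result = wikitext;
    sorted.forEach(function (fix) {
        result = result.slice(0, fix.start) + fix.replacement + result.slice(fix.end);
    });
    return result;
}

// Two unclosed <code> tags on the same "page" (made-up sample text).
const pageText = 'one <code>x<code> two <code>y<code> three';
const pageFixes = [
    { start: 11, end: 17, replacement: '</code>' },
    { start: 29, end: 35, replacement: '</code>' }
];

console.log(applyFixesInReverseOrder(pageText, pageFixes));
// one <code>x</code> two <code>y</code> three
```

Because the second fix is applied first, the longer replacement text never shifts the offsets of the first one.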

Reply to "Update location of the lint error"

"Missing end tag" results in numerous extra <code> tags (and rendering issues)

197.218.91.45 (talkcontribs)

Issue

Paste the following in a wiki page:

<code>A piece of computer code
asdasdasd
<h2> buuu</h2>
zsadsdasd

Expected:

Output html should be similar to above.

Actual:

<p><code>A piece of computer code
asdasdasd
</code></p><code><h2><span class="mw-headline" id="buuu">buuu</span></h2></code><code><p>zsadsdasd
</p></code>

See it in action on this page (https://www.mediawiki.org/w/index.php?title=Project:Sandbox&oldid=2583108)

Possible solutions:

  • Increase the priority of this lint error. It seems to create large rendering differences, which seems bad enough to raise the priority of the "misnested tag" category
  • Set it as a separate high priority category
  • Or decide whether this is actually a bug in the parser - and fix that

Notes: I've been seeing this for quite some time in some help pages (e.g. https://www.mediawiki.org/w/index.php?title=Help_talk:Searching&oldid=2475830 scroll down), and couldn't understand where it came from until I tried it in a different wiki, and it became obvious.

SSastry (WMF) (talkcontribs)

Note that high priority rendering categories are about whether rendering will change when Tidy is replaced with a RemexHTML based tool. It is not about whether the HTML output for a piece of wikitext matches expectations.

In this case, the actual HTML structure is slightly different for Tidy, RemexHTML and Parsoid. However, the rendering in the browser is identical.

So, this doesn't belong in the high priority category. Right now, we don't want to confuse editors and add more work to their plate beyond what is required to switch from Tidy to RemexHTML. Once Tidy is replaced, we could change the priority of linter categories based on new desired goals.

197.218.90.179 (talkcontribs)
SSastry (WMF) (talkcontribs)

Ah, sorry, you are right. I looked at output in my browser outside the mediawiki CSS settings where the differences don't show up. This doesn't happen with the small tag (except for the vertical whitespace diff). Parsoid's analysis for linter errors treats all formatting tags identically, and both code and small are formatting tags. But, looks like CSS for the code tag changes how the heading and paragraph render.

We could add a special case for the code tag, but I am inclined to let this be for now unless there are a lot of code-tag-related issues in practice.

197.218.90.179 (talkcontribs)

Hmm, you're right. Those come from the wiki-specific CSS styles. Still, the way those code tags are nested results in invalid html, see:

https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.mediawiki.org%2Fw%2Findex.php%3Ftitle%3DProject%3ASandbox%26oldid%3D2583108

That might be a completely separate problem, maybe https://phabricator.wikimedia.org/T134469. I suppose that code tags aren't used that much in articles, and because the linter detects this it might be reasonable to let editors take their time fixing it.

SSastry (WMF) (talkcontribs)

Your IP address has not been whitelisted to report lint errors

Niagarafalls (talkcontribs)

I have installed MediaWiki with Linter, VisualEditor and Parsoid extensions.

I get "Your IP address has not been whitelisted to report lint errors" error when I want to edit a page using VisualEditor.

Should I white list my IP anywhere? What is this IP, the visitor IP, server public IP or localhost?

SSastry (WMF) (talkcontribs)

What does "Output not from a single template" mean?

星耀晨曦 (talkcontribs)

What does this mean?

NicoV (talkcontribs)

I think that Linter wasn't able to pinpoint which template is causing the problem, only that it is one of several templates.

Bdijkstra (talkcontribs)

Indeed. Note that the problem might be caused by the template itself, or by the template call.

星耀晨曦 (talkcontribs)

Look at this error. The tagged field contains 21 templates. I then used this URL to query whether these templates have linter errors, and in the end I got nothing back.

P.S. I noticed Parsoid doesn't recognize the error in Lua modules.

Bdijkstra (talkcontribs)

If a template doesn't have linter errors, that can mean that the linter hasn't seen it yet, or that the errors aren't transcluded in the template itself. In any case you can use Special:ExpandTemplates to look at the generated wikicode. In the first few lines you can already see multiple unclosed font-tags obviously caused by Template:User rainbow.
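
The same expansion is also available through the MediaWiki action API (action=expandtemplates), which may be handier for a bot than the special page. The sketch below only builds the request URL; the API endpoint and the template call are placeholders, and buildExpandTemplatesUrl is a hypothetical helper.

```javascript
// Build a request URL for the MediaWiki action API's expandtemplates module.
// apiUrl is a placeholder: use the api.php endpoint of the wiki in question.
function buildExpandTemplatesUrl(apiUrl, wikitext) {
    const params = new URLSearchParams({
        action: 'expandtemplates',
        format: 'json',
        prop: 'wikitext',
        text: wikitext
    });
    return apiUrl + '?' + params.toString();
}

// Placeholder endpoint and template call, for illustration only.
console.log(buildExpandTemplatesUrl('https://example.org/w/api.php', '{{User rainbow}}'));
```

With prop=wikitext, the expanded text should be found under expandtemplates.wikitext in the JSON response, which can then be searched for unclosed tags.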

星耀晨曦 (talkcontribs)

For a bot, the difficulty of repairs has increased. :)

Reply to "What does "Output not from a single template" mean?"

Issue: Linter is highlighting the wrong string

Summary by SSastry (WMF)

Might have been a transient bug from the earlier days of linting. If this is seen again, it is worth a phab ticket.

197.218.81.110 (talkcontribs)

Steps to reproduce:

  1. Go to https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Barack_Obama&action=edit&lintid=56
  2. If the highlighted text is not currently being shown, scroll down until you find it
  3. Look at the highlighted text

Expected

[[File:20090124 WeeklyAddress.ogv|right|thumbtime=1:10|thumb|Obama presents his first [[commons:Obama Administration weekly video addresses|weekly address]] as President of the United States on January 24, 2009, discussing the [[American Recovery and Reinvestment Act of 2009]]|alt=Photograph]]

Actual

in the general election since the system was created in 1976.<ref name="Bloomberg_Salant_20080619">{{cite news |author = Salant, Jonathan D. |title = Obama Won't Accept Public Money in Election Campaign |url = https://article.wn.com/view/2008/06/19/Obama_Wont_Accept_Public_Money_in_Election_C

Lint error:

Page title      Bogus file option    Through a template?
Barack Obama    thumbtime=1:3

Note: This is potentially a temporary error. If someone edits the page it will probably disappear, because the problem is gone and this is a report on the cached document.

Suggestion: Suggest autofixes for simple lint errors

Summary by SSastry (WMF)

Autofixing is tricky, especially when wikis might have different policies around that. Bots are still the go-to solution for this at this point.

197.218.90.50 (talkcontribs)

Issue

It can be quite time consuming coming up with fixes to lint errors.

Proposed solution

Use the parser or cleanup tool to suggest fixes to content. This could be triggered from within Special:LintErrors.

The simplest solution is to only suggest an autofix when:

There is simple markup (no templates, parser functions / tag extensions within the snippet);

Simple hacky example for lint error found here (https://www.mediawiki.org/w/index.php?title=User:SPage_(WMF)/VE_span_bug&oldid=1163248) :

// Snippet containing the lint error: the second <code> should be </code>.
var sText = '<p style="font-size: 1.2em; margin-top: 1.2em;">When Vagrant is done configuring your machine, browse to <span class=plainlinks>http://127.0.0.1:8080/</span> to find your MediaWiki instance. The admin password is <code>vagrant<code>.</span>';
// Step 1: parse the wikitext to HTML, letting the parser clean up the markup.
$.post(window.location.origin + '/w/api.php?action=parse', {
    format: 'json',
    text: sText,
    disableeditsection: '1',
    disablelimitreport: '1',
    contentmodel: 'wikitext',
    formatversion: 2,
    wrapoutputclass: ''
}).then(function (data) {
    // Only continue for simple markup, i.e. when the snippet uses no templates.
    if (data && data.parse && data.parse.templates.length === 0) {
        // Step 2: convert the cleaned-up HTML back to wikitext.
        $.post(window.location.origin + '/api/rest_v1/transform/html/to/wikitext', {
            html: data.parse.text
        }).then(function (wikitext) {
            console.log("Suggested fix:\n\n" + wikitext);
        });
    }
});

Output:

When Vagrant is done configuring your machine, browse to <span class="plainlinks">http://127.0.0.1:8080/</span> to find your MediaWiki instance. The admin password is <code>vagrant</code>.

Currently there doesn't seem to be a way to check if the content contains parser functions or tag extensions. It might be possible to manually strip these before testing. Or potentially make that into its own feature request for the API.

It might also be interesting to isolate this cleanup tool (tidy, etc) and expose it as a separate API, assuming it doesn't already exist somewhere.

SSastry (WMF) (talkcontribs)

Interesting idea. Have to ponder that for a bit -- in this form, this will only be usable on a very small set of pages, but wonder if we can build a variant that will work in the presence of templates since pretty much every page will have templates / extensions.

SSastry (WMF) (talkcontribs)

Extracting "atomic" snippets reliably from a wikitext page is hard -- consider the simplest case where the entire page is wrapped in an unclosed <div> tag -- but there is something here worth experimenting with.

197.218.88.106 (talkcontribs)

I'd say start small and build up from there, rather than get stuck in analysis paralysis.

Anyway, considering that parsoid seems to keep track of template parameters, it can probably detect if a particular argument contains a lint error, and if so suggest a localized fix. Of course wikitext in general is very messy and complicated, changing a single token in a template can completely alter the whole page and trigger even more lint errors. That could be detected using a basic diff tool, e.g. if the proposed automated change significantly alters the page rendering, then it can either not suggest the fix, or provide a serious warning.

- consider the simplest case where the entire page is wrapped in an unclosed <div> tag

Well, for the simple case the tool still works. For better or worse the linter highlights the whole page in some of those cases, so it will close the snippet at the end (of the page). It is also prudent to set a limit to the snippet size (in bytes) to reduce destructive "fixes".
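
The size limit mentioned above could be as simple as the following sketch. The threshold value and the name isSnippetWithinLimit are made up for illustration; the size is measured in UTF-8 bytes, so non-ASCII text counts for more than one byte per character.

```javascript
// Hypothetical threshold; an appropriate value would need experimentation.
const MAX_SNIPPET_BYTES = 2048;

// Returns true when the snippet's UTF-8 encoding fits within maxBytes.
function isSnippetWithinLimit(snippet, maxBytes) {
    return new TextEncoder().encode(snippet).length <= maxBytes;
}

// Only snippets under the limit would be offered an automatic fix.
console.log(isSnippetWithinLimit('<b>short snippet', MAX_SNIPPET_BYTES)); // true
```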

It might be worth making the linter smarter by providing the offset of the closest html parent element. For a table cell, that might be the whole table, or the top most table in case it is a nested table.

Generally, the exact solution depends on the exact problem:

  • Invalid self-closed - simply strip it, e.g. <span/>. That's what the parser does anyway and that's what the code above will do.
  • Table tag that should be deleted - same as above, do what the parser does (but it needs the topmost parent of the nested table).
  • Bogus file option - the simplest fix here is to just comment out the bogus option and leave a note (e.g. [[File:boo.svg|1|2|3]] -> [[File:boo.svg|<!-- Parameter error here 1|2|-->3]]). In many cases these are typos.
  • Misnested - pure wikitext / html cases (stuff like <b>\n * list \n *list </b>) can easily be solved by the code above. It might be better to give up when there are templates involved.
  • Stripped tags - code above fixes that too by removing them from the markup. Although that might not always be the right fix.
  • Some of them can be fixed with a reversion, as described here (https://www.mediawiki.org/w/index.php?title=Topic:U0snrs4zticv0uxs).
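
As an illustration of the first bullet only, a regex-based sketch for stripping invalid self-closed tags might look like this. stripSelfClosedTags is a hypothetical helper, the tag list is an assumption, and a regex like this knows nothing about nowiki, comments or templates, so it is not a safe general fix.

```javascript
// Remove self-closed forms of the given tags, e.g. <span/> or <div class="x" />.
// tagNames is assumed to hold plain tag names with no regex metacharacters.
function stripSelfClosedTags(wikitext, tagNames) {
    const pattern = new RegExp('<(' + tagNames.join('|') + ')\\b[^<>]*/>', 'gi');
    return wikitext.replace(pattern, '');
}

console.log(stripSelfClosedTags('before<span/>after <br/>', ['span', 'div']));
// beforeafter <br/>
```

Note that this also removes something like <span id="here" />, where the author probably wanted an anchor, so the result would still need a human look.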

----

In many cases there is no need to reinvent the wheel, editors have created tools to identify and fix some errors automatically with high degrees of accuracy.

197.218.88.106 (talkcontribs)
Extracting "atomic" snippets reliably from a wikitext page is hard -- consider the simplest case where the entire page is wrapped in an unclosed <div> tag -- but there is something here worth experimenting with.

Personally, it seems to me that the parser really facilitates this kind of "misbehaviour". A better approach would be for it to transform the tag into a simple &lt;div/> on output. That would get editors to fix it without any linter, and if they don't, it causes no problems for anyone else who doesn't understand html but still edits the page.

PerfektesChaos (talkcontribs)

I have been working on automatic syntax fixes with WikiSyntaxTextMod for 8 years now.

I strongly discourage offering such an approach as an automatic feature triggered by MediaWiki on all wikis for all users.

Dealing with the strange things that happen here needs editors who know the Linter methodology and have very good skills in wikisyntax. Fewer than 1% of the editors in German Wikipedia could deal with those suggestions. Newbies get entirely smashed.

Error causes are ambiguous. It is necessary to understand the intention of the author first before an appropriate remedy can be chosen. It might have been the idea to put some words <small><small>minor into an empty element, or to create an anchor by <span id="here" /> and making a mess of it. An auto-correction will silently remove that and leave no trace and no hint.

And popping up a &lt;div/> somewhere on output is a very bad idea. That will be presented to all readers on all such pages, while it worked for a decade with that error and looked pretty. The only problem an unclosed div causes is that a validator complains, but it does not affect readability of the page.

Short comments left in the syntax would be understood by only a few people, and would cause more confusion among the wide majority of editors rather than help solve problems. Leave such things to syntax experts.