Topic on Help talk:Extension:Linter
Change in Special:LintErrors?
I'm working on cleaning up lint errors on cebwp with my bot. I've been making steady progress for some time, but today the error counts in Special:LintErrors on cebwp suddenly jumped back up by hundreds of thousands. What has changed? Lsj (talk) 20:10, 9 February 2018 (UTC)
The Linter extension now displays a prominent note that the counts are estimates, not exact numbers.
Interesting. Yes, this is fallout from https://phabricator.wikimedia.org/T184280 wherein we changed our error counts from exact numbers to "estimates". That is probably why you saw this jump. @Legoktm FYI. The counts went from about 480K to 710K. Is this kind of change expected?
The same is true of enwiki as well -- the counts have all more or less doubled from what they were. Hmm ...
The numbers decrease steadily with further bot runs, but they decrease by much more than one per bot edit: fixing 1,000 errors decreases the estimate by several thousand.
Correction to my previous post: the numbers are jumping up and down without any obvious pattern, by +/- 10,000-100,000 or so.
Apparently the dashboard (https://tools.wmflabs.org/wikitext-deprecation/) simply extracts the data from the wiki API, so it is wrong too. It might be simpler to extract the whole list via the API directly and count it clientside, since it doesn't seem useful for a bot to check the count anyway (Extension:Linter#API).
Unfortunately, that is likely to take an hour or more on a wiki with that many errors. Once a wiki gets down to a lower number of errors, counting no longer seems to be a problem. Another benefit of doing it clientside is the ability to get a count of errors in other namespaces.
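A clientside count along these lines could be sketched as follows. This is an illustrative Python sketch, assuming the `list=linterrors` API module and its continuation behavior described on Extension:Linter#API; the `fetch` callable is a hypothetical stand-in for whatever HTTP client the bot already uses.

```python
def count_lint_errors(fetch, category, limit=500):
    """Count lint errors in one category by paginating through the
    list=linterrors API module. `fetch(params) -> dict` is a hypothetical
    HTTP wrapper returning the parsed JSON response."""
    params = {
        "action": "query",
        "format": "json",
        "list": "linterrors",
        "lntcategories": category,
        "lntlimit": limit,
    }
    total = 0
    while True:
        page = fetch(dict(params))
        total += len(page.get("query", {}).get("linterrors", []))
        cont = page.get("continue")
        if not cont:
            return total
        params.update(cont)  # carry the continuation token(s) into the next request


def make_fake_fetch(pages):
    """Test helper: serve canned response pages instead of hitting the API."""
    it = iter(pages)
    return lambda params: next(it)
```

On a wiki with tens of millions of errors this loop really would take a long time at 500 results per request, which is exactly the concern raised above.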
Long term, it might be useful for the linter to have an export facility like Special:Export that dumps all lint errors.
But your comment points to something else: an alternative to doing live counts (besides fixing the code to run background queries) would be to have the deprecation dashboard run precise counts off the database replicas on a Labs VM.
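Run off the replicas, such a precise count might look roughly like this. A minimal sketch, assuming the Linter extension stores errors in a `linter` table with a `linter_cat` column (names are assumptions based on the extension's schema); the cursor is any DB-API-style cursor, e.g. from a MySQL client on a Labs VM.

```python
# Aggregate query against the Linter extension's table on a database replica.
# The table and column names (linter, linter_cat) are assumptions; adjust
# to the actual replica schema.
COUNT_QUERY = "SELECT linter_cat, COUNT(*) FROM linter GROUP BY linter_cat"

def precise_counts(cursor):
    """Return {category_id: error_count} via an exact COUNT(*) scan.
    Slow on wikis with tens of millions of rows, but exact -- which is
    fine for a dashboard refreshed hourly rather than on every page view."""
    cursor.execute(COUNT_QUERY)
    return {cat: n for cat, n in cursor.fetchall()}
```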
Yes, there is little value in updating the count regularly for wikis with lots of errors; they'll still have 10 million+ errors an hour or a day later. In fact, on any highly active wiki the count will be wrong more often than not (because of race conditions): e.g., by the time someone looks at the category page, a delayed update may have added 100 more items.
Perhaps it would be wise to add a prominent disclaimer that the count is an estimate, with a link to a separate resource for anyone who really wants relatively up-to-date statistics, refreshed hourly or so.
In any case, people are used to it (e.g. with maintenance reports).
The problem right now is that there are far too many errors in the low-priority categories on the English Wikipedia and Commons (about 70M on Commons and 20-30M on enwiki), so dumping will take far too long.
@Legoktm one compromise would be to use precise counts for the high- and medium-priority categories and estimates for the low-priority ones.
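That compromise could be wired up along these lines. An illustrative sketch only: `exact_count` and `estimated_count` are hypothetical callables (say, a COUNT(*) query versus a cheap index-based row estimate), and the category names in the default set are placeholders, not the extension's actual priority list.

```python
def lint_count(category, exact_count, estimated_count,
               high_or_medium=frozenset({"deletable-table-tag", "bogus-image-options"})):
    """Return (count, is_exact): a precise count for high/medium-priority
    lint categories, and a cheap estimate for the huge low-priority ones.
    The membership set here is illustrative, not the real priority split."""
    if category in high_or_medium:
        return exact_count(category), True
    return estimated_count(category), False
```

The idea is that the small high/medium categories can afford an exact scan, while the 20M+ low-priority categories only ever see the fast estimate.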