Topic on Talk:Parsing

Wikitext vs VisualEditor lint errors

197.218.91.124 (talkcontribs)

I'm sure this has been asked thousands of times and previously ignored because there was no way to check automatically. However, now that the Parsoid / Linter extension combo can detect and report markup errors, this might be a perfect time to create an automatic log / graph of the errors introduced by each editing tool (wikitext editor vs VisualEditor), making it possible to measure, at least roughly, whether VisualEditor really lives up to its aspiration of reducing mistakes.

This may also help find previously unknown errors caused by either of these tools, so having such logging will certainly be a good idea anyway. This data could also be mined by automated tools such as bots and ORES to facilitate its cleanup.

Considering that smaller wikis don't have bots to regularly clean up these errors, it might also be useful to evaluate how long these errors tend to stick on articles, especially for errors that prevent the page from being viewed properly.
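To make the idea concrete, here is a minimal Python sketch of the two measurements suggested above: a tally of errors per editing tool, and how long errors stay on a page. It assumes a hypothetical `LintEvent` record that something (a log, a bot) would have to produce; the field names, the helper functions, and the sample rows are all invented for illustration and are not an existing Linter or Parsoid API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class LintEvent:
    """Hypothetical log row: one lint error and the edit that introduced it."""
    page: str
    category: str                      # lint category label, e.g. "missing-end-tag"
    editor: str                        # tool used for the introducing edit
    introduced: datetime
    fixed: Optional[datetime] = None   # None while the error is still on the page


def errors_by_editor(events):
    """Count errors per (editing tool, lint category) pair."""
    return Counter((e.editor, e.category) for e in events)


def median_days_to_fix(events):
    """Median lifetime in days of errors that were eventually fixed, or None."""
    lifetimes = [
        (e.fixed - e.introduced).total_seconds() / 86400
        for e in events
        if e.fixed is not None
    ]
    return median(lifetimes) if lifetimes else None


# Invented sample data, only to show the shape of the output.
events = [
    LintEvent("A", "missing-end-tag", "wikitext", datetime(2017, 4, 1), datetime(2017, 4, 3)),
    LintEvent("B", "missing-end-tag", "wikitext", datetime(2017, 4, 1), datetime(2017, 4, 11)),
    LintEvent("C", "misnested-tag", "visualeditor", datetime(2017, 4, 2)),
]

print(errors_by_editor(events))
print(median_days_to_fix(events))
```

Errors that are still open are left out of the median here; a real report would probably want to show them separately, since on small wikis they may be the majority.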

Legoktm (talkcontribs)

What kind of errors are you specifically thinking of?

197.218.91.124 (talkcontribs)

Mostly errors that either break the whole page or prevent it from working properly; some of these are quite severe:

There are plenty more, such as an unclosed HTML comment affecting page rendering under certain conditions (https://phabricator.wikimedia.org/T30939).

Some are quite easy to miss when writing wikitext or when section editing, since markup in one section may be perfectly fine and preview well enough, but break the whole page when saved. Most of them, however, are quite visible in VE.

197.218.82.68 (talkcontribs)

Here's a classic and pretty nasty one, nested extension tags, e.g. (https://phabricator.wikimedia.org/T22707):

<ref>
<ref> Eureka
</ref>
</ref>
<Gallery> file 2.png
<gallery> file1.png <gallery/>
</gallery>
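A rough way to flag this pattern automatically, as a sketch only: scan for opening tags that appear while a tag of the same name is still open. The function name `find_nested_extension_tags` and the tag list (just `ref` and `gallery`) are assumptions for illustration; this is a regex heuristic, not how Parsoid or the Linter extension actually detect the problem.

```python
import re

# Matches <ref ...>, </ref>, <ref ... />, and the same for gallery (case-insensitive).
# The tag list is a deliberately small assumption; a real wiki has more extension tags.
TAG_RE = re.compile(r"<(/?)(ref|gallery)\b[^>]*?(/?)>", re.IGNORECASE)


def find_nested_extension_tags(wikitext):
    """Return (tag_name, offset) for each opening tag found inside an
    already-open tag of the same name."""
    open_counts = {}
    nested = []
    for match in TAG_RE.finditer(wikitext):
        is_closing = bool(match.group(1))
        name = match.group(2).lower()
        is_self_closing = bool(match.group(3))
        if is_self_closing:
            continue  # <gallery/> opens nothing, so it cannot nest
        if is_closing:
            if open_counts.get(name, 0) > 0:
                open_counts[name] -= 1
        else:
            if open_counts.get(name, 0) > 0:
                nested.append((name, match.start()))
            open_counts[name] = open_counts.get(name, 0) + 1
    return nested


# The example markup from this post.
SAMPLE = (
    "<ref>\n"
    "<ref> Eureka\n"
    "</ref>\n"
    "</ref>\n"
    "<Gallery> file 2.png\n"
    "<gallery> file1.png <gallery/>\n"
    "</gallery>\n"
)

for name, offset in find_nested_extension_tags(SAMPLE):
    print(f"nested <{name}> at offset {offset}")
```

It ignores `<nowiki>`, comments, and templates, so it would produce false positives on real pages; it is only meant to show that the check itself is cheap.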

If I had to rate the most common issues (along with the others above) they'd probably be in this order:

Issue                        Task     Frequency
Misnested tables             T64323   High
Unclosed tags                T59196   High
Misnested or broken lists    T3581    High
Links being misnested        T13239   Low

Most of these occurred on a non-WMF wiki, so if this happens on a low-traffic wiki, it must be much worse on a wiki used by thousands of editors. Just look through Phabricator and you'll find hundreds more cases. Lists in particular are pretty problematic because they rarely work the way people hope they will. The team might also want to look into wiki validator (https://github.com/Wikia/wiva).

Perhaps there might be something useful in the code.

Jdforrester (WMF) (talkcontribs)

Hey there.

One of the design philosophies of VisualEditor as a tool is that it helps editors make edits "directly", and doesn't ever mysteriously do things just because we think the user should do this. We felt very strongly that this was a fundamental blocker to adoption on Wikimedia wikis, as it has to live in an environment with people using other tools to edit via wikitext. Consequently we've spent a lot of effort to avoid changing things unexpectedly, especially in the Parsoid service, even when we know that something is wrong. For example, when you edit an image we'll fix up the syntax, as an expert wikitext user would, but we won't ever change the syntax of things the editor didn't touch.

Because of this, measuring the before/after impact of parsing errors from VisualEditor edits probably wouldn't be very helpful – you'd expect them to go down, but you wouldn't see a nullification of all such errors.

I'd be pretty worried about tools to do mass-level changes inside VisualEditor, as people might struggle to understand exactly what they're changing and why. In the future, we're planning (T128511) to provide a "prompt the user to do something", but that would still be a per-item-fix interface.

Hope that helps explain my thinking.

197.218.82.95 (talkcontribs)
 Because of this, measuring the before/after impact of parsing errors from VisualEditor edits probably wouldn't be very helpful – you'd expect them to go down, but you wouldn't see a nullification of all such errors.

The design philosophy is pretty reasonable, and you've raised a very valid point about the error detection not really proving much for existing articles. However, it probably would provide very useful data for page creation, e.g.:

  1. On average how many pages created have markup errors
  2. How many errors are introduced by new editors vs experienced editors
  3. Which types of errors occur most often
  4. What page components generate most errors
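As a sketch of how the first three questions could be answered once the data exists: assume each newly created page is reduced to a hypothetical `NewPage` record holding whether the author is new and which lint categories the first revision triggered. The record, the function names, and the sample rows are invented for illustration; the fourth question would additionally need the location of each error within the page, which this sketch does not model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NewPage:
    """Hypothetical summary of one page creation."""
    title: str
    author_is_new: bool
    lint_categories: Tuple[str, ...] = ()  # one entry per lint error found


def share_with_errors(pages):
    """Question 1: fraction of created pages that have at least one error."""
    if not pages:
        return 0.0
    return sum(1 for p in pages if p.lint_categories) / len(pages)


def errors_by_experience(pages):
    """Question 2: total errors introduced by new vs experienced editors."""
    totals = Counter()
    for p in pages:
        key = "new" if p.author_is_new else "experienced"
        totals[key] += len(p.lint_categories)
    return totals


def most_common_categories(pages, n=3):
    """Question 3: the n most frequent error types."""
    return Counter(c for p in pages for c in p.lint_categories).most_common(n)


# Invented sample data, only to show the shape of the output.
pages = [
    NewPage("A", True, ("missing-end-tag", "fostered")),
    NewPage("B", False),
    NewPage("C", True, ("missing-end-tag",)),
    NewPage("D", False, ("misnested-tag",)),
]

print(share_with_errors(pages))
print(errors_by_experience(pages))
print(most_common_categories(pages, n=1))
```

Raw totals per group are not directly comparable if one group creates far more pages than the other, so a real analysis would likely normalise by pages created or by amount of markup added.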

VE is very intuitive, but it does sometimes have surprising interactions when things are pasted, dragged, or when templates are mixed with other extensions.

I'd be pretty worried about tools to do mass-level changes inside VisualEditor, as people might struggle to understand exactly what they're changing and why

I agree entirely. Magic-like behaviour can be very problematic, and could make even simple edits create complex revisions.

In the future, we're planning (T128511) to provide a "prompt the user to do something", but that would still be a per-item-fix interface.

Long live Microsoft Clippy!

Anyway, this seems like a very good way to warn users about things they've overlooked or that may cause problems. The wiva tool has some functionality that fits in nicely with that task, e.g. it flags huge images that may not render properly on mobile.

This is useful because currently neither VE nor the WTE makes any attempt to warn the user about markup issues, article readability, or usability problems.

Jdforrester (WMF) (talkcontribs)
However, it probably would provide very useful data for page creation

OK, you've convinced me; I've created https://phabricator.wikimedia.org/T162958 and hopefully we'll have a moment to measure it soon enough.

Anyway, this seems like a very good way to warn users about things they've overlooked or that may cause problems. The wiva tool has some functionality that fits in nicely with that task, e.g. it flags huge images that may not render properly on mobile.

Yeah, accessibility and content scale/size/depth hints are one of the things on my wishlist for this prompt tool.

197.218.88.255 (talkcontribs)
OK, you've convinced me; I've created https://phabricator.wikimedia.org/T162958 and hopefully we'll have a moment to measure it soon enough.

Great. My hunch is that on new pages, novices don't add as much markup (on average) with the "source editor", yet introduce more errors than VisualEditor users. When they do add complicated markup they probably cause errors by imperfectly copying the markup from existing pages.

Yeah, accessibility and content scale/size/depth hints are one of the things on my wishlist for this prompt tool.

Indeed, this is a very hard problem when one considers the variety of devices. Most editors are not trained to understand usability, so it is hard for them to grasp. "Online help" / tooltips explaining why these are bad might be helpful to users.

There is probably a big issue with the perception of VisualEditor vs the wikitext editor. As all VisualEditor edits are tagged but wikitext edits aren't, it gives the impression that everything that isn't tagged is automatically a wikitext edit, which isn't the case because one could conceivably be editing using their own customized wiki application, the API, a fridge, a bot, or something else.

Maybe a new tag "API edit" or "unknown editing tool" should be added.
