Talk:Parsoid/Normalizations

The AbuseLog suggests some others:

Empty style tags

 * Examples: here or here.


 * Nico, Amir, it looks like empty style tags (wrapping only whitespace) are not that uncommon on wikis. We found so many that we had to add a special tweak to our rt-testing script to suppress those normalizations during roundtrip testing, to reduce noise in our diffs. https://gerrit.wikimedia.org/r/#/c/230926/ is the fix. However, this normalization is enabled for content edited in VE, so in edits we will strip style tags that only wrap whitespace. FYI. Please flag if this is not desirable. cc: Arlo, Elitre, James SSastry (WMF) (talk) 22:55, 11 August 2015 (UTC)
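A minimal sketch of this normalization, written in Python over an ElementTree fragment rather than Parsoid's actual DOM code. The set of "style" tags (`STYLE_TAGS`) and the rule of leaving elements that carry attributes alone are assumptions for illustration. The wrapped whitespace itself is kept; only the tag is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of inline "style" tags; the list Parsoid uses may differ.
STYLE_TAGS = {"b", "i", "u", "s", "small", "big", "span", "font"}


def strip_empty_style_tags(parent: ET.Element) -> None:
    """Replace style elements that wrap only whitespace by that whitespace."""
    # Recurse first so nested wrappers like <b><i> </i></b> collapse fully.
    for child in list(parent):
        strip_empty_style_tags(child)
    i = 0
    while i < len(parent):
        child = parent[i]
        inner = child.text or ""
        if (child.tag in STYLE_TAGS and len(child) == 0
                and not child.attrib and inner.strip() == ""):
            # Keep the wrapped whitespace and the element's tail in place.
            kept = inner + (child.tail or "")
            if i == 0:
                parent.text = (parent.text or "") + kept
            else:
                prev = parent[i - 1]
                prev.tail = (prev.tail or "") + kept
            parent.remove(child)  # do not advance i: the next child shifted in
        else:
            i += 1


para = ET.fromstring("<p>x<i> </i>y</p>")
strip_empty_style_tags(para)
print(ET.tostring(para, encoding="unicode"))  # <p>x y</p>
```

Working bottom-up means an outer tag only sees its children after they have already been collapsed, so one pass handles nested empty wrappers.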

Whitespace at the start of a paragraph
These nowikis prevent the text from roundtripping as preformatted text, since in wikitext a line that starts with a space is rendered as preformatted.
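To make the trade-off concrete: because a line that starts with a space becomes preformatted text, a serializer must either drop the leading whitespace (a normalization) or escape it. The Python sketch below is hypothetical; the `scrub` argument stands in for "normalization is allowed for this edit", and the `<nowiki>` form shown is one possible escape, not necessarily the exact one Parsoid emits.

```python
def serialize_paragraph_line(text: str, scrub: bool) -> str:
    """Serialize one line of paragraph text so it does not become indent-pre.

    `scrub` is a hypothetical switch: when True the leading spaces are
    dropped (normalization); when False they are wrapped in <nowiki> so the
    text roundtrips unchanged.
    """
    if not text.startswith(" "):
        return text  # nothing that could trigger preformatted text
    stripped = text.lstrip(" ")
    if scrub:
        return stripped
    # Escape only the leading run of spaces.
    leading = text[: len(text) - len(stripped)]
    return f"<nowiki>{leading}</nowiki>{stripped}"


print(serialize_paragraph_line(" foo", scrub=False))  # <nowiki> </nowiki>foo
print(serialize_paragraph_line(" foo", scrub=True))   # foo
```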


 * This seems to be fixed in production on Wikimedia sites as of June 26, 2015. I cannot reproduce it anymore. --Amir E. Aharoni (talk) 07:30, 29 June 2015 (UTC)
 * Not true. See this diff from frwiki from 30th June for example. So, VE's fixes aren't foolproof yet. We'll start handling these on our end, and it won't hurt in any case. SSastry (WMF) (talk) 04:47, 1 July 2015 (UTC)
 * SSastry, any update about this? I still see this happening up to five times a day in he.wikipedia (I follow nowikis maniacally in my home wiki ;) ). --Amir E. Aharoni (talk) 08:32, 26 July 2015 (UTC)
 * https://phabricator.wikimedia.org/T105239 was deployed on the 14th, but there may be a few things left to fix. I had discussed this with User:Arlolra because of this diff, and he told me that he'd work on it. (I managed to reproduce an even worse case here.) --Elitre (WMF) (talk) 11:53, 27 July 2015 (UTC) PS. I now notice he had filed https://phabricator.wikimedia.org/T106909 for the nl.wiki issue. --Elitre (WMF) (talk) 15:45, 27 July 2015 (UTC)
 * Amir, please see discussion on https://phabricator.wikimedia.org/T106909 SSastry (WMF) (talk) 23:11, 27 July 2015 (UTC)
 * SSastry, to make sure that I understand it correctly, does this mean that it's on the way and will be live in the coming couple of weeks? --Amir E. Aharoni (talk) 12:28, 28 July 2015 (UTC)
 * It should have gone out on the 14th, as mentioned above, but the  parameter wasn't being forwarded by RESTBase. They deployed a fix for that just now (see T106909) and I've confirmed that the issue we were seeing on nlwiki is resolved. So, moving forward, this should no longer be an issue. Further, we've merged https://gerrit.wikimedia.org/r/#/c/226667/ to be deployed tomorrow, which should address https://phabricator.wikimedia.org/T104554. Arlolra (talk) 22:14, 28 July 2015 (UTC)
 * Thanks, Arlolra! --Amir E. Aharoni (talk) 16:11, 29 July 2015 (UTC)
 * Amir, so, are things looking better now? SSastry (WMF) (talk) 21:54, 30 July 2015 (UTC)
 * SSastry - Looks like it! I cannot reproduce it myself, and I haven't seen it happen in other edits yet. --Amir E. Aharoni (talk) 14:22, 31 July 2015 (UTC)

Empty links

 * It's not clear what to do automatically in such a case, but I noticed that often the output is Italie, and then it's pretty obvious that the output should be Italie . --Amir E. Aharoni (talk) 16:00, 10 July 2015 (UTC)
 * I suspect the particular case you describe here has been fixed by VE enabling the scrubWikitext flag, which puts it in the bucket covered by Parsoid/Normalizations. The general case might still be worth looking into if it turns out to be more prevalent. Arlolra (talk) 18:28, 3 August 2015 (UTC)
 * Patch for this in https://gerrit.wikimedia.org/r/#/c/229597/ Arlolra (talk) 23:06, 5 August 2015 (UTC)

An empty nowiki tag without anything at the end of the line
For example in the Hebrew Wikipedia here.
 * Maybe the nowiki is left over from the removed line, i.e. deleting in VE doesn't remove the nowiki meta. How common is this scenario? Arlolra (talk) 18:43, 3 August 2015 (UTC)

A link that ends in space
Such as: Berlin is the capital of Germany.

This should be Berlin is the capital of Germany.

(It's barely imaginable that somebody actually prefers the former over the latter.)
 * This one was addressed in https://gerrit.wikimedia.org/r/#/c/228895/ Arlolra (talk) 22:51, 3 August 2015 (UTC)


 * Nico, Amir, it looks like trailing whitespace in links is fairly common on wikis. We found so many that we are now adding another special tweak to our rt-testing script to suppress those normalizations during roundtrip testing, to reduce noise in our diffs. However, this normalization is enabled for content edited in VE, so we will migrate trailing whitespace out of links (independent of whether it might introduce a nowiki or not). FYI. Please flag if this is not desirable, in which case we'll restrict this normalization to links that would otherwise introduce a nowiki. I think we should update our normalization docs to document these behaviors more clearly. cc: Arlo, Elitre, James SSastry (WMF) (talk) 22:55, 11 August 2015 (UTC)
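Roughly, migrating the whitespace means moving it from the end of the link's content into the text right after the link. Here is a Python sketch over an ElementTree fragment, using the Berlin example from this thread. It is an illustration under assumptions (only spaces, tabs and newlines count as whitespace, and a link that would become empty is left alone), not Parsoid's code.

```python
import xml.etree.ElementTree as ET

# Assumption: only these characters count as "trailing whitespace".
WHITESPACE = " \t\n"


def migrate_trailing_whitespace(root: ET.Element) -> None:
    """Move whitespace at the end of each <a> to just after the link."""
    for link in root.iter("a"):
        has_children = len(link) > 0
        # The link's last piece of text is either its own text (no child
        # elements) or the tail of its last child element.
        text = (link[-1].tail if has_children else link.text) or ""
        content = text.rstrip(WHITESPACE)
        trailing = text[len(content):]
        if not trailing:
            continue
        if not has_children and not content:
            # Whitespace-only link: leave it alone rather than emptying it.
            continue
        if has_children:
            link[-1].tail = content
        else:
            link.text = content
        link.tail = trailing + (link.tail or "")


p = ET.fromstring('<p><a href="./Berlin">Berlin </a>is the capital of Germany.</p>')
migrate_trailing_whitespace(p)
print(ET.tostring(p, encoding="unicode"))
# <p><a href="./Berlin">Berlin</a> is the capital of Germany.</p>
```

Moving the whitespace rather than deleting it keeps the rendered text identical while letting the serializer emit a plain link followed by a space.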

Cases from frwiki
I would also suggest looking at frwiki, not just enwiki (where VE is deactivated by default for everyone), to get many more examples of nowiki tags added by VE/Parsoid in places where a better solution could be provided. You can use the frwiki abuse log for nowiki. --NicoV (talk) 17:05, 3 August 2015 (UTC)
 * I check the corresponding data for Hebrew every day, and look for repeating patterns, manually - w:he:WP:VE/nowiki.
 * I cannot expect Parsoid developers to search for such info in all languages, so I suggest that people who care about wikis in their languages do this manually. Together we'll eradicate this :) --Amir E. Aharoni (talk) 17:22, 3 August 2015 (UTC)
 * I started this list today, but analyzing the edits made by VE takes a long time: fr:Wikipédia:ÉditeurVisuel/Avis/Nowiki. Is it useful for the developers? --NicoV (talk) 21:30, 3 August 2015 (UTC)

Non-nowiki related normalization
See this. These scenarios can be handled by simply swapping an A-tag with its sole I/B child, i.e. <a><i>..</i></a> ==> <i><a>..</a></i>.
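A sketch of that swap in Python with ElementTree, assuming the I/B tag counts as the "sole child" only when the link has no other text or elements around it. Attributes stay on their own elements, and the link's tail text moves to the format tag. This is illustrative only and not taken from Parsoid.

```python
import xml.etree.ElementTree as ET

FORMAT_TAGS = {"i", "b"}


def move_format_tag_outside_link(parent: ET.Element) -> None:
    """Rewrite <a><i>..</i></a> as <i><a>..</a></i> (likewise for <b>)."""
    for index, child in enumerate(list(parent)):
        move_format_tag_outside_link(child)  # handle nested content first
        if child.tag != "a" or len(child) != 1:
            continue
        link, fmt = child, child[0]
        # Only swap when the format tag is the link's entire content.
        if fmt.tag not in FORMAT_TAGS or link.text or fmt.tail:
            continue
        # Move the format tag's content into the link.
        link.remove(fmt)
        link.text, fmt.text = fmt.text, None
        for grandchild in list(fmt):
            fmt.remove(grandchild)
            link.append(grandchild)
        # The format tag takes the link's place and its tail text.
        fmt.tail, link.tail = link.tail, None
        parent[index] = fmt
        fmt.append(link)


p = ET.fromstring('<p>See <a href="./Foo"><i>Foo</i></a> here.</p>')
move_format_tag_outside_link(p)
print(ET.tostring(p, encoding="unicode"))
# <p>See <i><a href="./Foo">Foo</a></i> here.</p>
```

Replacing the link in place (`parent[index] = fmt`) keeps sibling indices stable, so iterating over a snapshot of the children stays safe.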