Talk:Parsoid/Normalizations

The AbuseLog suggests some others:

Empty style tags

 * Examples: here or here.

Whitespace at the start of a paragraph
These nowikis are to prevent roundtripping as preformatted text.
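
A minimal sketch of what this normalization could look like, assuming the serialized wikitext wraps the leading whitespace as `<nowiki> </nowiki>` at the start of a line. The function name and regex are hypothetical and for illustration only; this is not Parsoid's actual implementation.

```python
import re

# Assumption: the unwanted pattern is a nowiki pair that contains only
# spaces or tabs and sits at the very start of a line.
_LEADING_WS_NOWIKI = re.compile(r"^<nowiki>[ \t]+</nowiki>", re.MULTILINE)


def drop_leading_whitespace_nowiki(wikitext: str) -> str:
    """Remove a whitespace-only nowiki wrapper at the start of each line.

    Dropping the wrapper together with the whitespace means the line no
    longer starts with a space, so it cannot turn into preformatted text.
    """
    return _LEADING_WS_NOWIKI.sub("", wikitext)


if __name__ == "__main__":
    print(drop_leading_whitespace_nowiki("<nowiki> </nowiki>Hello world"))
```

The sketch deliberately leaves a whitespace-only nowiki in the middle of a line untouched, since it may be there for a different reason.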


 * This seems to be fixed in production on Wikimedia sites as of June 26, 2015. I can no longer reproduce it. --Amir E. Aharoni (talk) 07:30, 29 June 2015 (UTC)
 * Not true. See this diff from frwiki from 30th June for example. So, VE's fixes aren't foolproof yet. We'll start handling these on our end, and it won't hurt in any case. SSastry (WMF) (talk) 04:47, 1 July 2015 (UTC)


 * SSastry, any update about this? I still see this happening up to five times a day in he.wikipedia (I follow nowikis maniacally in my home wiki ;) ). --Amir E. Aharoni (talk) 08:32, 26 July 2015 (UTC)
 * https://phabricator.wikimedia.org/T105239 was deployed on the 14th, but there may be a few things to fix yet. I had discussed this with User:Arlolra because of this diff. He told me that he'd work on that. (I managed to reproduce an even worse case here). --Elitre (WMF) (talk) 11:53, 27 July 2015 (UTC) PS. I now notice he had filed https://phabricator.wikimedia.org/T106909 for the nl.wiki issue. --Elitre (WMF) (talk) 15:45, 27 July 2015 (UTC)
 * Amir, please see discussion on https://phabricator.wikimedia.org/T106909 SSastry (WMF) (talk) 23:11, 27 July 2015 (UTC)
 * SSastry, to make sure that I understand it correctly, does this mean that it's on the way and will be live in the coming couple of weeks? --Amir E. Aharoni (talk) 12:28, 28 July 2015 (UTC)
 * It should have gone out on the 14th, as mentioned above, but the  parameter wasn't being forwarded by RESTBase. They deployed a fix for that just now (see T106909) and I've confirmed that the issue we were seeing on nlwiki is resolved. So, moving forward, this should no longer be an issue. Further, we've merged https://gerrit.wikimedia.org/r/#/c/226667/ to be deployed tomorrow, which should address https://phabricator.wikimedia.org/T104554. Arlolra (talk) 22:14, 28 July 2015 (UTC)
 * Thanks, Arlolra! --Amir E. Aharoni (talk) 16:11, 29 July 2015 (UTC)
 * Amir, so, are things looking better now? SSastry (WMF) (talk) 21:54, 30 July 2015 (UTC)
 * SSastry - Looks like it does! I cannot reproduce it myself, and I haven't seen this happening in other edits yet. --Amir E. Aharoni (talk) 14:22, 31 July 2015 (UTC)

Empty links

 * It's not clear what to do automatically in such a case, but I noticed that often the output is Italie, and then it's pretty obvious that the output should be Italie . --Amir E. Aharoni (talk) 16:00, 10 July 2015 (UTC)
 * I suspect the particular case you describe here has probably been fixed by VE enabling the scrubWikitext flag. This should fall into the bucket with Parsoid/Normalizations. But the general case might still be worth looking into if it's more prevalent. Arlolra (talk) 18:28, 3 August 2015 (UTC)

An empty nowiki tag without anything at the end of the line
For example in the Hebrew Wikipedia here.
 * Maybe the nowiki is left over from the removed line, i.e. deleting in VE doesn't remove the nowiki meta. How common is this scenario? Arlolra (talk) 18:43, 3 August 2015 (UTC)
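
For illustration, a cleanup of this case could be sketched as below. It assumes that an empty nowiki (`<nowiki/>` or `<nowiki></nowiki>`) at the end of a line has no protective role, which is not always true (for example, it can keep a line from being parsed as a heading). The helper name and regex are hypothetical, not part of Parsoid or VE.

```python
import re

# Assumption: only truly empty nowiki tags followed by nothing except
# optional spaces or tabs before the end of the line are targeted.
_EOL_EMPTY_NOWIKI = re.compile(
    r"(?:<nowiki\s*/>|<nowiki></nowiki>)[ \t]*$",
    re.MULTILINE,
)


def strip_trailing_empty_nowiki(wikitext: str) -> str:
    """Remove empty nowiki tags that sit at the end of a line."""
    return _EOL_EMPTY_NOWIKI.sub("", wikitext)


if __name__ == "__main__":
    print(strip_trailing_empty_nowiki("Some text<nowiki/>\nNext line"))
```

A real normalization would need to check first that removing the tag does not change how the line is parsed.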

A link that ends in space
Such as: Berlin is the capital of Germany.

This should be Berlin is the capital of Germany.

(It's barely imaginable that somebody actually prefers the former over the latter.)
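
As a rough sketch of the suggested normalization, assuming the link is a plain `[[...]]` wikilink whose content ends in spaces or tabs: the trailing whitespace is moved out of the brackets and collapsed into a single space after the link. The function name and regex are hypothetical, and the sketch ignores link trails, templates, and nested markup.

```python
import re

# Assumption: link content has no nested brackets and ends in at least one
# non-whitespace character followed by spaces or tabs before "]]".
_TRAILING_SPACE_LINK = re.compile(
    r"\[\[([^\[\]]*?[^\s\[\]])[ \t]+\]\][ \t]*"
)


def move_trailing_space_out_of_links(wikitext: str) -> str:
    """Rewrite '[[Target ]]text' as '[[Target]] text'."""
    return _TRAILING_SPACE_LINK.sub(r"[[\1]] ", wikitext)


if __name__ == "__main__":
    print(move_trailing_space_out_of_links(
        "[[Berlin ]]is the capital of Germany."
    ))
```

Links that do not end in whitespace are left as they are, so running the function on already-clean wikitext changes nothing.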

Cases from frwiki
I would also suggest looking at frwiki, and not just at enwiki, where VE is deactivated by default for everyone, to get a lot more examples of nowiki tags added by VE/Parsoid in places where a better solution could be provided. You can use the frwiki abuse log for nowiki. --NicoV (talk) 17:05, 3 August 2015 (UTC)
 * I check the corresponding data for Hebrew every day, and look for repeating patterns, manually - w:he:WP:VE/nowiki.
 * I cannot expect Parsoid developers to search for such info in all languages, so I suggest that people who care about wikis in their languages do this manually. Together we'll eradicate this :) --Amir E. Aharoni (talk) 17:22, 3 August 2015 (UTC)