Parsoid/Normalizations

While serializing (html2wt), Parsoid performs a number of normalizations, some of them behind the scrub_wikitext flag.

Most of them can be found in normalizeDOM.js.

Default
These are the normalizations that Parsoid performs by default.
 * Tag minimization ( / tags)
 * Serialize invalid  tags to text
 * Enforce single-line context (in headings and lists)

scrub_wikitext
These normalizations are enabled if the scrub_wikitext parameter is passed to the Parsoid API. They work around issues in Parsoid and in VE and other clients, and serve as a simpler way of generating clean wikitext (at least for now).
 * Strip empty headings and style tags (only performed on new nodes)
 * Tag minimization ( tags, when at least one is new)
 * Whitespace at the start of paragraphs
 * New links that end in spaces
 * New table cells starting with escapable prefixes
 * Force category links and behaviour switches to serialize before/after headings (only performed on new nodes)
 * Strip  tags in headers (Parsoid introduces these in some paragraphs, and they stick around when those paragraphs are converted to headings in VE)
 * Strip trailing <nowiki/> from wikitext lines (this one will be unnecessary once Parsoid stops introducing these)
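
As a rough illustration of the last item in the list, the sketch below removes a self-closing nowiki tag that sits at the very end of a wikitext line. The function name stripTrailingNowiki is hypothetical and this is not Parsoid's code; it only shows the intended effect on a string of wikitext, and it assumes the tag is the last thing on the line.

```javascript
// Hypothetical helper (not part of Parsoid): drop a self-closing
// <nowiki/> when it is the last thing on a wikitext line.
function stripTrailingNowiki(wikitext) {
  return wikitext
    .split("\n")
    // Only a tag at the end of the line is removed; one in the
    // middle of a line is left alone.
    .map((line) => line.replace(/<nowiki\s*\/>$/, ""))
    .join("\n");
}

console.log(stripTrailingNowiki("foo<nowiki/>\nbar<nowiki/>baz"));
```

A nowiki tag in the middle of a line is kept, since there it may still be separating two tokens that would otherwise combine.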

Tag minimization ( / tags)
and

Force category links and behaviour switches to serialize before/after headings
and

Serialize invalid  tags to text
and

Enforce single-line context
and

However, newlines in transclusion parameters are preserved.

Strip empty headings and style tags
Normally, but with scrubbing it's all dropped.

Tag minimization ( tags)
and

Move formatting from link text to the entire link (with some exceptions)
This enables a simplified wikilink format if the href and link text formatting match. Without the reordering  would be emitted. With the reordering  will be emitted.

Exceptions:


 * If the formatting tags have attributes such as color, style, or class, the reordering is not done, since it can change rendering in some cases. The A-tag's color style will override the outer style, i.e.  doesn't render the same as
 * If the link text is not identical to the href, the reordering is not done since the simplified link form is not enabled in this case.

Whitespace at the start of paragraphs
These nowikis are to prevent roundtripping as preformatted text.
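
A minimal sketch of the two behaviours, under assumptions: a wikitext line that starts with a space is treated as preformatted text, so the default serialization is assumed to wrap the leading whitespace in nowiki tags, while scrubbing is assumed to drop that whitespace. The function serializeParagraph and its scrubWikitext argument are hypothetical names used for illustration, not Parsoid's API.

```javascript
// Hypothetical helper (not Parsoid's code): serialize the text of a
// paragraph that may start with whitespace.
function serializeParagraph(text, scrubWikitext) {
  const match = /^[ \t]+/.exec(text);
  if (!match) {
    return text; // nothing to protect
  }
  const rest = text.slice(match[0].length);
  if (scrubWikitext) {
    // Assumed scrubbing behaviour: drop the leading whitespace.
    return rest;
  }
  // Assumed default behaviour: keep the whitespace but wrap it so the
  // line is not parsed as preformatted text.
  return "<nowiki>" + match[0] + "</nowiki>" + rest;
}

console.log(serializeParagraph(" foo", false));
console.log(serializeParagraph(" foo", true));
```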

New links that end in spaces
The nowiki here is to prevent link trails.
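
One way to avoid the nowiki, sketched here under the assumption that the normalization moves trailing whitespace out of the link, is to split the link text before serializing. The helper names are hypothetical, and the sketch only covers the simple case where the link target equals the non-empty link text.

```javascript
// Hypothetical helpers (not Parsoid's code).

// Split link text into its content and any trailing whitespace.
function splitTrailingSpaces(linkText) {
  const trimmed = linkText.replace(/\s+$/, "");
  return { text: trimmed, trailing: linkText.slice(trimmed.length) };
}

// Serialize a simple wikilink whose target equals its text, emitting
// the trailing whitespace after the closing brackets.
function serializeSimpleLink(linkText) {
  const { text, trailing } = splitTrailingSpaces(linkText);
  return "[[" + text + "]]" + trailing;
}

console.log(JSON.stringify(serializeSimpleLink("Foo ")));
```

With the whitespace outside the brackets, any text that follows the link is separated from it by a space, so it cannot be picked up as a link trail.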

New table cells starting with escapable prefixes
// normally serializes to

// but with scrubbing becomes

Related links

 * w:he:WP:VE/nowiki
 * w:fr:Wikipédia:ÉditeurVisuel/Avis/Nowiki