Parsoid/tmp

Paragraph-related regressions
FAILED: Comment on its own line post-expand
FAILED: Definition list with empty definition and following paragraph
FAILED: Incorrecly removing closing slashes from correctly formed XHTML
FAILED: Failing to transform badly formed HTML into correct XHTML
FAILED: Template with complex arguments
FAILED: BUG 529: Template with table, not included at beginning of line
FAILED: Parsing crashing regression (fr:JavaScript)
FAILED: Bug 6200: blockquotes and paragraph formatting
FAILED: Nesting tags, paragraphs on lines which begin with

Likely not paragraph-related:
FAILED: Unclosed and unmatched quotes
FAILED: to

Templates producing empty output break p-wrapping logic
Consider the 4 examples below.

Example 1 - a

c

Example 2 - a

c

Example 3 - a

c

Example 4 - a

c

In the PHP parser:
* Examples 2, 3 and 4 lead to separate paragraphs for a and c.
* In example 1, the entire block is a single paragraph.

In Parsoid:
* All examples are wrapped in a single paragraph.

At least example 4 is a Parsoid bug and should be fixed. Examples 2 and 3 are unclear: it is not obvious what logic the PHP parser applies there. Fixing example 4 might fix 2 and 3 as well, so it is worth fixing 4 first and then re-evaluating.

The other regression seems related. We talked earlier about whitelisting it, since there is no semantic or visual layout difference between the PHP parser output and the Parsoid output. foo generates a table. Since the  tag is on the same textual line as foo, Parsoid does not wrap foo in a p-tag. The PHP parser probably treats this as a line break (unsure) and wraps foo in a p-tag.

Blockquote tags
The PHP parser treats the blockquote tag specially. It is not like other block tags.
* p and pre tags are suppressed within blockquotes.
* Hence the blockquote failure for the example below.

foo

bar

baz

Examples illustrating diffs

a b

a b

In the div-output, a is wrapped in a p-tag, b in a pre-tag. In the blockquote-output, nothing is wrapped in these tags.

Result of looking at the PHP parser's doBlockLevels code:

[11:40] subbu: the reason for the diff in blockquote handling is that a div is not considered to be a block element in doBlockLevels
[11:40] while blockquote,td,li etc are
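The difference described in the chat log above can be pictured with a toy model. This is only a sketch and not the doBlockLevels code: `BLOCK_CONTEXT_TAGS` and `wrap_content` are hypothetical names, the set holds only the tags named in the chat, and the sketch assumes the example input is a line "a" followed by a line " b" with a leading space.

```python
# Hypothetical model of the behaviour described above; NOT the PHP parser's code.
# Only the tags named in the chat log are listed; the real set is larger ("etc").
BLOCK_CONTEXT_TAGS = {"blockquote", "td", "li"}


def wrap_content(enclosing_tag, lines):
    """Wrap each line in a p- or pre-tag unless the enclosing tag suppresses it.

    Assumption: a line with a leading space becomes a pre-tag,
    any other line becomes a p-tag.
    """
    if enclosing_tag in BLOCK_CONTEXT_TAGS:
        # Treated as a block element: content is passed through unwrapped.
        return list(lines)
    wrapped = []
    for line in lines:
        if line.startswith(" "):
            wrapped.append("<pre>" + line[1:] + "</pre>")
        else:
            wrapped.append("<p>" + line + "</p>")
    return wrapped


print(wrap_content("div", ["a", " b"]))         # a in a p-tag, b in a pre-tag
print(wrap_content("blockquote", ["a", " b"]))  # nothing is wrapped
```

If Parsoid is to match the PHP output, its p-wrapper would need an equivalent distinction between div and tags such as blockquote.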

BR tags
br tags are block tags, so they are not wrapped in p-tags.
* But the PHP parser wraps them and treats them like an inline tag?
* Hence the br failures above.



  

Sanitization
The sanitizer converts some tags into text, which may then need to be wrapped in p-tags. So, maybe run it before the P-wrapper?
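The ordering question can be illustrated with a toy token stream. Everything in this sketch is hypothetical and is not Parsoid code: the `(kind, value)` token shape, the `ALLOWED_TAGS` whitelist, the made-up `foo` tag, and the functions `sanitize` and `p_wrap`. It only shows why text produced by sanitization is missed when the p-wrapper runs first.

```python
# Hypothetical whitelist for this sketch; "foo" is deliberately not on it.
ALLOWED_TAGS = {"div", "p"}


def sanitize(tokens):
    """Turn tags outside the whitelist into plain text tokens."""
    out = []
    for kind, value in tokens:
        if kind == "tag" and value.lstrip("/") not in ALLOWED_TAGS:
            out.append(("text", "<" + value + ">"))
        else:
            out.append((kind, value))
    return out


def p_wrap(tokens):
    """Wrap each run of consecutive text tokens in p-tags; tags are left bare."""
    out, in_p = [], False
    for kind, value in tokens:
        if kind == "text" and not in_p:
            out.append(("tag", "p"))
            in_p = True
        elif kind == "tag" and in_p:
            out.append(("tag", "/p"))
            in_p = False
        out.append((kind, value))
    if in_p:
        out.append(("tag", "/p"))
    return out


tokens = [("tag", "foo"), ("text", "bar")]
print(p_wrap(sanitize(tokens)))  # "<foo>" text ends up inside the paragraph
print(sanitize(p_wrap(tokens)))  # "<foo>" text is left outside the paragraph
```

With the sanitizer first, the p-wrapper sees the converted tag as ordinary text and wraps it together with its neighbours.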



Inline tags start on a block tag line and ending on another line
The PHP parser generates malformed HTML for these examples, and the tests are disabled. Parsoid generates well-formed HTML, but the output is semantically incorrect.

Handling this correctly may require tree context, which we don't have in the token stream transformers.
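One partial substitute for tree context is a stack of open inline tags kept while scanning the flat token stream. The sketch below is an assumption-laden illustration, not Parsoid code: the token kinds (`tag`, `endtag`, `text`, `newline`), the `INLINE_TAGS` sample and the function name are all hypothetical. It only detects inline tags that are still open when a line ends; it does not decide how to repair them.

```python
INLINE_TAGS = {"b", "i", "small", "span"}  # assumed sample of inline tags


def unclosed_inline_at_newlines(tokens):
    """Return, for each newline token, the inline tags still open at that point."""
    open_tags = []  # stack of currently open inline tags
    report = []
    for kind, value in tokens:
        if kind == "tag" and value in INLINE_TAGS:
            open_tags.append(value)
        elif kind == "endtag" and value in open_tags:
            # Pop back to the matching open tag (tolerates mis-nesting).
            while open_tags.pop() != value:
                pass
        elif kind == "newline":
            report.append(list(open_tags))
    return report


# Assumed input shape: an inline tag opened inside a div and closed on the next line.
stream = [
    ("tag", "div"), ("tag", "small"), ("text", "a"), ("endtag", "div"),
    ("newline", ""),
    ("text", "b"), ("endtag", "small"),
    ("newline", ""),
]
print(unclosed_inline_at_newlines(stream))
```

A non-empty entry marks a line where an inline tag crosses the line boundary, which is the situation described above.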

Examples

a b

a b

List handler swallows a newline?
In the output for the example below, foo ends up on the same line as the  tag. This means that the list handler is probably swallowing the trailing newline after ":".

Example --- foo &gt;/pre&lt;
 * term:

Round-trip test regressions
Bug 529: Uncovered bullet
Nesting tags, paragraphs on lines which begin with