Parsoid/tmp

Crashes from rt testing
TypeError: Cannot call method 'removeChild' of null
    at encapsulateTemplates (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:735:28)
    at Array.encapsulateTemplateOutput [as 4] (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1056:3)
    at DOMPostProcessor.doPostProcess (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1083:22)
    at EventEmitter.emit (events.js:88:17)
    at FauxHTML5.TreeBuilder.onEnd (/data/project/parsoid/js/lib/mediawiki.HTML5TreeBuilder.node.js:55:7)
    at SyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at SyncTokenTransformManager.onEndEvent (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:766:7)
    at AsyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at AsyncTokenTransformManager.emitChunk (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:263:8)
    at TokenAccumulator._callParentCB (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:1073:16)
    at TokenAccumulator._returnTokens (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:1027:8)

TypeError: Cannot call method 'match' of null
    at WSP._serializeToken (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1512:17)
    at WSP._serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1757:10)
    at WSP._serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1787:10)
    at WSP._serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1787:10)
    at WSP.serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1676:9)
    at roundTripDiff (/data/project/parsoid/js/tests/roundtrip-test.js:277:47)
    at fetch (/data/project/parsoid/js/tests/roundtrip-test.js:310:6)
    at DOMPostProcessor.EventEmitter.emit (events.js:88:17)
    at DOMPostProcessor.doPostProcess (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1088:7)
    at EventEmitter.emit (events.js:88:17)

SyntaxError: Unexpected end of input
    at Object.parse (native)
    at Array.patchUpDOM [as 0] (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:402:22)
    at DOMPostProcessor.doPostProcess (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1083:22)
    at EventEmitter.emit (events.js:88:17)
    at FauxHTML5.TreeBuilder.onEnd (/data/project/parsoid/js/lib/mediawiki.HTML5TreeBuilder.node.js:55:7)
    at SyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at SyncTokenTransformManager.onEndEvent (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:766:7)
    at AsyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at AsyncTokenTransformManager.emitChunk (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:263:8)
    at TokenAccumulator._callParentCB (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:1073:16)

Paragraph-related regressions
FAILED: Comment on its own line post-expand
FAILED: Definition list with empty definition and following paragraph
FAILED: Incorrecly removing closing slashes from correctly formed XHTML
FAILED: Failing to transform badly formed HTML into correct XHTML
FAILED: Template with complex arguments
FAILED: BUG 529: Template with table, not included at beginning of line
FAILED: Parsing crashing regression (fr:JavaScript)
FAILED: Bug 6200: blockquotes and paragraph formatting
FAILED: Nesting tags, paragraphs on lines which begin with

Likely not paragraph-related:
FAILED: Unclosed and unmatched quotes
FAILED: to

Templates producing empty output break p-wrapping logic
Consider the 4 examples below.

Example 1 - a

c

Example 2 - a

c

Example 3 - a

c

Example 4 - a

c

In the PHP parser:
* Examples 2, 3, and 4 lead to separate paragraphs for a and c.
* In example 1, the entire block is a single paragraph.

In Parsoid:
* All examples are wrapped in a single paragraph.

Example 4, at least, is a Parsoid bug and should be fixed. Examples 2 and 3 are less clear: it is not obvious what logic the PHP parser applies there. Fixing example 4 may also fix 2 and 3, so it is worth fixing 4 first and then re-evaluating.
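One way an empty template expansion could hide a paragraph break is sketched below. This is a toy model, not Parsoid's p-wrapper: it assumes that a paragraph ends at two consecutive newline tokens, and that a template with empty output leaves an empty-string token in the stream. The function name `wrapParagraphs` and the token shapes are hypothetical.

```javascript
// Toy paragraph splitter (hypothetical; not Parsoid code).
// Tokens are plain strings: '\n' is a newline, '' stands for a template
// that expanded to nothing, anything else is text.
function wrapParagraphs(tokens, skipEmpty) {
  // Optionally drop empty expansions before looking for paragraph breaks.
  const stream = skipEmpty ? tokens.filter((t) => t !== '') : tokens;
  const paragraphs = [];
  let current = '';
  let newlines = 0;
  for (const tok of stream) {
    if (tok === '\n') {
      newlines += 1;
      // Two newlines in a row end the current paragraph.
      if (newlines >= 2 && current !== '') {
        paragraphs.push(current);
        current = '';
      }
      continue;
    }
    // Any other token, even an empty expansion, interrupts the newline run.
    newlines = 0;
    if (tok === '') continue;
    current += (current === '' ? '' : ' ') + tok;
  }
  if (current !== '') paragraphs.push(current);
  return paragraphs;
}

const tokens = ['a', '\n', '', '\n', 'c'];
console.log(wrapParagraphs(tokens, false)); // [ 'a c' ]    single paragraph
console.log(wrapParagraphs(tokens, true));  // [ 'a', 'c' ] separate paragraphs
```

In this model the single-paragraph result comes from the empty token sitting between the two newlines. Whether that matches the actual cause in Parsoid is an assumption that would need checking against the p-wrapping code.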

The other regression seems related. We talked earlier about whitelisting it, since there is no semantic or visual layout difference between the PHP parser and Parsoid output. In the test, foo is followed by markup that generates a table. Since the table tag ends up on the same textual line as foo, Parsoid does not wrap foo in a p-tag. The PHP parser probably treats this as a line break (unsure) and wraps foo in a p-tag.

Blockquote tags
The PHP parser treats the blockquote tag specially. It is not like other block tags.
* p and pre tags are suppressed within blockquotes.
* Hence the blockquote failure for the example below.

foo

bar

baz

Examples illustrating diffs

a b

a b

In the div-output, a is wrapped in a p-tag and b in a pre-tag. In the blockquote-output, nothing is wrapped in these tags.

Result of looking at the PHP parser's doBlockLevels code:

[11:40] subbu: the reason for the diff in blockquote handling is that a div is not considered to be a block element in doBlockLevels
[11:40] while blockquote,td,li etc are
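The distinction described in the chat excerpt can be sketched as a simple set lookup. This is an illustration only: the set contains just the three tags named above (the "etc" means the real list is longer), and the names `BLOCK_IN_DO_BLOCK_LEVELS` and `wrapsChildren` are hypothetical, not taken from the PHP parser or Parsoid.

```javascript
// Tags that, per the notes above, doBlockLevels treats as block elements.
// Incomplete by construction: the source says "blockquote,td,li etc".
const BLOCK_IN_DO_BLOCK_LEVELS = new Set(['blockquote', 'td', 'li']);

// Hypothetical helper: does content inside this tag get p/pre wrapping
// in the PHP parser's output?
function wrapsChildren(tagName) {
  return !BLOCK_IN_DO_BLOCK_LEVELS.has(tagName.toLowerCase());
}

console.log(wrapsChildren('div'));        // true:  a gets a p-tag, b a pre-tag
console.log(wrapsChildren('blockquote')); // false: nothing is wrapped
```

If Parsoid is to match the PHP output here, its p-wrapper would presumably need the same tag-specific exception rather than one rule for all block tags.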

BR tags
br tags are block tags, so they are not wrapped in p tags.
* But the PHP parser wraps them and treats them like an inline tag?
* Hence the br failures above.



  

Sanitization
The sanitizer converts some tags into text, which may then need to be wrapped in p-tags. So, maybe run it before the P-wrapper?
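A minimal sketch of why the ordering matters, using toy stand-ins for both stages. Everything here is hypothetical: the token shapes, the `ALLOWED` whitelist, and the `sanitize` and `pWrap` functions are illustrations, not Parsoid's actual sanitizer or paragraph wrapper.

```javascript
// Hypothetical whitelist; the real sanitizer's list is much longer.
const ALLOWED = new Set(['div']);

// Toy sanitizer: a tag that is not allowed is turned into literal text.
function sanitize(tokens) {
  return tokens.map((t) =>
    t.type === 'tag' && !ALLOWED.has(t.name)
      ? { type: 'text', value: '<' + t.name + '>' }
      : t
  );
}

// Toy p-wrapper: runs of text tokens are wrapped in a 'p' node,
// all other tokens pass through untouched.
function pWrap(tokens) {
  const out = [];
  let run = [];
  const flush = () => {
    if (run.length > 0) {
      out.push({ type: 'p', children: run });
      run = [];
    }
  };
  for (const t of tokens) {
    if (t.type === 'text') {
      run.push(t);
    } else {
      flush();
      out.push(t);
    }
  }
  flush();
  return out;
}

const input = [{ type: 'tag', name: 'blink' }];
const sanitizeFirst = pWrap(sanitize(input)); // text exists before wrapping
const wrapFirst = sanitize(pWrap(input));     // text appears too late

console.log(sanitizeFirst[0].type); // 'p'
console.log(wrapFirst[0].type);     // 'text' (left unwrapped)
```

With the sanitizer first, the converted text is already text when the wrapper sees it. With the wrapper first, the text only appears after wrapping decisions have been made.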



Inline tags starting on a block tag line and ending on another line
The PHP parser generates malformed HTML for these examples, and they are disabled. Parsoid generates well-formed HTML, but the output is semantically incorrect.

Handling this correctly may require tree context, which we don't have in the token stream transformers.

Examples

a b

a b

List handler swallows a newline?
In the example below, in the output, foo ends up on the same line as the  tag. This means that the list handler is probably swallowing the trailing newline after ":"

Example --- foo &gt;/pre&lt;
 * term:

Round-trip test regressions
Bug 529: Uncovered bullet
Nesting tags, paragraphs on lines which begin with