Parsoid/tmp

From mediawiki.org

Crashes from rt testing

TypeError: Cannot call method 'removeChild' of null
    at encapsulateTemplates (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:735:28)
    at Array.encapsulateTemplateOutput [as 4] (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1056:3)
    at DOMPostProcessor.doPostProcess (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1083:22)
    at EventEmitter.emit (events.js:88:17)
    at FauxHTML5.TreeBuilder.onEnd (/data/project/parsoid/js/lib/mediawiki.HTML5TreeBuilder.node.js:55:7)
    at SyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at SyncTokenTransformManager.onEndEvent (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:766:7)
    at AsyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at AsyncTokenTransformManager.emitChunk (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:263:8)
    at TokenAccumulator._callParentCB (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:1073:16)
    at TokenAccumulator._returnTokens (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:1027:8)
Posting a result for Teofil Mirghesiu....
Running a test on Wikipedia:Articles for deletion/Mandalism....
Posting a result for Hop sing....
Posting a result for Wikipedia:Articles for deletion/Mandalism....
Running a test on Top grossing Bollywood Films of 2011....
Running a test on Speed skating at the 1952 Winter Olympics – Men's 10000 metres....
Running a test on Wikipedia:WikiProject Films/Assessment/Tag & Assess 2009-2010/118....
Posting a result for Speed skating at the 1952 Winter Olympics – Men's 10000 metres....
Posting a result for Wikipedia:WikiProject Films/Assessment/Tag & Assess 2009-2010/118....
Running a test on Unicode Fonts....
Posting a result for Mmo2....
Running a test on Metal Gear Saga Vol. 1....
Running a test on Justice Vinson....
Running a test on Chiquita Banana....
SyntaxError: Unexpected end of input
    at Object.parse (native)
    at Array.patchUpDOM [as 0] (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:402:22)
    at DOMPostProcessor.doPostProcess (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1083:22)
    at EventEmitter.emit (events.js:88:17)
    at FauxHTML5.TreeBuilder.onEnd (/data/project/parsoid/js/lib/mediawiki.HTML5TreeBuilder.node.js:55:7)
    at SyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at SyncTokenTransformManager.onEndEvent (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:766:7)
    at AsyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at AsyncTokenTransformManager.emitChunk (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:263:8)
    at TokenAccumulator._callParentCB (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:1073:16)
TypeError: Cannot call method 'match' of null
    at WSP._serializeToken (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1512:17)
    at WSP._serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1757:10)
    at WSP._serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1787:10)
    at WSP._serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1787:10)
    at WSP.serializeDOM (/data/project/parsoid/js/lib/mediawiki.WikitextSerializer.js:1676:9)
    at roundTripDiff (/data/project/parsoid/js/tests/roundtrip-test.js:277:47)
    at fetch (/data/project/parsoid/js/tests/roundtrip-test.js:310:6)
    at DOMPostProcessor.EventEmitter.emit (events.js:88:17)
    at DOMPostProcessor.doPostProcess (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1088:7)
    at EventEmitter.emit (events.js:88:17)
SyntaxError: Unexpected end of input
    at Object.parse (native)
    at Array.patchUpDOM [as 0] (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:402:22)
    at DOMPostProcessor.doPostProcess (/data/project/parsoid/js/lib/mediawiki.DOMPostProcessor.js:1083:22)
    at EventEmitter.emit (events.js:88:17)
    at FauxHTML5.TreeBuilder.onEnd (/data/project/parsoid/js/lib/mediawiki.HTML5TreeBuilder.node.js:55:7)
    at SyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at SyncTokenTransformManager.onEndEvent (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:766:7)
    at AsyncTokenTransformManager.EventEmitter.emit (events.js:85:17)
    at AsyncTokenTransformManager.emitChunk (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:263:8)
    at TokenAccumulator._callParentCB (/data/project/parsoid/js/lib/mediawiki.TokenTransformManager.js:1073:16)

Paragraph-related regressions

FAILED: Comment on its own line post-expand
FAILED: Definition list with empty definition and following paragraph
FAILED: Incorrecly removing closing slashes from correctly formed XHTML
FAILED: Failing to transform badly formed HTML into correct XHTML
FAILED: Template with complex arguments
FAILED: BUG 529: Template with table, not included at beginning of line
FAILED: Parsing crashing regression (fr:JavaScript)
FAILED: Bug 6200: blockquotes and paragraph formatting
FAILED: Nesting tags, paragraphs on lines which begin with <div>

Likely not paragraph-related:
FAILED: Unclosed and unmatched quotes
FAILED: <br> to <br />


Templates producing empty output break p-wrapping logic

Consider the 4 examples below.

Example 1
---------
a
<!--b-->
c

Example 2
---------
a
{{echo|<!--b-->}}
c

Example 3
---------
a
{{echo|}}<!--b-->
c

Example 4
---------
a
{{echo|}}
c

In the PHP parser

* Examples 2, 3, and 4 lead to separate paragraphs for a and c.
* In example 1, the entire block is a single paragraph.

In Parsoid,

* All examples are wrapped in a single paragraph.

Example 4, at least, is a Parsoid bug and should be fixed. Examples 2 and 3 are less clear: we are unsure what logic the PHP parser applies there. Fixing example 4 might also fix 2 and 3, so it is worth fixing 4 first and then re-evaluating.
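
One possible reading of the PHP parser behavior above, as a toy model only: comment-only source lines disappear together with their newline, while a line that becomes empty after template expansion is left behind as a blank line and so ends the paragraph. This has not been verified against the PHP parser code. The names expandEcho and splitParagraphs are hypothetical and are not part of Parsoid or MediaWiki.

```javascript
// Toy model only: a guess at the mechanism, not Parsoid or PHP parser code.

// Hypothetical expander that only understands {{echo|...}}.
function expandEcho(line) {
  return line.replace(/\{\{echo\|([^}]*)\}\}/g, '$1');
}

function splitParagraphs(wikitext) {
  // Assumption 1: a source line holding only a comment is dropped,
  // newline included, before template expansion.
  const sourceLines = wikitext
    .split('\n')
    .filter((line) => !/^\s*<!--(?:(?!-->).)*-->\s*$/.test(line));

  // Assumption 2: templates are expanded next, and any comments that
  // remain are stripped without removing the line they sat on.
  const expanded = sourceLines.map((line) =>
    expandEcho(line).replace(/<!--(?:(?!-->).)*-->/g, ''));

  // A blank line ends the current paragraph.
  const paragraphs = [];
  let current = [];
  for (const line of expanded) {
    if (line.trim() === '') {
      if (current.length > 0) {
        paragraphs.push(current.join('\n'));
        current = [];
      }
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) {
    paragraphs.push(current.join('\n'));
  }
  return paragraphs;
}

// The four examples from this section.
console.log(splitParagraphs('a\n<!--b-->\nc').length);          // 1
console.log(splitParagraphs('a\n{{echo|<!--b-->}}\nc').length); // 2
console.log(splitParagraphs('a\n{{echo|}}<!--b-->\nc').length); // 2
console.log(splitParagraphs('a\n{{echo|}}\nc').length);         // 2
```

Under these two assumptions the model reproduces the PHP parser results listed above for all four examples, which hints that examples 2, 3, and 4 may share a single cause.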

The other regression seems related. We talked earlier about whitelisting it, since there is no semantic or visual layout difference between the PHP parser output and the Parsoid output:

foo {{table}}

{{table}} generates a table. Since the <table> tag is on the same textual line as foo, Parsoid does not wrap foo in a p-tag. The PHP parser does wrap foo in a p-tag, probably because it treats the start of the table as a line break (unverified).

Blockquote tags

The PHP parser treats the blockquote tag specially. It is not handled like other block tags.

* p and pre tags are suppressed within blockquotes
* Hence the blockquote failure for the example below.
<blockquote>
foo
</blockquote>

bar

 baz

Examples illustrating diffs

<div>
a
 b
</div>

<blockquote>
a
 b
</blockquote>

In the div output, a is wrapped in a p-tag and b in a pre-tag. In the blockquote output, nothing is wrapped in these tags.

Result of looking at the PHP parser's doBlockLevels code:

[11:40] <gwicke> subbu: the reason for the diff in blockquote handling is that a div is not considered to be a block element in doBlockLevels
[11:40] <gwicke> while blockquote,td,li etc are
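
The difference described in the chat excerpt can be sketched with a toy line wrapper. This is an illustration under stated assumptions, not the doBlockLevels logic itself: the set SUPPRESSES_WRAPPING and the function wrapLines are hypothetical, and the set contains only the tags named in the chat.

```javascript
// Toy model only: not the PHP doBlockLevels code and not Parsoid code.

// Hypothetical set: tags inside which p/pre wrapping is suppressed.
// Contains only the tags named in the chat excerpt; div is deliberately absent.
const SUPPRESSES_WRAPPING = new Set(['blockquote', 'td', 'li']);

function wrapLines(lines) {
  const out = [];
  let depth = 0; // nesting depth of wrapping-suppressing tags
  for (const line of lines) {
    const open = line.match(/^<([a-z]+)>$/);
    const close = line.match(/^<\/([a-z]+)>$/);
    if (open) {
      if (SUPPRESSES_WRAPPING.has(open[1])) depth += 1;
      out.push(line);
    } else if (close) {
      if (SUPPRESSES_WRAPPING.has(close[1])) depth = Math.max(0, depth - 1);
      out.push(line);
    } else if (line.trim() === '' || depth > 0) {
      // Blank lines and lines inside a suppressing tag pass through as-is.
      out.push(line);
    } else if (line.startsWith(' ')) {
      // A leading space marks indent-pre content.
      out.push('<pre>' + line.slice(1) + '</pre>');
    } else {
      out.push('<p>' + line + '</p>');
    }
  }
  return out;
}

console.log(wrapLines(['<div>', 'a', ' b', '</div>']));
// [ '<div>', '<p>a</p>', '<pre>b</pre>', '</div>' ]
console.log(wrapLines(['<blockquote>', 'a', ' b', '</blockquote>']));
// [ '<blockquote>', 'a', ' b', '</blockquote>' ]
```

With div outside the set, its contents get the p and pre wrappers. Inside blockquote the lines pass through untouched, matching the two outputs described above.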

BR tags

Parsoid treats <br> tags as block tags, so it does not wrap them in <p> tags.

* But the PHP parser seems to wrap them and treat them like inline tags?
* Hence the br failures above.
------------
<br style="clear:both;" />
------------
<br style="clear: left;">
<br style="clear: right;">
<br style="clear: both;">
------------

Sanitization

The sanitizer converts some tags into text, which may then need to be wrapped in p-tags. So, maybe run the sanitizer before the p-wrapper?

------------
</body></x>
------------
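
The ordering question can be illustrated with a small sketch. It assumes a sanitizer that escapes tags missing from a whitelist. ALLOWED_TAGS, sanitizeToken, and sanitizeThenWrap are hypothetical names, and the whitelist is illustrative only, not the real sanitizer's list.

```javascript
// Toy model only: not the real sanitizer or p-wrapper.

// Hypothetical whitelist, for illustration only.
const ALLOWED_TAGS = new Set(['div', 'span', 'br', 'p', 'table', 'blockquote']);

// Escape a tag token whose name is not whitelisted; leave everything else alone.
function sanitizeToken(token) {
  const m = token.match(/^<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>$/);
  if (m && !ALLOWED_TAGS.has(m[1].toLowerCase())) {
    return token.replace(/</g, '&lt;').replace(/>/g, '&gt;');
  }
  return token;
}

// Sanitize first, then decide on p-wrapping from the sanitized result.
function sanitizeThenWrap(line) {
  const tokens = line.match(/<[^>]*>|[^<]+/g) || [];
  const sanitized = tokens.map(sanitizeToken).join('');
  // Simplification: a line that still starts with a tag is left unwrapped.
  return sanitized.startsWith('<') ? sanitized : '<p>' + sanitized + '</p>';
}

console.log(sanitizeThenWrap('</body></x>'));
// <p>&lt;/body&gt;&lt;/x&gt;</p>
console.log(sanitizeThenWrap('<div>x</div>'));
// <div>x</div>
```

Because the escaped tags have already become plain text by the time the wrapper sees the line, it can treat them like any other text. If the wrapper ran first, it would still see tag tokens and would have no reason to wrap them.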

Inline tags starting on a block tag line and ending on another line

The PHP parser generates malformed HTML for these examples, so the tests are disabled. Parsoid generates well-formed HTML, but the output is semantically incorrect.

Handling this correctly may require tree context, which we don't have in the token stream transformers.

Examples

--------
<div></div><span>a
b</span>

<div></div><strong>a
b</strong>


List handler swallows a newline?

In the example below, foo ends up on the same line as the </dl> tag in the output. This suggests that the list handler is swallowing the trailing newline after ":".

Example

-------
; term:
foo

Round-trip test regressions

Bug 529: Uncovered bullet
Nesting tags, paragraphs on lines which begin with <div>