Parsoid/tmp

Crashes from rt testing
Tokenizer error in Template:Infobox: TypeError: Object function Object { [native code] } has no method 'get'
TypeError: Object function Object { [native code] } has no method 'get'
    at TemplateHandler.lookupArg (/home/gwicke/parsoid/js/lib/ext.core.TemplateHandler.js:430:8)
    at TemplateHandler.fetchArg (/home/gwicke/parsoid/js/lib/ext.core.TemplateHandler.js:409:3)
    at TemplateHandler.onTemplateArg (/home/gwicke/parsoid/js/lib/ext.core.TemplateHandler.js:404:7)
    at AsyncTokenTransformManager.transformTokens (/home/gwicke/parsoid/js/lib/mediawiki.TokenTransformManager.js:448:17)
    at AsyncTokenTransformManager.onChunk (/home/gwicke/parsoid/js/lib/mediawiki.TokenTransformManager.js:294:17)
    at SyncTokenTransformManager.EventEmitter.emit (events.js:88:17)
    at SyncTokenTransformManager.onChunk (/home/gwicke/parsoid/js/lib/mediawiki.TokenTransformManager.js:752:7)
    at SyncTokenTransformManager.process (/home/gwicke/parsoid/js/lib/mediawiki.TokenTransformManager.js:670:7)
    at ParserPipeline.process (/home/gwicke/parsoid/js/lib/mediawiki.parser.js:340:20)
    at Frame.expand (/home/gwicke/parsoid/js/lib/mediawiki.TokenTransformManager.js:1245:13)

Tokenizer error in Template:Ibara_Railway_Ibara_Line: TypeError: Cannot call method 'clone' of undefined
Getting a title....
TypeError: Cannot call method 'clone' of undefined
    at defaultNestedDelimiterHandler (/home/gwicke/parsoid/js/lib/ext.core.NoIncludeOnly.js:128:40)
    at noIncludeHandler (/home/gwicke/parsoid/js/lib/ext.core.NoIncludeOnly.js:160:15)
    at TokenAndAttrCollector.inspectAttrs (/home/gwicke/parsoid/js/lib/ext.util.TokenAndAttrCollector.js:136:16)
    at TokenAndAttrCollector.onAnyToken (/home/gwicke/parsoid/js/lib/ext.util.TokenAndAttrCollector.js:159:16)
    at SyncTokenTransformManager.onChunk (/home/gwicke/parsoid/js/lib/mediawiki.TokenTransformManager.js:722:22)
    at PegTokenizer.EventEmitter.emit (events.js:88:17)
    at PegTokenizer.onCacheChunk (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:135:7)
    at emitChunk (eval at (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:69:44))
    at eval (eval at (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:69:44))
    at Object.parse_start [as start] (eval at (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:69:44))

Tokenizer error in Template:Ibara_Railway_Ibara_Line: TypeError: Cannot call method 'clone' of undefined
TypeError: Cannot call method 'clone' of undefined
    at defaultNestedDelimiterHandler (/home/gwicke/parsoid/js/lib/ext.core.NoIncludeOnly.js:128:40)
    at noIncludeHandler (/home/gwicke/parsoid/js/lib/ext.core.NoIncludeOnly.js:160:15)
    at TokenAndAttrCollector.inspectAttrs (/home/gwicke/parsoid/js/lib/ext.util.TokenAndAttrCollector.js:136:16)
    at TokenAndAttrCollector.onAnyToken (/home/gwicke/parsoid/js/lib/ext.util.TokenAndAttrCollector.js:159:16)
    at SyncTokenTransformManager.onChunk (/home/gwicke/parsoid/js/lib/mediawiki.TokenTransformManager.js:722:22)
    at PegTokenizer.EventEmitter.emit (events.js:88:17)
    at PegTokenizer.onCacheChunk (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:135:7)
    at emitChunk (eval at (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:69:44))
    at eval (eval at (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:69:44))
    at Object.parse_start [as start] (eval at (/home/gwicke/parsoid/js/lib/mediawiki.tokenizer.peg.js:69:44))

Paragraph-related regressions
FAILED: Comment on its own line post-expand
FAILED: Definition list with empty definition and following paragraph
FAILED: Incorrecly removing closing slashes from correctly formed XHTML
FAILED: Failing to transform badly formed HTML into correct XHTML
FAILED: Template with complex arguments
FAILED: BUG 529: Template with table, not included at beginning of line
FAILED: Parsing crashing regression (fr:JavaScript)
FAILED: Bug 6200: blockquotes and paragraph formatting
FAILED: Nesting tags, paragraphs on lines which begin with

Likely not paragraph-related:
FAILED: Unclosed and unmatched quotes
FAILED: to

Templates producing empty output break p-wrapping logic
Consider the 4 examples below.

Example 1 - a

c

Example 2 - a

c

Example 3 - a

c

Example 4 - a

c

In the PHP parser:
* Examples 2, 3, and 4 lead to separate paragraphs for a and c.
* In example 1, the entire block is a single paragraph.

In Parsoid, all examples are wrapped in a single paragraph.

At least example 4 is a Parsoid bug and should be fixed. Examples 2 and 3 are less clear, since it is not obvious what logic the PHP parser applies there. Fixing example 4 might also fix 2 and 3, so it is worth fixing 4 first and then re-evaluating.

The other regression seems related. We talked earlier about whitelisting it, since there is no semantic or visual layout difference between the PHP parser output and the Parsoid output. foo generates a table. Since the  tag is on the same textual line as foo, Parsoid does not wrap foo in a p-tag. The PHP parser probably treats this as a line break (unconfirmed) and wraps foo in a p-tag.

Blockquote tags
The PHP parser treats the blockquote tag specially. It is not like other block tags.
* p and pre tags are suppressed within blockquotes.
* Hence the blockquote failure for the example below.

foo

bar

baz

Examples illustrating diffs

a b

a b

In the div output, a is wrapped in a p-tag and b in a pre-tag. In the blockquote output, nothing is wrapped in these tags.

Result of looking at the PHP parser's doBlockLevels code:
[11:40] subbu: the reason for the diff in blockquote handling is that a div is not considered to be a block element in doBlockLevels
[11:40] while blockquote,td,li etc are
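The distinction reported in the chat log can be sketched as a simple lookup on the parent tag. This is only an illustration, assuming the lost example consisted of a plain line a followed by a space-indented line b. NO_WRAP_PARENTS and wrapLine are hypothetical names, not code from the PHP parser or Parsoid, and the set contains only the tags named in the log.

```javascript
// Hypothetical set: only the tags the chat log names as block elements
// in doBlockLevels. The real list may be longer.
const NO_WRAP_PARENTS = new Set(['blockquote', 'td', 'li']);

// Toy line wrapper (assumption, not the real algorithm):
// a leading space yields a pre-tag, anything else a p-tag,
// unless the parent suppresses both.
function wrapLine(line, parent) {
  if (NO_WRAP_PARENTS.has(parent)) {
    return line; // left unwrapped, as in the blockquote output
  }
  return line.startsWith(' ')
    ? '<pre>' + line.slice(1) + '</pre>'
    : '<p>' + line + '</p>';
}

console.log(wrapLine('a', 'div'));         // wrapped in a p-tag
console.log(wrapLine(' b', 'div'));        // wrapped in a pre-tag
console.log(wrapLine('a', 'blockquote'));  // not wrapped
```

Under these assumptions a div parent behaves like top-level content, while a blockquote parent leaves both lines untouched, which matches the diff described above.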

BR tags
br tags are block tags, so they are not wrapped in p-tags.
* But the PHP parser wraps them and treats them like an inline tag?
* Hence the br failures above.



  

Sanitization
The sanitizer converts some tags into text, which may then need to be wrapped in p-tags. So, maybe run the sanitizer before the P-wrapper?
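A toy sketch of why the ordering matters, using made-up token shapes. sanitize, pWrap, ALLOWED_TAGS, and the blink tag are all hypothetical stand-ins chosen for illustration, not Parsoid's actual transformers or whitelist.

```javascript
// Hypothetical whitelist; any tag outside it is turned into text.
const ALLOWED_TAGS = new Set(['div', 'p', 'b']);

// Sanitizer stand-in: disallowed tag tokens become literal text tokens.
function sanitize(tokens) {
  return tokens.map(t =>
    t.type === 'tag' && !ALLOWED_TAGS.has(t.name)
      ? { type: 'text', value: '<' + t.name + '>' }
      : t);
}

// P-wrapper stand-in: wrap each run of adjacent text tokens in one paragraph.
function pWrap(tokens) {
  const out = [];
  let run = [];
  const flush = () => {
    if (run.length) {
      out.push({ type: 'p', children: run });
      run = [];
    }
  };
  for (const t of tokens) {
    if (t.type === 'text') {
      run.push(t);
    } else {
      flush();
      out.push(t);
    }
  }
  flush();
  return out;
}

const input = [
  { type: 'tag', name: 'blink' },
  { type: 'text', value: 'foo' }
];

// Wrapping first leaves the converted text outside any paragraph.
const wrapThenSanitize = sanitize(pWrap(input));
// Sanitizing first lets the p-wrapper see the converted text.
const sanitizeThenWrap = pWrap(sanitize(input));

console.log(JSON.stringify(wrapThenSanitize));
console.log(JSON.stringify(sanitizeThenWrap));
```

In this sketch, wrapping first produces a bare text token followed by a paragraph, while sanitizing first produces a single paragraph containing both pieces of text.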



Inline tags starting on a block tag line and ending on another line
The PHP parser generates malformed HTML for these examples, and they are disabled. Parsoid generates well-formed HTML, but the output is semantically incorrect.

Handling this correctly may require tree context, which we do not have in the token stream transformers.

Examples

a b

a b

List handler swallows a newline?
In the example below, in the output, foo ends up on the same line as the  tag. This means that the list handler is probably swallowing the trailing newline after ":"

Example --- foo &gt;/pre&lt;
 * term:

Round-trip test regressions
Bug 529: Uncovered bullet
Nesting tags, paragraphs on lines which begin with