Talk:Parsoid/Todo

On this page we track and report Parsoid parsing, round-tripping, and serialization issues. Problematic wikitext snippets can be added to Parsoid/Bug_test_cases for direct testing.

Misc issues
 * Try to emulate PHP parser in treating foo as foo (low priority)
 * search for 'listItem' in http://parsoid.wmflabs.org/_rt/Takeda%20clan.
 * SSS: This is a "syntax error" caused by mismatched ref tags in the wikitext. The specific segment that crashes it is this:  . Note the error in <ref name="enc-shingen"/>. This ref tag should not be closed. This bug is similar to the previous one: there are mismatched tags, which are usually handled by the Tidy post-processor. We need a general strategy for this. Here is the smallest test case to reproduce this:  boo yahoo 
 * Two issues reported in Thread:User talk:GWicke/Normalization of wiki text
 * The weather box in http://parsoid.wmflabs.org/Broken_Hill,_New_South_Wales is rendered incorrectly
 * Parser: For this example below, the parser outputs a duplicate </pre> in the parsed HTML
 * Serializer: This example ; foo: bar roundtrips to     ; foo: bar . But this example ; foo: baz bar roundtrips correctly.
 * Serializer: Lost newline. <h2>foo</h2><p>bar</p> serializes to: ==foo== bar
 * Parser: Text not wrapped in <p> tags. Look at the HTML output for http://parsoid.wmflabs.org/_rt/mw:Parsoid/Todo. In several sections, text after headings appears bare in certain contexts. I haven't yet reduced this to a small test case.
 * Diffing bug: Try a roundtrip diff on a page with content * foo ** bar *** baz with master revision 33dc9abb0db364bb41ca0b06d368bde386719d6a. This is a problem with diffWords, which swallows newlines. diffChars works better, but it takes too long and uses too much memory. An alternative would be to use diffChars on "small" lines.
 * Serializer: Indenting is broken. :foo ::bar roundtrips as     :foo :bar
 * Parsing is broken for multiple defns. ;i1 :d1 :d2 parses as dt -> [dd, dt] instead of dt -> [dd, dd]
 * Roundtripping of html attributes -- needs fixing
 * Escaping/serialization of html entities -- not done
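
For the diffing bug in the list above (diffWords swallows newlines, diffChars is too slow on whole pages), the "diffChars on small lines" idea could look roughly like the sketch below. It is only an illustration in Python using the standard difflib module, not the JavaScript diff library used by the round-trip tests. The names hybrid_diff and SMALL_LINE_LIMIT, and the threshold value, are made up for this example.

```python
import difflib

# Hypothetical threshold: changed chunks up to this many characters
# are refined with a character-level diff.
SMALL_LINE_LIMIT = 80


def hybrid_diff(old, new, small_limit=SMALL_LINE_LIMIT):
    """Diff two texts line by line, refining small changed chunks per character.

    Returns a list of (op, old_text, new_text) tuples, where op is one of
    "equal", "delete", "insert" or "replace". Newlines are kept in the
    compared units, so a lost or added newline shows up as a change.
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    line_matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)

    result = []
    for op, i1, i2, j1, j2 in line_matcher.get_opcodes():
        old_chunk = "".join(old_lines[i1:i2])
        new_chunk = "".join(new_lines[j1:j2])
        is_small = max(len(old_chunk), len(new_chunk)) <= small_limit
        if op == "replace" and is_small:
            # Character-level pass, only on small chunks to bound the cost.
            char_matcher = difflib.SequenceMatcher(None, old_chunk, new_chunk, autojunk=False)
            for cop, a1, a2, b1, b2 in char_matcher.get_opcodes():
                result.append((cop, old_chunk[a1:a2], new_chunk[b1:b2]))
        else:
            result.append((op, old_chunk, new_chunk))
    return result


if __name__ == "__main__":
    before = "* foo\n** bar\n*** baz\n"
    after = "* foo ** bar\n*** baz\n"
    for entry in hybrid_diff(before, after):
        print(entry)
```

The line-level pass keeps the expensive character diff away from unchanged text, and the character pass makes a swallowed newline visible as its own change instead of hiding it inside a word-level match.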

Issue on http://parsoid.wmflabs.org/_rt/pt:Foo
Is it possible to have an https or protocol-relative link for reporting bugs on this page? The address https://parsoid.wmflabs.org/_rt/pt:Foo doesn't seem to work. Helder 15:38, 8 June 2012 (UTC)


 * It certainly is possible, but not really our top priority right now. There is no authentication info involved, and all the content is public. -- Gabriel Wicke (GWicke) (talk) 22:17, 20 June 2012 (UTC)

Issue on http://parsoid.wmflabs.org/_rt/pt:HTML
The article [//pt.wikipedia.org/w/index.php?title=HTML&oldid=30591267 pt:HTML] uses the non-existent " " to exemplify the way HTML works, but the code There is no " " in HTML is converted back to something else: There is no " should produce something like   rather than marking the entire paragraph as template-generated.
 * This is pretty much what we intend to do: see Parsoid/HTML5_DOM_with_microdata. -- Gabriel Wicke (GWicke) (talk) 10:03, 20 June 2012 (UTC)


 * Add round-tripping of category links and the like, right now these are lost
 * Fix the newline-at-the-end-of-an-li-or-before-a-ul behavior such that
 * the parser doesn't output newlines before each </li> and before each <ul>-within-a-<li>
 * the serializer doesn't depend on these newlines to output correct wikitext
 * (newline handling in general is slated to be revamped but I wanted to document this case specifically because VE works around it)
 * Feature request: the first <p> inside an <li> should be ignored, whereas every subsequent <p> should be treated as if it had stx=html (the latter is already done). This means that   should be serialized to
 * This is because the parser doesn't wrap the text in a list item in a paragraph (i.e. the text is directly in the list item), whereas VE's linear model does wrap it in a paragraph because listItem nodes can't contain text directly. The HTML->linmod converter can add the paragraphs quite easily and cleanly, but removing them in the linmod->HTML converter under conditions this specific is a pain (we currently do do this as a workaround, but it's ugly). So we can tolerate input that doesn't have wrapped first paragraphs, but Parsoid doesn't tolerate input that does have wrapped first paragraphs; it would make our lives easier if it did.
 * -> Listed in the Parsoid/Todo. Will also be needed for table cells. -- Gabriel Wicke (GWicke) (talk) 10:03, 20 June 2012 (UTC)
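
To make the feature request above concrete, here is a rough sketch of "ignore the first paragraph wrapper inside a list item" as a preprocessing step. It is an illustration only, not Parsoid's or VE's code: it uses Python's xml.etree.ElementTree as a stand-in for the DOM, assumes well-formed input, and the function names unwrap_first_paragraph and normalize_list_items are made up for this example.

```python
import xml.etree.ElementTree as ET


def unwrap_first_paragraph(li):
    """Splice the content of a leading <p> directly into its <li>.

    Nothing happens if the <li> has no children, if its first child is
    not a <p>, or if text precedes the <p>. Later <p> children are left
    untouched.
    """
    children = list(li)
    if not children or children[0].tag != "p":
        return
    if (li.text or "").strip():
        return  # bare text comes first, so the <p> is not the first node

    p = children[0]
    li.remove(p)
    li.text = p.text
    inner = list(p)
    for index, child in enumerate(inner):
        li.insert(index, child)

    # Text that followed </p> must now follow the spliced content.
    if p.tail:
        if inner:
            inner[-1].tail = (inner[-1].tail or "") + p.tail
        else:
            li.text = (li.text or "") + p.tail


def normalize_list_items(root):
    """Apply unwrap_first_paragraph to every <li> below root."""
    for li in list(root.iter("li")):
        unwrap_first_paragraph(li)
    return root


if __name__ == "__main__":
    tree = ET.fromstring("<ul><li><p>foo</p><p>bar</p></li></ul>")
    print(ET.tostring(normalize_list_items(tree), encoding="unicode"))
```

With a step like this on the input side, a client could always send wrapped first paragraphs and would not need its own unwrapping workaround. The same approach would presumably carry over to table cells by matching td and th as well.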