
Parsoid/Known differences with Core Parser output

From mediawiki.org

This page tracks known HTML output differences between Parsoid and the PHP parser, along with the proposed resolution for each difference. For a more user-oriented version, see Parsoid/Parser Unification/Known Issues.

Differences because of implementation differences or functionality gaps

Difference Description Proposed resolution Status
Difference: Parsoid generates <figure> tags for block images, whereas the PHP parser uses <div>.
Description: This is once again HTML4 / HTML5 fallout. Parsoid uses semantic markup available in HTML5 that wasn't available in HTML4 at the time the PHP parser was written.
Proposed resolution: Once this code is ready to be merged, and before it is deployed, we'll work with bot and gadget authors so that they can use the new markup that will be generated. T118517 is the RFC for updating PHP parser output. See Parsing/Media structure.
Status: Done

Difference: Parsoid doesn't handle language variants yet.
Description: Parsoid doesn't yet parse language variant markup and doesn't provide a variant-specific rendering for reading clients.
Proposed resolution: Language variant support in Parsoid has landed and has been deployed. The remaining TODO is finishing support for all languages.
Status: In progress

Difference: Edge case differences between Parsoid's native implementations of some extensions and the PHP implementations of the same extensions.
Description: For any extension that processes wikitext (ex: Cite, Gallery), Parsoid needs its own native implementation. Because the implementations differ, there are edge cases where the output differs (ex: T51538, T96555, and a few others related to gallery).
Proposed resolution: Some of these (T104662, T96555) will be fixed in Parsoid. Others might be tweaked in the PHP implementation, or we might treat the edge case differences as undefined behavior that editors shouldn't rely on. Since these are edge cases, they should be fairly uncommon on wikis (otherwise, we would have fixed them already).
Status: In progress

Difference: Some parser hooks available in the PHP parser are unavailable in Parsoid.
Description: Parsoid and the PHP parser have different internals, so not all of the PHP parser's hooks are available in Parsoid. This page with parser hook stats lists extensions and the parser hooks they use. Some hooks, such as ParserBeforeStrip and ParserAfterStrip, have no equivalent in Parsoid. In a Parsoid-only world, this could affect the output and functioning of extensions like <translate>.
Proposed resolution: We are going to develop a parser hooks API that is implementation independent (without exposing the internal details of how parsing happens) and port all the Wikimedia extensions to use this new API.

Parsoid is developing an extension API to support existing Parsoid-native extensions cleanly (Cite, Gallery, Poem, etc.). We plan to extend the API gradually, based on experience with adapting more extensions to work with Parsoid. In parallel, we will continue to deprecate unnecessary hooks and possibly rename some to reflect the desired semantics.

This task is likely to be completed after Parsoid moves to core.
Status: In progress. See Parsoid/Extension API.

Difference: Parsoid doesn't handle pages in some namespaces properly (ex: File, Category).
Description: Parsoid doesn't have special handling for pages in namespaces that have generated content. For example, the content of a page in the Category namespace is generated dynamically, and a page in the File namespace similarly has some generated content. There is a good argument to be made that Parsoid shouldn't duplicate this support and that clients should fetch this content from the MediaWiki API directly. However, this leaves Parsoid clients in a bit of a bind, because they don't know which namespaces are special in the sense that content for their pages is better fetched from the MediaWiki API.
Proposed resolution: Some good resolution of this problem would be helpful. Maybe Parsoid should handle requests for content in all namespaces and, where that content is better served from the MediaWiki API, redirect the client to the right URL? See T153801, T151223, T148118.

With Parsoid's integration into core and the ParserOutputAccess, RevisionRenderer, and ContentModelHandler classes, Parsoid is only involved where wikitext processing is needed. Elsewhere, core code handles other functionality.
Status: Done

Difference: Parsoid doesn't generate the metadata needed for updating the links and page_props tables.
Description: See T310512.
Proposed resolution: We'll have to add that at some point before Parsoid can replace the existing Parser class.
Status: To do
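
For bot and gadget authors affected by the <figure> / <div> difference in the first row above, the following is a minimal Python sketch of code that tolerates both markup styles. It is not taken from Parsoid or MediaWiki: the names BlockImageFinder and find_block_images are hypothetical, and the attribute values it looks for (a "thumb" class on the legacy <div> wrapper, a typeof value starting with "mw:File" on the <figure>) are assumptions about typical output that should be checked against Parsing/Media structure.

```python
from html.parser import HTMLParser


class BlockImageFinder(HTMLParser):
    """Collect wrapper elements for block images in either markup style.

    Assumptions (not confirmed by this page): legacy output wraps block
    images in <div class="thumb ...">, and HTML5-style output wraps them
    in <figure typeof="mw:File/...">.
    """

    def __init__(self):
        super().__init__()
        self.wrappers = []  # list of (tag, style) tuples, in document order

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        types = (attr.get("typeof") or "").split()
        if tag == "figure" and any(t.startswith("mw:File") for t in types):
            self.wrappers.append((tag, "html5"))
        elif tag == "div" and "thumb" in classes:
            self.wrappers.append((tag, "legacy"))


def find_block_images(html):
    """Return the block image wrappers found in an HTML string."""
    finder = BlockImageFinder()
    finder.feed(html)
    finder.close()
    return finder.wrappers
```

A script written this way keeps working while a wiki moves from one markup to the other, since it only needs the wrapper element and does not depend on which parser produced the page.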

Differences identified via visual diff testing


We run mass visual diff tests that compare the rendering of Parsoid output with the rendering of PHP parser output. This table will be filled out as we inspect the visual diffs and identify their underlying causes. In addition to the differences listed above, here are a few more specific ones that we discovered.

Difference Explanation Bug / Proposed Resolution Status
Difference: Long tail of bugs related to read views.
Explanation: Fix the bugs filed under the Read Views column on the Parsoid Phabricator workboard.
Bug / Proposed resolution: Fix all the bugs!
Status: In progress

Difference: Missing resource modules in Parsoid output.
Explanation: http://sv.wikipedia.org/wiki/Mir has a bunch of modules (ext.gadget.*) which the Parsoid output is missing.
Bug / Proposed resolution: T161278
Status: In progress

Difference: CSS differences in Cite.
Explanation: Cite output needs styling (T156351 and T156350). This should also cover the styling requirements for cite ref links: some wikis, like eswiki and frwiki, skip the brackets. In addition, knwiki (Kannada) uses Kannada numerals for the ref text.
Bug / Proposed resolution: The necessary styles for these various wikis are being added to the visual diffing code. Most of these styles are good candidates for adding to commons.css on the specific wikis.

However, as part of this work, we've also identified some limitations in the Cite CSS output. We'll have to figure out how to resolve them.
Status: In progress. This is mostly done now for desktop views; see the dedicated wiki page.

The only thing left is updating the CSS for mobile rendering. That is pending until we figure out how to prevent FOUC on mobile pages.

Difference: Broken / missing support for some extensions.
Explanation: Pages extension output for wikisource pages is missing some wrapping divs (with associated styles). (Example)

Pages on viwiki are missing mapframe / osm maps. (Example)
Bug / Proposed resolution: To be investigated
Status: To do
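
To make the per-wiki Cite conventions in the table above concrete (brackets skipped on wikis like eswiki and frwiki, Kannada numerals on knwiki), here is a small Python sketch of the visible text a ref link is expected to show. It only models the expected label text, for example for checking rendered output in a test. It is not how Cite or Parsoid produces the label, and the names ref_label and KANNADA_DIGITS are hypothetical.

```python
# Translation table from ASCII digits to Kannada digits (U+0CE6 to U+0CEF).
KANNADA_DIGITS = {ord(str(i)): chr(0x0CE6 + i) for i in range(10)}


def ref_label(number, brackets=True, digit_map=None):
    """Return the expected visible text of a ref link.

    number    -- the reference number, for example 3
    brackets  -- False for wikis that skip the brackets
    digit_map -- optional str.translate table for localized numerals
    """
    text = str(number)
    if digit_map is not None:
        text = text.translate(digit_map)
    return f"[{text}]" if brackets else text
```

With the defaults, ref_label(3) gives "[3]"; passing brackets=False gives "3"; passing digit_map=KANNADA_DIGITS replaces each digit with its Kannada counterpart.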

See also