Core Platform Team/Initiative/Unify Parsers-Phase 2/Initiative Description

Project Leads
Subbu Sastry

Current state
Blocked, waiting for phase 1 to be complete.

Some of the work cannot be fully defined until several earlier tasks are complete, since those tasks are expected to determine the scope of the rest of the project. See milestones and major tasks below.

Expected start
Late FY1920 Q1

Summary
TBD

Significance and motivation
TBD

Milestones and major tasks

 * Fix known issues in Parsoid relating to using Parsoid HTML for read views
   * Complete language variant support
   * Address any other issues listed in Parsoid/Known differences with PHP parser output
   * Finish updating legacy PHP parser media output to match Parsoid
     * This might require updates to some bots and gadgets
   * Identify any other Parsoid feature gaps (this can, and likely will, reveal new work)
 * Finalize new parser hooks API (Parsoid and the legacy PHP parser have different pipelines and internals)
   * Migrate Wikimedia extensions that use the existing hooks to the new API
 * Compatibility testing (this can, and likely will, reveal new work)
   * Establish regular visual diff QA runs to identify uncaught issues
   * Analyze results and file Parsoid bugs or identify any wikitext changes required on wikis
   * Decide what level of compatibility is acceptable (100% compatibility is not achievable, and some output differences may be insignificant)
   * Connect with CL and engage with the community if we require any wikitext or templates to be fixed (this can, and likely will, reveal new work)
 * Production readiness
   * Improve Parsoid performance (undefined until phase 1 is complete)
   * Switch over all read views to Parsoid on the Wikimedia cluster

Outcome
Reduce complexity in core

Baseline

 * TBD

Target

 * TBD

Methodology and rationale
TBD

Time and resource estimate
18-24 months

3.5 FTE and 0.5 Engineering and Project Manager for the duration

The team may be augmented with other engineers, but more clarity is needed.

Dependencies
Reduce Extension Interface Surface Area

Collaborators

 * Parsing Team
 * Core Platform
 * Performance
 * SRE

Stakeholders

 * Client teams (Web, VE, Flow, CX, Apps)
 * Bot, Gadget, and Extension authors (initially only as pertaining to the Wikimedia cluster)
 * Editing community
 * Core Platform

Open questions

 * To what extent do we want to refactor the Parsing Interface in Core? It is currently coupled to the templating engine.
 * What is acceptable feature parity between Parsoid and the PHP parser? How do we decide this? What qualitative analysis should be used?
 * What are our strategies for engaging with the community on this change?
 * What additional work is required on the linter?

Phabricator
https://phabricator.wikimedia.org/tag/parsoid-read-views/

Plans and RFCs

 * The Long And Winding Road To Making Parsoid The Default MediaWiki Parser (slides, video)

Other documents related to parser unification

 * Parsoid/Known differences with PHP parser output
 * Parsing/Parser Hooks Stats
 * Parsing/Media structure
 * Parsoid/LanguageConverter