Core Platform Team/Initiative/Unify Parsers-Phase 2/Initiative Description

From mediawiki.org


Summary

MediaWiki currently has two wikitext parsers, the legacy parser and Parsoid, which support different use cases. This project aims to arrive at a single parser that supports all use cases.

Significance and Motivation

Parsoid was developed to support HTML-editing clients and is also used by some, but not all, read-view use cases. Maintaining two parsers is not tenable in the long term: it hamstrings development and upgrades to the parsing codebase, wikitext, and templates, because every change has to be supported in both codebases. More importantly, the parsing pipelines of the two parsers differ, which makes replicating functionality in both more complex.

We would like to consolidate behind Parsoid as the new default parser, given its support for HTML clients, its annotated HTML output, and its more structured internal pipeline. This requires identifying all output and feature incompatibilities between Parsoid and the legacy parser and bridging those gaps. It may also require updating (a) bots, (b) gadgets, (c) extensions, and (d) wikitext. This project aims to minimize all such changes by handling any differences with appropriate tooling and support.

Once Parsoid is deployed as the default and only parser for all wikitext-based use cases, we can embark on much-needed work to enhance wikitext and templates, making them easier to use, more performant, less error-prone, and easier to write tools for.

Outcomes

Reduce complexity in core

Baseline Metrics

None given

Target Metrics

None given

Stakeholders
  • Client teams (Web, VE, Flow, CX, Apps)
  • Bot, Gadget, and Extension authors (only as pertaining to the Wikimedia cluster initially)
  • Editing community
  • Core Platform

Known Dependencies/Blockers

Reduce Extension Interface Surface Area