Core Platform Team/Initiative/Unify Parsers-Phase 1/Initiative Description

Project Leads
Subbu Sastry

Current state
Implementation in progress; currently porting Parsoid to PHP

Expected start
Started in FY1819 Q2

Summary
Parsoid is a Node.js codebase. This project aims to (a) port Parsoid to PHP, (b) integrate it with MediaWiki core, and (c) deploy Parsoid/PHP on the Wikimedia cluster and switch all clients over to Parsoid/PHP.

Significance and motivation
The larger project is to make Parsoid the default parser for MediaWiki, starting with the wikis on the Wikimedia cluster. The simplest and shortest path to that goal is to port Parsoid to PHP.

The reasoning for this is covered on Parsing/Notes/Moving Parsoid Into Core and in the Tech Talk about making Parsoid the default MediaWiki parser (see the Links and Resources section below).

In short, porting lets us (a) address the architectural complaints about Parsoid as a standalone service, (b) reuse code from MediaWiki core to bring Parsoid and the legacy PHP parser closer together, (c) offer simpler installation options for non-Wikimedia wikis, with VisualEditor and Wikitext Linting available out of the box, and (d) remove some of the async-related complexity from the codebase.

Milestones and major tasks

 * Prototyping: Early porting experiments to evaluate the feasibility of the port, potential performance issues, anticipated roadblocks, and expected difficulty. (Status: ✅)
 * Preparation: Fix the JS codebase to smooth and simplify the porting process; this may include some significant code refactoring. (Status: ✅)
 * Porting: Port the Parsoid codebase to PHP, including building the interfaces needed to integrate Parsoid into MediaWiki core. (Status: )
 * Testing & QA: Rigorous testing and performance tuning to establish production readiness of Parsoid/PHP. (Status: )
 * Switchover: Switch existing clients to use Parsoid/PHP.
 * Switch off Parsoid/JS

Outcome
Reduce complexity in core

Baseline

 * Percentage of clients using Parsoid/PHP: 0%

Target

 * Percentage of clients using Parsoid/PHP: 100%

Methodology and rationale
This is one step in a larger project, so the metric measures completion of the port, which lets us move on to the next phase of development. The best way to confirm completion is to ensure that no clients are still using the JS version of Parsoid. The next phase will focus on the ultimate goal of moving to a single parser.

Time and resource estimate
9-12 months; completion is expected near the end of FY1819 Q4 or in early FY1920 Q1

3.5 FTE for the duration

Augmenting with 2+ FTE for 3 months

0.5 FTE Engineering and Project Management for the duration

Dependencies
Build new HTTP API

Collaborators

 * Parsing Team
 * Core Platform
 * Performance
 * SRE

Stakeholders

 * Client teams: Web, VE, CX, Android, Growth (for Flow)
 * Editing community
 * Core Platform

Open questions
None (some for Phase 2)

Phabricator
https://phabricator.wikimedia.org/tag/parsoid-php/

We are not tracking the porting of individual files in Phabricator. The Parsoid-PHP Phabricator board tracks everything other than the mechanical porting of individual files in the codebase.

Plans and RFCs
The Long And Winding Road To Making Parsoid The Default MediaWiki Parser (Slides, Video)

Other documents
Parsing/Notes/Moving Parsoid Into Core