Parsoid/Deployments/2021

Dec 14-16: ✅

 * Revert "Disable translate annotations + revert 2.3.0 to 2.4.0 version bump"
 * T228616: Fix serializing links when a namespace conflicts with a local interwiki

Dec 7-9: ✅
Non-translate patches:


 * T287216: Add ContentMetadataCollector interface
 * WikiLinkHandler: use the value offset for the fragment TSR
 * Remove the return value from DOMNormalizer::normalize
 * When splitting nodes in PWrap, clone the NodeData
 * T214651: Implement diffHandler for Cite extension
 * T263203: Handle stripped characters in free external links in html2html
 * T292022: Fix serializing links using local interwiki plus language link

Translate patches (all disabled except for HTML -> WT to provide compatibility for HTML version 2.4.0)
 * Disable translate annotations + revert 2.3.0 to 2.4.0 version bump
 * Fix "undefined index" on annotation nesting removal
 * T295233: Avoid crashers on bad annotation nesting
 * T296107: Treat translation unit marker comments as not-SOL transparent
 * T296169: Don't generate nested annotated ranges in HTML output
 * Only accept  for tvar annotation when the page has annotations
 * Only accept  when tvar is annotation tag
 * T295233: Hack: Add partial support for older tvar syntax to prevent crashers
 * T295406: Hack: Drop annotation tokens in template context
 * T295406: Don't break template continuity when moving annotation range metas
 * T295330: Fix DSR computation when end tag is pulled out of range
 * T295233, T295236: Fix global state of annotation id and DOMPostProcessor pass order
 * T295243: Fix regression in DOMNormalizer
 * Bump content version from 2.3.0 -> 2.4.0
 * T261181: Add support for annotation tags in Parsoid

November 16-18: ✅

 * T295104: Don't record serialization metrics for empty html
 * Add RemexPipeline
 * Get rid of the foster comment hack
 * T214648, T294450: DOM diff galleries
 * T214651: Add an experimental method for extensions to diff nodes
 * Remove check for duplicate data IDs
 * Have Remex tell us when it clones attributes instead of detecting that

November 2-4: ✅

 * Deduplicate subtree recursion in DOMDiff.php
 * Dump data attributes without cloning
 * Rename isTopLevel to atJsonRoot in DOMDiff.php
 * T235295: Replace DOMCompat::attributes with DOMUtils::attributes

Oct 27: tagged v0.15.0-a6 for vendor

 * T226428: DOMRangeBuilder cleanup and data flow improvements
 * T288640: WrapSections: Don't get tripped by tags found in content
 * Require RemexHtml 3.0.0
 * T283560: Bump some dependencies to match upstream
 * Drop composer v1
 * Bump composer for dependabot alert

October 26-28: ✅

 * T292923: Deal with more malformed transclusion parts
 * T291692: Account for php zero string, "0"
 * Unbreak ContentUtils::dumpDOM for DocumentFragments
 * Remove deprecated PegTokenizer::tokenizesAsURL
 * Remove temporary tsrDelta
 * Add a namespace for HTML5TreeBuilder
 * Set the bag property on child documents
 * WrapSections cleanup
 * WrapSections OOP state class
 * Update wikimedia/zest-css to 2.0.2
 * Don't call stashDataAttribs for text tokens
 * Split out TemplateHandler::encapTokens to its own class

October 19-23: ✅

 * Merge TempData booleans into a bit field
 * T226428: Add a class for template ranges
 * Replace $dp->tmp->tplRanges with SplObjectStorage
 * Use OOP in WrapTemplates
 * Fix call to deprecated method ParserOutput::getProperties
 * Sanitizer: Replace RFC 3454 by RFC 8264 for cleanUrl
 * Sanitizer: Use \u{xxxx} syntax in cleanUrl
 * T293308: massageLoadedDataParsoid: Ignore null source ranges
 * Followup to 6bddf56e: Resync Grammar.pegphp and Grammar.php
 * Declare TempData->tagId
 * Add TempData class for DataParsoid::$tmp
 * Lazy-initialise the DataParsoid->tmp property
 * Have massageLoadedDataParsoid return a DataParsoid object
 * Clone DataParsoid property by property
 * Remove unused property brokenHTMLTag
 * Make DataParsoid be a real class
 * Add NodeData class to replace the stdClass objects that DataBag stores

October 12-14: ✅

 * T292115: Generate timing metrics per KB of HTML (output or input)
 * Fix typo in data-parsoid property use in Linter
 * Log slow wt2html operations in ParsoidHandler
 * Allow composer 2.1
 * Don't try to deep clone DOM nodes
 * T251624: Deduplicate file info requests
 * Refactor WTSUtils::origSrcValidInEditedContent to pass SerializerState

October 5-7: ✅

 * T292250: Log errors from malformed data-mw->parts
 * T261181: Add annotation tags to SiteConfig.php
 * Improve test for pipeline expansion of complex attributes
 * T291741: Account for clients leaving off the template params array in data-mw

September 28-30: ✅

 * Make extension-output p-wrapper stripping more robust
 * T291452: Raise a client error if data-mw->parts is not an array
 * Remove duplicate EOFTk stripping from ATM attribute processing
 * T291234: Set an appropriate locale in maintenance scripts
 * 804cd7b6 followup: Fix breakages
 * ParagraphWrapper: fix slow array_shift loop
 * Followup to 1051b80f: Simplify statsd metric name for extension tracking
 * html2wt: Don't update sol state in appendSep
 * Remove non-actionable / stale warning log messages
 * DOMUtils: Get rid of isElt, isText, isComment helpers
 * Use instanceof Element instead of DOMUtils::assertElt in conditionals

September 21-23: ✅

 * Improve mw-empty-elt detection in Cleanup DOM handler
 * T291234: Replace preg_replace_callback with strtolower
 * Faster dedupeHeadingIds
 * T291234: Verify that strtolower works for all byte values
 * ParagraphWrapper: fix property grouping
 * ParagraphWrapper: fix bug from array_merge patch
 * Remove emit(Start|End)Tag helpers
 * AttributeTransformManager: Don't duplicate tokens unnecessarily
 * Don't defeat the empty-argument expansion cache
 * Reimplement TokenHandler profiling and tracing using a proxy
 * Simplify skipOnAny usage in token handlers
 * T290938: Loosen constraints on wrapper unmodified,
 * HTML5TreeBuilder: use associative arrays for attributes
 * Bug fix followup to 05cccaa5
 * Use a class for the return value of TokenHandler subclass methods
 * Make TokenHandler not override PipelineStage
 * T289358: Drop config vars if they fail to JSON stringify
 * In TemplateHandler use string functions instead of regexes

September 14-16: ✅

 * Optimize access of attribute name/value
 * T290697: Allow PHP 8.0 polyfills
 * Fix safesubst matching
 * Use CharacterData::nodeValue instead of CharacterData::data
 * T282031: Suppress end format newline if no params
 * T290044, T271566: Force block imagemaps
 * Learnings from User:Wyang/basic-Chinese-words
 * T221488: Add "decoding=async" attribute to img tags
 * Hand-coded matcher for plain sequences of urltext
 * Track uses of extensions in Talk namespaces via statsd
 * Fix inappropriate usages of array_merge
 * Fix TokenHandler::process O(N^2) performance

August 31-Sept 2: ✅

 * Migrate out valid follow contents after processing refs
 * Reserialize processed refs if content differs
 * Cite: Rename functions pushing/popping embedded content flags
 * T289331: Don't process ref-in-ref as embedded, unless content differs
 * Move content differ check up higher
 * Only call ReferencesData::add when adding
 * html2wt: Tweak handling of excess nls around rendering transparent nodes
 * T264027: Stop stripping trailing <nowiki />s
 * T266406, T257629, T289330: Make magic words case-sensitive

August 24-26: ✅

 * T289107: Suppress recursion protection when doing a full table parse
 * T288715: Add class="extiw" to interwiki link  tags
 * T272186: Add noresize class on imagemaps
 * T287156: Replace Content::preSaveTransform call with ContentTransformer::preSaveTransform
 * Update wikipeg in package.json

August 9-12: ✅

 * Update RemexHtml namespace
 * Update WikiPEG namespace

August 8: ✅

 * T287972: Update Dodo to 0.3.0, Remex to 2.3.2, and Zest to 2.0.1
 * T287972: Update wikimedia/langconv to 0.4.2
 * T287972: Bump versions of wikimedia/alea and wikimedia/wikipeg
 * T287163: Avoid using deprecated ParserOptions::getUser
 * Reinstall service-runner to bump loose dependencies

August 3-August 5: ✅
Contains all the changes in -a11, -a12, and in addition:
 * Don't use non-standard Document::saveHTML method
 * The ::querySelectorAll and ::getElementsBy* helpers don't always return array
 * T254804: Copy some language used in the core sanitizer
 * T254804: Remove unused methods from TestUtils.js
 * Allow Node::getAttribute to return `null`
 * Introduce DOMCompat::nodeName($node)
 * Move nodeNameCheck to linting
 * html2wt: Simplify logic to make separators indent-pre safe
 * And followup: Fix regexp that looks for indent-pre whitespace
 * Bump content version from 2.2.0 -> 2.3.0
 * Sync  with Cite extension

July 30: ✅
As with -a11, this version of Parsoid is being released to mediawiki-vendor to verify that CI and other issues are fixed in beta before the train rolls for production, and to unblock rt testing. It is expected that 1.37.0-wmf17 will have a follow up build as  was deployed to beta without rt-testing (but has since been rt-tested).

This version contains all the changes from  and in addition:
 * T287611: Fixes for Dodo issues w/ CI and DiscussionTools:
 * Use Parsoid's version of idle-dom and dodo when testing in integrated mode
 * Only set up DOM aliases once
 * Minor cleanup in ExtensionHandler.php
 * T162399: Export ResourceLoader modules & JS config vars in meta tags in
 * T275444: Add baseconfig for banwiki
 * Documentation updates:
 * Include CODE_OF_CONDUCT.md and docs/ in our generated Doxygen documentation
 * Add documentation about information representation in Parsoid output
 * Get rid of unneeded li-hack handler which is a Tidy-era relic

July 29: ❌ *REVERTED*
This version of Parsoid is being released to mediawiki-vendor in order to verify that T287419 and similar issues are fixed in beta before the train rolls for production. It is expected that 1.37.0-wmf17 will have a follow up build as  was deployed to beta without rt-testing (but has since been rt-tested).
 * Fixes for Dodo issues w/ CI and DiscussionTools:
 * T287419: Upgrade wikimedia/dodo to 0.2.0
 * Add DocumentType and ProcessingInstruction to our DOM alias list
 * Be DOM-agnostic in DOMCompat/TokenList
 * T287611: Don't strictly enforce type hints in DOMCompat methods
 * T287463: Wrap next siblings in fixUpMisnestedTagDSR
 * html2wt: Centralize wikitext escaping to one place
 * html2wt: Use consistent casing for escapeWikitext
 * Add  to the automatically-generated documentation
 * Rename  to
 * PHPUtils::jsonEncode: Borrow some code from FormatJson::encode in core
 * T286840: Add leniency for active formatting elements in AddMediaInfo

This version was reverted after causing issues with DiscussionTools CI (fixed with 708618) and Parsoid CI (two issues under investigation). The plan is to investigate and fix the two Parsoid CI issues, then tag an -a12 for beta tomorrow.

July 27-29: ✅

 * T286839: P-wrap: Fix failing invariant by fixing undoIndentPre handling
 * P-wrap: Minor code simplification
 * T286786: Fix backtracking in solRegexp
 * Remove optional match
 * P-wrap: Minor code consistency tweaks

July 26: ❌ REVERTED and UNPUBLISHED
v0.14.0-a9 was the same as -a10 with the addition of patches to prepare Parsoid for a shift to using the Dodo DOM library. These patches caused core CI to break, and the problem wasn't solved by reverting Parsoid's deployment on  due to a bug in the   job (T287419). The v0.14.0-a9 tag was deleted from Gerrit and removed from Packagist to prevent its installation by the buggy  job.

July 20-22: ✅

 * T286786: Strip the double underscores from the extension bswRegexp
 * T286401: Fix DSR for unstripped stray closing tags & add b/c handling in selser
 * Ultra rare edge case: Fix bad check in html2wt
 * T276512: Tweak heuristic to trim excess newlines from a separator string
 * html2wt: Rename awkwardly named function ( isForceSOL -> forceSOL )
 * html2wt: Fix incompatible min/max nl constraints early
 * html2wt: Minor code simplification and assorted minor cleanups
 * SerializerState: Minor cleanup to mimic code in Separators.php

July 13-15: ✅

 * T277760: Stop adding newlines in manglePreprocessorResponse
 * Lower the level of some noisy logs
 * Attribute exceeding limit to the right resource
 * T280381, T211946, T221238: Stop throwing on arbitrary resource limits,

June 29-July 1: ✅

 * html2wt: Simplify mergeSeparatorConstraints
 * T280381, T211946, T239841: Enforce wikitext limits like in the legacy parser
 * T283273: Replace freenode references with libera references
 * T283961: Prevent inline breaks in language variant text

June 8-10: ✅

 * Followup on : Add missing update to DDHandler

May 25-27: ✅

 * T247143: Extension: require MW 1.37+ and remove support for Revision objects
 * Minor cleanups: Remove dead code, use foreach, simplify exprs
 * Merge encapsulateTemplate and encapTokens
 * Pass media structure to figure handler

May 18-20: ✅

 * Account for all trailing newlines when wrapping text nodes
 * Upgrade to mediawiki/mediawiki-codesniffer 36
 * Only select on mw:Image when adding info
 * Magic word pipe isn't going to match table_start_tag

May 4-6: ✅

 * T279682: Handle optional spaces after table_attributes for table_row_tag
 * T279963: Fix interplay between recoverTrimmedWhitespace and DisplaySpace
 * T278565: Don't p-wrap tags in extension HTML
 * Always provide string to preg_match subject param

April 27-29: ✅

 * Bump wikimedia/zest-css version
 * T279682: Don't break on pipe in linkdescs if we're in an ext tag
 * T279803: Fixing asserting an about id
 * T264028: Disable single line context when serializing nowikis
 * T280050: UnpackDOMFragments: Improve DSR fixup for misnested A tags
 * T279867: Fix collecting attributish content in table fixups
 * html2wt: Use trimmed whitespace recovery heuristics only if needed
 * T280449: Add some logging around failing preg_match when serializing
 * T280672: Account for nested transclusions in table fixups

Apr 13-15: ✅

 * T279451: Use a protected key to distinguish comments internal to Parsoid
 * Remove option to
 * T279184: Fix undefined DSR notice
 * T279182: Handle comments that decode to valid json
 * T279223: Handle empty text nodes in Selective Serializer
 * Only call encapTokens if we're wrapping

Apr 6-8: ✅

 * Process jsconfigvars from core parser output
 * Minor bug fix in handling of in TemplateHandler
 * T277800: Add some logging around failing preg_replace when serializing

Mar 30-Apr 2: ✅

 * T269749, T277415: ListHandler: when in EOL state, close lists always
 * DOMPostProcessor: Extract function to update classes
 * DOMPostProcessor: Extract function to export style modules in
 * T276620: html2wt: Improve heuristics enabling reuse of separators from source
 * T274521, T30980: Be more permissive for extension tag names
 * T278074: Handle wikilinks misnested in media links
 * T278074: Log an error if media structure is messed up
 * Allow use of newest version of wikimedia/remex-html

Mar 23-25: ✅

 * T275918: French spacing: don't require non-space before French spacing
 * T223797: Strip newlines from Category sortkeys
 * ListHandler: Close holes in tracing code
 * WrapTemplates: Extract functions to improve code comprehensibility

Mar 17-18: ✅

 * Fix roundtripping interwiki links with complex targets that have colons (follow up to patch in -a27 for T276649 to fix regression).
 * Separators: Code cleanup and documentation fixes

Mar 16-18: ✅

 * T199070: More permissive regexp for nested extension start tags
 * No need to protect opening angle bracket in extension tag
 * Stop allowing spaces before extension closing tag name
 * Add some explanatory comments for ref in ref
 * T276649: Subpages on interwiki / language links are invalid
 * T276388: Check for multiples doesn't apply to follows

Mar 2-4: ✅

 * T248369: Follow on patch to wikilink in extlink for video and audio content
 * T248369: Adding linter case for media in extlink
 * T275503: WTUtils::isFirstEncapsulationWrapperNode expects a node
 * TplWrap: Fix edge case bug that expanded template scope unnecessarily
 * T240642: WrapSections: Don't crash if we have incomplete DSR information

Feb 23-25: ✅

 * T215999: Lint duplicated media width options; lint bogus media width options
 * T255007: Don't apply French spacing in raw text elements
 * T272232: Modify UTF-8 regex to use builtin PCRE validation
 * T242068: Add lint for Parsoid wikilinks in extlinks with italic or bold
 * T265720: TableFixups: One more mishandled scenario with newlines
 * Minor robustness fix in WikitextEscapeHandlers
 * WrapTemplates: Get rid of unused property in template ranges

Feb 16: ✅

 * TableFixups: Minor tweaks
 * Don't apply border class to thumbs

Feb 16: ✅

 * Template Wrapping: don't expand range unnecessarily
 * T270373: Use prefixed text for content of links up the path
 * Separate arguments to getPipeline
 * Get rid of parseToplevelDoc
 * Add $frame to ParserPipeline and remove from pipeline stages
 * Refactor sanitization in a normalizeKey function
 * T267974: Contract multiple underbars in a row in refnames to a single underbar
 * Get rid of rtTestMode (used for pre-production testing only)

Jan 18 - 22: No deploy
No deploy due to week shortened by WMF holiday.

Jan 11 - 15: ✅

 * T270180: Handle selser edge case for first content-node of (follow up to T262448 patch included in )
 * T267974: Fix for Parsoid Cite refname whitespace handling
 * T237538: Disentangle Disambiguator extension from Parsoid
 * T260082, T271357: More papering over in References.php (follow up to T259676 patch included in )
 * T265094: Handle newlines in wikilinks for selser as well (follow up to T265094 patch included in )
 * Other: Disable rt-testing mode, clean up most old code from Parsoid/JS, tweak rt test configuration

Jan 5 - 7: ✅

 * T251641: Emit span tags instead of figure-inline
 * Bump output content version to 2.2.0
 * T51538: Add parameters to various cite errors
 * T270307: Allow Parsoid extension modules to be unregistered
 * Tokenizer: Don't eat leading spaces from template values
 * T269719: PHP 8.0 compatibility, Remove PHPUtils::coalesce