Parsoid/Deployments/2020

Dec 15 - 17: ✅

 * T269901: More fixing of bogus recovery of trimmed whitespace
 * T263052: Parsoid Cite add class mw-ref-follow for refs with follow
 * T269748: Cast references attributes to strings
 * T269068: Bump composer/semver versions
 * Inject almost all dependencies into the SiteConfig
 * extension: Inject HookContainer into DataAccess
 * Purge Sanitizer proxying from ParsoidExtensionAPI
 * Split Sanitizer into Core\Sanitizer and Sanitizer Token Handler
 * Use SiteConfig in Sanitizer methods instead of Env
 * Remove dead code

Dec 8 - 10: ✅

 * T265737: Section Wrapping: Relax section start constraints
 * Followup to e5862895: Flip incorrect langVariants checks
 * DRY out SiteConfig::mwaToRegex by moving implementation to core
 * T184779, T265737, T268992: Redo template-section boundary conflict code
 * T242298: Fix buggy DSR computation in section-template conflict resolution code
 * T269068: Update composer/semver constraints
 * T269022: Refine adding module(style)?s in extapi
 * Dedupe modules

Dec 1 - 7: ✅

html2wt fixes and refactoring
 * T262448: html2wt: Don't test for separator validity between unmodified nodes
 * T268737: Selser: Fix incorrect recovery of trimmed whitespace
 * Fix which parent to check for potential change
 * Minor refactoring: Move separators object to SerializerState
 * Minor refactoring: Don't pass around $state in Separators.php

Use a single live document in html2wt pipeline
 * T265061: Accept DOMDocumentFragment in $serializer->serializeDOM
 * T265061: Split up ContentUtils::ppToDOM to avoid document creation
 * T265061: Stop keeping liveDocs on the environment
 * Localize knowledge of a document data bag to DOMDataUtils

Table Fixups
 * TableFixups: Properly handle newlines in collected attributes
 * T112300: Get rid of incorrect assumptions from TableFixups

Parsoid Cite extension changes
 * T51538: Cite error tag name defined in references not used before
 * T51538: Adding cite error ref in reference with mismatched group
 * T51538: Adding cite error ref in reference no content defined
 * T51538: Add reporting of cite error of a ref in reference without name specified
 * T249742: Fix porting bug from 71662ea
 * Reconcile some ref errors cases with $hasFollow
 * Use inReferencesContent flag to get rid of processRefsInReferences
 * Add method to check if in references content

Other assorted fixes
 * Replace references to FakeConverter
 * T266143, T251641: Shore up serializing spans for inline media
 * T94793: Native ImageMap implementation
 * T266926: body is unset in ExtensionHandler when serializing to self-closed
 * T268207: composer: Relax monolog version, per MW 1.36
 * composer: Drop object-factory 2.x compatibility, MW is on 3.x now

Nov 17 - 24: ✅

 * getDOMHandler always returns a handler
 * Use $extApi->pushError for invalid references parameters
 * Give extension tag handlers a way to indicate an error while parsing
 * T266666, T214994, T267059: Add a localization html2html pass
 * T265094: html2wt: Newlines in piped wikilinks are okay in single-line context
 * T109793: Reset slc before serializeChildrenToString
 * Only look for data-mw.body.id in the top level dom
 * T262408: Selser: Skip over templates while adding selser-wrapper <span>s
 * T51538: Adding check for illegal attributes in references tag
 * Add selser preprocessing time to domdiff component
 * T247212: Allow wikimedia/ip-utils ^3.0.0
 * T267158: Keep all meta markers with transclusion info
 * Bail if sealed fragments are found in link target position
 * T266356: Mark up cite errors in embedded content
 * Switch some uses of matchTypeOf to hasTypeOf in Cite
 * T51538: Fix adding 'cite_error_group_refs_without_references' to unnamed refs
 * Clean up signatures of ref group accessors
 * Suppress linkbacks for all refs in embedded content
 * T267156: Fix old wt2wt bug for lists nested in li-hack lists
 * More consistent access to env in serializers
 * T265930: Add "reference" class to <sup> in addition to the "mw-ref" class
 * T267074: Accept 3.x version of ObjectFactory as well as 2.1+

Nov 4 - 9: ✅

 * T267146: Don't access MWServices in onRegistration hook
 * T262408, T262448: Prevent selser corruption from unclosed tags in list items in some cases
 * Tighten type for WikitextSerializer::getAttributeValue
 * Fix up the return type from serializeDOM
 * T262408: SelSer: Preprocess DOMs to wrap text children of <li> in <span>s
 * Setup for parsing oldHTML if none provided
 * Consolidate adding ref errors at references insertion
 * T51538: Refs in group with no <references group="xxx" />
 * T265061: Use fragments when dom diff'ing
 * T265061: Set a top level doc when serializing
 * Fix porting bug in TokenUtils::shiftTokenTSR
 * DRY out 'atTopLevel' pipeline stage attribute
 * T265954: Add setting to enable/disable REST API.
 * T230219: Port time profiling code from Parsoid/JS and adapt it for Parsoid/PHP
 * T266294, T266356: Traverse with inEmbeddedContent for ref in ref
 * T266171: Fix processing embedded keys in processHiddenHTMLInDataAttributes
 * T266171: Templated template targets are already expanded to dom
 * Make $inEmbeddedContent an explicit stack
 * Set inEmbeddedContent when running processRefsInReferences
 * Get rid of $selser->preprocessDOM
 * Move section tag stripping to preprocessDOM
 * Section Wrapping: Extract a proper Section class
 * T264782: Pass dom to entrypoint instead of string
 * Bump wikipeg to a fork of 2.0.4

Oct 26: ✅

 * T266285: Revert "Don't generate 6-element DSRs yet"

Oct 20 - 22: ✅

 * Don't generate 6-element DSRs yet (deploy hack for wmf.14; will be reverted in wmf.15 and backported to wmf.14)
 * T263502: Move whitespace heuristics for trimmable whitespace nodes to buildSep
 * T51538: Added cite error checks for numeric digit in name; added cite error checks for invalid text direction
 * T240642: Fix broken nested <noinclude> handling
 * T260959: Rest: Use upstreamed ResponseException
 * SerializerState: Bug fix + code refactoring in current line handling
 * T262408: Record length of trimmed whitespace in additional DSR fields
 * Always wrap follows content (avoids crashers when there's a valid follow with some other error)
 * Backwards incompat for invalid follows will result in the contents of the ref being dropped; selser will save us.
 * Fix invalid indexing of null which triggers notices in PHP 7.4
 * HTML->WT metrics: Tweak html2wt timing metrics
 * T254501: Introduce preprocessing in the HTML -> WT direction
 * T230861: Release fragments once they're no longer needed
 * T221790, T179082, T217705: One document to rule them all (do entire parse in a single document context)

Oct 13: ✅

 * Move SelSer fallback code to the parent wikitext handler
 * Move applyPageBundle to PageBundle class
 * Remove unused Parsoid::html2html
 * Convert offsets in pb2pb after data attribs are loaded
 * Make ParsoidHandler::updateRedLinks consistent with ::languageConversion
 * T260960, T263502: Avoid double dip whitespace reuse when getting trailing space
 * Minor: Remove node and npm references in a couple of comments
 * T255007: Handle mw:DisplaySpace when serializing mw:Nowiki content
 * T254502, T260960: Avoid modified-wrapper on mw:DisplaySpace

Sep 22 - Sep 24: ✅

 * T249743: Fix PHP log notice when origHref is '' or null
 * T214241: Preserve typeof on extension first encap wrapper nodes
 * T254502: Remove mw:Placeholder from mw:DisplaySpace
 * TableFixups: s/$node/$cell for code clarity
 * T261711: Fix internal links without short URLs set up
 * Highlight when we have a valid follow
 * T262986: Removed parsoid cite extension follow ref id's
 * T262452: Handle exceptional conditions in pb2pb endpoint pageconfig creation
 * T262637, T263025: Fix crashers in TableFixups handler + document unhandled scenarios
 * TableFixups: Simplify combinable cell checks
 * T262409: Refine source reusability check for nested list items
 * Restructure [[File:...|link=...]] handling to avoid complex if-condition

Sep 14 - Sep 16: ✅

 * Fix integrated phan testing when optional langconv component is not present
 * T51538: Adding "follow" functionality to the Cite extension
 * T262636: Coalescing null rel to empty string
 * Remove data-mw->attribs hack in template wrapper
 * Drop html5shiv
 * "\n", not '\n': Fix regression introduced by 18a4fc70
 * T262454: html2wt: Catch RevisionAccessException & return HTTP 404 instead
 * T108504: Don't generate |link=... in image markup when href is encoded

Sep 8 - Sep 10: ✅

 * Handle media-in-link scenario
 * Bail on wikilink-in-wikilink scenarios
 * AddMediaInfo need not be idempotent since it runs before unpacking
 * Implement DOMCompat::replaceChildren
 * Move AddMediaInfo earlier in pipeline
 * ExtensionHandler: Process extension metadata on DOM instead of HTML
 * T179082, T217705, T221790: Remove special case for the html extension when unpacking
 * Convert error Response returns to throws of HttpException
 * TableFixups: &  cells can combine, not just   &
 * Get rid of modulescripts from MediaWiki library/API invocations
 * T178927: Fix unhandled reparse table attribute scenario
 * T261044: Avoid array_key_first for compatibility with PHP 7.2
 * T260583: Title with leading colon is invalid
 * Use DOMCompat::normalize when normalizing
 * Make `revid` an int as well as `oldid` and distinguish between null and 0

Aug 25 - Aug 27: ✅

 * T259855: PageConfigFactory: don't use latest revision if getRevisionById fails
 * T94603: Use entity list from RemexHtml
 * Make DataAccess::fetchTemplateSource use ParserOptions::getTemplateCallback
 * Clean up IE-compatibility code from Sanitizer::normalizeCss which was removed upstream
 * Rename DataAccess::fetchPageContent to ::fetchTemplateSource
 * Remove $revid parameter from DataAccess::fetchPageContent
 * T236812: Clarify uses of legacy Parser objects

Other changes to test infrastructure:
 * Round-trip testing fixes
 * Add parser test for T236866
 * The subtree of mw:Placeholder should be uneditable during selser parser tests
 * ParserTests: Move section wrapping tests to its own file
 * ParserTests: Create knownFailures json file if one doesn't exist

Aug 18 - Aug 20: ✅

 * T236866: WrapSections::getDSR: Don't assume all non-elements are text nodes!!
 * Ensure Parsoid doesn't throw when is used w/o Cite installed
 * T259676: Fix for missing content check where ..body->extsrc is undefined
 * Add an API method to determine if the optional langconv library is available
 * Clean up some of the block tag set notions in ParagraphWrapper
 * Fully implement 'page' parameter for paged media; mock it as well
 * Extension config option: Rename sealFragment to unpackOutput
 * T234932: Match ordering of core's srcset attribute; add mock API support

Aug 11 - Aug 12: ✅

 * T259677: Fix O(N^2) behavior parsing nested broken templates
 * Use expanded attr when fragments have been expanded to DOM
 * Allow  to override nativeGallery and thumbsize

Changes that don't (shouldn't) impact production:
 * ParserTests improvements and refactoring
 * T258767: Improved TimedMediaHandler test harness (should not have any effect on production output)

Aug 4 - Aug 6: ✅

 * Add lang attribute to ResourceLoader url to get correct ltr/rtl CSS
 * T259311: Fix incorrect computation of 'inTpl' predicate in HandleLinkNeighbours
 * T259063: Error handling when json_decoding in ParsoidHandler::getParsedBody
 * T257504: Protect against invalid start/end offsets in ParsoidExtensionAPI::renderMedia
 * Improvements to MWSiteConfig, MWDataAccess, MWPageConfig (divergence from bundled VE extension)

Changes that don't (shouldn't) impact production:
 * Factor out (parser test) TestFileReader that can be used by core; move known failures into Test object
 * T251422: SiteConfig unit tests

Note: due to T259832, Parsoid v0.13.0-a4 didn't actually get rolled out with the train; because of an oversight, the train deploys this week included Parsoid v0.13.0-a3 instead. This was discovered on Aug 6 after 1.36.0-wmf.3 had rolled out to all wikis, and Parsoid v0.13.0-a4 was then backport-deployed to all groups.

July 31 - Aug 3: ❌ Not taken, see T257970
The train for 1.36.0-wmf.2 was rolled back to group 1 due to T259311. New  branch for Parsoid rooted at. The following patch was cherry-picked, and tagged.
 * T259311: Fix incorrect computation of 'inTpl' predicate in HandleLinkNeighbours

July 28 - July 30: ✅

 * T255190: Rewrite sep indent pre suppressing in terms of block scope
 * Cleanup workaround added for TimedMediaTransformOutput::getAPIData
 * (Extension API) Wrap extension token so that it won't be exposed
 * T88495: Add the start of a design doc
 * T110772, T255584: Fix incorrect handling of tpl-span-wrapped text in HandleLinkNeighbors
 * Test for valid data-mw as in DOMDataUtils::validDataMw
 * Refactor ParserTests/Test to allow it to be used by mediawiki-core

July 21 - July 23: ✅

 * ParsoidExtensionAPI: Fix bugs in migrateChildrenBetweenDocs
 * T255190: Get rid of WikitextConstants::$BlockScope(Open|Close)Tags
 * T255190: Get rid of WikitextConstants::$SolSpaceSensitiveTags
 * T234549, T233736: Add a test for serializing when a revision isn't found
 * T234549: Unify revision content checks when starting from wikitext
 * T255190: Get rid of WikitextConstants::$HTML['HTML4BlockTags']
 * T255190: Be precise about why list item ends are implied
 * Remove some porting notes
 * Remove unported JS->PHP file
 * Combine doBlockLevel tags to form a notional wikitextBlockElems
 * T255190: Move doBlockLevel constants to WikitextConstants

July 14 - July 16: ✅

 * T252448: Make wikimedia/langconv optional
 * The Parsoid ConfigRegistry appears to be unused
 * Add missing MediaWiki\Rest\Handler::getParamSettings implementations
 * T255190: Use $onlyInlineElements from Remex when DOM p-wrapping
 * T248343: DRY out default ParsoidSettings into a single location
 * Remove parseExtensionHTML
 * Don't reallocate ExtensionTagHandler while processing ext-tags
 * Fix typo-bug in ParsoidExtensionAPI
 * Collapse heading handlers (gen ids, dedupe ids) into a single pass
 * Reorder DOM post processing passes for a more coherent ordering

July 7 - July 9: ✅

 * T51538: Match core error key for self-closed ref without name
 * Rename "cite_error_ref_no_text" to "cite_error_references_no_text"
 * T51538: Add Cite error for named refs that attempt to redefine the content
 * Remove $nestedRefsHTML
 * Move setting data-mw on autogenerated references to createReferences

June 30 - July 1: ✅

 * Pull out a renderMedia method from the gallery extension
 * Extension API: Use generic 'context' option instead of 'inlineContext'
 * T254051: Match upstream change for margin-inline-start
 * T51538: Adding error handling for cite refs with name but no content
 * Remove HandlePres dom pass
 * Clarify when content is missing in cite
 * T255746: Preserving leading whitespace in indent-pre suppressing contexts
 * Replace assertElt use with phan-var declarations
 * T251920: DOMNormalizer: Don't convert the odd visual newline to

June 23 - June 25: ✅

 * Fix deprecated uses of ParserOptions::newCanonical
 * T255500: Compute DSR info for mw:DisplaySpace
 * T223194, T229740: Replace some instances of DOMUtils::assertElt

June 16 - June 17: ✅

 * Fix deprecated use of getCurrentRevisionCallback
 * HTML5TreeBuilder: Revert now unneeded Remex-bug workarounds + emit transclusion shadow meta at the end of a run of text nodes
 * Fix extensions to use Ext\DOMUtils instead of Utils\DOMUtils
 * Fix wrong var used in PWrap::hasBlockTag
 * T51538: Whitespace only content in tags is no content
 * T210647: Remove now unneeded workarounds for handling empty p tags
 * T245206: html2wt: Newly inserted elements shouldn't disrupt whitespace heuristics
 * T254804, T254646: Renamed terms in comments, code, and filenames.

June 9 - June 11: ✅

 * T222560, T222770, T222774, T247110: Simplifications and cleanup of several PEG tokenizer rules
 * T197879: Move armoring French spaces to a DOM post-processing pass
 * T233815: html2wt in selser mode: Don't crash on bad DSR
 * T133320: Hook up ExtensionRegistry with Parsoid and related fixes

June 2 - June 4: ✅

 * T222561: Don't start autolink matching at "/"
 * Remove nodeName check in meta handler
 * T253703: <*include*> tags don't need newlines before/after
 * T51538: Fix Cite extension no name and no content error handling
 * T210647: Add caption to always suppressing
 * Use DOMUtils::hasTypeOf/matchTypeOf/addTypeOf consistently
 * More phpcs related cleanups

May 26 - May 28: ✅

 * T252648: DOMNormalizer: Fix method signature causing production crashers
 * Remove unnecessary LanguageVariantHandler::set/::has/::delete handlers
 * Various phpcs related cleanups

May 12 - May 18: ✅

 * T249958: Don't add unneeded extra newlines before/after existing lists

May 5 - May 11: ✅

 * All extension DOM processors should extend Ext\DOMProcessor
 * Add extension registration mechanism to SiteConfig
 * T249740: Bug fix in complex mixed-attr-content multi-template scenarios
 * T250935: Bump version of Remex to v2.2.0 (bump zest and alea as well)
 * T231568: SiteConfig: DRY out common computation of various config properties across subclasses
 * T250888: Move DOMDataUtils::addAttributes to DOMUtils
 * Remove dead code in SiteConfig.php
 * Add accessor methods for trace / dump flags

April 28 - May 4: ✅

 * T242746: Refactoring and cleanup of extension API, registration, and extension code (multiple patches)
 * T250629: Fix crasher in ParsoidExtensionAPI
 * T192913: html2wt: Fix link regexp to handle parser functions
 * T225849: Don't apply display hack at sol
 * T250111: Don't include .phan/ directory in composer library
 * T247093: Use PHPUtils::unreachable instead of assert(false)
 * Use PHPUtils::jsonEncode consistently
 * AttributeExpander: Code cleanup
 * Fix tokenizer to properly encode attributes needed by QuoteTransformer
 * Enable /page/lint/... endpoint
 * Allow composer/semver ^2.0.0

April 14 - April 16: ✅

 * T242746: Refactoring and cleanup of extension API and extension code
 * Improve debuggability of non-canonical-DOM assertions
 * Use MediaWikiServices::getBadFileLookup
 * phan-related and tracing-related code cleanup

April 7 - April 9: ✅

 * T221989: Fix edge case misnested-tag lint detection
 * T242746: Refactoring and cleanup of extension API and extension code (many patches)
 * T246701: Fix (JS -> PHP) porting bug in interwiki computation

March 31 - April 2: ✅

 * T242746: Refactoring and cleanup of extension API and extension code (many patches)
 * T235307: Remove use of Env in the REST API code in extension/* (many patches)
 * T248121: Drop unnecessary style modules in parsoid output

March 23 - March 26: ✅

 * T247910: Allow users to set tabindex=0 on elements
 * Use MediaWikiServices::getRepoGroup
 * T247212: Allow wikimedia/ip-utils 2.0.0
 * T242746: Refactoring extensions interface
 * T247353: Move DataParsoid.php

March 18 - March 19: ✅

 * T238385: Make id attributes not include ascii whitespace per spec
 * T238385: Escape % sign if from valid percent-encoding in fragment identifiers
 * T240055: Ensure Parsoid on scandium executes from git checkout
 * T245627: TemplateData: Handle multibyte unicode characters correctly
 * T237462: Port JSON content-model extension
 * Remove use of Env parameter in Poem extension
 * Use expanded href to test for xmlish tags in wikilink title position
 * T242746: Remove direct access to Sanitizer from extension code
 * Fix minor bugs in AddMediaInfo.php, Gallery, and Sanitizer.php
 * Make extension tags optional in ext-config
 * Bump dependency versions for wikipeg, psr/log, wikimedia/assert, mediawiki/mediawiki-codesniffer, composer/semver, mediawiki-phan-config, ockcyp/covers-validator, mediawiki/minus-x, wikimedia/langconv
 * T242746: Remove more Parsoid internals knowledge from Cite
 * T241164: Sync with Cite
 * T239642: Ensure tests pass in

Monday, Feb. 10 around 1:33 pm PT: ✅

 * T242746: No need to explicitly pass 'inTemplate' flag from extension code
 * T235273: Remove PHPUtils::jsSort call from TemplateHandler and correct tests
 * T235307: Remove Env use from content version resolution functionality
 * Restore return 406 for an incorrect offset type
 * T238845: Fix for request with revID that has no content
 * T204618: Whitelist `aria-hidden` attribute in Sanitizer
 * T240054: Move all code from Parsoid to Wikimedia\Parsoid namespace
 * Update langconv package to 0.3.3
 * T244412, T244413: Corrected PAGE_UNAVAILABLE check for invalid RevID
 * Fix notice when tracing selser
 * T242746: Start untangling Parsoid internals from extensions
 * T242746: Use extension config option for html2wt formatting of extension tags

Wednesday, Feb. 5 around 1:15 pm PT: 74730a3 ❌ Reverted due to T244413

 * T242746: No need to explicitly pass 'inTemplate' flag from extension code
 * T235273: Remove PHPUtils::jsSort call from TemplateHandler and correct tests
 * T235307: Remove Env use from content version resolution functionality
 * Restore return 406 for an incorrect offset type
 * T238845: Fix for request with revID that has no content
 * T204618: Whitelist `aria-hidden` attribute in Sanitizer
 * T240054: Move all code from Parsoid to Wikimedia\Parsoid namespace
 * Update langconv package to 0.3.3

Wednesday, Jan. 22 around 1:18 pm PT: ✅

 * T242513: Serialize reference tags by themselves on a line
 * Rename DOM handling methods toDOM/fromDOM to reflect reality
 * T243008: Fix PHP Notice when using pb2pb endpoint
 * Refactor Parsoid::html2html into Parsoid::pb2pb
 * T241146: Use DOMDataUtils::getNodeData in MachineLanguageGuesser
 * Remove backward-compatibility code for old-style DOMTraverser handler

Thursday, Jan. 16 around 10:15 am PT: 02f0066 ❌ Reverted due to T243008

 * Refactor Parsoid::html2html into Parsoid::pb2pb
 * T241146: Use DOMDataUtils::getNodeData in MachineLanguageGuesser
 * Remove backward-compatibility code for old-style DOMTraverser handler

Monday, Jan. 13 around 1:32 pm PT: ✅

 * Clean up DOMTraverser handlers to use new calling convention

Wednesday, Jan. 8 around 1:46 pm PT: ✅

 * Fix 'source variant' functionality; add missing static types to src/Language
 * Cleanup extension arg normalization code
 * html2wt: Account for missing $dsr
 * T238934: html2wt selser: Return http 409 if the previous revision is not found
 * Ensure that SyncTransformManager has correct frame
 * T237318: Ensure Sanitizer::sanitizeToken uses correct frame source text
 * T238022: Fix autolink url parsing code
 * Minor efficiency tweak to Util::lastUniChar
 * Ensure LiFixups::handleLIHack uses correct frame source text
 * T228217: Add --maxdepth option to bin/parse.php
 * ParsoidHandler: Fix throwing ValidationException for invalid domain
 * Make Monolog version match MediaWiki core
 * Fix offset type handling for Lint API requests