Parsoid/Deployments/2019

Monday, Dec. 16 around 2:47 pm PT: ✅

 * Account for undefined dsr

Monday, Dec. 16 around 2:02 pm PT: ✅

 * Allow uppercase in language names; ignore bogus trailing content in langconv rules
 * Clean up TPL_META_TYPE_REGEXP
 * T240091: Assert that escaped text matching succeeds
 * T236912: Separate start/end null test
 * T236415: Fix for "0" in category link output not being present
 * T239929: Fix fatal error loading unknown language
 * T214649: Disable nativeGallery in production
 * T239830: Add metrics for language variant FST startup time

Thursday, Dec. 12 around 1:31 pm PT: ✅

 * T239252: Fix for invariant failed: Bad SourceRange length
 * Inline the single remaining use of TokenUtils::placeholder
 * Bump wikimedia-langconv dependency up to 0.3.2 for PHP

Thursday, Dec. 12 around 10:25 am PT: ✅

 * T237693: Ported mocha templatedata.js to phpunit TemplateDataTest.php
 * T238777: Fix flattening empty arrays
 * T238777, T237306: Handle stray "\r" in inlineline position
 * T239875, T240053: Account for getOrigSrc returning null
 * Ensure "params" are set on lints

Wednesday, Dec. 4 around 10:32 am PT: ✅

 * T239643: Bug fixes for PHP port of ConversionTraverser
 * Fix 'non-numeric value' warning when parsing bogus image width/height
 * Bug fix: Parsoid/PHP was setting the content version incorrectly in head
 * T239830: Add missing input size metric
 * T238456: Use the ParserLogLinterData hook
 * T239841: Add a FIXME comment to remove some slow `mb_strlen`s in the future
 * Bump wikimedia/langconv to 0.3.1

Monday, Dec. 2 around 9:20 am PT: ✅

 * T236869: Check if dsr is set
 * Fix DSR inconsistency with DisplaySpace; use DOMUtils::(has|match)TypeOf
 * T238457: Stop unnecessarily wrapping templated targets when inTemplate

Tuesday, Nov. 26 around 8:25 am PT: ✅
Parsoid/PHP:

 * Update wikimedia/langconv and wikimedia/assert dependencies
 * T234266: Stop using DOMDataUtils in LC pass
 * Update TemplateData lookup to match implementation of the hook

Parsoid/JS:

 * T234266: Stop using DOMDataUtils in LC pass
 * T213995: Follow redirects when fetching template data

Monday, Nov 25 around 9:14 am PT: ✅

 * T238849: Unbreak the VE-without-RESTBase scenario: Disable header checks

Wednesday, Nov. 20 around 10:30 am PT: ✅
Parsoid/PHP

 * T238463: Fix bug arising from JS vs PHP differences in falsy values
 * T234266: Revert "Log an error if HTML has already been variant converted"
 * Remove wikimedia/assert dependency
 * T237461: Port additional LintLogger features
 * T238665: Use sort, not asort to sort the attribute key array
 * T238721: html2wt: Handle missing property in data-mw template parts

Parsoid/JS

 * T234266: Revert "Log an error if HTML has already been variant converted"

Tuesday, Nov. 19 around 10:30 am PT: Rolled back because of problems found on canaries
Parsoid/PHP

 * T238463: Fix bug arising from JS vs PHP differences in falsy values
 * T234266: Revert "Log an error if HTML has already been variant converted"
 * Remove wikimedia/assert dependency
 * T237461: Port additional LintLogger features

Parsoid/JS

 * T234266: Revert "Log an error if HTML has already been variant converted"

Monday, Nov. 18 around 1:47 pm PT: ✅
Parsoid/PHP

 * T237886: Account for array key casting
 * T237103: Assert that template argument whitespace matching succeeds
 * T236864: Filter out invalid titles before requesting page properties
 * T237569: Linting: Convert DSR offsets to 'ucs2' before saving them
 * Lints are always stored originally in 'native' byte offset format
 * Distinguish "current offset type" from "request offset type"
 * T236930: Support for other content encodings in the REST API
 * T237463: Resolve some PORT-FIXMEs
 * T236867: Use frame source instead of stringifying tokens
 * T234266: Log an error if HTML has already been variant converted

Parsoid/JS

 * T237103: Assert that template argument whitespace matching succeeds
 * T234266: Log an error if HTML has already been variant converted

Thursday, Nov. 14 around 1:30 pm PT: ✅
Parsoid/JS
 * T235217: Followup #2: mostly-use protocol-relative URLs for media
 * T236868: Fix for crasher in substr
 * Account for class on poem tag
 * Backport changes from fe8630b

Parsoid/PHP
 * T234549, T238161: Add handling for new / missing pages in html2wt direction
 * Use the wiki configuration for module path
 * Accumulate, not overwrite, page output properties
 * Fix some Parsoid/JS & Parsoid/PHP differences in the section
 * T229077: Use underscores instead of spaces in dc:isVersionOf link tag
 * T235217: Followup #2: mostly-use protocol-relative URLs for media
 * Add a /u modifier to regexp that looks for valid separators
 * T236868: Fix for crasher in substr
 * Account for class on poem tag

Tuesday, Nov. 12 around 11:03 am PT: ✅
Parsoid/JS

 * T235217: Mostly use protocol-relative URLs for media
 * Remove dpContentType
 * T235656: One more fix

Parsoid/PHP

 * T215000, T235295: Workaround for missing xmlns attributes on DOMElement
 * Fix a few more phan warnings
 * T235656: One more fix
 * T235217: Mostly use protocol-relative URLs for media
 * T235295: Put the 'fake' xmlns attribute first, in an attempt to better match JS
 * Remove dpContentType
 * T236846: Fix for used without parameter causing crash
 * T237556: Return a 421 Misdirected Request if offsetType is incorrect
 * T235231: Fix for video media seek parameter

Wednesday, Nov. 6 around 1:35 pm PT: ✅
Parsoid/JS

 * Update wmf sitematrix
 * Clean up `typeOf` matching in Cite extension

Parsoid/PHP

 * Clean up `typeOf` matching in Cite extension
 * T237104: Guard missing property with the ?? operator
 * T227209: Leverage phan's bundled plugins
 * Bump mediawiki-phan-config to 0.8.0
 * T236865: Expose more information in domain validation error

Wednesday, Oct. 30 around 1:28 pm PT: ✅
Parsoid/JS

 * Fix poem JS and PHP extension to include class='poem' attribute
 * T235656: Process s found in nodes with mw:ExpandedAttrs typeof

Parsoid/PHP

 * Fix poem JS and PHP extension to include class='poem' attribute
 * T235656: Process s found in nodes with mw:ExpandedAttrs typeof
 * T233818, T234549: Return a clean 404 error if a non-existent title is requested via REST
 * Followup to : Use new Parser method to parse extension tags
 * T227209: Throw if Remex did any invalid name coercion
 * Use core's HttpAcceptParser
 * T236112: Work around missing support for non-wikitext content models

Tuesday, Oct. 28 around 9:50 pm PT: ✅

 * T235691: Ensure that DOMFragments have shifted DSRs as well
 * T235656: Ensure frameless images have their s included in )
 * addNormalizedAttribute without an original val
 * Packed gallery should round up sizes, not round down
 * Propagate DSR information into gallery captions
 * Port Gallery extension
 * Fix SerializerState:getOrigSrc to ensure valid substr offsets
 * T210752: lib/config/wmf.sitematrix.json: Update for napwikisource
 * Test against ref name length instead of coercing to bool
 * Backport link escaping change from
 * Fix copying over autoInsertedEnd when it wasn't set
 * Cleanup DOM pass: Unset null tsr
 * Fix data-mwtitle="undefined" found in HTML output for videos
 * Resolve port fixme for display space hack
 * Consistent casing for firstWikitextNode
 * Fix ported condition in makeSeparator
 * Fix porting condition when serializing text node
 * DOMDataUtils: Minor readability tweak to getNodeData
 * TableFixups: Protect clause with array length check
 * Images: Add type check for upright dimensions
 * Some separator cleanup
 * Fix test of array_search result
 * T231945: Clear invalid DSRs in
 * Update sitematrix

Monday, Aug. 5, 2019 around 1:28 pm PT: ✅

 * Only html p-tag is strong indent pre suppressing
 * Rename getMagicPatternMatcher to getParameterizedAliasMatcher
 * Remove some dead code + simplify getMagicPatternMatcher in WikiConfig.js
 * Use SiteConfig::magicWordCanonicalName in WikiLinkHandler::getOptionInfo
 * WikiLinkHandler.php: Fix incorrect port of ambiguous JS check
 * Work around Remex's failure to normalize DOM during DOM build
 * Use a nonambiguous title for parser function frames
 * Fix dsr test to check start against null, not 'truthy'
 * Parser functions need a frame title too
 * Move cached wiki configs to top level directory
 * Make Frame::title a proper Title object, not a string
 * T228223: Assert KVs aren't found when calling TokenUtils::tokensToString
 * Fix incorrect types (JS & PHP) in DOMFragmentBuilder and related utils
 * Ensure Parsoid native template expansion works on recent MediaWiki
 * Add addHTMLTemplateParameters options to bin/parse.php
 * Enable stage2 hybrid testing in jenkins (and fix some bugs)

Wednesday, July 24, 2019 2:00 pm PT: ✅

 * T227216: Set top frame's source text when parsing from a stash (this was cherry-picked into the last deploy)
 * Fix regression in ru:Fable Legends (RT testing fix)
 * Pass  instead of   in DOM processors and handlers (RT testing fix)
 * Convert  to a
    * Even though this changes, the   property in   isn't actually used by any current html2wt code, so this is safe to deploy without a version bump
 * Follow up to Gallery: shift TSRs in the DOM
 * Gallery: shift TSRs in the DOM, rather than fibbing about
 * Set  during native template expansion
 * fixes (mirrored some formatting changes during PHP port over onto the JS side)
 * T226523: Test for pipe before trying to resolve target
 * T226451: Fix OOM when parsing template (minor JS tweak to  in the tokenizer)

Wednesday, July 3, 2019 5:00 pm PT: ✅ (deploy-20190703 branch)

 * T227216: Fix template corruption when reloading stashed wikitext

Wednesday, June 26, 2019 1:35 - 1:56 pm PT: ✅

 * Ensure that proper source texts are used when parsing (adds assertions)
 * Fix case ( vs  ) in attributes.
    * This ought to be a no-op in JavaScript, which has a case-insensitive HTML DOM
 * Other changes to the PHP port which should not affect the JS service

Thursday, June 20, 2019 around 10:34 am PT: ✅

 * Cite lint handler: Use nextSibling instead of nextNode
 * WTS: Remove some dead code
 * Minor cleanup to Utils/DOMPostOrder
 * Improve DSR on
 * Followup to : Improve DSR computation precision
 * Move the Frame class to its own JS file
 * Update baseconfigs and add for formatversion=2
 * html2wt: A number of fixes to wikitext escaping and regexps
 * Tweaks to TemplateHandler
 * Make Nowiki html2wt be like all other extensions + tweak behavior
 * DOMFragments: Use sealFragment instead of unwrapFragment
 * Remove cite-specific leak from cleanup
 * Traverse with env
 * Fix for native content
 * Remove `manager` from JS Frame object
 * WikiLinkHandler: Convert width/height to strings
 * Fix tokensToString implementation and use sites
 * Assert no nulls in tokensToString
 * T211251: Fix crasher when encapsulating empty doc
 * WikiLinkHandler.buildLinkAttrs always returns a KV[] for contentKVs
 * Add Token::getAttributeKV helper

Monday, June 17, 2019 around 1:45 pm PT: ✅ (deploy-2019-06-10 branch)

 * T225217: Revert "Stop generating an old dom when none is provided"

Wednesday, May 29, 2019 around 1:28 pm PT: ✅

 * T219927, T211125: Switch logging to rsyslog
 * Don't mutate cached values in tokenizer
 * Remove dead code
    * From TokenTransformManager.js
    * From TokenUtils.js
 * Rename PipelineUtils::buildDOMFragmentTokens to eliminate confusion
 * Port LinkHandler
 * Fixes a regex in the JS version to match MediaWiki's idea of whitespace
 * Stop emitting section offsets in the pagebundle
 * Stop generating an old dom when none is provided
 * Explicitly call yargs.options instead of passing to .usage
 * Bump to service-runner@2.6.19

Wednesday, May 15, 2019 around 1:12 pm PT: ✅

 * Template Wrapping: Change warning to assertion to fail fast
 * Some minor html2wt cleanup: Remove prototype injection
 * Move to worker-farm@1.7.0 instead of fork
 * Simplify SelectiveSerializer constructor
 * Fix Util.decodeURI and .decodeURIComponent
 * Fix linkTrailRegex
 * Convert `extTagWidths` to `extTagOffsets`
 * Refactoring in PreHandler:
    * Fix resetState implementation
    * Handle EOFTk in PreHandler onEnd
    * No need to skipOnAny from onAny handler
    * Reset is always called with enableAnyHandler
    * Prefer resetState to resetting in EOFTk

Thursday, May 2, 2019 around 10:45 am PT: ✅

 * Simplify nowiki_text rule
 * Back port latest Sanitizer::normalizeCss to Parsoid (JS and PHP)
 * Only monkey-patch console.assert on Node >= v10
 * Remove dead tsrDelta code from UnpackDOMFragments.*
 * Remove dead code from TokenUtils:shiftTokensTSR
 * Clarify names of offset fields + remove redundant fields

Monday, Apr. 29, 2019 around 1:19 pm PT: ✅
wt -> html:

 * Prepare DOM before emitting it from the tree builder
 * Back port latest Sanitizer::validateCodepoint from core
 * Synchronize Sanitizer::setupAttributeWhitelist with core
 * Preserve leading/trailing whitespace on invalid templates
 * Fix a missing case in TokenUtils.tokenTrim
 * T106578, T113194: Ensure PHP and JS are consistent wrt allowed entities
 * Apply urltext optimizations to JavaScript tokenizer

html -> wt:

 * Treat data-parsoid-diff like data-parsoid & data-mw wrt load/store
 * html2wt: Get rid of invalid / unnecessary isString checks
 * WTS: Get rid of unnecessary double-newline normalization
 * Fix onSOL buglet in html2wt introduced by refactor
 * Get rid of the only `nlConstraints.a` use
 * T205338: Create a DOMHandler class to ease porting
 * Let div fallback to html element handler

Other changes:

 * Use `Wt2HtmlResource` and `Html2WtResource` instead of Parser/Serializer
 * Linter: Consistently use lowercase tag names in lintObj params
 * Fix console.assert when running under node >= 0.10
 * T219072: Support splicing more PHP components into the parse pipeline
 * Fix serializing fosterable metas after
 * T219938: Port HTML5Treebuilder and its test suite to PHP
 * T221384, T219943: Update wikipeg to 2.0.2
 * Update to WikiPEG 2.0.3
 * Update shrinkwrap for wikipeg@2.0.3

Monday, Apr. 15, 2019 around 1:30 pm PT: ✅

 * Convert cite extension to es6 class structure
 * Remove DOM level 4 check from DOMPostProcessor
 * Make extensions with post-processors return constructors
 * Various DOMTraverser fixes
 * Port HandleLinkNeighbours handler
 * Convert HTML5TreeBuilder to es6 class structure
 * Fix DOMDataUtils.loadDataAttribs to accept options instead of bool
 * Call domino's HTMLParser.insertToken directly

Wednesday, Apr. 3, 2019 around 1:20 pm PT: ✅

 * T212597: Update lib/config/wmf.sitematrix.json
 * Convert handlers to es6 class structure
 * Simplify addExtLinkClasses DOM pass + port it to PHP
 * Fix bad return in onlyinclude handler
 * DOMTraverser cleanup
 * Organize DOMPostProcessor constructor
 * Changes to JS code while porting to PHP:
 * Transfer pwrap DOM pass from the php-prototype branch
 * T219337: Port tokenizer to PHP
 * Update wikipeg version to 2.0.1
 * Add --trace option to inspectTokenizer.js
 * Resolve superficial token stream differences between JS and PHP
 * Make processors pass phan

Tuesday, Mar. 26, 2019 around 10:28 am PT: ✅

 * Introduce new DOMUtils.{match,has}TypeOf/{match,has}NameAndTypeOf helpers
 * T219023: html2wt: Fix 'isSimpleLink' detection
 * Miscellaneous fixes to entity encoding
 * Use comment encoding for tunnelled fosterable content
 * Restrict reinsertable fostered content to internal metas
 * Let fosterable nodes remain unfostered across serialization boundaries
 * Tokenizer efficiency improvements
 * Audit uses of Node#getAttribute + add missing file to PHP codebase
 * Fix the regex used when looking for extension end tags

Thursday, Mar. 14, 2019 around 10:24 am PT: ✅

 * Update isXMLTag and isExtTag predicates to match their names
 * Consolidate block_tag_opened and xmlish_tag_opened
 * T213950: Fixes external links with special characters roundtrip
 * Refactor getWikiLinkTargetInfo to accept strings instead of KV
 * Address FIXME comments by cloning cache entries before modification

Wednesday, Mar. 13, 2019 around 1:35 pm PT: ✅

 * PEG rule parameters; Switch from pegjs to wikipeg
 * Make transform test runners quiet by default; fix "" handling in KVs
 * Protect data-object-id attribute

Thursday, Mar. 7, 2019 around 10:51 am PT: ✅

 * T202905: Fix new linter category to enable code work with templates
 * Tweak storeDataAttribs to suppress DOM nodes in data-parsoid.tmp
 * TokenHandler.processTokensSync: Don't pass strings to onTag handler

Monday, Mar. 4, 2019 around 2:05 pm PT: ✅

 * Avoid serialize/parse of data attributes when treebuilding
 * T214099: Move language conversion work into lib/parse.js
 * T214099: Move redlink updating into lib/parse.js
 * T202905: Linter.js: Add new function to detect the use of links in links
 * templatedepth is either an int or false
 * Remove redundant dataParsoid call

Tuesday, Feb. 26, 2019 around 10:17 am PT: ✅

 * T204608: Use a bag-on-the-side implementation for node data
 * T214099: Bump num_workers to 3
 * T217093: Use env.createDocument in lib/api/apiUtils.js
 * T214099: Use fork of worker-farm

Wednesday, Feb. 20, 2019 around 1:28 pm PT: ✅

 * Bump content version to 2.1.0
 * T153080, T169975: Add media info in a post-processing pass
 * Remove false assertion that file tokens wouldn't have data-mw
 * T215824: Fix crashers from file in link scenarios
 * Skip separators when looking for the next th/td
 * DOMDataUtils: Remove return statements from setData* utils
 * Assert that the .dataobject isn't touched after storing attrs on a node
 * Add some strategic isElt guards
 * Simplify and clean up stops usage

Monday, Feb. 11, 2019 around 1:25 pm PT: ✅ (deploy-2019-02-11 branch)

 * Minor JS fixes to make conversion to PHP better
 * T208901: Update pwrap.js wrt templatestyles p-wrapping expectations
 * T215537: Reduce the batch size for pageprop requests
 * T213468: PHP section numbers are assigned during tokenization
 * T215638: ListHandler tokens don't need to be special snowflakes either

Wednesday, Feb. 6, 2019 around 1:05 pm PT: ✅

 * Stop producing content version 1.x
 * Move bulk of transformTokens code from SyncTTM to TokenHandler
 * Improve TokenHandler flags for readability
 * Refactor ConstrainedText to make it easier to port
 * Backport some improved comments and function names from PHP port.

Thursday, Jan. 24, 2019 around 3:54 pm PT: ✅

 * T214649, T214648: Revert "Get rid of `nativeGallery` option and enable it by default"
 * Set `nativeGallery` to `false`

Thursday, Jan. 24, 2019 around 11:03 am PT: ✅

 * Convert several files to use an ES6 class structure
 * Remove unnecessary dependency from WikitextSerializer -> escape handlers
 * Handle encoded pipes in link's "alt" option
 * T187958: Match php parser gallery caption parsing
 * Get rid of `nativeGallery` option and enable it by default
 * Eliminate use of prevToken from QuoteTransformer
 * Always pass an actual boolean (not "undefined" or "null") as `sol` option
 * Get rid of unused prevToken arg from token handler signatures
 * T205337: Simplify SyncTTM and handlers
 * T214103: Instrument language variant conversions
 * Work around aggressive exception handling in the tokenizer

Tuesday, Jan. 8, 2019 around 11:00 am PT: ✅

 * T197616: Add test-commons.wikimedia.org
 * Tweak QuoteTransformer code + add edgy test specing prevToken arg
 * T205491: QuoteTransformer quote tokens don't need to be special snowflakes
 * T209772: Add helpers to ease binding context when load/storing data attribs
 * T199926: Remove unnecessary pattern from interwiki checks
 * Simplify DOMUtils.visitDOM helper
 * No need to close over CleanUp.stripMarkerMetas
 * Use escapeIdForExternalInterwiki when rendering interwiki links
 * Remove `figureHandlerImpl`
 * Convert NodeList to Array in `addRedLinksG`
 * Refactor tokenizeSync signature to avoid potentially ignoring args
 * Stop leaking manager (an impl. detail) to extensions

Code refactoring

 * T209194: Export one class per file for various things
 * T204622: Convert various things to use ES6 class syntax
 * Migrate handlers out of DOMPostProcessor into their own files
 * Rename Normalizer to DOMNormalizer and update file name to match