Parsoid/Deployments

Planned deployments, linked from Deployments. For a list of past deployments, look for 'parsoid' in the Server Admin Log.

See Parsoid to learn how to deploy a new version of Parsoid.


 * Deployments in 2013
 * Deployments in 2014

Wednesday, June 3, 2015 around 1:15pm PST: ✅

 * Be more careful about which MW API warnings we suppress
 * : Make behavior switches SOL transparent
 * API: If "wt" parameter is passed in, set it as the page source unconditionally
 * : DOM normalization: Move meta-tag hoisting from core serializer to DOM normalization pass
 * DOM normalization: Merge adjacent  tags with identical attrs

Monday, June 1, 2015 around 1pm PST: ✅
This is the same as the previous deploy attempt:
 * : Support subst: of transclusion blocks in the parseFragment API endpoint
 * DOMDiff: For  id properties in data-mw, fetch HTML and compare DOMs to detect edits to  s without requiring clients to dirty the   nodes
 * DOMDiff: Improve robustness of data-mw diff testing
 * Suppress separators in single-line context (part of )
 * , : Make hardcoded config values configurable
 * Blank template parameters should be preserved
 * Code cleanup in mediawiki wiki config
 * Use interwikiMap, not mwApiMap, to normalize titles
 * Store apiConf as an object in the mwApiMap
 * Make  into a general proxy configuration option
 * Code cleanup and fixes in mediawiki API request handling
 * Refactor request default options into ApiRequest.prototype.request
 * Strip UTF8 BOM so that JSON.parse doesn't throw
 * Ignore modulemessages in api=parse result

Plus two new cherry-picked patches:
 * : suppress modulemessages deprecation warnings in logs.
 * A new version of MediaWiki core, deployed earlier in the day, caused a spike in these warning messages. With this patch, we are suppressing all warning/api messages.
 * Fix typo in config property used for sampling heap usage
 * This should fix the outgoing network spike seen in the previous attempt.

Thursday, May 28, 2015 around 12:40pm PST: 497da30e to be deployed (Reverted)

 * : Support subst: of transclusion blocks in the parseFragment API endpoint
 * DOMDiff: For  id properties in data-mw, fetch HTML and compare DOMs to detect edits to  s without requiring clients to dirty the   nodes
 * DOMDiff: Improve robustness of data-mw diff testing
 * Suppress separators in single-line context (part of )
 * , : Make hardcoded config values configurable
 * Blank template parameters should be preserved
 * Code cleanup in mediawiki wiki config
 * Use interwikiMap, not mwApiMap, to normalize titles
 * Store apiConf as an object in the mwApiMap
 * Make  into a general proxy configuration option
 * Code cleanup and fixes in mediawiki API request handling
 * Refactor request default options into ApiRequest.prototype.request
 * Strip UTF8 BOM so that JSON.parse doesn't throw
 * Ignore modulemessages in api=parse result

This is the same as yesterday's attempted deploy, which we had to defer.

Reverted after observing an outgoing network traffic spike on our canary deploy machine (wtp1001). We initially suspected a stats or logging misconfiguration. The cause turned out to be a typo in one of the parsoid-config properties that determines the heap usage sample interval: instead of sending heap usage samples every 5 minutes, Parsoid was sending samples continuously, which produced the network spike seen on wtp1001.
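
The failure mode described above (a mistyped config property turning a 5-minute sampling interval into continuous sampling) can be illustrated with a minimal sketch. The property name `heapUsageSampleInterval` and both helper functions are hypothetical; this log does not give the real Parsoid names. One plausible mechanism, assumed here, is that Node.js `setInterval` treats an invalid delay such as `undefined` as 1 ms, so a misspelled key fails silently instead of raising an error.

```javascript
// Hypothetical config object; the real Parsoid property name is not given in this log.
const config = { heapUsageSampleInterval: 5 * 60 * 1000 }; // 5 minutes, in ms

// Reading a misspelled key silently yields undefined rather than an error.
const mistyped = config.heapUsageSampleInterva1;

// Resolve a timer delay, failing loudly when the key is missing or not a positive number.
function resolveSampleInterval(conf, key) {
  const value = conf[key];
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`Missing or invalid config property: ${key}`);
  }
  return value;
}

// Start periodic heap sampling; `report` is a caller-supplied callback.
function startHeapSampling(conf, report) {
  const delay = resolveSampleInterval(conf, 'heapUsageSampleInterval');
  const timer = setInterval(() => report(process.memoryUsage().heapUsed), delay);
  timer.unref(); // do not keep the process alive just for sampling
  return timer;
}
```

Validating the delay before it reaches `setInterval` turns this class of typo into a startup error rather than a traffic spike.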

Wednesday, May 27, 2015 around 1pm PST: 497da30e to be deployed (Cancelled)
Because of an open ticket, we currently cannot test the deploy by examining wikitext diffs of VE edits to confirm that we didn't break anything. Parsoid deploys are paused until that ticket is resolved and a patch is deployed to production.
 * : Support subst: of transclusion blocks in the parseFragment API endpoint
 * DOMDiff: For id properties in data-mw, fetch HTML and compare DOMs to detect edits to s without requiring clients to dirty the nodes
 * DOMDiff: Improve robustness of data-mw diff testing
 * Suppress separators in single-line context (part of )
 * , : Make hardcoded config values configurable
 * Blank template parameters should be preserved
 * Code cleanup in mediawiki wiki config
 * Use interwikiMap, not mwApiMap, to normalize titles
 * Store apiConf as an object in the mwApiMap
 * Make `proxy_strip_https` into a general proxy configuration option
 * Code cleanup and fixes in mediawiki API request handling
 * Refactor request default options into ApiRequest.prototype.request
 * Strip UTF8 BOM so that JSON.parse doesn't throw
 * Ignore modulemessages in api=parse result
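
The "Strip UTF8 BOM so that JSON.parse doesn't throw" item can be illustrated with a short sketch. The helper name `parseJsonStrippingBom` is hypothetical and this is not the actual Parsoid patch; it only shows why the BOM matters: `JSON.parse` rejects a leading U+FEFF because it is not valid JSON whitespace.

```javascript
// Parse JSON text that may start with a byte order mark (U+FEFF).
// Hypothetical helper for illustration only.
function parseJsonStrippingBom(text) {
  const cleaned = text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
  return JSON.parse(cleaned);
}

// Example input: a response body that begins with a BOM.
const bodyWithBom = '\uFEFF{"query":{"pages":[]}}';
const parsed = parseJsonStrippingBom(bodyWithBom);
```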

Wednesday, May 20, 2015 around 1pm PST: ✅

 * : Add mw:DisplaySpace to typeof for nbsp before colon
 * : Provide section-offsets for immediate children of to support section editing in VE and other clients

Monday, May 18, 2015 around 1:10pm PST: ✅

 * : Put escaped HTML tags inside <nowiki>
 * : html2wt should not need access to original source
 * Restore speedy non-selser serialization
 * Don't use selser if oldid is missing

Wednesday, May 13, 2015 around 1:25pm PST: ✅

 * : Allow quotes as template targets
 * Normalize empty headings only if they are newly inserted content
 * A bunch of code cleanup patches (including some refactoring of server configuration)

Monday, May 4, 2015 11:44am PST: ✅

 * Avoid deep freezing some parsoidConfig properties
 * This patch fixes the bug that prevented the Parsoid service from starting up in production, which caused the revert on Wednesday, April 29
 * Ensure that embedded Maps and Sets are properly deep-frozen
 * Freeze parsoidConfig to avoid shared mutable state
 * Remove uri fallback when switching wiki configs
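
As a rough illustration of what deep-freezing a config with embedded Maps and Sets involves, here is a minimal sketch. It is not the Parsoid implementation, and the function name `deepFreeze` is an assumption. The point it shows: `Object.freeze` alone does not stop `Map.prototype.set` or `Set.prototype.add`, so the mutating methods have to be shadowed on each instance before freezing.

```javascript
// Recursively freeze plain objects and arrays, and block mutation of Maps and Sets.
// Sketch only: Map.prototype.set.call(map, ...) can still bypass the shadowed methods.
function deepFreeze(value, seen = new WeakSet()) {
  if (value === null || typeof value !== 'object' || seen.has(value)) {
    return value;
  }
  seen.add(value); // guard against cycles

  if (value instanceof Map) {
    value.set = value.delete = value.clear = () => {
      throw new TypeError('Map is frozen');
    };
    for (const [k, v] of value) {
      deepFreeze(k, seen);
      deepFreeze(v, seen);
    }
  } else if (value instanceof Set) {
    value.add = value.delete = value.clear = () => {
      throw new TypeError('Set is frozen');
    };
    for (const v of value) {
      deepFreeze(v, seen);
    }
  }

  // Freeze nested own properties before freezing the container itself.
  for (const key of Object.getOwnPropertyNames(value)) {
    deepFreeze(value[key], seen);
  }
  return Object.freeze(value);
}
```

Properties that must stay mutable would need to be skipped explicitly, which is presumably what "Avoid deep freezing some parsoidConfig properties" refers to.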

Wednesday, April 29, 2015 around 1pm PST: 45b54f63 to be deployed (Reverted)
See outage report for more details.
 * Freeze parsoidConfig to avoid shared mutable state
 * Remove uri fallback when switching wiki configs

Monday, April 27, 2015 around 1pm PST: ✅

 * : Forward the X-Request-ID header
 * : Exponentially increase the request timeout
 * Reduce API concurrency and retries (to deal with overload on API cluster)
 * Don't strip \r in API routes
 * Remove redundant \r handling
 * Upgrade to prfun 2.0.0 and smash the global Promise
 * Performance: Use core-js/shim instead of es6-shim
 * A lot of code cleanup
 * This includes bcea0ab0 which is a fix for the cleanup patch 915ea3f6 which was causing last week's corruptions.
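
The "Exponentially increase the request timeout" and "Reduce API concurrency and retries" items describe a common retry pattern, sketched below. The function names, the 1-second base, the 60-second cap and the retry count are all assumptions for illustration; the values Parsoid actually uses are not stated in this log.

```javascript
// Timeout for a given attempt: base * 2^attempt, capped at maxMs.
function timeoutForAttempt(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Call doRequest(timeoutMs) up to maxRetries + 1 times, giving each
// retry a longer timeout. doRequest must return a promise.
async function requestWithRetries(doRequest, maxRetries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await doRequest(timeoutForAttempt(attempt));
    } catch (err) {
      lastError = err; // remember the failure and try again with a longer timeout
    }
  }
  throw lastError;
}
```

Keeping the retry count low while lengthening each timeout gives a slow backend more time to answer without multiplying the load on it.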

Saturday, April 25, 2015 around 8:25 am PST: ✅ (cherry-pick of d2135c6b on parsoid master)
Cherry-picked "Reduce API concurrency and retries" from parsoid master to reduce the number of retries and the concurrency level with which the MediaWiki API is hit.

Friday, April 24, 2015 around 12:50 pm PST: Reverted deploy to 3311936a
The Thursday late-night deploy was reverted due to reported corruptions.

See outage report for more details.

Thursday, April 23, 2015 around 11:45pm PST: ✅
This was meant to be an emergency deploy of one patch but unintentionally deployed all changes from master.
 * Reduce API concurrency and retries (to deal with overload on API cluster)
 * Don't strip \r in API routes
 * Remove redundant \r handling
 * Upgrade to prfun 2.0.0 and smash the global Promise
 * Performance: Use core-js/shim instead of es6-shim
 * A lot of code cleanup

Wednesday, April 22, 2015 around 1:05pm PST: ✅

 * : Enforce for all lines when escaping wikitext
 * Fix base href on _rt routes
 * Accept scrubWikitext as a query parameter

Monday, April 20, 2015 around 1pm PST: ✅

 * : Suppress empty headings if scrubWikitext param is provided
 * Add a  param to the API to (optionally) apply normalizations that won't roundtrip
 * : Fix crasher seen in production
 * :  marker metas should remain fosterable
 * Log uncaught exceptions in Parsoid service
 * Edge case bug fix in migrateTrailingNLs DOM pass (for example, in en:SM U-66)
 * Other code cleanup that doesn't affect functionality

Wednesday, April 15, 2015 around 1:20pm PST: ✅

 * Bug fix serializing nested refs (would refuse to save because of missing content)
 * Bug fix in selser tests that sometimes normalized element attributes unnecessarily
 * Handle empty content string ("") returned by the API
 * Normalize DOM before running DOM-Diff
 * Fix findFirstEncapsulationWrapperNode -- eliminates dirty diffs in some edge case scenarios
 * Other code cleanup that doesn't affect functionality

Monday, April 13, 2015 around 1pm PST: 8f35374d (skipped)
Deploy postponed because beta cluster is down and it is not possible to verify this in beta cluster beforehand.
 * Bug fix serializing nested refs (would refuse to save because of missing content)
 * Bug fix in selser tests that sometimes normalized element attributes unnecessarily
 * Handle empty content string ("") returned by the API
 * Normalize DOM before running DOM-Diff
 * Other code cleanup that doesn't affect functionality

Wednesday, April 8, 2015 around 1pm PST: ✅

 * :  tags with invalid hrefs should serialize to text
 * : use HTML entities to encode/decode arbitrary data in comments (see also ).
 * : Switch from TXStatsD to statsite metrics.

Other changes:
 * Various code style tweaks and clean ups.
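
The idea of using HTML entities to carry arbitrary data in comments can be sketched as follows. The exact encoding Parsoid uses is not given in this log, so the scheme below is an assumption: it escapes every `&`, `-` and `>`, which is more aggressive than strictly necessary but guarantees that the comment body never contains `--` or a premature `-->`. The function names are hypothetical.

```javascript
// Assumed escaping table; not necessarily the one Parsoid uses.
const ENCODE = { '&': '&amp;', '-': '&#x2D;', '>': '&gt;' };
const DECODE = { '&amp;': '&', '&#x2D;': '-', '&gt;': '>' };

// Escape characters that could terminate or invalidate a comment.
function encodeCommentData(text) {
  return text.replace(/[&\->]/g, (ch) => ENCODE[ch]);
}

// Reverse the escaping in a single pass so that "&amp;gt;" decodes to "&gt;", not ">".
function decodeCommentData(text) {
  return text.replace(/&(?:amp|#x2D|gt);/g, (entity) => DECODE[entity]);
}

// Wrap arbitrary text in a comment whose body is safe for XML output.
function toComment(text) {
  return `<!--${encodeCommentData(text)}-->`;
}
```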

Monday, April 6, 2015 around 1pm PST: ✅

 * : Normalize comments so that Parsoid output is valid XML
 * Edge-case fix for hoisting embedded s from headings
 * : Preserve querystring params while redirecting
 * Don't serialize  tags as  ever
 * : Remove state from Cite extension
 * Log with supplied x-request-id header

Monday, Mar 30, 2015 around 1pm PST: ✅

 * Skip link validity tests for strings that won't be used as hrefs: Eliminates erroneous "bad title text" logging messages
 * cleanupAndSaveDataParsoid should be done in its own pass: Fixes incorrect HTML generated in -hack scenarios when v2 API is used
 * Replace duplicate ids in wikitext: Allows Parsoid to handle pages with duplicated ids without corrupting them on serialization ( is an instance of this)
 * : Never serialize a-tag as html
 * : Add original dimension information for images.
 * : Normalize wikilink targets to strip leading "./"
 * , : Ensure reference index is reset at the end of document
 * Use tokenizer info to fix/cleanup tdFixups dom pass

Wednesday, Mar 25, 2015 around 1pm PST: ✅

 * : Pop comments from the end of table tag attributes
 * Strip out X-Parsoid-Performance headers and associated code -- no longer useful since Parsoid now sends lots of metrics to statsd
 * Bug fix setting TSR in defn lists - fixes DSR inconsistency warnings
 * : Nulls in DSR computation should not be coerced to 0
 * Edge case fix for definition lists: Only return colon when ignoring in tags
 * Associate data-parsoid with duplicated ids (copy-paste in VE can introduce duplicate element ids)

Monday, Mar 23, 2015 around 1:25pm PST: ✅

 * : Fix tokenizing redirect context
 * Use more specific warning labels to help sift through logs in Kibana
 * Use  instead of   when we can't serialize a   ( https://gerrit.wikimedia.org/r/198176 ).  This should send a 500 response, not kill the entire worker.

Thursday, Mar 19, 2015 around 6:45 pm PST: ✅

 * : Don't strip id attributes from DOM nodes -- required for tags
 * : Serialize category redirects with a ':'
 * Additional logging to help debug Visual Editor issues

Thursday, Mar 19, 2015 around 9 am PST: ✅

 * : Abort html -> wt serialization when we encounter a DOM id without a matching DOM element
 * Log errors when Parsoid-like element ids are stripped from HTML elements

Wednesday, Mar 18, 2015 around 1pm PST: ✅

 * Don't serialize HTML id attributes with Parsoid-like elt ids
 * : Ensure that alt image option is handled properly even when it has complex wikitext
 * v2 API: Explicitly set a utf-8 charset in text content-types

Monday, Mar 16, 2015 around 1pm PST: ✅

 * , : Handle entities/nowikis in templated attributes
 * : Enforce single-line context in the serializer
 * : Table cells not properly parsed in an implicit-td context
 * : Improve escaping and nowikiing of template arguments
 * Additional fixes to selective serializer around reusing original source in lists and list items
 * Additional instrumentation (input/output sizes, init times) of Parsoid endpoints

Wednesday, Mar 11, 2015 around 1pm PST: ✅

 * : Fix serialization of table cells with "-" and "+" in them
 * : Convert | to | in template parameters
 * : Eliminate fatal assertion failures seen in production (found in Kibana)
 * : Improvements to <nowiki> wrapping for strings that needed them
 * Fixes to DSR computation algo to eliminate negative DSR deltas (should eliminate the warnings seen in Kibana)
 * Updated sitematrix.json to latest changes
 * Explicitly pass rawcontinue=1 to the MediaWiki API (to eliminate deprecation warnings logged on the MediaWiki API end)
 * Log mediawiki API warnings (so we can find and fix API deprecations in future)

Monday, Mar 9, 2015 around 1pm PST: ✅

 * , : Several fixes to serialization of lists
 * : Change how LST s are output

Wednesday, Mar 4, 2015 around 1pm PST: ✅

 * : Fix selser bugs that would occasionally lose newly added comments
 * : Fix broken serialization in some scenarios after table columns are deleted
 * Fix broken performance timer code (broken in Monday, Mar 2, deploy)

Monday, Mar 2, 2015 around 1:15pm PST: ✅

 * : Remove duplication of content in the data-mw.body.html property of tags
 * : Remove more cases of data-parsoid.src from mw:Extensions
 * Memory usage reports are now generated once every 5 mins and sent to the statsd server

Wednesday, Feb 25, 2015 around 1:00 pm PST: ✅

 * Serialize new anchor links (w/o rel) as internal
 * Amend timing metrics
 * : Open tags only affect line when parsing definition list colon
 * : Fix nowiki escaping for <td>

Monday, Feb 23, 2015 around 1pm PST: ✅

 * Workaround for . (Will be reverted once citoid bug is fixed: .)
 * : Ensure that implicitly-added output have unique ids
 * Fix serializing categories without indent-pre protection (tweaks #REDIRECT handling as well)
 * Don't crash when revision is hidden
 * : Handle more templated  -attr scenarios
 * Tweaked naming of selser-related timing stats.
 * Enable timing stats in production (localsettings.js change).

Wednesday, Feb 18, 2015 around 1:30 pm PST: ✅

 * : Emit reflists for with no explicit
 * , : Enable timing stats for Parsoid wt2html and html2wt requests
 * not yet enabled in production (requires change to localsettings.js)

Monday, Feb 16, 2015 around 1pm PST: ✅

 * : Handle template-generated DISPLAYTITLE and DEFAULTSORT
 * : Fix selser regression introduced on Feb 11 deploy
 * : Fix selser in v2 API (to be used by RESTbase)
 * For older MW APIs that don't provide that information, default to cached enwiki config for supported link protocols

Wednesday, Feb 11, 2015, around 1:35pm PST: ✅

 * : Remove data-parsoid.src for elts with valid data-mw and DSR info
 * : Remove unnecessary transclusion tags
 * Fixes to handle high load on Parsoid cluster
 * Don't reprocess same token in AttributeExpander unless necessary (eliminates infinite loop scenarios found on some pages)
 * Fixes to make sure fatal errors more consistently force process restarts without leaving behind stuck processes
 * Categories on their own line don't need nowikis around any leading whitespace
 * Non-word characters shouldn't terminate tag names
 * : Hoist categories, language links, redirects, comments out of headings when serializing them
 * : Fix serializing new links with "./" in content string

Monday, Feb 9, 2015, around 1pm PST: dd98dea0 to be deployed (Cancelled)

 * : Remove data-parsoid.src for elts with valid data-mw and DSR info
 * : Remove unnecessary transclusion tags
 * Fixes to handle high load on Parsoid cluster
 * Don't reprocess same token in AttributeExpander unless necessary (eliminates infinite loop scenarios found on some pages)
 * Fixes to make sure fatal errors more consistently force process restarts without leaving behind stuck processes
 * Categories on their own line don't need nowikis around any leading whitespace
 * Non-word characters shouldn't terminate tag names
 * : Hoist categories, language links, redirects, comments out of headings when serializing them

Deployment cancelled. We found some regressions and the fixes for them are still going through round trip testing at this time. So, we'll get these out on Wednesday.

Friday, Feb 6, 2015, around 9:10pm PST: Hotfix of cherry-pick
Jan 28, 2015 deploy of exposed a longstanding bug in Parsoid which was fixed by. On some pages, due to where some templates weren't being expanded, the Attribute Expander was effectively being asked to re-expand the template over and over again in an infinite loop. This was triggered on a few enwiki pages today, causing processes to get stuck without being restarted. This hotfix prevents the infinite loop.

Friday, Feb 6, 2015, around 11:20am PST: Hotfix of cherry-pick
A bug in our process restart (on fatal errors) was exposed by unrelated bugs in our parse pipeline which manifested as stuck processes on the cluster. This hotfix fixes that by ensuring that fatals continue to restart processes.

Wednesday, Feb 4, 2015, around 1pm PST: ✅

 * Switch to using the compression package instead of the outdated version bundled with connect. In testing, that cleared up the memory leak noticed since the Jan. 15th deploy.
 * Some cleanup including:
 * Changing a few `on` handlers to `once`.
 * Using request's qs option for apiargs instead of stringifying those manually.
 * Better error handling for config requests.
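
The difference between `on` and `once` handlers mentioned in the cleanup above can be shown with Node's `EventEmitter`. This is a generic illustration, not code from Parsoid: a handler registered with `on` stays attached and fires for every event, while one registered with `once` is removed after its first call.

```javascript
const { EventEmitter } = require('events');

const emitter = new EventEmitter();
let onCount = 0;
let onceCount = 0;

// Stays registered and runs on every 'end' event.
emitter.on('end', () => { onCount++; });

// Runs for the first 'end' event only, then is removed automatically.
emitter.once('end', () => { onceCount++; });

emitter.emit('end');
emitter.emit('end');

console.log(onCount, onceCount); // prints "2 1"
```

Using `once` for handlers that should only ever run a single time also lets the emitter release them afterwards.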

Monday, Feb 2, 2015, around 1pm PST: ✅

 * Set X-Forwarded-Proto when proxying https. This fixes timeouts for ruwikinews which is strict about accepting only https connections.
 * Some performance tweaks to attribute expander to eliminate useless work and useless memory allocation
 * : Fixes to resource module loading URI in the section of Parsoid HTML
 * Fixes to tokenizer to ensure that strings starting with '-' are parsed for directives like language variant markup

Friday, Jan 30, 2015 around 2:35 pm PST: ✅
The Jan 15th deploy, in which Parsoid started using sitematrix info for configuring wikis, was missing special handling for some wikis (commonswiki being one of them). This caused timeouts, which repeatedly exercised an existing memory leak, leading to a slow buildup of leaked memory on the cluster and a higher than normal CPU load. This special Friday deploy fixed the config issues.

Specifically, the following two patches were deployed:
 * Some special wikis should use the default proxy
 * Strip TLS from sitematrix url if we're using the default proxy

Wednesday, Jan 28, 2015 around 1pm PST: ✅

 * : Correctly handle templates that generate part-attribute and part-content of a DOM node.
 * : Preserve blank template parameters
 * : Cleanup of behavior switch production
 * Updates to wikitext serializer to simplify and enable more robust wikitext escaping
 * ,, : Magic link fixes (wt2html and html2wt nowiki handling)

Thursday, Jan 15, 2015 around 1pm PST: ✅

 * ,, : Default WMF wikis served by Parsoid fetched from sitematrix API call
 * : Positional params with = in extlink are serialized as named parameters

On Jan 14th at 1:20 pm PST, we reverted Parsoid to the older deployed version after dirty diffs were seen during post-deploy testing. It turned out that the dirty diffs weren't related to the Parsoid deploy, but now that those issues have been fixed, we'll revisit the Parsoid deployment on Thursday.

Monday, Jan 12, 2015 around 1pm PST: ✅

 * Include location of titles in timeout logs
 * Tweaks to Parsoid's cite port to generate identical ref ids as native cite implementation

Wednesday, Jan 7, 2015 around 1pm PST: ✅

 * : Improved handling of extremely large lists -- fixes the load issues seen in production on Jan 3rd
 * Removed hardcoded HTTP 500 response for urwiki:نام_مقامات_اے (deployed on Jan 3rd to prevent this page from overloading the cluster)
 * : Fix edge case tokenizing table lines

Monday, Jan 5, 2015 around 2pm PST: ✅
Wikitext -> HTML:
 * : data-parsoid stripped from template content
 * : Context-aware parsing of definition list colon
 * : Parse extension parameters as plain text
 * : Stray is parsed to meta
 * Marginal improvement parsing templates in definition lists

HTML -> Wikitext:
 * , : Several improvements and fixes to nowiki protection for quotes
 * Other improvements and bug fixes to nowiki protection in headings, lists, tables.
 * : Insert an extra newline after new content and existing headings

Other (API, logging, etc):
 * Add logging for html2wt API endpoints
 * Fix robots.txt route
 * Send SIGKILL to kill a timed out worker
 * : API v2 parsing and serialization routes