Parsoid/Deployments

Planned deployments, linked from Deployments. For a list of past deployments, look for 'parsoid' in the Server Admin Log.

See Parsoid to learn how to deploy a new version of Parsoid.


 * Deployments in 2013
 * Deployments in 2014

Monday, April 13, 2015 around 1pm PST: 8f35374d to be deployed
Deploy postponed: the beta cluster is down, so this cannot be verified there beforehand.
 * Bug fix for serializing nested refs (saving was refused because of missing content)
 * Bug fix in selser tests that sometimes normalized element attributes unnecessarily
 * Handle empty content string ("") returned by the API
 * Normalize DOM before running DOM-Diff
 * Other code cleanup that doesn't affect functionality

Wednesday, April 8, 2015 around 1pm PST: ✅

 * :  tags with invalid hrefs should serialize to text
 * : use HTML entities to encode/decode arbitrary data in comments (see also ).
 * : Switch from TXStatsD to statsite metrics.

Other changes:
 * Various code style tweaks and clean ups.

Monday, April 6, 2015 around 1pm PST: ✅

 * : Normalize comments so that Parsoid output is valid XML
 * Edge-case fix for hoisting embedded s from headings
 * : Preserve querystring params while redirecting
 * Don't serialize  tags as  ever
 * : Remove state from Cite extension
 * Log with supplied x-request-id header

Monday, Mar 30, 2015 around 1pm PST: ✅

 * Skip link validity tests for strings that won't be used as hrefs: Eliminates erroneous "bad title text" logging messages
 * cleanupAndSaveDataParsoid should be done in its own pass: Fixes incorrect HTML generated in -hack scenarios when v2 API is used
 * Replace duplicate ids in wikitext: Allows Parsoid to handle pages with duplicated ids without corrupting them on serialization ( is an instance of this)
 * : Never serialize a-tag as html
 * : Add original dimension information for images.
 * : Normalize wikilink targets to strip leading "./"
 * , : Ensure reference index is reset at the end of document
 * Use tokenizer info to fix/cleanup tdFixups dom pass

Wednesday, Mar 25, 2015 around 1pm PST: ✅

 * : Pop comments from the end of table tag attributes
 * Strip out X-Parsoid-Performance headers and associated code -- no longer useful since Parsoid now sends lots of metrics to statsd
 * Bug fix setting TSR in defn lists - fixes DSR inconsistency warnings
 * : Nulls in DSR computation should not be coerced to 0
 * Edge case fix for definition lists: Only return colon when ignoring in tags
 * Associate data-parsoid with duplicated ids (copy-paste in VE can introduce duplicate element ids)

Monday, Mar 23, 2015 around 1:25pm PST: ✅

 * : Fix tokenizing redirect context
 * Use more specific warning labels to help sift through logs in Kibana
 * Use  instead of   when we can't serialize a  (https://gerrit.wikimedia.org/r/198176). This should send a 500 response instead of killing the entire worker.

Thursday, Mar 19, 2015 around 6:45 pm PST: ✅

 * : Don't strip id attributes from DOM nodes -- required for tags
 * : Serialize category redirects with a ':'
 * Additional logging to help debug VisualEditor issues

Thursday, Mar 19, 2015 around 9 am PST: ✅

 * : Abort html -> wt serialization when we encounter a DOM id without a matching DOM element
 * Log errors when Parsoid-like element ids are stripped from HTML elements

Wednesday, Mar 18, 2015 around 1pm PST: ✅

 * Don't serialize HTML id attributes with Parsoid-like elt ids
 * : Ensure that alt image option is handled properly even when it has complex wikitext
 * v2 API: Explicitly set a utf-8 charset in text content-types

Monday, Mar 16, 2015 around 1pm PST: ✅

 * , : Handle entities/nowikis in templated attributes
 * : Enforce single-line context in the serializer
 * : Table cells not properly parsed in an implicit-td context
 * : Improve escaping and nowikiing of template arguments
 * Additional fixes to selective serializer around reusing original source in lists and list items
 * Additional instrumentation (input/output sizes, init times) of Parsoid endpoints

Wednesday, Mar 11, 2015 around 1pm PST: ✅

 * : Fix serialization of table cells with "-" and "+" in them
 * : Convert | to | in template parameters
 * : Eliminate fatal assertion failures seen in production (found in Kibana)
 * : Improvements to <nowiki> wrapping for strings that needed them
 * Fixes to DSR computation algo to eliminate negative DSR deltas (should eliminate the warnings seen in Kibana)
 * Updated sitematrix.json with the latest changes
 * Explicitly pass rawcontinue=1 to the MediaWiki API (to eliminate deprecation warnings logged on the MediaWiki API end)
 * Log MediaWiki API warnings (so we can find and fix API deprecations in future)
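The two MediaWiki API items above (passing rawcontinue=1 and logging API warnings) can be sketched as follows. `rawcontinue` is a real `action=query` parameter. The helper names are hypothetical, and the warning parser assumes the legacy (formatversion=1) JSON layout, where each module's warning text sits under a '*' key.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_query_url(api_url: str, params: Dict[str, str]) -> str:
    """Build an action=query URL that always opts into raw continuation."""
    query = {"action": "query", "format": "json", "rawcontinue": "1"}
    query.update(params)  # caller-supplied params (titles, prop, ...)
    return api_url + "?" + urlencode(query)


def collect_api_warnings(response: Dict[str, Any]) -> List[str]:
    """Flatten the 'warnings' object of a formatversion=1 JSON response.

    Assumed layout: {"warnings": {"<module>": {"*": "<text>"}}}.
    Returns one "module: text" string per module, ready to be logged.
    """
    return [f"{module}: {info.get('*', '')}"
            for module, info in response.get("warnings", {}).items()]
```

Logging every returned string, rather than only failing requests, is what surfaces deprecations early: the API keeps answering normally while attaching warnings to the response.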

Monday, Mar 9, 2015 around 1pm PST: ✅

 * , : Several fixes to serialization of lists
 * : Change how LST s are output

Wednesday, Mar 4, 2015 around 1pm PST: ✅

 * : Fix selser bugs that would occasionally lose newly added comments
 * : Fix broken serialization in some scenarios after table columns are deleted
 * Fix broken performance timer code (broken in the Monday, Mar 2 deploy)

Monday, Mar 2, 2015 around 1:15pm PST: ✅

 * : Remove duplication of content in the data-mw.body.html property of tags
 * : Remove more cases of data-parsoid.src from mw:Extensions
 * Memory usage reports are now generated once every 5 mins and sent to the statsd server
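Below is a sketch of periodic memory reporting to statsd. It assumes the plain statsd text protocol, which statsite also speaks: `name:value|g` for a gauge, sent one metric per UDP datagram to the default port 8125. The metric prefix, the `get_stats` callback and the helper names are hypothetical. In Parsoid (Node.js), the numbers would come from something like process.memoryUsage().

```python
import socket
import time
from typing import Callable, Dict, List

REPORT_INTERVAL_SECONDS = 5 * 60  # "once every 5 mins"


def format_gauges(prefix: str, stats: Dict[str, int]) -> List[str]:
    """Render {name: value} as statsd gauge lines, e.g. 'parsoid.mem.rss:1024|g'."""
    return [f"{prefix}.{name}:{value}|g" for name, value in stats.items()]


def send_to_statsd(lines: List[str], host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget UDP send, one metric per datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (host, port))


def report_forever(get_stats: Callable[[], Dict[str, int]], prefix: str) -> None:
    """Hypothetical reporting loop: sample and send memory stats every 5 minutes."""
    while True:
        send_to_statsd(format_gauges(prefix, get_stats()))
        time.sleep(REPORT_INTERVAL_SECONDS)
```

Gauges fit this use because each report replaces the previous value. A counter would instead sum the samples sent within each flush interval.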

Wednesday, Feb 25, 2015 around 1:00 pm PST: ✅

 * Serialize new anchor links (w/o rel) as internal
 * Amend timing metrics
 * : Open tags only affect line when parsing definition list colon
 * : Fix nowiki escaping for <td>

Monday, Feb 23, 2015 around 1pm PST: ✅

 * Workaround for . (Will be reverted once citoid bug is fixed: .)
 * : Ensure that implicitly-added output have unique ids
 * Fix serializing categories without indent-pre protection (tweaks #REDIRECT handling as well)
 * Don't crash when revision is hidden
 * : Handle more templated  -attr scenarios
 * Tweaked naming of selser-related timing stats.
 * Enable timing stats in production (localsettings.js change).

Wednesday, Feb 18, 2015 around 1:30 pm PST: ✅

 * : Emit reflists for with no explicit
 * , : Enable timing stats for Parsoid wt2html and html2wt requests
   * Not yet enabled in production (requires a change to localsettings.js)

Monday, Feb 16, 2015 around 1pm PST: ✅

 * : Handle template-generated DISPLAYTITLE and DEFAULTSORT
 * : Fix selser regression introduced on Feb 11 deploy
 * : Fix selser in v2 API (to be used by RESTBase)
 * For older MW APIs that don't provide that information, default to the cached enwiki config for supported link protocols

Wednesday, Feb 11, 2015, around 1:35pm PST: ✅

 * : Remove data-parsoid.src for elts with valid data-mw and DSR info
 * : Remove unnecessary transclusion tags
 * Fixes to handle high load on Parsoid cluster
 * Don't reprocess same token in AttributeExpander unless necessary (eliminates infinite loop scenarios found on some pages)
 * Fixes to make sure fatal errors more consistently force process restarts without leaving behind stuck processes
 * Categories on their own line don't need nowikis around any leading whitespace
 * Non-word characters shouldn't terminate tag names
 * : Hoist categories, language links, redirects, comments out of headings when serializing them
 * : Fix serializing new links with "./" in content string

Monday, Feb 9, 2015, around 1pm PST: dd98dea0 to be deployed

 * : Remove data-parsoid.src for elts with valid data-mw and DSR info
 * : Remove unnecessary transclusion tags
 * Fixes to handle high load on Parsoid cluster
 * Don't reprocess same token in AttributeExpander unless necessary (eliminates infinite loop scenarios found on some pages)
 * Fixes to make sure fatal errors more consistently force process restarts without leaving behind stuck processes
 * Categories on their own line don't need nowikis around any leading whitespace
 * Non-word characters shouldn't terminate tag names
 * : Hoist categories, language links, redirects, comments out of headings when serializing them

Deployment cancelled. We found some regressions, and their fixes are still going through round-trip testing, so these changes will go out on Wednesday.

Friday, Feb 6, 2015, around 9:10pm PST: Hotfix of cherry-pick
The Jan 28, 2015 deploy of exposed a longstanding bug in Parsoid which was fixed by. On some pages, due to where some templates weren't being expanded, the Attribute Expander was effectively being asked to re-expand the same template over and over again in an infinite loop. This was triggered on a few enwiki pages today, causing processes to get stuck without being restarted. This hotfix prevents the infinite loop.

Friday, Feb 6, 2015, around 11:20am PST: Hotfix of cherry-pick
A bug in our process-restart logic (on fatal errors) was exposed by unrelated bugs in our parse pipeline, which manifested as stuck processes on the cluster. This hotfix ensures that fatal errors continue to restart processes.

Wednesday, Feb 4, 2015, around 1pm PST: ✅

 * Switch to using the compression package instead of the outdated version bundled with connect. In testing, that cleared up the memory leak noticed since the Jan. 15th deploy.
 * Some cleanup, including:
   * Changing a few 'on' handlers to 'once'.
   * Using request's qs option for apiargs instead of stringifying those manually.
   * Better error handling for config requests.

Monday, Feb 2, 2015, around 1pm PST: ✅

 * Set X-Forwarded-Proto when proxying https. This fixes timeouts for ruwikinews which is strict about accepting only https connections.
 * Some performance tweaks to attribute expander to eliminate useless work and useless memory allocation
 * : Fixes to resource module loading URI in the section of Parsoid HTML
 * Fixes to tokenizer to ensure that strings starting with '-' are parsed for directives like language variant markup
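The X-Forwarded-Proto item above, together with the Jan 30 item about stripping TLS from the sitematrix URL when the default proxy is used, can be sketched as follows. The assumption here is that a request routed through a plain-HTTP proxy is rewritten to http, and the client's original scheme is passed on in X-Forwarded-Proto so that https-only wikis still treat it as https. The function name and flow are hypothetical, not Parsoid's actual code.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit, urlunsplit


def prepare_proxied_request(url: str, via_proxy: bool) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a backend API request.

    Hypothetical sketch: when going through the proxy, an https URL is
    downgraded to http (TLS stripped) and X-Forwarded-Proto records that
    the original request was https.
    """
    headers: Dict[str, str] = {}
    parts = urlsplit(url)
    if via_proxy and parts.scheme == "https":
        headers["X-Forwarded-Proto"] = "https"
        url = urlunsplit(("http",) + tuple(parts[1:]))
    return url, headers
```

Without the header, a backend that accepts only https connections would see a plain-http request from the proxy and could redirect or reject it, which matches the kind of timeouts described for ruwikinews.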

Friday, Jan 30, 2015 around 2:35 pm PST: ✅
The Jan 15 deploy, in which Parsoid started using sitematrix info to configure wikis, was missing special handling for some wikis (commonswiki among them). This caused timeouts that repeatedly exercised an existing memory leak, leading to a slow buildup of leaked memory on the cluster and higher-than-normal CPU load. This special Friday deploy fixed the config issues.

Specifically, the following two patches were deployed:
 * Some special wikis should use the default proxy
 * Strip TLS from sitematrix url if we're using the default proxy

Wednesday, Jan 28, 2015 around 1pm PST: ✅

 * : Correctly handle templates that generate part-attribute and part-content of a DOM node.
 * : Preserve blank template parameters
 * : Cleanup of behavior switch production
 * Updates to wikitext serializer to simplify and enable more robust wikitext escaping
 * ,, : Magic link fixes (wt2html and html2wt nowiki handling)

Thursday, Jan 15, 2015 around 1pm PST: ✅

 * ,, : The default set of WMF wikis served by Parsoid is now fetched via a sitematrix API call
 * : Positional params with = in extlink are serialized as named parameters

On Jan 14 at 1:20 pm PST, we reverted Parsoid to the previously deployed version after dirty diffs were seen during post-deploy testing. The dirty diffs turned out to be unrelated to the Parsoid deploy; now that those issues have been fixed, we'll revisit the Parsoid deployment on Thursday.

Monday, Jan 12, 2015 around 1pm PST: ✅

 * Include location of titles in timeout logs
 * Tweaks to Parsoid's Cite port to generate ref ids identical to those of the native Cite implementation

Wednesday, Jan 7, 2015 around 1pm PST: ✅

 * : Improved handling of extremely large lists -- fixes the load issues seen in production on Jan 3rd
 * Removed hardcoded HTTP 500 response for urwiki:نام_مقامات_اے (deployed on Jan 3rd to prevent this page from overloading the cluster)
 * : Fix edge case tokenizing table lines

Monday, Jan 5, 2015 around 2pm PST: ✅
Wikitext -> HTML:
 * : data-parsoid stripped from template content
 * : Context-aware parsing of definition list colon
 * : Parse extension parameters as plain text
 * : Stray is parsed to meta
 * Marginal improvement parsing templates in definition lists
HTML -> Wikitext:
 * , : Several improvements and fixes to nowiki protection for quotes
 * Other improvements and bug fixes to nowiki protection in headings, lists, tables.
 * : Insert an extra newline after new content and existing headings
Other (API, logging, etc.):
 * Add logging for html2wt API endpoints
 * Fix robots.txt route
 * Send SIGKILL to kill a timed out worker
 * : API v2 parsing and serialization routes
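The "Send SIGKILL to kill a timed out worker" change relies on the fact that SIGKILL, unlike SIGTERM, cannot be caught or ignored. A worker stuck in a tight loop is therefore reliably terminated. The Python sketch below is illustrative only: Parsoid's workers are Node.js processes, and `run_with_timeout` is a hypothetical name. It assumes a POSIX system.

```python
import signal
import subprocess
import sys
from typing import List


def run_with_timeout(cmd: List[str], timeout_seconds: float) -> int:
    """Run a worker process and SIGKILL it if it exceeds its time budget.

    Returns the exit status; a killed process reports -SIGKILL (-9).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be handled by the child, so a hung worker cannot
        # linger the way it might after a gentler SIGTERM (POSIX only).
        proc.send_signal(signal.SIGKILL)
        return proc.wait()  # reap the child so no zombie is left behind


if __name__ == "__main__":
    status = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
    print("worker exit status:", status)
```

The final `wait()` matters as much as the signal: reaping the killed child lets the supervisor start a replacement cleanly.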