Parsoid/Deployments

Planned deployments, linked from Deployments. For a list of past deployments, look for 'parsoid' in Server Admin Log.

See Parsoid to learn how to deploy a new version of Parsoid.


 * Deployments in 2015
 * Deployments in 2014
 * Deployments in 2013

Wednesday, September 21, 2016 around 1:15pm PT: ✅

 * Tokenizer:
 * Encapsulate protected table attributes from wt
 * Inline generic_attribute_newline_value and table_attribute_value
 * Set srcOffsets for table_attribute and generic_newline_attribute
 * HTTP API:
 * Page id and revid aren't the same thing
 * html2html should require an original or previous revision

Wednesday, September 14, 2016 around 1:15pm PT: ✅

 * Let native extensions add stylesheets
 * Move getAPIProxy to parsoidConfig
 * Other minor refactorings and parserTest changes

Monday, September 12, 2016 around 1:15 pm PT: ✅

 * Handle HTML tags in attribute text properly
 * AttributeExpander: Tweak check for improved code readability
 * Testing:
 * Bump worker_heartbeat_timeout to 2mins for testing
 * Allow specifying a specific revision for roundtrip-test.js

Tuesday, September 6, 2016 around 10:37 am PT: ✅

 * Handle invalid titles in transclusions
 * Sanitizer fixes:
 * Decode all char refs in text
 * Ignore some fields when freezing SanitizerConstants for node v6.5 -- no-op for Wikimedia cluster that runs node v4.x
 * node-module updates:
 * Bump service-runner to v2.1.0
 * Remove bunyan
 * Some minor cleanups

Monday, August 29, 2016 around 1:10 pm PT: ✅

 * Run localSettings.setup after assigning options
 * Use service-runner's metrics reporter in the http api
 * Updates in preparation for supporting version 2.x content in the future -- should be no-op for version 1.x content
 * Support downgrading 2.x content to 1.x
 * No content reuse from semantically different content versions
 * Establish precedence for data-mw in 2.0.0 content

Monday, August 22, 2016 around 1:12 pm PT: ✅

 * html2wt: Fix crasher in DOM normalization code
 * Use service-runner's logger as a backend to Parsoid's logger

Wednesday, August 17, 2016 around 1:09 pm PT: ✅

 * html2wt: Always emit canonical wikitext for url links
 * html2wt: Emit url-links where appropriate no matter what rel attribute says

Monday, August 15, 2016 around 1:09 pm PT: ✅

 * migrateTrailingNLs DOM pass: Code simplifications and some subtle edge case bug fixes
 * Deal with edge cases serializing links
 * Remove deprecated "disablepp" MediaWiki API param and pass "disablelimitreport" instead
 * Increase resource limits for wikitext size, max table cells, and max list items
 * With the upgrade to node v4, we have more breathing room for parsing large pages

Wednesday, August 10, 2016 around 1:10 pm PT: ✅

 * Handle caption-like text outside tables
 * Table captions: Remove unneeded mw:TSRMarker meta token + add TSR info in tokenizer which leads to more accurate DSR offsets.
 * When table wikitext shows up outside tables and is converted to strings, strip attached mw:TSRMarker tags
 * computeDSR: Fix source of pathological O(n^2) behavior

Tuesday, August 9, 2016 around 11:15 am PT: ✅

 * Fix crasher in escapeWikitext
 * Update site matrix for tcy.wikipedia.org

Tuesday, August 2, 2016 - Tuesday August 9, 2016: Upgrade Parsoid cluster to node v4.x and Jessie

 * Over the week, Operations upgraded the cluster gradually.
 * The eqiad cluster was fully migrated by Friday, August 5th.
 * The codfw cluster was fully migrated by Tuesday, August 9th.

Monday, August 1, 2016 around 1:15 pm PT: ✅

 * Fix title parsing of subpages during initialization (addresses crashers while parsing these pages)
 * Only apply data-* attributes in /pagebundle/ paths (API cleanup)
 * Determine the content version in the html2wt direction, enabling content upgrade

Tuesday, July 26, 2016 around 10:12 am PT: ✅

 * Use mediawiki-title package to replace homegrown Title code (resolves, , and )
 * Reintroduce a 3-minute request timeout
 * Bump some minor / patch level versions of dependencies (addresses a security advisory)
 * Prevent JSON.stringify circular refs in template wrapping trace/error logs
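
The circular-reference problem named in the last item can be illustrated with a generic replacer function. This is a minimal sketch of one common approach, not the change that was deployed; `safeStringify` is a hypothetical name.

```javascript
'use strict';

// Hypothetical helper: serialize a value for logging, replacing any
// object that appears among its own ancestors with a placeholder.
function safeStringify(value) {
	const ancestors = [];
	return JSON.stringify(value, function(key, val) {
		if (typeof val !== 'object' || val === null) { return val; }
		// `this` is the object that holds `key`. Unwind the stack until
		// that holder is on top, so siblings are not mistaken for cycles.
		while (ancestors.length > 0 && ancestors[ancestors.length - 1] !== this) {
			ancestors.pop();
		}
		if (ancestors.indexOf(val) !== -1) { return '[Circular]'; }
		ancestors.push(val);
		return val;
	});
}
```

An object that is referenced twice without forming a cycle is still serialized in full both times; only true cycles are replaced.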

Thursday, July 21, 2016 around 9:30 am PT: ✅

 * Test deploy to verify trebuchet deployment is not broken after all the tinkering done during the service-runner deploy. The deployed change only affects parser tests.

Wednesday, July 20, 2016 between 7:30 - 8:20 am PT: ✅

 * Update Parsoid to use the service-runner framework
 * In collaboration with Services & Ops teams
 * wtp1001 and wtp1002 were transitioned over July 19, 2016 between 8:00 - 9:00 am PT

Monday, July 11, 2016 around 1:10 pm PT: ✅

 * Respect $wgInterwikiMagic setting while parsing lang-links
 * DOMDiff: Skip over encapsulated content rather than about-id content (fixes problem with lost edits in content nested in elements with templated attributes)
 * Code cleanup (don't expect functional changes): Use a more appropriate DOM helper (s/hasParsoidAboutId/isEncapsulationWrapper/) where appropriate

Monday, June 27, 2016 around 1:08 pm PT: ✅

 * Template wrapping: Eliminate pathological tpl-range nesting scenario

Thursday, June 23, 2016 around 10:30 am PT: ✅

 * Emit single newline separator in table wikitext for new content


 * Make the http connect timeout configurable
 * Update many deps by minor version
 * Ensure newlines are added where required around thead/tbody/tfoot
 * Remove node 0.8 support (does not affect WMF deploy of Parsoid)

Wednesday June 15, 2016 around 1:10 pm PT: ✅
Non-functional changes (these will come into play once we move to v2.0.0 of Parsoid HTML):
 * Emit |- between thead/tbody/tfoot
 * Roundtrip 2.0.0 content
 * Provide HTML2HTML endpoint in Parsoid

Monday, June 6, 2016 around 1:15 pm PT: ✅

 * Normalize all lists to not mix wikitext and HTML list syntax (selser prevents unnecessary dirty diffs in production)

Thursday, June 2, 2016 around 10:40pm PT: ✅

 * Serialize content in HTML tables using HTML tags
 * Fix selser issues serializing first table row
 * Selser: Bug fix reusing separator text from original source

Wednesday, June 1, 2016 around 1:15 pm PT: ✅

 * Bump core-js from v1.2.6 to v2.4.0
 * Bump yargs from v1.3.1 to v4.7.1
 * Don't use non-standard array generic functions (Array.reduce, etc.) - removed from newer version of core-js
 * Use normalized form of default page "Main_Page" instead of "Main Page"
 * Return client error for missing data attributes
 * Fix up the internal forms to use v3 post endpoint
 * Add a page/wikitext/:title route to GET wikitext for a page
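
The item about array generics refers to static forms such as `Array.reduce(arrayLike, fn)`, which are not part of the language standard. A standard equivalent borrows the prototype method instead. The sketch below is illustrative only; `sumLengths` is a hypothetical helper, not Parsoid code.

```javascript
'use strict';

// Works on any array-like value (for example `arguments` or a DOM
// NodeList) without relying on non-standard statics like Array.reduce.
function sumLengths(arrayLike) {
	return Array.prototype.reduce.call(arrayLike, function(total, item) {
		return total + item.length;
	}, 0);
}
```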

Thursday, May 19, 2016 around 11:38am PT: ✅

 * Remove deprecated v1/v2 HTTP APIs.
 * Content negotiation; Add data-mw as separate JSON blob in the page bundle.
 * Strict Accept header checking is turned off; we will return 1.2.x format if an invalid Accept header is provided (which is allowed by RFC 2616).
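
The lenient Accept handling described above could look roughly like the following. This is a sketch under stated assumptions: the `profile` parameter shape, the version constants, and the `negotiateVersion` name are all illustrative and not taken from Parsoid's source, and treating an unsupported version like an invalid header is also an assumption.

```javascript
'use strict';

// Assumed values, for illustration only.
const DEFAULT_VERSION = '1.2.1';
const SUPPORTED_VERSIONS = ['1.2.1', '2.0.0'];

// Hypothetical: pick a content version from an Accept header whose
// profile parameter ends in ".../html/<major>.<minor>.<patch>".
function negotiateVersion(acceptHeader) {
	const m = /profile="[^"]*\/html\/(\d+\.\d+\.\d+)"/.exec(acceptHeader || '');
	if (m && SUPPORTED_VERSIONS.indexOf(m[1]) !== -1) {
		return m[1];
	}
	// Lenient mode: a missing, unparseable, or unsupported header falls
	// back to the default version instead of producing an error response.
	return DEFAULT_VERSION;
}
```

A strict variant would reject the request at the fallback point rather than return the default.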

CLEARED DIRTY REPOS which had this patch applied as root during the restbase/changeprop/parsoid outage:

diff --git a/lib/api/routes.js b/lib/api/routes.js
index 4d08922..d372c2f 100644
--- a/lib/api/routes.js
+++ b/lib/api/routes.js
@@ -377,6 +377,7 @@ module.exports = function(parsoidConfig, processLogger) {
 	var v1Wt2html = function(req, res, wt) {
 		var env = res.locals.env;
 		var p = apiUtils.startWt2html(req, res, wt).then(function(ret) {
+			if ( ret.oldid === 106801025 ) { return false; }
 			if (typeof ret.wikitext === 'string') {
 				return apiUtils.parseWt(ret)
 					// .timeout(REQ_TIMEOUT)

Wednesday, May 4, 2016 around 1:15 pm PT: ✅

 * Update cached SiteMatrix, mainly for jamwiki

Monday, May 2, 2016 around 1:15 pm PT: ✅

 * html -> wt: For invalid links, text doesn't need escaping in link context
 * DOMDiff: Fix marking data-is-block on extra base nodes
 * Add autoload mechanism for user extension code -- proof-of-concept for future use
 * Update shrinkwrap after 23c97752
 * Code cleanup: should not affect functionality
 * Keep the data-* attributes at the edges of the DOM
 * Remove ParsoidCacheRequest
 * Organize post-processors distinguishing handlers
 * Move the dumper to DOMUtils and use more widely

Monday, April 25, 2016 around 1:05 pm PT: ✅

 * Pass the right title to PHPParseRequest
 * Don't allow unclosed extension tags
 * Code cleanup: should not affect functionality
 * Move tsrDelta to dp.tmp
 * Rename DU.serializeChilden to DU.serializeToXML
 * storeDataParsoid is an env variable, not a Parsoid config property

Monday, April 11, 2016 around 1:15pm PT: ✅

 * Count api version use
 * Don't dom-diff on a cloned node
 * Migrate temporary data to dp.tmp
 * Suppress errors raised when getting debugging info
 * Code cleanup: should not affect functionality
 * Fix some variable shadowing
 * Stop working on cloned nodes in parserTests
 * Rename timer to stats, since we do counting too
 * Fix regression testing tool
 * Fix crasher and more informative rt errors

Wednesday, April 6, 2016 around 1:15 pm PT: ✅

 * Serialize localized image options (already cherry-picked yesterday)
 * Stop suppressing escaping errors
 * Remove the broken_template rule in the PEG tokenizer -- no need to wrap {{, {{{, }}, }}} in <nowiki> spans
 * Code cleanup: should not affect functionality
 * Cleanup some fallback rules in the PEG tokenizer
 * Use Util.placeholder in a few more places
 * Be consistent with dp.src check

Tuesday, April 5, 2016 around 2:40pm PT: ✅

 * Cherry-pick of image option localization patch to match alias reordering in mediawiki core version 1.27.0-wmf.20.
 * Deployed cherry-pick from  branch.

Monday, April 4, 2016 around 1:10 pm PT: ✅

 * Fix log type in cite implementation
 * Code cleanup: should not affect functionality
 * Move dp.src handlers to their respective dom handlers
 * Add new env.normalizeAndResolvePageTitle helper and use it

Wednesday, March 30, 2016 around 1:15 pm PT: ✅

 * Bump HTML version number to 1.2.1
 * Declare charset with
 * Add html/dp version numbers in instead of full content type
 * Move auto-generated refs flag from data-parsoid to data-mw
 * Default ParsoidConfig.loadWMF to false
 * Bump node-uuid to 1.4.7 for nsp

Wednesday, March 23, 2016 around 1:15 pm PT: ✅

 * Don't construct regexp with a regexp when flags need to be set
 * Don't export Namespace since it isn't used anywhere else
 * Include user agent in request logs
 * Tweak error prefixes for ease of browsing in logstash
 * Promisify the exposed batching methods
 * Handle async createSocket
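
On the first item in this list: under ES5 semantics, calling `new RegExp(existingRegExp, flags)` with a flags argument throws a TypeError, so the portable way to change flags is to rebuild from the pattern's source string. A minimal sketch follows; `withFlags` is a hypothetical name, not a Parsoid function.

```javascript
'use strict';

// Rebuild the pattern from its source string rather than passing the
// RegExp object itself alongside a flags argument.
function withFlags(re, flags) {
	return new RegExp(re.source, flags);
}
```

For example, `withFlags(/foo/, 'gi')` yields a global, case-insensitive copy and leaves the original regexp untouched.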

Monday, March 7, 2016 around 1:15pm PT: ✅

 * Cleanup and tweaks of transclusion formatting for clarity and fewer dirty diffs
 * Fix breakage in counting of HTTP status codes (broken by fix for T127983)

Tuesday, March 1, 2016 around 10:50am PT: ✅

 * Fix bug in formatting of transclusions for block-format templates
 * Remove overloading of pipe stop in the PEG tokenizer -- eliminates incorrect parsing of pipes in external links

Monday, February 29, 2016 around 1:25pm PT: ✅

 * Don't crash on misconfigured statsd host
 * Match html5 unquoted attribute parsing
 * Break for [[ in table attribute values too

Wednesday, Feb 24, 2016 around 1:15 pm PT: ✅

 * Bump HTML content-type version to 1.2.0 (from 1.1.0) and data-parsoid content-type version to 0.0.2 (from 0.0.1)
 * Update parsoid content type meta tags in the
 *  is now changed to  to be more consistent with the version information that is output in the response headers.
 * For the non-pagebundle API endpoints,  is also emitted.
 * Remove user/contribution information from header
 * Assert param value serializes to a string
 * Fetch and use templatedata while serializing transclusions
 * data-parsoid semantics updated to use 'foo=bar' as the default transclusion arg spacing.
 * Remove data-mw.body.extsrc for the tag (unused, and bloats data-mw)

Thursday, Feb 18, 2016 around 11:00 am PT: ✅

 * : Update sitematrix for ady.wikipedia.org

Wednesday, Feb 10, 2016 around 1:15 pm PT: ✅

 * Assert when flipped ranges are expected in template wrapping
 * This should have no functional changes in parsing. At best, it will catch a bug / failed expectation in the template wrapping code.

Monday, Feb 8 2016 around 1:15 pm PT: ✅

 * Fix worker shutdown code in server.js + use it to restart stuck workers and to shutdown the Parsoid service
 * Expect that this will fix the scenario with stuck worker processes when Parsoid service is restarted during deploys.

Wednesday, Feb 3, 2016 around 2:45 pm PT: ✅

 * Fix complex single-line nowiki handling
 * More robust algorithm + can eliminate some spurious nowikis
 * Disable migrateTrailingNLs if table has had content fostered out of it
 * Some code cleanup
 * Removed some FIXMEs in nowiki escaping in  s
 * Tweaks to attribute parsing in the PEG tokenizer
 * Warn if prefix/domain is not unique during configuration
 * ParsoidConfig changes: Don't proxy nonglobal wikis (temporary special handling for labswiki and labstestwiki)
 * Config changes:
 * Remove hardcoded references to internal API LVS endpoint.
 * Removed references to unused parsoidcache.
 * Removed explicit config entry for labswiki (ParsoidConfig handles it now).

Monday, Feb 1, 2016 around 1:15 pm PT: Planned deploy of 2fcc841f cancelled to fix nowiki regressions

 * Warn if prefix/domain is not unique during configuration
 * Fix complex single-line nowiki tests
 * Can eliminate some spurious nowikis
 * But, can introduce spurious nowikis around [] style wikitext -- 0.07% of pages in rt testing were affected, but with selective serialization, we expect impact to be small. We will consider possible solutions to minimize nowikis in this scenario, nevertheless.
 * Disable migrateTrailingNLs if table has had content fostered out of it
 * Config changes: Remove hardcoded references to internal API LVS endpoint + removed references to unused parsoidcache.

Wednesday, Jan 20, 2016 around 1:45 pm PT: ✅

 * Record when a range is subsumed from overlapping
 * Temporarily disable the request timeout (since request timeouts don't abort request processing, and they cancel cpu timeouts as well)
 * Reduce cpu timeout value to 3 minutes

Monday, Jan 11, 2016 around 1:15 pm PT: ✅
wt2html
 * Remove the vestiges of pipetrick entirely
 * Note that DOM tree building uses restrictive checks (documentation fix)
 * Strip nowiki spans from templated / extension content
 * Match permitted attributes to php's getAttribsRegex

html2wt
 * Normalize DOM by stripping \u200e, \u200f next to category links (This is controlled by a config switch that we will turn on, if necessary)
 * Edge case fixes to serializing lists with templated portions
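
The directional-mark normalization in this list can be pictured with a small model. The sketch below works on a flat array of plain objects rather than a real DOM; the node shape and the `stripMarksNextToCategories` name are hypothetical, and the deployed change operates on DOM nodes behind a config switch.

```javascript
'use strict';

// Hypothetical node shape: { type: 'text', value: '...' } or
// { type: 'category' }. Returns a new array; the input is not mutated.
function stripMarksNextToCategories(nodes) {
	return nodes.map(function(node, i) {
		if (node.type !== 'text') { return node; }
		let value = node.value;
		const prev = nodes[i - 1];
		const next = nodes[i + 1];
		// Drop LRM (\u200e) / RLM (\u200f) only on the side that
		// touches a category link.
		if (prev && prev.type === 'category') {
			value = value.replace(/^[\u200e\u200f]+/, '');
		}
		if (next && next.type === 'category') {
			value = value.replace(/[\u200e\u200f]+$/, '');
		}
		return { type: 'text', value: value };
	});
}
```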


 * Use startsWith instead of regex to match tag names in the DOM
 * Optimise shadow meta deletion
 * Bump domino to 1.0.21 (with performance fixes)

Other
 * Add a generic extension registration mechanism
 * Register and natively
 * Update SiteMatrix, another wiki created
 * Use httpStatus instead of code as the property on errors