Parsoid/Deployments/2016

Wednesday, December 21, 2016 around 5:03 am PT: ✅

 * ApiRequest: Clone the request options before modifying them.
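The clone-before-modify fix noted above can be illustrated with a minimal sketch. The helper name `buildRequestOptions`, the option fields, and the URL are hypothetical and are not taken from Parsoid's ApiRequest code; the sketch only shows why cloning keeps a shared options object from being changed by one request.

```javascript
'use strict';

// Hypothetical helper (not Parsoid's actual code): returns request options
// with a timeout and an extra header applied, without mutating the caller's
// object.
function buildRequestOptions(options, timeoutMs) {
	// Object.assign copies only the top level, so the nested headers object
	// is cloned separately; otherwise both objects would share it.
	const cloned = Object.assign({}, options);
	cloned.headers = Object.assign({}, options.headers);
	cloned.timeout = timeoutMs;
	cloned.headers['User-Agent'] = 'example-client';
	return cloned;
}

// A shared options object that several requests might reuse.
const shared = {
	uri: 'https://example.org/w/api.php',
	headers: { Accept: 'application/json' },
};
const perRequest = buildRequestOptions(shared, 5000);

console.log(perRequest.timeout);              // 5000
console.log(shared.timeout);                  // undefined
console.log('User-Agent' in shared.headers);  // false
```

Without the two `Object.assign` calls, the timeout and header set for one request would leak into every later request built from `shared`.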

Tuesday, December 20, 2016 around 7:48 am PT: ✅

 * Use mwApiServer as the provider of the full URI of the MW API
 * Add a mwApiServer configuration variable
 * Add arbcom_cswiki to site matrix

Thursday, December 15, 2016 around 10:24 am PT: ✅

 * : Ignore self-closed tags when extending source
 * Drop native LST altogether
 * Fix DOMDiff annotations


 * Linter:
 * Fix bug in self-closing-tag category + other cleanup
 * Fix crasher when linting a gallery
 * Apply lint sampling when sending it to the logger as well
 * Don't provide 'src'

Wednesday, December 14, 2016 around 1:24 pm PT: ✅
Changes span wt2html, linting (disabled in production), and debugging:
 * : Add more page-level metadata that MCS can use
 * Support extension tags which shadow block level elements
 * Move section handling to the LST extension
 * : Prevent infinite recursion
 * : Allow nested ref tags only in templates
 * Use ApiRequest.js to post results
 * Handle MW API errors that come with an HTTP 200
 * Let extensions supply the pp tracing name

Monday, December 12, 2016 around 1:35 pm PT: ✅

 * Bump table cell and list item resource limits to 40K (from 30K)

Wednesday, December 7, 2016 around 1:21 pm PT: ✅

 * Bump HTML contentVersion to 1.3.0 (see updated spec)
 * : Native extension
 * : Assign ids to headings to match core's section anchors
 * , : Munge link fragments and element ids as in the php parser
 * : Update SiteMatrix data fork for last 3 wiki creations
 * : Deal with newlines in  and  cells
 * : Suppress logs for known unknown contentmodels
 * : Reduce request timeout to 110s (from 3min) and worker timeout to 115s (from 3min); Increase M/W batcher API timeout to 65s
 * Some configurations moved to vars.yaml in the deploy repo
 * s/warning/warn/ to match service-runner's levels
 * Don't entity escape extension attribute values from data-mw
 * Normalize all extension options, not just native
 * Remove unused package gelf-stream
 * Linter: Add linting of self-closed tags
 * Testing:
 * Remove scrolling by access key
 * require('should') in lintertests.js for standalone runs

Monday, November 7, 2016 around 1:29 pm PT: ✅

 * Cleanup http redirects
 * Send error responses in the requested format
 * Fix processing listeners in node v7.x

Wednesday, November 2, 2016 around 1:27 pm PT: ✅
This deploy also includes the commits from the attempted Oct. 26th deploy (ede4353), listed as the last nine items below:
 * : Whitelist content model fallback
 * Testing:
 * Don't expose dev routes in production
 * Get rid of simple debug helpers
 * : Stop testing on node v0.10.x
 * Linter:
 * Add node name for missing-end-tag
 * Remove higher resource limits (max wikitext page size, max # list items, max # table cells per page) and fall back to default limits.
 * : Bump mediawiki-title
 * : Fix crasher and other bugs of that category
 * service-runner doesn't recognize warning level
 * Stop asserting that we'll never be encapsulating a flipped range
 * Lots of linter fixes / features (currently, linting is disabled in production though)
 * Remove html5 treebuilder in favour of domino's
 * Bump domino to 1.0.27
 * : Trim template target after stripping comments
 * , : Allow extensions to handle specific contentmodels

Tuesday, November 1, 2016: Parsoid cluster upgraded to node v4.6
Ops upgraded node on the Parsoid eqiad cluster to node v4.6. The (backup) codfw cluster had been upgraded on Monday.

Monday, October 31, 2016 around 1:34 pm PT: ✅

 * : Fix reflected XSS

Wednesday, October 26, 2016 around 1:15 pm PT: ede4353 was to be deployed; reverted to 63f1e151 because of contentmodel errors

 * : Bump mediawiki-title
 * : Fix crasher and other bugs of that category
 * service-runner doesn't recognize warning level
 * Stop asserting that we'll never be encapsulating a flipped range
 * Lots of linter fixes / features (currently, linting is disabled in production though)
 * Remove html5 treebuilder in favour of domino's
 * Bump domino to 1.0.27
 * : Trim template target after stripping comments
 * , : Allow extensions to handle specific contentmodels

Monday, October 24, 2016 around 1:42 pm PT: ✅

 * , : Site matrix update for olowiki
 * : Fix crasher in table fixups

Wednesday, September 21, 2016 around 1:17 pm PT: ✅

 * Tokenizer:
 * Encapsulate protected table attributes from wt
 * Inline generic_attribute_newline_value and table_attribute_value
 * Set srcOffsets for table_attribute and generic_newline_attribute
 * HTTP API:
 * Page id and revid aren't the same thing
 * html2html should require an original or previous revision

Wednesday, September 14, 2016 around 1:11 pm PT: ✅

 * Let native extensions add stylesheets
 * Move getAPIProxy to parsoidConfig
 * Other minor refactorings and parserTest changes

Monday, September 12, 2016 around 1:40 pm PT: ✅

 * Handle HTML tags in attribute text properly
 * AttributeExpander: Tweak check for improved code readability
 * Testing:
 * Bump worker_heartbeat_timeout to 2mins for testing
 * Allow specifying a specific revision for roundtrip-test.js

Tuesday, September 6, 2016 around 10:37 am PT: ✅

 * : Handle invalid titles in transclusions
 * Sanitizer fixes:
 * Decode all char refs in text
 * Ignore some fields when freezing SanitizerConstants for node v6.5 -- no-op for Wikimedia cluster that runs node v4.x
 * node-module updates:
 * Bump service-runner to v2.1.0
 * Remove bunyan
 * Some minor cleanups

Monday, August 29, 2016 around 1:10 pm PT: ✅

 * Run localSettings.setup after assigning options
 * Use service-runner's metrics reporter in the http api
 * Updates in preparation for supporting version 2.x content in the future -- should be no-op for version 1.x content
 * Support downgrading 2.x content to 1.x
 * No content reuse from semantically different content versions
 * : Establish precedence for data-mw in 2.0.0 content

Monday, August 22, 2016 around 1:12 pm PT: ✅

 * : html2wt: Fix crasher in DOM normalization code
 * : Use service-runner's logger as a backend to Parsoid's logger

Wednesday, August 17, 2016 around 1:09 pm PT: ✅

 * html2wt: Always emit canonical wikitext for url links
 * html2wt: Emit url-links where appropriate no matter what rel attribute says

Monday, August 15, 2016 around 1:09 pm PT: ✅

 * migrateTrailingNLs DOM pass: Code simplifications and some subtle edge case bug fixes
 * : Deal with edge cases serializing links
 * Remove deprecated "disablepp" MediaWiki API param and pass "disablelimitreport" instead
 * Increase resource limits for wikitext size, max table cells, and max list items
 * With the upgrade to node v4, we have more breathing room for parsing large pages

Wednesday, August 10, 2016 around 1:10 pm PT: ✅

 * Handle caption-like text outside tables
 * Table captions: Remove unneeded mw:TSRMarker meta token + add TSR info in tokenizer which leads to more accurate DSR offsets.
 * When table wikitext shows up outside tables and is converted to strings, strip attached mw:TSRMarker tags
 * computeDSR: Fix source of pathological O(n^2) behavior

Tuesday, August 9, 2016 around 11:15 am PT: ✅

 * Fix crasher in escapeWikitext
 * : Update site matrix for tcy.wikipedia.org

Tuesday, August 2, 2016 - Tuesday, August 9, 2016: ✅

 * : Over the week, Operations upgraded the cluster gradually.
 * The eqiad cluster was fully migrated by Friday, August 5th.
 * The codfw cluster was fully migrated by Tuesday, August 9th.

Monday, August 1, 2016 around 1:15 pm PT: ✅

 * Fix title parsing of subpages during initialization (addresses crashers while parsing these pages)
 * Only apply data-* attributes in /pagebundle/ paths (API cleanup)
 * Determine the content version in the html2wt direction, enabling content upgrade

Tuesday, July 26, 2016 around 10:12 am PT: ✅

 * Use mediawiki-title package to replace homegrown Title code (resolves, , and )
 * Reintroduce a 3-minute request timeout
 * Bump some minor / patch level versions of dependencies (addresses a security advisory)
 * Prevent JSON.stringify circular refs in template wrapping trace/error logs
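The circular-reference problem mentioned in the last item can be sketched as follows. This is an illustration only, assuming a hypothetical `safeStringify` helper and a made-up `trace` object; it is not the code used in Parsoid's template wrapping logs. `JSON.stringify` throws a TypeError on a cyclic object, so the replacer substitutes a marker for any object it has already visited.

```javascript
'use strict';

// Hypothetical helper: stringify a value, replacing any object that was
// already visited with the marker string '[Circular]'.
// Simplification: an object referenced twice without forming a cycle is
// also replaced by the marker on its second occurrence.
function safeStringify(value) {
	const seen = new WeakSet();
	return JSON.stringify(value, function(key, val) {
		if (typeof val === 'object' && val !== null) {
			if (seen.has(val)) {
				return '[Circular]';
			}
			seen.add(val);
		}
		return val;
	});
}

// A made-up log payload that refers back to itself.
const trace = { name: 'tpl-range' };
trace.self = trace;

console.log(safeStringify(trace));
// {"name":"tpl-range","self":"[Circular]"}
```

Calling plain `JSON.stringify(trace)` on the same object throws instead of producing a log line.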

Thursday, July 21, 2016 around 9:30 am PT: ✅

 * Test deploy to verify that trebuchet deployment is not broken after all the tinkering done during the service-runner deploy. The deployed change only affects parser tests.

Wednesday, July 20, 2016 between 7:30 - 8:20 am PT: ✅

 * : Update Parsoid to use the service-runner framework
 * In collaboration with Services & Ops teams
 * wtp1001 and wtp1002 were transitioned over July 19, 2016 between 8:00 - 9:00 am PT

Monday, July 11, 2016 around 1:10 pm PT: ✅

 * : Respect $wgInterwikiMagic setting while parsing lang-links
 * : DOMDiff: Skip over encapsulated content rather than about-id content (fixes problem with lost edits in content nested in elements with templated attributes)
 * Code cleanup (don't expect functional changes): Use a more appropriate DOM helper (s/hasParsoidAboutId/isEncapsulationWrapper/) where appropriate

Monday, June 27, 2016 around 1:08 pm PT: ✅

 * Template wrapping: Eliminate pathological tpl-range nesting scenario

Thursday, June 23, 2016 around 10:30 am PT: ✅

 * Emit single newline separator in table wikitext for new content


 * Make the http connect timeout configurable
 * Update many deps by minor version
 * : Ensure newlines are added where required around thead/tbody/tfoot
 * : Remove node 0.8 support (does not affect WMF deploy of Parsoid)

Wednesday, June 15, 2016 around 1:10 pm PT: ✅
Non-functional changes (these will come into play once we move to v2.0.0 of Parsoid HTML):
 * : Emit |- between thead/tbody/tfoot
 * Roundtrip 2.0.0 content
 * : Provide HTML2HTML endpoint in Parsoid

Monday, June 6, 2016 around 1:15 pm PT: ✅

 * Normalize all lists to not mix wikitext and HTML list syntax (selser prevents unnecessary dirty diffs in production)

Thursday, June 2, 2016 around 10:40pm PT: ✅

 * : Serialize content in HTML tables using HTML tags
 * : Fix selser issues serializing first table row
 * Selser: Bug fix reusing separator text from original source

Wednesday, June 1, 2016 around 1:15 pm PT: ✅

 * Bump core-js from v1.2.6 to v2.4.0
 * Bump yargs from v1.3.1 to v4.7.1
 * Don't use non-standard array generic functions (Array.reduce, etc.) - removed from newer version of core-js
 * Use normalized form of default page "Main_Page" instead of "Main Page"
 * : Return client error for missing data attributes
 * Fix up the internal forms to use v3 post endpoint
 * Add a page/wikitext/:title route to GET wikitext for a page

Thursday, May 19, 2016 around 11:38am PT: ✅
CLEARED DIRTY REPOS which had this patch applied as root during the restbase/changeprop/parsoid outage:

diff --git a/lib/api/routes.js b/lib/api/routes.js
index 4d08922..d372c2f 100644
--- a/lib/api/routes.js
+++ b/lib/api/routes.js
@@ -377,6 +377,7 @@ module.exports = function(parsoidConfig, processLogger) {
     var v1Wt2html = function(req, res, wt) {
         var env = res.locals.env;
         var p = apiUtils.startWt2html(req, res, wt).then(function(ret) {
+            if ( ret.oldid === 106801025 ) { return false; }
             if (typeof ret.wikitext === 'string') {
                 return apiUtils.parseWt(ret)
                     // .timeout(REQ_TIMEOUT)

 * : Remove deprecated v1/v2 HTTP APIs.
 * : Content negotiation; Add data-mw as separate JSON blob in the page bundle.
 * Strict Accept header checking is turned off; we will return 1.2.x format if an invalid Accept header is provided (which is allowed by RFC 2616).

Wednesday, May 4, 2016 around 1:15 pm PT: ✅

 * : Update cached SiteMatrix, mainly for jamwiki

Monday, May 2, 2016 around 1:15 pm PT: ✅

 * html -> wt: For invalid links, text doesn't need escaping in link context
 * DOMDiff: Fix marking data-is-block on extra base nodes
 * Add autoload mechanism for user extension code -- proof-of-concept for future use
 * Update shrinkwrap after 23c97752
 * Code cleanup: should not affect functionality
 * Keep the data-* attributes at the edges of the DOM
 * Remove ParsoidCacheRequest
 * Organize post-processors distinguishing handlers
 * Move the dumper to DOMUtils and use more widely

Monday, April 25, 2016 around 1:05 pm PT: ✅

 * : Pass the right title to PHPParseRequest
 * Don't allow unclosed extension tags
 * Code cleanup: should not affect functionality
 * : Move tsrDelta to dp.tmp
 * Rename DU.serializeChildren to DU.serializeToXML
 * storeDataParsoid is an env variable, not a Parsoid config property

Monday, April 11, 2016 around 1:15pm PT: ✅

 * Count api version use
 * Don't dom-diff on a cloned node
 * : Migrate temporary data to dp.tmp
 * Suppress errors raised when getting debugging info
 * Code cleanup: should not affect functionality
 * Fix some variable shadowing
 * Stop working on cloned nodes in parserTests
 * Rename timer to stats, since we do counting too
 * Fix regression testing tool
 * Fix crasher and more informative rt errors

Wednesday, April 6, 2016 around 1:15 pm PT: ✅

 * , : Serialize localized image options (already cherry-picked yesterday)
 * Stop suppressing escaping errors
 * Remove the broken_template rule in the PEG tokenizer -- no need to wrap {{, {{{, }}, }}} in <nowiki> spans
 * Code cleanup: should not affect functionality
 * Cleanup some fallback rules in the PEG tokenizer
 * Use Util.placeholder in a few more places
 * Be consistent with dp.src check

Tuesday, April 5, 2016 around 2:40pm PT: ✅

 * , : Cherry-pick of image option localization patch to match alias reordering in mediawiki core version 1.27.0-wmf.20.
 * Deployed cherry-pick from  branch.

Monday, April 4, 2016 around 1:10 pm PT: ✅

 * Fix log type in cite implementation
 * Code cleanup: should not affect functionality
 * Move dp.src handlers to their respective dom handlers
 * Add new env.normalizeAndResolvePageTitle helper and use it

Wednesday, March 30, 2016 around 1:15 pm PT: ✅

 * Bump HTML version number to 1.2.1
 * Declare charset with <meta charset>
 * Add html/dp version numbers in <head> instead of full content type
 * : Move auto-generated refs flag from data-parsoid to data-mw
 * Default ParsoidConfig.loadWMF to false
 * Bump node-uuid to 1.4.7 for nsp

Wednesday, March 23, 2016 around 1:15 pm PT: ✅

 * Don't construct regexp with a regexp when flags need to be set
 * Don't export Namespace since it isn't used anywhere else
 * : Include user agent in request logs
 * Tweak error prefixes for ease of browsing in logstash
 * Promisify the exposed batching methods
 * : Handle async createSocket
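The first item in this list (not constructing a regexp from a regexp when flags need to be set) can be sketched as below. The helper name `withFlags` is hypothetical. The ES5 specification requires `new RegExp(regexpObject, flags)` to throw a TypeError when flags are supplied, so passing the pattern's `source` string is the portable form.

```javascript
'use strict';

// Hypothetical helper: build a new RegExp with the same pattern as `re`
// but with the given flags. Passing re.source (a string) works on engines
// that reject a RegExp object combined with a flags argument.
function withFlags(re, flags) {
	return new RegExp(re.source, flags);
}

const base = /foo\d+/;
const globalRe = withFlags(base, 'gi');

console.log(globalRe.global);      // true
console.log(globalRe.ignoreCase);  // true
console.log(base.global);          // false

// The new flags take effect: both occurrences match, case-insensitively.
const matches = 'FOO1 foo22'.match(globalRe);
console.log(matches.length);       // 2
```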

Monday, March 7, 2016 around 1:15pm PT: ✅

 * Cleanup and tweaks of transclusion formatting for clarity and fewer dirty diffs
 * Fix breakage in counting of HTTP status codes (broken by fix for T127983)

Tuesday, March 1, 2016 around 10:50am PT: ✅

 * : Fix bug in formatting of transclusions for block-format templates
 * Remove overloading of pipe stop in the PEG tokenizer -- eliminates incorrect parsing of pipes in external links

Monday, February 29, 2016 around 1:25pm PT: ✅

 * : Don't crash on misconfigured statsd host
 * : Match html5 unquoted attribute parsing
 * Break for [[ in table attribute values too

Wednesday, Feb 24, 2016 around 1:15 pm PT: ✅

 * Bump HTML content-type version to 1.2.0 (from 1.1.0) and data-parsoid content-type version to 0.0.2 (from 0.0.1)
 * Update parsoid content type meta tags in the <head>
 *  is now changed to  to be more consistent with the version information that is output in the response headers.
 * For the non-pagebundle API endpoints,  is also emitted.
 * : Remove user/contribution information from header
 * : Assert param value serializes to a string
 * , : Fetch and use templatedata while serializing transclusions
 * data-parsoid semantics updated to use 'foo=bar' as the default transclusion arg spacing.
 * Remove data-mw.body.extsrc for the tag (unused, and bloats data-mw)

Thursday, Feb 18, 2016 around 11:00 am PT: ✅

 * : Update sitematrix for ady.wikipedia.org

Wednesday, Feb 10, 2016 around 1:15 pm PT: ✅

 * Assert when flipped ranges are expected in template wrapping
 * This should have no functional changes in parsing. At best, it will catch a bug / failed expectation in the template wrapping code.

Monday, Feb 8, 2016 around 1:15 pm PT: ✅

 * Fix worker shutdown code in server.js + use it to restart stuck workers and to shutdown the Parsoid service
 * Expect that this will fix the scenario with stuck worker processes when Parsoid service is restarted during deploys.

Wednesday, Feb 3, 2016 around 2:45 pm PT: ✅

 * Fix complex single-line nowiki handling
 * More robust algorithm + can eliminate some spurious nowikis
 * : Disable migrateTrailingNLs if table has had content fostered out of it
 * Some code cleanup
 * Removed some FIXMEs in nowiki escaping in  s
 * Tweaks to attribute parsing in the PEG tokenizer
 * Warn if prefix/domain is not unique during configuration
 * ParsoidConfig changes: Don't proxy nonglobal wikis (temporary special handling for labswiki and labstestwiki)
 * Config changes:
 * Remove hardcoded references to internal API LVS endpoint.
 * Removed references to unused parsoidcache.
 * Removed explicit config entry for labswiki (ParsoidConfig handles it now).

Monday, Feb 1, 2016 around 1:15 pm PT: 2fcc841f was to be deployed; deploy cancelled to fix nowiki regressions

 * Warn if prefix/domain is not unique during configuration
 * Fix complex single-line nowiki tests
 * Can eliminate some spurious nowikis
 * But, can introduce spurious nowikis around [] style wikitext -- 0.07% of pages in rt testing were affected, but with selective serialization, we expect impact to be small. We will consider possible solutions to minimize nowikis in this scenario, nevertheless.
 * : Disable migrateTrailingNLs if table has had content fostered out of it
 * Config changes: Remove hardcoded references to internal API LVS endpoint + removed references to unused parsoidcache.

Wednesday, Jan 20, 2016 around 1:45 pm PT: ✅

 * : Record when a range is subsumed from overlapping
 * Temporarily disable the request timeout (since request timeouts don't abort request processing and also cancel the cpu timeouts)
 * Reduce cpu timeout value to 3 minutes

Monday, Jan 11, 2016 around 1:15 pm PT: ✅
Changes span wt2html, html2wt, performance fixes (for large DOMs), and other areas:
 * : Remove the vestiges of pipetrick entirely
 * , : Note that DOM tree building uses restrictive checks (documentation fix)
 * : Strip nowiki spans from templated / extension content
 * Match permitted attributes to php's getAttribsRegex
 * Normalize DOM by stripping \u200e, \u200f next to category links (This is controlled by a config switch that we will turn on, if necessary)
 * Edge case fixes to serializing lists with templated portions
 * Use startsWith instead of regex to match tag names in the DOM
 * Optimise shadow meta deletion
 * Bump domino to 1.0.21 (with performance fixes)
 * : Add a generic extension registration mechanism
 * : Register and  natively
 * : Update SiteMatrix, another wiki created
 * : Use httpStatus instead of code as the property on errors