Parsoid/Deployments/2016

From mediawiki.org

Wednesday, December 21, 2016 around 5:03 am PT: Deployed e7e3a4dc on the deploy-20161221 branch

  • ApiRequest: Clone the request options before modifying them.
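
Cloning matters because an options object passed in by a caller is often shared, for example a set of defaults reused for every API request. Mutating it in place leaks one request's settings into later ones. Below is a minimal sketch of the pattern. It uses hypothetical `prepareInPlace` / `prepareRequestOptions` helpers, not Parsoid's actual ApiRequest code:

```javascript
// Hypothetical sketch; not Parsoid's ApiRequest implementation.

// Buggy pattern: writes straight into the caller's object, so every later
// request that reuses the same object inherits this request's uri and Host.
function prepareInPlace(opts, uri) {
  opts.uri = uri;
  opts.headers.Host = new URL(uri).host;
  return opts;
}

// Fixed pattern: copy the options first. The nested headers object is copied
// too, because a plain shallow copy would still share it with the caller.
function prepareRequestOptions(opts, uri) {
  const copy = Object.assign({}, opts);
  copy.headers = Object.assign({}, opts.headers);
  copy.uri = uri;
  copy.headers.Host = new URL(uri).host;
  return copy;
}

// Usage: shared defaults stay untouched across requests.
const defaults = { method: 'GET', headers: { 'User-Agent': 'example-client' } };
const req = prepareRequestOptions(defaults, 'https://en.wikipedia.org/w/api.php');
console.log(req.headers.Host);      // en.wikipedia.org
console.log(defaults.headers.Host); // undefined
```

The headers object is copied separately because `Object.assign` is shallow. Without that step, the fix would still share the nested object with the caller.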

Tuesday, December 20, 2016 around 7:48 am PT: Deployed 5eb649e8

  • Use mwApiServer as the provider of the full URI of the MW API
  • Add a mwApiServer configuration variable
  • Add arbcom_cswiki to site matrix

Thursday, December 15, 2016 around 10:24 am PT: Deployed 6719e240

  • task T96555: Ignore self-closed tags when extending source
  • Drop native LST altogether
  • Fix DOMDiff annotations
  • Linter:
    • Fix bug in self-closing-tag category + other cleanup
    • Fix crasher when linting a gallery
    • Apply lint sampling when sending it to the logger as well
    • Don't provide 'src'

Wednesday, December 14, 2016 around 1:24 pm PT: Deployed 60ee19ac

wt2html:

  • task T119265: Add more page-level metadata that MCS can use
  • Support extension tags that shadow block-level elements
  • Move section handling to the LST extension
  • task T104523: Prevent infinite recursion
  • task T104662: Allow nested ref tags only in templates

Linting (disabled in production):

  • Use ApiRequest.js to post results
  • Handle MW API errors that come with an HTTP 200

Debugging:

  • Let extensions supply the pp tracing name

Monday, December 12, 2016 around 1:35 pm PT: Updated production config

  • Bump table cell and list item resource limits to 40K (from 30K)

Wednesday, December 7, 2016 around 1:21 pm PT: Deployed 3cf19c6b

  • Bump HTML contentVersion to 1.3.0 (see updated spec)
  • task T151570: Update SiteMatrix data fork for the last 3 wiki creations
  • task T149209: Deal with newlines in <td> and <th> cells
  • task T150213: Suppress logs for known unknown contentmodels
  • task T152073: Reduce request timeout to 110s (from 3min) and worker timeout to 115s (from 3min); Increase M/W batcher API timeout to 65s
  • Some configurations moved to vars.yaml in the deploy repo
  • s/warning/warn/ to match service-runner's levels
  • Don't entity escape extension attribute values from data-mw
  • Normalize all extension options, not just native
  • Remove unused package gelf-stream
  • Linter: Add linting of self-closed tags
  • Testing:
    • Remove scrolling by access key
    • require('should') in lintertests.js for standalone runs

Monday, November 7, 2016 around 1:29 pm PT: Deployed 2c2fe425

  • Clean up HTTP redirects
  • Send error responses in the requested format
  • Fix processing listeners in node v7.x

Wednesday, November 2, 2016 around 1:27 pm PT: Deployed 173d7e32

  • task T149241: Whitelist content model fallback
  • Testing:
    • Don't expose dev routes in production
    • Get rid of simple debug helpers
    • task T119228: Stop testing on node v0.10.x
  • Linter:
    • Add node name for missing-end-tag
  • Remove higher resource limits (max wikitext page size, max # list items, max # table cells per page) and fall back to default limits.

Also included are the commits from the attempted October 26 deploy (ede4353):

  • task T141723: Bump mediawiki-title
  • task T141905: Fix crasher and other bugs of that category
  • service-runner doesn't recognize warning level
  • Stop asserting that we'll never be encapsulating a flipped range
  • Lots of linter fixes / features (currently, linting is disabled in production though)
  • Remove html5 treebuilder in favour of domino's
  • Bump domino to 1.0.27
  • task T147742: Trim template target after stripping comments
  • task T48580, task T133320: Allow extensions to handle specific contentmodels

Tuesday, November 1, 2016: Parsoid cluster upgraded to node v4.6

Ops upgraded node on the Parsoid eqiad cluster to node v4.6. The (backup) codfw cluster had been upgraded on Monday.

Monday, October 31, 2016 around 1:34 pm PT: Deployed e503e801

Wednesday, October 26, 2016 around 1:15 pm PT: ede4353 to be deployed; reverted to 63f1e151 due to contentmodel errors

  • task T141723: Bump mediawiki-title
  • task T141905: Fix crasher and other bugs of that category
  • service-runner doesn't recognize warning level
  • Stop asserting that we'll never be encapsulating a flipped range
  • Lots of linter fixes / features (currently, linting is disabled in production though)
  • Remove html5 treebuilder in favour of domino's
  • Bump domino to 1.0.27
  • task T147742: Trim template target after stripping comments
  • task T48580, task T133320: Allow extensions to handle specific contentmodels

Monday, October 24, 2016 around 1:42 pm PT: Deployed 63f1e151

Wednesday, September 21, 2016 around 1:17 pm PT: Deployed a802de0

  • Tokenizer:
    • Encapsulate protected table attributes from wt
    • Inline generic_attribute_newline_value and table_attribute_value
    • Set srcOffsets for table_attribute and generic_newline_attribute
  • HTTP API:
    • Page id and revid aren't the same thing
    • html2html should require an original or previous revision

Wednesday, September 14, 2016 around 1:11 pm PT: Deployed aed15dda

  • Let native extensions add stylesheets
  • Move getAPIProxy to parsoidConfig
  • Other minor refactorings and parserTest changes

Monday, September 12, 2016 around 1:40 pm PT: Deployed f7c43009

  • Handle HTML tags in attribute text properly
  • AttributeExpander: Tweak check for improved code readability
  • Testing:
    • Bump worker_heartbeat_timeout to 2 minutes for testing
    • Allow specifying a specific revision for roundtrip-test.js

Tuesday, September 6, 2016 around 10:37 am PT: Deployed 7863e6ad

  • task T142617: Handle invalid titles in transclusions
  • Sanitizer fixes:
    • Decode all char refs in text
    • Ignore some fields when freezing SanitizerConstants for node v6.5 -- no-op for the Wikimedia cluster, which runs node v4.x
  • node-module updates:
    • Bump service-runner to v2.1.0
    • Remove bunyan
  • Some minor cleanups

Monday, August 29, 2016 around 1:10 pm PT: Deployed 48cf803e

  • Run localSettings.setup after assigning options
  • Use service-runner's metrics reporter in the http api
  • Updates in preparation for supporting version 2.x content in the future -- should be a no-op for version 1.x content
    • Support downgrading 2.x content to 1.x
    • No content reuse from semantically different content versions
    • task T143356: Establish precedence for data-mw in 2.0.0 content

Monday, August 22, 2016 around 1:12 pm PT: Deployed df53a991

  • task T142998: html2wt: Fix crasher in DOM normalization code
  • task T141370: Use service-runner's logger as a backend to Parsoid's logger

Wednesday, August 17, 2016 around 1:09 pm PT: Deployed 3cf877bb

  • html2wt: Always emit canonical wikitext for url links
  • html2wt: Emit url-links where appropriate no matter what rel attribute says

Monday, August 15, 2016 around 1:09 pm PT: Deployed f039dcf6

  • migrateTrailingNLs DOM pass: Code simplifications and some subtle edge case bug fixes
  • task T138864: Deal with edge cases serializing links
  • Remove deprecated "disablepp" MediaWiki API param and pass "disablelimitreport" instead
  • Increase resource limits for wikitext size, max table cells, and max list items
    • With the upgrade to node v4, we have more breathing room for parsing large pages

Wednesday, August 10, 2016 around 1:10 pm PT: Deployed 4de49e26

  • Handle caption-like text outside tables
  • Table captions: Remove unneeded mw:TSRMarker meta token + add TSR info in tokenizer which leads to more accurate DSR offsets.
  • When table wikitext shows up outside tables and is converted to a string, strip attached mw:TSRMarker tags
  • computeDSR: Fix source of pathological O(n^2) behavior

Tuesday, August 9, 2016 around 11:15 am PT: Deployed a577d80e

  • Fix crasher in escapeWikitext
  • task T140898: Update site matrix for tcy.wikipedia.org

Tuesday, August 2, 2016 - Tuesday, August 9, 2016: Upgraded Parsoid cluster to node v4.x and Jessie

  • task T135176: Over the week, Operations upgraded the cluster gradually.
    • The eqiad cluster was fully migrated by Friday, August 5th.
    • The codfw cluster was fully migrated by Tuesday, August 9th.

Monday, August 1, 2016 around 1:15 pm PT: Deployed abf396eb

  • Fix title parsing of subpages during initialization (addresses crashers while parsing these pages)
  • Only apply data-* attributes in /pagebundle/ paths (API cleanup)
    • Determines the content version in the html2wt direction, enabling content upgrade

Tuesday, July 26, 2016 around 10:12 am PT: Deployed 285b6983

  • Use mediawiki-title package to replace homegrown Title code (resolves task T113322, task T133425, and task T139135)
  • Reintroduce a 3-minute request timeout
  • Bump some minor / patch level versions of dependencies (addresses a security advisory)
  • Prevent JSON.stringify circular refs in template wrapping trace/error logs

Thursday, July 21, 2016 around 9:30 am PT: Deployed ed2f8228

  • Test deploy to verify that Trebuchet deployment was not broken by the tinkering done during the service-runner deploy. The deployed change only affects parser tests.

Wednesday, July 20, 2016 between 7:30 and 8:20 am PT: Deployed 45beb6c0

  • task T90668: Update Parsoid to use the service-runner framework
    • In collaboration with Services & Ops teams
    • wtp1001 and wtp1002 were transitioned on July 19, 2016, between 8:00 and 9:00 am PT

Monday, July 11, 2016 around 1:10 pm PT: Deployed e738c415

  • task T131564: Respect $wgInterwikiMagic setting while parsing lang-links
  • task T139388: DOMDiff: Skip over encapsulated content rather than about-id content (fixes problem with lost edits in content nested in elements with templated attributes)
  • Code cleanup (don't expect functional changes): Use a more appropriate DOM helper (s/hasParsoidAboutId/isEncapsulationWrapper/) where appropriate

Monday, June 27, 2016 around 1:08 pm PT: Deployed dd8e644d

  • Template wrapping: Eliminate pathological tpl-range nesting scenario

Thursday, June 23, 2016 around 10:30 am PT: Deployed 18022c96

  • Emit single newline separator in table wikitext for new content
  • Make the http connect timeout configurable
  • Update many deps by minor version
  • task T137406: Ensure newlines are added where required around thead/tbody/tfoot
  • task T96195: Remove node 0.8 support (does not affect WMF deploy of Parsoid)

Wednesday, June 15, 2016 around 1:10 pm PT: Deployed 3445eceb

Non-functional changes (these will come into play once we move to v2.0.0 of Parsoid HTML):

  • Roundtrip 2.0.0 content
  • task T114413: Provide HTML2HTML endpoint in Parsoid

Monday, June 6, 2016 around 1:15 pm PT: Deployed e8d6092e

  • Normalize all lists to not mix wikitext and HTML list syntax (selser prevents unnecessary dirty diffs in production)

Thursday, June 2, 2016 around 10:40 pm PT: Deployed 7188080b

  • task T134389: Serialize content in HTML tables using HTML tags
  • task T125419: Fix selser issues serializing first table row
  • Selser: Bug fix reusing separator text from original source

Wednesday, June 1, 2016 around 1:15 pm PT: Deployed afb0d522

  • Bump core-js from v1.2.6 to v2.4.0
  • Bump yargs from v1.3.1 to v4.7.1
  • Don't use non-standard array generic functions (Array.reduce, etc.), which were removed from newer versions of core-js
  • Use normalized form of default page "Main_Page" instead of "Main Page"
  • task T135596: Return client error for missing data attributes
  • Fix up the internal forms to use v3 post endpoint
  • Add a page/wikitext/:title route to GET wikitext for a page
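
The static "array generics" such as `Array.reduce(list, fn, init)` were a non-standard extension that core-js v1 polyfilled. The standard replacements call the prototype method instead. On a real array, call it directly. On an array-like value, go through `Array.prototype.<method>.call`. Below is a minimal sketch; the helper names are illustrative only:

```javascript
// Non-standard form removed in core-js v2:
//   Array.reduce(list, (acc, s) => acc + s.length, 0)

// Standard form on a real array: call the prototype method directly.
function totalLength(strings) {
  return strings.reduce((acc, s) => acc + s.length, 0);
}

// Standard form on an array-like value (here `arguments`, which is not an
// Array): borrow the method from Array.prototype with .call().
function concatArgs() {
  return Array.prototype.reduce.call(arguments, (acc, x) => acc + String(x), '');
}

console.log(totalLength(['ab', 'cde'])); // 5
console.log(concatArgs(1, 'a', 2));      // 1a2
```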

Thursday, May 19, 2016 around 11:38 am PT: Deployed 67816adf

  • task T100681: Remove deprecated v1/v2 HTTP APIs.
  • task T130638: Content negotiation; Add data-mw as separate JSON blob in the page bundle.
  • Strict Accept header checking is turned off; we will return 1.2.x format if an invalid Accept header is provided (which is allowed by RFC 2616).
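
Under this lenient behaviour, a bad Accept header produces a default response instead of an error. Below is a sketch of that approach. The profile URI format matches the one Parsoid emits in its content-type strings. The list of supported versions, the default version, and the function names are assumptions for illustration, not Parsoid's code:

```javascript
// Hypothetical sketch of lenient content negotiation; not Parsoid's code.
// Assumed: the versions this server can produce, and the fallback default.
const SUPPORTED_HTML_VERSIONS = ['1.2.0', '1.2.1'];
const DEFAULT_HTML_VERSION = '1.2.1';

// Pull the requested spec version out of an Accept header such as
//   text/html; charset=utf-8; profile="mediawiki.org/specs/html/1.2.0"
function requestedHtmlVersion(accept) {
  const m = /profile="?(?:https?:\/\/)?mediawiki\.org\/specs\/html\/(\d+\.\d+\.\d+)"?/
    .exec(accept || '');
  return m ? m[1] : null;
}

// A missing, malformed, or unsupported profile is not treated as an error
// (no 406 response); the default version is served instead.
function negotiateHtmlVersion(accept) {
  const v = requestedHtmlVersion(accept);
  return v !== null && SUPPORTED_HTML_VERSIONS.includes(v) ? v : DEFAULT_HTML_VERSION;
}

console.log(negotiateHtmlVersion('text/html; profile="mediawiki.org/specs/html/1.2.0"')); // 1.2.0
console.log(negotiateHtmlVersion('not a valid header'));                                 // 1.2.1
```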

Cleared dirty repos that had the following patch applied as root during the RESTBase/ChangeProp/Parsoid outage:

diff --git a/lib/api/routes.js b/lib/api/routes.js
index 4d08922..d372c2f 100644
--- a/lib/api/routes.js
+++ b/lib/api/routes.js
@@ -377,6 +377,7 @@ module.exports = function(parsoidConfig, processLogger) {
        var v1Wt2html = function(req, res, wt) {
                var env = res.locals.env;
                var p = apiUtils.startWt2html(req, res, wt).then(function(ret) {
+                       if ( ret.oldid === 106801025 ) { return false; }
                        if (typeof ret.wikitext === 'string') {
                                return apiUtils.parseWt(ret)
                                        // .timeout(REQ_TIMEOUT)

Wednesday, May 4, 2016 around 1:15 pm PT: Deployed b0d015fa

Monday, May 2, 2016 around 1:15 pm PT: Deployed 0a26f3a4

  • html -> wt: For invalid links, text doesn't need escaping in link context
  • DOMDiff: Fix marking data-is-block on extra base nodes
  • Add autoload mechanism for user extension code -- proof-of-concept for future use
  • Update shrinkwrap after 23c97752
  • Code cleanup: should not affect functionality
    • Keep the data-* attributes at the edges of the DOM
    • Remove ParsoidCacheRequest
    • Organize post-processors distinguishing handlers
    • Move the dumper to DOMUtils and use more widely

Monday, April 25, 2016 around 1:05 pm PT: Deployed d5363193

  • task T130645: Pass the right title to PHPParseRequest
  • Don't allow unclosed extension tags
  • Code cleanup: should not affect functionality
    • task T95325: Move tsrDelta to dp.tmp
    • Rename DU.serializeChildren to DU.serializeToXML
    • storeDataParsoid is an env variable, not a Parsoid config property

Monday, April 11, 2016 around 1:15 pm PT: Deployed e3766b79

  • Count API version use
  • Don't dom-diff on a cloned node
  • task T95325: Migrate temporary data to dp.tmp
  • Suppress errors raised when getting debugging info
  • Code cleanup: should not affect functionality
    • Fix some variable shadowing
    • Stop working on cloned nodes in parserTests
    • Rename timer to stats, since we do counting too
    • Fix regression testing tool
    • Fix crasher and make rt errors more informative

Wednesday, April 6, 2016 around 1:15 pm PT: Deployed 5f6c0c60

  • task T116020, task T53852: Serialize localized image options (already cherry-picked yesterday)
  • Stop suppressing escaping errors
  • Remove the broken_template rule in the PEG tokenizer -- no need to wrap {{, {{{, }}, }}} in <nowiki> spans
  • Code cleanup: should not affect functionality
    • Cleanup some fallback rules in the PEG tokenizer
    • Use Util.placeholder in a few more places
    • Be consistent with dp.src check

Tuesday, April 5, 2016 around 2:40 pm PT: Deployed a5be1cdc

  • task T116020, task T53852: Cherry-pick of image option localization patch to match alias reordering in mediawiki core version 1.27.0-wmf.20.
  • Deployed cherry-pick from deploy-20160405 branch.

Monday, April 4, 2016 around 1:10 pm PT: Deployed 579ec3e6

  • Fix log type in cite implementation
  • Code cleanup: should not affect functionality
    • Move dp.src handlers to their respective dom handlers
    • Add new env.normalizeAndResolvePageTitle helper and use it

Wednesday, March 30, 2016 around 1:15 pm PT: Deployed a20ef276

  • Bump HTML version number to 1.2.1
  • Declare charset with <meta charset>
  • Add html/dp version numbers in <head> instead of full content type
  • task T113331: Move auto-generated refs flag from data-parsoid to data-mw
  • Default ParsoidConfig.loadWMF to false
  • Bump node-uuid to 1.4.7 for nsp

Wednesday, March 23, 2016 around 1:15 pm PT: Deployed 5538d868

  • Don't construct regexp with a regexp when flags need to be set
  • Don't export Namespace since it isn't used anywhere else
  • task T129752: Include user agent in request logs
  • Tweak error prefixes for ease of browsing in logstash
  • Promisify the exposed batching methods
  • task T128659: Handle async createSocket
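
Under ES5 semantics, passing an existing RegExp to the `RegExp` constructor together with a flags argument throws a `TypeError`. ES2015 later relaxed this. The portable way to derive a regexp with different flags is therefore to rebuild it from the pattern's `source` string. Below is a minimal sketch with an illustrative helper name:

```javascript
// ES5 engines reject `new RegExp(/a+/, 'g')` with a TypeError, because flags
// may not be supplied when the pattern is already a RegExp object.
// Rebuilding from the source string works on every engine.
function withFlags(re, flags) {
  return new RegExp(re.source, flags);
}

const globalRe = withFlags(/a+/, 'g');
console.log('aa-a-b'.match(globalRe)); // [ 'aa', 'a' ]
```

This rebuild replaces the original flags rather than merging them. Any flags the original pattern had must be passed again explicitly.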

Monday, March 7, 2016 around 1:15 pm PT: Deployed 5db1d28b

  • Cleanup and tweaks of transclusion formatting for clarity and fewer dirty diffs
  • Fix breakage in counting of HTTP status codes (broken by fix for T127983)

Tuesday, March 1, 2016 around 10:50 am PT: Deployed 1f7ed5d0

  • task T128319: Fix bug in formatting of transclusions for block-format templates
  • Remove overloading of pipe stop in the PEG tokenizer -- eliminates incorrect parsing of pipes in external links

Monday, February 29, 2016 around 1:25 pm PT: Deployed d809ad7a

  • task T127983: Don't crash on misconfigured statsd host
  • task T108134: Match html5 unquoted attribute parsing
  • Break for [[ in table attribute values too

Wednesday, February 24, 2016 around 1:15 pm PT: Deployed 581a43c7

  • Bump HTML content-type version to 1.2.0 (from 1.1.0) and data-parsoid content-type version to 0.0.2 (from 0.0.1)
  • Update parsoid content type meta tags in the <head>
    • <meta property="mw:parsoidVersion" content="0"/> is now changed to <meta property="mw:html-content-type" content='text/html; charset=utf-8; profile="mediawiki.org/specs/html/1.2.0"'/> to be more consistent with the version information that is output in the response headers.
    • For the non-pagebundle API endpoints, <meta property="mw:data-parsoid-content-type" content='application/json; charset=utf-8; profile="mediawiki.org/specs/data-parsoid/0.0.2"'/> is also emitted.
  • task T125266: Remove user/contribution information from header
  • task T90479: Assert param value serializes to a string
  • task T104599, task T111674: Fetch and use templatedata while serializing transclusions
    • data-parsoid semantics updated to use 'foo=bar' as the default transclusion arg spacing.
  • Remove data-mw.body.extsrc for the <references> tag (unused, and bloats data-mw)
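
The new meta tags carry the same content-type strings that are sent in the response headers. Below is a minimal sketch of building them. The helper names are hypothetical; the property names and profile URIs are the ones listed in this entry:

```javascript
// Hypothetical helpers; property names and profile URIs follow the
// February 24, 2016 notes, but this is not Parsoid's implementation.
function htmlContentType(version) {
  return 'text/html; charset=utf-8; profile="mediawiki.org/specs/html/' + version + '"';
}

function dataParsoidContentType(version) {
  return 'application/json; charset=utf-8; profile="mediawiki.org/specs/data-parsoid/' +
    version + '"';
}

// The content-type value contains double quotes, so the attribute is
// wrapped in single quotes instead.
function contentTypeMeta(property, contentType) {
  return '<meta property="' + property + '" content=\'' + contentType + '\'/>';
}

// The same string can go into both the Content-Type header and the <head>.
const htmlType = htmlContentType('1.2.0');
console.log(contentTypeMeta('mw:html-content-type', htmlType));
console.log(contentTypeMeta('mw:data-parsoid-content-type', dataParsoidContentType('0.0.2')));
```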

Thursday, February 18, 2016 around 11:00 am PT: Deployed dfbafb60

Wednesday, February 10, 2016 around 1:15 pm PT: Deployed 8976ab93

  • Assert when flipped ranges are expected in template wrapping
    • This should cause no functional changes in parsing. At best, it will catch a bug / failed expectation in the template wrapping code.

Monday, February 8, 2016 around 1:15 pm PT: Deployed 4d44fcc7

  • Fix worker shutdown code in server.js + use it to restart stuck workers and to shut down the Parsoid service
    • This is expected to fix stuck worker processes when the Parsoid service is restarted during deploys.

Wednesday, February 3, 2016 around 2:45 pm PT: Deployed 98619f7f

  • Fix complex single-line nowiki handling
    • More robust algorithm + can eliminate some spurious nowikis
  • task T115289: Disable migrateTrailingNLs if table has had content fostered out of it
  • Some code cleanup
    • Removed some FIXMEs in nowiki escaping in <td>s
    • Tweaks to attribute parsing in the PEG tokenizer
  • Warn if prefix/domain is not unique during configuration
  • ParsoidConfig changes: Don't proxy nonglobal wikis (temporary special handling for labswiki and labstestwiki)
  • Config changes:
    • Remove hardcoded references to internal API LVS endpoint.
    • Removed references to unused parsoidcache.
    • Removed explicit config entry for labswiki (ParsoidConfig handles it now).

Monday, February 1, 2016 around 1:15 pm PT: 2fcc841f to be deployed; deploy cancelled to fix nowiki regressions

  • Warn if prefix/domain is not unique during configuration
  • Fix complex single-line nowiki tests
    • Can eliminate some spurious nowikis
    • However, it can introduce spurious nowikis around [{{echo|foo}}]-style wikitext -- 0.07% of pages in rt testing were affected, but with selective serialization we expect the impact to be small. We will nevertheless consider possible solutions to minimize nowikis in this scenario.
  • task T115289: Disable migrateTrailingNLs if table has had content fostered out of it
  • Config changes: Remove hardcoded references to internal API LVS endpoint + removed references to unused parsoidcache.

Wednesday, January 20, 2016 around 1:45 pm PT: Deployed f1ddfb88

  • task T122816: Record when a range is subsumed from overlapping
  • Temporarily disable the request timeout (request timeouts don't abort request processing, and they also cancel CPU timeouts)
  • Reduce CPU timeout value to 3 minutes
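
Any promise-based timeout "doesn't abort request processing". The routes code in the May 19 entry shows one chained as `.timeout(REQ_TIMEOUT)`. Such a timeout only rejects the promise the caller is waiting on; the underlying work keeps running and consuming CPU. Below is a minimal plain-Promise sketch with a hypothetical `withTimeout` helper. It is not Bluebird's `.timeout()`:

```javascript
// Hypothetical helper; a plain-Promise analogue of a request timeout.
// Whichever settles first wins the race, but the losing promise's work is
// NOT cancelled: the timeout only changes what the caller observes.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('request timed out')), ms);
  });
  // Clear the timer once the race settles so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: simulated parse work that takes 50 ms, with a 10 ms timeout.
let parseFinished = false;
const parse = new Promise((resolve) => {
  setTimeout(() => { parseFinished = true; resolve('html'); }, 50);
});
withTimeout(parse, 10).catch((err) => {
  console.log(err.message, parseFinished); // request timed out false
});
```

The caller sees the timeout, yet the simulated parse still runs to completion 40 ms later. Actually stopping the work needs a separate mechanism, such as a CPU timeout that restarts the worker.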

Monday, January 11, 2016 around 1:15 pm PT: Deployed 07494cf2

wt2html

  • task T73154: Remove the vestiges of pipetrick entirely
  • task T114225, task T121611: Note that DOM tree building uses restrictive checks (documentation fix)
  • task T122054: Strip nowiki spans from templated / extension content
  • Match permitted attributes to PHP's getAttribsRegex

html2wt

  • Normalize DOM by stripping \u200e, \u200f next to category links (This is controlled by a config switch that we will turn on, if necessary)
  • Edge case fixes to serializing lists with templated portions
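
U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK) are invisible directional marks. The normalization above operates on the DOM. Below is a string-level sketch of stripping only the marks adjacent to a category link. It uses hypothetical names and an explicit `enabled` flag that stands in for the config switch:

```javascript
// Hypothetical string-level sketch; Parsoid applies this normalization to
// DOM nodes, and the `enabled` flag stands in for its config switch.
const LEADING_MARKS = /^[\u200e\u200f]+/;
const TRAILING_MARKS = /[\u200e\u200f]+$/;

// Strip LRM/RLM characters only where they touch the category link: the
// end of the preceding text and the start of the following text.
function stripMarksAroundCategory(before, link, after, enabled) {
  if (!enabled) {
    return before + link + after;
  }
  return before.replace(TRAILING_MARKS, '') + link + after.replace(LEADING_MARKS, '');
}

const out = stripMarksAroundCategory('Intro\u200e', '[[Category:Foo]]', '\u200f\u200e\nMore', true);
console.log(JSON.stringify(out)); // "Intro[[Category:Foo]]\nMore"
```

Marks elsewhere in the text are left alone, since they may be intentional.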

task T119883: Performance fixes (for large DOMs)

  • Use startsWith() instead of regex to match tag names in the DOM
  • Optimise shadow meta deletion
  • Bump domino to 1.0.21 (with performance fixes)

Other

  • task T55874: Add a generic extension registration mechanism
  • task T50891: Register <translate> and <tvar> natively
  • task T122062: Update SiteMatrix, another wiki created
  • task T121611: Use httpStatus instead of code as the property on errors