Parsoid/Deployments

Planned deployments, linked from Deployments. For a list of past deployments, look for 'parsoid' in Server Admin Log.

See Parsoid to learn how to deploy a new version of Parsoid.


 * Deployments in 2013
 * Deployments in 2014

Monday, Dec 14, 2015 around 1:15 pm PT: ✅

 * Use babybird as the underlying Promise implementation
 * This is faster than the currently used implementation provided by core-js.
 * We have seen a 30% slowdown in WTS performance since the async WTS version was deployed on Dec 9th.
 * Tweaks to the resource limits enforcing code

Friday, Dec 11, 2015 around 4:20pm PT: ✅

 * Introduce configurable wt2html/html2wt resource limits

Thursday, Dec 10, 2015 around 2:25pm PT: Config change deployed

 * Config change: reduce request time out to 3 mins (from 4 mins earlier)

Wednesday, Dec 9, 2015 around 1:35pm PT: ✅

 * : Record first wikitext node in multi-template-content-block scenario
 * : Fix html2wt newline constraints for paragraphs
 * : Refactor WTS to be async
 * Bunch of code cleanup:
 * Cleanup in WikiConfig and parser environment constructors
 * Cleanup list handling in the serializer

Monday, Dec 7, 2015 around 1:15pm PT: ✅

 * : Strip trailing <nowiki />s
 * Update core-js to v1.2.6 and prfun to v2.1.2
 * Consolidate setting separators into a method to ensure consistent updates of SOL state

Wednesday, Nov 18, 2015 around 1:15 pm PT: ✅

 * : Log errors passed along in express
 * : Improvements to broken attribute parsing in self-closing tags
 * Non-functional changes
 * : Allocate native extension objects once per doc
 * Removed dead code (Remove unnecessary indent pre stripping for refs)

Monday, Nov 16, 2015 around 1:15 pm PT: ✅

 * : Support the newer scrub_wikitext form as well
 * : Strip  s from headings via new HTML normalization routine

Thursday, Nov 12, 2015 around 9:25 am PT: ✅

 * : Kill dead code + fix bad perf in pathological scenarios.

Wednesday, Nov 11, 2015 around 1:15 pm PT: ✅

 * Remove api/server.js symlink to bin/server.js (no longer needed since the puppet patch updating paths has been merged and deployed)
 * : Provide srcset attribute for images
 * : Optimize insertion of transclusion shadow metas -- these metas are added for detecting fostered content from transclusions. This set of patches greatly reduces the volume of these meta tags and improves performance on a subset of pages that would previously take too long and cause timeouts.
 * When a template range is expanded to include a table, expand it to include fostered content from it.
 * Code cleanup in template wrapping + removal of some potentially edge case bug scenarios.

Wednesday, Nov 4, 2015 around 1:15 pm PT: ✅

 * Reduce logging volume for empty/li entries + turn off logging for empty/tr entries
 * Put express in production mode by default (enables view caching)
 * Non-functional changes: Code cleanup of the wikitext serializer

Monday, Nov 2, 2015 around 1:25 pm PT: ✅

 * : Add ability to sample log requests
 * Log template names that produce stripped empty elements
 * Fix sol handling in separators
 * Update DU.hasDiffMarkers helper
 * Non-functional changes: Reorganization of the Parsoid code repo + code cleanup.
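
The log sampling item above can be pictured with a minimal sketch. This is not Parsoid's actual logger code: the function name `makeLogSampler` and the percentage-based parameter are assumptions made for illustration.

```javascript
// Hypothetical sketch of request-log sampling; names are illustrative only.
function makeLogSampler(samplePercent, random) {
  // Accept a custom random source so the behavior can be checked deterministically.
  const rand = random || Math.random;
  return function shouldLog() {
    // rand() is in [0, 1), so roughly samplePercent% of calls return true.
    return rand() * 100 < samplePercent;
  };
}

// Usage: a sampler that lets about 1 in 100 requests through to the log.
const shouldLogRequest = makeLogSampler(1);
```

Sampling at the call site keeps log volume roughly proportional to the configured rate, at the cost of not seeing every request.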

Monday, Oct 26, 2015 around 1:50 pm PT: ✅
Wikitext -> HTML fixes, HTML -> wikitext fixes, and other fixes:
 * DSR: Fix bugs in LTR propagation + fix buggy tests in DOMUtil helpers. This fix eliminates O(n^2) behavior in some cases.
 * Fix OOM issue: our old favourite (exp*)+ (cherry-picked to production on Oct 19)
 * An inline_break is a fine way to end a list
 * nowiki escaping: Reduce use of fullWrap scenarios
 * Remove forked _http_agent.js
 * Move stack suppression to the logger
 * Remove some dead code from parser.defines
 * Improve ApiRequest logging
 * : Graph worker exit code / signals (cherry-picked to production on Oct 19)

Monday, October 19, 2015 around 1:20 pm PT: cherrypicked b317f33f and 60a82ae0
These patches are being cherry-picked since master is not currently in deployable state.
 * : Fix out-of-memory parse errors on some pages (regression since deploy of 44d657de on Wednesday, August 26, 2015)
 * : Graph worker exit code / signals

Thursday, October 8, 2015 around 1pm PT: 998db843 to be deployed
Continues to be postponed since this deploy is dependent on a patch that needs review and testing. We have been backlogged because of parsing team offsite, vacations, quarterly planning and reviews. This should get unblocked this week.
 * : Serialize s on own line always (affects newly added categories, magic words, and <*include*> directives).

Thursday, October 1, 2015 around 1:30 pm PT: ✅

 * Set Main_Page as the default page name if none is provided in API requests.

This cuts down the errors showing up in Kibana: hundreds of thousands of errors in 3-4 bursts over the last 2 days.

Wednesday, September 30, 2015 around 1pm PT: ✅

 * : Support  parameter in v3 API.
 * Minor fix to WTS nowiki-ing of links whose hrefs could be magic links but whose content isn't appropriate.
 * : Terminate autolinks on double or triple quotes
 * : Terminate autolinks on &nbsp; and numeric entity encodings of <>

Tuesday, September 29, 2015 around 9:15 am PT: Turned on use of ParsoidBatchAPI in production

 * Expected to reduce Parsoid's load on the Mediawiki API cluster
 * Expected to improve parse latencies
 * Improves image handling in some scenarios

Monday, September 28, 2015 around 1:45 pm PT: ✅

 * Update request to 2.63
 * : Fix batch retries
 * : Do not allow data-ooui attributes in wikitext
 * Turn on use of ParsoidBatchAPI in production
 * Expected to reduce Parsoid's load on the Mediawiki API cluster
 * Expected to improve parse latencies
 * Improves image handling in some scenarios

Wednesday, September 23, 2015 around 1:45 pm PT: ✅

 * Count non-200 http status codes in the API (will show up in grafana)
 * Log 4xx API responses in Kibana
 * : Render default part of parameters at the top level
 * A bit of bonus cleanup in the tokenizer
 * : Attempt to match tpl(arg) brace precedence
 * : Drop tags without attributes if scrubWikitext=true

Monday, September 21, 2015 around 1:25 pm PT: ✅

 * : Update parsoid sitematrix (et.wikimedia.org -> ee.wikimedia.org and other sitename updates)
 * , : Release version 0.4.1
 * : Use a timer to ensure forward progress in batched dispatches (fixes bug in use of batching API which is not enabled in production)
 * : Fix denial of client-side upscaling in thumb and frameless format (primarily related to batching API, but also some thumbnail scaling fixes in the non-batching API use case)
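
The "timer to ensure forward progress" idea in the batching fix above can be sketched as follows. This is a generic illustration rather than Parsoid's batching code: the `Batcher` class, its parameters, and the dispatch callback are all hypothetical.

```javascript
// Hypothetical batcher; class and parameter names are illustrative only.
class Batcher {
  constructor(maxSize, maxWaitMs, dispatch) {
    this.maxSize = maxSize;      // send as soon as this many items are queued
    this.maxWaitMs = maxWaitMs;  // otherwise send after this delay
    this.dispatch = dispatch;    // callback that receives an array of items
    this.items = [];
    this.timer = null;
  }

  add(item) {
    this.items.push(item);
    if (this.items.length >= this.maxSize) {
      // Batch is full: send it right away.
      this.flush();
    } else if (this.timer === null) {
      // Partial batch: arm a timer so it is sent even if no more items arrive.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.items.length === 0) { return; }
    const batch = this.items;
    this.items = [];
    this.dispatch(batch);
  }
}
```

Without the timer, a partial batch would wait indefinitely for enough items to fill it, which is the kind of stall the changelog entry describes avoiding.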

Monday, September 14, 2015 around 1:15 pm PT: ✅
Bunch of edge-case tweaks and fixes to parsing of attributes in tables (rows, cells, table), which improves compatibility with PHP parser output, plus other wikitext -> HTML fixes, HTML -> wikitext fixes, and other fixes:
 * Pop tableCellArg before parsing template args
 * : Content on table start / row is all attributes
 * Remove single_cell_table_args
 * Match broken attribute parsing with the PHP parser
 * Handle broken_table_attribute_name_char in table_attributes (improves handling of broken table attributes, , )
 * TSP: Retokenize tokens that get converted to strings
 * Handle [ Foo ] and [[Foo ]] properly
 * Move popping EOFTk inside tokenizeStr
 * Nowiki escaping: Process multi-line text nodes line-by-line
 * Log the signal, if available, when a Parsoid worker exits
 * : Batching API use (not yet enabled in production): Fix totally broken interpretation of parse batch response

Wednesday, September 9, 2015 around 1:15pm PT: ✅
npm dependency tweaks to eliminate version variability in installed packages:
 * Shrinkwrap npm dependencies
 * Bump several dependencies to what's in production
 * Prefer tilde ranges in package.json

Logging and error reporting fixes:
 * Downgrade duplicate id warnings
 * DOMDiff: Use more descriptive error prefixes
 * Improved Mediawiki API error reporting for ease of debugging

Wikitext -> HTML fixes:
 * : Handle s in inline image captions

HTML -> wikitext fixes (specifically nowiki escaping code):
 * Fix logic in hasWikitextTokens when asking for linksOnly

Other:
 * : Update sitematrix.json for be-tarask and affcom wikis

Wednesday, September 2, 2015 around 1:15pm PT: ✅

 * : Massage batching API imageinfo width/height to numbers
 * Tabs are preventing nowiki pre protection
 * Consolidate test to determine if separator introduced SOL
 * Implement Sanitizer's escapeId

Monday, August 31, 2015 around 1:20pm PT: ✅

 * : WTS support for localized ISBN magic links
 * Be careful about using tsr in tokens/x-mediawiki phase
 * Don't ignore errors in extension parsing
 * : Support IPv6 addresses in URLs
 * Drop bad extension HTML and continue html2wt instead of returning HTTP 500
 * Let the OS randomize ports
 * Allow non-newline whitespace in RFC/PMID/ISBN autolinks

Wednesday, August 26, 2015 around 1:10 pm PT: ✅

 * Fix the profile quoting in our content type strings (currently in production via a cherry-picked deploy on Tue, Aug 25)
 * : Fix couple regexps in tokenizer
 * : Fix html2wt crasher on eswiki:Usme
 * : Fix pathological backtracking regexp
 * : Implement Parsoid v3 API (and add test suite)
 * Several cleanups and improvements to attribute parsing in the tokenizer
 * Improve broken attribute heuristics
 * Cleanup _att_value rules
 * Remove resetting the parse position
 * Move location of tokenizing tags in attributes

Tuesday, August 25, 2015 around 3:00pm PT: ✅ (cherry-pick of 437cac80)

 * Cherry-pick of https://gerrit.wikimedia.org/r/233835 in order to make the RESTBase test suite happy.

Tuesday, August 25, 2015 around 1:10 pm PT: ✅

 * : Upgrade express to 4.x from 2.x, use connect-busboy and upgrade other dependencies
 * Finish up fixing profile values in all content-type strings

Monday, August 24, 2015 around 1:15pm PT: ✅

 * serializeChildrenToString shouldn't clobber sol state
 * Allow configuration of the "domain" separate from the MW API URL
 * Deprecate "prefix" parameter of setMwApi/removeMwApi
 * Match separator heuristic to its description
 * Quote the profile in our content type strings

Thursday, August 20, 2015 around 1:30pm PT: ✅

 * : Fix crasher in normalizer
 * Use  for ISBN magic links
 * : Protect RFC/PMID/ISBN magic links with <nowiki> during WTS
 * Bracketed links must have at least one valid character after protocol
 * : Escape serialized nowiki DOM elements

Wednesday, August 19, 2015 around 1:15pm PT: ✅

 * Followup to T93580 fix: Save data-attribs in DOMs of nested refs (improves serialization and editability)
 * : Fix buggy regexp in strip meta tags DOM pass
 * : Bare protocols are not autolinks
 * : Fix <nowiki> escape of | in image captions
 * , : Fix WTS of autolink-like text after [^W]
 * : Batch MW parser and imageinfo API requests (batching disabled currently -- will be enabled once the batching extension is deployed and we test latency impacts).
 * Code cleanup:
 * Remove special case in nowiki serializing
 * WikiConfig: remove dead code for hasValidProtocol / findValidProtocol
 * Convert bugzilla references in source code to phabricator references.
 * Documentation updates

Monday, August 17, 2015 around 1:15pm PT: ✅

 * : fix WTS of autolink-like text after [^W]
 * Allow ISBNs which end with a lowercase `x`
 * Support bitcoin:, redis:, urn:, xmpp:, etc protocols (part 2)
 * Newlines in html table attributes are valid
 * Normalizer: Tweaks to  escapable prefix normalization
 * Normalizer: Deal with "chameleon node" effect as in 7608aeab
 * WTS: Strip spans added for misnested a-tags
 * Other fixes: documentation, testing related code updates, code cleanup

Wednesday, August 12, 2015 around 1:40pm PT: ✅

 * ,, : Run normalization after dom-diff to handle edited content
 * Normalizer: Do not suppress numbered extlinks
 * : Unbreak Parsoid on wikitech

Monday, August 10, 2015 around 1:10pm PT: ✅

 * , : Parse non-block image caption all the way to Parsoid DOM
 * : Scrub empty anchors
 * Support bitcoin:, redis:, urn:, xmpp:, etc protocols
 * DOMDiff: Get rid of 'modified' diff marker - reduces dirty diffs by improving reusability of original wikitext during serialization.
 * HTML pres should permit newline attributes
 * : Disable pre_indent_in_tags rule for now
 * Check for null nodes in DOM helpers that test for node type

Wednesday, August 5, 2015 around 2:40pm PT: ✅

 * Check for null nodes in DOM helpers that test for node type -- should fix crashers on saves to VE edits that involved empty table cells.

Wednesday, August 5, 2015 around 1:25pm PT: ✅

 * : Add a space after the | char in table cells if it contains +/- as the first char (fix for new table cells only)
 * Normalize links that end in spaces to prevent nowikis
 * : Don't strip span tags in templated -attr scenarios

Monday, August 3, 2015 around 1:25pm PT: ✅

 * Enforce single-line context for definition lists
 * , : Additional scenarios dealing with treebuilder fixup
 * : <nowiki> tags don't properly protect table-related content
 * Remove smart nowikier
 * nowiki wrappers are now added around smallest string (instead of trying to minimize nowiki additions).
 * Addresses comments like this and others in the past.
 * Update sitematrix.json
 * Fetched latest changes in wiki configs - gom, lrc, azb wikis added + TLS added to most urls
 * Update domino to 1.0.19

Wednesday, July 29, 2015 around 1:30pm PT: ✅

 * Move sol transparent link hoisting behind scrubWikitext (since VE is now passing in that API flag)
 * Disable single-line wikitext mode in selser in the same places as in non-selser serialization
 * : Prevent nowiki protection around leading whitespace in paragraphs by deleting that whitespace.

Monday, July 27, 2015 around 1:30 pm PT: ✅

 * Bug fix stripping indent-pre nowikis in scrubWikitext mode

Wednesday, July 22, 2015 around 1:15pm PT: ✅

 * : Redirects no longer create categories
 * : Fix redirects to non-local targets
 * : Edited autolink-like text becomes an autolink
 * : Fix crash on __proto__
 * Escape data-mw as well as data-parsoid in tokenizer
 * Refactor comment regexp into a constant and reuse everywhere
 * Use the new fork of PEG.js master

No deployments week of July 13 - 17th
Parsoid deployments paused this week because of Wikimania. Only emergency cherry-picks, when required, this week.

Wednesday, July 8, 2015 around 1:25pm PT: ✅

 * Scrub empty styles tags (if scrubWikitext API param is enabled)
 * Scrub whitespace at the start of paragraphs (if scrubWikitext API param is enabled)
 * Disentangle versioned APIs
 * : Improve validating dp in the api
 * Remove old-style url redirects
 * Tweak td-fixup dom pass to handle some unhandled scenarios
 * Generate only for the final document

Monday, July 6, 2015 around 2:10pm PT: ✅

 * Bump HTML version because of cite html changes
 * Use CSS to style Cite references

Monday, June 29, 2015 around 1:20pm PT: ✅

 * Suppress newlines before category links + Don't swallow newlines & categories into last  of a list (Fixes, related to ).
 * : Serialize new display space hacks.

Monday, June 22, 2015 around 1:25pm PT: ✅

 * : Tokenizer incorrectly parses a "!!" inside a HTML cell as a
 * Newlines in comments shouldn't affect SOL state
 * Give nested blocks a chance to break on end delimiters
 * Only normalize new nodes
 * : Fix serialization of `mw:WikiLink` which use absolute URLs
 * : Include RL style modules from parser functions in
 * Use DOMTraverser instead of DOMUtils.traverseWithTplOrExtInfo
 * Further tests and fixes to SOL behavior switches
 * Make tokenizer errors be more vague
 * Remove the last use of peg$FAILED from the PEG grammar
 * Eliminate the possibility of expansion reuse for private routes

Wednesday, June 17, 2015 around 1:21pm PT: ✅
This is a repeat of Monday's postponed deploy.
 * Don't stop on "!!" in templates
 * More cleanup in the tokenizer
 * : Ignore marker meta tags during nowiki escaping
 * Refine DSR algo to use end-tag width info in the right context
 * Fix bug in computation of end-tag widths of wikitext constructs
 * Update sitematrix to include cnwikimedia
 * : Don't prevent fostering of meta tags in our DOM spec
 * : Return 400 if the passed in data-parsoid is empty

Monday, June 15, 2015 around 1:15pm PT: to be deployed 402ddf66 (cancelled)
We couldn't perform pre-deploy checks on the beta cluster since VisualEditor was broken there. Postponing deploy to Wednesday.
 * Don't stop on "!!" in templates
 * More cleanup in the tokenizer
 * : Ignore marker meta tags during nowiki escaping
 * Refine DSR algo to use end-tag width info in the right context
 * Fix bug in computation of end-tag widths of wikitext constructs
 * Update sitematrix to include cnwikimedia
 * : Don't prevent fostering of meta tags in our DOM spec
 * : Return 400 if the passed in data-parsoid is empty

Monday, June 8, 2015 around 1:15pm PT: ✅

 * More thorough job of stripping unneeded data-parsoid from templated content
 * Code cleanup and improvements in PEG tokenizer
 * Minor code refactoring in serializer and template encapsulation code

Saturday, June 6, 2015 around 4:40 PT: 5172a446 (cherry-pick of 719c736f) deployed as a hotfix

 * : Don't hoist category links out of headings when they come from templates

Wednesday, June 3, 2015 around 1:15pm PST: ✅

 * Be more careful about which MW API warnings we suppress
 * : Make behavior switches SOL transparent
 * API: If "wt" parameter is passed in, set it as the page source unconditionally
 * : DOM normalization: Move meta-tag hoisting from core serializer to DOM normalization pass
 * DOM normalization: Merge adjacent  tags with identical attrs

Monday, June 1, 2015 around 1pm PST: ✅
This is the same as the previous deploy attempt:
 * : Support subst: of transclusion blocks in the parseFragment API endpoint
 * DOMDiff: For  id properties in data-mw, fetch HTML and compare DOMs to detect edits to  s without requiring clients to dirty the   nodes
 * DOMDiff: Improve robustness of data-mw diff testing
 * Suppress separators in single-line context (part of )
 * , : Make hardcoded config values configurable
 * Blank template parameters should be preserved
 * Code cleanup in mediawiki wiki config
 * Use interwikiMap, not mwApiMap, to normalize titles
 * Store apiConf as an object in the mwApiMap
 * Make  into a general proxy configuration option
 * Code cleanup and fixes in mediawiki API request handling
 * Refactor request default options into ApiRequest.prototype.request
 * Strip UTF8 BOM so that JSON.parse doesn't throw
 * Ignore modulemessages in api=parse result

Plus two new cherry-picked patches:
 * : suppress modulemessages deprecation warnings in logs.
 * A new version of mediawiki core was deployed earlier in the day which caused a spike in these warning messages. With this patch, we are suppressing all warning/api messages.
 * Fix typo in config property used for sampling heap usage
 * this should fix the outgoing network spike seen in previous attempt.

Thursday, May 28, 2015 around 12:40pm PST: 497da30e to be deployed (Reverted)

 * : Support subst: of transclusion blocks in the parseFragment API endpoint
 * DOMDiff: For  id properties in data-mw, fetch HTML and compare DOMs to detect edits to  s without requiring clients to dirty the   nodes
 * DOMDiff: Improve robustness of data-mw diff testing
 * Suppress separators in single-line context (part of )
 * , : Make hardcoded config values configurable
 * Blank template parameters should be preserved
 * Code cleanup in mediawiki wiki config
 * Use interwikiMap, not mwApiMap, to normalize titles
 * Store apiConf as an object in the mwApiMap
 * Make  into a general proxy configuration option
 * Code cleanup and fixes in mediawiki API request handling
 * Refactor request default options into ApiRequest.prototype.request
 * Strip UTF8 BOM so that JSON.parse doesn't throw
 * Ignore modulemessages in api=parse result

This is the same as yesterday's attempted deploy, which we had to defer due to.

Reverted after observing an outgoing network traffic spike on our canary deploy machine (wtp1001). Suspected to be due to stats or logging misconfiguration. This is because of a typo in one of the parsoid-config properties that determines the heap usage sample interval. Because of the typo, instead of sending heap usage samples every 5 mins, parsoid was sending samples all the time. This caused the network spike seen on wtp1001.
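
A minimal sketch of how a misspelled config property can turn a 5-minute interval into near-continuous sampling, and how validation guards against it. The property name `heapUsageSampleInterval` and the helper are assumptions for illustration; the real parsoid-config property name is not given here. The sketch relies on the documented Node.js behavior that timers treat a missing or invalid delay as 1 ms.

```javascript
// Hypothetical config reader; the property name heapUsageSampleInterval is an
// assumption for illustration, not necessarily the real parsoid-config name.
function heapSampleIntervalMs(config) {
  const value = config.heapUsageSampleInterval;
  // A misspelled property reads as undefined. Node.js timers treat a missing
  // or invalid delay as 1 ms, so fail loudly instead of sampling nonstop.
  if (typeof value !== 'number' || !(value > 0)) {
    throw new Error('heapUsageSampleInterval must be a positive number of ms');
  }
  return value;
}

// Usage: 5 minutes between heap usage samples.
const intervalMs = heapSampleIntervalMs({ heapUsageSampleInterval: 5 * 60 * 1000 });
```

Failing at startup on a missing value surfaces a typo during the canary deploy rather than as a traffic spike.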

Wednesday, May 27, 2015 around 1pm PST: 497da30e to be deployed (Cancelled)
Because of, we cannot currently test the deploy by looking at VE edits to see that we didn't break anything by examining wikitext diffs. Parsoid deploys are paused till that ticket is resolved and a patch is deployed to production.
 * : Support subst: of transclusion blocks in the parseFragment API endpoint
 * DOMDiff: For id properties in data-mw, fetch HTML and compare DOMs to detect edits to s without requiring clients to dirty the nodes
 * DOMDiff: Improve robustness of data-mw diff testing
 * Suppress separators in single-line context (part of )
 * , : Make hardcoded config values configurable
 * Blank template parameters should be preserved
 * Code cleanup in mediawiki wiki config
 * Use interwikiMap, not mwApiMap, to normalize titles
 * Store apiConf as an object in the mwApiMap
 * Make `proxy_strip_https` into a general proxy configuration option
 * Code cleanup and fixes in mediawiki API request handling
 * Refactor request default options into ApiRequest.prototype.request
 * Strip UTF8 BOM so that JSON.parse doesn't throw
 * Ignore modulemessages in api=parse result
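
The BOM item in the list above addresses a general JavaScript pitfall: `JSON.parse` rejects input that starts with a byte order mark. A minimal sketch, assuming the response body has already been decoded to a string; the helper name is hypothetical and this is not the actual Parsoid patch.

```javascript
// Hypothetical helper; not Parsoid's actual code.
function parseJsonStrippingBom(text) {
  // U+FEFF is what the UTF-8 BOM bytes (EF BB BF) decode to. It is not valid
  // JSON whitespace, so JSON.parse throws a SyntaxError if it is left in place.
  if (text.charCodeAt(0) === 0xFEFF) {
    text = text.slice(1);
  }
  return JSON.parse(text);
}

// Usage: a response body that starts with a BOM.
const parsed = parseJsonStrippingBom('\uFEFF{"ok": true}');
```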

Wednesday, May 20, 2015 around 1pm PST: ✅

 * : Add mw:DisplaySpace to typeof for nbsp before colon
 * : Provide section-offsets for immediate children of to support section editing in VE and other clients

Monday, May 18, 2015 around 1:10pm PST: ✅

 * : Put escaped HTML tags inside <nowiki>
 * : html2wt should not need access to original source
 * Restore speedy non-selser serialization
 * Don't use selser if oldid is missing

Wednesday, May 13, 2015 around 1:25pm PST: ✅

 * : Allow quotes as template targets
 * Normalize empty headings only if they are newly inserted content
 * A bunch of code cleanup patches (including some refactoring of server configuration)

Monday, May 4, 2015 11:44am PST: ✅

 * Avoid deep freezing some parsoidConfig properties
 * This patch fixes the bug that prevented the Parsoid service from starting up in production and caused the revert on Wednesday, April 29
 * Ensure that embedded Maps and Sets are properly deep-frozen
 * Freeze parsoidConfig to avoid shared mutable state
 * Remove uri fallback when switching wiki configs
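
The Maps and Sets item above reflects a JavaScript detail: `Object.freeze` only locks an object's own properties, so a frozen `Map` or `Set` can still be changed through `set`, `add`, `delete`, and `clear`. The following is a rough sketch of one way to deep-freeze a config object, not Parsoid's implementation; the function name and the approach of shadowing the mutator methods are assumptions.

```javascript
// Illustrative deep freeze; not Parsoid's actual implementation.
function deepFreeze(value, seen = new Set()) {
  if (value === null || typeof value !== 'object' || seen.has(value)) {
    return value;
  }
  seen.add(value); // guard against cycles

  const reject = function() { throw new TypeError('collection is frozen'); };
  if (value instanceof Map) {
    value.forEach((v, k) => { deepFreeze(k, seen); deepFreeze(v, seen); });
    // Shadow the mutators with own properties before freezing.
    value.set = value.delete = value.clear = reject;
  } else if (value instanceof Set) {
    value.forEach((v) => deepFreeze(v, seen));
    value.add = value.delete = value.clear = reject;
  }

  Object.getOwnPropertyNames(value).forEach((name) => deepFreeze(value[name], seen));
  return Object.freeze(value);
}

// Usage with a hypothetical config shape.
const frozenConfig = deepFreeze({
  limits: new Map([['wt2html', 1]]),
  tags: new Set(['ref']),
  nested: { debug: false },
});
```

This only blocks ordinary method calls; code that invokes `Map.prototype.set.call(...)` directly would still get through, so the sketch is a guard against accidental shared mutation rather than a hard guarantee.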

Wednesday, April 29, 2015 around 1pm PST: 45b54f63 to be deployed (Reverted)
See outage report for more details.
 * Freeze parsoidConfig to avoid shared mutable state
 * Remove uri fallback when switching wiki configs

Monday, April 27, 2015 around 1pm PST: ✅

 * : Forward the X-Request-ID header
 * : Exponentially increase the request timeout
 * Reduce API concurrency and retries (to deal with overload on API cluster)
 * Don't strip \r in API routes
 * Remove redundant \r handling
 * Upgrade to prfun 2.0.0 and smash the global Promise
 * Performance: Use core-js/shim instead of es6-shim
 * A lot of code cleanup
 * This includes bcea0ab0 which is a fix for the cleanup patch 915ea3f6 which was causing last week's corruptions.
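
As a rough illustration of the "exponentially increase the request timeout" item above: each retry gets a longer timeout than the last. The helper name, the doubling factor, and the cap are assumptions; the actual values used by Parsoid are not stated here.

```javascript
// Hypothetical helper; the base, the factor of 2, and the cap are assumptions.
function requestTimeoutMs(baseMs, attempt, maxMs) {
  // attempt 0 is the first try; each retry doubles the allowed time,
  // up to maxMs so a long retry chain cannot wait unboundedly.
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// Usage: timeouts for the first try and three retries.
const timeouts = [0, 1, 2, 3].map((n) => requestTimeoutMs(5000, n, 30000));
```

Giving later attempts more time helps slow but valid requests eventually succeed, while pairing this with fewer retries limits the extra load on an already busy API cluster.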

Saturday, April 25, 2015 around 8:25 am PST: ✅ (cherry-pick of d2135c6b on parsoid master)
Cherry-picked "Reduce API concurrency and retries" from parsoid master to reduce # retries and concurrency level with which Mediawiki API is hit.

Friday, April 24, 2015 around 12:50 pm PST: Reverted deploy to 3311936a
Thursday late night deploy reverted due to corruptions reported.

See outage report for more details

Thursday, April 23, 2015 around 11:45pm PST: ✅
This was meant to be an emergency deploy of one patch but unintentionally deployed all changes from master.
 * Reduce API concurrency and retries (to deal with overload on API cluster)
 * Don't strip \r in API routes
 * Remove redundant \r handling
 * Upgrade to prfun 2.0.0 and smash the global Promise
 * Performance: Use core-js/shim instead of es6-shim
 * A lot of code cleanup

Wednesday, April 22, 2015 around 1:05pm PST: ✅

 * : Enforce for all lines when escaping wikitext
 * Fix base href on _rt routes
 * Accept scrubWikitext as a query parameter

Monday, April 20, 2015 around 1pm PST: ✅

 * : Suppress empty headings if scrubWikitext param is provided
 * Add a  param to the API to (optionally) apply normalizations that won't roundtrip
 * : Fix crasher seen in production
 * :  marker metas should remain fosterable
 * Log uncaught exceptions in Parsoid service
 * Edge case bug fix in migrateTrailingNLs DOM pass (for example, in en:SM U-66)
 * Other code cleanup that doesn't affect functionality

Wednesday, April 15, 2015 around 1:20pm PST: ✅

 * Bug fix serializing nested refs (would refuse to save because of missing content)
 * Bug fix in selser tests that sometimes normalized element attributes unnecessarily
 * Handle empty content string ("") returned by the API
 * Normalize DOM before running DOM-Diff
 * Fix findFirstEncapsulationWrapperNode -- eliminates dirty diffs in some edge case scenarios
 * Other code cleanup that doesn't affect functionality

Monday, April 13, 2015 around 1pm PST: 8f35374d (skipped)
Deploy postponed because beta cluster is down and it is not possible to verify this in beta cluster beforehand.
 * Bug fix serializing nested refs (would refuse to save because of missing content)
 * Bug fix in selser tests that sometimes normalized element attributes unnecessarily
 * Handle empty content string ("") returned by the API
 * Normalize DOM before running DOM-Diff
 * Other code cleanup that doesn't affect functionality

Wednesday, April 8, 2015 around 1pm PST: ✅

 * :  tags with invalid hrefs should serialize to text
 * : use HTML entities to encode/decode arbitrary data in comments (see also ).
 * : Switch from TXStatsD to statsite metrics.

Other changes:
 * Various code style tweaks and clean ups.
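
The comment encoding item in this deploy can be illustrated with a small sketch. XML comments cannot contain `--`, so arbitrary comment text has to be escaped before it is emitted and unescaped when read back. The function names and the exact set of escaped characters below are assumptions, not necessarily Parsoid's scheme.

```javascript
// Illustrative sketch, not Parsoid's exact encoding.
const ENCODE = { '&': '&amp;', '-': '&#x2D;', '>': '&#x3E;' };
const DECODE = { '&amp;': '&', '&#x2D;': '-', '&#x3E;': '>' };

function encodeComment(text) {
  // Single pass: every '&', '-' and '>' becomes an entity, so the result
  // never contains "--" or a premature comment terminator.
  return text.replace(/[&\->]/g, (ch) => ENCODE[ch]);
}

function decodeComment(text) {
  // Single pass over only the three entities emitted by encodeComment.
  return text.replace(/&(?:amp|#x2D|#x3E);/g, (entity) => DECODE[entity]);
}

// Usage: round-trip a string that would otherwise break a comment.
const original = 'a --> b & c';
const encoded = encodeComment(original);
const decoded = decodeComment(encoded);
```

Escaping `&` as well is what makes the mapping reversible: text that already looks like one of the entities survives a round trip unchanged.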

Monday, April 6, 2015 around 1pm PST: ✅

 * : Normalize comments so that Parsoid output is valid XML
 * Edge-case fix for hoisting embedded s from headings
 * : Preserve querystring params while redirecting
 * Don't serialize  tags as  ever
 * : Remove state from Cite extension
 * Log with supplied x-request-id header

Monday, Mar 30, 2015 around 1pm PST: ✅

 * Skip link validity tests for strings that won't be used as hrefs: Eliminates erroneous "bad title text" logging messages
 * cleanupAndSaveDataParsoid should be done in its own pass: Fixes incorrect HTML generated in -hack scenarios when v2 API is used
 * Replace duplicate ids in wikitext: Allows Parsoid to handle pages with duplicated ids without corrupting them on serialization ( is an instance of this)
 * : Never serialize a-tag as html
 * : Add original dimension information for images.
 * : Normalize wikilink targets to strip leading "./"
 * , : Ensure reference index is reset at the end of document
 * Use tokenizer info to fix/cleanup tdFixups dom pass

Wednesday, Mar 25, 2015 around 1pm PST: ✅

 * : Pop comments from the end of table tag attributes
 * Strip out X-Parsoid-Performance headers and associated code -- no longer useful since Parsoid now sends lots of metrics to statsd
 * Bug fix setting TSR in defn lists - fixes DSR inconsistency warnings
 * : Nulls in DSR computation should not be coerced to 0
 * Edge case fix for definition lists: Only return colon when ignoring in tags
 * Associate data-parsoid with duplicated ids (copy-paste in VE can introduce duplicate element ids)

Monday, Mar 23, 2015 around 1:25pm PST: ✅

 * : Fix tokenizing redirect context
 * Use more specific warning labels to help sift through logs in Kibana
 * Use  instead of   when we can't serialize a   ( https://gerrit.wikimedia.org/r/198176 ).  This should send a 500 response, not kill the entire worker.

Thursday, Mar 19, 2015 around 6:45 pm PST: ✅

 * : Don't strip id attributes from DOM nodes -- required for tags
 * : Serialize category redirects with a ':'
 * Additional logging to help debug Visual Editor issues

Thursday, Mar 19, 2015 around 9 am PST: ✅

 * : Abort html -> wt serialization when we encounter a DOM id without a matching DOM element
 * Log errors when Parsoid-like element ids are stripped from HTML elements

Wednesday, Mar 18, 2015 around 1pm PST: ✅

 * Don't serialize HTML id attributes with Parsoid-like elt ids
 * : Ensure that alt image option is handled properly even when it has complex wikitext
 * v2 API: Explicitly set a utf-8 charset in text content-types

Monday, Mar 16, 2015 around 1pm PST: ✅

 * , : Handle entities/nowikis in templated attributes
 * : Enforce single-line context in the serializer
 * : Table cells not properly parsed in an implicit-td context
 * : Improve escaping and nowikiing of template arguments
 * Additional fixes to selective serializer around reusing original source in lists and list items
 * Additional instrumentation (input/output sizes, init times) of Parsoid endpoints

Wednesday, Mar 11, 2015 around 1pm PST: ✅

 * : Fix serialization of table cells with "-" and "+" in them
 * : Convert | to | in template parameters
 * : Eliminate fatal assertion failures seen in production (found on kibana)
 * : Improvements to <nowiki> wrapping for strings that needed them
 * Fixes to DSR computation algo to eliminate negative DSR deltas (should eliminate the warnings seen in kibana)
 * Updated sitematrix.json to latest changes
 * Explicitly pass rawcontinue=1 to the Mediawiki API (to eliminate deprecation warnings logged on the M/W API end)
 * Log mediawiki API warnings (so we can find and fix API deprecations in future)

Monday, Mar 9, 2015 around 1pm PST: ✅

 * , : Several fixes to serialization of lists
 * : Change how LST s are output

Wednesday, Mar 4, 2015 around 1pm PST: ✅

 * : Fix selser bugs that would occasionally lose newly added comments
 * : Fix broken serialization in some scenarios after table columns are deleted
 * Fix broken performance timer code (broken in Monday, Mar 2, deploy)

Monday, Mar 2, 2015 around 1:15pm PST: ✅

 * : Remove duplication of content in the data-mw.body.html property of tags
 * : Remove more cases of data-parsoid.src from mw:Extensions
 * Memory usage reports are now generated once every 5 mins and sent to the statsd server

Wednesday, Feb 25, 2015 around 1:00 pm PST: ✅

 * Serialize new anchor links (w/o rel) as internal
 * Amend timing metrics
 * : Open tags only affect line when parsing definition list colon
 * : Fix nowiki escaping for <td>

Monday, Feb 23, 2015 around 1pm PST: ✅

 * Workaround for . (Will be reverted once citoid bug is fixed: .)
 * : Ensure that implicitly-added output have unique ids
 * Fix serializing categories without indent-pre protection (tweaks #REDIRECT handling as well)
 * Don't crash when revision is hidden
 * : Handle more templated  -attr scenarios
 * Tweaked naming of selser-related timing stats.
 * Enable timing stats in production (localsettings.js change).

Wednesday, Feb 18, 2015 around 1:30 pm PST: ✅

 * : Emit reflists for with no explicit
 * , : Enable timing stats for Parsoid wt2html and html2wt requests
 * not yet enabled in production (requires change to localsettings.js)

Monday, Feb 16, 2015 around 1pm PST: ✅

 * : Handle template-generated DISPLAYTITLE and DEFAULTSORT
 * : Fix selser regression introduced on Feb 11 deploy
 * : Fix selser in v2 API (to be used by RESTbase)
 * For older MW APIs that don't provide that information, default to cached enwiki config for supported link protocols

Wednesday, Feb 11, 2015, around 1:35pm PST: ✅

 * : Remove data-parsoid.src for elts with valid data-mw and DSR info
 * : Remove unnecessary transclusion tags
 * Fixes to handle high load on Parsoid cluster
 * Don't reprocess same token in AttributeExpander unless necessary (eliminates infinite loop scenarios found on some pages)
 * Fixes to make sure fatal errors more consistently force process restarts without leaving behind stuck processes
 * Categories on their own line don't need nowikis around any leading whitespace
 * Non-word characters shouldn't terminate tag names
 * : Hoist categories, language links, redirects, comments out of headings when serializing them
 * : Fix serializing new links with "./" in content string

Monday, Feb 9, 2015, around 1pm PST: dd98dea0 to be deployed (Cancelled)

 * : Remove data-parsoid.src for elts with valid data-mw and DSR info
 * : Remove unnecessary transclusion tags
 * Fixes to handle high load on Parsoid cluster
 * Don't reprocess same token in AttributeExpander unless necessary (eliminates infinite loop scenarios found on some pages)
 * Fixes to make sure fatal errors more consistently force process restarts without leaving behind stuck processes
 * Categories on their own line don't need nowikis around any leading whitespace
 * Non-word characters shouldn't terminate tag names
 * : Hoist categories, language links, redirects, comments out of headings when serializing them

Deployment cancelled. We found some regressions and the fixes for them are still going through round trip testing at this time. So, we'll get these out on Wednesday.

Friday, Feb 6, 2015, around 9:10pm PST: Hotfix of cherry-pick
The Jan 28, 2015 deploy of exposed a longstanding bug in Parsoid which was fixed by. On some pages where some templates weren't being expanded, the Attribute Expander was effectively being asked to re-expand the template over and over again in an infinite loop. This was triggered on a few enwiki pages today, causing processes to get stuck without being restarted. This hotfix prevents the infinite loop.
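
The general idea of not re-expanding the same token can be sketched as follows. This is a minimal illustration under assumed names (`expandUntilStable`, `expand`, `maxPasses`), not the actual Attribute Expander fix: it stops as soon as an expansion step hands back a token it has already processed, and also caps the number of passes.

```javascript
// Hypothetical sketch of guarding against re-expanding the same token.
// 'expand' stands in for any expansion step; it returns either a new
// token or the same token when nothing could be expanded.
function expandUntilStable(token, expand, maxPasses) {
  var seen = new Set();
  var passes = 0;
  while (!seen.has(token) && passes < maxPasses) {
    seen.add(token);
    token = expand(token);
    passes++;
  }
  return { token: token, passes: passes };
}
```

With an `expand` step that cannot make progress and keeps returning its input, the loop exits after a single pass instead of spinning.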

Friday, Feb 6, 2015, around 11:20am PST: Hotfix of cherry-pick
A bug in our process restart (on fatal errors) was exposed by unrelated bugs in our parse pipeline which manifested as stuck processes on the cluster. This hotfix fixes that by ensuring that fatals continue to restart processes.

Wednesday, Feb 4, 2015, around 1pm PST: ✅

 * Switch to using the compression package instead of the outdated version bundled with connect. In testing, that cleared up the memory leak noticed since the Jan. 15th deploy.
 * Some cleanup including:
 * Changing a few 'on' handlers to 'once'.
 * Using request's qs option for apiargs instead of stringifying those manually.
 * Better error handling for config requests.
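
For readers unfamiliar with the handler change in the cleanup list above, this small sketch shows the difference between the two Node.js `EventEmitter` registration methods. The event name `done` and the counters are made up for illustration.

```javascript
var EventEmitter = require('events').EventEmitter;

var emitter = new EventEmitter();
var onCalls = 0;
var onceCalls = 0;

// An 'on' listener stays registered and runs on every emit.
emitter.on('done', function () { onCalls++; });
// A 'once' listener is removed after its first invocation.
emitter.once('done', function () { onceCalls++; });

emitter.emit('done');
emitter.emit('done');
// After two emits the 'on' listener has run twice and the 'once'
// listener a single time; only the 'on' listener is still attached.
```

Using `once` for events that should only be handled a single time avoids leaving stale listeners attached to long-lived objects.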

Monday, Feb 2, 2015, around 1pm PST: ✅

 * Set X-Forwarded-Proto when proxying https. This fixes timeouts for ruwikinews which is strict about accepting only https connections.
 * Some performance tweaks to attribute expander to eliminate useless work and useless memory allocation
 * : Fixes to resource module loading URI in the section of Parsoid HTML
 * Fixes to tokenizer to ensure that strings starting with '-' are parsed for directives like language variant markup
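
The X-Forwarded-Proto item in the list above can be illustrated with a small sketch. It assumes a setup where https API URLs are fetched through a plain-http proxy. The helper name `proxiedRequestOptions`, the option names it returns, and the proxy host are hypothetical and not taken from Parsoid's code.

```javascript
// Hypothetical helper: when the target API URL is https but the request is
// sent through a plain-http proxy, record the original scheme in a header
// so the backend can tell that the client asked for https.
function proxiedRequestOptions(apiUrl, proxyHost) {
  var parsed = new URL(apiUrl);
  var headers = { Host: parsed.host };
  if (parsed.protocol === 'https:') {
    headers['X-Forwarded-Proto'] = 'https';
  }
  return {
    uri: 'http://' + proxyHost + parsed.pathname + parsed.search,
    headers: headers
  };
}
```

A wiki that only accepts https connections can then treat the proxied request as secure based on the header, rather than redirecting or timing out.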

Friday, Jan 30, 2015 around 2:35 pm PST: ✅
The Jan 15th deploy, in which Parsoid started using sitematrix info for configuring wikis, was missing special handling for some wikis (commonswiki being one of them). This caused timeouts, which repeatedly exercised an existing memory leak, leading to a slow buildup of leaked memory on the cluster and a higher than normal CPU load. This special Friday deploy fixed the config issues.

Specifically, the following two patches were deployed:
 * Some special wikis should use the default proxy
 * Strip TLS from sitematrix url if we're using the default proxy

Wednesday, Jan 28, 2015 around 1pm PST: ✅

 * : Correctly handle templates that generate part-attribute and part-content of a DOM node.
 * : Preserve blank template parameters
 * : Cleanup of behavior switch production
 * Updates to wikitext serializer to simplify and enable more robust wikitext escaping
 * ,, : Magic link fixes (wt2html and html2wt nowiki handling)

Thursday, Jan 15, 2015 around 1pm PST: ✅

 * ,, : Default WMF wikis served by Parsoid fetched from sitematrix API call
 * : Positional params with = in extlink are serialized as named parameters

On Jan 14th at 1:20 pm PST, we reverted Parsoid to the older deployed version after dirty diffs were seen during post-deploy testing. It turned out that the dirty diffs weren't related to the Parsoid deploy, but now that those issues have been fixed, we'll revisit the Parsoid deployment on Thursday.

Monday, Jan 12, 2015 around 1pm PST: ✅

 * Include location of titles in timeout logs
 * Tweaks to Parsoid's cite port to generate the same ref ids as the native cite implementation

Wednesday, Jan 7, 2015 around 1pm PST: ✅

 * : Improved handling of extremely large lists -- fixes the load issues seen in production on Jan 3rd
 * Removed hardcoded HTTP 500 response for urwiki:نام_مقامات_اے (deployed on Jan 3rd to prevent this page from overloading the cluster)
 * : Fix edge case tokenizing table lines

Monday, Jan 5, 2015 around 2pm PST: ✅

Wikitext -> HTML:
 * : data-parsoid stripped from template content
 * : Context-aware parsing of definition list colon
 * : Parse extension parameters as plain text
 * : Stray is parsed to meta
 * Marginal improvement parsing templates in definition lists

HTML -> Wikitext:
 * , : Several improvements and fixes to nowiki protection for quotes
 * Other improvements and bug fixes to nowiki protection in headings, lists, tables.
 * : Insert an extra newline after new content and existing headings

Other (API, logging, etc):
 * Add logging for html2wt API endpoints
 * Fix robots.txt route
 * Send SIGKILL to kill a timed out worker
 * : API v2 parsing and serialization routes