Deployment tooling/Notes/Deployment system requirements
This page is currently a draft.
More information and discussion about changes to this draft may be on the discussion page.
Requirements for the next iterations of deployment tooling projects at WMF
|provision artifacts on multiple servers comprising a cluster||MUST||MUST||MUST|
|manage publication of at least 2 versions of mediawiki-core||MUST||NON-ISSUE||NON-ISSUE|
|keep old versions of mediawiki-core around to support bits servers (30 days?)||MUST||NON-ISSUE||NON-ISSUE|
|manage l10n cache files on target servers||MUST||NON-ISSUE||NON-ISSUE|
|support publication of patches to internal servers that cannot be shared publicly||MUST||MUST||MUST|
|complete in a timely manner||MUST||MUST||NON-ISSUE|
|have a time-to-complete for a deploy that is proportional to the magnitude of the change||MUST||MUST||NON-ISSUE|
|be able to update/provision a new server joining the cluster without forcing update of all other servers (server pull)||MUST||MUST||MUST|
|handle rollbacks easily||MUST||MUST||NON-ISSUE|
|handle downgrades (of packages) easily||MUST||MUST (including dependencies)||NON-ISSUE|
|have atomicity equal to or greater than rsync --delay-updates||MUST||?||?|
|be able to lock (to prevent folks overlapping)||MUST||MUST||SHOULD|
|support multiple datacenters||MUST||SHOULD||SHOULD|
|have documentation for use and maintenance||MUST||MUST||MUST|
|handle security patches cleanly but separately from released versions||MUST||MUST||MUST|
|support eventual consistency (servers update to latest version on reboot)||MUST||SHOULD||?|
|work well with Puppet-controlled config files (especially for ES and Parsoid)||MUST||MUST||MUST|
|support a variable number of versions||SHOULD||?||?|
|allow generation of caches and other content post-deploy on target servers||SHOULD||?||NON-ISSUE|
|prevent concurrent modification of source||SHOULD||?||?|
|track versions installed on each target host||SHOULD||MUST||MUST|
|record errors and informational events publicly||SHOULD||SHOULD||SHOULD|
|record errors and informational events durably for use in troubleshooting||SHOULD||SHOULD||SHOULD|
|allow "canary"/rolling deploys where a sub-set of the cluster is updated||SHOULD||MUST||MUST|
|allow multiple production clusters (privates vs non-private, and wikitech.wmf getting out of date)||SHOULD||SHOULD||SHOULD|
|rely on SSH agent forwarding||SHOULD NOT||SHOULD NOT||SHOULD NOT|
|require root on the target hosts (privilege separation/access control)||SHOULD NOT||SHOULD NOT||NON-ISSUE|
|allow multi-masters (especially for multi-datacenter)||SHOULD||SHOULD||SHOULD|
|be easily auditable (e.g. verifying the git commit hash on the deploy host)||SHOULD||SHOULD||SHOULD|
- MW: 10 min basic, 30 min with full l10n included
- Parsoid: 10 min
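The "be able to lock (to prevent folks overlapping)" row in the table above could be met in several ways. The following is only a minimal sketch of one option, an exclusive lock file on the deploy host. The names `deploy_lock` and `DeployLockError` and the lock path are hypothetical and are not part of any existing tool.

```python
import os
from contextlib import contextmanager


class DeployLockError(RuntimeError):
    """Raised when another deploy already holds the lock."""


@contextmanager
def deploy_lock(lock_path, owner):
    """Hold an exclusive deploy lock for the duration of the with-block.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so
    only one deployer can take the lock at a time.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # Report who holds the lock so the second deployer knows whom to ask.
        with open(lock_path) as f:
            holder = f.read().strip() or "unknown"
        raise DeployLockError(f"deploy locked by {holder}")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(owner)
        yield
    finally:
        # Release the lock even if the deploy itself fails.
        os.unlink(lock_path)
```

A deploy would then be wrapped as `with deploy_lock("/tmp/deploy.lock", "alice"): ...`. A process that is killed outright leaves a stale lock file behind, so a real implementation would also need a way to inspect and break locks.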
Copy/paste requirements from the etherpad while table massaging...
Parsoid / Mathoid / Rashomon / PDF renderer deploys
- MUST make it easy to test packaging (init scripts, log rotation etc) outside the cluster -- debs? make scap/whatever be the thing that deploys beta cluster?
- MUST use rolling deploys: not all nodes at once to avoid taking down the cluster
- SHOULD allow non-roots to upgrade / downgrade / restart individual nodes for testing (ex: permissions by group membership)
- SHOULD NOT create much additional overhead over normal debian packaging
- SHOULD handle dependencies with system packages/libraries and other packages cleanly (including downgrades)
- SHOULD be able to split packaging further (separate packages for library dependencies for example) -- support n versioned dependencies
- SHOULD use known systems / avoid "unnecessary" complexity wherever possible
- SHOULD make it easy for third parties to track regular (non-security) deployed code / be directly usable for third parties (Vagrant, labs, hosted VMs etc)
- SHOULD support timely third-party security upgrades with systems like unattended-upgrades once the security release is published
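The rolling-deploy requirement in the list above (not all nodes at once) can be sketched as a loop over small batches that stops as soon as a batch fails its health check. This is an illustration only: `rolling_deploy`, `batches`, `RollingDeployError`, and the `deploy_one` / `is_healthy` callbacks are hypothetical names, and how a node is actually upgraded or checked is left to the caller.

```python
from typing import Callable, Iterator, List


class RollingDeployError(RuntimeError):
    """Raised when a batch fails its health check; later batches are untouched."""

    def __init__(self, failed: List[str], completed: List[str]):
        super().__init__(f"unhealthy after deploy: {', '.join(failed)}")
        self.failed = failed
        self.completed = completed


def batches(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_deploy(
    hosts: List[str],
    batch_size: int,
    deploy_one: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy batch by batch; stop before the next batch if any host is unhealthy."""
    completed: List[str] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            deploy_one(host)
        failed = [host for host in batch if not is_healthy(host)]
        if failed:
            # The remaining hosts keep running the old version.
            raise RollingDeployError(failed, completed)
        completed.extend(batch)
    return completed
```

With a batch size of 1 and a single host listed first, the same loop doubles as a canary deploy: the rest of the cluster is only touched if the first node comes back healthy.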
Elasticsearch deploys are rare, less than once a month. They are time consuming because they require rolling restarts of a data store. We don't have many folks with the expertise to recover from unexpected failure. I think of it more like MySQL than like MW or Parsoid.
- MUST allow user to force Elasticsearch to be installed at the version on the rest of the Elasticsearch cluster
- work around reprepro bug :)
- MUST configure Elasticsearch settings
- yaml files be stuff'd
- MUST be able to provision a new server without restarting current servers
- targeted deploys, canary deploys
- MUST support deploying Elasticsearch plugins
- MUST support ES plugin presence assurance
- MUST verify that the Elasticsearch plugins are genuine (hash or something)
- MUST NOT upgrade Elasticsearch without manual intervention
- no "require => latest"
- SHOULD be compatible with Elasticsearch's debian packages
- SHOULD coordinate versions of Elasticsearch plugins deployed so they are compatible with Elasticsearch server
- plugin compatibility matrix?
- SHOULD NOT allow plugin undeployment. That'd break things.
- NON-ISSUE locks aren't important because very few people are concerned with upgrading Elasticsearch and the upgrades themselves are pretty rare
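For the "verify that the Elasticsearch plugins are genuine (hash or something)" item above, one possible reading is a checksum comparison before a plugin file is installed. The sketch below assumes that a trusted SHA-256 digest for each plugin is recorded somewhere (for example in the deploy configuration). The function names `sha256_of` and `verify_plugin` are hypothetical, and this is not how Elasticsearch itself validates plugins.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_plugin(path, expected_sha256):
    """Return True only if the file matches the recorded digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())
```

A deploy step could refuse to continue when `verify_plugin` returns False. A checksum only shows that the file matches what was recorded. It says nothing about whether the plugin version is compatible with the running Elasticsearch server, which is the separate compatibility question raised in the list.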