Deployment tooling/Notes/Deployment system requirements

Requirements for the next iterations of deployment tooling projects at WMF
Copy/paste requirements from the etherpad while table massaging...

Parsoid / Mathoid / Rashomon / PDF renderer deploys

 * MUST support quick downgrades
 * MUST support rolling upgrades (*not* all nodes at once)
 * MUST work well with Puppet-controlled config files -- ??
 * MUST make it easy to test packaging (init scripts, log rotation etc) outside the cluster -- debs? make scap/whatever be the thing that deploys beta cluster?
 * MUST finish a full upgrade in a timely manner (10 minutes)
 * MUST be robust and well-understood
 * SHOULD allow non-roots to upgrade / downgrade / restart individual nodes for testing (ex: permissions by group membership)
 * SHOULD support canary deploys (deploy to small group first, check if things are ok, then roll out to all)
 * SHOULD NOT create much additional overhead over normal debian packaging
 * SHOULD handle dependencies with system packages/libraries and other packages cleanly (including downgrades)
 * SHOULD be able to split packaging further (separate packages for library dependencies for example) -- support n dependencies
 * SHOULD use known systems / avoid "unnecessary" complexity wherever possible
 * SHOULD support eventual consistency of deployed versions
 * SHOULD make it easy for third parties to track regular (non-security) deployed code / be directly usable for third parties (Vagrant, labs, hosted VMs etc)
 * MUST support internal security deploys that are not immediately published for third parties
 * SHOULD support timely third-party security upgrades with systems like unattended-upgrades once the security release is published
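
The rolling-upgrade and canary requirements above can be sketched in a few lines. This is only an illustration, not a proposal for the actual tool: the names `plan_batches`, `rolling_deploy`, `deploy` and `healthy` are hypothetical, and the real deploy and health-check steps would be whatever the chosen system provides.

```python
from typing import Callable, List, Sequence


def plan_batches(nodes: Sequence[str], canary_count: int, batch_size: int) -> List[List[str]]:
    """Split nodes into one canary batch followed by fixed-size rolling batches."""
    if canary_count < 0 or batch_size < 1:
        raise ValueError("canary_count must be >= 0 and batch_size >= 1")
    ordered = list(nodes)
    batches: List[List[str]] = []
    canaries = ordered[:canary_count]
    if canaries:
        batches.append(canaries)
    rest = ordered[canary_count:]
    for start in range(0, len(rest), batch_size):
        batches.append(rest[start:start + batch_size])
    return batches


def rolling_deploy(
    nodes: Sequence[str],
    version: str,
    deploy: Callable[[str, str], None],  # hypothetical: upgrades one node
    healthy: Callable[[str], bool],      # hypothetical: health check for one node
    canary_count: int = 1,
    batch_size: int = 2,
) -> List[str]:
    """Deploy batch by batch and stop after the first batch that is unhealthy.

    Returns the nodes that were touched, so the caller knows what to downgrade.
    """
    touched: List[str] = []
    for batch in plan_batches(nodes, canary_count, batch_size):
        for node in batch:
            deploy(node, version)
            touched.append(node)
        if not all(healthy(node) for node in batch):
            break  # never upgrade all nodes at once; halt on the first bad batch
    return touched
```

Returning the list of touched nodes keeps the "quick downgrade" requirement in view: if the canary batch fails, only those nodes need to be rolled back.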

ES deploys
Deploys are rare, less than once a month. Deploys are time consuming due to rolling restarts of a data store. We don't have many folks with the expertise to recover from unexpected failure. I think of it more like MySQL than like MW or Parsoid.
 * MUST allow user to force Elasticsearch to be installed at the version on the rest of the Elasticsearch cluster
    * work around reprepro bug :)
 * MUST configure Elasticsearch settings
    * yaml files be stuff'd
 * MUST be able to provision a new server without restarting current servers
    * targeted deploys, canary deploys
 * MUST be able to upgrade a single server at a time (updates have to roll through the servers one at a time)
    * rolling upgrades
 * MUST have documentation for use and maintenance
    * :) bug:1
 * MUST support deploying Elasticsearch plugins
 * MUST support ES plugin presence assurance
 * MUST verify that the Elasticsearch plugins are genuine (hash or something)
 * MUST NOT upgrade Elasticsearch without manual intervention
    * no "ensure => latest"
 * SHOULD be compatible with Elasticsearch's debian packages
 * SHOULD coordinate versions of Elasticsearch plugins deployed so they are compatible with Elasticsearch server
    * plugin compatibility matrix?
 * SHOULD NOT rely on SSH agent forwarding
 * SHOULD NOT allow plugin undeployment. That'd break things.
 * NONISSUE speed of execution doesn't really matter because the rolling restart process for Elasticsearch is already not quick
 * NONISSUE rollbacks aren't really possible with Elasticsearch so the deployment mechanism doesn't have to handle them gracefully
 * NONISSUE locks aren't important because very few people are concerned with upgrading Elasticsearch and the upgrades themselves are pretty rare
    * SHOULD in Greg's opinion
 * NONISSUE requiring root on the target hosts isn't too big a problem because few people are deploying it and root is useful for those folks in case it breaks
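
For the "verify that the Elasticsearch plugins are genuine (hash or something)" item, a minimal sketch of a checksum comparison might look like the following. It assumes the expected SHA-256 digest comes from some trusted place (a Puppet-managed manifest, for example); the function names `sha256_of` and `plugin_is_genuine` are made up for illustration, and the list above does not settle on a hash algorithm.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plugin_is_genuine(path: Path, expected_sha256: str) -> bool:
    """Compare a plugin archive against a digest obtained from a trusted source.

    Assumption: expected_sha256 is a hex string; case is normalised before comparing.
    """
    return hmac.compare_digest(sha256_of(path), expected_sha256.strip().lower())
```

A deploy step could refuse to install any plugin archive for which `plugin_is_genuine` returns `False`. A plain hash only shows that the file matches the recorded digest, so the digest list itself has to be distributed through a channel that is already trusted.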