Deployment tooling/Cabal/2015-06-15

June 15th
 * Big goal for next quarter (discuss, change, etc):
   * deploy services! (maybe pick one to be a focus: RESTBase?)
     * current RESTBase deployment workflow: https://wikitech.wikimedia.org/wiki/RESTBase
     * general service deployment workflow: https://wikitech.wikimedia.org/wiki/User:Mobrovac/Service_Deployment
   * should allow batches (specify via config or at runtime)
   * should run checks (what do those look like?)
   * should roll back
   * should get running in deployment-prep by quarter's end (no explicit dependencies on ops; ops feedback throughout [obvs])
   * keep deployment cabal group running as a means of sanity checks
   * RelEng code: Tyler, Chad, Mukunda; Dan to facilitate discussion
 * Run down features from scap/trebuchet tickets that we may want to move
 * Consolidate meta-tickets
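
To make the "batches / checks / roll back" goals concrete, here is a minimal sketch of how the three could fit together. Nothing here reflects an agreed design: `deploy_in_batches`, the callback signatures, and the choice to roll back every touched host on the first failed check are all assumptions for discussion.

```python
from typing import Callable, Iterable, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def deploy_in_batches(
    hosts: Sequence[str],
    batch_size: int,
    deploy: Callable[[str], None],
    check: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy batch by batch; roll back every touched host if a check fails.

    Returns True when all hosts were deployed and passed their check.
    The callbacks are hypothetical hooks, not an existing API.
    """
    touched: List[str] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            touched.append(host)
            deploy(host)
        if not all(check(host) for host in batch):
            # Undo in reverse order so the most recent change is reverted first.
            for host in reversed(touched):
                rollback(host)
            return False
    return True
```

The batch size is an ordinary argument, so it could come from a config file or be overridden at runtime. What a "check" actually looks like is left open, as in the notes: it is just a callable returning True or False per host.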


 * https://phabricator.wikimedia.org/T101022

Modularity

 * Transport mechanism
 * Version to deploy
   * meaningful tags
   * list different deploy versions
 * Signaling restart
   * `service` command
   * HUP
 * Testing at the end
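
One way to keep restart signaling modular is to treat each mechanism as an interchangeable function behind a common signature. The sketch below is an assumption, not an existing tool: the `RESTART_STRATEGIES` registry, the pidfile location, and the function names are hypothetical. It only relies on the standard `service NAME restart` invocation and on sending SIGHUP to a process ID.

```python
import os
import signal
import subprocess
from typing import Callable, Dict, List


def service_restart_command(name: str) -> List[str]:
    """Build the argv for restarting a service via the `service` command."""
    return ["service", name, "restart"]


def restart_via_service(name: str) -> None:
    """Full restart; raises CalledProcessError if the command fails."""
    subprocess.run(service_restart_command(name), check=True)


def read_pid(pidfile: str) -> int:
    """Read a process ID from a pidfile containing a single integer."""
    with open(pidfile) as handle:
        return int(handle.read().strip())


def reload_via_hup(name: str, pid_dir: str = "/var/run") -> None:
    """Send SIGHUP to the service (assumes a <pid_dir>/<name>.pid file)."""
    os.kill(read_pid(os.path.join(pid_dir, name + ".pid")), signal.SIGHUP)


# Hypothetical registry: a per-service config could pick a strategy by key.
RESTART_STRATEGIES: Dict[str, Callable[[str], None]] = {
    "service": restart_via_service,
    "hup": reload_via_hup,
}
```

A deploy step would then look up `RESTART_STRATEGIES[strategy](service_name)`, so adding another signaling mechanism means adding one function and one registry entry.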

Versioning

 * Services use semantic versioning, but not for deployments.
 * There is a task for making MediaWiki follow semantic versioning as well.
 * It would be nice to use a standard versioning scheme and some naming conventions for deployment tags, rather than the long numeric deployment numbers we have in Trebuchet.
 * For Phabricator, I use a date-based deployment tag like release/2015-06-10/1, where the /1 is a revision number; for hotfixes you just increment the revision.
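
The date-based scheme above is simple enough to compute mechanically. As a rough sketch (the function name `next_release_tag` and the idea of passing in the list of existing tags are assumptions, not part of any current tooling), the next tag for a given day could be derived like this:

```python
import datetime
import re
from typing import Iterable

# Matches tags of the form release/YYYY-MM-DD/N.
TAG_RE = re.compile(r"^release/(\d{4}-\d{2}-\d{2})/(\d+)$")


def next_release_tag(existing: Iterable[str], day: datetime.date) -> str:
    """Return the next release/<date>/<revision> tag for `day`.

    Tags for other days, or tags that do not match the scheme, are ignored.
    """
    prefix = day.isoformat()
    revisions = []
    for tag in existing:
        match = TAG_RE.match(tag)
        if match and match.group(1) == prefix:
            revisions.append(int(match.group(2)))
    return f"release/{prefix}/{max(revisions, default=0) + 1}"
```

For example, with `release/2015-06-10/1` already present, a hotfix on the same day would get `release/2015-06-10/2`, while the first deploy on a new day starts again at `/1`.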

Concerns

 * SSH for each host
 * Public key deploy
 * Sudoers roles
 * troubleshooting deploys requiring escalation
 * service user needs read/write (possibly)


 * https://phabricator.wikimedia.org/T101024

Interface

 * Tmux—lotsa feedback
 * Ability to abort at any point
 * Watching logs/backend
 * start from alternative interface, attach if problems
 * locking mechanism per repo (possibly global, not necessarily)
 * single point of updates, multiple consumers (e.g. redis consumed by web page and by commandline)
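
For the per-repo locking item, one possible approach is a lock file created atomically, so that a second deploy of the same repo fails fast while other repos stay unaffected. This is only a sketch under assumptions: `repo_lock`, `DeployLocked`, and the lock directory layout are hypothetical, and it does not handle stale locks left behind by a crashed deploy.

```python
import contextlib
import os
from typing import Iterator


class DeployLocked(Exception):
    """Raised when another deploy already holds the lock for a repo."""


@contextlib.contextmanager
def repo_lock(lock_dir: str, repo: str) -> Iterator[str]:
    """Hold an exclusive lock for `repo` for the duration of the block."""
    path = os.path.join(lock_dir, repo.replace("/", "_") + ".lock")
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise DeployLocked(f"{repo} is already being deployed") from None
    with os.fdopen(fd, "w") as handle:
        # Record the holder's PID so a human can see who owns the lock.
        handle.write(str(os.getpid()))
    try:
        yield path
    finally:
        os.unlink(path)
```

A global lock would be the same idea with a single fixed name instead of one derived from the repo.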

TODO
 1. Conversation with ops about ssh shared user (mwdeploy, whatever)
 2. Regroup with RelEng to figure out timelines
 3. Granularity of ssh control