Deployment tooling/Cabal/2015-06-15

From mediawiki.org

June 15th

  • Big goal for next quarter (discuss, change, etc):

    • deploy services! (maybe pick one to be a focus: RESTBase?)
      • current RESTBase deployment workflow: https://wikitech.wikimedia.org/wiki/RESTBase
      • general service deployment workflow: https://wikitech.wikimedia.org/wiki/User:Mobrovac/Service_Deployment
    • should allow batches (specify via config or at runtime)
    • should run checks (what do those look like?)
    • should roll back
    • should get running in deployment-prep by quarter's end (no explicit dependencies on ops; ops feedback throughout [obvs])
    • keep deployment cabal group running as a means of sanity checks
    • RelEng code: Tyler, Chad, Mukunda; Dan to facilitate discussion

  • Run down features from scap/trebuchet tickets that we may want to move
  • Consolidate meta-tickets
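
The batching, check, and rollback goals listed for service deployment could look roughly like the sketch below. This is only an illustration under assumptions: `deploy_in_batches`, `batches`, and the three callables (`deploy`, `check`, `rollback`) are hypothetical names, not part of scap, Trebuchet, or any existing tool, and what a real check looks like is still an open question in these notes.

```python
from typing import Callable, Iterable, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def deploy_in_batches(
    hosts: Sequence[str],
    batch_size: int,
    deploy: Callable[[str], None],    # hypothetical: push the new version to one host
    check: Callable[[str], bool],     # hypothetical: True if the host looks healthy
    rollback: Callable[[str], None],  # hypothetical: restore the previous version
) -> bool:
    """Deploy batch by batch; on a failed check, roll back every touched host.

    Returns True when all hosts were deployed and passed their checks.
    """
    touched: List[str] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            touched.append(host)
            deploy(host)
        if not all(check(host) for host in batch):
            # Undo in reverse order so the most recent changes go first.
            for host in reversed(touched):
                rollback(host)
            return False
    return True
```

The batch size is a plain argument here, so it could come from a config file or be given at runtime. Exceptions raised by `deploy` are deliberately not handled in this sketch.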

Modularity

  • Transport mechanism
  • Version to deploy
    • meaningful tags
    • list different deploy versions
  • Signaling restart
    • `service` command
    • HUP
  • Testing at the end
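
One way to read the modular pieces above is as pluggable steps: a transport that fetches a version, a restart signal, and a final test. The sketch below assumes it runs on the target host itself; `DeployPlan`, `restart_with_service`, and `reload_with_hup` are hypothetical names made up for illustration. Only `subprocess.run`, `os.kill`, and `signal.SIGHUP` are real standard-library calls.

```python
import os
import signal
import subprocess
from dataclasses import dataclass
from typing import Callable


def restart_with_service(name: str) -> None:
    """Restart through the `service` command (assumes it exists on the host)."""
    subprocess.run(["service", name, "restart"], check=True)


def reload_with_hup(pid: int) -> None:
    """Ask a running process to reload by sending it SIGHUP (Unix only)."""
    os.kill(pid, signal.SIGHUP)


@dataclass
class DeployPlan:
    """Hypothetical container for the swappable parts of one deployment."""
    version: str                   # tag or ref to deploy
    fetch: Callable[[str], None]   # transport mechanism: fetch the given version
    restart: Callable[[], None]    # how the service is told to pick up new code
    test: Callable[[], bool]       # check run at the end

    def run(self) -> bool:
        self.fetch(self.version)
        self.restart()
        return self.test()
```

A plan could then swap restart styles without touching the other steps, for example `restart=functools.partial(restart_with_service, "restbase")` versus `restart=functools.partial(reload_with_hup, pid)`, where the service name and `pid` are placeholders.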

Versioning

  • Services use semantic versioning, but not for deployments.
  • There is a task for making MediaWiki follow semantic versioning as well.
  • It would be nice to use a standard versioning scheme, and some naming conventions for deployment tags, rather than long numeric deployment numbers like we have in trebuchet.
  • For Phabricator I use a date-based deployment tag like release/2015-06-10/1, where the /1 is a revision number; for hotfixes you just increment the revision.
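
The date-based tag scheme above can be computed mechanically. The following is a minimal sketch, assuming tags have exactly the form release/YYYY-MM-DD/N and that a hotfix reuses the release date and takes the next revision number; `next_deploy_tag` and `TAG_RE` are hypothetical names, not an existing tool.

```python
import re
from datetime import date
from typing import Iterable

# Matches tags such as release/2015-06-10/1 (date, then revision number).
TAG_RE = re.compile(r"^release/(\d{4}-\d{2}-\d{2})/(\d+)$")


def next_deploy_tag(existing: Iterable[str], day: date) -> str:
    """Return the next unused tag for `day`, given the tags that already exist."""
    stamp = day.isoformat()
    revisions = [
        int(match.group(2))
        for match in map(TAG_RE.match, existing)
        if match and match.group(1) == stamp
    ]
    # The first tag of a day gets revision 1; later ones increment.
    return f"release/{stamp}/{max(revisions, default=0) + 1}"
```

With no existing tags for 2015-06-10 this yields release/2015-06-10/1; once that tag exists, the same call yields release/2015-06-10/2.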

concerns

  • SSH for each host
  • Public key deploy
  • Sudoers roles
    • troubleshooting deploys requiring escalation
    • service user needs read/write (possibly)

interface

  • Tmux: lots of feedback
  • Ability to abort at any point
  • Watching logs/backend
  • start from alternative interface, attach if problems
  • locking mechanism per repo (possibly global, not necessarily)
  • single point of updates, multiple consumers (e.g. redis consumed by web page and by commandline)
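
The per-repo locking idea could be prototyped with a lock file per repository. This is a sketch under assumptions, not a proposal for the final mechanism: `repo_lock`, `DeployLocked`, and the lock directory layout are hypothetical, and a crashed deploy would leave a stale lock file that this sketch does not clean up.

```python
import os
from contextlib import contextmanager
from typing import Iterator


class DeployLocked(RuntimeError):
    """Raised when another deploy already holds the lock for a repo."""


@contextmanager
def repo_lock(lock_dir: str, repo: str) -> Iterator[str]:
    """Hold an exclusive lock file for `repo` while the with-block runs."""
    path = os.path.join(lock_dir, repo.replace("/", "_") + ".lock")
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise DeployLocked(f"{repo} is already being deployed") from None
    try:
        # Record the owner's PID so a stuck lock can be traced by hand.
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield path
    finally:
        os.unlink(path)
```

Because each repo maps to its own file, deploys of different repos do not block each other; a global lock would be the same code with one fixed name.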

TODO

1. Conversation with ops about ssh shared user (mwdeploy, whatever)
2. Regroup with RelEng to figure out timelines
3. Granularity of ssh control