Deployment tooling/Future

Deployment tooling currently consists of two competing tools:

 * 1) Trebuchet: a salt- and git-based deployment mechanism
 * 2) Scap: a Python-based deployment mechanism that handles many MediaWiki-specific tasks and also deploys MediaWiki

The desired future state is one deploy tool to rule them all.

The broader picture is that RelEng, Services, and (toward the end of the quarter) some Opsen have been meeting about the future of all deployment tooling, taking inventory of the current state and discussing possible improvements.

I can hopefully distill the findings of the quarter below:

Problems

 * MediaWiki deploy has no way to perform follow-up actions on hosts (restart service, check service health, batches, etc.), although this may not be true in a week
 * Trebuchet has had issues (like T63882), mostly salt-related, that have led to fragmentation of the deployment process via git-deploy (read: Services kinda does its own thing, cf. this and this)
 * Having two deployment systems means more to maintain and more to know.
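
To make the missing follow-up actions concrete, here is a minimal sketch of a batched rollout that restarts a service and checks its health after each batch. It is not how Scap or Trebuchet work; every name in it (`batches`, `rolling_deploy`, and the `deploy`, `restart`, and `is_healthy` callables) is hypothetical.

```python
from typing import Callable, Iterable, List


def batches(hosts: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `hosts`, each at most `size` long."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_deploy(
    hosts: List[str],
    batch_size: int,
    deploy: Callable[[str], None],      # hypothetical: push code to one host
    restart: Callable[[str], None],     # hypothetical: restart the service
    is_healthy: Callable[[str], bool],  # hypothetical: post-restart check
) -> List[str]:
    """Deploy batch by batch and stop after the first batch with a failure.

    Returns the hosts that were deployed and passed the health check.
    """
    done: List[str] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            deploy(host)
            restart(host)
        unhealthy = [host for host in batch if not is_healthy(host)]
        done.extend(host for host in batch if host not in unhealthy)
        if unhealthy:
            break  # halt so that later batches keep running the old code
    return done
```

Stopping at the first unhealthy batch limits a bad deploy to at most one batch of hosts, which is the main reason to deploy in batches at all.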

Ideas/Discussion that led us here

 * Ideally, deployment tooling would be combined and modernized—to that end many graphs were created demonstrating that our deployment processes were not too dissimilar.
 * Trebuchet has a mechanism to perform follow-up tasks on each deployment target by running a custom salt execution module post-fetch and post-checkout. If you specify a service name in the `repo_config` pillar, it is also capable of restarting a service, which is close to what we want for MediaWiki deploy.
 * Deploying MediaWiki via Trebuchet would require significant work on both Trebuchet and infrastructure: Trebuchet relies on each node pulling code from a central git server (tin). Deploying MediaWiki via Trebuchet has evidently been tried in the past, and doing so would need a fan-out of the git repo to proxy git servers.
 * While Trebuchet, seemingly, has all the features required of a modern deployment system, in practice salt has had some issues (T102808) that make us reluctant to move forward with it.
 * Neither deployment system is perfect. Trebuchet is pretty close to what we want, but the problems with salt have made it a difficult system with which to work. Scap doesn't do 100% of what we want it to do, but it's reliable and works for MediaWiki's scale.
 * Further ideas and discussions are on Phabricator: T101023
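
The fan-out idea mentioned above can be sketched roughly as follows: targets pull from a small set of proxy git servers, so the central server only has to serve the proxies. This is an illustration only. The round-robin assignment, the function names, and the host names used in the usage note are assumptions, not anything Trebuchet or Scap implements.

```python
from typing import Dict, List


def assign_proxies(targets: List[str], proxies: List[str]) -> Dict[str, List[str]]:
    """Spread targets across proxies round-robin.

    Returns a mapping of proxy -> targets that should pull from it.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    plan: Dict[str, List[str]] = {proxy: [] for proxy in proxies}
    for index, target in enumerate(targets):
        plan[proxies[index % len(proxies)]].append(target)
    return plan


def pulls_per_server(plan: Dict[str, List[str]], central: str = "tin") -> Dict[str, int]:
    """Count how many clients pull from each server in a two-stage fan-out."""
    load = {central: len(plan)}  # only the proxies pull from the central server
    for proxy, targets in plan.items():
        load[proxy] = len(targets)
    return load
```

With ten hypothetical targets and three hypothetical proxies, `pulls_per_server(assign_proxies(targets, proxies))` shows the central server serving three pulls instead of ten, with the rest spread across the proxies.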

Next steps
Let's build a deployment system!

Instead of trying to build a deployment system that is perfect and works for everything (a seemingly impossible task), let's build a deployment system that is modular and test it with a single use-case initially (Create new RESTBase deploy method (tracking)). The initial narrow focus allows work to progress more quickly since it tightens the testing feedback loop.
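
One way to picture "modular" here is a registry of small, named steps that each use-case composes into its own plan. The sketch below is only an illustration under that assumption: the step names, the `DeployPlan` class, and the host name are all hypothetical, and the steps just record what they would do rather than touching any host.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A step receives the target host and a log to append to.
Step = Callable[[str, List[str]], None]

STEPS: Dict[str, Step] = {}


def step(name: str) -> Callable[[Step], Step]:
    """Register a function as a named, reusable deployment step."""
    def decorator(func: Step) -> Step:
        STEPS[name] = func
        return func
    return decorator


@step("fetch")
def fetch(host: str, log: List[str]) -> None:
    log.append(f"{host}: fetch")  # placeholder for a real fetch


@step("checkout")
def checkout(host: str, log: List[str]) -> None:
    log.append(f"{host}: checkout")  # placeholder for a real checkout


@step("restart_service")
def restart_service(host: str, log: List[str]) -> None:
    log.append(f"{host}: restart_service")  # placeholder for a real restart


@dataclass
class DeployPlan:
    """A use-case is just a name, a host list, and an ordered list of steps."""
    name: str
    hosts: List[str]
    steps: List[str]


def run(plan: DeployPlan, log: Optional[List[str]] = None) -> List[str]:
    """Run every step of the plan on every host, in order."""
    log = [] if log is None else log
    for host in plan.hosts:
        for step_name in plan.steps:
            STEPS[step_name](host, log)  # raises KeyError for unknown steps
    return log


# Hypothetical single use-case to test against first.
restbase = DeployPlan("restbase", ["host1"], ["fetch", "checkout", "restart_service"])
```

Covering another use-case later would then mean registering any new steps it needs and writing another plan, without changing `run`.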

To avoid falling into the trap of many competing standards (https://xkcd.com/927/), we've attempted, and are still attempting, to do the following:


 * 1) ✅ Gather requirements for most deployment use-cases of Trebuchet and Scap - cf. T97068
 * 2) ✅ Identify areas of overlap and divergence in Scap and Trebuchet
 * 3) Build a modular and modern deployment system, test against RESTBase
 * 4) After initial success with RESTBase, quickly work to expand the system to cover all current deployment uses.