Deployment tooling

= Sprint focus areas =

Deployment
Primary goals:
 * Maintain reasonable usability
 * Speed: a deployment should take no more than 10-15 minutes
 * Graceful handling of unresponsive Apaches
 * A workflow for security patches
 * Better alerting / monitoring
 * Smoke test
 * Usability: commands used should map to logical activities rather than minutiae
 * Better SAL entries (include commit ranges)
 * Maybe an easy way to get diffs of what was deployed
 * Deal with umask and .bashrc insanity
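
As an illustration of the "include commit ranges" goal, here is a minimal sketch of how a log entry could carry the deployed range. The function name, argument names, and message layout are all assumptions made for this example; they are not the existing SAL format or any Trebuchet API.

```python
def format_sal_entry(user: str, repo: str, old_sha: str, new_sha: str) -> str:
    """Build a one-line log message that names the deployed commit range.

    Hypothetical helper: the message layout is an assumption, not the
    current SAL format.
    """
    # Abbreviate each SHA to its first 7 characters to keep the entry short.
    commit_range = f"{old_sha[:7]}..{new_sha[:7]}"
    return f"{user} deployed {repo}: {commit_range}"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(format_sal_entry("example-user", "example/repo", "a" * 40, "b" * 40))
```

A range written as `old..new` can also be passed directly to `git log` or `git diff`, which may help with the "diffs of what was deployed" goal.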

ACTION ITEMS

 * Audit of salt scripts for completeness - (
 * Add rsync backend for Trebuchet -
 * Add submodule (and recursive submodule) support to Trebuchet
 * Put new Trebuchet frontend on labs (Ryan)
 * scap-recompile - (Aaron)
 * Enable Trebuchet logging to SAL/IRC
 * Fix the db migration from small to medium (
 * Integrate work from Joey H into Trebuchet (git corruption fixing)
 * Test Trebuchet on production to dummy dir, point a testwiki to it

Monitoring
Primary goals:
 * We shouldn't find out that parts of our infrastructure are down because of a failed browser test (those only run twice a day).
 * Inform deployment rollback decisions based on pre/post deployment performance metrics
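
One way the pre/post comparison might work, as a rough sketch: compare the mean of a "higher is worse" metric (for example, error rate or latency) before and after a deploy, and flag the deploy when the ratio crosses a threshold. The function names and the 1.5 default threshold are assumptions chosen for illustration; nothing here reflects an agreed policy or an existing tool.

```python
from statistics import mean
from typing import Sequence


def regression_ratio(pre: Sequence[float], post: Sequence[float]) -> float:
    """Return mean(post) / mean(pre) for a metric where higher is worse."""
    baseline = mean(pre)
    if baseline == 0:
        raise ValueError("baseline mean is zero; ratio is undefined")
    return mean(post) / baseline


def suggests_rollback(pre: Sequence[float], post: Sequence[float],
                      max_ratio: float = 1.5) -> bool:
    """Flag a deploy when the metric grew by more than max_ratio.

    max_ratio is a hypothetical threshold, not a tuned value.
    """
    return regression_ratio(pre, post) > max_ratio
```

A check like this would only inform the decision; a person would still decide whether to roll back.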

ACTION ITEMS

 * Finish migration (puppetization) of graphite to eqiad (Ori)
 * Upgrade graphite (this will fix graph exceptions when a line has no data points)
 * Enable deploy markings (the vertical lines) on all graphs in graphite (Aaron)
 * BLOCKED on graphite migration
 * Document fatal and exception logging on the cluster
 * ✅ Install logstash in labs for testing
 * Brainstorm relevant monitoring/alerting metrics
 * Review of current metrics for alertable ones
 * Expose exceptions data (showing exceptions per file/extension)