Wikimedia Release Engineering Team/Book club/Continuous Delivery

= First meeting =
 * Discussion: 2019-03-06 17:00 UTC
 * See also: Martin Fowler doing a 20 minute presentation on the topic: https://www.youtube.com/watch?v=aoMfbgF2D_4

Personal notes

 * Lars's notes: https://files.liw.fi/temp/cdbook.mdwn
 * thcipriani really long notes: https://people.wikimedia.org/~thcipriani/continuous-delivery-book/continuous-delivery-book.html

Preface

 * “How long would it take your organization to deploy a change that involves just one single line of code? Do you do this on a repeatable, reliable basis?” p-xxiii
 * JR: I'm working from the premise that I'm reading this book to learn something. To that end, should we be looking to ID "nuggets" of knowledge from each chapter and perhaps how we might implement that new learning?

Chapter 1: The problem of delivering software

 * Greg: single button deploy is a laudable goal, but I wonder how often that is reality?
 * Jeena: I wonder if they are talking about an "environment" that's already set up, so a single click may be less impossible
 * Brennen: I think it's possible to get pretty close to a single-button spin-up env
 * liw: for personal needs I use VMs rented from cloud providers, and did the same in a previous job: setting up DNS and the env is one command and deploying software is a second command, but it could be one command. It took a lot of work to get this done. Virtual machines make this a lot easier because you don't need to assemble a new computer to set up a new computer.
 * Tyler: We do configuration management entirely wrong. eg: We've had a db since 2001, so all of our puppet config assumes there is a db somewhere. Does this prevent us from spinning up an env with one click? Will that prevent us from doing the Deployment Pipeline?
 * Jeena: not sure it would prevent us, we can always try out things outside of production until we're satisfied that it's good enough then migrate
 * Brennen: agree we probably do everything wrong, but it's not impossible to get better (he says with a smile)
 * Lars: nothing here in current things/legacy that prevent us from building a deployment pipeline, maybe more work and not as clean as we'd like, but it shouldn't prevent us.
 * Lars: we do do everything wrong, including PHP use :), we also do many things right, we use VCS
 * Lars: compared to the world the book describes 10 years ago, we're light years ahead. Even at orgs with massively larger budgets, they have it worse. In 2019 they have people still editing live files in production.
 * Antoine: The book is too idealistic.
 * "Antipattern: Deploying Software Manually" p5
 * "Releasing software should be easy. " p24

Chapter 2: Configuration management

 * Tyler: We don't actually keep everything in VCS. MW as deployed currently, there is no exact version of what is deployed. Lots in .gitignore. Lots generated at deploy time. You can't spin up a deploy server at the current state it's in. There's maybe a dozen people who could rebuild a deploy server from scratch. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 * Tyler: our dev envs aren't able to be "production-like" because production-like in our case means build by hand <--- THIS
 * "It should always be cheaper to create a new environment than to repair an old one." p50
 * Lars: I agree when the book says that if something is painful, it should be done more often. Then you get rid of the pain. If the deployment server is hard to rebuild, then we should throw it away and rebuild it for every deployment
 * Lars: reproducible builds. See the Debian project where they rebuild everything twice and if they're not identical it reports an error. That is unrealistic for us at this time. Maybe wait until Debian has solved this problem for us
 * https://wiki.debian.org/ReproducibleBuilds https://reproducible-builds.org/
 * CI view/dashboard https://tests.reproducible-builds.org/debian/reproducible.html
 * Brennen: good for some artifacts, but for some of what we're going to do maybe you can't have that requirement
 * Tyler: one of the benefits of the pipeline is that right now we don't have confidence in third-party dependencies, to build that confidence we wanted a progressive pipeline of tests to verify.
 * Lars: even if it's reproducible, the source code might be wrong
 * Tyler: Visibility into environments. Do we have good visibility of versions deployed into deployment-prep? "Don't break the build == don't break Beta".
 * Brennen: long timers know where to look. I can tell info is available, but I have no idea where they are. We could work on discoverability/front and centerness for the holistic overview.
 * Greg: Dashboards dashboards dashboards.
 * Tyler: maybe too many dashboards
 * Greg: yes, we should make a better clearing house.
 * Brennen: if it's automated then it doesn't need documentation, wtf?!
 * Lars: I agree with the book, conditionally. Well written scripts can be documentation of how things are done. Most manual process descriptions are better shown as well written scripts. Checklists get out of date etc., which is worse than having a bad script.
 * Zeljko: especially true with installation/setup. There's no documentation for how Docker or Vagrant does that, the script is the documentation.
 * Brennen: true, but that doesn't mean we shouldn't document how the systems work. Even with self-documenting scripts, they sometimes fail and then you can't figure it out unless you already know how it's supposed to work. Context should be available to explain what's going on.
 * Jeena: I can't think of many times I'm looking through the code and think it's good documentation for itself.
 * JR: documentation should explain conceptually what it's trying to accomplish, not the step by step. Balance. Documentation is important.
 * Tyler: code tells you "how" not "why"
 * Lars: we need testable documentation.

Chapter 3: Continuous Integration
TODO: Look into speed of tests and potentially failing.
 * Jeena: you should never break the build when you check in to master (as opposed to their statement "just fix it after")
 * Greg: Good emphasis on various stages of testing, etc. Keeping commit stage tests to less than 10 min is a great goal. Also make a point to continually analyze how long tests are taking and optimize so your devs don't get annoyed. These are things we should be aware of / exposing.
 * Greg: Only in exceptional circumstances should you use shared environments for development.
 * Brennen: the best env I ever used was a bunch of VMs on a box, which is better than any local dev env I've ever given
 * Jeena: I don't know why they'd expect a separate db/etc for everyone to be realistic
 * Lars: I think the background is if there's a db then it's too easy for one dev to break it for everyone else.
 * Lars: not opposed to having them, but understand where the book is coming from
 * Tyler: "is it beta?" Developers generally know when what they're doing is going to impact others (eg: destroying a db). If they do that then just make a copy for themselves to play with (paraphrased)
 * Brennen: "build scripts should be treated like your code base". Deployment is a process on a wiki page, we all know it's bad, the book keeps reminding us about this :)
 * Lars: it could be so much worse :)
 * Jeena: blamey culture "know who broke the build and why"
 * Zeljko: if things break, everyone's highest priority is to fix it
 * Jeena: could be mitigated by gating and not merging to master
 * Zeljko: I think the book is in an SVN world, not git
 * Lars: hints of blaming who made a mistake, which is an anti-pattern. Makes people really careful of making changes.
 * Greg: "Get your build process to perform analysis of your source code (test coverage, code duplication, adherence to coding standards, cyclomatic complexity, etc) "
 * Tyler: don't feel like the authors are doing a great job of highlighting the benefits of doing things. Good at highlighting best practices. Example is commit messages. You need the WHY in a commit message, not just the what. I can see the what in the code. Not to blame, but so I can understand.
 * Mukunda: Phab/Differential takes it a few steps beyond. The default config requires a test plan (verbal evidence that you tested) and revert plan.
 * Greg: Fail commit tests if they take too long?
 * JR: Z and I were talking about this earlier. Good idea to investigate.
 * Zeljko: some places don't use commit vs integration but instead fast vs slow.
 * JR: Google small, medium, large
 * Lars: speed is a good indicator, but there are other aspects, like being able to run without external things (db, fs, network). Can be enforced if necessary.
 * Lars: if we have a specific unit test phase, we should fail it if it takes too long even if all the tests pass. If the machine happens to be slow then too bad. I did this with my python unit test runner, it can be set to do this.
 * Tyler: We fail after 45 minutes.
 * commit = test stage
 * acceptance? == gate and submit
 * Zeljko: "Everyone associated with development—project managers, analysts, developers, testers—should have access to, and be accessible to, everyone else on IM and VoIP. It is essential for the smooth running of the delivery process to fly people back and forth periodically, so that each local group has personal contact with members from other groups." p75
 * but we have a communication problem with too many mediums of communication
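
The idea raised above by Greg and Lars, failing a test stage when it runs too long even if every test passes, can be sketched roughly as follows. This is a minimal illustration using Python's standard `unittest` module; `run_with_time_budget`, `ExampleTests`, and the 600-second budget are hypothetical names and values chosen for the example, not part of any existing runner mentioned in these notes.

```python
import time
import unittest


def run_with_time_budget(suite, budget_seconds):
    """Run a unittest suite and return (passed, elapsed_seconds).

    The stage passes only if every test succeeds AND the whole run
    finishes within the time budget.
    """
    start = time.monotonic()
    result = unittest.TestResult()
    suite.run(result)
    elapsed = time.monotonic() - start
    tests_ok = result.wasSuccessful()
    within_budget = elapsed <= budget_seconds
    return tests_ok and within_budget, elapsed


class ExampleTests(unittest.TestCase):
    """Hypothetical stand-in for a real commit-stage test suite."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    # 600 seconds mirrors the "less than 10 min" goal discussed above.
    ok, elapsed = run_with_time_budget(suite, budget_seconds=600)
    print("stage passed" if ok else "stage failed", f"({elapsed:.2f}s)")
```

As Lars notes, a slow machine will fail the stage under this scheme; that is the intended trade-off rather than a bug in the sketch.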

Chapter 4: Implementing a Testing Strategy

 * Tyler: a lot of emphasis on cucumber style testing, more so than anything else
 * JR: bluuuuguh
 * Antoine: cucumber was trendy
 * Lars: http://git.liw.fi/cgi-bin/cgit/cgit.cgi/cmdtest/tree/README.yarn is inspired by cucumber and explicitly aims to have test suites be documentation
 * "The design of a testing strategy is primarily a process of identifying and prioritizing project risks and deciding what actions to take to mitigate them." p84
 * JR: testing pyramid etc
 * Zeljko: I actually love cucumber. We tried here. It failed because developers used it. We didn't use it as a communication tool (which is what it is meant for). It was just overhead.
 * Zeljko: I don't think any team has a clear testing strategy. Finding project risks and deciding what to do about them in terms of testing. A valid strategy would be to have no tests *if* after reviewing the risk you decide that the risks are not going to be mitigated by tests.
 * Jeena: are devs responsible for writing them? (Yes)
 * Tyler: devs write them, but Antoine is responsible for how they run
 * "The best way to introduce automated testing is to begin with the most common, important, and high-value use cases of the application. This will require conversations with your customer to clearly identify where the real business value lies" p94

Chapter 5: Anatomy of the Deployment Pipeline

 * "In many organizations where automated functional testing is done at all, a common practice is to have a separate team dedicated to the production and maintenance of the test suite. As described at length in Chapter 4, “Implementing a Testing Strategy,” this is a bad idea. The most problematic outcome is that the developers don’t feel as if they own the acceptance tests." p125
 * "developers must be able to run automated acceptance tests on their development environments" p125
 * "While acceptance tests are extremely valuable, they can also be expensive to create and maintain." p126
 * "Automated acceptance testing is what frees up time for testers so they can concentrate on these high-value activities, instead of being human test-script execution machines." p128
 * Lars: we should start measuring cycle time, but maybe not as the book defines it. From the first push to gerrit to when it's in production.
 * Lars: should start measuring and graphing it now.
 * TODO: ^^^^
 * Tyler: we should build it into the pipeline itself
 * Tyler: the piece we're missing is metadata about deployments, see eg: https://gist.github.com/thcipriani/8c2dfc746591342c4bc332e5bccc9226
 * Lars: at All Hands, we wanted to brag about what we achieved last year; it'd be great if we could say we did 12,237 deploys with an average cycle time of so and so many seconds
 * Jeena: what's the bottleneck? Aka use metrics to identify the main bottleneck and address that first.
 * Antoine: discussion on the logspam. We spend lot of time dealing with them during train deployment.
 * Antoine: logs are not really fed back to developers as a whole (but some do look at them)
 * Greg: Takeaways:
 * Main thing: Lack of ability for us to have a testing environment that we can easily create multiple of - push-button testing environment.  I'm sure Tyler's heartrate just increased.  We can't lead that work by ourselves.  We need help from SRE etc.  Keep on a tickler with SRE?
 * JR: Pain around testing environment (staging discussion) - those are hard discussions because the environment was growing in terms of requirements. Felt like initial discussion was about replacing beta with a staging env; but instead how do we satisfy a bunch of other requirements...  Some of this may not be as tightly coupled to SRE.  What are things besides staging environment that we could develop?
 * Greg: The solution isn't to make beta better.
 * Tyler: I disagree. We need to make beta better.  It's the only thing close to production that we have.
 * Jeena: Not sure I understand beta very well, but I think we could utilize pieces that are already there to be able to also make separate envs that are disposable.
 * Greg: Improving beta seems like fixing instead of rebuilding, is that feasible?
 * Tyler: I agree that it's a question of fixing rather than building a new one. I don't think it's possible to build a new production like thing - production was smaller then.  I think we should be building new things, new environments that approach production, but beta is as close to production like as we will get without SRE building staging.  I don't think we could ever have a staging that's production-like that we can tear down and spin back up - we _can't do that with production itself_.  Because it's hard to build a prod-like environment, we have a bunch of hardware sitting idle.
 * From chat: This is a political problem.
 * Tyler: discussed during Thursday meeting. There will be a staging at some point. Will be supplanting some uses of Beta (mostly the pre prod deploy). They don't want devs to ever touch staging. We also need a place where devs can touch and change things.
 * Jeena: how can we use production to make it reproducible?
 * Mukunda: we don't have the rights/political capital
 * Tyler: we can use Beta as this venue, it's the same puppet.
 * Tyler: Joe had a proposal on how to help beta from a while ago
 * TODO: Tyler dig that up.... ✅ https://phabricator.wikimedia.org/T161675
 * Lars: what are your overall opinions so far, has it been interesting/useful?
 * Greg: it's useful as a framework and a means to communicate and a reminder to do the things we know we need to do
 * Brennen: sometimes the most interesting bits are the parts I disagree with
 * Zeljko: reading it will be a good exercise for us to find out what we agree on etc (do we really want all libraries checked in, push button deploys, etc)
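
Lars's suggestion in the Chapter 5 notes above, measuring cycle time from the first push to Gerrit until the change is in production, could be computed along these lines. This is only a sketch: the `changes` list is made-up sample data, and `cycle_times` and `summarize` are hypothetical helpers; where the real timestamps would come from (Gerrit, deployment metadata) is left open, as it is in the discussion.

```python
from datetime import datetime
from statistics import mean, median


def cycle_times(changes):
    """Return the cycle time (deploy time minus first push) for each change."""
    return [deployed - pushed for pushed, deployed in changes]


def summarize(changes):
    """Summarize deploy count and cycle time in seconds."""
    durations = [d.total_seconds() for d in cycle_times(changes)]
    return {
        "deploys": len(durations),
        "mean_seconds": mean(durations),
        "median_seconds": median(durations),
    }


# Hypothetical sample data: (first push to code review, time live in production).
changes = [
    (datetime(2019, 3, 1, 10, 0), datetime(2019, 3, 5, 19, 0)),
    (datetime(2019, 3, 2, 9, 30), datetime(2019, 3, 5, 19, 0)),
    (datetime(2019, 3, 4, 15, 0), datetime(2019, 3, 7, 19, 0)),
]

if __name__ == "__main__":
    print(summarize(changes))
```

Reporting the median alongside the mean is a design choice for the sketch: a few changes that sit in review for months would otherwise dominate the average.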

= Second meeting =
 * Lars' notes for ch 6 and 7: https://files.liw.fi/temp/ch6-7.mdwn (will be deleted eventually)

Željko's quotes from the book

 * Chapter 6

p 153: Use the Same Scripts to Deploy to Every Environment

p 166: Test Targets Should Not Fail the Build. In some build systems, the default behavior is that the build fails immediately when a task fails. This is almost always a Bad Thing—instead, record the fact that the activity has failed, and continue with the rest of the build process.


 * Chapter 7

p 169: Introduction. Ideally, a commit stage should take less than five minutes to run, and certainly no more than ten.

p 170: Commit Stage Principles and Practices

p 171: Provide Fast, Useful Feedback

p 172: A common error early in the adoption of continuous integration is to take the doctrine of “fail fast” a little too literally and fail the build immediately when an error is found. We only stop the commit stage if an error prevents the rest of the stage from running—such as, for example, a compilation error. Otherwise, we run the commit stage to the end and present an aggregated report of all the errors and failures so they can all be fixed at once.
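
The practice quoted above (run the commit stage to the end, stop early only for errors that block everything else, then report all failures together) might look roughly like this. It is a sketch under assumptions: `BlockingError`, `run_commit_stage`, and the three sample checks are hypothetical names invented for illustration, not an existing tool's API.

```python
class BlockingError(Exception):
    """A failure that prevents later checks from running (e.g. a compilation error)."""


def run_commit_stage(checks):
    """Run each (name, callable) check and return a list of (name, message) failures.

    Ordinary failures are recorded and the run continues; only a
    BlockingError stops the stage early.
    """
    failures = []
    for name, check in checks:
        try:
            check()
        except BlockingError as err:
            failures.append((name, f"blocking: {err}"))
            break
        except Exception as err:
            failures.append((name, str(err)))
    return failures


# Hypothetical checks standing in for real lint/test steps.
def lint():
    raise ValueError("unused import in foo.py")


def unit_tests():
    pass  # all tests pass


def style():
    raise ValueError("line too long in bar.py")


if __name__ == "__main__":
    for name, message in run_commit_stage(
        [("lint", lint), ("unit", unit_tests), ("style", style)]
    ):
        print(f"{name}: {message}")
```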

p 177: Commit Test Suite Principles and Practices. The vast majority of your commit tests should be comprised of unit tests. Sometimes we fail the build if the suite isn’t sufficiently fast.

p178 Figure 7.3 Test automation pyramid

Chapter 6

 * Z: Thoughts on our deployment process - we changed from https to ssh for push...
 * Z: use the same scripts (p 153): we don't, eg for Beta, only way to test is to do it in production
 * Z: fixing documentation as I was doing it
 * Z: not all of us were using the same process, eg: mukunda was already using gerrit over ssh instead of the http auth
 * Greg: It's very easy for parallel deploy processes to spring up that only devs / ops use, we're at the point where we have multiple deploy scripts, effectively. Were there changes needed for deployments in Beta?
 * Tyler: It's really due to branch cutting.
 * Lars: A lot of this is obsolete (on technical specifics) - but I agree with the point that we should all do deployments the same way everywhere, and agree with the further point that we should use the same script for all deployments. We're not doing that partly because we can't set up a local dev environment that resembles production.
 * Dan: We do use scap to deploy to the beta cluster, so the tools are the same, but the environment is still drastically different.
 * Z: I think the point of this was you should test your deployment scripts from local -> intermediate -> production. If you deploy with the same script multiple times a day, things are tested thousands of times.
 * Dan: I think that's actually a false sense of security. At some point your processes have to be the same; using the same script to push to one place as another isn't enough, and it seems like there's more to this, especially in our envs. The process that we use to deploy to beta cluster is not aligned with the process for production. We don't actually stage - we have a jenkins script doing that deployment. The cadence of deployment differs so drastically that we don't get a good sense of how the prod deploy will go based on what's happening in beta.
 * Z: I think what the book says is that your environments should be similar.
 * B: emphasis on testing the deployment system/having the deployment system interrogate the environment
 * T: smoke tests?
 * L: not smoke tests, but checking that extensions are in-place and possibly doing a black-box test that verifies this
 * Z: I'm glad that scap has canaries
 * T: Things that scap checks:
 * that files exist on disk
 * runs a basic check that PHP exists
 * checks to see the app isn't throwing errors on stdout
 * does linting of JSON and PHP
 * does smoke tests at the end - deploys to canaries, then checks canaries
 * L: Seems like we're not doing things badly - perhaps we can do better in future in some ways.
 * D: That approach works in production because we have a lot of traffic. Not as much in beta.
 * T: Currently using service checker - https://en.wikipedia.org/spec.yaml
 * D: Service checker might be easier than Selenium etc.
 * T: We're currently not using it to its fullest, would like to get to the point of doing that before canaries
 * TODO: Look into running service-checker on the deployment host before pushing out to the canaries
 * L: Would like to bring up that we don't have ways of bringing up environments. We could do with fairly frequent reviews of what the deployment process looks like.  We should all be involved.
 * Z: All should deploy.
 * L: Yes, but also we should all be involved in automating the deployment.
 * Z: Two reasons everyone should deploy:
 * I don't think everybody hates deployment as much as they should.
 * different skill sets mean that we each bring different improvements toward building a robust deployment system
 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Skill_matrix
 * T: we rotate, but we don't do pairing on that, so we're just focused on doing it, not improving it, we don't have the capacity to even think about those improvements
 * B: Emphasis on pairing for deploys seems more important to me having watched this happen.
 * (Discussion of how bad or not the train experience is.)
 * J: I don't think a human should have to do it :)
 * L: I agree.
 * (Some more discussion of pairing, etc.)
 * G: To tie a bow on this - would be worth experimenting with pairing with one person focusing on automation for a quarter or something. Would be a good experiment to timebox and see what could be accomplished.  Tradeoff we have to remember is that whatever investment we put into scap is going to be lost in a year and a half.
 * Z: Thought I had about pairing was a new person + existing team member... Maybe we need a serious level up on scap, etc.
 * T: On the point that it's not worth investing time now: there are cultural shifts that are going to have to happen for continuous deployment. If we invest some time in scap with the view that our goal is continuous deployment, that sort of work won't necessarily be lost.
 * J: Maybe we just work faster to get rid of it.
 * L: Backports?
 * M: backport specifically meaning cherry-pick to the deployed production branch - the thing that SWAT does a lot of
 * T: Fixing stuff right now.
 * G: Porting code changes is easier than porting the bullet points on a wiki page.
 * M: Don't think devs will push back on continuous deployment.
 * B: There's gotta be stuff to automate.
 * T: I think the list of bullet points on the wiki page means we ran out of stuff to fix. Automating the entire train should be possible.
 * D: The manual process right now is basically bouncing around from one disparate tool to another.
 * Z: One tool to rule them all?
 * D: Either port the other tools into scap or wrap them up in something.
 * (Points from chat about rolling branch cutting / image that contains /srv/mediawiki-staging/ into the pipeline.)
 * L: logspam being something we can't automate?
 * Z: yes, it's a pain to look through the logstash dashboard, hunting people down and getting them to fix it is a pain
 * L: anything that looks like an error halts the deployment and it goes to the developer to fix it
 * Z: that's basically the process now
 * L: if there's anything in logs we abort without question
 * Z: if there's anything I still block it from going forward (iow: not roll back, unless there's other issues)
 * B: could we start with aborting on any _increase_ in logspam? i feel like the book mentioned something like this at one point.
 * L: CDep will make this easier as the amount of change will be small (one change at a time) so we know what the cause is
 * B: can we make a concerted effort to get there?
 * Z:
 * M: we have tried for a while
 * G: Tooling is partly to blame for the current state of affairs.
 * T: MediaWiki known errors dashboard
 * JR: This is the same argument that I hear when talking about TechDebt. I think saying "stop everything" is a non-starter.  However, coming up with a plan to work towards "0" could encompass a "no new spam" and "remove existing log spam in small increments" approach...
 * M: what we need is statistical anomaly detection on the error log _rate_
 * T: There's a lot of pressure from the outside - chapter 1 - the process is subverted to meet the timeframe. That's what people mean when they say this isn't workable.
 * L: Other crazy idea:
 * Get a way to set up a production-like env
 * Smoke test / etc.
 * Continuous deployment there
 * Notify people who make changes if there are errors in the logs
 * T: There are Icinga alerts for production if error rate goes above... Something. And beta may mirror production there.  Beta is continuously deployed to, but it's really not production-like.  We'd have to create a more-production-like beta...
 * D: Say the word...
 * The greek chorus: STAGING
 * J: maybe if we make this new environment we could also work on automating DB creation
 * T: Mukunda proposed a long time ago the long-lived branches thing - merging to a branch rather than cutting a new branch every week might be a prereq for being able to deploy smaller groups of changes in a more continuous fashion.
 * M: The way I see it the best place we could get to in near future is to be always SWATing. That requires everything to go through a process.
 * T: That requires long-lived branches.
 * M: Sure - always swatting to the deployed branches.
 * D: A lot of what we're discussing in this are things we've considered for the pipeline design - incremental test gating, etc., so we could invest time on our existing process and tooling or we could invest that into implementing the pipeline.
 * T: I'm curious how much of this is just necessary prerequisite work for the pipeline.
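
Brennen's suggestion above (abort on any increase in logspam) and Mukunda's (statistical anomaly detection on the error log rate) can be combined in a very simple form: compare the current error rate against recent baseline samples and abort when it sits well above them. The sketch below is an assumption-laden illustration, not how scap or the existing alerts work; `should_abort`, the three-sigma default, and the sample rates are all hypothetical.

```python
from statistics import mean, stdev


def should_abort(baseline_rates, current_rate, threshold_sigmas=3.0):
    """Return True if current_rate is anomalously above the baseline error rates.

    baseline_rates: errors per minute sampled before the deploy.
    current_rate: errors per minute observed after deploying (e.g. on canaries).
    """
    if len(baseline_rates) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    return current_rate > mu + threshold_sigmas * sigma


if __name__ == "__main__":
    baseline = [10, 12, 11, 9, 13]  # hypothetical errors per minute
    for rate in (12, 40):
        verdict = "abort" if should_abort(baseline, rate) else "continue"
        print(f"rate {rate}: {verdict}")
```

When the baseline is perfectly flat the standard deviation is zero, so any increase at all triggers an abort, which matches the stricter "no new spam" reading; a noisy baseline tolerates proportionally more variation.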


 * (Discussion of pairing goals, logspam as a long-term problem and whether we should make that a goal.)

Chapters 7, 8, and 9 for next time.