Wikimedia Release Engineering Team/Deployment pipeline/2019-01-17

Last Time

 * 2018-12-20
 * Archive

Current Quarter Goals

 * TEC3:O6:O6.1:Q3: Deployment Pipeline Documentation
 * TEC3:O3:O3.1:Q3: Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production CD Pipeline

General
TODO: start an email thread
 * Pipeline cabal meetup as part of All-Hands?
 * Do we have any current projects (2 weeks from now) that would benefit from in-person, high-bandwidth collaboration?
 * Hacking on ORES maybe?
 * Productivity aside, it was fun at the previous Hackathon. :)
 * Joe: we have some time on Monday evening (maybe), but ideally we'd work at the hackathon, which no one goes to :(
 * Lars: is anyone opposed to hanging out?
 * Alex: maybe a small group, since that's more productive
 * Joe: we should come up with some ideas for things to work on


 * Lars's email: "I'm looking for feedback on whether the vision I'm describing is where we want to end up."
 * fselles: sounds like a comprehensive plan. Re: deployment velocity, we need existing metrics for this
 * thcipriani: aside: RelEng is thinking about this
 * Alex: don't we have statsd counters?
 * thcipriani: we do, but we can't trace individual scap commands back to patches, deployment windows, etc.
 * jeena: don't we care about how long it takes an individual patch to hit production?
 * Dan: I think Elasticsearch might be better than statsd, since we need to tie this together using logged metadata (repo deployed, scap commands, patchsets deployed, etc.)
 * Joe: one thing I didn't see was self-service, i.e., a developer creates a repo and everything is set up for them. How much toil is needed for this?
 * Alex: today this involves SRE/serviceops and RelEng, with 4 different commits in 4 different projects; there is quite a lot of friction here
 * Joe: something more is needed in terms of UI from the developer's point of view. We should think about setup from all points of view; maybe when someone creates a .pipeline file, the pipeline gets set up for them
 * Lars: I agree that the developer experience should be massively simpler than it currently is; when I was thinking about this, I hadn't gotten as far as the UI yet
 * Joe: we want to take it further!
 * Lars: there is a proposal to start continuous deployment with the Blubberoid service, i.e., not load balancers and k8s; is there any objection to having a continuously deployed Blubberoid?
 * Joe: what you are proposing goes beyond CDep: it is total ownership of a service. Icinga needs work to allow this, but I think we can experiment with CDep on Blubberoid
 * Lars: the proposal is motivated by Blubberoid being a nice, safe, small, and friendly service: it has no dependencies or databases, and it takes input over HTTP and returns output over HTTP. It can't get much simpler
 * fselles: Icinga does need work, but we need metrics
 * Lars: RelEng will start thinking about what we need to make this happen
 * Joe: we need to work on permissions for k8s
 * Antoine: or we put an Icinga container in the pod that runs the service and deploy it ourselves via Helm?
 * Joe: Pearson does a namespace per project, including a Jenkins instance, but let's not do this :) Sadly, CDep may not be possible for some services, since there are many interdependent services, so it's best to start with something simple
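To make jeena's deployment-velocity question (how long an individual patch takes to reach production) concrete, here is a minimal Python sketch. It assumes, hypothetically, that every deployment has already been logged together with the patchset it shipped, the merge time, and the deploy-completion time. That is the kind of metadata thcipriani and Dan point out statsd counters alone can't tie together. The `DeployEvent` fields, repo names, and patchset IDs are illustrative only and are not an existing Wikimedia log schema.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Tuple


@dataclass
class DeployEvent:
    """One logged deployment (hypothetical fields, not a real log schema)."""
    repo: str
    patchset: str          # hypothetical patchset identifier
    merged_at: datetime    # when the patch was merged
    deployed_at: datetime  # when the deployment of that patch finished


def lead_times_minutes(events: List[DeployEvent]) -> List[Tuple[str, float]]:
    """Merge-to-production lead time for each logged patch, in minutes."""
    return [
        (e.patchset, (e.deployed_at - e.merged_at).total_seconds() / 60)
        for e in events
    ]


def median_lead_time_minutes(events: List[DeployEvent]) -> float:
    """Median lead time; less skewed by a few very slow patches than the mean."""
    return median([minutes for _, minutes in lead_times_minutes(events)])


if __name__ == "__main__":
    events = [
        DeployEvent("example/service-a", "1001,2",
                    datetime(2019, 1, 17, 10, 0), datetime(2019, 1, 17, 13, 0)),
        DeployEvent("example/service-a", "1002,1",
                    datetime(2019, 1, 17, 11, 0), datetime(2019, 1, 18, 11, 0)),
        DeployEvent("example/service-b", "1003,3",
                    datetime(2019, 1, 17, 9, 30), datetime(2019, 1, 17, 10, 0)),
    ]
    print(lead_times_minutes(events))
    print(median_lead_time_minutes(events))  # prints 180.0
```

The median is used here because a single patch that waits a day for a deployment window would otherwise dominate an average; where the events actually come from (Elasticsearch, as Dan suggests, or elsewhere) is left open.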

RelEng

 * Dan is working on "Manually defining artifacts results in default copy of all project files"
 * Code proposal: .pipeline/blubber.yaml
 * and a short-hand format/structure that expands


 * The continuous release pipeline should support more than one service per repo
 * Implicit assumption that every repo is one service. Counterexample: MediaWiki! (good point :))
 * Implicit assumption that there is one test entrypoint per repo
 * Code proposal: .pipeline/config.yaml
 * Joe: let's keep everything that developers need to control in the repo; what Dan is proposing seems sane to me
 * Dan: this is the inverse pattern of mediawiki
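The two assumptions called out in this discussion (one service per repo, one test entrypoint per repo) can be made concrete with a small model. This is a minimal Python sketch under stated assumptions: `ServiceSpec`, `RepoPipelineConfig`, `test_jobs`, the example repo name, and the `make` commands are all hypothetical and are not the actual `.pipeline/config.yaml` format Dan proposed. It only shows a repo fanning out into several services, each with its own set of test entrypoints.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ServiceSpec:
    """One deployable service built from a repo (hypothetical schema)."""
    name: str
    # Each service may declare several independent test entrypoints.
    test_entrypoints: List[str] = field(default_factory=list)


@dataclass
class RepoPipelineConfig:
    """Hypothetical in-memory model of a per-repo pipeline config.

    Not the real .pipeline/config.yaml format; it only illustrates
    dropping the one-repo/one-service/one-test-entrypoint assumptions.
    """
    repo: str
    services: List[ServiceSpec]

    def test_jobs(self) -> List[Tuple[str, str]]:
        """Expand into one (service, entrypoint) job per test entrypoint."""
        return [
            (svc.name, entrypoint)
            for svc in self.services
            for entrypoint in svc.test_entrypoints
        ]


if __name__ == "__main__":
    config = RepoPipelineConfig(
        repo="example/multi-service-repo",
        services=[
            ServiceSpec("api", ["make test-unit", "make test-integration"]),
            ServiceSpec("worker", ["make test"]),
        ],
    )
    for service, entrypoint in config.test_jobs():
        print(f"{service}: {entrypoint}")
```

Keeping the service list and its entrypoints in the repo itself also fits Joe's point that everything a developer needs to control should live in the repo.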


 * Added Wikimedia Portals to the TBD list on the migration-to-k8s task: https://phabricator.wikimedia.org/T198901#4881831
 * seems self-contained
 * gets it out of the mediawiki deployment tree (/srv/mediawiki-staging)
 * no more Portals deploys in SWAT

Minor update things

 * Blubber docs sparkle: https://wikitech.wikimedia.org/wiki/Blubber
 * Blubber binary downloads on releases: https://releases.wikimedia.org/blubber/
 * Thanks Alex for the review!
 * ASIDE: moving scap back to Gerrit; going to use the test portion of the pipeline to run tests. It was really nice and simple (for a person who has contributed to Blubber, anyway): https://phabricator.wikimedia.org/D1138

Serviceops

 * Zotero was handed over to Marrielle today \o/
 * Managed to deploy, rollback and get changes through the pipeline
 * One issue that did come up was the difficulty of finding out the version/tag of the image.
 * Should jenkins-bot comment on the change saying "Here's the newly created image: "? +1 +1 (even better if it's not in a comment but somewhere more visible) +1
 * Joe: the main technical pieces are there, but we need to polish the pipeline's UI
 * Jeena: is there no visual indication in jenkins?
 * Dan: kinda sorta: we have the Blue Ocean dashboard, but it's not the default and it needs work. Feedback needs to be addressed sooner rather than later
 * fselles: I ran a patch through the pipeline and it failed, which is fine, but I had no way to rerun it
 * thcipriani: currently you can comment "recheck" on a patch, but that is not discoverable at all; I want a Gerrit plugin for this

Services
As Always
 * Release Pipeline Workboard
 * Meeting notes