Wikimedia Release Engineering Team/SSD Sync Up/2019-07-02

= 2019-07-02 =

Last Time: 2019-06-25

Deployment Pipeline
Workboard

TODOs from Last Time

 * thcipriani -- Pipeline image build cleanup for contint1001
 * https://gerrit.wikimedia.org/r/519083
 * thcipriani -- Base Blubber policy file for CI
 * need to bump blubber version
 * TODO (next week) jeena + thcipriani to pair on the blubber 0.8.0 update
 * pipeline config validation
 * brennen: patches coming Soon™
 * TODO Pipeline docs
 * I'm getting notifications that things are linking to Wikitech/Blubber
 * https://wikitech.wikimedia.org/wiki/Deploying_a_service_in_kubernetes
 * https://wikitech.wikimedia.org/wiki/Docker
 * so folks are poking at the edges
 * ❌ thcipriani: to scope task
 * contint1001 store docker images on separate partition or disk
 * Dzahn has claimed.

Other Work

 * No additional work expected in the next week. We have enough.

New CI

 * v2 of CI arch doc: https://docs.google.com/document/d/1EQuInEV-eY_5kxOZ8E1qEdLr8fb6ihwOD9V_tpVFWuU/edit
 * Only a few comments received, no significant changes. v3 someday, but not immediately.
 * Lars is hacking up new components around GitLab (mostly independent of what CI engine we choose; GitLab is just the first one to be tried)
 * Starting setup of components
 * VCSWorker should be done this week; works now, locally, but needs a deployment to a test instance
 * http://git.liw.fi/wmf-ci-arch/tree/vcsworker.py
 * HTTP API
 * Signed JSON web tokens
 * Controller (conductor)
 * Listens to Gerrit and triggers events in GitLab
 * DeploymentWorker
 * GitLab + Gerrit Stream events
 * Support will have to be written for stream events
 * GitLab is written in Ruby
 * lars: hopefully we will not have to touch GitLab code, but will use the API
 * Future CI UI
 * since any solution will be hidden from users, the UI must expose enough information to not frustrate our users
 * Lars: don't want to expose GitLab, even for logs, due to security -- zero-day exploits are possible even in read-only mode -- Jenkins is a good example of this
 * Migration Plans
 * Argo, Zuulv3, GitLab
 * 437 existing zuul jobs
 * doc publishing, pipeline image publishing, code-coverage + codehealth, periodic jobs in beta, and (oh right) test jobs
 * Existing pipeline
 * Docker-in-Docker seems hard to get right
 * Zuulv3
 * "Nodepool" and politics (also zookeeper for some reason)
 * antoine: difficult to deal with the long tail (historically)
 * Migration draft zuulv2 to zuulv3: (thcipriani can't find the patchset)
 * One idea: move from zuulv2.5 -> zuulv3, then further migration with CI working group
 * Gets rid of Jenkins
 * Gets rid of python2 sooner (hard deadline given python2 EOL of 1 January 2020)
 * Not necessarily the end position.
 * Discussion of the status quo: who owns maintenance of the existing Zuul config (Antoine mostly, James and others some). All of this is a nontrivial maintenance task.
 * https://jenkins-debian-glue.org/
 * https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/jjb/operations-debs.yaml#71
 * TODO: Talk to SRE about Zuul v3 needs
 * migration plan -- we don't have one

TODOs from Last Time

 * mediawiki/core blubber: Jam a shell script into the builder?
 * ✅ ??? Decide on that post-Wednesday MW extensions meeting.
 * TODO: the actual jamming of shell script
 * TODO: make a phab task to describe changes needed to scaffold.sh in the charts repo to support local dev
 * ✅ https://phabricator.wikimedia.org/T226660
 * ✅ go ahead and make patch set https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/519485/
 * ✅ poke SRE
 * ❌ (thcipriani to send email) thursday: SRE, how much can we break?

Other Work

 * Porting from local-charts to deployment-charts ( https://phabricator.wikimedia.org/T224935 ):
 * Add restbase chart (port from local-charts)