Wikimedia Release Engineering Team/MediaWiki on Kubernetes/Meeting notes/2021-02-10

= 2021-02-10 =

Always

 * Core_Platform_Team/Initiatives/MediaWiki_on_Kubernetes
 * Wikimedia_Release_Engineering_Team/MediaWiki_on_Kubernetes
 * Workboard
 * IRC:

TODOs from last time

 * Announce/post Pipeline Office Hours meeting in Slack
 * Continue conversation about CD (?)
 * We're in the middle-ish of a migration to GitLab. Do we need to wait?
 * What are the mechanisms of deployment now? Which parts are trustworthy, and which not so much? Etc.
 * Notes from RelEng sync up:
 * What model of CD are teams envisioning? Continuous Delivery or Continuous Deployment?
 * CDep (Continuous Deployment) triggers upon merge/build; deployments are always automated
 * CDel (Continuous Delivery) is a human making the decision, post merge, to start a deployment following a number of merges. Do we already have this? Does `ssh deployment.eqiad.wmnet 'helmfile apply ...'` count?
 * Rollbacks:
 * Are rollbacks automated? What are the benefits and risks of this?
 * Concept of rollback windows. A deployer explicitly opens/closes windows wherein automated rollback is allowed. After the window is closed, rollbacks are no longer automated, because: 1) no user is there to handle potential fallout; 2) at some point "forward moving resolution" is a more appropriate means to fix things.
 * Deployment dependencies:
 * Do we envision deployments of separate services being independent of one another or dependent in some cases? If the latter, how do we handle dependency in our CD model?

How long is the window for considering a rollback of a new deployment?
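
The rollback-window idea from the notes above could be sketched roughly as follows. This is only an illustration of the concept as discussed, not an existing tool: the `RollbackWindow` class, the `on_unhealthy_deployment` function, and the two action names are all hypothetical, and how "unhealthy" would be detected is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RollbackWindow:
    """Hypothetical deployer-controlled period in which automated rollback is allowed."""

    opened_at: Optional[datetime] = None
    closed_at: Optional[datetime] = None

    def open(self, now: datetime) -> None:
        # The deployer explicitly opens the window, e.g. right after deploying.
        self.opened_at = now
        self.closed_at = None

    def close(self, now: datetime) -> None:
        # The deployer explicitly closes the window when they stop watching.
        self.closed_at = now

    def is_open(self, now: datetime) -> bool:
        if self.opened_at is None or now < self.opened_at:
            return False
        return self.closed_at is None or now < self.closed_at


def on_unhealthy_deployment(window: RollbackWindow, now: datetime) -> str:
    """Pick an action (hypothetical names) when a deployment looks unhealthy."""
    if window.is_open(now):
        # Someone is around to handle fallout, so automation may roll back.
        return "auto-rollback"
    # Outside the window, leave it to a human to resolve by moving forward.
    return "manual-fix-forward"


if __name__ == "__main__":
    deployed = datetime(2021, 2, 10, 12, 0)
    window = RollbackWindow()
    window.open(deployed)
    print(on_unhealthy_deployment(window, deployed + timedelta(minutes=5)))
    window.close(deployed + timedelta(minutes=30))
    print(on_unhealthy_deployment(window, deployed + timedelta(minutes=45)))
```

In this sketch the window has no built-in length; it stays open until the deployer closes it, so the open question of how long a window should be is left to the deployer.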

RelEng

 * Got the pipelinelib job working in releases Jenkins and publishing to the registry
 * Working on the multi-version build now; need to figure out how to accomplish this in pipelinelib
 * Getting localization working with multi-version and getting ready to package the Helm chart
 * Adding private settings

Serviceops

 * Joe: We merged the change that moves Apache to a general data structure for site configuration
 * That was the last blocker to writing a chart for Apache + MediaWiki
 * Some things we need to think about
 * re: chart. Do we need a dev environment case? E.g. using sqlite
 * How are we going to handle the massive amount of logging that MW generates?
 * How are we going to monitor MediaWiki in k8s?


 * Kunal (have conflict today, sorry): setting up "restricted" namespace in docker-registry is in progress

Platform Engineering

 * Shellbox merged to MW master, awaiting security review: https://phabricator.wikimedia.org/T268092