Wikimedia Release Engineering Team/Deployment pipeline/20170830-planning

=2017-08-30=

General note: The tentative timeline will ask for draft goals in about 1.5 weeks

TechOps

 * Have buy-in on network policy approaches for how to deploy Kubernetes pods
 * Implementation working
 * Left: upload to Puppet repo, document it, and keep synced with Calico API
 * Kubernetes 1.7 (latest) provides dynamic admission controllers
 * Finishing Puppetization
 * Intent is NOT to break toollabs :P
 * No progress on ingress solutions
 * And finally, Giuseppe and Alex looking at the standard pod structure.

RelEng

 * Goal 1
 * Define functional tests for Mathoid running on the staging Kubernetes cluster for use in future gating decisions
 * Goal 2
 * Define a method for monitoring and reacting to the Mathoid functional tests
 * Status:
 * blocking tasks
 * building Mathoid via Blubber
 * finding a build location accessible by Jenkins (the ci-admin LDAP group; done as of this week)
 * "Not optimistic" about completing those goals this quarter

Services

 * No official service delivery goals this quarter; not much progress last month.

TechOps

 * Pending draft goal would be infrastructure work
 * Main idea long-term is to get all services running
 * We have been waiting for Kubernetes 1.7 to do [?], so we want to finally fix that next quarter
 * Open ingress point
 * Monitoring, since currently we have some Icinga but no good checks for node down, Heapster; no good graphs
 * We are used to host-based monitoring, but this will be different; perhaps look into not using Icinga and going straight to Prometheus
 * Proposal from last Q for container security upgrades, so that will consume some time https://phabricator.wikimedia.org/T167269
 * That work should be coordinated with Moritz on general updates, so tentative for next Q
 * After that, we would be blocked on a trial service running, so (ideally) we should aim for that by end of Q2
 * We should be at a point where we could run a real trial (non-production) service, maybe halfway through Q2

RelEng

 * No draft goals yet
 * If we miss Q1 goals, those will carry into Q2
 * Monitoring for services is the main thing we want to get in place
 * Be able to deploy to staging cluster and get feedback to [?] seems reasonable for Q2
 * Trial service? Should be ready after Blubber is working, so by early Q2. Enough to test infrastructure.
 * Services needs a way to control what goes to production, so that could be Q2. (Helm). Depends on integrating with services
 * Could other teams help? We have work to do.
 * https://phabricator.wikimedia.org/T157469 (flow of things that need to be completed before this service is "done")

Services

 * No official service delivery Q2 goal so far, drafts at https://www.mediawiki.org/wiki/Wikimedia_Services/2017-18_Q2_Goals
 * Considering continuing dev environment / mwctl work, but resources are tight. Open to re-prioritizing other goals based on feedback.
 * Would like more clarity about long-term milestones and goals of this program. Where should we be at end of Q2 or Q3?
 * Focus for Q2 is on storage
 * Also want to do something on delivery side, but struggling with time available, so would like to shift priorities to make time
 * Uncertainty about coordination with other teams
 * We should aim to be ready by end of Q2

Discussion

 * Seems like RelEng may be a bottleneck by late Q2 into Q3
 * A lot of the work will be RelEng + Services
 * As we think about Q3 goals, think about other teams, not just our own; we should work even closer than we are today
 * Milestones
 * Able to run and test a service for infrastructure
 * We can get that going with a lot of workarounds
 * Full pipeline with self-serve (but that's a big step up)
 * I don't think we have a proper design for that yet
 * Once we start running a service, we can design that
 * Should we kick off the design work sooner? Would prefer not to wait until Q3 planning before we start
 * Design artifact as a possible milestone in Q3
 * There is a weekly meeting
 * Currently building the pipeline to create the containers
 * Next step would be a deployment tool, with the de facto choice being Helm
 * Create a wiki page design doc
 * We should have that as an early Q2 goal. Let's make it explicit as a goal
 * ACTION: Alexandros can write 80% of the design doc by the next meeting
 * Next meeting is at end of quarter, so too early for design
 * ACTION: Mark will create phab tasks
 * ACTION: Tyler will document RelEng draft goals before next week (but really that's true for all teams)
 * Services might be conservative this quarter and only make soft commitments on design work & dev environment
 * Go-around of concerns
 * Dan: Good to externalize our vision
 * Gabriel: Looking forward to more clarity on design
 * Giuseppe: Main concern is unknowns, since we haven't actually run something yet (e.g. managing configuration). [+1, configuration already came up as part of dev environment work -- Gabriel]
 * Greg: I'm noodling about implications of program-centric goal setting