Wikimedia Release Engineering Team/Deployment pipeline/20170830-planning

From mediawiki.org

2017-08-30

General note: the tentative timeline will ask for draft goals in about 1.5 weeks

Status

TechOps

  • We have buy-in on network policy approaches for deploying Kubernetes pods
    • Implementation working
    • Remaining: upload to the Puppet repo, document it, and keep it in sync with the Calico API
  • Kubernetes 1.7 (latest) provides dynamic admission controllers
    • Finishing Puppetization
    • Intent is NOT to break toollabs :P
  • No progress on ingress solutions
  • Finally, Giuseppe and Alex are looking at the standard pod structure.

RelEng

Services

  • No official service delivery goals this quarter, not much progress last month.

Upcoming quarter goals (by team)

TechOps

  • The pending draft goal would be infrastructure work
    • The main long-term idea is to get all services running
    • We have been waiting for Kubernetes 1.7 to do [?], so we want to finally fix that next quarter
    • Open ingress point
    • Monitoring: we currently have some Icinga checks, but no good checks for node down or Heapster, and no good graphs
      • We are used to host-based monitoring, but this will be different; perhaps look into skipping Icinga and going straight to Prometheus
    • There is a proposal from last quarter for container security upgrades, which will consume some time: https://phabricator.wikimedia.org/T167269
      • That work should be coordinated with Moritz on general updates, so it is tentative for next quarter
  • After that, we would be blocked on having a trial service running, so ideally we should aim for that by the end of Q2
    • We should be at a point where we could run a real trial (non-production) service maybe halfway through Q2

RelEng

  • No draft goals yet
  • If we miss Q1 goals, those will carry into Q2
  • Monitoring for services is the main thing we want to get in place
  • Being able to deploy to a staging cluster and get feedback to [?] seems reasonable for Q2
  • Trial service? It should be ready once Blubber is working, so by early Q2. That is enough to test the infrastructure.
  • Services needs a way to control what goes to production (Helm), so that could be Q2. It depends on integrating with Services.
  • Could other teams help? We have work to do.
  • https://phabricator.wikimedia.org/T157469 (flow of things that need to be completed before this service is "done")

Services

  • No official service delivery Q2 goal so far, drafts at https://www.mediawiki.org/wiki/Wikimedia_Services/2017-18_Q2_Goals
  • Considering continuing dev environment / mwctl work, but resources are tight. Open to re-prioritizing other goals based on feedback.
  • Would like more clarity about long-term milestones and goals of this program. Where should we be at end of Q2 or Q3?
  • Focus for Q2 is on storage
  • We also want to do something on the delivery side, but are struggling with the time available, so we would like to shift priorities to make time
  • Uncertainty about coordination with other teams
    • We should aim to be ready by end of Q2

Discussion

  • Seems like RelEng may be a bottleneck by late Q2 into Q3
  • A lot of the work will be RelEng + Services
  • As we think about Q3 goals, think about other teams, not just our own; we should work even closer than we are today
  • Milestones
    • Able to run and test a service for infrastructure
      • We can get that going with a lot of workarounds
    • Full pipeline with self-serve (but that's a big step up)
      • I don't think we have a proper design for that yet
      • Once we start running a service, we can design that
      • Should we kick off the design work sooner? Would prefer not to wait until Q3 planning before we start
    • Design artifact as a possible milestone in Q3
  • There is a weekly meeting
    • Currently building the pipeline to create the containers
    • The next step would be a deployment tool, with Helm as the de facto choice
    • Create a design doc as a wiki page
      • We should have that as an early Q2 goal. Let's make it explicit as a goal
      • ACTION: Alexandros can write 80% of the design doc by the next meeting
  • The next meeting is at the end of the quarter, so it is too early for the design
  • ACTION: Mark will create Phabricator tasks
  • ACTION: Tyler to document RelEng draft goals before next week (but really that's true for all teams)
  • Services might be conservative this quarter and only make soft commitments on design work & dev environment
  • Go-around of concerns
    • Dan: Good to externalize our vision
    • Gabriel: Looking forward to more clarity on design
    • Giuseppe: Main concern is the unknowns, since we haven't actually run anything yet (e.g. managing configuration)
    • Gabriel: +1, configuration already came up as part of the dev environment work
    • Greg: I'm noodling about implications of program-centric goalsetting