Wikimedia Release Engineering Team/Quarterly review, November 2013

From mediawiki.org

Date: 2013-11-19 | Time: 10:00 Pacific | Slides: google docs | Notes

Who:

  • Leads: Greg G, Chris M
  • Virtual team: Greg G, Chris M, Antoine, Sam, Chris S, Chad, Zeljko, Michelle G, Andre, Ariel
  • Other review participants (invited): Robla, Sumana, Quim, Maryana, James F, Ryan Lane, Ken, Terry, Tomasz, Alolita

Topics: Deploy process/pipeline, release process, bug fixing (the how of priority), code review, code management, security deploy/release, automation prioritization

Big picture

Release Engineering and QA are where our efforts in Platform can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our focus. What we want to accomplish:

  • More appreciation of, response to, and creation of tests in development
  • Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
  • Reduce time between code being merged and being deployed
  • Provide information about software quality in a way that informs release decisions
  • Help WMF Engineering learn and adapt from experience

...All in an effort to pave the path to a more reliable continuous deployment environment.

Team roles

Many people outside of the virtual team play an important role in releases, but this review will focus on the work of the following people in the following roles:

  • Release engineering: Greg G, Sam, Chris S (security)
  • QA and Test Automation: Chris M, Zeljko, Michelle G*
  • Bug escalation: Andre, Greg G, Chris M, Chris S (security)
  • Beta cluster development/maintenance: Antoine, Ariel(?), Sam
  • Development tools (e.g. Gerrit, Jenkins): Antoine

Goals from last quarter

  • Align browser test coverage to high profile features.
    • Done: Apply model of Language/Mobile embedded QA to a new feature team (specifically VisualEditor)
    • Done: Include more user contributed code testing (eg: Gadgets)
    • Began, but stopping: Increase capacity through community training for browser tests
  • Browser tests reliably tracking features across WMF software development projects in beta cluster
    • In progress: Tests moved into the repositories of the code being tested
    • Not done: support for single-host extensions, e.g. Parsoid
  • Start comprehensive quarterly assessments of postmortems
    • In progress, but much more to do here. Possibly more next quarter (see below)
  • Improve our deployment process
    • In progress: automate as much as possible
    • In progress: improve monitoring
    • In progress: improve tooling (eg: atomic updates/rollbacks and cache invalidation)
  • Take the Beta Cluster to the next level
    • monitoring of fatals, errors, performance
      • Done: Ganglia
      • Blocked: still needed: Graphite and Icinga (Ops support needed)
    • add more automated tests, e.g. for the API
      • Pending: Beta Cluster support for Parsoid (bug 57233)
      • Pending: implement API tests on top of VE+Parsoid (bug 56622)
    • General improvements:
      • Done: support for automatic deployment of git submodules, e.g. VisualEditor
      • Done: support for pre-release extensions with full i18n messages
    • In progress: feed experiences/gained knowledge of Beta Cluster automation up to production automation (ONGOING)

New priorities:

  • Done: Successfully streamline the review and deployment of extensions (brought in from Q2)

Goals and dependencies for the next quarter

Goals

These goals are set relative to the WMF Engineering 2013-14 goals.

Keep:

  • Successfully manage the first release of MediaWiki in conjunction with our outside contractor
    • Work with Antoine on Swift-backed download.wikimedia
  • Browser tests managed in feature repos with feature teams
  • More comprehensive quarterly assessments of postmortems

New Priority:

  • Automated API integration tests in important areas
    • UploadWizard
    • Parsoid
    • ResourceLoader
  • Expose fatal errors from both unit tests and browser tests to teams
  • Create process documentation for ideal test/deployment steps (eg: ThoughtWorks exercise)

Deprioritize:

  • Successfully streamline the review and deployment of extensions (Done in Q1)
  • Manage build times, parallel execution, continuous execution of browser tests for optimum coverage (vague, ongoing goal)
  • Focus on community contributions and non-WMF

Dependencies

Ops dependency:

  • Provide true HTTPS support on the Beta Cluster
  • Icinga for Beta Cluster
  • More comprehensive quarterly assessments of postmortems

MW Core dependency:

  • Automation of deployment process
  • Monitoring of deploys (performance)
  • Security patch management on cluster (especially after new wmfXX branches are cut)

Questions

  • Q: who is working on Vagrant support?
    • A: mostly distributed (e.g. Adam, Yuvi)
  • Tomasz: what is being done to distribute work to other teams?
    • Greg: I think that's what our first bullet point was trying to get at
    • Chris: we have a great model working with the Language team
  • Resourcing for deployment tooling work (Erik)
    • Platform needs to determine its ability to delegate some of its long tail of responsibilities
    • begin with a sprint to kick off dev tooling with Aaron/Ori (arch) and Bryan (dev)?
    • maybe easier to do limited bursts of support with an explicit backlog of issues for a specific component (e.g. ResourceLoader)

Actions

  • ACTION: Need to document the feature requirements of sartoris/etc - possible task for Bryan Davis after scholarship app (GG)
  • ACTION: clarify priority of work with Antoine re Vagrant spin ups for Jenkins builds (GG)
  • ACTION: change checkins to weekly (GG)
  • ACTION: revamp meeting style and project management (GG)