Wikimedia Release Engineering Team/Quarterly review, August 2014

From mediawiki.org

Date: September 3rd | Time: 17:00 UTC | Slides | Notes

Topics: Deploy process and pipeline; browser, manual, and unit testing; everything from development to deployment.

Team

Who

  • Lead: Greg
  • Team: Antoine, Chris, Dan, Mukunda, Rummana, Sam, Zeljko

Big picture

Release Engineering is where our code quality efforts can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our mission.

What we want to accomplish:

  • More appreciation of, response to, and creation of tests in development
  • Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
  • Reduce time between code being merged and being deployed
  • Provide information about software quality in a way that informs development and release decisions
  • Help WMF Engineering learn and adapt from experience

Team roles

  • Phabricator: Mukunda
  • Deployment tooling: Sam, Mukunda
  • Jenkins/Zuul: Antoine, Zeljko
  • Beta cluster development/maintenance: Antoine
  • Automated browser tests: Chris M, Zeljko
  • Manual browser testing: Rummana, +1
  • Vagrant: Dan

Previous Quarter Review

Deployment tooling

  • Done - Process through all (useful) pain points from the Dev/Deploy review session
  • Ongoing - Integrate HHVM support into our deployment systems
  • Done - Start the scap(py) & trebuchet integration conversation (stretch goal)

Beta cluster

  • Done - Support HHVM in Beta Cluster
  • Ongoing - Swift cluster in beta (stretch goal)

MediaWiki Release

  • Done - Successfully support the release of MediaWiki 1.23
  • Done - Kickoff/complete second RFP
  • Ongoing - Investigate and create useful release/deployment metrics visualizations (stretch goal)

Browser tests

  • Ongoing - Use tags to run builds appropriate to released versions (e.g. don't run master build on test2wiki)
  • Done - Retire Cloudbees Jenkins instance
  • Done - Integrate WMF Jenkins with new WMF SauceLabs account
  • Done - Use API to create test data at runtime more widely
    • Used by MobileFrontend
    • Used by VisualEditor
    • Used by smoke tests
  • Done - Add browser tests to new repos
    • GettingStarted

Hiring

Next Quarter

Phabricator

Mukunda

  • Migration from Bugzilla completed
    • Be an example early adopter of features
  • Migration from Trello/Mingle started
  • Migration from Gerrit completed (barring unforeseen issues)

metrics

  • Number of teams migrated to Phabricator vs. number of teams currently using Trello/Mingle

Deployment tooling

Sam, Mukunda

  • scap(py) & trebuchet integration
  • Increase the bus factor (important due to new hires/team changes)

Jenkins

Antoine, Zeljko

  • Jenkins performance improvements
    • Performance is suffering as Jenkins controls more and more tasks, causing many false failures
    • provisioning more slaves is one aspect of this
  • maintenance and new test infrastructure requests (ongoing)

Beta cluster

Antoine, Dan, Sam

  • Add new services (-oids)
  • Swift cluster (remove NFS)
  • Beta Cluster monitoring (baseline)
  • Yet Another Cluster

metrics

  • Real data and graphs from monitoring services

Browser tests

Chris, Zeljko, Dan

  • Workshops/trainings in lieu of one-to-one pair programming
  • Improved "best practices" and "getting started" documentation
  • Continued pairing with WMF Engineering teams
  • Begin pairing with the Flow team
  • Environment abstraction layer in mediawiki-selenium to allow for less fragile and more advanced step definitions

metrics

  • Track the state of browser tests before the Thursday branch cut
  • Days since last green build, per Jenkins job
  • Note: the biggest factor in false failures today is poor performance from Jenkins (see the Jenkins section above)
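
The "days since last green build, per Jenkins job" metric could be computed roughly as follows. This is a minimal sketch, not the team's tooling: the function name `days_since_last_green`, the job names, and the build records are all hypothetical, and it assumes build results have already been exported as (job, result, finish time) tuples with "SUCCESS" marking a green build.

```python
from datetime import datetime, timezone


def days_since_last_green(builds, now):
    """Map each job name to whole days since its most recent SUCCESS build.

    Jobs that have never had a successful build map to None.
    """
    last_green = {}
    for job, result, finished in builds:
        # Register every job, even ones that have never been green.
        last_green.setdefault(job, None)
        if result == "SUCCESS" and (
            last_green[job] is None or finished > last_green[job]
        ):
            last_green[job] = finished
    return {
        job: None if ts is None else (now - ts).days
        for job, ts in last_green.items()
    }


# Hypothetical build records: (job name, result, finish time).
builds = [
    ("browsertests-A", "SUCCESS", datetime(2014, 8, 25, 12, 0, tzinfo=timezone.utc)),
    ("browsertests-A", "FAILURE", datetime(2014, 9, 1, 12, 0, tzinfo=timezone.utc)),
    ("browsertests-B", "FAILURE", datetime(2014, 9, 2, 12, 0, tzinfo=timezone.utc)),
]
now = datetime(2014, 9, 3, 17, 0, tzinfo=timezone.utc)
report = days_since_last_green(builds, now)
```

Reporting None for a job that has never been green keeps such jobs visible in the report instead of silently dropping them.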

Vagrant

Dan

  • Wrap up pairing with MobileFrontend
  • Better browser test runner, e.g. "vagrant run-browser-tests"
  • Investigate creating shareable Vagrant- or Docker-based test environments
  • Optimize memory-hungry services running in the Vagrant VM (reduce base memory usage)

metrics

  • Qualitative survey of WMF teams on their use of Vagrant
  • Number/percentage of WMF production-deployed extensions available in Vagrant
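
The extension-coverage metric reduces to a set comparison. The sketch below assumes two lists of extension names are already available; the function name `vagrant_coverage` and the sample lists are illustrative only and do not reflect the real state of production or Vagrant.

```python
def vagrant_coverage(deployed, in_vagrant):
    """Return (count, percentage, missing) for deployed extensions in Vagrant.

    count      -- number of deployed extensions also available in Vagrant
    percentage -- count as a percentage of all deployed extensions
    missing    -- sorted list of deployed extensions not yet in Vagrant
    """
    deployed = set(deployed)
    covered = deployed & set(in_vagrant)
    # Avoid dividing by zero when no deployed extensions are listed.
    pct = 100.0 * len(covered) / len(deployed) if deployed else 0.0
    return len(covered), pct, sorted(deployed - covered)


# Made-up sample lists, for illustration only.
deployed = ["MobileFrontend", "VisualEditor", "Flow", "GettingStarted"]
in_vagrant = ["MobileFrontend", "VisualEditor", "Echo"]
count, pct, missing = vagrant_coverage(deployed, in_vagrant)
```

Returning the missing extensions alongside the percentage gives a concrete to-do list rather than just a number.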

Hiring

Greg, Chris

Questions

from notes

Action Items

from notes:

  • Add a value statement to each goal in next quarter's presentation
  • Create an RFC around deployment systems
  • Rob/Greg/Rummana/Chris team discussion about scripted testing, especially with the new QA Tester starting
  • Revisit/discuss the role of scripted testing vs. exploratory testing at the next quarterly review
  • Greg to bring up the use case for a bare-metal test cluster