Wikimedia Release Engineering Team/Quarterly review, August 2014

Date: September 3rd | Time: 17:00 UTC | Slides | Notes

Topics: Deploy process and pipeline; browser, manual, and unit testing; everything from development to deployment.

Who

 * Lead: Greg
 * Team: Antoine, Chris, Dan, Mukunda, Rummana, Sam, Zeljko

Big picture
Release Engineering is where our code quality efforts can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our mission.

What we want to accomplish:
 * More appreciation of, response to, and creation of tests in development
 * Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
 * Reduce time between code being merged and being deployed
 * Provide information about software quality in a way that informs development and release decisions
 * Help WMF Engineering learn and adapt from experience

Team roles

 * Phabricator: Mukunda
 * Deployment tooling: Sam, Mukunda
 * Jenkins/Zuul: Antoine, Zeljko
 * Beta cluster development/maintenance: Antoine
 * Automated browser tests: Chris M, Zeljko
 * Manual browser testing: Rummana, +1
 * Vagrant: Dan

Deployment tooling

 * ✅ - Process through all (useful) pain points from the Dev/Deploy review session
 * on-going - Integrate HHVM support into our deployment systems
 * ✅ - Start the scap(py) & trebuchet integration conversation (stretch goal)

Beta cluster

 * ✅ - Support HHVM in Beta Cluster
 * on-going - Swift cluster in beta (stretch goal)

MediaWiki Release

 * ✅ - Successfully support the release of MediaWiki 1.23
 * ✅ - Kick off/complete second RFP
 * on-going - Investigate and create useful release/deployment metrics visualizations (stretch goal)

Browser tests

 * on-going - Use tags to run builds appropriate to released versions (e.g. don't run master build on test2wiki)
 * ✅ - Retire Cloudbees Jenkins instance
 * ✅ - Integrate WMF Jenkins with new WMF SauceLabs account
 * ✅ - Use API to create test data at runtime more widely
   * Used by MobileFrontend
   * Used by VisualEditor
   * Used by smoke tests
 * ✅ - Add browsertests to new repos
   * GettingStarted

Hiring

 * ✅ - Complete hiring for Release Engineer
 * ✅ - Complete hiring for Automation Engineer (Ruby)

Phabricator
Mukunda
 * Migration from Bugzilla completed
 * Be an example early adopter of features
 * Migration from Trello/Mingle started
 * Migration from Gerrit completed (barring unforeseen issues)

metrics
 * Number of teams migrated to Phabricator vs. number of teams using Trello/Mingle right now

Deployment tooling
Sam, Mukunda
 * scap(py) & trebuchet integration
   * How much, and what it looks like, is TBD; some of it will come from the requirements doc
 * increasing bus factor (important due to new hires/team changes)

Jenkins
Antoine, Zeljko
 * Jenkins performance improvements
   * Performance is suffering as Jenkins controls more and more tasks, causing many false failures
   * Provisioning more slaves is one aspect of this
 * Maintenance and new test infrastructure requests (ongoing)

Beta cluster
Antoine, Dan, Sam
 * Add new services (-oids)
 * Swift cluster (remove NFS)
 * Beta Cluster monitoring (baseline)
 * Yet Another Cluster

metrics
 * Real data and graphs from monitoring services

Browser tests
Chris, Zeljko, Dan
 * Workshops/trainings in lieu of one-to-one pair programming
 * Improved "best practices" and "getting started" documentation
 * Continued pairing with WMF Engineering teams
 * Begin pairing with the Flow team
 * Environment abstraction layer in mediawiki-selenium to allow for less fragile and more advanced step definitions

metrics
 * tracking state of browser tests before Thursday branch cut
 * days since last green build, per Jenkins job
 * note that the biggest factor in false failures today is poor performance from Jenkins (see above)
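
As a rough illustration of the "days since last green build, per Jenkins job" metric above, here is a minimal sketch. It assumes build history has already been exported as (job, date, result) records; the job names, dates, and the `days_since_last_green` helper are all hypothetical and do not reflect the team's actual tooling or data.

```python
from datetime import date

# Hypothetical build records: (job name, build date, result).
# Job names and dates are made up for illustration only.
BUILDS = [
    ("browsertests-job-a", date(2014, 8, 25), "SUCCESS"),
    ("browsertests-job-a", date(2014, 8, 28), "FAILURE"),
    ("browsertests-job-b", date(2014, 8, 20), "SUCCESS"),
    ("browsertests-job-b", date(2014, 8, 27), "SUCCESS"),
    ("browsertests-job-c", date(2014, 8, 26), "FAILURE"),
]


def days_since_last_green(builds, today):
    """Map each job to the days since its latest SUCCESS (None if never green)."""
    last_green = {}
    for job, built_on, result in builds:
        # Register every job, even ones that have never passed.
        last_green.setdefault(job, None)
        if result == "SUCCESS":
            previous = last_green[job]
            if previous is None or built_on > previous:
                last_green[job] = built_on
    return {
        job: (today - green).days if green is not None else None
        for job, green in last_green.items()
    }


report = days_since_last_green(BUILDS, today=date(2014, 9, 3))
for job, days in sorted(report.items()):
    print(job, "never green" if days is None else f"{days} days")
```

Jobs that have never passed are reported separately rather than as a large number, so that a newly added job is not confused with a long-broken one.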

Vagrant
Dan
 * Wrap up pairing with MobileFrontend
 * Better browser test runner, e.g. "vagrant run-browser-tests"
 * Investigate creating shareable vagrant- or docker-based test environments
 * Optimize memory hungry services running in the vagrant VM (reduce base memory usage)

metrics
 * qualitative survey of WMF teams on their use of Vagrant
 * number of/percentage of WMF production deployed extensions available in Vagrant
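
The count/percentage metric above could be computed along these lines. This is only a sketch: it assumes two plain lists of extension names are available, the `vagrant_coverage` helper is hypothetical, and the extension names are placeholders rather than real availability data.

```python
def vagrant_coverage(production_extensions, vagrant_extensions):
    """Return (count available, percentage, missing names) for production extensions."""
    production = set(production_extensions)
    available = production & set(vagrant_extensions)
    # Avoid dividing by zero when the production list is empty.
    percentage = 100.0 * len(available) / len(production) if production else 0.0
    return len(available), percentage, sorted(production - available)


# Placeholder names; not real extension availability data.
production = ["ExtensionA", "ExtensionB", "ExtensionC", "ExtensionD"]
in_vagrant = ["ExtensionA", "ExtensionB", "ExtensionD", "ExtensionZ"]

count, percentage, missing = vagrant_coverage(production, in_vagrant)
print(f"{count} of {len(production)} production extensions available ({percentage:.0f}%)")
print("Missing from Vagrant:", ", ".join(missing))
```

Extensions that exist in Vagrant but are not deployed in production (here `ExtensionZ`) are deliberately ignored, since the metric is defined relative to the production list.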

Hiring
Greg, Chris
 * Complete hiring for QA Tester

Questions
from notes

Action Items
from notes:
 * add value statement with each goal in next quarter's presentation
 * create RFC around deployment systems
 * Rob/Greg/Rummana/Chris team discussion about scripted testing, especially with new QA Tester starting
 * revisit/discuss role of scripted testing vs exploratory testing at next quarterly review
 * Greg to bring up use case for bare metal test cluster