Wikimedia Release Engineering Team/Quarterly review, August 2014

Date: August 14th | Time: 17:00 UTC | Slides: TBC | Notes: etherpad - on-wiki

Topics: Deploy process and pipeline; browser, manual, and unit testing; everything from development to deployment.

Who

 * Lead: Greg
 * Team: Antoine, Chris, Dan, Sam, Mukunda, Rummana, Zeljko

Big picture
Release Engineering is where our code quality efforts can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our mission.

What we want to accomplish:
 * More appreciation of, response to, and creation of tests in development
 * Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
 * Reduce time between code being merged and being deployed
 * Provide information about software quality in a way that informs development and release decisions
 * Help WMF Engineering learn and adapt from experience

Team roles

 * Deployment tooling: Sam, Mukunda
 * Automated browser tests: Chris M, Zeljko
 * Manual browser testing: Rummana
 * Beta cluster development/maintenance: Antoine
 * Jenkins/Zuul: Antoine, Zeljko
 * Phabricator: Mukunda
 * Vagrant: Dan

Deployment tooling

 * ✅ - Process through all (useful) pain points from the Dev/Deploy review session
 * on-going - Integrate HHVM support into our deployment systems
 * ✅ - Start the scap(py) & trebuchet integration conversation (stretch goal)

Beta cluster

 * ✅ - Support HHVM in Beta Cluster
 * on-going - Swift cluster in beta (stretch goal)

MediaWiki Release

 * ✅ - Successfully support the release of MediaWiki 1.23
 * ✅ - Kickoff/complete second RFP
 * on-going - Investigate and create useful release/deployment metrics visualizations (stretch goal)

Browser tests

 * on-going - Use tags to run builds appropriate to released versions (e.g. don't run master build on test2wiki)
 * ✅ - Retire Cloudbees Jenkins instance
 * ✅ - Integrate WMF Jenkins with new WMF SauceLabs account
 * ✅ - Use API to create test data at runtime more widely
   * Used by MobileFrontend
   * Used by VisualEditor
   * Used by smoke tests
 * ✅ - Add browsertests to new repos
   * GettingStarted

Hiring

 * ✅ - Complete hiring for Release Engineer
 * ✅ - Complete hiring for Automation Engineer (Ruby)

Phabricator

 * Migration from Bugzilla completed
 * Be an example early adopter of features
 * Migration from Trello/Mingle started
 * Migration from Gerrit completed (barring unforeseen issues)

metrics
 * Number of teams migrated to Phabricator vs. number of teams using Trello/Mingle right now

Deployment tooling

 * scap(py) & trebuchet integration
   * Scope and design are TBD; some of it will come from the requirements doc
 * increasing bus factor (important due to new hires/team changes)

Jenkins

 * Jenkins performance improvements
   * Performance is suffering as Jenkins controls more and more tasks, causing many false failures
   * Provisioning more slaves is one aspect of this
 * maintenance and new test infrastructure requests (ongoing)

Beta cluster

 * Add new services (-oids)
 * Swift cluster (remove NFS)
 * Yet Another Cluster
   * Outages caused by HHVM testing adversely affected many teams, including a live demo at Wikimania that RelEng was unaware of. A second beta cluster would be valuable.
   * This work includes the monitoring support services
   * Not a RelEng-only undertaking

Browser tests

 * Workshops/trainings in lieu of one-to-one pair programming
 * Improved "best practices" and "getting started" documentation
 * Continued pairing with WMF Engineering teams
 * Begin pairing with the Flow team
 * Environment abstraction layer in mediawiki-selenium to allow for less fragile and more advanced step definitions

metrics
 * tracking state of browser tests before Thursday branch cut
 * days since last green build, per Jenkins job
 * note that the biggest factor in false failures today is poor performance from Jenkins (see above)
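
As an illustration of the "days since last green build, per Jenkins job" metric above, here is a minimal sketch in Python. It assumes the build history has already been exported as (job name, build date, result) tuples; the job names, the sample data, and the function name `days_since_last_green` are hypothetical and not taken from the team's actual tooling.

```python
from datetime import date


def days_since_last_green(builds, today):
    """Map each job to the number of days since its latest successful build.

    A job with no successful build in the history maps to None.
    """
    last_green = {}
    for job, build_date, result in builds:
        if result == "SUCCESS":
            # Keep only the most recent successful build per job.
            if job not in last_green or build_date > last_green[job]:
                last_green[job] = build_date
    jobs = {job for job, _, _ in builds}
    return {
        job: (today - last_green[job]).days if job in last_green else None
        for job in sorted(jobs)
    }


# Hypothetical build history: (job name, build date, result).
builds = [
    ("browsertests-example-a", date(2014, 8, 10), "SUCCESS"),
    ("browsertests-example-a", date(2014, 8, 12), "FAILURE"),
    ("browsertests-example-b", date(2014, 8, 13), "FAILURE"),
]

report = days_since_last_green(builds, today=date(2014, 8, 14))
for job, days in report.items():
    print(job, "never green" if days is None else f"{days} days")
```

Reporting `None` for jobs that have never been green keeps them visible in the report instead of silently dropping them, which matters when false failures from Jenkins performance are part of what is being tracked.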

Vagrant

 * Wrap up pairing with MobileFrontend
 * Better browser test runner, e.g. "vagrant run-browser-tests"
 * Investigate creating shareable vagrant- or docker-based test environments
 * Optimize memory-hungry services running in the Vagrant VM (reduce base memory usage)

metrics
 * qualitative survey of WMF teams on their use of Vagrant
 * number and percentage of WMF production-deployed extensions available in Vagrant
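
The extension-coverage metric above could be computed along these lines. This is only a sketch: it assumes two plain lists of extension names are available (one for production, one for Vagrant), and the names `ExtA` through `ExtZ` and the function `vagrant_coverage` are placeholders invented for the example.

```python
def vagrant_coverage(production_extensions, vagrant_extensions):
    """Return (count, percentage, missing) for production extensions in Vagrant."""
    production = set(production_extensions)
    available = production & set(vagrant_extensions)
    missing = sorted(production - available)
    # Guard against an empty production list to avoid dividing by zero.
    percent = 100.0 * len(available) / len(production) if production else 0.0
    return len(available), percent, missing


# Hypothetical inputs: extensions deployed in production vs. available in Vagrant.
production = ["ExtA", "ExtB", "ExtC", "ExtD"]
vagrant = ["ExtA", "ExtC", "ExtZ"]

count, percent, missing = vagrant_coverage(production, vagrant)
print(f"{count} of {len(production)} ({percent:.0f}%) available; missing: {missing}")
```

Listing the missing extensions alongside the percentage gives a concrete backlog rather than just a number.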

Hiring

 * Complete hiring for QA Tester

Questions
from notes

Action Items
from notes