Wikimedia Release Engineering Team/Quarterly review, April 2014

Date: April 30th | Time: 18:00 UTC | Slides: pdf | Notes: etherpad on wiki

Who:
 * Leads: Greg G, Chris M
 * Virtual team: Greg G, Chris M, Antoine, Sam, Bryan, Chris S, Chad, Zeljko, Andre, Rummana
 * Other review participants (invited): Robla, Sumana, Quim, Maryana, James F, Terry, Tomasz, Alolita, Erik

Topics: Deploy process/pipeline, release process, bug fixing, code review, code management, security deploy/release, automation prioritization

Big picture
Release Engineering and QA are where our efforts in Platform can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our focus. What we want to accomplish:
 * More appreciation of, response to, and creation of tests in development
 * Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
 * Reduce time between code being merged and being deployed
 * Provide information about software quality in a way that informs release decisions
 * Help WMF Engineering learn and adapt from experience
All in an effort to pave the path to a more reliable continuous deployment environment.

Team roles
Many people outside of the virtual team play an important role in releases, but this review will focus on the work of the following people in the following roles:
 * Release engineering: Greg G, Sam, Chris S (security), Bryan Davis
 * QA and Test Automation: Chris M, Zeljko, Rummana
 * Bug escalation: Andre, Greg G, Chris M, Chris S (security)
 * Beta cluster development/maintenance: Antoine, Sam, Bryan Davis
 * Development tools (e.g. Gerrit, Jenkins): Antoine, Zeljko

Goals
vis a vis the WMF Engineering 2013-14 goals.

Deployment Tooling

 * Process through all (useful) pain points from the Dev/Deploy review session (Greg)
 * some done, not all
 * Scap incremental improvements
 * step 1:
 * Mostly done: refactor existing scap scripts to enhance maintainability and reveal hidden complexity of current solution (Bryan)
 * "Easy" parts are done. Remaining work was blocked on getting scap running in beta so that changes could be tested somewhere larger than a Vagrant VM and less potentially catastrophic than production.
 * step 2:
 * Create matrix of tool requirements per software stack (MW, Parsoid, ElasticSearch) (Greg)
 * Use above matrix to add/fix functionality in scap (or related) tooling for ONE software stack, prioritized by cross-stack use (Bryan)

Beta cluster
Goal: continue to have beta labs emulate production more closely (Antoine, all)
 * Make database in beta emulate production (set up db slaves) (Antoine)
 * Partly done: use beta labs as a testing ground for the above Deployment Tooling work (Greg, Bryan, all)
 * Infra work in place, so far working out.
 * Migrate Beta cluster from pmtpa to eqiad
 * Not from last QR but was a big priority
 * Much (most?) of the beta cluster configuration was puppetized during the migration. This is a great improvement over the prior cluster in pmtpa, which included many hand-built instances.
 * Beta now includes a local puppet master which allows cherry-picking work-in-progress puppet changes and applying them across the cluster. This unblocks Antoine and others from getting +2 approval in operations/puppet.git for each desired change. It also provides a testing platform for changes prior to usage in production.
 * Beta now includes a salt master which allows the use of Trebuchet and general experimentation with salt by non-roots.

Browser tests
Goal: use the API to create test data for given tests at run time (Jeff, Chris, Željko). Status: in heavy use in MobileFrontend tests; queued for VisualEditor and others.
Goal: create the ability to test headless (Željko, Jeff, Chris). Status: the basic operation is working, with much more to come.
Goal: run versions of tests compatible with target test environments (Chris, all). Status: tracked at https://bugzilla.wikimedia.org/show_bug.cgi?id=62509 but nothing from it has been implemented yet.
 * target dev environments with bare wikis/one-off instances/vagrant/"hermetic" test environments
 * in support of teams who requested this, for example Mobile and public MediaWiki release (Chris)
 * in support of browser tests on WMF Jenkins (Jeff, Željko)
 * requires thoughtful use of the API
 * first pass: create articles with particular title and content. Create users with particular names and passwords.
 * Vagrant languishes, however. One focus for the new hire is to bring Vagrant back up to date.
 * targets build systems (Antoine, all)
 * today we always run the master branch of browser tests. This is inconvenient, as target environments such as test2wiki lag beta labs by at least one week.
 * create the ability in Jenkins builds to run the versions of tests appropriate to the versions of extensions in the target wiki.
 * discussion has only begun, but this would be worthwhile.
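
As a rough illustration of the version-matching idea above (running the tests that correspond to what is deployed on the target wiki rather than always running master), a Jenkins build step could look up which git ref to check out per target. This is only a sketch: the mapping, the branch name `wmf/example-branch`, and the function names are hypothetical and not taken from any existing job configuration.

```python
# Hypothetical mapping from target environment to the branch deployed there.
# Real values would have to come from the deployment process itself.
DEPLOYED_BRANCH = {
    "beta labs": "master",
    "test2wiki": "wmf/example-branch",  # placeholder name, not a real branch
}


def select_test_ref(target, deployed=DEPLOYED_BRANCH, default="master"):
    """Return the git ref of the browser tests to run against a target."""
    return deployed.get(target, default)


def checkout_command(target):
    """Build the git command a build step could run before the tests."""
    return ["git", "checkout", select_test_ref(target)]


if __name__ == "__main__":
    for name in ("beta labs", "test2wiki"):
        print(name, "->", " ".join(checkout_command(name)))
```

Falling back to master for unknown targets keeps today's behaviour for any environment that has not been mapped yet.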

Ongoing:
 * Continue to move shared code to shared repo; e.g. Login
 * current status: https://www.mediawiki.org/wiki/Quality_Assurance/Browser_testing/Shared_features
 * Continue to maintain tests and keep them green, e.g. connection issues
 * builds WMF-Jenkins -> beta labs in place
 * builds WMF-Jenkins -> SauceLabs coming

Hiring

 * Complete hiring and train new Release Engineer, previously titled Test Infrastructure Engineer (Greg, all)
 * Complete hiring and train new Automation Engineer (Ruby), previously titled QA Automation Engineer (Chris, all)

Dependencies
Ops dependency:
 * Deployment Tooling (see above)
MW Core dependency:
 * Deployment Tooling (see above)
 * Vagrant

Last quarter actions

 * - Greg Bryan to send periodic updates about scap refactoring
 * Greg to convene a conversation with labs folks post-migration re labs-vagrant (including OpenStack API etc.)
 * Have a plan for Vagrant
 * Determine fit within test infra explicitly
 * Add MW release tarball as a goal in next quarterly review
 * Figure out whether a central developer is needed to generate metrics on unit tests, maintain the framework, etc.

Goals
vis a vis the WMF Engineering 2013-14 goals.

Deployment tooling

 * (continued from last quarter) Process through all (useful) pain points from the Dev/Deploy review session - (Greg)
 * Integrate HHVM support into our deployment systems - (Bryan, Greg, yet-to-be-hired Release Engineer, others from Platform)
 * start the scap(py) & trebuchet integration conversation
 * dependent upon beta cluster work below

Beta cluster

 * Support HHVM deployment tooling and puppet configuration testing - (Bryan, Antoine, yet-to-be-hired Release Engineer)
 * Swift cluster in beta
 * RFC support

MediaWiki Release

 * Successfully support the release of MediaWiki 1.23 - (Antoine, Greg)
 * Kickoff/complete second RFP
 * Investigate and create useful release/deployment metrics visualizations - (Greg)
 * e.g. # of builds per day, # of commits per day, # of deploys per day, etc.
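
A minimal sketch of how the per-day counts mentioned above could be computed before any visualization is built. It assumes the events (builds, commits, or deploys) are available as UTC timestamp strings in the format shown. That format, the function name, and the sample timestamps are illustrative assumptions, not real data or an existing tool.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format; a real log source may use a different one.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def events_per_day(timestamps):
    """Count events per calendar day, returned in date order."""
    days = (
        datetime.strptime(ts, TIMESTAMP_FORMAT).date().isoformat()
        for ts in timestamps
    )
    return dict(sorted(Counter(days).items()))


if __name__ == "__main__":
    # Made-up timestamps, purely to show the shape of the output.
    sample = [
        "2014-04-28T18:00:00Z",
        "2014-04-28T20:30:00Z",
        "2014-04-29T01:15:00Z",
    ]
    for day, count in events_per_day(sample).items():
        print(day, count)
```

The same function would serve builds, commits, and deploys alike, since each is just a list of timestamps.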

Browser tests

 * (From last quarter) Use tags to run builds appropriate to released versions (e.g. don't run master build on test2wiki)
 * Retire Cloudbees Jenkins instance
 * Integrate WMF Jenkins with new WMF SauceLabs account
 * Execute tests in parallel
 * Use API to create test data at runtime more widely (not just for MobileFrontend but also VisualEditor, Flow, local dev env etc.)
 * Add browsertests to new repos e.g. GettingStarted
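
To make the "use API to create test data at runtime" item more concrete, here is a hedged sketch of the request a test setup step might send to create a page with a particular title and content. It only builds the POST fields for the MediaWiki action API's `action=edit` module and does not contact any wiki. The helper names, the default summary, and the placeholder token are assumptions for illustration. Obtaining a real edit token and creating users are left out.

```python
from urllib.parse import urlencode


def build_edit_request(title, text, token, summary="browser test fixture"):
    """Return the POST fields for a MediaWiki action=edit call."""
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
        # The edit token must be fetched from the wiki beforehand;
        # this sketch simply takes it as an argument.
        "token": token,
    }


def encode_body(fields):
    """URL-encode the fields as an application/x-www-form-urlencoded body."""
    return urlencode(fields)


if __name__ == "__main__":
    fields = build_edit_request(
        "Selenium test page", "Fixture content", token="PLACEHOLDER"
    )
    print(encode_body(fields))
```

Keeping the request construction separate from the HTTP call makes the fixture step easy to check without a running wiki.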

Hiring

 * Complete hiring and train new Release Engineer (Greg, all)
 * Complete hiring and train new Automation Engineer (Ruby)  (Chris, all)

Questions

 * Placeholder for questions during review
 * Zuul vs Jenkins: What is the future? - conversation didn't happen
 * Phabricator - maybe a real possibility

Action items

 * create a plan for browser testing of MediaWiki 1.23 - Chris M
 * Greg to get firm requirements from Antoine, then circle back to Mark, who will have an idea of Ops' timeline for production Shinken.
 * Figure out how to keep HHVM unit tests from delaying +2 for standard production commits (Antoine, Chris M, Zeljko)