Wikimedia Release Engineering Team/Quarterly review, February 2014

Date: February 13th | Time: 17:00 UTC | Slides:  | Notes

Who:
 * Leads: Greg G, Chris M
 * Virtual team: Greg G, Chris M, Antoine, Sam, Chris S, Chad, Zeljko, Andre, Bryan, Jeff, Rummana
 * Other review participants (invited): Robla, Sumana, Quim, Maryana, James F, Terry, Tomasz, Alolita

Topics: Deploy process/pipeline, release process, bug fixing (the how of priority), code review, code management, security deploy/release, automation prioritization

Big picture
Release Engineering and QA are where our efforts in Platform can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our focus. What we want to accomplish, all in an effort to pave the path to a more reliable continuous deployment environment:
 * More appreciation of, response to, and creation of tests in development
 * Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
 * Reduce time between code being merged and being deployed
 * Provide information about software quality in a way that informs release decisions
 * Help WMF Engineering learn and adapt from experience

Team roles
Many people outside of the virtual team play an important role in releases, but this review will focus on the work of the following people in the following roles:
 * Release engineering: Greg G, Sam, Chris S (security), Bryan Davis
 * QA and Test Automation: Chris M, Zeljko, Jeff Hall, Rummana
 * Bug escalation: Andre, Greg G, Chris M, Chris S (security)
 * Beta cluster development/maintenance: Antoine, Sam
 * Development tools (e.g. Gerrit, Jenkins): Antoine

Goals from last quarter
Last quarterly review

Keep: New Priority: Deprioritize:
 * Successfully managed the first release of MediaWiki in conjunction with our outside contractor -
 * Work with Antoine on Swift-backed download.wikimedia -
 * Browser tests managed in feature repos with feature teams - mostly
 * More comprehensive quarterly assessments of postmortems -
 * Automated API integration tests in important areas
 * UploadWizard -
 * Parsoid -
 * ResourceLoader - is a goal for Feb.
 * Expose fatal errors from both unit tests and browser tests to teams - Logstash in beta/production and the fatal monitoring script on Beta
 * Create process documentation for ideal test/deployment steps (e.g. ThoughtWorks exercise) -
 * Successfully streamline the review and deployment of extensions (Done in Q1) - still
 * Manage build times, parallel execution, continuous execution of browser tests for optimum coverage (vague, ongoing goal) -
 * Focus on community contributions and non-WMF -

Goals
These goals are set vis-à-vis the WMF Engineering 2013-14 goals.

Deployment Tooling

 * Process through all (useful) pain points from the Dev/Deploy review session
 * explicitly: Begin discussions (on list and/or wiki), complete discussions, distill requirements and next steps, prioritize
 * Scap incremental improvements
 * step 1: Refactor existing scap scripts to enhance maintainability and reveal hidden complexity of current solution
 * step 2: Create matrix of tool requirements per software stack (MW, Parsoid, ElasticSearch), then use that matrix to add/fix functionality in scap (or related) tooling for ONE software stack, prioritized by cross-stack use
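
The prioritization in step 2 can be pictured as a small requirements matrix ranked by how many stacks share each requirement. The sketch below is illustrative only: the three stack names come from the list above, but the requirement names and which stack needs what are made-up placeholders, not findings from the Dev/Deploy review.

```python
# Hypothetical requirements matrix: requirement -> stacks that need it.
# Requirement names and assignments are placeholders for illustration.
REQUIREMENTS = {
    "sync code to hosts": {"MW", "Parsoid", "ElasticSearch"},
    "restart service": {"Parsoid", "ElasticSearch"},
    "rebuild localisation cache": {"MW"},
}


def rank_by_cross_stack_use(matrix):
    """Return requirement names, most widely shared first (ties by name)."""
    return sorted(matrix, key=lambda req: (-len(matrix[req]), req))


def work_list_for(stack, matrix):
    """Requirements relevant to one stack, in cross-stack priority order."""
    return [req for req in rank_by_cross_stack_use(matrix) if stack in matrix[req]]


if __name__ == "__main__":
    # Work list for the ONE stack chosen to go first (here, Parsoid).
    for req in work_list_for("Parsoid", REQUIREMENTS):
        print(req)
```

Ranking by shared use means that work done for the first stack is the work most likely to carry over to the others.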

Hiring

 * Complete hiring and train new Test Infrastructure Engineer
 * Complete hiring and train new QA Automation Engineer

Browser tests
Goal: use the API to create test data for given tests at run time.


 * target dev environments with bare wikis, one-off instances, Vagrant, and "hermetic" test environments
 * in support of teams who requested this, for example Mobile and the public MediaWiki release
 * in support of browser tests on WMF Jenkins
 * requires thoughtful use of the API
 * first pass: create articles with particular title and content. Create users with particular names and passwords.
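
As a rough sketch of the "first pass" above, a test could build the POST parameters for a MediaWiki `action=edit` request from a small fixture list before it runs. The fixture contents and helper names below are hypothetical, the token is a placeholder that a real run would have to fetch from the target wiki first, and user creation is left out because its parameters differ between MediaWiki versions.

```python
from urllib.parse import urlencode

# Hypothetical fixture data; title and content are placeholders.
ARTICLES = [
    {"title": "Browser test page", "text": "Known content for a test."},
]


def build_edit_request(title, text, token):
    """Build POST parameters for a MediaWiki action=edit call."""
    return {
        "action": "edit",
        "title": title,
        "text": text,
        "token": token,  # must be a real edit token from the target wiki
        "format": "json",
    }


def encode_body(params):
    """Encode parameters as an application/x-www-form-urlencoded body."""
    return urlencode(params).encode("utf-8")


if __name__ == "__main__":
    for article in ARTICLES:
        params = build_edit_request(article["title"], article["text"], "PLACEHOLDER")
        # A real test would POST this body to the wiki's api.php endpoint.
        print(encode_body(params))
```

Keeping request construction separate from the network call lets the same fixtures be pointed at a bare wiki, a Vagrant instance, or beta labs.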

Goal: create the ability to run tests headless


 * targets build systems

Goal: run versions of tests compatible with target test environments


 * today we always run the master branch of browser tests. This is inconvenient, as target environments such as test2wiki lag beta labs by at least one week.
 * create the ability in Jenkins builds to run the versions of tests appropriate to the versions of extensions in the target wiki.
 * discussion has only just begun, but this would be worthwhile.
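
One way to picture this goal is a small selection step in the build that maps the branch deployed on the target wiki to a ref of the test repository. This is only a sketch under assumptions: the `wmf/1.23wmf13` branch naming pattern, the function names, and the fallback to `master` are illustrative choices, not an agreed design.

```python
import re

# Assumed deployment branch naming, e.g. "wmf/1.23wmf13" (illustrative).
_WMF_BRANCH = re.compile(r"^wmf/(\d+)\.(\d+)wmf(\d+)$")


def branch_key(branch):
    """Turn 'wmf/1.23wmf13' into (1, 23, 13), or None if it does not match."""
    match = _WMF_BRANCH.match(branch)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def select_test_ref(target_branch, available_refs):
    """Pick the test ref to check out for a given target wiki branch.

    Prefer an exact match; otherwise use the newest ref that is not
    newer than the target; otherwise fall back to master.
    """
    if target_branch in available_refs:
        return target_branch
    target = branch_key(target_branch)
    if target is None:
        return "master"
    candidates = [
        ref for ref in available_refs
        if branch_key(ref) is not None and branch_key(ref) <= target
    ]
    if candidates:
        return max(candidates, key=branch_key)
    return "master"


if __name__ == "__main__":
    refs = ["master", "wmf/1.23wmf11", "wmf/1.23wmf12", "wmf/1.23wmf14"]
    print(select_test_ref("wmf/1.23wmf13", refs))
```

Choosing the newest ref that is not newer than the target keeps tests from exercising features the target wiki does not have yet.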

Goal, ongoing: continue to consolidate duplicate code into libraries shared among repo-specific tests.
 * Example: login code has yet to be merged into the common library

Goal, ongoing: continue to make builds as green as possible.
 * Example: https://bugzilla.wikimedia.org/show_bug.cgi?id=60037

Beta labs
Goal: continue to have beta labs emulate production more closely
 * fix the lack of a database slave setup; such a setup could have demonstrated a Flow problem before it was deployed
 * Use beta labs as a testing ground for the above Deployment Tooling work

Dependencies
Ops dependency:
 * Provide true HTTPS support on the Beta Cluster -
 * Icinga for Beta Cluster -, but not yet accurate
 * https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep
 * http://ganglia.wmflabs.org/latest/?c=deployment-prep
 * http://icinga.wmflabs.org/cgi-bin/icinga/status.cgi?hostgroup=deployment-prep&style=detail

MW Core dependency:
 * Deployment Tooling (see above)
 * Vagrant

Questions

 * Are we doing enough to promote NON-browser testing? Are dev teams not writing unit tests, not writing integration tests, not doing monitoring, because browser tests are seen as the only automated tests? See http://martinfowler.com/bliki/TestPyramid.html
 * to fill in...