Wikimedia Release Engineering Team/Quarterly review, February 2014

Date: February 13th | Time: 17:00 UTC | Slides: gdocs | Notes: on wiki

Who:

Leads: Greg G, Chris M
Virtual team: Greg G, Chris M, Antoine, Sam, Chris S, Chad, Zeljko, Andre, Bryan, Jeff, Rummana
Other review participants (invited): Robla, Sumana, Quim, Maryana, James F, Terry, Tomasz, Alolita

Topics: Deploy process/pipeline, release process, bug fixing (the how of priority), code review, code management, security deploy/release, automation prioritization

Big picture

Release Engineering and QA are where our efforts in Platform can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our focus. What we want to accomplish:

More appreciation of, response to, and creation of tests in development
Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
Reduce time between code being merged and being deployed
Provide information about software quality in a way that informs release decisions
Help WMF Engineering learn and adapt from experience

...All in an effort to pave the path to a more reliable continuous deployment environment.

Team roles

Many people outside of the virtual team play an important role in releases, but this review will focus on the work of the following people in the following roles:

Release engineering: Greg G, Sam, Chris S (security), Bryan Davis
QA and Test Automation: Chris M, Zeljko, Jeff Hall, Rummana
Bug escalation: Andre, Greg G., Chris M, Chris S (security)
Beta cluster development/maintenance: Antoine, Sam
Development tools (e.g. Gerrit, Jenkins): Antoine

Last Quarter Review

Goals from last quarter

Last quarterly review

Keep:

Successfully managed the first release of MediaWiki in conjunction with our outside contractor - Status: Done
- Work with Antoine on swift backed download.wikimedia - Status: Done
Browser tests managed in feature repos with feature teams - mostly Status: Done
More comprehensive quarterly assessments of postmortems - Status: In progress

New Priority:

Automated API integration tests in important areas
- UploadWizard - Status: Done
- Parsoid - Status: Done
- ResourceLoader - is a goal for Feb.
Expose fatal errors from both unit tests and browser tests to teams - Status: Done (Logstash in beta/production and the fatal monitoring script on Beta)
Create process documentation for ideal test/deployment steps (eg: ThoughtWorks exercise) - Status: Done

Deprioritize:

Successfully streamline the review and deployment of extensions (Done in Q1) - still Status: Done
Manage build times, parallel execution, continuous execution of browser tests for optimum coverage (vague, ongoing goal) - Status: Not done
Focus on community contributions and non-WMF - Status: Not done

Actions from Last Quarter

ACTION: Need to document the feature requirements of sartoris/etc - possible task for Bryan Davis after scholarship app (GG)
ACTION: clarify priority of work with Antoine re Vagrant spin-ups for Jenkins builds (GG)
ACTION: change checkins to weekly (GG)
ACTION: revamp meeting style and project management (GG)

Goals and dependencies for the next quarter

Goals

Vis-à-vis the WMF Engineering 2013-14 goals.

Deployment Tooling

Process through all (useful) pain points from the Dev/Deploy review session (Greg)
- explicitly: Begin discussions (on list and/or wiki), complete discussions, distill requirements and next steps, prioritize
Scap incremental improvements
- step 1:
  - Refactor existing scap scripts to enhance maintainability and reveal hidden complexity of current solution (Bryan)
- step 2:
  - create matrix of tool requirements per software stack (MW, Parsoid, ElasticSearch) (Greg)
  - Use above matrix to add/fix functionality in scap (or related) tooling for ONE software stack, prioritized by cross stack use (Bryan)
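
As an illustration of step 2, a requirements matrix and a ranking by cross-stack use could be sketched as below. This is only a sketch: the requirement names are hypothetical placeholders, not an agreed list, and the ranking rule (count how many stacks need each requirement) is one possible reading of "prioritized by cross stack use".

```python
from collections import Counter

# Hypothetical requirement matrix: stack name -> set of tooling requirements.
# The requirement names are placeholders for illustration only.
REQUIREMENTS = {
    "MW": {"sync code", "rollback", "l10n cache rebuild"},
    "Parsoid": {"sync code", "rollback", "service restart"},
    "ElasticSearch": {"service restart", "rollback"},
}


def rank_by_cross_stack_use(matrix):
    """Return (requirement, stack_count) pairs, most widely shared first."""
    counts = Counter(req for reqs in matrix.values() for req in reqs)
    # Sort by descending count, then by name so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for requirement, count in rank_by_cross_stack_use(REQUIREMENTS):
        print(f"{requirement}: needed by {count} stack(s)")
```

Requirements shared by the most stacks come first, which gives a starting order for deciding what to add or fix for the ONE stack chosen.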

Hiring

Complete hiring and train new Test Infrastructure Engineer (Chris, all)
Complete hiring and train new QA Automation Engineer (Chris, all)

Browser tests

Goal: use the API to create test data for given tests at run time. (Jeff, Chris, Željko)

Target dev environments with bare wikis/one-off instances/Vagrant/"hermetic" test environments
- in support of teams who requested this, for example Mobile and the public MediaWiki release (Chris)
- in support of browser tests on WMF Jenkins (Jeff, Željko)
- requires thoughtful use of the API
  - first pass: create articles with particular title and content. Create users with particular names and passwords.
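
A minimal sketch of the "first pass" for articles, assuming test data is created through the MediaWiki action API's `action=edit` module. The fixture contents and helper names are hypothetical, the edit token is assumed to have been obtained separately, and the code only builds the POST parameters rather than sending them. User creation is left out here because the account creation parameters differ between MediaWiki versions.

```python
# Hypothetical fixture: (title, content) pairs a browser test expects to exist.
ARTICLES = [
    ("Browser test article", "Some known content."),
    ("Browser test article 2", "Other known content."),
]


def build_create_article_request(title, content, token):
    """Build POST parameters that create a page with the given content.

    createonly makes the API refuse the edit if the page already exists,
    so a test run does not silently overwrite earlier data.
    """
    return {
        "action": "edit",
        "title": title,
        "text": content,
        "createonly": "1",
        "token": token,
        "format": "json",
    }


def build_fixture_requests(articles, token):
    """Return one request payload per article in the fixture."""
    return [build_create_article_request(t, c, token) for t, c in articles]


if __name__ == "__main__":
    for params in build_fixture_requests(ARTICLES, token="PLACEHOLDER-TOKEN"):
        print(params["title"], "->", params["action"])
```

Each payload would be POSTed to the target wiki's `api.php` before the test runs, so the same test can run against a bare wiki.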

Goal: create the ability to test headless (Željko, Jeff, Chris)

targets build systems (Antoine, all)

Goal: run versions of tests compatible with target test environments (Chris, all)

Today we always run the master branch of browser tests. This is inconvenient because target environments such as test2wiki lag beta labs by at least one week.
- create the ability in Jenkins builds to run the versions of tests appropriate to the versions of extensions in the target wiki.
  - discussion has only begun, but this would be worthwhile.
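
One way a Jenkins build could pick the matching test version is sketched below. It is an assumption, not a decided design: the mapping of target environment to deployed branch is hypothetical and hard-coded here, and the branch name is only an example of the naming pattern.

```python
# Hypothetical mapping: target environment -> branch of the extension it runs.
DEPLOYED_BRANCH = {
    "beta labs": "master",
    "test2wiki": "wmf/1.23wmf13",  # illustrative branch name
}


def ref_for_target(target, deployed=DEPLOYED_BRANCH, default="master"):
    """Return the git ref whose browser tests match the target environment."""
    return deployed.get(target, default)


def checkout_command(target):
    """Build the git command a build step would run before the tests."""
    return ["git", "checkout", ref_for_target(target)]


if __name__ == "__main__":
    for env in ("beta labs", "test2wiki"):
        print(env, "->", " ".join(checkout_command(env)))
```

Unknown targets fall back to master, which matches today's behavior.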

Ongoing:

Continue to move shared code to shared repo; e.g. Login
Continue to maintain tests and keep them green, e.g. connection issues

Beta labs

Goal: continue to have beta labs emulate production more closely (Antoine, all)

Make database in beta emulate production (set up db slaves) (Antoine)
- This could have demonstrated a Flow problem before it was deployed
Use beta labs as a testing ground for the above Deployment Tooling work (Greg, Bryan, all)

Dependencies

Ops dependency:

Deployment Tooling (see above)

MW Core dependency:

Deployment Tooling (see above)
Vagrant

Questions

Are we doing enough to promote NON-browser testing? Are dev teams not writing unit tests, not writing integration tests, not doing monitoring, because browser tests are seen as the only automated tests? See http://martinfowler.com/bliki/TestPyramid.html
to fill in...

Actions

ACTION: Greg to send periodic updates about scap refactoring
ACTION: Greg to convene a conversation with labs folks post-migration re labs-vagrant (including OpenStack API etc)
ACTION: Have a plan for Vagrant
- determine fit within test infra explicitly
ACTION: add MW release tarball as goal in next quarterly review
ACTION: figure out whether a central developer is needed to generate metrics on unit tests, maintain the framework, etc