Wikimedia Release Engineering Team/Quarterly review, August 2013

Date: August 21st, 2013

Time: 1:30pm Pacific (20:30 UTC)

Topics: Deploy process/pipeline, release process, bug fixing, code review, code management, security deploy/release, automation prioritization

Who:
 * Leads: Greg G, Chris M
 * Participants (invited): Antoine, Sam, Chris S, Chad, Zeljko, Michelle G, Robla, Sumana, Quim, Maryana, James F, Ryan Lane, Ariel, Ken, Terry, Tomasz, Alolita

Big picture
Release Engineering and QA are where our efforts in Platform can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our focus.

What we want to accomplish:
 * More appreciation of, response to, and creation of tests in development
 * Better monitoring and reporting out of our development and deployment processes
 * Reduce the time between code being written and code being deployed, while discovering issues earlier in the process and with more certainty
 * Provide information about software quality in a way that informs release decisions
 * Help WMF Engineering learn from our mistakes and adapt to what we learn

What we've done

 * Built the Beta Cluster into an instrumental part of producing quality code
 * Provided embedded QA support to important feature teams (Language and Mobile)
 * Successfully transitioned to a one-week deploy cycle
 * Community growth through, e.g., OPW
 * Virtual team creation

Still in progress

 * Proper support for all extensions in beta cluster https://bugzilla.wikimedia.org/show_bug.cgi?id=49846

Goals for the next quarter
We have a lot; see also the list of sprints with associated tracking tickets
 * Align high profile features with QA testing levels
 * Increase capacity through community training for browser tests
 * Gadget support is also included here
 * Improve monitoring and reporting of errors so we can prevent outages from occurring and/or respond to them faster
 * Improve our deployment process (mostly focusing on git-deploy and ensuring caches are invalidated when needed)
 * Take the Beta Cluster to the next level:
   * monitoring of fatals, errors, and performance
   * more (non-feature-specific) automated tests, e.g. for the API
 * Provide hermetic test environments for developers/testers that mimic labs' abilities.
 * Apply the Language/Mobile embedded QA model to a new feature team (E3?)

Measures of success

 * Successfully integrate QA support in one more feature team (as defined by: more regular/predictable testing and more test coverage)
 * The Beta Cluster is monitored well enough that our monitoring software, not failing browser tests, notifies us of downtime
 * The Beta Cluster has monitoring equal to that of production (just without paging Ops)
 * We are able to quickly spin up throw-away browser testing sessions per merge to master.

Questions

 * Is E3 the right next team to apply the Language/Mobile team model to?
 * Priorities generally for QA test coverage effort: are we hitting this appropriately? How do we communicate this?
 * There's a lot to do, where should we prioritize?
 * Sign off for bigger feature deploys/enablings?
 * Is how we plan on measuring success sufficient for your needs?

Worries

 * Our goals are wide-ranging and need support from multiple teams, perhaps more so than the average goal list