Wikimedia Release Engineering Team/Offsites/2017-05-Vienna

Raw notes from our Offsite in Vienna.

= CI Interview Discussion =

QA Tribe

 * More of this?
 ** Naming things
 ** Folks are confused by it/"tribe"
 ** Not a lot of discussion of next steps right now
 ** Possibly need more time
 ** Current purpose -- give people a sense of community
 ** People doing QA who are not on our team are sometimes isolated
 ** Shouldn't be just QA people (currently it's not; that should continue)

Tech Talks

 * RelEng could drive this
 ** Good for newbies
 ** Has slowed down recently
 ** Pull in external speakers
 ** Could draw in external folks/community
 *** hashar does a lot of talks (greg-g: you should record those)
 ** Related to tech blogging
 ** Stuff on wikis is different
 *** Different writing style
 *** Different sense of authority
 ** CREDIT showcase
 *** Small investment
 *** Good overview of a topic where we dig deeper later

Code health group

 * To make it easier to develop good code together
 ** Possibly the biggest/most important sub-point here
 ** A trend; other orgs have it
 ** Formalizes things we want to make better
 ** "As an org" vs. "heroics"
 ** There is an informal cabal that does these kinds of things
 *** Nobody is aware of it
 *** No communication
 *** It's organic
 *** May be good to formalize
 ** May not be an issue now, but may become more of one
 ** Could have their own backlog of things
 ** Could sell other teams on dedicating some resources to this
 ** "RelEng should do X" vs. "the org should move to X"
 ** There are folks who do this
 *** Others
 ** There are limitations to informality
 *** No roadmap
 ** Victoria could be a stakeholder
 *** Formalizes a direction
 ** A microcosm of organizational problems
 *** No team in charge, just cabals doing organic things

Open questions

 * How do the QA Tribe and the Code Health group mesh?
 ** If the Code Health group becomes more formal, then the QA Tribe does less
 ** A centralized code test group vs. an operations-like QA team
 *** Embedded testers
 *** Success and failure in both variants
 *** Lack of authority for embedded folks
 * What does a Code Health group do?
 ** What could we be doing/what should we be doing?
 ** What makes sense organizationally?
 ** Specifically:
 *** Code coverage
 *** Dashboards
 *** Vision statement
 ** Do we need folks from all teams on the Code Health group?
 *** Need investment from other teams
 *** Development stakeholders from many teams
 *** How much development time will other teams put into code health priorities?
 ** Sponsorship
 *** Victoria as CTO is sponsor
 ** Best practices and assessments
 ** Similar to TPG
 *** Promotes scrum to the remainder of the organization
 *** Start small with folks who want to do this
 ** Making tools that give you an overview of all repositories
 ** How to develop software in a healthy way

estimation

 * self-inflicted schedule issues
 * greatly under-estimate time to do things
 * general problem with quarterly planning, etc
 * needs to bubble-up to org-planning-level
 * there is currently a discussion that greg will bring up
 * Isolated discussion and process + dialog
 * eventualism can create estimation problems
 * green check marks on the quarterly goals are perverse incentives
 * misaligned with how we work

bug mgmt

 * features vs bugs are not differentiated in phab (or bugzilla)
 * need for identifying "rework" (i.e. bugs)
 * upstream changes in phab will make this easier
 * should we do this retroactively?
 * Andre is kinda doing this already
 * paladox? GSoC? Not hard, just needs doing?
 * Would like a baseline for "rework"
 * "rework"
 * Post-release rework vs pre-release problems
 * Tech debt
 * conscious decision vs unanticipated problems
 * 2 kinds of bugs
 * pre-release/features/etc
 * bugs that escape to the wider world

test metrics

 * collecting and reporting
 * pass/fail and track progress for tests
 * visibility (i.e. testing dashboard)
 * Keeping data for a longer period of time
 * Currently a varying amount of data is kept for varying amounts of time
 * The number of tests, and their quality, varies a lot per extension
 * some extensions have bitrot
 * maybe run the tests periodically
 * this is a low priority since it's not in production
 * Are tests that have never failed bad tests?
 * Collecting the right metrics and making sure they're actionable metrics

test strategy

 * common base for testing, even vocabulary
 * e.g. some unit tests are not really unit tests

Browser-based

 * Had some success
 * Should maybe be expanded

test infra

 * it's an improvement
 * environmental parity
 * SSD is work in this area
 * beta cluster is an afterthought, sometimes
 * this depends on the team
 * test data parity
 * we don't need all the data
 * depending on the kind of testing, could use production stuffs
 * dumps of smaller wikis and overwrite data occasionally
 * putting thought into which wikis are supported in beta
 * test setup takes a long time
 * how much of a production stack do we want to provide?
 * Being able to reproduce what CI does locally

= Topics =

How to formalize Code Health group

 * votes 6
 * formalize it
 * make it public

What do we want from this?

 * Reason for existence
 * 1 page overview for Victoria
 * Success factors for this group

quotes

 * quotes from Code Health: Google's Internal Code Quality Efforts

 * "Any aspect of how software was written that can increase the readability, stability, maintainability and simplicity"
 * "Improve the lives of engineers"

Success Factors

 * Not "heroic" vs funded(?) with feelings
 * sponsorship
 * cannot be an unfunded mandate
 * senior leadership behind it
 * encourage participation
 * "official"
 * diffuse bad behavior
 * Guinea pig
 * mobileweb -- eager and care about quality/willing to test
 * hashar: We can't really determine success by ourselves in a vacuum
 * hashar: RelEng's criteria is to have a team with success criteria
 * greg: can't sell that :)
 * hashar: let's build it first, then sell it
 * jr: what's the cross-organizational benefit and value of this? What are the success criteria if it works for a team?
 * start small vs start big -- top-down vs bottom-up

Long-term vision (selling)

 * Victoria cares about tech-debt
 * removing tech debt vs preventing tech debt
 * disseminates knowledge through organization
 * able to attack more complex problems since dedicated group (vs. individual contributors)

Outcomes

 * Healthy code
 * Metrics
 * cyclomatic complexity
 * code coverage
 * Support/mandate
 * "renewed focus on QA" outcomes
 * Code health group
 * It's a good idea to start small, i.e. test with mobile web
 * Management wants long-term vision
 * Quality vs Quality Assurance
 * code health group can assure big Q quality
 * Make developers more quality oriented
 * testing does not provide value, lack of bugs provides value
 * Enable developers to do the best work they can
 * test engineer enables the team to test
 * high-level criteria, i.e. success factors
 * couple people on releng + j.robson
 * this is comparable to efforts by the security and performance team
 * reduces duplication of effort
 * code health group documents best practices
 * prescriptive
 * IDEA: adding code-health outcomes to roadmap
 * extension quality assessment

What does this look like?

 * weekly meeting
 * core permanent group that is a steering committee
 * steering committee needs to be cross-org
 * J.R. leads steering committee + liaison from RelEng
 * don't call it qa -- "code health"
 * rotating cast of supporting characters in addition to core
 * interested in specific areas
 * or has past experience
 * Determine scope within group
 * defining a roadmap for code health

VISION
Make things less complex so developers can quickly add value to our product.

What the Code Health group does is work on efforts that universally improve the lives of engineers and their ability to write products with shorter iteration time, decreased development effort, greater stability, and improved performance.
 * example: by decreasing code complexity we increase code success whatever


 * TODO: add the word "value" here

Triage of projects

 * votes: 3
 * Project workboards should be used for categorization
 * RelEng team workboard used for overall view

Projects
Some projects benefit from triage


 * scap
 * ci infra
 * browser tests

Reorganization of projects in phab
<2017-05-15 Mon>

Dashboard Testing Health

 * votes: 5
 * likely to fail, but worth trying! :)
 * CI artifacts details
 * No easy way to find out historical pass/fail rates
 * Does not answer whether you can merge
 * What information is available for planning
 * Historical information
 * Need a shared understanding of terms: e2e, smoke tests, etc.
 * information is more interesting for functional tests historically
 * mobile team gates with browser tests, other teams have browser tests passing at the end of the sprint
 * Running browser tests against testwiki after branch cut would be neat
 * There is very little visibility into a lot of manual testing activity
 * No way to alert the org about big features that are rolling out as part of a deployment
 * Attempt to quantify "quality"/"health" of our code
 * Could be used to make deployment decisions
 * Basic information: SLOC churn, complexity, coverage, logspam is useful for release-like decision making
 * We currently find our problems in production
 * Used to have a Friday deployment meeting to make deployment decisions
 * Where can people raise a flag about the train?
 * deployment blockers task
 * let's think about the dashboard as a way to help teams plan their work and think about code health

Users/Uses

 * Identify real issues behind breakage
 * facebook and the one-person project
 * Phabricator has a tag for each branch for hotfixes
 * could feed into the dashboard
 * What is the data that is useful for planning activity

Who/What/How/How much

 * WHAT: SLOC churn, complexity, coverage, logspam, code breakdown
 * Ask code health group
 * extract code-review information from gerrit
 * phabricator bug-count could be useful
 * gerrit: number of changes, review backlogs, etc.
 * No coverage for extensions
 * Core is generated once a day
 * Code complexity for PHP stuff -- nothing currently
 * code climate
 * SonarQube
 * Puppet out of scope
 * MediaWiki/config is a fucking mess.
 * Code Health Group as stakeholder
 * Analytics team has knowledge and tooling to store random metrics
 * tooling
 * junit for unittest output
 * clover for code coverage
 * composer and npm for linting tools
 * phpcs
 * phplint
 * https://www.mediawiki.org/wiki/User:Legoktm/ci

Little Steps

 * votes: 4
 * Background:
 * Nodepool caused openstack pressure and the cloud team was mad
 * Meanwhile: CI was super slow and other people were mad

Annual Planning next steps

 * votes: 3

SSD / Pipeline Planning

 * Goal: be able to run (docker) containers in production, and use those same containers in development (and potentially CI).


 * Container technology, not tied to docker directly. Part of why Dan built Blubber. Right now docker is most stable, but maybe move to rkt in the medium term future for reasons (eg: has an init system whereas docker doesn't)
 * In the interim we're doing docker.
 * Recommended tech is blubber; if you use blubber we'll help you migrate. If not, you're on your own.
 * Blubber: what you run locally to build your test image. Currently Services use service-runner to do similar things. Blubber compiles down into a binary (it's Go). Also yay types in Go.
 * (troll on tcl)
 * Docker/k8s is also in Go.
 * Blubber input/config files is yaml. Dumps out the dockerfile.
 * yaml -> Blubber -> dockerfile -> Jenkins build the docker image
 * Point: reproducible environments.
 * CI is out of scope for now. If we want to be a part of the pipeline at all, this is the compromise.
 * In Lyon, Joe and Yuvi gave a "here's what k8s and mesos are" overview; we really think it will save on our resource allocation.
 * Services/gwicke: in grand SOA future third-parties wouldn't be able to install "MW". Docker is the solution for third-parties.
 * Pragmatism makes everyone attend the Monday meeting :)
 * ci-staging jenkins can build from the blubber yaml and push into the ci-staging docker registry
 * Ops is supposed to give us access to the swift backed registry.
 * Ops is getting hardware for the staging k8s cluster. Should be getting it soon/now-ish. It will just be services for a bit.
 * End of this quarter (Q1): build mathoid in Jenkins and push to the new staging cluster.
 * How big are images? Depends. Ops is working on building base images. About 300 MB for the base image. Then you have a node-based image which is the base image + node packages. Then you use that for eg: mathoid. Kinda like service-node (the base puppet for all node services)
 * docker is as shitty of a solution as all the other solutions
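The "yaml -> Blubber -> dockerfile -> Jenkins" flow above can be sketched as a toy translator. This is NOT real Blubber syntax or output -- the config keys below are invented for illustration -- it just shows the shape of the idea: a small declarative config compiled down to a Dockerfile that Jenkins then builds into an image:

```python
# Toy illustration of the declarative-config-to-Dockerfile step.
# Config keys ("base", "apt", "entrypoint") are hypothetical, not Blubber's.

def to_dockerfile(config):
    """Compile a tiny declarative config (what the yaml parses to) to a Dockerfile."""
    lines = ["FROM %s" % config["base"]]
    for pkg in config.get("apt", []):
        lines.append("RUN apt-get install -y %s" % pkg)
    lines.append("COPY . /srv/app")
    if "entrypoint" in config:
        lines.append("ENTRYPOINT %s" % config["entrypoint"])
    return "\n".join(lines)

# Hypothetical mathoid-like service config:
mathoid_like = {
    "base": "debian:jessie",
    "apt": ["nodejs"],
    "entrypoint": "node server.js",
}

print(to_dockerfile(mathoid_like))
```

The point being illustrated is reproducibility: the same small config produces the same image locally, in CI, and (eventually) for production.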

What do we need from Ops to complete the CI work?

 * CI cluster from Ops
 * We maintain k8s for CI
 * We jettison to Travis
 * We do a temporary container thingy
 * we need buy-in and commitment to migrate to a shared

What's happening next year

 * Code Health group work
 * 13 remaining trebuchet repos
 * SSD
 * Little Steps + Cloud team talk

Trying to figure out CI in SSD

 * replace nodepool
 * replace most of CI
 * integrate everything in the base image
 * migrating CI is us + cloud that's interested

Other thoughts

 * Scap3 work will be discarded, this is frustrating

CI work

 * php55 on jessie with ops support
 * nodepool remains on jessie and stretch
 * container to run images

Part of SSD

 * ops provide a k8s cluster for CI
 * using production registry for CI

Move to travis?

 * takes time

problem

 * we're on nodepool and no one is happy

solutions:

 * 1) CI K8s Cluster from ops
 * 2) we maintain a k8s cluster (NOOOOOOOOOOOOOOOOOO!)
 * 3) we move to travis

temporary container things

 * 1) Why do I need k8s?
 * 2) Do I need k8s?
 * 3) registry or portion of
 * 4) creates code ghettos
 * 5) Can ops maintain a bunch of servers that have just docker installed?
 * 6) if we have budget for servers can ops just use those?
 * 7) what's the delta between what's already happening vs what we need?

Next steps

 * 1) productionize blubber
 * 2) define workflow pipeline in jenkins (pipeline is groovy)
 * 3) Jenkins plugin for k8s
 * 4) Jenkins master speak to k8s cluster
 * 5) Jenkins master public internet
 * 6) Jenkins artifacts capture

Quarterly breakdown

 * Now: productionize blubber to push images to staging
 * Q1:
 * staging cluster e2e tests
 * other services move to staging?
 * Jenkins master to speak to k8s cluster (changes to jenkins master? Ops may need changes for Reasons™)
 * Use minikube to PoC with Jenkins plugins
 * use blubber to produce test images that can be used on CI-k8s
 * assumption: ops is working on CI-k8s cluster
 * outcome: have k8s requirements for cluster
 * Q2: ops CI-k8s, Jenkins master

Container/Blubber annual plan

 * Mathoid PoC work
 * Now: productionize blubber to push images to staging
 * local development (minikube)
 * build "production" images in Jenkins for use in staging (based on mathoid work)
 * staging k8s cluster (ops)
 * end-to-end tests in staging -- MUST HAPPEN BEFORE PRODUCTION
 * What does e2e test mean in this context?
 * Is this a functional test?
 * Keep in mind that webdriver is needed for real e2e testing
 * after a bunch of services have migrated:
 * put MW in front
 * Run webdriver
 * Profit
 * Push production-ready images to production
 * Other services → production
 * CI Infra (depends on staging k8s cluster)
 ** CI-K8s cluster
 ** Use minikube to PoC with Jenkins plugins (while waiting for the permanent CI-K8s cluster)
 ** Jenkins master to speak to the k8s cluster (changes needed for the jenkins master? Ops may need changes for Reasons™)
 ** Use blubber to produce test images that can be used on CI-k8s
 * Migrate CI jobs to containers + k8s
 ** Depends on CI Infra
 ** There is a lot of work that needs to happen here

Kill Trebuchet

 * Kill all the things (T129290)

Webdriver.io

 * CI for extensions (hackathon is when work starts)
 * CI only in core now
 * Workshop (@ hackathon && online)
 * Pairing with folks
 * 6 months from <2017-05-16 Tue>: kill the ruby MediaWiki stuff
 * End of Q2 No Ruby, Node only
 * Projects needing assistance:
 * CirrusSearch
 * Wikibase
 * Announce deprecation
 * No Ruby support in new CI (see Container/Blubber annual plan)

Code Health Group

 * scope/vision
 * form a steering committee
 * roadmap
 * sell it:
 * Success factors:
 * leadership sponsorship
 * steering committee formed
 * funded resource allocation
 * find a team to work with (mobileweb)

MediaWiki test runner standalone

 * nitpicker/shitbot/chipotte

MediaWiki decouple unit / component / integration tests

 * votes: 3
 * magical explosions

Problem 1

 * Unit tests have dependency chains
 * we end up running a lot of tests that should all pass
 * this is slow
 * example: Math depends on VisualEditor, which depends on Cite; a change in Math will run tests for VisualEditor and Cite

Problem 2

 * When there are changes to extensions that are depended upon by other extensions, that breakage is not bubbled up anywhere
 * example: Math depends on VisualEditor which has a change that breaks the Math extension, but that breakage goes unnoticed, untracked

Idea

 * pre-merge!
 * Ensure that an extension being merged has its tests run, but do not run the tests of that extension's dependencies
 * Ensure that all extensions that depend on the extension being merged have their integration tests run
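The pre-merge idea above amounts to walking the reverse dependency graph: run the changed extension's own tests plus the integration tests of everything that (transitively) depends on it, and skip the tests of its dependencies. A minimal sketch (Python; the graph is the Math/VisualEditor/Cite example from the notes and is illustrative only):

```python
# deps[X] = extensions that X depends on (example graph from the notes).
deps = {
    "Math": ["VisualEditor"],
    "VisualEditor": ["Cite"],
    "Cite": [],
}

def suites_to_run(changed):
    """Extensions whose test suites should run when `changed` is merged."""
    to_run = {changed}
    frontier = [changed]
    # Walk reverse dependencies transitively: anything depending on a
    # changed/affected extension gets its integration tests run too.
    while frontier:
        current = frontier.pop()
        for ext, ext_deps in deps.items():
            if current in ext_deps and ext not in to_run:
                to_run.add(ext)
                frontier.append(ext)
    return sorted(to_run)

print(suites_to_run("Cite"))  # → ['Cite', 'Math', 'VisualEditor']
print(suites_to_run("Math"))  # → ['Math'] (dependencies' tests skipped)
```

This directly addresses both problems: a change to Math no longer runs VisualEditor/Cite tests (Problem 1), and a breaking change to VisualEditor surfaces Math's failures pre-merge (Problem 2).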

MediaWiki/core

 * Need some mechanism by which we identify the extensions that depend on a particular piece of core
 * IDEA1: submitter determines which extensions need their integration suites run
 * IDEA2: ShitBot

= Parking Lot =

DONE Next steps for Tech Talks
Up to 5 minutes
 * New stuff in scap
 * New stuff in CI
 * Nodepool: what's this thing?
 * Nodepool: how are we fixing it :)
 * webdriver.io browser test updates
 * demo mathoid PoC
 * How #RelEng uses Phab for work management (our workboards, sub-projects, dashboards, and milestones)

DONE Triage of projects

 * has fallen off
 * less frequent triage meetings maybe?

= Post offsite TODOs =

✅ - greg: bubble up estimation discussion for org annual planning

 * Discussed. Point taken by both Victoria and the rest of tech-mgt. Seemed to be how other teams were experiencing the world as well.

- JR: reach out to Google for help with Code Health meetings

 * this stuff may be Google-specific
 * delicious secret sauces
 * What did you learn trying to put this together
 * JR Having lunch with Google Code Health Lead on July 12th

- JR: Code Health Dashboard

 * Ask code health group what belongs on the dashboard
 * Talk with analytics about this
 * Long term storage of jenkins artifacts (elastic search?)
 * Investigate options for code complexity and coverage

✅ - Greg: Phame blog for techblog content
"Outcome of the tech-mgt F2F. Rough consensus to re-enable."
 * We (RelEng) should write up a "why we're turning it back on" post as the first new post from the "Doing the Needful" blog. tl;dr: Tech-mgt wants a place to share technologist-focused blog posts. The Wikimedia Blog is not it (it is the wrong audience and the process is too heavyweight). This meets our needs easily and is low cost (as in person time).

- Greg (and Chad, Mukunda, Tyler): JFDI the changes to deploy branch cutting and train cycle
Discussed at team offsite. Let's finally just put the branch cut on a timer. And, at the same time, let's make any changes to the deploy cycle/cadence and use of Beta Cluster that would benefit from such a timer (eg: cutting on Thursday, deploying to a multiversion'd (or whatever) setup over Friday and the weekend for E2E and manual testing).