Wikimedia Release Engineering Team/Offsites/2018-05-Barcelona/Notes

These are the raw notes from our 2 days of offsite discussions.

Summary of action items

Data Data Data

Talk with Analytics - JR
Talk with CE/Bitergia - JR
Explore Bitergia - JR
Identify data sources we want to collect - RelEng (who know what systems)
Erik Bernhardson / Guillaume Lederrey

SWATs/Trains

Tyler to reassess scap swat in mw-config from Mukunda
Look into parsing scap messages for known patterns and pulling out the data
Look into enabling scap start/done
Look into recording if mwdebug was used during the deploy (eg: 'scap stage')
How will we get time for this?
Have Mukunda do a couple weeks of SWATs
- Mukunda has a lot to say about this subject.... writeup incoming

Staging

Greg to talk with Deb about what to do next with talking to Victoria
Greg to figure out how we can better market what we are accomplishing (eg "monthly showcase")
Get a k8s cluster from SRE for CI to deploy to.

Data Data Data

Lead: Jean-René

Data for code-stewardship reviews (historic data)
- Commits & patch sets
- Jenkins & CI, test results discarded after 15 or 30 days
Where can I put new kinds of data/metrics? Is there a shared environment to store them?
jr: for example, talking to exploratory testers: there is no visibility into the results of their work, and it is hard to get new QA testers on board. The role is broad, but they will certainly either produce or consume testing data.
We have lots of data/dashboards, but no long-term statistics
antoine: raita was the dashboard (but it has been decommissioned)
- Historic dashboard for metrics and data
- Dan: targeted towards browser-tests
Hypothetical Entity Relationship (ER) diagram
- Patchsets relate to deployments
- Deployments relate to outages
- Relationships in a tree format
Relationships between gerrit change and phabricator tasks
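A minimal sketch of the hypothetical ER model above, assuming simple in-memory records. The class names, field names, and the task and change identifiers are illustrative placeholders, not an existing schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Patchset:
    change_id: str                                      # Gerrit change (placeholder id)
    task_ids: List[str] = field(default_factory=list)   # related Phabricator tasks


@dataclass
class Outage:
    description: str


@dataclass
class Deployment:
    label: str
    patchsets: List[Patchset] = field(default_factory=list)
    outages: List[Outage] = field(default_factory=list)


def tasks_behind_outages(deployments: List[Deployment]) -> List[str]:
    """Walk the tree: outage -> deployment -> patchsets -> tasks."""
    tasks: List[str] = []
    for deployment in deployments:
        if not deployment.outages:
            continue  # only deployments linked to an outage are of interest
        for patchset in deployment.patchsets:
            for task_id in patchset.task_ids:
                if task_id not in tasks:
                    tasks.append(task_id)
    return tasks


# Made-up example data.
deployments = [
    Deployment("deploy-a", [Patchset("change-1", ["T100"])], [Outage("example outage")]),
    Deployment("deploy-b", [Patchset("change-2", ["T200"])]),
]
print(tasks_behind_outages(deployments))  # ['T100']
```

Keeping the relations as a tree (deployment at the root, patchsets and outages as children) makes the "which tasks were involved in an outage" question a plain traversal.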
Developer/maintainers page. For an extension/skin JR would like to:
- Activity (commits and changes)
- Outstanding tasks
- How well it follows the latest MediaWiki standards (ex: extension.json, versions of linters, test coverage etc)
- Tests that are running:
  - How frequent are errors
  - How many tests are failing
  - Average resolution time for a failed test (E2E, unit tests failing on an unrelated change because core changed months ago and the extension is barely active)
  - The pace of changes being merged
- Extension status: alpha, maintenance, Wikimedia-deployed, obsolete. That is mostly on mediawiki.org (partly in CI config as "archived")
Overview of stewardship
- https://www.mediawiki.org/wiki/Development_policy/Code_Stewardship
github pulse ( https://github.com/wikimedia/mediawiki/pulse ) -- do we want that?
- Human process oriented vs repository oriented (merges vs task closing)
- time to resolution (TTR) for tasks (filed to resolved/declined/whatever)
  - but this is only meaningful for "bugs" not other planning type tasks
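A rough sketch of the TTR idea, assuming task records with a filed and a closed timestamp and a "kind" label that separates bugs from planning tasks. The record shape and labels are assumptions, not the Phabricator data model.

```python
from datetime import datetime
from statistics import median
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    task_id: str
    kind: str                    # "bug" or "planning" (hypothetical labels)
    filed: datetime
    closed: Optional[datetime]   # resolved/declined/whatever; None while still open


def median_ttr_days(tasks: List[Task]) -> Optional[float]:
    """Median time to resolution in days, counting closed bugs only."""
    durations = [
        (t.closed - t.filed).total_seconds() / 86400
        for t in tasks
        if t.kind == "bug" and t.closed is not None
    ]
    return median(durations) if durations else None


# Made-up example data.
tasks = [
    Task("T1", "bug", datetime(2018, 5, 1), datetime(2018, 5, 3)),        # 2 days
    Task("T2", "bug", datetime(2018, 5, 1), datetime(2018, 5, 5)),        # 4 days
    Task("T3", "planning", datetime(2018, 4, 1), datetime(2018, 5, 1)),   # ignored
    Task("T4", "bug", datetime(2018, 5, 2), None),                        # still open
]
print(median_ttr_days(tasks))  # 3.0
```

The median is used here instead of the mean so that a few very old tasks do not dominate the number; that is a design choice of the sketch, not something decided in the discussion.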
what are the systems that we have, how do we normalize the data for those systems, where do we put it?
- A consistent interface for retrieving data
- We need to keep all the data that we can -- get data outside of jenkins (for example we could send that data to elasticsearch, but currently this is locked-up in jenkins)
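One possible shape for a "consistent interface", sketched under the assumption that every system can be reduced to timestamped events. The source names and raw field names (`timestamp_ms`, `merged_at`, etc.) are hypothetical and do not mirror the real Jenkins or Gerrit APIs.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, NamedTuple


class Event(NamedTuple):
    source: str              # which system produced the record
    kind: str                # what happened
    when: datetime
    details: Dict[str, Any]  # source-specific leftovers


def from_ci_build(raw: Dict[str, Any]) -> Event:
    when = datetime.fromtimestamp(raw["timestamp_ms"] / 1000, tz=timezone.utc)
    return Event("ci", "build", when, {"job": raw["job"], "result": raw["result"]})


def from_code_review(raw: Dict[str, Any]) -> Event:
    when = datetime.fromisoformat(raw["merged_at"])
    return Event("code-review", "merge", when, {"change": raw["change"]})


# One normalizer per system; adding a system means adding one entry.
NORMALIZERS: Dict[str, Callable[[Dict[str, Any]], Event]] = {
    "ci": from_ci_build,
    "code-review": from_code_review,
}


def normalize(source: str, raw: Dict[str, Any]) -> Event:
    return NORMALIZERS[source](raw)


event = normalize("ci", {"job": "example-job", "result": "SUCCESS", "timestamp_ms": 0})
print(event.kind, event.when.isoformat())  # build 1970-01-01T00:00:00+00:00
```

Normalized events like these could then be written to whatever store ends up being chosen, which keeps the "where do we put it" question separate from the "how do we normalize" question.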
We have an agreement that we'd like to collect all the test data...somewhere somehow
- RelEng is the best place for this data
- Do we set this up? Or do we work with other teams to do this?
- Proposal: prepare for a 20 minute meeting with the Analytics team at the hackathon
- A system: https://wikimedia.biterg.io/app/kibana#/dashboard/Overview (see also https://www.mediawiki.org/wiki/Community_metrics )
Stewardship creates these open questions, useful for annual planning as well
Going through, system-by-system, and finding out what data we want to store

Open Questions

Is our current analytics stack open for use by others in open ended ways?
- Example: https://pivot.wikimedia.org/ for page view/requests ( upstream: https://imply.io ). Lets one easily build whatever graph by country/browser etc
Analytics: Can we start dumping various data sources into a place and figure out how we're going to view/make sense of it later?
How can we interact with Bitergia to extend the data sources and views (poke Quim/Andre)
identify reviewers/maintainers: https://www.mediawiki.org/wiki/Git/Reviewers | https://www.mediawiki.org/wiki/Developers/Maintainers

Next Steps:

Talk with Analytics - JR
Talk with CE/Bitergia - JR
Explore Bitergia - JR
Identify data sources we want to collect - RelEng (who know what systems)
Erik Bernhardson / Guillaume Lederrey

SWATs/Trains

Lead: Tyler

Automating/improving logging of SWATs and Trains - https://phabricator.wikimedia.org/T193311 :
It would be nice to have concrete data about SWAT windows without having to dig in the SAL. Some nice-to-have info: number of syncs per SWAT window and time spent deploying patches for a given SWAT window.
Problem: We've wanted to change SWAT windows/deploys. People hated that we wanted to change things (namely: reduce # of patchsets deployed and how they are done). We need data to make informed decisions. eg: correlating syncs with swats and outages.
Definition: SWAT is three 1 hour windows per day for developers to propose hotfixes/config changes. Served by releng / deployment group users.
Right now we have syncs and we have windows, and their only relation is through the wiki pages
out of scope:
- relating patches -> swat window
- proposing patches in a window
- Zeljko: we are just pushing buttons. We do not have much added value
NEEDs:
- Given a time window, get the list of syncs / patchsets deployed (and ultimately a developer / point of contact)
- we need the data
- a place to display/query it
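A small sketch of the first need, assuming sync records have already been extracted from the SAL or scap logs into a list. The record shape is an assumption, and "time spent" is approximated as the gap between the first and last sync in the window.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple


class Sync(NamedTuple):
    when: datetime   # time the sync was logged
    deployer: str    # point of contact
    message: str     # log message given to scap


def syncs_in_window(syncs: List[Sync], start: datetime,
                    length: timedelta = timedelta(hours=1)) -> List[Sync]:
    """Return the syncs logged in [start, start + length), oldest first."""
    end = start + length
    return sorted((s for s in syncs if start <= s.when < end), key=lambda s: s.when)


def window_summary(syncs: List[Sync], start: datetime,
                   length: timedelta = timedelta(hours=1)) -> Dict[str, object]:
    hits = syncs_in_window(syncs, start, length)
    # Approximation: time between the first and the last sync of the window.
    spent = hits[-1].when - hits[0].when if hits else timedelta(0)
    return {
        "count": len(hits),
        "deployers": sorted({s.deployer for s in hits}),
        "time_spent": spent,
    }


# Made-up example data.
log = [
    Sync(datetime(2018, 5, 1, 11, 5), "deployer-a", "SWAT: first change"),
    Sync(datetime(2018, 5, 1, 11, 40), "deployer-b", "SWAT: second change"),
    Sync(datetime(2018, 5, 1, 13, 0), "deployer-a", "unrelated sync"),
]
summary = window_summary(log, datetime(2018, 5, 1, 11, 0))
print(summary["count"], summary["deployers"], summary["time_spent"])
# 2 ['deployer-a', 'deployer-b'] 0:35:00
```

The default one-hour window length follows the SWAT definition above; the window start times themselves would still have to come from the deployment calendar.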
Minimal Viable Solution
- Have scap ask "is this a SWAT? y/n" each time it's not a full scap or --force
This Deployment did this Change associated with this Task.
what about...
- scap swat start (or: `scap swat` starts a shell)
- (query wiki page, list changes, etc)
- scap swat done
- See: "scap swat" patch from Mukunda
- ( https://gerrit.wikimedia.org/r/#/c/306259/ / https://phabricator.wikimedia.org/T142880 ). Demo: https://asciinema.org/a/1x54kw77tvatxiqv45ba6ael7
current documentation https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#Full_deployment
- current command: scap sync-file path/to/file 'SWAT: Commit message (T456)'
- if the comment is not in this format, scap asks you swat/gerrit/phabricator
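Parsing messages in the documented format could look roughly like this. The regular expression only covers the 'SWAT: Commit message (T456)' shape shown above; it is an illustration, not scap's own parsing code.

```python
import re
from typing import Dict, Optional

# Matches: SWAT: <summary> (T<number>)
SWAT_MESSAGE = re.compile(r"^SWAT:\s*(?P<summary>.+?)\s*\((?P<task>T\d+)\)\s*$")


def parse_swat_message(message: str) -> Optional[Dict[str, str]]:
    """Return the summary and task id, or None if the format does not match."""
    match = SWAT_MESSAGE.match(message)
    if match is None:
        return None
    return {"summary": match.group("summary"), "task": match.group("task")}


print(parse_swat_message("SWAT: Commit message (T456)"))
# {'summary': 'Commit message', 'task': 'T456'}
print(parse_swat_message("Fix typo"))
# None
```

A `None` result corresponds to the case where scap would have to ask the deployer for the swat/gerrit/phabricator details.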
Do not allow deploys without first indicating what window you're starting
- scap swat start or scap deploy start (or --force)
- that informs scap on how to act/log
mw-config.php
- assume as soon as it's merged it's deployed

TODO

Tyler to reassess scap swat in mw-config from Mukunda
Look into parsing scap messages for known patterns and pulling out the data
Look into enabling scap start/done
Look into recording if mwdebug was used during the deploy (eg: 'scap stage')
How will we get time for this?
Have Mukunda do a couple weeks of SWATs
- Mukunda has a lot to say about this subject.... writeup incoming

Staging

https://docs.google.com/document/d/1CT_pKjwiDmFhZZ9LW9mz0z434-wgr3NFdapUPWUvMNA/edit?ts=5aba5398#heading=h.ra4sbg2fs7zl
2018-2019 annual plan: https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019

Lead: Greg

The presentation
The project as defined by operations is incomplete

The response to Victoria
- We are here due to the initial issue of a choice between doing the Pipeline project vs a Staging project. That either/or is now a both/and.
- Operations wants an environment that can potentially prevent outages depending on how they define it. It could potentially prevent outages of services that we don't control nor deploy.
- We are making a survey to gather the current usage of the Beta Cluster that can help inform SRE's decisions/planning.
- We have defined use cases
- The other questions are best answered by SRE as they heavily depend on technical implementation decisions
- Protocol changes as proposed are out of scope for this discussion and truthfully feel like reach-through micromanagement without any real data or reasoning.

What RelEng needs:

Just to continue our positive interaction with SRE in our weekly Pipeline meetings
A simple part of that is for SRE to provide a k8s cluster and/or namespace for CI to deploy to (as previously discussed and agreed upon)

Idea (Dan): rebrand the "deployment pipeline" project as "Continuous Delivery of MediaWiki Stack"

Greg to talk with Deb about what to do next with talking to Victoria
Greg to figure out how we can better market what we are accomplishing (eg "monthly showcase")
Get a k8s cluster from SRE for CI to deploy to.

Developer Productivity JD

Lead: Greg
Blog post: https://squiggle.city/~frencil/archives/20150625.html#anatomy_of_a_healthy_job_post

You will be leading the effort to improve overall developer productivity. We will want you to create a replacement for our homebuilt Vagrant-based local development environment using the latest technologies such as Kubernetes (minikube), Docker, and Helm. You will be working closely with several teams and volunteers in the community.

Responsibilities

Help engineer container based tooling for MediaWiki application development and deployment
Maintain integration of developer tooling into a continuous delivery pipeline
Proactively find and create productivity improvements
Work in a highly collaborative and open organization and community

Requirements

Proficiency with software, systems, or devops engineering
Collaboration skills are as important as technical skills, if not more so
Experience with continuous integration/deployment systems
Experience with virtualization or container technologies
Experience with server configuration management software

Nice to haves

Free Software experience
Experience working in a remote-first organization
Experience using a Kubernetes environment
MediaWiki and/or Wikimedia project experience
Golang experience

Moving to an "everyone deploys their own changes" model (for SWAT)

Why are SWATs scheduled?
Why are there only a limited number of people in charge of doing them?

Z: Would like everyone who is already staff or a contractor to be able to do their own deploys.
Z: A lot of European SWAT users now self-deploy (eg: Amir, David Causse).

Turn SWATs into "volunteer patch deployment" windows. If you are staff/contractor, you deploy your own thing when you need to do it.

Pipeline Demo

Lead: Dan/Tyler
https://integration.wikimedia.org/ci/job/service-pipeline-test-only-debug
Job using Jenkins Pipeline. Defined in Groovy.

Presentation of Blubber and pipeline
What is minikube

Blubber and MediaWiki + extensions

We use docker-pkg w/ Quibble and Blubber in the pipeline. Is problem? No. Not really.
- Use of docker-pkg is appropriate in domains that require/allow full control of Dockerfile and image build (root)
- Base images are controlled by SRE (operations/docker-images/production-images)
- CI images for use with Quibble are controlled by RelEng (integration/config)
Talked about whether we should use Quibble as entrypoint in pipeline testing. Should we? No. Probably not.
- Different use case. Quibble depends on environment that has superset of MW+ext dependencies. Blubber is meant to be repo-authoritative.
- EVERYTHING IS GREAT, AGAIN.
What does a Blubberized MediaWiki look like? For limited scope of FY1718Q4 goal ((MediaWiki + Math) + Mathoid)? For far future?
Discussion about how to deal with Debian dependencies and extensions depending on each other.
- For Q4 goal, we don't technically need to solve the ext dependency issue (Math does not depend on other extensions or skins)

Are we testing a lot

all quibble jobs -- combinations mysql/vendor/php70 mysql/composer/php70 mysql/vendor/php55 mysql/vendor/hhvmT:

php/js lint/eslint
qunit/phpunit
webdriver.io