Wikimedia Release Engineering Team/Checkin archive/2021-02-24

= 2021-02-24 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * 15 Feb: Presidents' Day -- US staff with reqs
 * 22 Feb: Dan out


 * 29 Mar: US staff with reqs


 * 12 Apr: US staff with reqs
 * 22 Apr: Earth Day -- US staff with reqs


 * I made this: https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R
 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Important_dates


 * 16 Nov - wmf.18 - Ahmon + Antoine
 * 23 Nov - wmf.19 - No Train - Thanksgiving Thurs/Fri https://phabricator.wikimedia.org/T263185
 * 30 Nov - wmf.20 - Antoine + Mukunda
 * 7 Dec - wmf.21 - Mukunda + Dan
 * 14 Dec - wmf.22 - Dan + Jeena
 * 21 Dec - wmf.23 - No Train
 * 28 Dec - wmf.24 - No Train
 * 4 Jan - wmf.25 - Jeena + Antoine (replacing Lars)
 * NB: Lars is only back from holiday on Thursday Jan 7
 * 11 Jan - wmf.26 - Lars + Jeena
 * 18 Jan - wmf.27 - Brennen + Lars (Monday is a holiday)
 * 25 Jan - wmf.28 - Ahmon + Brennen
 * 1 Feb - wmf.29 - Antoine + Ahmon
 * 8 Feb - wmf.30 - Mukunda + Antoine
 * 15 Feb - wmf.31 - Dan + Mukunda (Monday is a holiday)


 * 22 Feb - wmf.32 - Jeena + Dan
 * 1 Mar - wmf.33 - Lars + Jeena
 * 8 Mar - wmf.34 - Brennen + Lars
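
From 18 Jan onward the pairings in the list above appear to follow a simple rotation: the first name of one week becomes the second name of the next. A minimal Python sketch of that pattern follows. The seven-person order is read off the list, and the function name `train_pairs` is made up for illustration; none of this is stated team policy.

```python
from datetime import date, timedelta

# Order inferred from the 18 Jan to 8 Mar entries above (an assumption).
ROTATION = ["Brennen", "Ahmon", "Antoine", "Mukunda", "Dan", "Jeena", "Lars"]


def train_pairs(start, weeks, rotation=ROTATION):
    """Return (monday, first, second) tuples, one per weekly train.

    The second name is whoever was first the week before.
    """
    rows = []
    for week in range(weeks):
        first = rotation[week % len(rotation)]
        second = rotation[(week - 1) % len(rotation)]
        rows.append((start + timedelta(weeks=week), first, second))
    return rows


for monday, first, second in train_pairs(date(2021, 1, 18), 8):
    print(f"{monday:%d %b} - {first} + {second}")
```

Holidays and "No Train" weeks are not modelled; they would still need to be marked by hand.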

Status

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

SoS

 * 2019-08-14 onwards: Zeljko 🎸 🎷 \o/
 * 2020-08-26 onwards: Deb is in charge/SoS is async
 * 2020-11-25: Brennen
 * 2020-12-02: Ahmon
 * 2020-12-09: Tyler
 * 2020-12-16: Antoine
 * 2021-01-06: Tyler
 * 2021-01-13: Text only update
 * 2021-01-20: Mukunda
 * 2021-01-27: Text only update
 * 2021-02-03: Thcipriani
 * 2021-02-10: Thcipriani
 * 2021-02-24: Thcipriani

Outgoing

 * Blocked by:
 * Blocking:
 * Updates:
 * [All] Deployments/Covid-19 https://wikitech.wikimedia.org/wiki/Deployments/Covid-19
 * Train Health
 * Last week: 1.36.0-wmf.31 T271344
 * This week: 1.36.0-wmf.32 T274936
 * Next week: 1.36.0-wmf.33 T274937

Thanks

 * Serviceops unsticking VMs for GitLab
 * Moritz, jbond, godog for input on GitLab things

Incoming/Needs attention

 * Feedback on: https://lists.wikimedia.org/pipermail/wikitech-l/2021-February/094250.html
 * "Cool"/"Can I join the triage meeting"
 * Triage meeting: needs to be documented
 * TODO: Tyler to document ❌


 * TRAIN, train, train
 * Discuss the idea of having a Tuesday train checkin about current errors and whether to block/roll
 * Should it include the entire team?
 * Public, all WMF tech/product, or team private?
 * IRC or Slack?
 * Note from earlier discussion: Matters whether it's community-accessible.
 * Discussion notes
 * Question of whether signoff is meaningful before code rolls out
 * Question of timing for EU folks
 * Distinction between signoff for train to roll initially and log triage
 * Lars: Proposal to automate as far as deploy to group0
 * Tyler:
 * The pain of the process isn't that we have to deploy, it's that we have to care about other people's errors.
 * Want to make the overall process better, not just shift it around
 * Work to determine who knows what's going on is untracked
 * Jeena:
 * Re: Lars' proposal -
 * Lars: Augment previous proposal: How about we negotiate with Platform Engineering etc. to select a representative for each train. Go / no-go committee every week, we know who they are before the train starts
 * Brennen: fundamental problem -- if you have code going out you need to be watching logs -- however we get to that is how we make things better independent of the mechanisms of deploying
 * Jeena: Competing idea to go/no-go: Could have a RelEng partner for each product team that would help them do their CI and deployments in a more individualized way. So that they'd be on the hook but things could happen faster.
 * +1s from Lars, Mukunda
 * Tyler: Complementary to this idea, want to push forward https://wikitech.wikimedia.org/wiki/User:Thcipriani/Deployments/Patch_type_criteria
 * Please have a look at this.
 * Idea to get folks deploying their own code... People are relying more on the train.  First step might be deciding whether a change should ride the train or be backported.
 * Tyler: Actions: Some sort of proposal. Is the representative a good first step?
 * Ahmon: are those folks looking at the logs?
 * Dan: Instead of representatives could we have individuals? All of the people who wrote patches.
 * Brennen: sometimes mechanisms are better than org structures. Formalized mechanisms rather than mandated meetings might work better. You have to push this button if you want your code to stay deployed.
 * Antoine: CR+2 is already sort of this
 * Brennen: I guess I'm advocating for something like +2 for verified-in-production
 * Antoine: Staging / beta
 * Lars: post deployment voting on patches -- how do I know that *my code* is working.
 * Jeena: Manual testing of new code after code rolls out to each group
 * Brennen: Add a comment to each patchset that your code has reached groupX
 * We have a release tagger bot, but it doesn't say anything about whether your patch is actually in production.
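
Brennen's idea of commenting on each patchset once the code reaches a group could be sketched roughly as below. This is only an illustration: the function names, the `deployed` mapping, and the comment wording are all hypothetical, and it assumes a patch ships with the first `wmf` branch that contains it (backports are not handled).

```python
import re


def wmf_key(version):
    """Turn '1.36.0-wmf.32' into a sortable tuple such as (1, 36, 0, 32)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-wmf\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.groups())


def groups_with_patch(first_version, deployed):
    """List the groups whose deployed version is at least first_version."""
    wanted = wmf_key(first_version)
    return sorted(
        group for group, version in deployed.items()
        if wmf_key(version) >= wanted
    )


def patch_comment(first_version, deployed):
    """Build the text a bot might post on the patchset."""
    groups = groups_with_patch(first_version, deployed)
    if not groups:
        return f"Not in production yet (first shipped in {first_version})."
    return f"Now live on: {', '.join(groups)}. Please check the logs."


# Hypothetical mid-week state: only group0 has the new branch.
deployed = {
    "group0": "1.36.0-wmf.32",
    "group1": "1.36.0-wmf.31",
    "group2": "1.36.0-wmf.31",
}
print(patch_comment("1.36.0-wmf.32", deployed))
```

Where the per-group version data would come from is left open here; the sketch only covers the comparison and the message.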

Book club/Lunch and Learn

 * https://www.mediawiki.org/wiki/Wikimedia_Engineering_Productivity_Team/Book_club
 * https://www.mediawiki.org/wiki/Wikimedia_Engineering_Productivity_Team/Lunch_and_learn
 * https://www.mediawiki.org/wiki/Wikimedia_Engineering_Productivity_Team/Read_papers_and_talk
 * Mar 1st (moved from Feb 15th): Lars -- David Allen's Getting Things Done (GTD)
 * Jeena Suggestion: Falling Down: A guide
 * Brennen suggestion (maybe): nebulous Zettelkasten rant what is this?!
 * Zettelkasten is a note taking system

Monthly reflection on accomplishments - Feb '21 edition

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
 * Add as you have them!


 * PipelineLib fully working on releases-jenkins.wikimedia.org
 * Rust introduction talk (not strictly RelEng business)
 * logspam-watch minimum hits consolidation feature
 * Gearman plugin deployed. Merged bunch of pending changes + a fork from GoodData company which adds support for Pipeline jobs

Ahmon

 * Updates:
 * Verified that rebuildLocalisationCache.php doesn't currently require DB access in production. Will attempt to turn that into a policy so that building l10n files can safely be treated as a fully offline operation.
 * Thinking about approaches to a no-etcd mode for mediawiki-config.
 * Design change: Include l10n files in the built MW images. It's just better.
 * Blocked by:
 * none
 * Blocking:
 * none

Antoine

 * Updates:
 * Gearman plugin deployed. Merged a bunch of pending changes + a fork from GoodData company which adds support for Pipeline jobs. Canonical repo: https://github.com/jenkinsci/gearman-plugin/ (added to monthly accomplishments)
 * Something about go garbage collection and large number of items which I found interesting (spoiler they went to Rust): https://blog.discord.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f.
 * Blocked by:
 * workflow-jobs are not registered. They are tied to `master`, but it has no executor and thus no GearmanWorkerThread able to elect itself to register the function (reproduced locally).
 * Blocking:

Brennen

 * Blocked by:
 * Blocking:
 * Updates:
 * Went to the airport. It was weird.
 * GitLab
 * Kickoff meeting yesterday.
 * "GitLab (Initialization)" milestone for the init project: https://phabricator.wikimedia.org/project/view/5212/
 * Wrote up some request numbers for Gerrit to give S&F a rough idea what kind of traffic they should test GitLab instance against.
 * Today: Finish bashing out a description of desired auth situation.
 * logspam-watch is crying out for emojis🎉

Dan

 * Blocked by:
 * Some docker container networking issue on releases1002
 * Blocking:
 * Updates:
 * Added a bunch of new features to pipelinelib to get m8s multiversion image build working
 * Releases jenkins can _almost_ build a multiversion image

Jeena

 * Blocked by:
 * Blocking:
 * Updates:
 * Choo choo🚂
 * pet-expedition

Lars

 * Blocked by:
 * Computers
 * Blocking:
 * Good things
 * Updates:
 * Fixing train-dev which broke since Friday

Mukunda

 * Blocked by:
 * kibana is a bastard
 * Blocking:
 * Updates:
 * Have tried and failed at a bunch of different angles of recreating phatality. Latest idea is to pull data from the phabricator side instead of pushing from the kibana side. Details coming soon.

Tyler

 * Blocked by:
 * Myself: https://gerrit.wikimedia.org/r/c/mediawiki/tools/release/+/662778
 * ^ does someone know the magic line to have tox ignore the import?
 * add tox.ini section [flake8] with something like extend-ignore = W605,E501,E203
 * but with the right numbers from the console output
 * https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/scap/+/refs/heads/master/tox.ini#42 (link will stop working once anti-zombie change is merged, grab it NOW)
 * Blocking:
 * Updates:
 * GitLab
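
On the tox question under "Blocked by": besides the project-wide `[flake8]` setting suggested there, flake8 also honours a per-line `# noqa` comment, which silences only the named code on that one line. A small sketch follows. The code F401 ("imported but unused") is an assumption, since the notes do not say which warning is firing; use whatever the console output reports.

```python
# Assumption: the warning is F401 ("imported but unused"); substitute the
# code that flake8 actually prints in the console output.
import json  # noqa: F401


def main():
    # Nothing in this module uses json, which is what would trigger F401.
    print("lint-clean module loaded")
    return 0
```

The per-line form keeps the check active for the rest of the repository, whereas `extend-ignore` in tox.ini turns it off everywhere.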