Wikimedia Release Engineering Team/Checkin archive/2023-08-30

= 2023-08-30 =


 * Last time

🏆 Wins

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
 * Aug '23 edition


 * Developer Satisfaction Survey was presented
 * Gerrit repo archiving script for GitLab migrations \o/
 * Dan's back!
 * Gerritlab adoption
 * JWT auth changes
 * T272693 - reviewed non-standard phabricator policies
 * Downstream Phabricator patches for PHP 8 + logspam
 * Upstream Phorge patches for logspam
 * Overwrote feed transaction default query in conduit (T344232#9092848)
 * Scap3 can now be configured to disable the service on secondary hosts: https://phabricator.wikimedia.org/T343447
 * Kokkuri is now using the new GitLab ID tokens (see the id_tokens sketch after this list): https://phabricator.wikimedia.org/T337474
 * We're on Phorge (assuming it sticks)
 * Gitlab CI-built kask container image deployed today. (https://phabricator.wikimedia.org/T335691)
 * Gitlab local hacks in progress
 * Ahmon passed his CKA! Read Kubernetes in Action
 * Merged 3 fixes to Phorge upstream for phab logspam
 * 🎉 Delayed announcement: Jeena's back, and she's a senior software engineer
 * Blubber refactor that rips out Dockerfile generation (going straight to LLB) is passing acceptance tests
 * Added another node pool to our DO cloud runners: memory-optimized
 * Refactored the patch to tune down staging substantially
 * Now there are 4 runner-controller runners running + 4 nodes ready to go
 * GerritLab commits merged that speed up sending patches and do the right thing given GitLab's quirks
 * scap backport bugfix
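
For reference, a minimal sketch of the GitLab CI id_tokens syntax mentioned in the Kokkuri item above (the job name, token name, and audience value are illustrative, not Kokkuri's actual configuration):

```yaml
# Hypothetical .gitlab-ci.yml job requesting a GitLab OIDC ID token
publish-image:
  id_tokens:
    GITLAB_OIDC_TOKEN:                   # exposed to the job as a CI variable
      aud: https://gitlab.wikimedia.org  # audience claim; illustrative value
  script:
    - ./publish.sh "$GITLAB_OIDC_TOKEN"  # hypothetical consumer of the token
```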

Last week
The six questions I answer week by week about our work. This is pretty much all that CTPO/VP/Director-types see of what we're doing. If there are specific things to call out here, let's do so.

On track


 * Progress update on the hypothesis for the week
 * The GitLab (Pipeline Services Migration🐤) workboard shows all services that can move to GitLab today, tagged with teams (where available)
 * T335691 – Migrate mediawiki/services/kask to GitLab: deployed this week (we just need to archive the old repository and we're done)
 * T300819 – Speed up stacked merge requests: spent some time optimizing pushes for stacked patchsets in GitLab


 * Any new metrics related to the hypothesis
 * Repositories on Gerrit: 2022 last week → 2023 this week


 * Any emerging blockers or risks
 * Finding teams to steward migration for some services may be tricky
 * Needs more investigation, but Timo's comment about services previously stewarded by the Platform team is a good example of the challenges (T344739#9119148)


 * Any unresolved dependencies - do you depend on another team that hasn’t already given you what you need? Are you on the hook to give another team something you aren’t able to give right now?
 * No


 * Have there been any new learnings from the hypothesis?
 * No


 * Are you working on anything else outside of this hypothesis? If so, what?
 * Migrated our Phabricator installation to Phorge as an upstream, now working on bugfixes and features there.
 * T344754 - Concurrently running Selenium tests end up captured in the same video causing confusion
 * gerrit:949986 Provide a secondary database in MediaWiki test suite (quibble)
 * Zuul migration from Buster to Bullseye
 * 🚂 MediaWiki 1.41.0-wmf.23
 * 678 Patches ▂▁▁▇█ in 187 repos by 58 authors
 * 0 Rollbacks ▁██▁▁
 * 0 Days of delay ▁▁▁█▁
 * 1 Blocker ▂▅█▅▁

🌻 Open source/Upstream contributions

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Upstream


 * https://github.com/yaoyuannnn/gerritlab/pulls?q=is%3Apr+is%3Aclosed

Code review

 * +1'd gerrit changes
 * (filed as: https://phabricator.wikimedia.org/T344361 )

Gerrit Access requests

 * Gerrit access requests

Private repo requests
https://phabricator.wikimedia.org/search/query/E7t2_WXX01bB/#R

Gerrit repo requests

 * https://www.mediawiki.org/wiki/Gerrit/New_repositories/Requests

GitLab Access requests

 * Accounts and auth -
 * GitLab access requests

High priority tasks

 * UBN! + High: https://phabricator.wikimedia.org/maniphest/query/PkxR1BXrbbU4/#R
 * New in inbox: https://phabricator.wikimedia.org/maniphest/query/7vRDrcVnt8OI/#R

📅 Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2023
 * https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off

August 2023

 * 09 Wed: International Day of the World's Indigenous Peoples, US staff with reqs
 * 11 Fri: Brennen out for Folks Fest (?)
 * 7-11 Mon-Fri: Dan out for family vacation
 * Mon 31 Jul – Mon 21 Aug: Antoine
 * 23 Fri Jun–18 Fri Aug: Jeena → Mongolia :D :D :D


 * 24 Aug–04 Sep: Brennen (🔥)
 * Sun 27 Aug – Thu 31 Aug: Andre

September 2023

 * 04 Sep: Labor Day (US staff with reqs)
 * 26 Aug–05 Sep: Brennen (🔥)
 * Wed 13 – Sun 17 Sep: Brennen → KS (approximate)

October 2023

 * 2-16 Oct: Jaime

Future

 * 15 Jan – 15 Mar: Andre

🔥🚂 Train

 * https://tools.wmflabs.org/versions/
 * https://train-blockers.toolforge.org/
 * https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar


 * 2 Jan - wmf.17 - Dan + Antoine (Jaime out)
 * 9 Jan - wmf.18 - Jeena + Dan (Jaime out)
 * 16 Jan - wmf.19 - Jaime + Jeena
 * 23 Jan - wmf.20 - Brennen + Jaime
 * 30 Jan - wmf.21 - Ahmon + Brennen
 * 6 Feb - wmf.22 - Chad + Ahmon
 * 13 Feb - wmf.23 – Dan + Chad
 * 20 Feb - wmf.24 – Antoine + Dan
 * 27 Feb - wmf.25 – Jaime + Antoine
 * 6 Mar – wmf.26 – Jeena + Jaime
 * 13 Mar – wmf.27 – Brennen + Jeena
 * 20 Mar – wmf.1 – Ahmon + Brennen
 * 27 Mar – wmf.2 – Chad Dan + Ahmon
 * 3 Apr – wmf.3 – Antoine + Dan
 * 10 Apr – wmf.4 – Chad + Antoine
 * 17 Apr – wmf.5 – Jaime + Chad
 * 24 Apr – wmf.6 – Jeena + Jaime
 * 1 May – wmf.7 – Brennen + Jeena
 * 8 May – wmf.8 – Antoine + Brennen (Ahmon out + Antoine out 8th)
 * 15 May – wmf.9 – Ahmon + Antoine (Dan out + Chad out)
 * 22 May – wmf.10 – Chad + Ahmon (Dan out + Jeena out 26th)
 * 29 May – wmf.11 – Dan + Chad (Memorial Day 29th)
 * 5 Jun – wmf.12 – Jeena + Dan (Brennen out, Jaime out)
 * 12 Jun – wmf.13 – Jaime + Jeena
 * 19 Jun – wmf.15 – Cancelled for offsite
 * 26 Jun – wmf.16 – Brennen + Jaime (Jeena out)
 * 3 Jul – wmf.17 – Antoine + Brennen (3rd + 4th holidays)
 * 10 Jul – wmf.18 – Dan + Antoine (Ahmon out)
 * 17 Jul – wmf.19 – Ahmon+Dan (Brennen out Friday)
 * 24 Jul – wmf.20 – Jaime+Ahmon
 * 31 Jul – wmf.21 – Ahmon+Jaime (Jeena out, Antoine out) (Ahmon volunteered)
 * 7 Aug – wmf.22 – No train
 * 14 Aug - wmf.23 – Ahmon+Jaime (Jeena out, Antoine out)
 * 21 Aug - wmf.24 – Dan (Brennen out, Jeena out, Antoine out)


 * 28 Aug – wmf.25 – Jeena+Dan
 * 04 Sep – wmf.26 – Antoine+Jeena+Andre as lurker!
 * 11 Sep – wmf.27 – +Antoine
 * 18 Sep – wmf.28 – Brennen+
 * 25 Sep – wmf.29 –

Team discussions

 * Let's do this! https://phabricator.wikimedia.org/T264231

A no-stupid-questions fireside chat with Dan Duvall

 * What's this change? https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/commit/a6eab36c860b77c5855afcf34a5bab08a2d6e8b8
 * What pools of runners do we have?
 * Two runner pools
 * Four different node pools
 * General pool: unoptimized, standard DO droplets; runs the control plane + controllers
 * Nginx controller
 * Runner controller (the GitLab component that schedules job pods)
 * Anything except buildkit
 * Runners: cpu-optimized/memory-optimized
 * Nodes in those pools are tainted by default: a taint repels workloads, so unless a pod explicitly tolerates the taint, it won't be scheduled there (see the toleration sketch after this list)
 * CPU-optimized: jobs tolerate the taint workload=cpu (the toleration counteracts the taint)
 * Memory-optimized: jobs tolerate workload=memory
 * IOPS-optimized: only buildkit runs here (buildkitd instances), since image building is I/O-intensive
 * How do we control which jobs go to which pool?
 * Both cpu-optimized + mem-optimized grab untagged jobs
 * If a job includes the `cpu-optimized` tag, it will only go to the cpu-optimized pool; same for `memory-optimized`
 * People control where their job lands by adding one of those tags (see the tag sketch after this list)
 * Scaling
 * We experimented with horizontal pod autoscaling
 * There is no pod autoscaling for the runners
 * The GitLab runner polls for jobs and spawns pods, and there's a concurrency setting there
 * The gitlab-cloud-runner project has a Terraform template with this setting
 * `concurrent` and/or `limit` setting in the runner values YAML file (sketched after this list)
 * https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/blob/main/gitlab/gitlab-runner-values.yaml.tftpl
 * The node pool will respond by scaling out, spinning up new nodes (which is currently slow)
 * Takes about 3 minutes
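
A minimal sketch of the taint/toleration mechanism described above, assuming the pools use a NoSchedule effect (the effect isn't stated in these notes; pod name and image are illustrative):

```yaml
# Nodes in the cpu-optimized pool carry a taint like workload=cpu:NoSchedule.
# A pod only lands there if it explicitly tolerates that taint:
apiVersion: v1
kind: Pod
metadata:
  name: cpu-heavy-job              # hypothetical pod name
spec:
  tolerations:
    - key: workload
      operator: Equal
      value: cpu                   # would be "memory" for the memory-optimized pool
      effect: NoSchedule           # assumed effect
  containers:
    - name: job
      image: example/ci-job:latest # placeholder image
```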
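
And a sketch of how a job opts into a pool via runner tags, per the answer above (job name and script are illustrative):

```yaml
# Hypothetical .gitlab-ci.yml job pinned to the memory-optimized pool;
# leaving the job untagged lets either optimized pool pick it up.
build-heavy:
  tags:
    - memory-optimized
  script:
    - make build
```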
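
Finally, a sketch of where the concurrency caps live in the runner Helm values (the real values are in the gitlab-runner-values.yaml.tftpl template linked above; the numbers are illustrative):

```yaml
# Global cap: total concurrent jobs this runner manager will run at once
concurrent: 8
runners:
  config: |
    [[runners]]
      limit = 4  # per-runner cap (TOML embedded in the Helm values; illustrative)
```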

Next time: Resource request limits in k8s