Wikimedia Release Engineering Team/Checkin archive/2023-09-06

From mediawiki.org

πŸ† Wins[edit]

Aug '23 recap
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
Sep '23 edition
  • Published an image for Blubber that is native LLB: no Dockerfile anymore
    • Implications
      • A Dockerfile is unnecessary since no one sees it; we can customize each LLB instruction and what it displays to users: a name that corresponds to the blubber.yaml config
      • We now have the ability to create our own instructions
      • dockerfile2llb is gone! No more unmaintained external helper images just to copy files around, and no more cross-platform compatibility/emulation issues
      • LLB gets new features first, e.g., DiffOp/MergeOp: https://www.docker.com/blog/mergediff-building-dags-more-efficiently-and-elegantly/
  • Phorge working on the scap3∞ deployment environment
  • Landed 3 upstream Phorge patches; one is a patch we've carried for years, related to a bug that blocks some tasks from rendering (T284397)
  • Patch for T&S that outputs the MediaWiki SUL account along with the Phabricator username (T344303)
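To illustrate what DiffOp/MergeOp add over plain layer stacking, here is a toy model in Python. It is a sketch of the semantics only, under loud assumptions: a filesystem is a dict of path to content, and `Layer`, `diff` and `merge` are hypothetical names. BuildKit's real operations (exposed in its Go LLB client as `llb.Diff` and `llb.Merge`) work on filesystem snapshots, not dicts.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Toy stand-in for a filesystem snapshot or a diff between snapshots."""
    files: dict = field(default_factory=dict)  # path -> content written by this layer
    deletions: frozenset = frozenset()         # paths this layer removes


def diff(lower: dict, upper: dict) -> Layer:
    """DiffOp-like: only the changes that turn `lower` into `upper`."""
    changed = {path: data for path, data in upper.items() if lower.get(path) != data}
    removed = frozenset(path for path in lower if path not in upper)
    return Layer(changed, removed)


def merge(*layers: Layer) -> dict:
    """MergeOp-like: apply layers in order; later layers win on conflicts."""
    result: dict = {}
    for layer in layers:
        for path in layer.deletions:
            result.pop(path, None)
        result.update(layer.files)
    return result


if __name__ == "__main__":
    base = {"/etc/os-release": "debian", "/usr/bin/python3": "3.11"}
    built = {**base, "/srv/app/main.py": "print('hi')"}

    # Extract just what the build step added, independent of its base...
    app_layer = diff(base, built)

    # ...and graft it onto a different base without re-running the build step.
    slim_base = {"/etc/os-release": "debian-slim"}
    print(merge(Layer(slim_base), app_layer))
```

The point of the toy: in a linear Dockerfile-style stack, changing an early layer invalidates the cache for every later step, whereas a diff can be rebased onto another state with a merge. These notes don't say which of these operations Blubber uses.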

OKR update

Last week

The six questions I answer week by week about our work. This is pretty much all that CTPO/VP/Director types see of what we're doing. If there are specific things to call out here, let's do so.

On track

  • Progress update on the hypothesis for the week
    • T345000 – Create a separate memory-optimized GitLab runner pool for memory-hungry jobs. We created CPU-optimized and memory-optimized GitLab runner pools this week
      • In the process, we tweaked the size of our staging cluster to save cost
    • T300819 – Created UI to make stacked merge requests clearer (upstream)
    • T337570#9133281 – Local Gems for our GitLab instance are in testing on our devtools instance; hopefully this enables lots of UI customization.
  • Any new metrics related to the hypothesis
    • Repositories on Gerrit decreased (2022 last week → 2020 this week)
  • Any emerging blockers or risks
    • Reached out/set up conversations about pulling apart/scheduling migrations of repos (for T344739 – Old Platform Team projects + T344733 – Metrics Platform, as I believe they're unblocked)
  • Any unresolved dependencies - do you depend on another team that hasn't already given you what you need? Are you on the hook to give another team something you aren't able to give right now?
    • No
  • Have there been any new learnings from the hypothesis?
    • No
  • Are you working on anything else outside of this hypothesis? If so, what?
    • MediaWiki 1.41.0-wmf.24
      • 309 Patches ▁▁▇█▂
      • 0 Rollbacks ██▁▁▁
      • 0 Days of delay ▁▁█▁▁
      • 1 Blockers ▅█▅▁▁
    • T345458 – Refactor Blubber's BuildKit frontend gateway to use LLB directly; enables some nicer features in our Docker image builds
    • T343967 – Bugfixes for scap backport deploying two stacked patches

This week

Progress update on the hypothesis for the week

  • 

Any new metrics related to the hypothesis

  • 

Any emerging blockers or risks

Any unresolved dependencies - do you depend on another team that hasn't already given you what you need?

  • 

Are you on the hook to give another team something you aren't able to give right now?

  • 

Have there been any new learnings from the hypothesis?

  • 

Are you working on anything else outside of this hypothesis? If so, what?

  • 

🌻 Open source/Upstream contributions

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Upstream

😢 Let's keep these empty

Code review

Gerrit Access requests

Private repo requests

https://phabricator.wikimedia.org/search/query/E7t2_WXX01bB/#R

Gerrit repo requests

GitLab Access requests

High priority tasks

📅 Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2023
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off

September 2023

  • 04 Sep: Labor Day (US Staff with reqs)
  • 08 Sep: Tyler
  • 15, 18 Sep: Tyler
  • 26 Aug–05 Sep: Brennen (🔥)
  • 13 Weds–17 Sun: Brennen → KS (approximate)

October 2023

  • 2–16 Oct: Jaime

Future

  • 15 Jan–15 Mar: Andre


🔥🚂 Train

https://tools.wmflabs.org/versions/
https://train-blockers.toolforge.org/
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar


  • 2 Jan – wmf.17 – Dan + Antoine (Jaime out)
  • 9 Jan – wmf.18 – Jeena + Dan (Jaime out)
  • 16 Jan – wmf.19 – Jaime + Jeena
  • 23 Jan – wmf.20 – Brennen + Jaime
  • 30 Jan – wmf.21 – Ahmon + Brennen
  • 6 Feb – wmf.22 – Chad + Ahmon
  • 13 Feb – wmf.23 – Dan + Chad
  • 20 Feb – wmf.24 – Antoine + Dan
  • 27 Feb – wmf.25 – Jaime + Antoine
  • 6 Mar – wmf.26 – Jeena + Jaime
  • 13 Mar – wmf.27 – Brennen + Jeena
  • 20 Mar – wmf.1 – Ahmon + Brennen
  • 27 Mar – wmf.2 – Chad Dan + Ahmon
  • 3 Apr – wmf.3 – Antoine + Dan
  • 10 Apr – wmf.4 – Chad + Antoine
  • 17 Apr – wmf.5 – Jaime + Chad
  • 24 Apr – wmf.6 – Jeena + Jaime
  • 1 May – wmf.7 – Brennen + Jeena
  • 8 May – wmf.8 – Antoine + Brennen (Ahmon out + Antoine out 8th)
  • 15 May – wmf.9 – Ahmon + Antoine (Dan out + Chad out)
  • 22 May – wmf.10 – Chad + Ahmon (Dan out + Jeena out 26th)
  • 29 May – wmf.11 – Dan + Chad (Memorial Day 29th)
  • 5 Jun – wmf.12 – Jeena + Dan (Brennen out, Jaime out)
  • 12 Jun – wmf.13 – Jaime + Jeena
  • 19 Jun – wmf.15 – Cancelled for offsite
  • 26 Jun – wmf.16 – Brennen + Jaime (Jeena out)
  • 3 Jul – wmf.17 – Antoine + Brennen (3rd + 4th holidays)
  • 10 Jul – wmf.18 – Dan + Antoine (Ahmon out)
  • 17 Jul – wmf.19 – Ahmon + Dan (Brennen out Friday)
  • 24 Jul – wmf.20 – Jaime + Ahmon
  • 31 Jul – wmf.21 – Ahmon + Jaime (Jeena out, Antoine out) (Ahmon volunteered)
  • 7 Aug – wmf.22 – No train
  • 14 Aug – wmf.23 – Ahmon + Jaime (Jeena out, Antoine out)
  • 21 Aug – wmf.24 – Dan (Brennen out, Jeena out, Antoine out)
  • 28 Aug – wmf.25 – Jeena + Dan
  • 04 Sep – wmf.26 – Antoine + Jeena
  • 11 Sep – wmf.27 – Jaime + Antoine + Andre as lurker!
  • 18 Sep – wmf.28 – Brennen + Jaime
  • 25 Sep – wmf.29 – 

Team discussions

Offsite!

  • SF
  • Approved Arrival Date: December 4, 2023
  • Approved Departure Date: December 9, 2023
  • In Person Meeting Days: December 5, 6, 7, 8

Please complete the survey by September 19

DX Runs the train

  • We want to work closer with others in Developer Experience
  • We're looking for short, well-defined projects to tackle together
  • Initially, the projects should be time-bound and simple; we're trying to build our process for doing this and learn how to do it together
  • Later they will be bigger and gnarlier

We should still make sure at least one of us is on call for the train, as we do currently, to offer support. This matters in particular because I'm assuming we are still taking care of the pre-train automated processes that run late Monday/early Tuesday (branch cut + train presync). There are a few places where I think we could already review/improve the docs beforehand:

  • Security patches: they fail relatively often. This is the only documentation I could find about patches. It would be useful to have something that explains whom to contact and how to get an updated security patch when necessary.
  • Triage/bug reporting: especially the relevant people/teams to tag: https://www.mediawiki.org/wiki/Developers/Maintainers
  • Rollback/holding the train: I would consolidate the sections we have about breakage, holding/rolling back, and where to monitor into a single place. I would add more direct links to the relevant dashboards in Logstash and Grafana and revise the criteria themselves too; for example, based on how we normally operate, this criterion sounds too draconian: "In general, if there is an unexplained error that occurs within 1 hour of a train deployment – always roll back the train"
    • https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Breakage
    • https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Issues_that_hold_the_train

  • Update on "Investigate whether issues, operations, wikis, etc. can be disabled globally on GitLab"

  https://phabricator.wikimedia.org/T264231

    • Antoine tried it!
  • Continuous delivery all the things
    • Have to work with SRE on this since this is the deployment-charts repo
      • Need a different way of keeping track of state
    • Access control?
    • We want GitLab to do it?
    • Why do we store the version?
      • Information needs to be stored if we need to rebuild the cluster
    • Need the image name to run; it should be stored somewhere
    • Git is a question of access: team A can bump versions for team B
      • We don't object to *a* git repo, but to the mechanism: it should give them control over what gets deployed
    • If you don't want something deployed, don't merge it
    • Image tags are currently not meaningful
    • Building on a tag (although this may not be the strictest definition of continuous deployment)
      • Enforcing that main is always deployable would constrain people
      • Keeping those decoupled would remove that constraint on folks
    • Agree that the mentality of main always being deployable is a good mindset, but it's too restrictive if our goal is to make things easier for developers
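The "build on a tag" option, combined with recording the deployed version in git so the cluster can be rebuilt, could look roughly like the Python sketch below. It is illustrative only and rests on assumptions not in these notes: release tags follow a hypothetical `vX.Y.Z` scheme, image tags combine that version with a short commit SHA so they become meaningful, and the `image.tag` values layout is hypothetical rather than the actual deployment-charts schema. It does not address the open access-control question.

```python
import re
from typing import Optional

# Hypothetical convention: only refs like refs/tags/v1.2.3 produce deployable images.
RELEASE_TAG = re.compile(r"^refs/tags/v(\d+\.\d+\.\d+)$")


def deployable_image_tag(ref: str, commit_sha: str) -> Optional[str]:
    """Return an image tag for release refs, or None for anything else.

    Branch pushes (including main) can still build and test, but they are
    not deployed, so main does not have to be always deployable.
    """
    match = RELEASE_TAG.match(ref)
    if match is None:
        return None
    # Version plus short SHA: readable, and traceable back to a commit.
    return f"{match.group(1)}-{commit_sha[:8]}"


def bump_chart_values(values: dict, image_tag: str) -> dict:
    """Return a copy of chart values with the image tag recorded.

    Committing this change to git is what lets the cluster be rebuilt later.
    """
    return {**values, "image": {**values.get("image", {}), "tag": image_tag}}


if __name__ == "__main__":
    values = {"image": {"repository": "example/app", "tag": "old"}, "replicas": 2}
    tag = deployable_image_tag("refs/tags/v1.4.0", "0123456789abcdef")
    if tag is not None:
        print(bump_chart_values(values, tag))
```

Under this scheme, "don't want something deployed, don't tag it" replaces "don't merge it", which is the decoupling discussed above.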