Wikimedia Release Engineering Team/Checkin archive/2023-03-13

From mediawiki.org


2023-03-13

πŸ† Wins/winterrogation[edit]

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
Mar 2024
  • Nightly security patch failures now update Phabricator tasks: merged, ready to release
  • Merged deploys-in-progress reset script
  • Two repos have patches for git-fat → git-lfs
  • scap: replaced canary swagger checks with test server httpbb checks
  • Phorge integration with GitLab in its third round of review
  • GitLab webhooks also still going, looks like it'll go through
  • People like scap backport - more patches, fewer things typed into terminals.
  • Security patch notification now working!
  • GitLab webhooks have a more accurate regex for "Bug: TXX"
  • Foreachwiki in beta
  • Getting rid of the /srv/mediawiki/php symlink
  • Upgraded GitLab k8s/cloud cluster to new k8s version and documented the process
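The GitLab webhook regex itself isn't recorded in these notes. For illustration only, here is a minimal Python sketch of pulling Phabricator task IDs out of `Bug: T<number>` trailers in a commit message. The pattern and the `find_bug_tasks` name are hypothetical, not the merged webhook code.

```python
import re

# Hypothetical pattern, not the actual GitLab webhook regex: a "Bug:" trailer
# on its own line, followed by a Phabricator task ID ("T" plus digits).
# [ \t]* (rather than \s*) keeps the match from spilling across lines.
BUG_TRAILER = re.compile(r"^Bug:[ \t]*(T\d+)[ \t]*$", re.MULTILINE)


def find_bug_tasks(message: str) -> list[str]:
    """Return the Phabricator task IDs named in Bug: trailers, in order."""
    return BUG_TRAILER.findall(message)


if __name__ == "__main__":
    msg = (
        "Fix foo\n"
        "\n"
        "Mentions T1 in passing, which should not count.\n"
        "\n"
        "Bug: T12345\n"
        "Bug: T67890\n"
    )
    print(find_bug_tasks(msg))  # ['T12345', 'T67890']
```

Anchoring the match to a whole line keeps passing mentions of a task ID in the message body from being treated as trailers.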

Stuff from last time

📅 Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2024
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off (page needs updating for Dayforce)

Mar 2024

  • 29 Feb, 1 Mar, 4–8 Mar: Antoine
  • 14 Mar–14 May: Dan
  • 29 Mar: Brennen, Jeena

Apr 2024

  • Fri 05 Apr–Fri 12 Apr: Tyler, eclipse viewing
  • Mon 22 Apr: Global holiday, all staff
  • 26 Apr: Brennen (tentative)

May 2024

  • Mon 27 May: Memorial Day (US staff with reqs)

Future

  • A few days around July 4: Brennen
  • 25 Aug–03 Sep: Brennen

🔥🚂 Train

https://tools.wmflabs.org/versions/
https://train-blockers.toolforge.org/
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar

Rotation

  • 3 Dec – 1.42.0-wmf.8 – No Train (offsite)
  • 11 Dec – 1.42.0-wmf.9 – Brennen + Antoine (Jaime out)
  • 18 Dec – 1.42.0-wmf.10 – Ahmon + Brennen (Jaime out)
  • 25 Dec – 1.42.0-wmf.11 – No Train
  • 1 Jan – 1.42.0-wmf.12 – Dan + Ahmon (Jaime out)
  • 8 Jan – 1.42.0-wmf.13 – Jeena + Dan (Jaime out)
  • 15 Jan – 1.42.0-wmf.14 – Jaime + Jeena
  • 22 Jan – 1.42.0-wmf.15 – Antoine + Jaime
  • 29 Jan – 1.42.0-wmf.16 – Ahmon + Antoine (Brennen out Wed–Fri)
  • 05 Feb – 1.42.0-wmf.17 – Brennen + Ahmon
  • 12 Feb – 1.42.0-wmf.18 – Brennen + Antoine (Friday)
  • 19 Feb – 1.42.0-wmf.19 – Jeena + Brennen
  • 26 Feb – 1.42.0-wmf.20 – Dan + Jeena
  • 04 Mar – 1.42.0-wmf.21 – Jaime + Dan (Antoine out)
People for train: Ahmon, Antoine, Brennen, Jeena, Jaime
  • 11 Mar – 1.42.0-wmf.22 – Antoine + Jaime (Dan out)
  • 18 Mar – 1.42.0-wmf.23 – Ahmon + Antoine
  • 25 Mar – 1.42.0-wmf.24 – Jeena + Ahmon
  • 1 Apr – 1.42.0-wmf.25 – Jaime + Jeena
  • 8 Apr – 1.42.0-wmf.26 – Antoine + Jaime
  • 15 Apr – 1.42.0-wmf.27 – Ahmon + Antoine
  • 22 Apr – 1.42.0-wmf.28 – Brennen + Ahmon (Global holiday Monday; Brennen out Friday)

Team Discussions

Annual planning

Meta page: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2024-2025/Goals

  • How this works: Goals → Buckets → Objectives → KRs → Hypotheses
  • Where our work fits: Infrastructure → WikiExperiences → WE6 Developer Services → WE6.2
  • WE6: Technical staff and volunteer developers have the tools they need to effectively support the Wikimedia projects
    • WE6.2: By Q4, complete an intervention and run an experiment each aimed at providing maintainable, targeted environments to serve developers' high-priority testing needs

  • Experiment: the goal is that we learn some things
  • Intervention: we build something based on what we learned

WE6.2 Long version

Developers and users depend on the Wikimedia Beta Cluster (beta) to catch bugs before they affect users. Over time, the uses of beta have grown and come into conflict: they are too diverse to fit in a single environment. We will perform one intervention and conduct one experiment, each aimed at replacing a single high-priority testing need currently fulfilled by beta with a maintainable alternative environment that better serves that use case's needs.

Hypothesis areas:

  • Experiment: Group -1
  • Intervention: Catalyst

Discussion of our hypotheses (alongside ServiceOps):

  • Rollback faster
  • Smaller, single-version images
  • Wikiversions should be config rather than code (no deploy needed)
  • Continuous deployment to test wikis
    • ServiceOps open to the idea of testwikis being the victim here
    • We don't know how caching works when it's updated every minute
    • Social change here, working closely with developers to change expectations
    • User interface challenges: SSH to a server, lots of output to interpret; we could present things in a less scary way, and a web UI would be really awesome
    • What's scary about deploys now: what's happening in production, and what do I have to do about it as a deployer?
    • Logging, monitoring, and alerts exposed in a way that lets developers feel confident deploying themselves (vs. speeding up deploys)
      • Something about making the summary of the state of production more visible

Framing that might make sense, post-discussion:

Hypothesis one: group -1

  • Lots of work falling in ServiceOps, our work is building single-version images (+ wikiversion/mw/config work)
  • Single version makes actual deployment faster
  • Wikiversions outside of code means fewer deploys (makes deployment faster)
  • Draft hypothesis: If we build a single-version container image and experiment with moving wiki-to-version routing outside of code deployment, we'll
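To make "wikiversions as config" concrete, here is a hypothetical Python sketch of a router that reads a wiki-to-version map from a data file on each lookup, so pointing a wiki at a new version means editing data rather than deploying code. The `VersionRouter` class, the JSON shape, the `wiki-routing.json` file name, and the fallback behavior are all assumptions for illustration, not how production routing works today.

```python
import json
import tempfile
from pathlib import Path


class VersionRouter:
    """Hypothetical router: maps a wiki's dbname to a MediaWiki version.

    The map lives in a JSON data file such as
    {"testwiki": "1.42.0-wmf.22", "enwiki": "1.42.0-wmf.21"} and is re-read
    on every lookup, so editing the file changes routing without a code deploy.
    """

    def __init__(self, config_path: Path, default_version: str):
        self.config_path = config_path
        self.default_version = default_version

    def version_for(self, dbname: str) -> str:
        """Return the version this wiki should be served by."""
        try:
            mapping = json.loads(self.config_path.read_text())
        except (OSError, json.JSONDecodeError):
            # Missing or malformed config: fall back instead of failing.
            return self.default_version
        if not isinstance(mapping, dict):
            return self.default_version
        return mapping.get(dbname, self.default_version)


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "wiki-routing.json"
    path.write_text(json.dumps({"testwiki": "1.42.0-wmf.22"}))
    router = VersionRouter(path, default_version="1.42.0-wmf.21")
    print(router.version_for("testwiki"))  # 1.42.0-wmf.22
    print(router.version_for("enwiki"))  # 1.42.0-wmf.21

    # Move testwiki to a newer version by editing data only.
    path.write_text(json.dumps({"testwiki": "1.42.0-wmf.23"}))
    print(router.version_for("testwiki"))  # 1.42.0-wmf.23
```

Re-reading the file on every call is the simplest way to show the idea; a real router would more likely cache the map and watch for changes, which connects to the open question about how caching behaves when things update every minute.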

Hypothesis two: speeding time to deploy

  • Lots of work in our team, little work in the ServiceOps space
  • Making it less scary to deploy: rollback, web ui, giving deployers an easy way to see what's happening in production, making it obvious what to do about it