Wikimedia Release Engineering Team/Checkin archive/2021-02-24

From mediawiki.org


2021-02-24

Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
How to do it
  • 15 Feb: Presidents' Day -- US staff with reqs
  • 22 Feb: Dan out


  • 29 Mar: US staff with reqs
  • 12 Apr: US staff with reqs
  • 22 Apr: Earth Day -- US staff with reqs

Train

Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Important_dates


  • 16 Nov - wmf.18 - Ahmon + Antoine
  • 23 Nov - wmf.19 - No Train - Thanksgiving Thurs/Fri https://phabricator.wikimedia.org/T263185
  • 30 Nov - wmf.20 - Antoine + Mukunda
  • 7 Dec - wmf.21 - Mukunda + Dan
  • 14 Dec - wmf.22 - Dan + Jeena
  • 21 Dec - wmf.23 - No Train
  • 28 Dec - wmf.24 - No Train
  • 4 Jan - wmf.25 - Jeena + Antoine (in place of Lars)
    • NB: Lars is only back from holiday on Thursday Jan 7
  • 11 Jan - wmf.26 - Lars + Jeena
  • 18 Jan - wmf.27 - Brennen + Lars (Monday is a holiday)
  • 25 Jan - wmf.28 - Ahmon + Brennen
  • 1 Feb - wmf.29 - Antoine + Ahmon
  • 8 Feb - wmf.30 - Mukunda + Antoine
  • 15 Feb - wmf.31 - Dan + Mukunda (Monday is a holiday)
  • 22 Feb - wmf.32 - Jeena + Dan
  • 1 Mar - wmf.33 - Lars + Jeena
  • 8 Mar - wmf.34 - Brennen + Lars


Status

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

SoS

  • 2019-08-14 onwards: Zeljko 🎸 🎷 \o/
  • 2020-08-26 onwards: Deb is in charge/SoS is async
  • 2020-11-25: Brennen
  • 2020-12-02: Ahmon
  • 2020-12-09: Tyler
  • 2020-12-16: Antoine
  • 2021-01-06: Tyler
  • 2021-01-13: Text only update
  • 2021-01-20: Mukunda
  • 2021-01-27: Text only update
  • 2021-02-03: Thcipriani
  • 2021-02-10: Thcipriani
  • 2021-02-24: Thcipriani

Outgoing

Thanks

  • Serviceops unsticking VMs for GitLab
  • Moritz, jbond, godog for input on GitLab things

Callouts

Incoming

Team Business

Incoming/Needs attention

  • TRAIN, train, train
    • Discuss the idea of having a Tuesday train checkin about current errors and whether to block/roll
    • Should it include the entire team?
    • Public, all WMF tech/product, or team private?
    • IRC or Slack?
      • Note from earlier discussion: Matters whether it's community-accessible.
    • Discussion notes
      • Question of whether signoff is meaningful before code rolls out
      • Question of timing for EU folks
      • Distinction between signoff for train to roll initially and log triage
      • Lars: Proposal to automate as far as deploy to group0
      • Tyler:
        • The pain of the process isn't that we have to deploy, it's that we have to care about other people's errors.
        • Want to make the overall process better, not just shift it around
        • Work to determine who knows what's going on is untracked
      • Jeena:
        • Re: Lars' proposal -
      • Lars: Augment previous proposal: How about we negotiate with Platform Engineering etc. to select a representative for each train. Go / no-go committee every week, we know who they are before the train starts
      • Brennen: fundamental problem -- if you have code going out you need to be watching logs -- however we get to that is how we make things better independent of the mechanisms of deploying
      • Jeena: Competing idea to go/no-go: Could have a RelEng partner for each product team that would help them do their CI and deployments in a more individualized way. So that they'd be on the hook but things could happen faster.
        • +1s from Lars, Mukunda
      • Tyler: Complementary to this idea, want to push forward https://wikitech.wikimedia.org/wiki/User:Thcipriani/Deployments/Patch_type_criteria
        • Please have a look at this.
        • Idea to get folks deploying their own code... People are relying more on the train. First step might be deciding whether a change should ride the train or be backported.
      • Tyler: Actions: Some sort of proposal. Is the representative a good first step?
      • Ahmon: are those folks looking at the logs?
      • Dan: Instead of representatives could we have individuals? All of the people who wrote patches.
      • Brennen: sometimes mechanisms are better than org structures. Formalized mechanisms rather than mandated meetings might work better. You have to push this button if you want your code to stay deployed.
      • Antoine: CR+2 is already sort of this
      • Brennen: I guess I'm advocating for something like +2 for verified-in-production
      • Antoine: Staging / beta
      • Lars: post deployment voting on patches -- how do I know that *my code* is working.
      • Jeena: Manual testing of new code after code rolls out to each group
      • Brennen: Add a comment to each patchset that your code has reached groupX
        • We have a release tagger bot, but it doesn't say anything about whether your patch is actually in production.
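The "your code has reached groupX" comment discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Patch` type, the `deployed` mapping, and both helper functions are hypothetical and do not describe the existing release tagger bot or any Wikimedia tooling. It also assumes all branches belong to the same release series, so comparing the numeric `wmf.N` suffix is enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical record for a merged change."""
    change_id: str
    first_version: str  # first train branch containing the patch, e.g. "wmf.32"


def version_number(version: str) -> int:
    """Extract the numeric suffix from a branch label such as 'wmf.32'."""
    return int(version.rsplit(".", 1)[1])


def groups_reached(patch: Patch, deployed: dict[str, str]) -> list[str]:
    """Return the groups whose deployed branch is at or past the patch's first branch.

    `deployed` maps a group name to the branch currently live on it
    (assumed input; where this data would come from is not specified here).
    """
    needed = version_number(patch.first_version)
    return sorted(
        group for group, version in deployed.items()
        if version_number(version) >= needed
    )


def reached_comment(patch: Patch, deployed: dict[str, str]) -> str:
    """Build the text a bot could post on the patchset."""
    groups = groups_reached(patch, deployed)
    if not groups:
        return f"Change {patch.change_id} has not reached any deployment group yet."
    return f"Change {patch.change_id} has reached: {', '.join(groups)}."


if __name__ == "__main__":
    # Made-up state: group0 has the new branch, the other groups are one behind.
    live = {"group0": "wmf.32", "group1": "wmf.31", "group2": "wmf.31"}
    print(reached_comment(Patch("I123", "wmf.32"), live))
```

A comment like this would only say where the code is running, not whether it is working; the post-deployment voting idea raised by Lars would still need a separate signal from the patch author.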

Book club/Lunch and Learn

Monthly reflection on accomplishments - Feb '21 edition

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
Add as you have them!
  • PipelineLib fully working on releases-jenkins.wikimedia.org
  • Rust introduction talk (not strictly RelEng business)
  • logspam-watch minimum hits consolidation feature
  • Gearman plugin deployed. Merged a bunch of pending changes plus a fork from GoodData that adds support for Pipeline jobs

Standup!

Ahmon

  • Updates:
    • Verified that rebuildLocalisationCache.php doesn't currently require DB access in production. Will attempt to turn that into a policy so that building l10n files can safely be treated as a fully offline operation.
    • Thinking about approaches to a no-etcd mode for mediawiki-config.
    • Design change: Include l10n files in the built MW images. It's just better.
  • Blocked by:
    • none
  • Blocking:
    • none


Antoine

  • Updates:
  • Blocked by:
    • workflow-jobs are not registered. They are tied to `master`, but it has no executor, so no GearmanWorkerThread can elect itself to register the function (reproduced locally).
  • Blocking:

Brennen

  • Blocked by:
  • Blocking:
  • Updates:
    • Went to the airport. It was weird.
    • GitLab
      • Kickoff meeting yesterday.
      • "GitLab (Initialization)" milestone for the init project: https://phabricator.wikimedia.org/project/view/5212/
      • Wrote up some request numbers for Gerrit to give S&F a rough idea what kind of traffic they should test GitLab instance against.
      • Today: Finish bashing out a description of desired auth situation.
    • logspam-watch is crying out for emojis🎉

Dan

  • Blocked by:
    • Some docker container networking issue on releases1002
  • Blocking:
  • Updates:
    • Added a bunch of new features to pipelinelib to get m8s multiversion image build working
    • Releases jenkins can _almost_ build a multiversion image

Jeena

  • Blocked by:
  • Blocking:
  • Updates:
    • Choo choo🚂
    • pet-expedition

Lars

  • Blocked by:
    • Computers
  • Blocking:
    • Good things
  • Updates:
    • Fixing train-dev, which has been broken since Friday

Mukunda

  • Blocked by:
    • kibana is a bastard
  • Blocking:
  • Updates:
    • Have tried and failed at a bunch of different angles of recreating phatality. Latest idea is to pull data from the phabricator side instead of pushing from the kibana side. Details coming soon.

Tyler