Wikimedia Release Engineering Team/Checkin archive/20180924

From mediawiki.org


Vacations/Important dates

How to do it
  • September 27th (Thursday) - Antoine busy handling paperwork
  • Early October to mid-October - Antoine taking some weeks/days off or working part time
  • October 5th (Friday) - Željko at a conference (https://2018.webcampzg.org/)
  • October 8th - Holiday (Indigenous Peoples' Day; Independence Day - Željko)
  • October 8th - New hire start date
  • November 1 (Thursday) - Holiday (All Saints' Day - Željko)
  • November 9th - Holiday (Veterans Day)
  • November 22+23 - Holidays (Thanksgiving)
  • November 25 - December 2nd - Mukunda vacation (in California ahead of the offsite)
  • Week of December 3rd - Team offsite
  • December 24-28 - Holidays (Christmas)

Rotating positions


Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
  • July 02 - wmf.11 - Zeljko - no train, Fourth of July
  • July 09 - wmf.12 - Zeljko
  • July 16 - wmf.13 - Zeljko
  • July 23 - wmf.14 - Zeljko
  • July 30 - wmf.15 - Mukunda
  • Aug 06 - wmf.16 - Mukunda
  • Aug 13 - wmf.17 - Mukunda (No train - Wednesday is a holiday)
  • Aug 20 - wmf.18 - Tyler
  • Aug 27 - wmf.19 - Dan && Antoine lurking over the shoulders
  • Sep 03 - wmf.20 - Antoine
  • Sep 10 - wmf.21 - Antoine (No train due to DC switchover)
  • Sep 17 - wmf.22 - Antoine
  • Sep 24 - wmf.23 - Zeljko <----
  • Oct 01 - wmf.24 - Dan
  • Oct 08 - wmf.25 - Dan (No train due to DC switchover)
  • Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
  • Oct 22 - wmf.1 - Mukunda


  • July 04 - Dan
  • July 11 - Antoine
  • July 18 - Antoine
  • July 25 - Tyler
  • Aug 01 - Tyler
  • Aug 08 - Zeljko
  • Aug 15 - Dan (No SoS this week)
  • Aug 22 - Zeljko
  • Aug 29 - Zeljko
  • Sep 05 - Tyler / Željko
  • Sep 12 - Tyler / Željko
  • Sep 19 - Dan / Željko
  • Sep 26 - Zeljko <----
  • Oct 03 - Zeljko
  • Oct 10 - Zeljko
  • Oct 17 - Zeljko
  • Oct 24 - Zeljko
  • Oct 31 - Zeljko

Team Business


First Offsite


  • Week of December 3rd
  • At the Queen Mary hotel in Long Beach
  • Deb T will be facilitating


Development plans

  • Due end of the week!

Needs attention

Operational Excellence posts

  • Greg got it at 5:45 on Friday and hasn't had a chance to review it yet.

Scrum of Scrums

Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

This week

Release Engineering

Last week

Release Engineering

  • Blocked by:
  • Blocking:
  • Updates:
    • Train Health: no train last week due to DC switchover, train continues this week
    • Log Health:
    • Code Health:
      • Code Health Metrics Working Group Kickoff last week
      • Code Health Metrics Working Group meeting this week - further discuss/define the workgroup's scope and next steps

Train status and happenings


1.32.0-wmf.22 went well. Antoine wrote a quick summary at the end of the task, with thanks to the people involved: https://phabricator.wikimedia.org/T191068#4604040

Things potentially worth attention:

  • New but not blocking - T204871: Promoting group1 to 1.32.0-wmf.22 caused a spam of "web request took longer than 60 seconds and timed out"
    • A wikiversions.json update (and probably any scap action) causes a burst of request timeouts, which resolves itself. The timeouts were previously NOT enforced, so we have probably always had the issue and it is only showing up now. To be investigated.
    • For the next train: the timeouts can be ignored for the 3 or 4 minutes that follow. See the task for details.
  • Worked around - T204907: Scap is checking canary servers in dormant instead of active-dc
    • The scap dsh groups were still referencing eqiad servers, making the canary check useless. Antoine changed them to codfw hosts. A better solution would be to switch them automatically based on the active datacenter; conftool or something similar may help.
  • Known - T204961: ORES requests for wikidatawiki models=damaging end up with HTTP request timed out
    • When wikis change versions, ORES seems to have trouble handling the new requests, and there are a few HTTP timeouts when reaching the ORES service. Amir stepped in immediately and asked on Friday whether it was UBN-worthy; Antoine said it could wait for Monday's SWAT.
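
The idea of picking canary hosts from the active datacenter (T204907) could look roughly like the sketch below. This is only an illustration, not how scap or conftool actually work: the host names, the `CANARIES_BY_DC` inventory, and the `canary_hosts` helper are all hypothetical, and the active datacenter is passed in by hand where a real fix would presumably look it up from something like conftool.

```python
from typing import Dict, List

# Hypothetical inventory: host names are made up for illustration and do not
# correspond to real servers or to scap's dsh group files.
CANARIES_BY_DC: Dict[str, List[str]] = {
    "eqiad": ["mw-canary1001.example", "mw-canary1002.example"],
    "codfw": ["mw-canary2001.example", "mw-canary2002.example"],
}


def canary_hosts(active_dc: str,
                 inventory: Dict[str, List[str]] = CANARIES_BY_DC) -> List[str]:
    """Return the canary hosts for the active datacenter.

    Fails loudly instead of silently falling back to another datacenter,
    since checking canaries in a dormant datacenter makes the check useless.
    """
    if active_dc not in inventory:
        raise ValueError(f"unknown datacenter: {active_dc!r}")
    hosts = inventory[active_dc]
    if not hosts:
        raise ValueError(f"no canary hosts defined for {active_dc!r}")
    return list(hosts)  # copy so callers cannot mutate the inventory


if __name__ == "__main__":
    # During the switchover the active datacenter was codfw.
    print(canary_hosts("codfw"))
```

Raising an error for an unknown or empty datacenter is a deliberate choice in this sketch: a missing mapping should stop the deploy rather than quietly check hosts that serve no traffic.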

Past week status updates

All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q4

Quarterly Goals for Q1

Pipeline: Move verify stage from Minikube to CI k8s namespace in production context

tracking task
  • some movement for next quarter stuff -- zotero-v2/node10js images

Code Health

  • T199253 - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
    • On track to have first pass proposal defined
  • Perform existing Stewardship review process for Q1 cycle.
  • T199254 - Add test evaluation to post mortem review process.
  • Review existing e2e test coverage.
  • Define prioritization scheme.
  • Prioritize e2e testing gaps.
  • T199257 - make current unit testing coverage more visible by reporting out to Engineering Management.
    • Will have a first-pass Code Health Newsletter (which will include coverage info) by the end of the week.
  • T199259 - Platform and Search Platform teams are using TDM PoC
  • T199262 - Identify key Tech Debt areas
  • T199263 - Put in place Tech Debt management process for PEP
  • T199261 - Define base Code Health metric set.
    • Working group met last week as well, have base tasks defined, and have started defining some metric candidates.

Developer Productivity

  • Make a hire to create the capacity needed for this program.
  • Write and share a survey to measure developer satisfaction and areas for investment. - task T197635
  • hiring
  • survey?

Other work


  • T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
    • Review existing e2e test coverage - done
    • Define prioritization scheme - doing
    • Prioritize e2e testing gaps - next




  • Timo is writing a wikitech-l newsletter and including a section about our recent CI work (disk space issues, consolidation of instances, etc.). He wants to link out to a more substantial post from us. This would need to be done by Tuesday. :)
    • (Covered. See Production Excellence section under Team Business)


  • Had QA SIG meeting last week. Spoke with Elena to see whether additional discussions about QA career paths had taken place in Audiences. None so far.


  • Scap REAL canary patch: https://phabricator.wikimedia.org/D1114
    • thcipriani: accepted! land at will.
  • The "rebuildLocalisationCache.php takes 40 minutes" task is complete
    • A run with no changes took 1m 7s; real runs will be slower than that, but should still be much faster than before



Did train, a bit of Quibble and CI config. Train went well!

  • What I plan to do this week
  • What I'm blocked on
  • Other?



  • What I plan to do this week
    • interviewing
    • doing a SWAT today :)
    • "finalize" y'all's development plans
    • ping Deb on when to start planning out our Offsite - delay this
    • review of onboarding docs again (steal some good stuff from Discovery Team's) (thcipriani: https://wikitech.wikimedia.org/wiki/Ops_Onboarding ops has good stuff to steal, too :))
    • production excellence blog review
    • Pipeline presentation outlining?
  • What I'm blocked on
  • Other?


  • What I plan to do this week
    • wrap up Q1 Goals
    • Dev plan
  • What I'm blocked on
  • Other?


  • What I plan to do this week
  • What I'm blocked on
  • Other?


  • What I plan to do this week
    • Development plan convo
    • CoC footer patch
    • keyholder code review
  • What I'm blocked on
  • Other?
    • zotero-v2 followup as needed
    • scap workboard cleanup as there's time


  • What I plan to do this week
    • T191069 1.32.0-wmf.23 deployment blockers
    • T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
      • Review existing e2e test coverage - done
      • Define prioritization scheme - doing
      • Prioritize e2e testing gaps - next
  • What I'm blocked on
  • Other?


Team Kanban Board Review and Triage

Once / month-ish review of backlog(s)

Kanban stats

Burnup chart