Wikimedia Release Engineering Team/Checkin archive/20160829

2016-08-29

Vacations/Important dates

How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off

  • Sept 02: Q2 goals draft published, Dan out
  • Sept 05: US Holiday (Labor day)
  • Sept 23: Q2 goals finalized
  • Oct 01: Start of Q2
  • October 10: US Holiday (Indigenous People's Day)
  • October 17-21: Offsite in Washington D.C.
  • October 31: Mukunda
  • October 28 - Nov 2 (ish) - Chad
  • November 24: US Holiday (Thanksgiving)
  • January 9-11: Dev Summit
  • January 12-13: All Hands

Team Business

Time spent spreadsheet

Rotating positions and absences

Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/u/blockers

weeks of Aug 22 and Aug 29

weeks of Sep 05 and Sep 12

Actions from last meeting

Scrum of Scrums

https://phabricator.wikimedia.org/project/board/64/
Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R

This week

Last week


Other Team Business

Offsite

"Upgrade all mw* servers to debian jessie"

Q2 (Oct - Dec) Goals

Previously listed goals
  • Differential
    • fix Jenkins tests, maybe
    • migrate android
    • not a goal
  • Malu
    • pause
  • LLB + MW + Extension deploys to scap3 ?
    • not a goal
New goal proposals
  • Python software deployment via scap3 (Zuul + Nodepool)
    • think more on it (Tyler and Antoine), not a goal for now
  • CI Tech Debt
    • Determine long term plan for Nodepool
      • This needs to be more specific
    • Anything else?
    • think about how we use queues
      • split queue per branch (eg: security releases hitting multiple branches, 600 jobs), can make more run parallel
      • SWAT deploys: make the wmf branch go through as fast as it can
      • "Review and adjust CI queues for more parallel operations"
  • MW deploy tech debt (Experiment/Stretch)
    • scap swat
    • ability to have multiple checks on MW deploys (in addition to logstash), eg swagger spec for MW (node endpoints checking)



Q1 goal/project check-in

Phase out Ubuntu Precise

Replace primary production Continuous Integration host (gallium) - task T95757


Upgrade Phabricator database servers to Maria10/Jessie - task T138460

  • Yes Done

Upgrade Beta Cluster database servers to Maria10/Jessie - task T138778

  • Chad will review Dan's patches
  • Dan will coord with Jaime for $whenever_works_for_them

Move Gerrit off of ytterbium - task T125018

  • Yes Done

Reduce Technical Debt

Perform a technical debt analysis of software and services maintained by WMF Release Engineering - task T138225

Next steps:

  • Greg to get the documentation written and call it done (for this goal, for this quarter)


Streamline deployments (long-lived branches)

keyresult task:

  • Convert our production deployment strategy to use long-lived branches - task T89945

project view: https://phabricator.wikimedia.org/project/view/2117/

  • Tooling will probably be done
  • static asset conclusion might not be
  • scap swat coming along nicely
    • use the Gerrit REST API (has needed features not available over SSH)
      • will need some sort of shared account (with frequent credential rotation, potentially each deploy)
    • can use a .netrc right now
    • scraping Deployment calendar page is crappy
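
A minimal sketch of the REST-plus-`.netrc` approach noted above, not the actual `scap swat` code. The function names, the `base_url` and `host` parameters, and the use of HTTP basic auth are assumptions made for illustration. Two details come from Gerrit itself: authenticated REST paths start with `/a/`, and JSON responses begin with a `)]}'` guard line that has to be stripped before decoding.

```python
import base64
import json
import netrc
import urllib.request

# Gerrit prepends this line to JSON responses to defeat XSSI attacks.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Drop Gerrit's XSSI guard prefix, then decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def load_credentials(host, netrc_path=None):
    """Return (login, password) for host from a .netrc file, or None.

    netrc_path=None makes the standard library read ~/.netrc.
    """
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


def fetch_change(base_url, host, change_id, netrc_path=None):
    """Fetch one change as a dict (hypothetical helper, needs a live server).

    Assumes HTTP basic auth is enabled on the Gerrit instance.
    """
    creds = load_credentials(host, netrc_path)
    if creds is None:
        raise LookupError("no .netrc entry for " + host)
    token = base64.b64encode(":".join(creds).encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/a/changes/" + str(change_id),
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(request) as response:
        return parse_gerrit_json(response.read().decode("utf-8"))
```

Keeping the credential lookup in its own function means a shared account with frequently rotated credentials would only require rewriting the `.netrc` entry, not the calling code.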

Non-Quarterly goal work

SWAT deploy changes

  • European SWAT deploys (task T137970)
  • Future changes?
    • requiring a task associated with each change being pushed out?
    • Add all swatters to each swat window, stop segmenting based on their availability (worst case they get a ping when they're not online)

CI Scaling/Nodepool

Browser tests

Beta Cluster

  • Long lived cherry pick stuff popping up again
  • Antoine got one out thanks to Brandon

Other

DB Inconsistencies

https://phabricator.wikimedia.org/T132416 and https://phabricator.wikimedia.org/T104459 (see also: https://www.mediawiki.org/wiki/Development_policy#Database_patches )


People status updates

Antoine

Last week

  • Catch up on Nodepool incident - DONE
  • Migrate jobs back to Nodepool instance - Week of Aug 29
    • Ideally get quota raised
  • Figure out contint1001 network with ops / Tyler
  • done: clear out 3 weeks worth of mails
  • personal: learn how to play https://www.youtube.com/watch?v=d9i_zXmULyk
  • pet project: rake / rspec on puppet.git and tox for operations/software.git

This week

  • Migrate jobs back to Nodepool instance. Chase to monitor wmflabs as we progressively switch back. Starting on Tuesday Aug 30th
  • Figure out contint1001 network with ops / Tyler
    • Haven't pushed for it. Faidon is on vacation this week. -- solved with Mark --> public IP
  • personal: working on Ukulele major chords. C, D, F, G done. Todo: A B E. Probably gonna buy a guitar.
  • Branch cut / train deploy - done

Chad

Last week

  • MW release today (finally)
  • Finally going to do DB consistency script -- per our 1:1 this shouldn't be so hard
  • Long lived branches (long may they live)

This week

  • Diving into the DB consistency script. Doable, but hard :)
  • More long lived branches

Dan

Last week

This week

Mukunda

Last week

  • Finish the `scap swat` tool which is taking shape nicely.
  • Propose improvements to the scap remote execution API to make it easy to use from scap plugins
    • This could facilitate development of arbitrary scap checks which can be run separately from deployments
    • Will discuss with Tyler during the deployments meeting and go from there.

This week

  • Still finishing up scap swat
    • Hopefully, resolve the screen scraping debate
  • Start on `scap merge` branch management tooling

Tyler

Last week

  • Bugfix scap update
  • nodepool things

This week

Željko

Last week

This week