Wikimedia Release Engineering Team/Checkin archive/20160718

From mediawiki.org

2016-07-18

Special Guest - Rachel Farrand!

Team offsite planning!

Spreadsheets!

Notes:

  • Rachel will begin working on hotel/venue options in Chicago and DC \o/

Special Guest - Andrew with CI questions

  • Need a good metric to watch for the impact of Labs changes on CI
  • Respawn may be causing DNS issues; can we increase the wait time there?
  • What metrics do we have:
    • https://grafana.wikimedia.org/dashboard/db/releng-kpis
    • https://grafana.wikimedia.org/dashboard/db/releng-zuul

Vacations/Important dates

How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off

  • July 25 - August 15: Željko vacation. Will have laptop with me. Reachable via phone.
  • July 30 - August 21: Antoine vacation. At home 1st week.
  • August 1st - 5th: Mukunda - vacation: Concert & relaxation

...

  • January 9-11: Dev Summit
  • January 12-13: All Hands

Team Business

Rotating positions and absences

Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/u/blockers

weeks of July 11 and 18


weeks of July 25 and Aug 1


Time spent spreadsheet

Actions from last meeting

  • Upgrade MariaDB in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 10 https://phabricator.wikimedia.org/T138778
    • TODO: Greg. What is the priority? Check with Jaime. We have other priorities.
    • Done: Commented/asked on task.
  • SWAT deploy next steps:
    • Done: TODO: Zeljko do an 8am Pacific SWAT deploy with Tyler
    • Done: TODO: After that, update docs
    • NEXT: stalled until we find people to cover the SWAT window while Antoine and Zeljko are on vacation


Scrum of Scrums

https://phabricator.wikimedia.org/project/board/64/
Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R

This week

  • Blocking
  • Blocked
  • Updates
    • Zuul upgraded this week, should address a bunch of issues

Last week


Other Team Business

  • European SWAT deploys next steps (task T137970)
    • stalled until after Antoine and Zeljko's vacations, unless 2 other trained SWATers step forward
  • Andrew interrupts with Nodepool questions
    • New labvirt nodes coming online today, please be alert to weird behavior
    • Labs Ops would like to see metrics about testing performance:
      • Benefit from increasing # of concurrent nodes
      • Cost/benefit from changing rate of node recreation

Q1 goal/project check-in

Phase out Ubuntu Precise

keyresult tasks:

  • Replace primary production Continuous Integration host (gallium) - task T95757
    • Meeting with Chase on Thursday was skipped
    • Faidon will respond this week with his thoughts, we're waiting on him
  • Upgrade Phabricator database servers to Maria10/Jessie - task T138460
    • waiting on Jaime to failover m3-master
  • Upgrade Beta Cluster database servers to Maria10/Jessie - task T138778
    • waiting on Jaime to prioritize


Reduce Technical Debt

Perform a technical debt analysis of software and services maintained by WMF Release Engineering - task T138225


Streamline deployments (long-lived branches)

keyresult task:

  • Convert our production deployment strategy to use long-lived branches - task T89945

project view: https://phabricator.wikimedia.org/project/view/2117/

  • reorganized/repurposed other meetings to work on this
  • time this past week was mostly spent on Phabricator fixing (task graphs, oh boy do we like tracking tasks)


Non-Quarterly goal work

CI Scaling/Nodepool

Browser tests

Differential migration

Differential weekly (https://etherpad.wikimedia.org/p/diffuerential-weekly) TODOs:

    • semi-related TODO: file task re upgrading MW-Vagrant guests to Jessie

Beta Cluster

  • "deployment-fluorine becomes unresponsive frequently" - https://phabricator.wikimedia.org/T140313
    • From Matt (who's trying to diagnose login issues): "Happened again. I worked around it by rebooting in wikitech, but shouldn't keep happening."


Other

People status updates

Antoine

Last week

  • Gerrit upgrade / Zuul upgrade
  • Target host to replace gallium
  • Sync up with Tyler for CI / gallium phase out
  • Moaar maintenance
  • Offsite site/date

This week

Chad

Last week

  • Gerrit. Gerrit. Gerrit.

This week

  • Moar gerrit. Train. Choo choo.

Dan

Last week

  • Getting back

This week

Mukunda

Last week

  • Phabricator upgrade on Wednesday
  • Figure out where to start on the long lived branches project

This week

  • Get the merge-wmf-branch script cleaned up and shared with the team for feedback
  • Brainstorm improvements / other ideas around branch merging / cherry-picking

Tyler

This week

  • MW Canary work

Last week

  • SWAT training/documentation
  • Task wrangling


This week

Željko

Last week

This week