Wikimedia Release Engineering Team/Checkin archive/20151207


2015-12-07

Vacations/Confs/etc

How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off

  • Dec 4th: Greg - disconnected, leaving Thursday evening, returning Sunday :)
  • Dec 14-Jan 1: Greg - vacation (3 weeks, will be checking email)
  • Dec 22-29: Chad - Christmas (will be reachable by e-mail, will have laptop in case of emergencies)
  • Dec 23-25: Tyler - hopeful, probable Christmas in Kansas!
  • Dec 24-Jan 3: Dan - Holidays
  • Dec 24-30: Antoine - Holidays (bringing laptop - ring phone as needed)
  • Dec 24: mukunda - holiday
  • Dec 25: US HOLIDAY - Christmas Day
  • Dec 28: mukunda - holiday
  • Dec 31: mukunda - holiday
  • Jan 1: US HOLIDAY - New Year's Day
  • Jan 4 - 8: WikiDev16 + All Hands
  • Jan 16-18: Chad - another music festival
  • Jan 18: US HOLIDAY - Martin Luther King Day
  • Feb 15: US HOLIDAY - President's Day
  • May 17-(?): Dan - paternity leave :D
    • PO Box for pastries? - Antoine
  • May 30: US HOLIDAY - Memorial Day
  • June-ish: Chad - EDC
  • August: France holiday - because french. :)


Team Business

Actions from last meeting

  • TODO - Antoine + Mukunda: should sit down and talk CI/Harbormaster/Nodepool
    • Mukunda just needs to find time to finally test out harbormaster triggering jenkins jobs.
      • this works fine, still need to get jenkins to report back to harbormaster
  • TODO - Greg: look into reasons for spikes of info-level log spam
    • RunJobs logging changes? See engineering-l/ops-l?
    • yes....
  • TODO - No One Yet: investigate how carbon aggregates stats older than 1 month
    • ACTION: Antoine to create a task


WikiDev16


Other

Q3 Goals

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Q3
  • Goals timeline:
    • December 3: Group goal scoped and drafted on mediawiki.org for Technology team.
    • December 10: Group goal + all individual team *drafts completed* on mediawiki.org; discuss at Infra+Tech group and identify dependencies.
    • December 17: individual team goals + group goal *finalized* on mediawiki.org; discuss at Monthly Eng Staff.


KPIs

https://grafana.wikimedia.org/dashboard/db/releng-kpis

New vs Maint time spent

#together

  • Rotating Deploys
    • Tyler proposes rotating the responsibility for cutting the branch and running the deployment train.
    • Tyler to cut the branch and deploy on Tuesday 12/08 (group0: 1.27.0-wmf.7 -> 1.27.0-wmf.8)


Scrum of Scrums

https://phabricator.wikimedia.org/project/board/64/
Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R


Project Updates

CI Scaling

https://phabricator.wikimedia.org/project/board/1010/
Quarterly Goal: "CI cluster responds to spike in queued builds by starting and registering additional jenkins slaves" - https://phabricator.wikimedia.org/T111106
  • Jobs cleanup
  • Nova scheduler went wild on Sunday ~ midnight UTC. Nodepool could no longer spawn instances. Fixed by ops (restarted some OpenStack process)
  • Small Nodepool and Zuul upgrades coming in (speedup related)
  • Jenkins security upgrade
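
The quarterly goal above (start extra Jenkins slaves when the build queue spikes) can be pictured with a minimal sketch. This is not Nodepool's actual allocation logic; the function name, its parameters, and the one-node-per-queued-build rule are assumptions made purely for illustration.

```python
def instances_to_spawn(queued_builds: int, ready_nodes: int,
                       total_nodes: int, max_nodes: int) -> int:
    """Return how many extra instances to start for the current queue.

    Hypothetical rule: one node per queued build, capped by the quota.
    total_nodes counts every existing instance (ready or busy).
    """
    # Builds that cannot be served by nodes that are already idle.
    demand = max(0, queued_builds - ready_nodes)
    # Room left before the (assumed) instance quota is reached.
    capacity = max(0, max_nodes - total_nodes)
    return min(demand, capacity)
```

With a queue of 10 builds, 2 idle nodes, 5 existing instances and a quota of 8, the sketch asks for 3 more instances; once the quota is reached it asks for none, regardless of queue length.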

Beta Cluster

https://phabricator.wikimedia.org/project/board/497/?order=priority
  • Team: might have to watch slow query logs / strict errors etc
  • labs lost DNS aliasing on Monday for a few hours
  • EventBus <-- new extension
  • DB outage followup
    • beta cluster db potential move to dedicated hardware?
    • setup a beta cluster specific tendril?
    • ACTION: Antoine to check in with Jaime on what to do next.
    • goal: monitor slow queries before they land in prod

Deployment Cabal

Main: https://phabricator.wikimedia.org/project/board/349/
Scap3: https://phabricator.wikimedia.org/project/board/1449/
Quarterly Goal: "Migrate all Service team owned services and MW deploys to scap3" - https://phabricator.wikimedia.org/T109926
  • (Tyler) Deployed AQS on beta
  • Scap worked flawlessly
  • Mathoid to follow, straightforward
    • the simpler 'oids have a shared puppet module
  • TODO: Jenkins job to wrap around scap3 deploy on beta


Diff[usion|erential] migration

https://etherpad.wikimedia.org/p/diffuerential-weekly
Gitblit-Deprecate: https://phabricator.wikimedia.org/project/board/46/
Quarterly Goal: https://phabricator.wikimedia.org/T111465
Gerrit-Migration: https://phabricator.wikimedia.org/project/board/9/
  • Redirect of Gerrit project name -> Phabricator canonical URL with callsign
  • A couple of patches by Paladox are pending; they highlight some issues with Phabricator
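
As a rough illustration of the redirect item above, here is a minimal sketch that maps a Gerrit project name to a Phabricator Diffusion URL built from a callsign. The mapping entry, the `EXPL` callsign, the function name, and the `/diffusion/<CALLSIGN>/` URL shape are all assumptions for illustration; the notes do not say how the redirect is actually implemented.

```python
from typing import Optional

PHABRICATOR_BASE = "https://phabricator.wikimedia.org"

# Hypothetical mapping; real callsigns are assigned in Phabricator.
CALLSIGNS = {"example/project": "EXPL"}


def diffusion_url(gerrit_project: str) -> Optional[str]:
    """Return the assumed canonical Diffusion URL, or None if unmapped."""
    # Tolerate leading/trailing slashes in the incoming project name.
    callsign = CALLSIGNS.get(gerrit_project.strip("/"))
    if callsign is None:
        return None
    return f"{PHABRICATOR_BASE}/diffusion/{callsign}/"
```

Returning `None` for unmapped projects leaves the caller free to fall back to the existing Gerrit page instead of redirecting to a URL that may not exist.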


Developer Tooling (MW-Vagrant, MW-Selenium, etc.)

Other Work

  • Gerrit 2.12 - https://phabricator.wikimedia.org/T70271
    • Antoine: current web UI is gone. Not sure whether it is worth upgrading.
      • It is. The UI isn't that different and we get stability/security fixes. We're not all going to be using Phabricator in less time than it'll take to upgrade.
        • I am all for keeping up with upstream. I am worried about the community drama related to the new UI :-((( I (Antoine) don't really care about the UI :D
          • #nodrama. Seriously, people will live with it if they don't like it. It's freaking gerrit, their UI has always sucked.
            • Makes sense :-}
              • Security is going to be my go-to answer if anyone complains. 2.8.x is long past EOL and we're probably vulnerable to more than one issue.
                • ^ that, always the safe answer