Wikimedia Release Engineering Team/Checkin archive/20181010


2018-10-10

Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
How to do it
  • Early to mid-October - Antoine taking some weeks/days off or working part time (October 1-14 according to https://phabricator.wikimedia.org/E40)
  • October 8th - Holiday (Indigenous Peoples' Day, Independence Day - Željko)
  • October 8th - New hire start date
  • October 21-28 - Greg in Portland for TechConf+TechMgrs F2F
  • November 1 (Thursday) - Holiday (All Saints' Day - Željko)
  • November 12th - Holiday (Veterans Day, Observed)
  • November 22+23 - Holidays (Thanksgiving)
  • November 25 - December 2nd - Mukunda vacation (in California ahead of the offsite)
  • Week of December 3rd - Team offsite
  • December 24-28 - Holidays (Christmas)

Rotating positions

Train

Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
  • Oct 08 - wmf.25 - Dan (No train due to DC switchover) <----
  • Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
  • Oct 22 - wmf.1 - Mukunda (warning, TechConf happening, ping Greg if you need responses from anyone there...)
  • Oct 29 - wmf.2 - Tyler
  • Nov 05 - wmf.3 - Tyler
  • Nov 12 - wmf.4 - Antoine
  • Nov 19 - wmf.5 - No Train (Thanksgiving)
  • Nov 26 - wmf.6 - Antoine
  • Dec 03 - wmf.7 - No Train (Offsite)
  • Dec 10 - wmf.8 - Zeljko
  • Dec 17 - wmf.9 - Zeljko
  • Dec 24 - wmf.10 - No Train (Holiday break)
  • Dec 31 - wmf.11 - No Train (Holiday break)
  • Jan 07 - wmf.12 - Dan
  • Jan 14 - wmf.13 - Dan
  • Jan 21 - wmf.14 - Mukunda
  • Jan 28 - wmf.15 - No Train (All Hands)
  • Feb 04 - wmf.16 - Mukunda
  • Feb 11 - wmf.17 - Tyler
  • Feb 18 - wmf.18 - Tyler
  • Feb 25 - wmf.19 - Antoine


SoS

  • Sep 26 - Zeljko
  • Oct 03 - Zeljko
  • Oct 10 - Zeljko <----
  • Oct 17 - Zeljko
  • Oct 24 - Zeljko
  • Oct 31 - Zeljko
  • Nov 07 - Zeljko
  • Nov 14 - Zeljko
  • Nov 21 - Zeljko
  • Nov 28 - Zeljko
  • Dec 05 - Zeljko
  • Dec 12 - Zeljko
  • Dec 19 - Zeljko
  • Dec 26 - Zeljko
  • Jan 02 - Zeljko
  • Jan 09 - Zeljko
  • Jan 16 - Zeljko
  • Jan 23 - Zeljko
  • Jan 30 - Zeljko
  • Feb 06 - Zeljko
  • Feb 13 - Zeljko
  • Feb 20 - Zeljko
  • Feb 27 - Zeljko

Team Business

Hiring


First Offsite

Details:

  • Week of December 3rd
  • At the Queen Mary hotel in Long Beach
  • Deb T will be facilitating

Topics!

Needs attention



  • Need to get Lars' key added to pwstore
    • party time!


Scrum of Scrums

Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Release Engineering

  • Blocked by:
  • Blocking:
  • Updates:
    • Hired Lars Wirzenius
    • Interviews ongoing for our Developer Productivity position: https://boards.greenhouse.io/wikimedia/jobs/1225258?gh_src=f15731e11
    • Train Health:
      • Last week: a few blockers, resolved in time, no problems - T191070 1.32.0-wmf.24 deployment blockers
      • This week: No train this week due to DC switchover - T191071 1.32.0-wmf.25 deployment blockers
      • Next week: the last 1.32 release, 1.33 starts the next week - T191072 1.32.0-wmf.26 deployment blockers
    • Log Health:
      • T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"
    • Code Health:

Callouts

  • Release Engineering
    • Train Health: no train due to DC switchover - T191071 1.32.0-wmf.25 deployment blockers
    • Log Health: T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"

Train status and happenings

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor


Quarterly Goals for Q2

TEC1 (Maint): Outcome 1 / Output 1.1

GOAL: Determine the procedure and requirements for an automated MediaWiki branch cut.
WHO: Mukunda, Tyler, Antoine
  • Locked down releases-jenkins, but too tightly: it caused a problem with the icinga check. Will probably change the check to request /login instead.
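
A minimal sketch of what a /login-based check could look like, assuming the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The function names, the status-to-severity mapping, and the timeout are illustrative assumptions, not the check that is actually deployed:

```python
import urllib.error
import urllib.request

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_status(status):
    """Map an HTTP status code to an (exit code, message) pair.

    The mapping is an assumption made for illustration.
    """
    if status == 200:
        return OK, "OK: login page reachable"
    if status in (401, 403):
        # Expected on locked-down pages, but not on /login itself.
        return WARNING, f"WARNING: HTTP {status}, access restricted"
    return CRITICAL, f"CRITICAL: unexpected HTTP {status}"


def check_login(base_url, timeout=10):
    """Request <base_url>/login and classify the response."""
    url = base_url.rstrip("/") + "/login"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError; classify their code.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout, and similar.
        return CRITICAL, f"CRITICAL: {err}"
```

A wrapper script would call check_login() with the real base URL (not given in these notes), print the message, and exit with the returned code so that icinga can pick up the state.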


TEC3 (Pipeline): Outcome 1 / Output 1.2

GOAL: Formalize the collection of CI infrastructure and tooling metrics
WHO: Dan, Antoine
  • Installed the Prometheus plugin on CI Jenkins

TEC3 (Pipeline): Outcome 2 / Output 2.3

GOAL: Develop set of metrics to assess incident reports/post mortems
WHO: Greg, Zeljko
  • T206622 Develop set of metrics to assess incident reports/post mortems


TEC3 (Pipeline): Outcome 3 / Output 3.1

GOALS:
Adopt more services into Deployment pipeline
Migrate graphoid to the Deployment pipeline
Deploy zotero v2 to the Deployment pipeline
Deploy blubberoid
WHO: Dan, Tyler, Lars
https://phabricator.wikimedia.org/T205919
  • zoterov2 has a patch
    • ideally release blubber v0.6.0 to make it a smaller patch (node_modules thing)


TEC12 (DevProd): Outcome 2 / Output 2.1

GOAL: The Annual Developer Productivity Survey results are synthesized and shared, creating a first year baseline.
WHO: Mukunda, Greg
  • Mukunda sent to Legal to get a Privacy Policy for it.
  • Should have a response from legal (re: privacy policy) sometime this week, need to start building the actual survey once that's in place.


TEC13 (Code Health): Outcome 1 / Output 1.1

GOAL: Update/refresh review queue (review process for initial code deployment)
WHO: JR

No progress


TEC13 (Code Health): Outcome 2 / Output 2.2

GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test.
WHO: Zeljko
  • T206621 5 of the 15 prioritized repositories have at least 1 end-to-end test


TEC13 (Code Health): Outcome 2 / Output 2.3

GOAL: Assess Platform unit test practices and define improvement plan
WHO: JR, Core Platform Team

Met with Corey and Cindy to further refine this goal.


TEC13 (Code Health): Outcome 3 / Output 3.2

GOAL: Core Platform and Search Platform teams are using TDM PoC
WHO: JR, Core Platform Team

Met with Corey and Cindy to further refine this goal.


TEC13 (Code Health): Outcome 3 / Output 3.4

GOALS:
Identify key Tech Debt areas
Put in place Tech Debt management process for PEP
WHO: JR, Core Platform Team

Met with Corey and Cindy to further refine this goal. They have already identified some of the key areas of Tech Debt that they are addressing in the PEP.


TEC13 (Code Health): Outcome 4 / Output 4.1

GOAL: Metrics defined and deployed for all 4 Code Health areas.
WHO: JR, Code Health Metrics Working Group

The working group has met a few times and added a new member. The team shake-out phase is done and the group is now moving toward making progress.


Other work

Selenium

  • T198389 Q1 Selenium framework improvements - moved remaining tasks to Q2 :(
  • T206624 Q2 Selenium framework improvements
  • T199133 Find top 15 target projects that could use Selenium tests to prevent incidents

Gerrit

Phabricator

  • Need to get phab1002 ready with Daniel

Jenkins

QA

SCAP

The scap pre-deploy fatal error check isn't catching fatals. Mukunda and Tyler have started investigating - https://phabricator.wikimedia.org/T121597#4652873

Standup!

Antoine

  • What I plan to do this week
    • Standing hardwood in dining room
    • paint entrance and dining room
    • first layer of painting in
  • What I'm blocked on
  • Other?
    • In my spare time, started a doc about improving how we run tests, especially filtering out tests coming from dependencies.


Dan


Greg

  • What I plan to do this week
    • Quarterly Check-In slides
    • Board presentation slides
    • hiring
    • TechConf session planning with Birgit
    • localizationupdate.. update (tl;dr: won't be resuming l10nupdate nightly job until after we've collectively identified a suitable architecture/plan for it going forward)
  • What I'm blocked on
  • Other?


Jean-Rene

  • What I plan to do this week
    • Outcome 1 / Output 1.1 - Update/refresh review queue
    • work on presentation for quarterly check-in
    • QA Team Slideset
    • Code Health Metrics WG tasks breakout
  • What I'm blocked on
  • Other?


Lars

  • What I plan to do this week
    • Read all the wiki pages
  • What I'm blocked on
    • Brain overheating
    • Too many accounts to manage
  • Other?
    • N/A


Mukunda

  • What I plan to do this week
    • Figure out why scap fatal check isn't working
      • Pairing with Tyler to hopefully solve this and get another scap release ready.
    • Work with Daniel to get phab2001 / phab1002 in shape
    • Take some time to answer Lars' questions, if he has any.
    • Start building the dev productivity survey in Google Forms
  • What I'm blocked on
    • Response from legal re: privacy policy
  • Other?


Tyler

  • What I plan to do this week
    • Jenkins updates
    • Finish up l10n-bot-watcher setup
    • Add liw to pwstore/resign
    • new blubber release (maybe -- Dan?)
    • new scap release (maybe -- pairing with Mukunda)
    • Gerrit avatar followups
    • Releases-jenkins icinga stuff
    • Moar keyholder review
    • Docs for ORES github sync problem (with heavy disclaimer)
  • What I'm blocked on
  • Other?


Zeljko

  • What I plan to do this week
    • T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
    • T206466 Onboarding liw
    • T204068 QA: Automation Testing - port Echo Notification tests to Node.js
  • What I'm blocked on
  • Other?
    • T206620 Check 'Check endpoints for mwdebug2002.codfw.wmnet' failed: /wiki/{title} (Main Page) is WARNING: Test Main Page responds with unexpected body
    • T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"
    • Celebrating 41 tomorrow! 🎉🕺👴


Grooming

Team Kanban Board Review and Triage


Once / month-ish review of backlog(s)


Kanban stats

Burnup chart