Wikimedia Release Engineering Team/Checkin archive/20160718

= 2016-07-18 =

Special Guest - Rachel Farrand!
Team offsite planning!

Spreadsheets!
 * Timing: https://docs.google.com/spreadsheets/d/1slYNnWJOAoNGK0Hn7wtvvShD2_ORO07I0fWNokuMry8/edit#gid=0
 * Location: https://docs.google.com/spreadsheets/d/1_8KXdObI8tw033n4L245KoE1izgsdxp3h0BnZwGqk4s/edit#gid=0

Notes:
 * Rachel will begin working on hotel/venue options in Chicago and DC \o/

Special Guest - Andrew with CI questions
https://grafana.wikimedia.org/dashboard/db/releng-kpis https://grafana.wikimedia.org/dashboard/db/releng-zuul
 * Need a good metric to watch for the impact of Labs changes on CI
 * Respawn may be causing DNS issues, can we increase the wait time there?
 * What metrics do we have:

Vacations/Important dates
How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off ...
 * July 25 - August 15: Željko vacation. Will have laptop with me. Reachable via phone.
 * July 30 - August 21: Antoine vacation. At home 1st week.
 * August 1st - 5th: Mukunda - vacation: Concert & relaxation
 * January 9-11: Dev Summit
 * January 12-13: All Hands

Rotating positions and absences
Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/u/blockers

weeks of July 11 and 18

 * Train: Chad
 * wmf.10
 * wmf.11
 * SoS: Tyler
 * https://phabricator.wikimedia.org/E155/15
 * https://phabricator.wikimedia.org/E155/16
 * Out:
 * Tyler - July 14+15 (Thur+Fri)
 * Mukunda - July 15
 * Chad - July 15

weeks of July 25 and Aug 1

 * Train: Tyler
 * wmf.12
 * wmf.13
 * SoS: Mukunda / Tyler
 * https://phabricator.wikimedia.org/E155/17 - Mukunda
 * https://phabricator.wikimedia.org/E155/18 - Tyler
 * Out:
 * Zeljko: July 25 - Aug 15
 * Antoine: July 30 - Aug 21
 * Mukunda: Aug 1-5

Time spent spreadsheet

 * FYQ1 (July-Sept 2016): https://docs.google.com/spreadsheets/d/1IrwGPdTDZ6H8x9Mf5dmCYlkK4hZ8sbUSLODEM4cFc4g/edit#gid=0
 * Fixed the "unallocated" field. Columns had been added and the formula hadn't been updated (should be 1-N9 now, not L9)

Actions from last meeting

 * Upgrade mariadb in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 5.10 https://phabricator.wikimedia.org/T138778
 * TODO: Greg. What is the priority? Check with Jaime.  We have other priorities.
 * ✅ Commented/asked on task.


 * SWAT deploy next steps:
 * ✅ TODO: Zeljko do an 8am Pacific SWAT deploy with Tyler
 * ✅ TODO: After that, update docs
 * NEXT: stalled pending finding people to do the SWAT window while Antoine and Zeljko are on vacation

Scrum of Scrums

 * https://phabricator.wikimedia.org/project/board/64/
 * Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R

This week

 * Blocking
 * Blocked
 * Updates
 * Zuul upgraded this week, should address a bunch of issues

Last week

 * Blocking
 * None
 * Blocked
 * None
 * Updates
 * New gerrit update needs testing: https://gerrit-new.wikimedia.org/r/
 * wmf.9 was reverted, wmf.10 will get pushed to group0 and group1 today on a short schedule
 * Retrospective to come https://wikitech.wikimedia.org/wiki/Incident_documentation/20160712-EchoCentralAuth

Other Team Business

 * European SWAT deploys next steps
 * stalled until after Antoine and Zeljko's vacations, unless 2 other trained SWATers step forward


 * TechDebt Analysis
 * https://docs.google.com/spreadsheets/d/1Kxj9p4fKVNo2h23yAQVoOGg77dZ4FLxeXuYrH-1CrPA/edit#gid=0
 * Greg hasn't had time to review the sheets
 * Antoine and Zeljko paired on filling parts out, others want to do that as well? It helps :)


 * Andrew interrupts with nodepool questions
 * New labvirt nodes coming online today, please be alert to weird behavior
 * Labs Ops would like to see metrics about testing performance:
 * Benefit from increasing # of concurrent nodes
 * Cost/benefit from changing rate of node recreation

Phase out Ubuntu Precise
keyresult tasks:
 * Replace primary production Continuous Integration host -
 * Meeting with Chase on Thursday was skipped
 * Faidon will respond this week with his thoughts, we're waiting on him
 * Upgrade Phabricator database servers to Maria10/Jessie -
 * waiting on Jaime to failover m3-master
 * Upgrade Beta Cluster database servers to Maria10/Jessie -
 * waiting on Jaime to prioritize

Reduce Technical Debt
Perform a technical debt analysis of software and services maintained by WMF Release Engineering -

Streamline deployments (long-lived branches)
keyresult task: project view: https://phabricator.wikimedia.org/project/view/2117/
 * Convert our production deployment strategy to use long-lived branches -


 * reorganized/repurposed other meetings to work on this
 * time this past week was mostly spent on Phabricator fixing (task graphs, oh boy do we like tracking tasks)

CI Scaling/Nodepool

 * CI Outage last week: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160706-CI-Outage
 * Follow-ups:
 * https://phabricator.wikimedia.org/T139771 - "Identify metric (or metrics) that gives a useful indication of user-perceived (Wikimedia developer) service of CI"

Browser tests

 * working on survey report
 * task https://phabricator.wikimedia.org/T139247
 * work in progress report https://www.mediawiki.org/wiki/User:ZFilipin_(WMF)/Browser_testing_user_satisfaction_survey
 * core browser tests found a bug (they worked!) last week
 * https://phabricator.wikimedia.org/T140220
 * language screenshots are now running in CI+Sauce, will upload to Commons using existing ruby gem
 * https://phabricator.wikimedia.org/T139613

Differential migration
Differential weekly (https://etherpad.wikimedia.org/p/diffuerential-weekly ) TODOs:


 * Mukunda had questions for Antoine re Puppet (keys into the private store, production or other? for CI image builder)
 * see: https://cloudbees.zendesk.com/hc/en-us/articles/203802500-Injecting-Secrets-into-Jenkins-Build-Jobs


 * Update documentation on creating/renaming of repos in Diffusion
 * https://phabricator.wikimedia.org/T139688


 * Update task with discussion about ACLs?
 * https://phabricator.wikimedia.org/T130786


 * Announce plan to migrate MW-Vagrant to Differential
 * https://phabricator.wikimedia.org/T131419#2439362
 * outstanding patches should be either merged, abandoned or migrated to differential revisions.


 * semi-related TODO: file task re upgrading MW-Vagrant guests to Jessie

Beta Cluster

 * "deployment-fluorine becomes unresponsive frequently" - https://phabricator.wikimedia.org/T140313
 * From Matt (who's trying to diagnose login issues): "Happened again. I worked around it by rebooting in wikitech, but shouldn't keep happening."

Other

 * Figure out how to help Jaime with the DB schema inconsistencies issue:
 * https://phabricator.wikimedia.org/T132416 and https://phabricator.wikimedia.org/T104459 (see also: https://www.mediawiki.org/wiki/Development_policy#Database_patches )
 * Mostly: what can we do in CI to help prevent these?
 * Chad will lick this cookie :)


 * "Consider alternative processes for Unbreak Now bugs, especially those which cross-cut components" - https://phabricator.wikimedia.org/T140207#2456573
 * If you have opinions on this, please reply. I plan to stay engaged

Last week

 * Gerrit upgrade / Zuul upgrade
 * Target host to replace gallium
 * Sync up with Tyler for CI / gallium phase out
 * Moaar maintenance
 * Offsite site/date

Last week

 * Gerrit. Gerrit. Gerrit.

This week

 * Moar gerrit. Train. Choo choo.

Last week

 * Getting back

Last week

 * Phabricator upgrade on Wednesday
 * The upgrade introduced a new task dependency graph, which is awesome, but it also introduced a major performance issue on tracking tasks
 * I've been working on a blog post about recent Phabricator stuff, including the above-mentioned task graph stuff: https://etherpad.wikimedia.org/p/phabricatorphacilityworkblogpost
 * Figure out where to start on the long lived branches project

This week

 * Get the merge-wmf-branch script cleaned up and shared with the team for feedback
 * Brainstorm improvements / other ideas around branch merging / cherry-picking

This week

 * MW Canary work

Last week

 * SWAT training/documentation
 * Task wrangling

Last week

 * finishing migration of browsertests* Jenkins jobs to selenium* jobs https://phabricator.wikimedia.org/T128190
 * Analyze (and share analysis of) the browser testing feedback survey https://phabricator.wikimedia.org/T139247
 * Run language screenshots script for VisualEditor in Jenkins https://phabricator.wikimedia.org/T139613
 * Figure out what to do with Firefox + Selenium https://phabricator.wikimedia.org/T137561
 * SWAT training

This week

 * trying to do the first SWAT (depending on https://phabricator.wikimedia.org/T140264 MediaWiki deployment shell access request for zfilipin)
 * Analyze (and share analysis of) the browser testing feedback survey https://phabricator.wikimedia.org/T139247
 * Run language screenshots script for VisualEditor in Jenkins https://phabricator.wikimedia.org/T139613