Wikimedia Release Engineering Team/Checkin archive/20160926

= 2016-09-26 =

Vacations/Important dates
How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
 * Oct 01: Start of Q2
 * Oct 05: Morning, few hours, airport run - Tyler
 * October 10: US Holiday (Indigenous People's Day https://theintercept.com/2015/10/12/columbus-day-is-the-most-important-day-of-every-year/ )
 * October 17-21: Offsite in Washington D.C.
 * October 31 & November 4th: Mukunda
 * October 28 - Nov 2 (ish) - Chad (vacation to Cabo)
 * November 24: US Holiday (Thanksgiving)
 * January 9-11: Dev Summit
 * January 12-13: All Hands

Time spent spreadsheet

 * Week 38 - https://docs.google.com/spreadsheets/d/1IrwGPdTDZ6H8x9Mf5dmCYlkK4hZ8sbUSLODEM4cFc4g/edit#gid=830401392

Rotating positions and absences
Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/u/blockers

weeks of Sep 19 and Sep 26

 * Train: Tyler
 * wmf.20
 * no deploys week of Sept 26
 * SoS: Dan
 * https://phabricator.wikimedia.org/E155/25
 * https://phabricator.wikimedia.org/E155/26
 * Out:
 * September 22-23 Željko on a conference

weeks of Oct 03 and Oct 10

 * Train: Tyler
 * wmf.21
 * wmf.22
 * SoS: Chad
 * https://phabricator.wikimedia.org/E155/25
 * https://phabricator.wikimedia.org/E155/26
 * Out:
 * October 10: US Holiday (Indigenous People's Day)

Oct 17 and Oct 24

 * Train
 * none on Oct 17

Actions from last meeting
TODO: Antoine write a migration plan for gallium TODO: Talk about release process/strategy first week of Q2 (Oct 3) with Ops (Brandon)
 * In my head only. Been busy with wmf.19 explosion / random Zend 5.5 segfault etc.
 * Still to do, got siphoned into the jobrunner issue / lack of monitoring / bunch of reviews etc.
 * Do this week

Scrum of Scrums

 * https://phabricator.wikimedia.org/project/board/64/
 * Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R

This week

 * Blocking
 * Blocked
 * Updates
 * New scap (3.3.0)
 * scap caches local config for its deployment (machines don't have to reach back to tin)

Last week

 * Blocking
 * Blocked
 * Updates
 * wmf.19 exploded: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160915-MediaWiki
 * Job shelling out to mwscript maintenance/getConfiguration.php
 * HHVM writes to its bytecode cache (sqlite file) which fails due to ulimit
 * No monitoring of jobs
 * Antoine can't find a dashboard of failing jobs (neither in Grafana nor in Logstash)
 * Reminder: no deploys week of Sept 26th
 * Changes to beta cluster puppetmaster cherry-pick process Coming Soon™ https://phabricator.wikimedia.org/T135427
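
For reference, the failure mode in the wmf.19 item above (a process writing to a file and failing due to ulimit) can be reproduced in miniature. This is only a sketch under an assumption: the notes do not say which limit was hit, so it assumes the file size limit (RLIMIT_FSIZE). The function name and the byte counts are made up for illustration and are not taken from HHVM or MediaWiki.

```python
import errno
import os
import resource
import tempfile


def write_under_fsize_limit(limit_bytes, payload_bytes):
    """Write payload_bytes to a temp file while RLIMIT_FSIZE is limit_bytes.

    Returns "ok" if the whole payload was written, or "EFBIG" if the kernel
    refused to grow the file past the limit. Hypothetical helper, Unix only.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    fd, path = tempfile.mkstemp()
    # Lower only the soft limit so it can be restored afterwards.
    resource.setrlimit(resource.RLIMIT_FSIZE, (limit_bytes, hard))
    try:
        data = b"x" * payload_bytes
        written = 0
        try:
            while written < len(data):
                # A write may be cut short at the limit; the next one fails.
                written += os.write(fd, data[written:])
        except OSError as exc:
            # CPython ignores SIGXFSZ at startup, so hitting the limit shows
            # up as an OSError with errno EFBIG instead of killing the process.
            if exc.errno == errno.EFBIG:
                return "EFBIG"
            raise
        return "ok"
    finally:
        resource.setrlimit(resource.RLIMIT_FSIZE, (soft, hard))
        os.close(fd)
        os.unlink(path)
```

A payload below the limit goes through, while one above it fails partway, which is roughly what a cache file that has outgrown the limit would run into on its next write.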

Other Team Business

 * Contint root proposal
 * Can we just have an ops person?


 * do we know if vendors (Antoine and Željko) are coming to all hands?
 * TLDR: yes


 * Short term contractors budget
 * explicit list, obvs
 * time it takes to onboard
 * teams need to make this an explicit goal themselves

Offsite

 * Agenda being drafted at https://docs.google.com/document/d/1lmxtQkAuDJY4Vv8oFWihSmhz1y-JgUzsb11ebFCOz6g/edit#

Replace primary production Continuous Integration host -

 * Huge delay on figuring out a network lan to host the new machine
 * Puppet refactoring mostly done
 * Need a migration plan then schedule the switch

Upgrade Beta Cluster database servers to Maria10/Jessie -

 * Gotta shutdown then drop the old instances?

Reduce Technical Debt
Perform a technical debt analysis of software and services maintained by WMF Release Engineering -

Streamline deployments (long-lived branches)
keyresult task: project view: https://phabricator.wikimedia.org/project/view/2117/
 * Convert our production deployment strategy to use long-lived branches -

Browser tests
 * mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser T94577
 * Firefox v47 breaks mediawiki_selenium T137561
 * Update mediawiki_selenium to use Marionette T137540

-> need geckodriver to be packaged (it's a Rust app), or drop Firefox?

Beta Cluster

 * MW apps reimaged to Jessie with ops leading
 * web servers (deployment-mediawiki*)
 * deployment servers (deployment-mira is primary, deployment-tin02 backup)
 * jobrunner02 done
 * tmh to be done later

DB Inconsistencies
https://phabricator.wikimedia.org/T132416 and https://phabricator.wikimedia.org/T104459 (see also: https://www.mediawiki.org/wiki/Development_policy#Database_patches )

Last week

 * Help on wmf.19 issue as I can
 * Gallium migration plan
 * Overdue :(
 * Nodepool upgrade hopefully
 * Done (fewer API queries to OpenStack), follow up from August incident
 * Monitored via list of tasks on https://grafana.wikimedia.org/dashboard/db/nodepool (look at 10 days)
 * Migrate some jobs hopefully

This week

 * Gallium migration plan

Last week

 * Learn to play the ukulele
 * Finally looping back on DB inconsistencies since I have free cycles this week (what I have what?!)
 * Wrap up some logging-related cleanups I've been poking
 * Yell at Timo re: static files.

Last week

 * Beta DBs

Last week

 * Fix Phab permissions issue: https://phabricator.wikimedia.org/T146055
 * Hopefully get scap swat stuff code reviewed and deployed
 * code reviewed by tyler, I'm addressing his feedback
 * Looking into a way of organizing the swat patches that doesn't involve manual wikitext entry on https://wikitech.wikimedia.org/wiki/Deployments
 * Made some progress but still figuring this out

This week

 * Finish getting scap swat and cli stuff merged
 * Talk with Greg about the automation of deployment blockers, release milestones/tasks, etc.

Last week

 * wmf.19
 * scap3 updates (blocking things)
 * Code review for llb

This week

 * fixup https://gerrit.wikimedia.org/r/#/c/310719/
 * scap3 catchup

Last week

 * T94577 mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser.
 * T137561 Firefox v47 breaks mediawiki_selenium
 * T137540 Update mediawiki_selenium to use Marionette
 * T145718 CentralNotice: Intermittent unexplained browser test failures
 * Testival.eu conference
 * EU SWAT

This week

 * MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job T144912
 * mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser T94577
 * Improve documentation around running/writing (with lots of examples) browser tests T108108