Wikimedia Release Engineering Team/Checkin archive/20150602

2015-06-09

 * Poke Antoine about:
 * https://phabricator.wikimedia.org/T76999 Puppet keeps restarting jobrunner service
 * Antoine: root cause is https://phabricator.wikimedia.org/T77002 Puppet Trebuchet provider compares refname with commit sha1 and does NOT refresh the git repo!
 * Should we dig into the crazy home-made Puppet provider written for Trebuchet?


 * Antoine might not be able to attend the next meeting (June 16th) because of a busy personal schedule.

Team Business

 * Ready-to-use Docker package for MediaWiki https://phabricator.wikimedia.org/T92826
 * Similar is https://phabricator.wikimedia.org/T87774 : Evaluate and decide on a distribution strategy targeted at VMs
 * Antoine: IMHO we should not be involved.
 * Chad: I have to at least have an opinion.
 * Tyler: it was a hackathon project that got a little work, but it is certainly not ready for prime time.
 * RelEng does not drive it.
 * ACTION: make it clear to the bug opener that our team will not work actively on it.


 * Code deploy dashboard https://phabricator.wikimedia.org/T280
 * The bot adds some spammy tags on post-merge/deploy: https://phabricator.wikimedia.org/p/Forrestbot/feed/ (example: https://phabricator.wikimedia.org/T100248)
 * Need to review Greg's ideas and verify that the bot matches the request.
 * ACTION: Mukunda to file a task about a possible one-page app in Phab for this.


 * Try out Hack (<?hh) for mediawiki-config https://phabricator.wikimedia.org/T91590 (assignee: Chad)
 * Antoine: would close it and revisit later.
 * Chad: I've unassigned myself and added some thoughts. Nice project for someone who's bored.
 * STALLED: we need to complete the Zend -> HHVM migration first.


 * Upgrade beta cluster to Jessie https://phabricator.wikimedia.org/T98758
 * Antoine: raised by Brandon Black. Prod migrated to Jessie a while ago. Should be straightforward but time-consuming.
 * Not everything is on Jessie yet. Make sure we use the same builds for the same services. Although maybe beta could be a Jessie playground? See also: staging?
 * ACTION: will do it as 1:1 pairings; figure out who later.
 * Tyler: prelim cookie lick *sticks tongue out*
 * Dan: also down to pair


 * Automatic deployment of backend services on beta cluster https://phabricator.wikimedia.org/T100099
 * On beta, set up some more instances as Jenkins slaves / use master / run Composer to bring in dependencies... Needs pairing.
 * ACTION: Antoine looking for buddies! Set up jenkins slave, write small jobs, figure out sudo/restart command.
 * ACTION: #together


 * We now all have shell access to gerrit/gitblit hosts ( https://phabricator.wikimedia.org/T100565 )
 * Gerrit runs on ytterbium, Gitblit (and SVN) are on antimony.
 * Chad: we need to finish killing gitblit.
 * Antoine: shell access granted and should be working now \O/

Antoine is sorry for the meeting hijack.

Pairing (#together) / Weekly Triages
 * https://phabricator.wikimedia.org/T45086 Capture PHP warnings with stacktraces in MediaWiki and save to logstash
 * https://phabricator.wikimedia.org/T89169 Log php fatals with full backtraces again (fatal.log on fluorine)
 * Chad: I'm going to do a weekly triage of the production error logs on Thursdays. Anyone is welcome to join (9am Pacific / 4pm UTC).
 * Add us all! We can always decline.
 * Weekly triage for Browser Tests on Tuesday before RelEng weekly meeting (8am Pacific)
 * Antoine did a triage of his own on Monday afternoon. We now have columns on the board (see our mailing list and https://phabricator.wikimedia.org/tag/wikimedia-log-errors/ ).
 * Pain point: PHP errors are missing stack traces :((


 * Weekly triages
 * Monday @ 10:50am Pacific (post deployment-cabal): Deployment Systems
 * Monday @ 11:20am Pacific: Beta Cluster
 * Tuesday @ 7am Pacific: CI
 * Tuesday @ 8am Pacific: Browser Tests

Calendar: Phabricator or Gmail?


 * Let's try the Phabricator calendar: https://phabricator.wikimedia.org/calendar/
 * Calendar is public/global.
 * Can filter per user
 * Cannot add a project to an event
 * Can invite all members of a project

AGREED: let's give the Phabricator calendar a try.

Team Quarterly Goals
https://phabricator.wikimedia.org/maniphest/query/O9isnUt5IGLP/#R

Scrum of Scrums

 * https://phabricator.wikimedia.org/project/board/64/
 * Blocked on us: https://phabricator.wikimedia.org/maniphest/?statuses=open%2Cstalled&allProjects=PHID-PROJ-arpazvuktn2l647rb6us#R

Beta Cluster

 * https://phabricator.wikimedia.org/project/board/497/?order=priority

 * Jessie upgrade (see above)
 * Jenkins jobs/slaves for oid services

Andrew: the DNS resolver is being changed. Puppet/Salt will break, but it has a feature switch \O/
 * project name is inserted in the instance FQDN
 * changing all certificates == no more ec2id

Tyler: has the external node classifier been updated? Bitrotted patch: https://gerrit.wikimedia.org/r/#/c/202790/

ACTION: verify DNS resolution for the beta use case: dig @labs-recursor0.wikimedia.org

Deployment Cabal

 * Created tasks to track the discussion and inform how to move forward:
 * https://phabricator.wikimedia.org/T101024
 * https://phabricator.wikimedia.org/T101022
 * https://phabricator.wikimedia.org/T101023

Abstract model being discussed/worked on https://phabricator.wikimedia.org/T97068

Isolated CI instances

 * https://phabricator.wikimedia.org/tag/continuous-integration/board/?order=priority
 * Quarterly Priority: Disposable VMs - https://phabricator.wikimedia.org/T47499

(talked about it at beginning of meeting)
 * CI Isolation
 * Scrap the project and restart from scratch? We could use LXC containers on top of the Labs instance.
 * ACTION: need to summarize the current situation (re image creation).
 * ACTION: meeting with Chase, Andrew, and Antoine to catch up

Hiring

 * Automation Engineer: https://boards.greenhouse.io/wikimedia/jobs/62416
 * (short link grnh.se/gj5op4)