Wikimedia Release Engineering Team/Checkin archive/20150602

Team Business

  • Ready-to-use Docker package for MediaWiki https://phabricator.wikimedia.org/T92826
    • Similar: https://phabricator.wikimedia.org/T87774 (Evaluate and decide on a distribution strategy targeted at VMs)
    • Antoine: imho we should not be involved.
    • Chad: I have to at least have an opinion.
    • Tyler: Hackathon project that was touched a little, but certainly not ready for Prime Time
  • RelEng does not drive it.
  • ACTION: make it clear to bug opener that our team will not work actively on it.
  • Try out Hack (<?hh) for mediawiki-config https://phabricator.wikimedia.org/T91590 (assignee: Chad)
    • Antoine: would close and revisit later.
    • Chad: I've unassigned myself and added some thoughts. Nice project for someone who's bored.
  • STALLED: we need to complete Zend -> HHVM migration
  • Upgrade beta cluster to Jessie https://phabricator.wikimedia.org/T98758
    • Antoine: raised by Brandon Black. Prod migrated to Jessie a while ago. Should be straightforward but time-consuming.
    • Not everything is on Jessie. Make sure we're using the same builds in the same services. Although maybe beta can be a Jessie playground? See also: staging?
  • ACTION: Will do it as 1:1 pairings. Figure out who later.
  • Tyler: prelim cookie lick *sticks tongue out*
  • Dan: also down to pair
  • Automatic deployment of backend services on beta cluster https://phabricator.wikimedia.org/T100099
    • On beta, set up some more instances as jenkins slaves / use master / run composer to bring deps ... Need pairing.
  • ACTION: Antoine looking for buddies! Set up jenkins slave, write small jobs, figure out sudo/restart command.
  • ACTION: #together
  • We now all have shell access to gerrit/gitblit hosts ( https://phabricator.wikimedia.org/T100565 )
    • Gerrit runs on ytterbium, Gitblit (and SVN) are on antimony.
    • Chad: we need to finish killing gitblit.
  • Antoine: shell access granted and should be working now \O/

Antoine is sorry for the meeting hijack.

Pairing (#together) / Weekly Triages

    • Chad: I'm going to do a weekly triage of the production error logs on Thursday. Anyone is welcome to the invite (9am Pacific, 5pm UTC?)
      • Add us all! We can always decline
    • Weekly triage for Browser Tests on Tuesday before RelEng weekly meeting (8am Pacific)
  • Antoine did a triage of his own on Monday afternoon. We got columns on the board (see our mailing list and https://phabricator.wikimedia.org/tag/wikimedia-log-errors/ ).
  • Pain point: PHP errors are missing stacktraces :((
    • https://phabricator.wikimedia.org/T45086  Capture PHP warnings with stacktraces in MediaWiki and save to logstash
    • https://phabricator.wikimedia.org/T89169  Log php fatals with full backtraces again (fatal.log on fluorine)
  • Weekly triages
    • Monday @ 10:50am Pacific (post deployment-cabal): Deployment Systems
    • Monday @ 11:20am Pacific: Beta Cluster
    • Tuesday @ 7am Pacific: CI
    • Tuesday @ 8am Pacific: Browser Tests

Calendar: Phabricator or Gmail?

AGREED: let's give Phabricator calendar a try.

Team Quarterly Goals


Scrum of Scrums

Blocked on us: https://phabricator.wikimedia.org/maniphest/?statuses=open%2Cstalled&allProjects=PHID-PROJ-arpazvuktn2l647rb6us#R

Beta Cluster


Jessie upgrade (see above); Jenkins jobs/slaves for oid services

Andrew: the DNS resolver is being changed. Puppet/Salt will break.

    • project name is inserted in the instance FQDN
    • this means changing all certificates (== no more ec2id).

Has a feature switch \O/

Tyler: has the external node classifier been updated? Bitrotted patch: https://gerrit.wikimedia.org/r/#/c/202790/

ACTION: verify the DNS resolver for the beta use case, e.g. with dig @labs-recursor0.wikimedia.org
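
A minimal sketch of how that check could be scripted, assuming the new FQDN simply places the project name between the host label and the domain (the notes do not give the exact layout). The helper names, the example host, project, and domain are all hypothetical; only the recursor hostname comes from the notes. The script only builds and prints the dig command line, it does not run it.

```python
import shlex

# Recursor named in the meeting notes.
RECURSOR = "labs-recursor0.wikimedia.org"


def fqdn_with_project(hostname, project, domain):
    """Build an FQDN with the project label inserted after the host label.

    The host.project.domain ordering is an assumption for illustration.
    """
    return ".".join([hostname, project, domain])


def dig_command(name, recursor=RECURSOR):
    """Return the argv list for querying `name` against a specific recursor."""
    return ["dig", "@" + recursor, name, "+short"]


if __name__ == "__main__":
    # Hypothetical instance, project, and domain names.
    name = fqdn_with_project("example-host", "example-project", "example.test")
    print(shlex.join(dig_command(name)))
```

Running the printed command for a handful of beta instances, before and after the feature switch is flipped, would show whether both the old and new names resolve.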

Deployment Cabal

Abstract model being discussed/worked on https://phabricator.wikimedia.org/T97068

Isolated CI instances

Quarterly Priority: Disposable VMs - https://phabricator.wikimedia.org/T47499

(talked about it at beginning of meeting)

  • CI Isolation
    • Shoot the project and restart from scratch? We could use LXC containers on top of the labs instances.
    • ACTION: need to summarize the current situation (re image creation).
    • ACTION: meeting with Chase, Andrew, and Antoine to catch up


Other Work