Wikimedia Release Engineering Team/Goals/201516Q2/checkin-10-27

= 30-day (ish) check-in on Q2 goals =
 * https://phabricator.wikimedia.org/tag/releng-201516-q2/
 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Q2

KPIs

 * WMF Log Errors - https://phabricator.wikimedia.org/T108749
 * DONE! - https://grafana.wikimedia.org/dashboard/db/production-logging
 * Jenkins/Zuul queue wait time - https://phabricator.wikimedia.org/T108750
 * (All KPIs will be collected here: https://grafana.wikimedia.org/dashboard/db/releng-kpis )


 * A main page currently lists all dashboards tagged "releng": https://grafana.wikimedia.org/dashboard/db/releng-main-page


 * Measure MediaWiki core gate-processing time for this quarter; reassess for next quarter if needed


 * Warning: after roughly a month, carbon aggregates metrics by averaging, so older data most probably does not represent what you want (short spikes get smoothed out).
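To illustrate why averaged old data can mislead, here is a minimal sketch of an "average" rollup, the aggregation method carbon/whisper applies by default when points move into a coarser retention archive. The 1-minute to 5-minute rollup and the sample error counts are hypothetical; the actual retention schema used on the Wikimedia Graphite hosts is not stated in these notes.

```python
def rollup_average(points, factor):
    """Downsample a series by averaging each consecutive block of `factor`
    points, mimicking an 'average' rollup into a coarser archive.
    The 1-min -> 5-min ratio used below is a hypothetical example."""
    return [
        sum(points[i:i + factor]) / len(points[i:i + factor])
        for i in range(0, len(points), factor)
    ]


# Hypothetical per-minute error counts containing one short spike.
per_minute = [2, 3, 2, 250, 3, 2, 2, 3, 2, 3]
per_5min = rollup_average(per_minute, 5)

print(max(per_minute))  # the raw peak: 250
print(per_5min)         # the peak is diluted to 52.0 after averaging
```

A 250-error spike shows up as 52 once it has been rolled up, so month-old graphs can understate short incidents. Metrics where peaks matter may need a different aggregation method (whisper also supports `max`, `min`, `sum` and `last`).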

= Reduce number of deploy tools from 3 to 2 =

Migrate all Service-team-owned services and MediaWiki deploys to scap3

 * https://phabricator.wikimedia.org/T109926
 * Technically, MediaWiki deploys already use scap3, but none of its new features ;)
 * I really want to use it for Phabricator deployments but hit a few blockers last time I tried.
 * I (Tyler) have concerns about this goal:
   * We've done one deploy to the beta cluster
     * The Service team as a whole is unaware of it
   * That deploy wasn't flawless
     * Ugly Puppet wrangling
     * I (Tyler) think it went well, but I knew where the sticking points would be
     * Configuration concerns were raised (resulting in a doc sprint this week)
   * "How far from AQS deployments are we?"

How to stay on track
Open question.

Initial thoughts:


 * Focus on the Analytics Query Service (based on RESTBase), definitely the first real-world use case
 * Limited permissions for deployers
 * This likely means they can't use Ansible
 * TODO Puppet wrangling required
 * TODO Who else to talk to about it?
 * TODO How can we test it?
 * https://phabricator.wikimedia.org/T114999

= Retire Gerrit and Gitblit in favor of Phabricator =

Decommission Gitblit

 * https://phabricator.wikimedia.org/T111465
 * Upstream is adding an "optional, unique repository name", which should solve our issues with callsigns
 * Phabricator hosted repositories are now available over https and ssh.
 * URL redirect issues (mapping Gerrit repo names to Phabricator repo names)
 * Awkward repository naming

Code review RFC: creation, publication, discussion, feedback etc

 * https://phabricator.wikimedia.org/T114311
 * Phabricator can now merge Differential patches using Drydock.
 * To be discussed further in the weekly check-in

Weekly check-in
Weekly check-in created for Thursdays, pending Antoine's family schedule. Will follow up on the internal mailing list.

= Reduce CI wait time =

CI check-in minutes: https://www.mediawiki.org/wiki/Continuous_integration_meetings/2015-10-27/Minutes

All: Please read and comment on https://lists.wikimedia.org/pipermail/qa/2015-October/002414.html

We need a better meeting time for US folks.

CI cluster responds to spikes in queued builds by starting and registering additional Jenkins slaves

 * https://phabricator.wikimedia.org/T111106


 * 10 standing instances; automatically scales up to 20 as needed
 * nodepool doesn't send metrics to statsd :( :( :(
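The "10 standing, up to 20 on demand" behaviour can be pictured as a clamped sizing rule. The sketch below is a simplified, hypothetical model: the function name, its parameters, and the "busy + queued" demand formula are illustrative assumptions, not Nodepool's actual scheduling logic, which is driven by its own configuration (e.g. per-label `min-ready` and per-provider `max-servers`).

```python
def target_instances(busy, queued, standing=10, ceiling=20):
    """Hypothetical sizing rule (not Nodepool's real algorithm):
    keep at least `standing` instances, grow to cover busy + queued
    builds, and never exceed `ceiling`."""
    wanted = busy + queued
    return max(standing, min(ceiling, wanted))


print(target_instances(busy=0, queued=0))    # idle: stays at 10
print(target_instances(busy=8, queued=5))    # moderate spike: 13
print(target_instances(busy=15, queued=12))  # big spike: capped at 20
```

Clamping at both ends keeps a warm pool for quick response while bounding cloud quota usage during spikes.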

Migrate majority of CI jobs to Nodepool

 * https://phabricator.wikimedia.org/T114315


 * Less than 10% of jobs migrated right now
 * We need caching implemented before we can migrate much more; otherwise we risk being blocked or rate-limited by upstreams (GitHub etc.)
 * everyone, please review the caching proposal! :)

= Release MediaWiki 1.26 =

A quality MW 1.26 successfully released

 * https://phabricator.wikimedia.org/T110486
 * Board pruned of extraneous material, needs more triage. https://phabricator.wikimedia.org/tag/mw-1.26-release/
 * Chad will send an email to wikitech-l today asking for help with the blockers

Be firm about what we are going to accept in 1.26.