Wikimedia Services/Meetings/2015-03-19-Ops

Deployment Process

Current steps for deploying a service on the SCA cluster:

No	Item	Owner
1	Get the service owner to provide answers to some questions (T97031). This step is a strong dependency of all the others, as it can provide a lot of information.	Services/Ops
2	Define internal/external IPs and set up DNS. This step is as lean as it can be right now; there are no proposals for improving it.	Ops
3	Create the service class and role. Define and implement a process that is automated as much as possible (T97036).	Ops
4	Create the init script, logrotate, etc. templates. Mostly done in service::node already.	Services -> Ops
5	Add the repo to role::deployment for Trebuchet. Use the same process as for step 3, merging this step into step 3 (T97036).	Ops
6	Add the service role to role::sca. Use the same process as for step 3, merging this step into step 3 (T97036).	Ops
7	Set up LVS. This currently means: i) Puppet patches; this should possibly use the same process as for step 3, merging into step 3 (T97036). ii) Changes to the pybal configuration; could be automated via a configuration discovery system. iii) Restarting pybal; could be automated via a configuration discovery system.	Ops
8	Set up monitoring and ferm. Amend service::node to contain ferm as well, merging this step into step 4. Improve and standardize monitoring (T94821).	Ops
9	Set up user access rights. Provide sane defaults, possibly via the same process as for step 3, merging this step into step 3 (T97036).	Ops
10	Add rules to Varnish. Assuming a public IP is needed, use the same process as for step 3, merging this step into step 3 (T97036).	Ops
11	Deployment	Services -> Ops

The table shows the deployment steps and their respective owners (executors), revealing a high overhead for Ops. However, if the discussion is restricted to the SCA cluster, the workflow can be greatly simplified and streamlined, especially given the Services team's effort to template and standardise service configuration and execution. Hence, except for steps (1) and (2), most of the workflow could easily be owned by the Services team.

Since the meeting, the table has been updated with proposals for improving each step, and it will continue to be updated as the work progresses.
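
To make the effect of the merge proposals concrete, the table can be modelled as data. The following is only a rough sketch: the `Step` type, its field names, and the `partial` flag are assumptions introduced here for illustration, and the item descriptions are shortened paraphrases of the table rows. Steps 7 and 8 are marked as partially merged because only part of each (the Puppet patches, and ferm, respectively) is proposed to be folded into another step.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One row of the deployment table (hypothetical model, not an official schema)."""
    number: int
    item: str
    owner: str
    merge_into: Optional[int] = None  # step this one is proposed to be folded into
    partial: bool = False             # True if only part of the step is folded


STEPS: List[Step] = [
    Step(1, "Service owner answers questions (T97031)", "Services/Ops"),
    Step(2, "Define IPs, set up DNS", "Ops"),
    Step(3, "Create the service class and role (T97036)", "Ops"),
    Step(4, "Init script, logrotate, etc. templates", "Services -> Ops"),
    Step(5, "Add the repo for Trebuchet", "Ops", merge_into=3),
    Step(6, "Add the service role to role::sca", "Ops", merge_into=3),
    # Only the Puppet patches fold into step 3; the pybal work remains.
    Step(7, "Set up LVS", "Ops", merge_into=3, partial=True),
    # Only ferm folds into step 4; monitoring is handled separately (T94821).
    Step(8, "Set up monitoring and ferm", "Ops", merge_into=4, partial=True),
    Step(9, "Set up user access rights", "Ops", merge_into=3),
    Step(10, "Add rules to Varnish", "Ops", merge_into=3),
    Step(11, "Deployment", "Services -> Ops"),
]


def owner_counts(steps: List[Step]) -> Counter:
    """Count how many steps each owner label is responsible for."""
    return Counter(step.owner for step in steps)


def remaining_after_merge(steps: List[Step]) -> List[Step]:
    """Steps that still exist separately once the fully merged ones are folded away."""
    return [s for s in steps if s.merge_into is None or s.partial]


if __name__ == "__main__":
    for owner, count in owner_counts(STEPS).items():
        print(f"{owner}: {count} step(s)")
    print("Remaining steps:", [s.number for s in remaining_after_merge(STEPS)])
```

Under these assumptions, eight of the eleven steps are owned by Ops alone, and fully folding steps 5, 6, 9 and 10 into step 3 would leave seven distinct steps.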

Updates

service::node - phabricator:T95533
blueprint patch for all things needed in ops/puppet - https://gerrit.wikimedia.org/r/#/c/205350/
https://phabricator.wikimedia.org/T97036 has been resolved; https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/utils/new_wmf_service.py is an automation script that greatly reduces the amount of work needed to introduce a new service into the infrastructure.
https://phabricator.wikimedia.org/T97031 describes a new step, added after the initial meeting. It has been resolved; the result is at https://phabricator.wikimedia.org/project/profile/1305/