Wikimedia Services/Meetings/2015-03-19-Ops


Deployment Process

Current steps for deploying a service on the SCA cluster:

No. Item (Owner)
1 Get service owner to provide answers to some questions (Owner: Services/Ops)
  • T97031
  • This step is a strong dependency of all the others, as the answers can provide a lot of information
2 Define internal/external IPs, set up DNS (Owner: Ops)
  • This step is as lean as it can be right now; there are no proposals for improving it
3 Create the service class and role (Owner: Ops)
  • Define and implement a process that is as automated as possible (T97036)
4 Create the init script, logrotate, etc. templates (Owner: Services -> Ops)
  • Mostly done in service::node already
5 Add the repo to role::deployment for Trebuchet (Owner: Ops)
  • Use the same process as for step 3, merging this step into step 3 (T97036)
6 Add the service role to role::sca (Owner: Ops)
  • Use the same process as for step 3, merging this step into step 3 (T97036)
7 Set up LVS (Owner: Ops). This currently means:
  i) Puppet patches. This should possibly use the same process as for step 3, merging this part into step 3 (T97036)
  ii) Changes to the pybal configuration. Could be automated via a configuration discovery system
  iii) Restarting pybal. Could be automated via a configuration discovery system
8 Set up monitoring and ferm (Owner: Ops)
  • Amend service::node to contain ferm as well, merging that part of the step into step 4
  • Improve and standardize monitoring (T94821)
9 Set up user access rights (Owner: Ops)
  • Provide sane defaults, possibly via the same process as for step 3, merging this step into step 3 (T97036)
10 Add rules to varnish (Owner: Ops)
  • Assuming a public IP is needed, use the same process as for step 3, merging this step into step 3 (T97036)
11 Deployment (Owner: Services -> Ops)
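
Step 7 suggests that the pybal configuration changes and restarts could be driven by a configuration discovery system. The following is only a rough sketch of that idea, not an existing tool: the registry contents, host names, and the functions `render_pool` and `needs_reload` are all hypothetical, and the rendered output is illustrative rather than pybal's actual configuration syntax. It shows one way to regenerate a pool definition from discovered data and to trigger a restart only when the result actually changed.

```python
import hashlib
import json

# Hypothetical registry contents; a real discovery system would supply these.
REGISTRY = {
    "exampleservice": [
        {"host": "sca1002.example", "weight": 10, "enabled": False},
        {"host": "sca1001.example", "weight": 10, "enabled": True},
    ],
}


def render_pool(backends):
    """Render one line per backend, sorted by host so the output is stable.

    The line format is illustrative only, not pybal's real syntax.
    """
    ordered = sorted(backends, key=lambda b: b["host"])
    lines = [json.dumps(b, sort_keys=True) for b in ordered]
    return "\n".join(lines) + "\n"


def needs_reload(new_text, old_text):
    """Return True only when the rendered configuration differs from the old one."""
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    return digest(new_text) != digest(old_text)
```

Comparing the rendered text against the previously deployed text means a restart would only be requested when a backend was added, removed, or changed, which is the manual decision Ops currently makes in sub-steps ii) and iii).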

The table shows the deployment steps and their respective owners (executors), revealing a fairly high overhead for Ops. However, by restricting this discussion to the SCA cluster, the workflow can be greatly simplified and streamlined, especially taking into account the Services team's effort to template and standardise service configuration and execution. Hence, except for steps (1) and (2), most of the workflow could easily be owned by the Services team.

Since then, the table has been updated with proposals for improving the process, and it will continue to be updated as the work progresses.
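
To make the effect of these proposals easier to see, the sketch below encodes the merges suggested in the table and computes which steps would remain. It is an illustration only: the step titles are paraphrased, the names `FULLY_MERGED`, `PARTLY_MERGED`, `remaining_steps`, and `absorbed_by` are hypothetical, and step 10 is treated as merged under the table's own assumption that a public IP is needed.

```python
# Step titles paraphrased from the table above.
STEPS = {
    1: "Get answers from the service owner",
    2: "Define IPs and set up DNS",
    3: "Create the service class and role",
    4: "Create init script and logrotate templates",
    5: "Add the repo to the deployment role",
    6: "Add the service role to role::sca",
    7: "Set up LVS",
    8: "Set up monitoring and ferm",
    9: "Set up user access rights",
    10: "Add rules to varnish",
    11: "Deployment",
}

# Proposed merges: step -> step that would absorb it entirely.
FULLY_MERGED = {5: 3, 6: 3, 9: 3, 10: 3}

# Steps where only part would move: the Puppet patches of step 7 into
# step 3, and the ferm part of step 8 into step 4. The rest stays put.
PARTLY_MERGED = {7: 3, 8: 4}


def remaining_steps():
    """Steps that would still exist as separate items after the merges."""
    return [n for n in sorted(STEPS) if n not in FULLY_MERGED]


def absorbed_by(target):
    """Steps that would be wholly or partly folded into the target step."""
    merged = {**FULLY_MERGED, **PARTLY_MERGED}
    return sorted(n for n, t in merged.items() if t == target)
```

Under these assumptions the eleven steps would shrink to seven separate items, with step 3 taking on work from steps 5, 6, 7, 9, and 10.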

Updates