Wikimedia Release Engineering Team/Deployment pipeline/2019-03-14

2019-03-14

Last Time

Current Quarter Goals

  • Roughly 2 weeks left!
    • changeprop
      • Should we bump this?
      • marko: we have to fix the kafka driver depending on the node version and kafka version: how will we have different versions of different things?
      • alex: side-step the problem and build the image with node6

Next Quarter Goals

Services to migrate

  • cpjobqueue
    • marko: can use the node6 image, but scaling is still a problem: sometimes it uses a lot of resources, sometimes it does nothing. I worry about scaling. How do we determine the resources needed so that the service doesn't starve?
    • jeena: are we against autoscaling?
    • alex: autoscaler is not yet deployed. Could we work from current scb capacity?
    • marko: will have to continue conversation about number of workers per pod -- we don't want 100 pods, nor do we want to have 1 pod that is massive, so we'll have to find a balance
    • liw: are there means to perform benchmarks and capacity tests?
    • marko: we know current resource usage
    • alex: we have ways to perform benchmarks (jeena used that for blubberoid), but in this case we have prod services already
    • marko: the most important thing is to get everything correct for when surges happen
    • alex: I think we can accommodate that: we're adding more capacity next quarter, and we can also add more pods as needed. That provides more flexibility than the current environment
    • marko: still manual
    • alex: yes, manual, but we don't have any way to scale currently, so this is an improvement
    • marko: worst-case scenario is that cpjobqueue would "just" begin to lag
  • ORES
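
The workers-per-pod trade-off marko raises for cpjobqueue (not 100 pods, not 1 massive pod) can be made concrete with a rough sketch. This is illustrative only: the function names, the peak worker count, and the per-worker CPU and memory figures are all assumptions, not numbers from the meeting or from scb.

```python
import math
from dataclasses import dataclass


@dataclass
class PodPlan:
    workers_per_pod: int
    pods: int
    cpu_per_pod: float
    memory_per_pod_mb: int


def plan_pods(peak_workers, workers_per_pod, cpu_per_worker, memory_per_worker_mb):
    """Size a deployment for a given workers-per-pod choice (hypothetical helper)."""
    if peak_workers <= 0 or workers_per_pod <= 0:
        raise ValueError("peak_workers and workers_per_pod must be positive")
    # Round up so that peak load is always covered.
    pods = math.ceil(peak_workers / workers_per_pod)
    return PodPlan(
        workers_per_pod=workers_per_pod,
        pods=pods,
        cpu_per_pod=workers_per_pod * cpu_per_worker,
        memory_per_pod_mb=workers_per_pod * memory_per_worker_mb,
    )


def candidate_plans(peak_workers, max_pods, max_workers_per_pod,
                    cpu_per_worker, memory_per_worker_mb):
    """List the plans that stay under both the pod-count and pod-size limits."""
    plans = []
    for workers_per_pod in range(1, max_workers_per_pod + 1):
        plan = plan_pods(peak_workers, workers_per_pod,
                         cpu_per_worker, memory_per_worker_mb)
        if plan.pods <= max_pods:
            plans.append(plan)
    return plans


if __name__ == "__main__":
    # Assumed figures: 48 workers at peak, 0.5 CPU and 256 MB per worker.
    for plan in candidate_plans(48, max_pods=12, max_workers_per_pod=8,
                                cpu_per_worker=0.5, memory_per_worker_mb=256):
        print(plan)
```

Since the autoscaler is not yet deployed, a table like this would only inform the manually chosen replica count; the peak figure would have to come from the known current resource usage.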

New Services

  • mobrovac: RESTBase?
    • marko: for next quarter (Q4) we want to split RESTBase into 2 services: an API routing layer and a storage layer (this is current thinking) -- storage on the cassandra nodes (where RESTBase is now) -- API routing on k8s
  • alex: termbox (wmde) -- renders javascript for wikidata; session storage for CPT -- moves sessions into cassandra; Discourse for Quim

General

TODOs from last time

  • In progress: TODO start the various attack vectors document
    • antoine and I started to talk about it
    • thcipriani to more thoroughly noodle
  • TODO: Joe & James_F to work on eventual 2019-04-01 email
    • Beware: announcements on 04/01 may be taken as an April Fools' joke

RelEng

  • Dan starting work on .pipeline/config.yaml
    • The pipeline should provide a way to save artifacts from a stage
    • .pipeline/config.yaml Proposal The Latest™
    • marko: how do services relate to the blubber.yaml?
    • dan: you could use the same blubber file if you want, or you could specify a separate file if that makes sense. I want to have sensible defaults for these things, but if you do have special requirements you should be able to specify those and control the execution and steps in the pipeline. You can specify variants that are built and run in parallel in addition to the sequential steps of the pipeline.
    • marko: if I have one service and I want to use this to tell jenkins what to do that could also be done?
    • dan: yep. This has come up since we have people who want to run helm test, but don't want to deploy to k8s. There are other use-cases that want test variants but not run helm test. This allows folks to specify which parts of the pipeline execute and in which order
    • brennen: what happens if we wind up with CI tooling that conflicts with this?
    • dan: what we have now is written in groovy so we'll have to refactor unless we move to jenkins x -- it's possible that this could be a benefit -- perhaps there could be a translation layer
    • hashar: the groovy is very minimal at this point so should be easy to refactor -- let's migrate every year to ensure that we keep our code to a minimum! Point taken on potential of creating the next tech debt though.
  • Tyler: We're migrating stuff to v4 of Blubber.
  • Jeena: Talking with Greg about local dev environment, we're working on the mediawiki part whereas pipeline is working on services. However: Seems like it's not really useful for developers if they can't run services in the local env. We've been adding services like RESTBase and parsoid; Greg also mentioned Zotero. These aren't classified as a priority to move to the pipeline for various reasons, RESTBase for example.
    • marko: you can use SQLite for RESTBase.
    • Jeena: So there's not going to be an image built in near future...
    • marko: Shouldn't be too much of an issue. Task becomes repetitive.
    • Jeena: My thought was: We're not officially putting them into k8s / prod pipeline... Is it ok if we build images in the pipeline that aren't going to production?
    • marko: We do serve the images... Well, we could.
    • alex: depends on the service we're talking about. RESTBase and parsoid? Moving to the pipeline is a saner approach than building manually
    • dan: we could still use the same process
    • it's going to depend on the service whether or not we put things through the pipeline, i.e. some services (MediaWiki) are not going to fit through the pipeline currently and in those instances we'll have to build manually (e.g., with docker-pkg)
  • Antoine: we need to track which versions of Debian packages are in which container image; weren't we talking about a system to track this? This is going to be an issue soon. How do we know which images we need to rebuild for an update?
  • alex: adding support for this to debmonitor, but it is not resourced. We want to do *exactly that*. We're writing an image lifecycle document inside serviceops
  • hashar: if you have documents I'd be happy to read them
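
Antoine's question about which images need rebuilding for a package update can be sketched as a lookup over a per-image package inventory. This is a minimal sketch under stated assumptions: the inventory layout, the image names, and the version strings are hypothetical, and it does not reflect debmonitor's actual data model or API (support there was described above as planned but unresourced).

```python
from typing import Dict, List, Set

# Hypothetical inventory: image reference -> {package name: installed version}.
ImageInventory = Dict[str, Dict[str, str]]


def images_to_rebuild(inventory: ImageInventory, package: str,
                      affected_versions: Set[str]) -> List[str]:
    """Return images that ship `package` at one of the affected versions."""
    return sorted(
        image
        for image, packages in inventory.items()
        # .get() yields None when the package is absent, which never matches.
        if packages.get(package) in affected_versions
    )


if __name__ == "__main__":
    # All names and versions below are made up for illustration.
    inventory = {
        "service-a:1": {"openssl": "1.1.0j-1", "nodejs": "6.11"},
        "service-b:3": {"openssl": "1.1.1a-1"},
        "service-c:2": {"nodejs": "6.11"},
    }
    print(images_to_rebuild(inventory, "openssl", {"1.1.0j-1"}))
```

Matching against an explicit set of affected versions sidesteps Debian version ordering, which a real tool would need to handle properly.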

Serviceops

Services

As Always