Talk:Wikimedia Release Engineering Team/MW-in-Containers thoughts

About this board

Compiled configuration per wiki

14
TCipriani (WMF) (talkcontribs)

Currently Wikimedia creates a JSON file under /tmp/ that it subsequently reads from until there is an mtime update to InitialiseSettings.php; I don't know how that works in a container. We could carry that forward; however, if it is not using a shared volume of some kind then each pod will have some startup cost: having to regenerate this configuration. Alternatively, regenerating via the pipeline makes a long process even longer.
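
(For illustration only, the current behaviour amounts to roughly the mtime check below; the paths are simplified and compile_settings() is a stand-in for evaluating the PHP settings, not real wmf-config code.)

import json
import os

def compile_settings(wiki):
    # Stand-in for evaluating InitialiseSettings.php for one wiki.
    return {"wgServer": f"https://{wiki}.example.org"}

def load_wiki_config(wiki, settings_file="InitialiseSettings.php",
                     cache_dir="/tmp/mw-cache"):
    # Regenerate the cached JSON when the source settings file is newer
    # than the cache; otherwise reuse the cache.
    cache_path = os.path.join(cache_dir, f"conf-{wiki}.json")
    if (not os.path.exists(cache_path)
            or os.path.getmtime(cache_path) < os.path.getmtime(settings_file)):
        config = compile_settings(wiki)
        os.makedirs(cache_dir, exist_ok=True)
        with open(cache_path, "w") as f:
            json.dump(config, f)
        return config
    with open(cache_path) as f:
        return json.load(f)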

Jdforrester (WMF) (talkcontribs)

My vision is that we stop making the temporary JSON files on-demand on each server, and instead pre-generate them (per-wiki or a single mega file, not sure) on the deployment server and sync the compiled config out in the scap process instead of InitialiseSettings.php. Then, in the container universe, this JSON blob gets written into pods through the k8s ConfigMap system, rather than as a file tied to a particular pod tag.
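
For illustration, the pre-generation step on the deployment server could look roughly like the sketch below (the settings structure, wiki list, and output file names are invented for this example; the real InitialiseSettings.php format is richer):

import json
from pathlib import Path

# Invented example data: the real settings live in InitialiseSettings.php,
# keyed by setting name, with a 'default' value and per-wiki overrides.
SETTINGS = {
    "wgSitename": {"default": "Wikipedia", "dewiki": "Wikipedia (Deutsch)"},
    "wgLanguageCode": {"default": "en", "dewiki": "de"},
}
WIKIS = ["enwiki", "dewiki"]

def compile_config(out_dir="compiled-config"):
    # Flatten the per-setting overrides into one JSON blob per wiki.
    Path(out_dir).mkdir(exist_ok=True)
    for wiki in WIKIS:
        config = {name: values.get(wiki, values["default"])
                  for name, values in SETTINGS.items()}
        (Path(out_dir) / f"{wiki}.json").write_text(json.dumps(config, indent=2))

if __name__ == "__main__":
    compile_config()  # scap would then sync the output, or k8s would mount it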

TCipriani (WMF) (talkcontribs)

+1 -- makes sense to me, that's what I'd like as well. Some investigation needs to happen -- a random server shows 941 JSON files totaling 74MB of config for each wiki version.

AKosiaris (WMF) (talkcontribs)

Note that ConfigMaps have a limit of 1MB (actually it's a bit more than that, but it's best to stick to a 1MB mental model). That stems from etcd having a max object size of 1MB (again a bit more, like 1.2MB, but I digress). So we aren't going to be able to use that approach to inject that content into the pods (unless we split it into many, many ConfigMaps).

We could alternatively populate the directory on the Kubernetes hosts and bind-mount it into all pods (especially easy if it's read-only from the container's perspective). But then we would have to figure out how to populate it on the Kubernetes nodes, which is starting to imply scap.
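
For illustration, the bind-mount variant would be roughly the following pod spec fragment (written here as a Python dict standing in for the YAML; the paths and names are placeholders):

# Placeholder pod spec fragment: compiled config lives on the node and is
# bind-mounted read-only into the MediaWiki container.
pod_spec = {
    "volumes": [{
        "name": "mw-config",
        "hostPath": {"path": "/srv/mediawiki-config/compiled", "type": "Directory"},
    }],
    "containers": [{
        "name": "mediawiki",
        "image": "example/mediawiki:latest",
        "volumeMounts": [{
            "name": "mw-config",
            "mountPath": "/srv/mediawiki-config/compiled",
            "readOnly": True,  # read-only from the container's perspective
        }],
    }],
}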

Jdforrester (WMF) (talkcontribs)

Yeah. :-( Theoretically we could do one ConfigMap per wiki, but that means a new default setting would need 1000 ConfigMaps to be updated, which suggests a race condition/etc. as it rolls out.

DDuvall (WMF) (talkcontribs)

Does any one JSON file approach 1M in size? K8s has a "projected" volume feature that allows multiple volume sources (including ConfigMaps) to be mounted under the same mount point, so ostensibly we could have one ConfigMap per wiki but still have them under the same directory on a pod serving traffic for all wikis. Still a bit cumbersome from a maintenance perspective perhaps but it might work around etcd's technical limitation.
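
For illustration, the per-wiki ConfigMaps plus a projected volume could look roughly like the sketch below (ConfigMap names, keys, and the wiki list are invented). Because each ConfigMap carries only a single wiki's JSON, this also stays well under the 1MB-per-object limit mentioned above:

import yaml  # PyYAML, only used here to render the fragment

# One ConfigMap per wiki, all projected into a single directory in the pod.
wikis = ["enwiki", "dewiki", "commonswiki"]
projected_volume = {
    "name": "wiki-config",
    "projected": {
        "sources": [
            {"configMap": {"name": f"mw-config-{wiki}",
                           "items": [{"key": "config.json",
                                      "path": f"{wiki}.json"}]}}
            for wiki in wikis
        ]
    },
}
print(yaml.safe_dump(projected_volume, sort_keys=False))

Mounting that volume at one path would then give the pod a <wiki>.json file per wiki in a single directory, regardless of which wiki a given request is for.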

TCipriani (WMF) (talkcontribs)
Does any one JSON file approach 1M in size?

Nope. The biggest one currently is:

108K    /tmp/mw-cache-1.36.0-wmf.4/conf2-commonswiki.json
LarsWirzenius (talkcontribs)

What about doing the compilation in CI, and not during scap deployments? Would that be feasible? Maybe later?

Jdforrester (WMF) (talkcontribs)

What about doing the compilation in CI, and not during scap deployments? Would that be feasible? Maybe later?

Doable, but makes merges a mess unless we write a very special git merge driver, and it bloats the git repo with these files, which can get relatively big as Tyler points out. 🤷🏽‍♂️

TCipriani (WMF) (talkcontribs)

One thing I would like, regardless of CI compilation, is a way to roll these back quickly: that is the advantage of having them generated on demand as we do currently, and it's one thing that generating them at deploy time would (maybe) slow down.

JHuneidi (WMF) (talkcontribs)

I had some ideas on configuration, provided each pod is dedicated to a single wiki (which might be nice if we wanted to scale based on traffic per wiki).

I thought it would be ideal if we could inject the configuration (overrides) into the pod at deploy time, but it seems like there's too much configuration to do that.

We could add the configuration with a sidecar container. That could help with both the size of the configuration and the rollback issue, I think.
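
One purely illustrative reading of that (strictly an init container rather than a long-running sidecar): ship the compiled configuration in its own image and copy it into a shared emptyDir at pod start, so config rolls back by bumping only the config image tag. All names below are invented:

# Placeholder pod spec fragment: a config image populates a shared emptyDir
# that the MediaWiki container then reads.
pod_spec = {
    "initContainers": [{
        "name": "mw-config",
        "image": "example/mediawiki-config:2020-07-01",  # roll back by changing this tag
        "command": ["cp", "-r", "/config/.", "/shared-config/"],
        "volumeMounts": [{"name": "config", "mountPath": "/shared-config"}],
    }],
    "containers": [{
        "name": "mediawiki",
        "image": "example/mediawiki:latest",
        "volumeMounts": [{"name": "config",
                          "mountPath": "/srv/mediawiki-config",
                          "readOnly": True}],
    }],
    "volumes": [{"name": "config", "emptyDir": {}}],
}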

GLavagetto (WMF) (talkcontribs)

I don't think we will go with one wiki per pod; that would be extremely impractical, as we would need to have 900 separate deployments.

When we start the migration, we will probably have one single deployment pod, and then at some point we might separate group0/1/2, but I don't see us going beyond that.

There are other practical reasons for this, but just imagine how long the train would take :)

JHuneidi (WMF) (talkcontribs)

Yeah, 900 is a lot, but I don't think that is such a problem for Kubernetes. I thought we could have an umbrella chart with the 900 wikis, and then update the image tags and install with helm once. I have never tested helm with such a large chart, but since the release info is stored in ConfigMaps/Secrets (helm 3), I guess we could run into a size limit issue there, so... maybe you are right :P

I think the sidecar container with configuration is feasible for a single deployment as well, though.

DKinzler (WMF) (talkcontribs)

My current thinking is:

  1. Give MediaWiki a way to load settings from multiple JSON files (YAML could be supported for convenience, but we probably don't want to use it in prod, for performance reasons). This needs to cover init settings and the extensions to load, as well as the actual configuration.
  2. Pre-generate config for each of the 900 wikis when building containers (maybe by capturing the result of running CommonSettings.php). Mount the directory that contains the per-wiki config files (from a sidecar?). Let MediaWiki pick and load the correct one at runtime (see the merge sketch below).
  3. Pre-generate config files for each data center and for each server group (with all the service addresses, limit adjustments, etc.). Deploy them via ConfigMap (one chart per data center and server group). Let MediaWiki load these at runtime.
  4. Let MediaWiki load and merge in secrets from a file managed by Kubernetes.
  5. Use a hook or callback to tell MediaWiki to merge live overrides from etcd at runtime.
  6. Extract all the hooks currently defined in CommonSettings.php into a "WmfQuirks" extension.


How does that sound?
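
To make the layering concrete, here is a rough sketch of the merge order in Python (MediaWiki itself would of course do this in PHP; the file names and directory layout are invented):

import json
from pathlib import Path

def load_settings(wiki, datacenter, config_dir="/etc/mediawiki"):
    # Merge the layers in increasing order of precedence; later layers win.
    layers = [
        f"wikis/{wiki}.json",             # per-wiki config from the container/sidecar
        f"datacenter-{datacenter}.json",  # per-DC/server-group config from a ConfigMap
        "secrets.json",                   # secrets file managed by Kubernetes
        "live-overrides.json",            # live overrides materialised from etcd
    ]
    settings = {}
    for layer in layers:
        path = Path(config_dir) / layer
        if path.exists():
            settings.update(json.loads(path.read_text()))
    return settings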

Reply to "Compiled configuration per wiki"

Objective: MediaWiki* is automatically packaged into a k8s pod**, which is semi-automatically deployed*** into Wikimedia production

3
AKosiaris (WMF) (talkcontribs)

I just noticed this; I wonder how my pedantic inner self did not already complain about the terminology.


Anyway, may I suggest we switch the wording to:

Objective: Mediawiki* is automatically packaged into (one or more) OCI container images, which are semi-automatically deployed** into Wikimedia Production as kubernetes pods


I am purposely including the "(one or more)", as we shouldn't restrict ourselves; having more than one OCI container image might turn out to solve problems we still have not foreseen.

Jdforrester (WMF) (talkcontribs)

Oh, sure, good point. Will fix.

Jdforrester (WMF) (talkcontribs)

mediawiki-config size and Logos

2
AKosiaris (WMF) (talkcontribs)

I've had a quick look into mediawiki-config and its size. So, I see the following:


61MB, 50 of which are images, 47 of which are project logos. Is there a reason we ship project logos as configuration?

And most importantly to stay on track, how are we going to ship all those logos in the brave new k8s world?


du -h --exclude=.git | sort -rh |head -30
61M     .
50M     ./static/images
50M     ./static
47M     ./static/images/project-logos

Jdforrester (WMF) (talkcontribs)

We ship them as config because they don't vary with the flavour of MediaWiki, we want them to be statically mapped, and we want to be able to change them swiftly. I think moving them into the containers (so new logos take a few hours / days to roll out) is probably acceptable, but it's a fair regression in the flexibility of config.

Reply to "mediawiki-config size and Logos"

"On some trigger, the new pod is added into the production pool and slowly scaled out to answer user requests until it is the only pod running or is removed"

5
AKosiaris (WMF) (talkcontribs)

Scaling out happens in Kubernetes by increasing the number of pods, so this sentence needs some rewording. Deployments just spawn a batch of new k8s pods (25% by default, though configurable) and kill the same number of pods from the previous deploy. Rinse and repeat in an A/B fashion until all of the pods have been replaced.
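
For reference, that default behaviour corresponds to the Deployment's RollingUpdate strategy; a minimal fragment (as a Python dict standing in for the YAML, showing the 25% defaults) looks like:

# Deployment strategy fragment; both values are tunable per deployment.
strategy = {
    "strategy": {
        "type": "RollingUpdate",
        "rollingUpdate": {"maxSurge": "25%", "maxUnavailable": "25%"},
    }
}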

Jdforrester (WMF) (talkcontribs)

Ah, yes, will re-word.

Jdforrester (WMF) (talkcontribs)
AKosiaris (WMF) (talkcontribs)

Yes, I think so. I am still a bit unclear on the "** How do we tell the controller (?) which deployment state to route a given request to?" part, but judging from the question mark after "controller", we probably want to define that first.

Jdforrester (WMF) (talkcontribs)

That's wrapped up in the decision about how we want the A/B split of traffic to the new deploy – do we just do it uniformly at random across all request types (standard k8s behaviour), or do we want to do something closer to what we do now (roll out by request type, sharded by target wiki), or something else? I've left that open as I don't think it's been discussed.

Reply to ""On some trigger, the new pod is added into the production pool and slowly scaled out to answer user requests until it is the only pod running or is removed""

Exclusion of operations/mediawiki-config's InitialiseSettings.php from the "k8s" pod

2
AKosiaris (WMF) (talkcontribs)

What's the reason behind that? It feels kind of weird that CommonSettings.php is in there and InitialiseSettings.php isn't going to be there.


Furthermore, how do we plan on making the configuration from InitialiseSettings.php available to pods?

AKosiaris (WMF) (talkcontribs)
Reply to "Exclusion of operations/mediawiki-config's InitialiseSettings.php from the "k8s" pod"

On number of versions and rolling back

2
LarsWirzenius (talkcontribs)

Here's an idea, possibly too crazy: we build new container versions, in sequential order somehow: v1, v2, ...


We deploy each new version, and after it's run acceptably with production traffic for time T, we label it golden. Any non-golden version can be rolled back (possibly automatically, based on error rate or UBN; possibly manually by RelEng/SRE/CPT/....). If rolling back one version isn't enough, roll back further, until the newest golden version.


Every rollback results in an alert to RelEng, SRE, CPT, and anyone with changes between the newest golden version and the rolled-back version.

Jdforrester (WMF) (talkcontribs)

Definitely agreed on the "golden" label with possible auto-rollback, but sometimes golden labels turn sour over time, either temporarily (e.g. a configuration setting for which endpoint the DBs are loaded from) or permanently (e.g. a feature is intentionally removed), so we may need more humans in the loop than ideal.

Reply to "On number of versions and rolling back"

Every commit vs branch cut

4
Summary by Jdforrester (WMF)
TCipriani (WMF) (talkcontribs)

One thing that is still undecided about the "automatically packaged" part of this document is how often we'll make a container. You point out that we already have more commits than could be deployed continuously -- currently that's handled via the train acting as a pressure release. As a first iteration, that might be preferable / might be worth mentioning.

LarsWirzenius (talkcontribs)

If we need 45 minutes to build/test a new container, and we get a change every 17 minutes, we can't do this on every commit. I think we need to do two things: a) do a time-based build/test (every hour, on the hour); b) work on reducing the wall-clock time to build/test a container.

Jdforrester (WMF) (talkcontribs)

My starter-for-ten for this was maybe building an image every 24 hours, maybe at 04:00 UTC (our current trough of new commits). It'd be imperfect, but it'd reduce pressure significantly. However, this also limits roll-outs to only once a day, of course.

TCipriani (WMF) (talkcontribs)

Current limitation is once per week, so once per day seems like an improvement. We could iterate from there.

There are no older topics