LibUp/2.0

Feedback/comments requested on "LibUp 2.0". 87% of the code for this has already been implemented (e.g. see web dashboard), but I'm always open to changes/improvements. I would like to have this online by the end of July, so the tentative deadline for feedback is July 25 (if you need more time, just let me know). Thanks in advance :) Legoktm (talk) 19:39, 11 July 2019 (UTC)

Motivations

 * Reduce the dependency on Legoktm to trigger runs manually, and instead be fully self-service.
 * Abstract all of the PHP/composer specific parts to add in npm support, and theoretically support other package managers in the future (pip/ruby/...)
 * Merge in other related/similar tooling (e.g. User:Legoktm/ci) into one codebase
 * Continuous integration (specifically, the continuous part)

Architecture
Repositories are scanned once a day, to collect the current state of the repository (listed dependencies, npm audit report, etc.).

After collection, we look at which, if any, libraries need an upgrade. The latest version must be whitelisted in Libraryupgrader/Good releases.json (sysop-protected) before libup will consider upgrading it. This whitelist does not apply to security fixes from.
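As a rough illustration of the whitelist check described above, here is a minimal sketch. The JSON layout, the package names and the `is_security_fix` flag are all assumptions made for the example; the real Good releases.json may be structured quite differently.

```python
import json

# Hypothetical whitelist layout: package manager -> library -> approved versions.
# The package names and version numbers are made up for illustration.
GOOD_RELEASES_JSON = """
{
  "composer": {"example/php-lib": ["2.0.0"]},
  "npm": {"example-js-lib": ["1.2.0", "1.2.1"]}
}
"""


def is_whitelisted(good_releases, manager, library, version):
    """Return True if this exact version of the library has been approved."""
    return version in good_releases.get(manager, {}).get(library, [])


def may_upgrade(good_releases, manager, library, version, is_security_fix=False):
    """Security fixes skip the whitelist; everything else must be approved."""
    if is_security_fix:
        return True
    return is_whitelisted(good_releases, manager, library, version)


good = json.loads(GOOD_RELEASES_JSON)
```

With this sketch, `may_upgrade(good, "npm", "example-js-lib", "1.3.0")` is rejected until someone adds 1.3.0 to the whitelist, while the same call with `is_security_fix=True` goes through.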

Canary repositories will be immediately updated to the latest version, while normal repositories will only be upgraded after all the canaries have the update. In practice this means it'll take at least a day for updates to roll out everywhere: the first cron run will update the canaries, and the next day's run will update everything else. If necessary, this can be sped up by manually running the script once all the canaries are updated.
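The canary-first rollout can be sketched as a small selection function. This is only an illustration of the rule stated above, not libup's actual code; the function name, the repository names and the "version currently in use" mapping are assumptions.

```python
def repos_to_upgrade(current_versions, canaries, target):
    """Pick which repositories should be upgraded to `target` on this run.

    current_versions maps repository name -> version of the library in use.
    canaries is the set of repository names that get updates first.
    """
    # Canaries that use the library but do not have the target version yet.
    pending_canaries = [
        repo for repo in canaries
        if repo in current_versions and current_versions[repo] != target
    ]
    if pending_canaries:
        # Hold everything else back until every canary has the update.
        return sorted(pending_canaries)
    return sorted(
        repo for repo, version in current_versions.items() if version != target
    )


# Hypothetical repositories for illustration.
versions = {"canary/a": "1.0", "canary/b": "1.0", "ext/c": "1.0"}
canaries = {"canary/a", "canary/b"}

first_run = repos_to_upgrade(versions, canaries, "2.0")

# Pretend the first run's patches were merged, then run again the next day.
for repo in first_run:
    versions[repo] = "2.0"
second_run = repos_to_upgrade(versions, canaries, "2.0")
```

Here the first run selects only the two canaries and the second run selects `ext/c`, which matches the "at least a day" rollout described above.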

More technically
A systemd timer triggers the run.py script, which gathers a list of repositories and queues jobs for them in our celery instance (backed by rabbitmq). celery is a job runner (currently running with a concurrency of 2), and will spawn docker containers that execute ng.py. This clones the repository, runs. It then goes through the libraries, bumps the ones it wants to, re-runs tests, preps the commit, and saves it as a patch. Then a different docker container is spawned with access to an ssh-agent; it imports the patch and then pushes it, potentially applying +2 if only a safe subset of files were touched.
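To make the "bumps the ones it wants to" step a little more concrete, here is a minimal sketch of bumping one library in a parsed composer.json or package.json. It is not taken from ng.py, which may well shell out to composer or npm instead; the `bump` helper and the example package name are hypothetical. The section names are the standard ones for each manifest format.

```python
import copy

# Dependency sections used by each manifest format.
SECTIONS = {
    "composer.json": ("require", "require-dev"),
    "package.json": ("dependencies", "devDependencies"),
}


def bump(manifest, filename, library, new_version):
    """Return (updated_manifest, changed) without mutating the input dict."""
    updated = copy.deepcopy(manifest)
    changed = False
    for section in SECTIONS[filename]:
        deps = updated.get(section, {})
        if library in deps and deps[library] != new_version:
            deps[library] = new_version
            changed = True
    return updated, changed


# Hypothetical package.json contents for illustration.
package = {"devDependencies": {"example-js-lib": "1.2.0"}}
updated, changed = bump(package, "package.json", "example-js-lib", "1.2.1")
```

Returning a copy keeps the original manifest around, so the job can compare before and after when preparing the commit, or skip the repository entirely when `changed` is false.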

The entire log will be kept forever/publicly viewable (still to be implemented).

Security concerns
Currently legoktm is the only one with access to the libraryupgrader Cloud VPS VMs, but as part of the 2.0 effort, I'm hoping to recruit new maintainers.

There are two main security considerations: 1) code injection into our repositories 2) protection of the libraryupgrader Gerrit account.

Addressing the first:


 * Libraries will only be upgraded if they have been whitelisted - it's up to the person doing the whitelisting to review diffs since the previous version. This also means reviewing transitive dependencies.
 * Is there any tool that allows viewing a full diff between package versions, including diffing dependency updates?
 * Difference from 1.0: None really, except Legoktm also usually did a quick spot check when someone else was asking for an upgrade.
 * libup will only +2 patches that touch specific files (e.g. composer.json, package.json, package-lock.json). An attacker would need to inject content into these files OR subtly trick humans into +2'ing changes to other files.
 * Difference from 1.0: Usually Legoktm spot checks the random patches, and would theoretically catch stuff like this. With it being fully automated, this kind of check won't happen.
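The "+2 only if safe files were touched" rule can be sketched as a simple set check. The file list below is just the examples given above and the function name is hypothetical; the real list in libup may be longer or differ.

```python
# Files libup is assumed to trust for automatic +2 (taken from the examples above).
SAFE_FILES = {"composer.json", "package.json", "package-lock.json"}


def can_auto_approve(changed_files):
    """Allow +2 only for a non-empty patch that touches nothing but safe files.

    Paths are compared exactly, so a nested path such as "sub/package.json"
    does not count as safe in this sketch.
    """
    changed = set(changed_files)
    return bool(changed) and changed <= SAFE_FILES
```

For example, a patch touching `package.json` and `package-lock.json` would be approved, while one that also touches `src/index.js` would be left for a human to review.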

And the second:


 * As a bit of background, previously we used HTTP auth to push patches. The password was stored in Legoktm's password manager, and he'd type that into the script every time it needed to run. It was passed into the docker container as an env variable for the script to use when pushing (prior discussion).
 * Instead, I'd like to have an SSH key loaded in the ssh-agent of the libup user on the host machine. It'll pass the agent's socket into the docker container for pushing patches. From an attack surface standpoint:
 * If you compromised the container (not too unlikely given we're executing random internet code from npm), you have no access to the Gerrit account, nor the private key. Previously, you could theoretically exfiltrate the HTTP password since it was effectively available to everything as a part of the environment.
 * The container that does have access to the ssh-agent only runs fully trusted code (libup), not npm/composer/stuff off the internet/etc.
 * If you compromised the libup user on the host, you'd have access to the private key material, but not the passphrase. Also the libup user has access to the docker socket, so they're root anyways. But they still don't have the passphrase.
 * (thanks to thcipriani for discussing key management with me, and helping figure out the security concerns)
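The split between the two containers can be illustrated by the `docker run` command lines each one might get. This is a sketch under assumptions: the image names, the `/out` and `/ssh-agent` mount points, the `push.py` script name and the way ng.py receives the repository name are all hypothetical. Only the docker flags themselves (`--rm`, `-v`, `-e`) and the `SSH_AUTH_SOCK` variable are standard.

```python
def worker_argv(repo, patch_dir, image="libup-worker"):
    """Container that runs untrusted code (npm/composer); no credentials at all."""
    return [
        "docker", "run", "--rm",
        "-v", f"{patch_dir}:/out",          # the patch is written here
        image, "ng.py", repo,               # hypothetical invocation
    ]


def push_argv(patch_dir, agent_sock, image="libup-push"):
    """Container that only runs trusted libup code and may talk to the agent."""
    return [
        "docker", "run", "--rm",
        "-v", f"{patch_dir}:/out:ro",       # patch is read-only at this stage
        "-v", f"{agent_sock}:/ssh-agent",   # the agent's socket, not the key
        "-e", "SSH_AUTH_SOCK=/ssh-agent",
        image, "push.py",                   # hypothetical script name
    ]


# Hypothetical paths for illustration.
worker = worker_argv("mediawiki/extensions/Example", "/srv/patches/1")
push = push_argv("/srv/patches/1", "/run/libup/agent.sock")
```

The point of the design is visible in the argument lists: the first container never sees the agent socket, and the second one gets the socket but never the private key or its passphrase.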