Requests for comment/Streamlining Composer usage

Proposal
Most of the problems result from running a build step (composer) in the same process step as patch validation/merge, instead of in a separate step before or during deployment.

The master branches (core, extensions and so on) will be changed to run composer during CI, as is currently done for Wikidata.

The following is likely contentious, and at least somewhat unclear. Some proposed solutions:


 * 1) Automatically build and commit mediawiki/vendor (see ).
 * 2) Run Composer during deployment, i.e. during scap.
 * 3) Manually update mediawiki/vendor during branch cut. Do not use that repository for anything else: not for master nor for any other installations of MediaWiki. This does not solve the circular dependency problem for wmf deployment branches; we would just bypass CI and fix it in the next step (which causes problems with zuul, see ).

To improve availability Satis may be of use (see ).
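To sketch what a Satis mirror could look like: Satis builds a static composer repository from a config file. The package name, homepage URL and required packages below are hypothetical, for illustration only.

```json
{
    "name": "wikimedia/composer-mirror",
    "homepage": "https://composer-mirror.example.wikimedia.org",
    "repositories": [
        { "type": "composer", "url": "https://packagist.org" }
    ],
    "require": {
        "wikimedia/cdb": "*",
        "data-values/data-values": "*"
    },
    "require-dependencies": true
}
```

Running `satis build satis.json web/` would then generate a static repository that CI and deployment hosts can point at, taking packagist.org out of the critical path.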

Review of dependencies needs to happen before a commit that adds them to composer.json is merged.

Nobody is currently working on solving the package integrity problem for composer, but solving it in a generic way that scales to the whole community of people creating composer packages would be a good idea and would have the highest value for MediaWiki.

Open questions
How do we update mediawiki/vendor.git automatically? Do we instead do this during scap? Would we then stop using mediawiki/vendor.git?

What do we do for MediaWiki release branches?

The person creating the wmf branches and deploying the train is likely affected the most by this. What do they think about this?

Do we think we should come up with threat models? What is the minimum that we want to have addressed over all software in production, on deployers machines, on developers machines? Is there enough support to improve the Composer package integrity for all our used dependencies?

What about the problems that are not directly solved by only automating the vendor update?

Background
MediaWiki core and its extensions depend on libraries that are managed via composer. This RFC intends to continue from where Requests for comment/Composer managed libraries for use on WMF cluster and Requests for comment/Extensions continuous integration left off. Library infrastructure for MediaWiki will hugely increase the use of Wikimedia-maintained libraries. To avoid hindering this effort we need to streamline the process for adding and upgrading composer dependencies, and for building and deploying with composer.

Besides library dependencies, composer can be used to build an autoloader for parts of core and parts of extensions. To make proper use of that we need to ensure that our build and deployment process works with it.
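For illustration, a sketch of what such an autoload declaration in a composer.json could look like; the namespace and paths below are made up, not taken from core:

```json
{
    "autoload": {
        "psr-4": {
            "MediaWiki\\Libs\\Example\\": "includes/libs/Example/"
        },
        "classmap": [
            "includes/compat/"
        ]
    }
}
```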

For development purposes, most people currently run composer in the root of MediaWiki. This loads the composer-merge-plugin, which merges in all dependencies from other specified composer.json files, usually from extensions. For Wikimedia production deployment from the wmf branches, we do not use composer directly but instead an intermediate repository, mediawiki/vendor, which is manually updated. In between we have the master development branches, the continuous integration jobs and the beta cluster environment, which all currently use mediawiki/vendor. Each of the branches might need to use a different strategy for continuous integration in the future.
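As an illustration of the development setup described above, a composer.local.json along these lines tells the merge-plugin which files to merge in (a sketch; the exact glob patterns in use may differ):

```json
{
    "extra": {
        "merge-plugin": {
            "include": [
                "extensions/*/composer.json",
                "skins/*/composer.json"
            ]
        }
    }
}
```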

Neither Composer nor its autoloader should be used to register an extension; that should be done by MediaWiki extension registration, which gets triggered by something in LocalSettings.php. See Requests for comment/Improving extension management for improving the installation of extensions and the dependencies of extensions on other extensions.

Wikidata (Wikibase, related extensions and dependencies)
Wikidata will bring in 19 more components, maintained only by people trusted with Wikimedia merge rights (i.e. no components from outside Wikimedia). Its CI uses a single composer run, as during development, instead of mediawiki/vendor. Once per day a virtual machine builds the Wikidata "extension", which contains all extensions needed for Wikidata.org plus their dependencies. The build output is proposed as a patch to Gerrit for the mediawiki/extensions/Wikidata repository and is then +2ed by a human. The Composer-generated autoloader is in use in these extensions and libraries.

Wikidata dependencies (already outdated, there are now more):

 * 1) composer/installers @v1.0.21 (already in core)
 * 2) data-values/common @0.2.3
 * 3) data-values/data-types @0.4.1
 * 4) data-values/data-values @1.0.0
 * 5) data-values/geo @1.1.4
 * 6) data-values/interfaces @0.1.5
 * 7) data-values/javascript @0.7.0
 * 8) data-values/number @0.4.1
 * 9) data-values/serialization @1.0.2
 * 10) data-values/time @0.7.0
 * 11) data-values/validators @0.1.2
 * 12) data-values/value-view @0.14.5
 * 13) diff/diff @2.0.0
 * 14) serialization/serialization @3.2.1
 * 15) wikibase/data-model @3.0.0
 * 16) wikibase/data-model-javascript @1.0.2
 * 17) wikibase/data-model-serialization @1.4.0
 * 18) wikibase/internal-serialization @1.4.0
 * 19) wikibase/javascript-api @1.0.3
 * 20) wikibase/serialization-javascript @2.0.3
 * 21) propertysuggester/property-suggester @2.2.0 (MediaWiki extension, would become submodule of core)
 * 22) wikibase/wikibase @dev-master (MediaWiki extension, would become submodule of core)
 * 23) wikibase/Wikidata.org @dev-master (MediaWiki extension, would become submodule of core)
 * 24) wikibase/wikimedia-badges @dev-master (MediaWiki extension, would become submodule of core)

wmf deployment branches
The wmf branches of extensions and so on are added as submodules to the wmf branch of mediawiki/core when a new wmf branch is created. Merges to a wmf branch in an extension automatically result in a commit to mediawiki/core that updates the respective submodule. (Introducing automatic submodule updates was a recent change.)

When a new wmf branch is created, care is taken to trigger the CI (see ). Example from 1.26wmf9:

 * 20c7219 - Submitting branch for review so that it gets tested by jenkins. refs T101551 (7 days ago)
 * 11015b2 - Creating new WMF 1.26wmf9 branch (7 days ago)
 * 6521b36 - Creating new WMF 1.26wmf9 branch (7 days ago)

Double Review
Upgrading a dependency of e.g. the Wikibase extension, if it were included in mediawiki/vendor.git, would involve the steps below. This is work that could be automated. Today a human might not notice when something doesn't match, even though they are the magic prevention mechanism for problems that are not specified in enough detail to know what automatic mechanisms could prevent them instead. This is extra manual review work while Wikimedia can't even keep up with the normal influx of reviews :-(.
 * 1) A patch to the dependency (e.g. wikibase/data-model) is proposed.
 * 2) It is reviewed and merged by a Wikimedian.
 * 3) A release for the dependency is done.
 * 4) A patch that updates the requirement in mediawiki/extensions/Wikibase.git is proposed, reviewed and merged.
 * 5) An update of mediawiki/vendor is proposed, causing a second review!

mediawiki/vendor creates a circular dependency
The update of a source composer.json and the update of mediawiki/vendor would need to happen at the same time.

If the CI uses mediawiki/vendor, it fails because vendor was not updated. Bypassing CI breaks beta.

If the CI uses composer, beta may fail because mediawiki/vendor was not updated.

If mediawiki/vendor is changed first, beta breaks because the rest is not prepared for the new versions in vendor.

(A broken beta also means any development system updated at that time is broken)

This applies not only to the master branches but also to the wmf deployment branches: as the submodule is updated automatically, there is no chance to prepare mediawiki/vendor ahead of time and update composer.json and the vendor submodule in one commit.

Currently this is dealt with by overriding the CI and causing a temporary breakage that is then fixed by updating mediawiki/vendor or the other way around. (Overriding the CI causes problems with zuul, see .)

Generated autoloader needs frequent updates
mediawiki/vendor.git needs the updated class map for the Composer-generated autoloader. We use the optimized variant (`composer dump-autoload --optimize`), so adding a class means regenerating the autoloader. Adding a class to core and/or an extension may therefore require an update to vendor.

Usage of libraries without depending on them
Extensions are starting to use classes from libraries that are pulled in via composer without declaring the dependency in their composer.json (see  for this problem related to Wikidata).

Version check in update.php is not sufficient
It currently handles only a very narrow case: libraries with an exact version specifier that are bundled with core via mediawiki/vendor. We could use the new composer/semver library to make this less terrible.
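To illustrate the narrow case versus what composer/semver could enable, consider a require section along these lines (the package names and versions are just examples): the first entry is an exact pin, the only case the check copes with today; the second is a range constraint, which could be evaluated programmatically with composer/semver's `Semver::satisfies($version, $constraint)`.

```json
{
    "require": {
        "wikimedia/cdb": "1.0.1",
        "wikimedia/utfnormal": "^1.0"
    }
}
```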

operations/mediawiki-config
It uses composer, currently only to pull in wikimedia/cdb. wikimedia/cdb uses a Composer-generated autoloader. The dependencies are embedded in the same git repo under multiversion/vendor. In theory a namespace/version conflict with mediawiki/core and/or mediawiki/vendor could happen.
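For reference, embedding dependencies under a non-default directory like multiversion/vendor can be done with composer's `vendor-dir` config option; a sketch (the require entry mirrors the one mentioned above):

```json
{
    "config": {
        "vendor-dir": "multiversion/vendor"
    },
    "require": {
        "wikimedia/cdb": "*"
    }
}
```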

Usage of github
Some of the involved parts might be on github. Manual:Developing_libraries suggests that this is ok. The merge-plugin is hosted there.

Availability
Packagist, and whatever other parts are involved in downloading all the packages, might be down or slow. (See a related issue about our CI not being able to run composer because of throttling.)

Package integrity
Like composer, most other software package managers do not implement all of the mitigations mentioned below. Notable exceptions are probably Debian- and RedHat-based distributions. I heard Maven optionally has support for some (or was that some other Java package manager?), but I didn't look into it in detail. npm has public key pinning for HTTPS to its default repository. Having an overview of these and of which threat they mitigate by what technical means would be interesting, but is out of scope for this RFC.

Blocking the creation and use of PHP libraries for this reason, while not blocking other languages that have the same package manager issues and not blocking further development until these issues are solved, does not actually achieve security, as the weakest link determines your overall security.

Threats
(The following list is adapted from a comment by User:CSteipp_(WMF) found at T105638.)
 * 1) MITM between composer and packagist - in my estimation, this has both High likelihood and High impact. It's a trivial technical attack to pull off without composer checking certificates, and allows the attacker to link their own, modified version of any library into the composed codebase.
 * * Risk: (using high = 3, med = 2, low = 1, and risk = likelihood x impact), 3 x 3 = 9
 * * Mitigation: Currently: none; Possible: Transfer integrity, End to end integrity
 * 2) MITM between composer and packagist with valid packagist certificate - In the event that composer starts validating the HTTPS connection's certificate, is that enough? In my estimation, this is a fairly difficult attack to pull off for someone attacking the WMF's infrastructure, so I put the likelihood at Low. For normal developers running `composer update` on their laptop, this is still moderate, since buying an SSL MITM proxy that contains a code-signing certificate is still fairly expensive, so I'm going to say likelihood is Low to Medium. This has the same impact as #1.
 * * Risk: Low-Medium (1.5) x High (3) = 4.5
 * * Mitigation: Currently: none; Possible: Transfer integrity with e.g. key pinning, End to end integrity
 * 3) Github is compromised (the entire organization) - in my estimation, compromising Github would be difficult. They have a security team that seems to be making reasonable choices currently, so I'd guess the likelihood is Low. The impact, aiui, would be High for normal composer usage (where just the tarball of a repo at a certain commit is downloaded, and does not appear to be integrity checked). If composer does get a checksum that it checks, or composer is set up to clone the repo and check out a sha1-hash (which seems to be hard to forge), then the impact would be reduced.
 * * Risk: Low (1) x High? (3) = 3?
 * * Mitigation: Currently: none; Possible: End to end integrity
 * 4) Packagist is compromised (the entire organization) - Packagist concerns me a little more, since I don't know anything about their operational security. I'll do some quick research on that, or if anyone has a citation, I'm happy to evaluate. The impact would again be High, because an attacker could add any code to any library by pointing to their own version of it.
 * * Risk: ? x High (3) = ?
 * * Mitigation: Currently: none; Possible: End to end integrity
 * 5) Github repo is compromised by owner having their password stolen - Github mitigates this with things like mandatory HTTPS with HSTS, using OAuth to interact with other services, optional 2-factor authentication for accounts, and review and expiration of potentially unsafe ssh keys. However, assuming many library owners are not going to take advantage of the optional security features, we can probably call the likelihood Medium, and the impact High, since it would allow the attacker to add any code to the repo.
 * * Risk: Medium (2) x High (3) = 6
 * * Mitigation: Currently: Updates to vendor are code reviewed (high probability of error, not always done, and the point of this RFC is to not do that at that time); Possible: End to end integrity
 * 6) Github repo is "compromised" by owner accepting a hostile pull request - The likelihood seems Low to me, since if we've determined the library is of sufficient quality to include in our codebase (a MediaWiki developer has decided this is a good library to include, and the library has passed security review by my team), then I hope it's unlikely they would accept a malicious pull request. If it did happen, the impact would be High, however.
 * * Risk: Low (1) x High (3) = 3
 * * Mitigation: Currently: Vetting of libraries included in MediaWiki (developer review, security review); Updates to vendor are code reviewed (see above)
 * * Only relevant for this RFC for comparison. (The main use case for this RFC is where the owners of the component are a subset of the owners of MediaWiki repositories.)
 * 7) Server hosting the repository declaring the dependency is compromised (gerrit.wikimedia.org)
 * * Risk:
 * * Mitigation: Possible: End to end integrity
 * * Only relevant for this RFC for comparison.
 * 8) Server hosting the code review is compromised (gerrit.wikimedia.org) - Might be used to trick the person merging a pull request into merging something that is not what they reviewed in the web interface. Might be used to trick the person creating a tag into tagging and signing something that they didn't intend to.
 * * Risk:
 * * Mitigation: Possible: End to end commit integrity together with only reviewing on the local machine.
 * * Only relevant for this RFC for comparison.
 * 9) Owner's development machine and/or GPG key is compromised
 * * Risk:
 * * Mitigation: Possible: Use only software with a verified chain to the people who wrote it. Hardware key that does not output the private key for GPG and SSH. ...
 * * Only relevant for this RFC for comparison.
 * 10) Wikimedia is compromised (the entire organization)
 * * Risk:
 * * Mitigation: Possible: There are ways.
 * * Out of scope for this RFC. Let's tackle the rest first.

This only discusses integrity, not confidentiality; however, the mitigation explained under Transfer integrity would also serve to gain some confidentiality.

Mitigations
Composer does not check whether the downloaded package is what we expected it to be. There are multiple measures that could help mitigate the above threats. (Currently, the support in satis and in the software running on packagist.org is in practice equivalent to that in composer.)
 * 1) Fail closed: For any mitigation to work, composer needs to fail closed, i.e. stop with an error when integrity cannot be proven. This is required by all of the mitigations below. (Not implemented in composer.)
 * 2) Transfer integrity:
 * * Only use HTTPS instead of HTTP. (Not implemented in composer. Supported by packagist.org and github.com.)
 * * Properly implement HTTPS, e.g. check certificates. (Not implemented in composer, see the composer MITM proof of concept. Supported by packagist.org and github.com.)
 * * Possibly implement a mitigation against CA compromise, like key pinning. (Not implemented in composer, nor on packagist.org. Supported by github.com.)
 * 3) End to end release integrity:
 * * Verify and only use signed tags from git repositories. (Implemented in git: , . Not implemented in composer.)
 * ** An inconvenient replacement for this would be to maintain, per dependency, a list of trusted cryptographic hashes.
 * * Verify and create signatures for tar/zip archives. (Not implemented in composer.)
 * ** Use the same keys as for the git tag and distribute the detached signatures via the package repository. (There may be reserved fields for this, but it is not implemented in packagist nor satis.)
 * *** This is made difficult by the fact that the archives composer downloads from github.com change content each time they are requested.
 * ** Or use some way to verify a download with the signature from the tag.
 * * Build a web of trust and maintain, for each recursive dependency, a set of keys that are trusted to create releases. (Not implemented by composer. Not fully implemented in git nor GPG: one needs to parse GPG output to know.)
 * 4) End to end commit integrity. (I.e. signing every commit, see http://mikegerwitz.com/papers/git-horror-story for some details. This is not fully explained here.)
 * * To get commits that were not signed before into one's repository requires fetching them, reviewing them fully on the local machine, and then signing them.
 * * An approximate equivalent would be to never use anything from a remote except things that you fully reviewed locally.
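To make the "fail closed" idea concrete, here is a minimal sketch in shell of verifying a downloaded archive against a pinned hash and aborting on mismatch. The "archive" is a stand-in empty file and the pinned hash is simply the SHA-256 of empty input; real tooling would ship the pinned hashes out of band (e.g. next to composer.lock).

```shell
#!/bin/sh
# Fail-closed integrity check: refuse to proceed unless the artifact
# matches a hash that was pinned ahead of time.
set -e

# Stand-in for a downloaded package archive (an empty file here).
: > package.tar.gz

# Pinned ahead of time; this is the SHA-256 of empty input.
pinned="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

actual="$(sha256sum package.tar.gz | cut -d' ' -f1)"

if [ "$pinned" != "$actual" ]; then
    echo "integrity check failed, refusing to install" >&2
    exit 1
fi
echo "integrity OK: $actual"
```

The essential property is the unconditional `exit 1` on mismatch: a missing or wrong hash stops the build rather than silently installing the package.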

Suggestion
Implement end to end release integrity using GPG, based on signed git tags. Also use something equivalent for code not distributed through composer (like the mediawiki/core wmf branches) to avoid a weak link.