Requests for comment/Streamlining Composer usage

Background
MediaWiki core and its extensions depend on libraries that are managed via Composer. This RFC continues from where Requests for comment/Composer managed libraries for use on WMF cluster left off. Library infrastructure for MediaWiki will greatly increase the use of Wikimedia-maintained libraries. To avoid hindering this effort, we need to streamline the process for adding and upgrading Composer dependencies and for building and deploying with Composer.

Besides library dependencies, Composer can be used to build an autoloader for parts of core and parts of extensions. To make proper use of that, we need to ensure that our build and deployment process supports it.

For development purposes, most people currently run Composer in the root of MediaWiki. This loads the composer-merge-plugin, which merges all dependencies from other specified composer.json files, usually from extensions. For Wikimedia production deployment from the wmf branches, we do not use Composer directly but instead an intermediate repository, mediawiki/vendor, which is manually updated. In between we have the master development branches, continuous integration jobs and the beta cluster environment, which all currently use mediawiki/vendor. Each of the branches might need to use a different strategy for continuous integration in the future.
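To illustrate what merging dependencies from several composer.json files means, here is a minimal sketch. It is not the composer-merge-plugin's actual algorithm: it only collects the `require` entries of each file so that packages requested from more than one place become visible. The function name, the file paths and the constraints in the sample data are assumptions made for the example.

```python
from collections import defaultdict


def merge_requirements(manifests):
    """Collect every `require` entry from several composer.json-like dicts.

    `manifests` maps a label (for example a file path) to parsed JSON content.
    Returns {package: [(label, constraint), ...]}; no conflict resolution is
    attempted, that part is left to Composer itself.
    """
    merged = defaultdict(list)
    for label, manifest in manifests.items():
        for package, constraint in manifest.get("require", {}).items():
            merged[package].append((label, constraint))
    return dict(merged)


# Hypothetical sample data, not copied from any real repository.
manifests = {
    "composer.json": {
        "require": {"composer/installers": "1.0.21"},
    },
    "extensions/Wikibase/composer.json": {
        "require": {"composer/installers": "1.0.21", "diff/diff": "2.0.0"},
    },
}

merged = merge_requirements(manifests)
for package, sources in sorted(merged.items()):
    print(package, sources)
```

A package that appears with more than one source in the result is one where core and an extension both have to agree on a version.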

Wikidata (Wikibase, related extensions and dependencies)
Wikidata will bring in 19 more components, all maintained only by people trusted with Wikimedia merge rights (i.e. no components from outside Wikimedia). Its CI uses a single Composer run, as during development, instead of mediawiki/vendor. Once per day a virtual machine builds the Wikidata "extension". It contains all extensions needed for Wikidata.org and their dependencies. The build output is proposed as a patch to Gerrit for the mediawiki/extensions/Wikidata repository. It is then +2ed by a human. The Composer-generated autoloader is in use in these extensions and libraries.

Wikidata dependencies (already outdated, there are now more):

 * 1) composer/installers @v1.0.21 (already in core)
 * 2) data-values/common @0.2.3
 * 3) data-values/data-types @0.4.1
 * 4) data-values/data-values @1.0.0
 * 5) data-values/geo @1.1.4
 * 6) data-values/interfaces @0.1.5
 * 7) data-values/javascript @0.7.0
 * 8) data-values/number @0.4.1
 * 9) data-values/serialization @1.0.2
 * 10) data-values/time @0.7.0
 * 11) data-values/validators @0.1.2
 * 12) data-values/value-view @0.14.5
 * 13) diff/diff @2.0.0
 * 14) serialization/serialization @3.2.1
 * 15) wikibase/data-model @3.0.0
 * 16) wikibase/data-model-javascript @1.0.2
 * 17) wikibase/data-model-serialization @1.4.0
 * 18) wikibase/internal-serialization @1.4.0
 * 19) wikibase/javascript-api @1.0.3
 * 20) wikibase/serialization-javascript @2.0.3
 * 21) propertysuggester/property-suggester @2.2.0 (extension, would become submodule of core)
 * 22) wikibase/wikibase @dev-master (extension, would become submodule of core)
 * 23) wikibase/Wikidata.org @dev-master (extension, would become submodule of core)
 * 24) wikibase/wikimedia-badges @dev-master (extension, would become submodule of core)

wmf deployment branches
The wmf branches of extensions and so on are added as submodules to the wmf branch of mediawiki/core when a new wmf branch is created. Merges to a wmf branch in an extension automatically result in a commit in mediawiki/core that updates the respective submodule. (Introducing automatic submodule updates was a recent change.)

When a new wmf branch is created, care is taken to trigger the CI (see ). Example from 1.26wmf9:
 * 20c7219 - Submitting branch for review so that it gets tested by jenkins. refs T101551 (7 days ago)
 * 11015b2 - Creating new WMF 1.26wmf9 branch (7 days ago)
 * 6521b36 - Creating new WMF 1.26wmf9 branch (7 days ago)

Double Review
Upgrading a dependency of, e.g., the Wikibase extension would take the steps listed below if it were included in mediawiki/vendor.git. The final step is work that could be automated. As it stands, a human reviewer is the safeguard against problems that are not specified in enough detail to know which automatic mechanisms could prevent them instead, yet that human might not notice when something doesn't match. This is extra manual review work while Wikimedia can't even keep up with the normal influx of reviews.
 * 1) A patch to the dependency (e.g. wikibase/data-model) is proposed.
 * 2) It is reviewed and merged by a Wikimedian.
 * 3) A release for the dependency is done.
 * 4) A patch that updates the requirement in mediawiki/extensions/Wikibase.git is proposed, reviewed and merged.
 * 5) An update of mediawiki/vendor is proposed, causing a second review!

mediawiki/vendor creates a circular dependency
The update of a source composer.json and mediawiki/vendor would need to happen at the same time.

If the CI uses mediawiki/vendor, it fails because vendor was not updated. Bypassing CI breaks beta.

If the CI uses composer, beta may fail because mediawiki/vendor was not updated.

If mediawiki/vendor is changed first, beta breaks because the rest is not prepared for the new versions in vendor.

(A broken beta also means that any development system updated at that time is broken.)

This not only applies to the master branches but also to the wmf deployment branches as the submodule is updated automatically, so there is no chance to prepare mediawiki/vendor ahead of time and update composer.json and the vendor submodule in one commit.

Currently this is dealt with by overriding the CI and causing a temporary breakage that is then fixed by updating mediawiki/vendor or the other way around.
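The mismatch described in this section could at least be made visible by a check that compares what a source composer.json asks for with what mediawiki/vendor contains. The following is a minimal sketch and not an existing tool. It assumes exact version pins, assumes the vendor side is described by the `packages` list of a composer.lock file, and uses a made-up function name and a made-up 3.0.1 version for illustration.

```python
def find_vendor_mismatches(required, locked_packages):
    """Compare exact version pins against the packages recorded for vendor.

    `required` is the `require` section of a composer.json (name -> version).
    `locked_packages` is a list of {"name": ..., "version": ...} dicts, as
    found under `packages` in a composer.lock file.
    Returns a list of (name, wanted, installed_or_None) tuples.
    """
    locked = {p["name"]: p["version"].lstrip("v") for p in locked_packages}
    problems = []
    for name, wanted in sorted(required.items()):
        if "/" not in name:
            # Platform requirements such as "php" are not vendor packages.
            continue
        installed = locked.get(name)
        if installed != wanted.lstrip("v"):
            problems.append((name, wanted, installed))
    return problems


# Hypothetical example: the extension was bumped, vendor was not.
required = {"php": ">=5.3.3", "wikibase/data-model": "3.0.1", "diff/diff": "2.0.0"}
locked_packages = [
    {"name": "wikibase/data-model", "version": "3.0.0"},
    {"name": "diff/diff", "version": "2.0.0"},
]

mismatches = find_vendor_mismatches(required, locked_packages)
print(mismatches)
```

Such a check only reports the problem. It does not remove the ordering issue, since one of the two repositories still has to change first.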

Generated autoloader needs frequent updates
mediawiki/vendor.git needs the updated class map for the Composer generated autoloader. We use the optimized variant and thus adding a class means updating the autoloader. So adding a class to core and/or an extension may need an update to vendor.
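Why a new class forces a vendor update can be shown with a rough sketch: an optimized autoloader is a fixed map from class name to file, so any class that is not in the committed map cannot be found. The code below is an assumption-laden illustration, not how Composer generates its class map. It detects declarations with simple regular expressions, takes file contents as strings, and uses hypothetical file and class names.

```python
import re

NAMESPACE_RE = re.compile(r"^\s*namespace\s+([\w\\]+)\s*;", re.MULTILINE)
CLASS_RE = re.compile(
    r"^\s*(?:abstract\s+|final\s+)?(?:class|interface|trait)\s+(\w+)",
    re.MULTILINE,
)


def declared_classes(php_source):
    """Return fully qualified names declared in one PHP file (rough regex scan)."""
    namespace = NAMESPACE_RE.search(php_source)
    prefix = namespace.group(1) + "\\" if namespace else ""
    return {prefix + name for name in CLASS_RE.findall(php_source)}


def missing_from_class_map(sources, class_map):
    """List classes whose file is absent from, or differs in, the class map.

    `sources` maps file path -> PHP source text.
    `class_map` maps class name -> file path, like a committed class map.
    """
    missing = {}
    for path, text in sources.items():
        for name in declared_classes(text):
            if class_map.get(name) != path:
                missing[name] = path
    return missing


# Hypothetical files: Bar was added after the class map was last generated.
sources = {
    "includes/Foo.php": "<?php\nnamespace Example;\n\nclass Foo {}\n",
    "includes/Bar.php": "<?php\nnamespace Example;\n\nfinal class Bar {}\n",
}
class_map = {"Example\\Foo": "includes/Foo.php"}

stale = missing_from_class_map(sources, class_map)
print(stale)
```

A non-empty result means the committed class map would have to be regenerated before the new class can be loaded.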

version check in update.php is not sufficient
The version check in update.php currently only handles a very narrow case: libraries with an exact version specifier that are bundled with core via mediawiki/vendor. We can use the new composer/semver library to make this less terrible.
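The difference between an exact version check and a range check can be sketched as follows. This is not composer/semver, which is a PHP library, and it is not the code in update.php. It is a simplified illustration that understands only plain numeric versions, exact pins and the caret operator; anything else, such as `dev-master` or stability suffixes, raises an error. The function names are assumptions.

```python
def parse_version(text):
    """Turn '1.2' or 'v1.2.3' into a tuple of at least three integers."""
    parts = [int(part) for part in text.lstrip("v").split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(installed, constraint):
    """Check an installed version against an exact pin or a caret range.

    '^1.2' is read as >=1.2.0 and <2.0.0; for versions below 1.0 the upper
    bound is found by bumping the leftmost non-zero component instead.
    """
    version = parse_version(installed)
    if not constraint.startswith("^"):
        return version == parse_version(constraint)

    lower = parse_version(constraint[1:])
    for index, part in enumerate(lower):
        if part != 0:
            upper = lower[:index] + (part + 1,) + (0,) * (len(lower) - index - 1)
            break
    else:
        raise ValueError("caret constraints on 0.0.0 are not handled here")
    return lower <= version < upper


print(satisfies("1.0.2", "1.0.2"))  # exact pin
print(satisfies("1.4.0", "^1.2"))   # caret range
```

With an exact pin every patch release of a library needs a matching change to the requirement, whereas a range accepts any compatible release.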

operations/mediawiki-config
It uses Composer, currently only to pull in wikimedia/cdb. wikimedia/cdb uses a Composer-generated autoloader. The dependencies are embedded in the same git repo under multiversion/vendor. In theory a namespace/version conflict with mediawiki/core and/or mediawiki/vendor could happen.

Usage of github
Some of the parts here might be on GitHub. Manual:Developing_libraries suggests it is ok. The merge-plugin is hosted there.

Proposal
Most of the problems are the result of dealing with a build step (running Composer) in the same process step as patch validation/merge, instead of in a separate step before or during deployment.

The master branches (core, extensions and so on) will be changed to run Composer during CI, as is currently done for Wikidata.

The following is likely contentious, or at least somewhat unclear:

One proposed solution was to automatically build and commit mediawiki/vendor.

Another possibility would be to run Composer during deployment, i.e. during scap.
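As a rough idea of what running Composer as a deployment step could look like, the sketch below builds a `composer install` invocation. How this would hook into scap is not specified in this RFC, so the function names and the checkout path are purely hypothetical. The flags used are standard Composer command line options.

```python
import subprocess


def composer_install_command(checkout_dir):
    """Build a non-interactive, production-style `composer install` command."""
    return [
        "composer",
        "install",
        "--no-dev",               # skip require-dev packages
        "--optimize-autoloader",  # generate the optimized class map
        "--no-interaction",       # never prompt during an automated run
        "--working-dir=" + checkout_dir,
    ]


def run_composer(checkout_dir):
    """Run Composer in the given checkout and fail loudly on a non-zero exit."""
    subprocess.run(composer_install_command(checkout_dir), check=True)


# Hypothetical path; nothing is executed here, the command is only printed.
command = composer_install_command("/srv/example/mediawiki")
print(" ".join(command))
```

Generating the autoloader at this point would also avoid committing a class map to mediawiki/vendor, at the cost of making deployment depend on the package sources being reachable.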

Open questions
How do we update mediawiki/vendor.git automatically? Do we instead do this during scap? Would we then stop using mediawiki/vendor.git?

The person creating the wmf branches and deploying the train is likely affected the most by this. What do they think about this?

What about the last three problems that are not directly solved by only automating the vendor update?

TODO, reference:

https://www.mediawiki.org/wiki/Requests_for_comment/Extensions_continuous_integration

https://www.mediawiki.org/wiki/Requests_for_comment/Extension_registration

https://www.mediawiki.org/wiki/Requests_for_comment/Improving_extension_management