User:Aron Manning/RfC: Evaluate alternative Node package managers for improved package security

Prepared by Demian aka. Aron. Initial message to wikitech-l.

This RFC (descoped from T199004) is about evaluating alternative package managers with regard to:
 * security benefits
 * impact on development and deployment workflows
 * migration workload

The purpose is to test these package managers (PMs) without changing or disrupting established workflows. Developers can introduce one or both PMs into the codebase as an optional alternative for testing, without impacting production. This testing phase might take months and involve compatibility updates to Wikimedia projects and upstream packages. If the tests are successful, migrating production workflows from NPM to the chosen alternative can be considered.

The package managers in scope (details below, feel free to expand):
 * Npm (currently in use)
 * Yarn 2 (berry)
 * Pnpm

Motivation
The issue of package security has concerned the Node community for years. Packages in the NPM repository are vetted by the community only after publication, which gives attackers a chance to publish a malicious package version that is then deployed to developer boxes, and possibly servers, under the radar when packages are installed. Such malicious packages are detected only after some time, by which point they may have reached a number of users. This is a possibility to be prepared for, but constant fear would be unjustified: only a few cases have occurred over the years and those were quickly removed. The community is resilient and constructive, leaving little room for bad intent.

Nonetheless, even a short breach at a content provider of the size of Wikimedia could affect a great number of users, therefore it's appropriate to have prevention measures in place.

It's important to note that all public, open package repositories are subject to such malice, not just NPM (PyPI, for example). The difficulty with NPM is that there are many small packages, resulting in a bigger dependency tree (T199004) and significantly more packages to review than with, for example, PHP packages.

A few examples of NPM incidents

 * https://portswigger.net/daily-swig/new-npm-scanning-tool-sniffs-out-malicious-code : Mar 29, 2019
 * https://www.bleepingcomputer.com/news/security/npm-pulls-malicious-package-that-stole-login-passwords/ : Aug 21, 2019 - 'bb-builder'
 * https://medium.com/intrinsic/compromised-npm-package-event-stream-d47d08605502 : Nov 27, 2018 - 'event-stream'
 * https://snyk.io/blog/malicious-code-found-in-npm-package-event-stream/ : Nov 26, 2018 - 'event-stream' (downloaded 8 million times in 2.5 months)
 * https://www.bleepingcomputer.com/news/security/compromised-javascript-package-caught-stealing-npm-credentials/ : Jul 12, 2018 - 'eslint-scope'
 * https://eslint.org/blog/2018/07/postmortem-for-malicious-package-publishes : Jul 2018 - 'eslint-scope' (detected in 37 minutes, disabled 1h 10min later)
 * https://www.securityweek.com/malicious-eslint-packages-steal-software-registry-login-tokens : Jul 16, 2018 - 'eslint-scope' (stealing '.npmrc' - NPM access tokens)
 * https://www.bleepingcomputer.com/news/security/somebody-tried-to-hide-a-backdoor-in-a-popular-javascript-npm-package/ : May 3, 2018 - 'getcookies'->'express-cookies'->'http-fetch-cookies'->'mailparser' (ineffective)
 * https://www.bleepingcomputer.com/news/security/javascript-packages-caught-stealing-environment-variables/ : Aug 4, 2017 - 'hacktask.net'
 * https://iamakulov.com/notes/npm-malicious-packages/ : Aug 2, 2017 - 'crossenv'
 * https://nakedsecurity.sophos.com/2020/01/15/malicious-npm-package-taken-down-after-microsoft-warning/ : Jan 15, 2020 - '1337qq-js' (32 downloads)

Status quo
Currently Wikimedia is somewhat protected by pinning the version numbers of major libraries (the top-level dependencies of projects) and vetting those versions. The list is in libraryupgrader2. However, this does not control the versions of dependencies of dependencies. One existing attack vector is to gain control of the credentials of a minor library deep in the dependency tree and upload a new version with malicious code injected, hidden behind the declared functionality. Another is to create a library with future malicious intent and get some established libraries to use it before injecting the malicious code. Such code could be released to production with a train deployment if the 'package-lock.json' file on the master branch was updated by libraryupgrader2 or a developer after the release of the malicious version and before its removal from NPM. The timeframe is short and the attack is difficult to carry out, so the probability is small but non-zero.
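To make the gap between pinned top-level dependencies and the rest of the tree concrete, here is a minimal sketch that lists every locked package not declared directly in 'package.json'. It assumes the nested "dependencies" layout of a version 1 'package-lock.json'; the function name `transitive_only` and the package names in the sample data are hypothetical, and this is not how libraryupgrader2 works.

```python
def transitive_only(package_json: dict, package_lock: dict) -> dict:
    """Map each locked package that is not a direct dependency to its locked versions.

    Assumes a lockfileVersion 1 layout, where nested copies of a package sit
    under the "dependencies" key of the package that needs them.
    """
    direct = set(package_json.get("dependencies", {}))
    direct |= set(package_json.get("devDependencies", {}))
    found = {}

    def walk(deps: dict, top_level: bool) -> None:
        for name, info in deps.items():
            # Only a top-level entry with a directly declared name counts as direct.
            if not (top_level and name in direct):
                found.setdefault(name, set()).add(info.get("version", "?"))
            walk(info.get("dependencies", {}), False)

    walk(package_lock.get("dependencies", {}), True)
    return {name: sorted(versions) for name, versions in sorted(found.items())}


# Hypothetical sample data: one vetted top-level library and two deeper packages.
sample_pkg = {"devDependencies": {"top-lib": "1.0.0"}}
sample_lock = {
    "lockfileVersion": 1,
    "dependencies": {
        "top-lib": {"version": "1.0.0", "requires": {"mid-lib": "^2.0.0"}},
        "mid-lib": {
            "version": "2.1.0",
            "dependencies": {"deep-lib": {"version": "0.3.1"}},
        },
    },
}

print(transitive_only(sample_pkg, sample_lock))
```

In the sample, 'mid-lib' and 'deep-lib' are reported: these are the packages whose versions the top-level pinning alone would not control.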

Package managers
Without aiming for completeness, there are 2 recent solutions that specifically target security:

Yarn 2

 * 1) Introductory article
 * 2) Yarn 2 was released in 2020. It is a recent, fundamentally different version from (Classic) Yarn 1 (offline mirror), which made only minor improvements to npm.
 * 3) A central offline cache can be synced to individual nodes through a git repository, or presumably other means, thus guaranteeing complete control over outside source-code entering the Wikimedia ecosystem.
 * 4) With Yarn 2, package files are loaded directly from the offline cache (doc), thereby eliminating the 'node_modules' folders and the duplicated packages, and speeding up the install step. This breaks many packages that assume the presence of 'node_modules'.
 * 5) Packages using resolve 1.9.0 (published 2 years ago) are compatible (popular packages).
 * 6) Some upstream packages and Wikimedia projects need to be upgraded to use 'resolve'. In the meantime an adapter called 'pnpify' can be used to execute these. If 'pnpify' fails, the package can be 'unplugged', or a traditional 'node_modules' can be created with the nodeLinker: node-modules setting (migration).
 * 7) In strict mode only direct dependencies are reachable by packages. This is the correct behavior, but NPM's flattening of node_modules allowed packages to load indirect dependencies too. Such packages need to be upgraded, or the pnpMode: loose setting needs to be set.
 * 8) Migration: CLI, Q&A
 * 9) Yarn supports workspaces, but uses a confusing vocabulary where Project > Worktree > Workspace. Our git repos (skins, extensions) would be called "workspace". The "core" repo would be the "project" or "worktree".
 * 10) Script names containing ':' are global, can be executed in any "workspace" of the "project". Scripts like 'svgmin' need to be defined only in the root package.json.
 * 11) '.yarn/cache' is stored only in the root.
 * 12) The classic version was created by Facebook, Exponent, Google, and Tilde (article); GitHub lists 511 contributors. The active developers now are not Facebook employees (ref: Q&A). Although Yarn 2 was released just half a year ago, its popularity helps with its main drawback: incompatibility.

Pnpm

 * 1) Pnpm package store.
 * 2) Pnpm was released in 2017, after Classic Yarn.
 * 3) Pnpm deduplicates packages using symlinks and recreates node_modules, but includes only direct (top-level) dependencies, without flattening the dependency tree.
 * 4) Packages can load only direct dependencies. Other than that there aren't many complications and setup is easier: if a tool is missing a package, add it to 'devDependencies'.
 * 5) Supports workspaces with a simple vocabulary: Workspace > Project. Node packages are linked to the instance in the workspace root if the version constraints allow it. This is the case with libraryupgrader2 keeping dependency versions consistent across projects.
 * 6) Drawback of its workspace implementation: 'pnpm install' anywhere in the workspace attempts to create/fix the 'node_modules' folder of all projects in the workspace, even if the developer only needs to do that for one project. If installing a package fails in any of the repos, the install process fails for all repos, creating an unusable state. This might be a reason to not use workspaces, or find a way around this behavior.
 * 7) Pnpm was created by one person, is sponsored by 2 companies, is used by quite a few known companies, and GitHub lists 61 contributors.
 * 8) An article briefly mentioning why Pnpm workspaces were simpler to set up than Yarn 2's.
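The strict rule shared by Yarn 2's strict mode and Pnpm (a package may load only what it declares) can be modelled with a small sketch. This is only an illustration of the rule, not either PM's implementation; the function names, the short builtin list, and the sample package names are assumptions made for the example.

```python
def package_name(specifier: str):
    """Map a module specifier to a package name, or None for relative/absolute paths."""
    if specifier.startswith((".", "/")):
        return None
    parts = specifier.split("/")
    if specifier.startswith("@"):
        # Scoped packages keep their scope: "@scope/name/sub" -> "@scope/name".
        return "/".join(parts[:2])
    return parts[0]


def undeclared(manifest: dict, specifiers, builtins=frozenset({"fs", "path", "os"})):
    """Return package names that the code loads but the manifest does not declare.

    `builtins` is a deliberately short, assumed list of Node core modules.
    """
    declared = set()
    for section in ("dependencies", "devDependencies",
                    "peerDependencies", "optionalDependencies"):
        declared |= set(manifest.get(section, {}))
    names = {package_name(s) for s in specifiers}
    return sorted(n for n in names if n and n not in declared and n not in builtins)


# Hypothetical project: declares 'lib-a' but also loads two undeclared packages.
sample_manifest = {"dependencies": {"lib-a": "1.0.0"}}
sample_specifiers = ["lib-a/sub", "./local", "fs", "lib-b", "@scope/lib-c/util"]

print(undeclared(sample_manifest, sample_specifiers))
```

Under a flattened node_modules the two undeclared packages might load by accident; under strict resolution they would fail until added to 'package.json'.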

npm

 * For completeness it should be mentioned that full control over package versions is achievable with plain old NPM by generating 'package-lock.json' from libraryupgrader2, or by using a 'package.json' generated with the exact versions of all dependencies, direct and indirect. If developers use `npm ci` instead of `npm i`, their installed versions will match too.
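As an illustration of the "exact version" requirement, the following sketch flags entries in a 'package.json' whose version spec is a range or tag rather than a single exact version. The regular expression is a simplified approximation of an exact semver string, and the function name `unpinned` and the sample package names are hypothetical.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH with optional pre-release and build parts.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def unpinned(package_json: dict) -> dict:
    """Return {name: spec} for dependencies not pinned to one exact version."""
    loose = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in package_json.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                loose[name] = spec
    return loose


# Hypothetical manifest mixing exact versions and ranges.
sample_manifest_versions = {
    "dependencies": {"lib-a": "1.2.3", "lib-b": "^2.0.0"},
    "devDependencies": {"tool-c": "~0.4.1", "tool-d": "3.0.0-beta.1"},
}

print(unpinned(sample_manifest_versions))
```

A check like this could run in CI to reject range specifiers, though it says nothing about indirect dependencies, which still need the lock file.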

Common
The configuration and lock files of all 3 PMs can be checked into the same repository without conflict. The PM to use on any box can be chosen independently, allowing an alternative PM to be tested while NPM is used in production. Depending on usability and personal preference, it's also an option for developers to choose their favored PM. The workspace management features and simple setup might be reasons to choose one PM over the other.


 * 1) 'package.json' is shared, the package versions installed are expected to be the same if run at the same time as NPM (same state of NPM repo).
 * 2) 'package-lock.json' is NOT shared, there are individual 'yarn.lock' and 'pnpm-lock.yaml' files, these need to be generated separately.
 * 3) Plugins often use 'peerDependencies' to refer to their main package (eg. eslint-plugin-*, stylelint-plugin-*). Both PMs are stricter than NPM and require that the main package is properly declared as a dependency alongside the plugins. Some Wikimedia 'package.json' files need to be updated to satisfy this constraint, most notably by adding 'eslint', 'stylelint' and their plugins. This strictness might be unwelcome at first, but is quickly adapted to.
 * 4) Grunt currently can't load plugins with either PM. Issue submitted upstream. Grunt has lost momentum in recent years, so I'll be looking into submitting a PR.
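The peer dependency constraint in point 3 can be checked mechanically. The sketch below takes a project's 'package.json' and the manifests of its plugins, and reports peer packages that the project does not declare itself. The function name `missing_peers` and the plugin 'eslint-plugin-example' are hypothetical; neither PM is claimed to perform the check this way.

```python
def missing_peers(project: dict, plugin_manifests: dict) -> dict:
    """Map each undeclared peer package to the plugins that require it.

    `plugin_manifests` maps a plugin name to that plugin's package.json content.
    """
    declared = set(project.get("dependencies", {}))
    declared |= set(project.get("devDependencies", {}))
    missing = {}
    for plugin, manifest in plugin_manifests.items():
        for peer in manifest.get("peerDependencies", {}):
            if peer not in declared:
                missing.setdefault(peer, []).append(plugin)
    return {peer: sorted(plugins) for peer, plugins in sorted(missing.items())}


# Hypothetical project that lists a plugin but not the plugin's main package.
sample_project = {"devDependencies": {"eslint-plugin-example": "1.0.0"}}
sample_plugins = {
    "eslint-plugin-example": {"peerDependencies": {"eslint": ">=6.0.0"}},
}

print(missing_peers(sample_project, sample_plugins))
```

Here 'eslint' is reported as missing, which matches the kind of update described above: adding the main package to the project's own 'package.json'.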

Scope
This ticket is concerned only with the tool distributing already audited packages. Code auditing practices and tools are out of scope. It is assumed that libraryupgrader2 (or similar) would be used to track all packages in our dependency trees, down to the last leaf. If the matter of tracking vetted package versions needs to be discussed, please open a separate ticket.

Questions

 * 1) Security:
    * How much control does it give over package versions and code integrity?
    * How can it be integrated with libraryupgrader2?
    * How much delay is there between a version being published on NPM and being delivered to 1) developers, 2) CI, 3) the pre-deploy build?
    * How long is a version exposed to CI before being delivered to 1) developers, 2) the pre-deploy build?
    * How long is a version exposed to developers before being used for the pre-deploy build?
 * 2) Progressive transition:
    * Is the PM usable in parallel with NPM to allow preparation without disrupting the established processes?
    * What features require committing to changing those processes, and what's the impact?
    * What's the impact of changing the PM on CI nodes and developer machines?
 * 3) Changes to package loading:
    * How to "unplug" (Yarn terminology) packages incompatible with the alternative package store (node_modules)?
    * What packages need to be unplugged or upgraded to load dependencies (those using custom loaders rather than Node's standard module loading)?
    * What steps are necessary to make Wikimedia projects compatible with the PM, in detail:
       * The install process works without warnings.
       * Installed packages work (load) as expected.
       * Scripts work as with NPM.
 * 4) What's the resource usage:
    * Storage space required for the package store (central), local cache (each CI node and developer box), and installed packages (in the workspace and each project).
    * Network usage vs cache usage.
    * Install and update time.
 * 5) Usability:
    * What changes for developers?
    * What are the common problems encountered by users?
    * Is the developer experience improved?
 * 6) Workspaces:
    * Is it suitable for our multi-repository setup?
    * What benefits does it give for managing the multiple projects that make up a MediaWiki developer instance?