User:Roan Kattouw (WMF)/ResourceLoader and build steps

History note
ResourceLoader was developed in 2010. It predates modern bundlers written in JavaScript, like Webpack, Rollup, etc. Webpack is sometimes described as "Makefiles for the web", and in fact, we used to use Makefiles for concatenating and minifying JavaScript in 2009-2010. We wrote ResourceLoader because that was a pain. An explicit design goal of ResourceLoader was to avoid needing a build step and to do everything on the fly. This was partly for developer productivity reasons: versioning build output in the repo is a pain, and at the time most front-end engineers didn't know how to use command-line build steps (because that wasn't a thing in front-end land back then).

Because of the integration with MediaWiki, ResourceLoader is written in PHP. Nowadays, software that processes JavaScript code and CSS (like bundlers, transpilers, etc) is almost always written in JavaScript. But in 2010, nodejs was this crazy new idea that only some people had heard of (and npm was only a few months old when we started planning out RL). It seemed reasonable at the time to think that we wouldn't need very advanced transformations, and that we could write them all in PHP (in fact, we ported CSSJanus from Python to PHP). Nowadays, that isn't a reasonable assumption anymore, because transformations have gotten way more advanced, and they're all written in JS.

Transformation and bundling
ResourceLoader bundles assets and does some transformation on them, all implemented in PHP:


 * Concatenating/bundling JS/CSS files within a module
 * Bundling different types of resources (JS, CSS, i18n messages, HTML templates) in the same request
 * Bundling all modules loaded on a page in the same request
 * Minification of JS (using jsminplus) and CSS (using CSSMin, a homegrown minifier)
 * Remapping of relative URLs in CSS (to account for everything being served from ) and embedding of small images as data URIs (both part of CSSMin)
 * RTL flipping of CSS when needed (using a PHP port of CSSJanus, which was originally a Google project but has now been adopted by WMF)
 * LESS -> CSS compilation (using less.php, which was also started by someone else then adopted by WMF)
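
As a rough illustration of the URL remapping step in the list above, the sketch below rewrites relative url() references so they still resolve once the stylesheet is served from a different location. It is written in Python purely for illustration (the real logic is part of CSSMin, in PHP). The function name remap_urls, the regular expression, and the example paths are assumptions, not ResourceLoader's actual code.

```python
import posixpath
import re

# Matches url(...) with optional single or double quotes around the target.
URL_PATTERN = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")

# Targets that are already absolute (or are not file paths) are left untouched.
ABSOLUTE_PREFIXES = ("data:", "http:", "https:", "/", "#")


def remap_urls(css, source_dir):
    """Rewrite relative url() targets against source_dir.

    source_dir is the (hypothetical) web path of the directory that the
    original stylesheet lives in, e.g. "/w/ext/Example/styles".
    """

    def rewrite(match):
        quote, target = match.group(1), match.group(2).strip()
        if target.startswith(ABSOLUTE_PREFIXES):
            return match.group(0)
        # Resolve "../" segments so the result is a clean absolute path.
        resolved = posixpath.normpath(posixpath.join(source_dir, target))
        return f"url({quote}{resolved}{quote})"

    return URL_PATTERN.sub(rewrite, css)
```

Embedding small images as data URIs would hook into the same rewrite step: instead of returning a remapped path, it would return a data: URI when the referenced file is below some size threshold.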

Notably, ResourceLoader doesn't perform transformations for which there is no (good) PHP implementation. These include ES6->ES5 transpilation (babel) and more advanced CSS transformations (postcss); the only good implementations of those are written in JavaScript. ResourceLoader also doesn't analyze JS code to, for example, automatically discover dependencies or do tree shaking, as Webpack does. Instead, it requires the developer to list each file in the module definition. (However, RL does do automatic dependency tracking for images and imports in CSS/LESS.)

Dependency management
ResourceLoader modules can express dependencies on each other in their module definitions. This dependency graph is sent from the server to the client-side runtime, which then uses it to ensure that the right modules are loaded (when loading a module, all of its dependencies are loaded too) and to ensure the right execution order (a module is only executed after its dependencies have been executed). The client-side runtime also manages the execution (and relative file path resolution) for per-file  calls.
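
The two guarantees described above (dependencies are loaded along with a module, and a module executes only after its dependencies) amount to a depth-first walk of the dependency graph. The sketch below shows one minimal way to do that. It is in Python for illustration only, since the real runtime is client-side JavaScript, and the function name, the graph format, and the module names are all hypothetical.

```python
def resolve_load_order(requested, dependencies):
    """Return every module needed for `requested`, dependencies first.

    `dependencies` maps a module name to the list of modules it depends on.
    """
    order = []
    done = set()
    in_progress = set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"Circular dependency involving {name!r}")
        in_progress.add(name)
        for dependency in dependencies.get(name, []):
            visit(dependency)
        in_progress.discard(name)
        done.add(name)
        # A module is appended only after all of its dependencies.
        order.append(name)

    for name in requested:
        visit(name)
    return order


# Hypothetical dependency graph, as it might appear in a module manifest.
EXAMPLE_GRAPH = {
    "ext.example": ["lib.a", "lib.b"],
    "lib.a": [],
    "lib.b": ["lib.base"],
    "lib.base": [],
}
```

Calling resolve_load_order(["ext.example"], EXAMPLE_GRAPH) yields an order in which lib.base precedes lib.b, and both libraries precede ext.example.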

Caching
ResourceLoader aggressively caches the contents of modules on the client side. HTTP responses to  requests have headers instructing the user's browser to cache the response for 30 days. When a module does change, we invalidate this cache using a cache busting query parameter. The module manifest in the startup module contains the version hash of each module. When the client sends requests to, it includes a   parameter, set to a hash of the hashes of the modules it's requesting. The manifest itself is cached for a short time (5 minutes), so when a module changes, the client will receive the new version hash at most 5 minutes later, and use a different version parameter in the URL the next time it requests the module. The browser's HTTP cache considers that a different URL, and ignores the previous cache entry.
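
The cache-busting scheme can be sketched as follows. This is an illustration under assumptions, not ResourceLoader's code: the endpoint path, the query parameter names (modules, version), the choice of SHA-256, and the truncation length are all placeholders.

```python
import hashlib
from urllib.parse import urlencode


def combined_version(module_names, manifest):
    """Hash the version hashes of the requested modules into one value.

    `manifest` maps each module name to its current version hash.
    """
    digest = hashlib.sha256()
    for name in sorted(module_names):
        digest.update(manifest[name].encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent hashes cannot run together
    return digest.hexdigest()[:8]


def build_request_url(endpoint, module_names, manifest):
    """Build a request URL whose query string changes whenever a module does."""
    query = urlencode({
        "modules": "|".join(sorted(module_names)),
        "version": combined_version(module_names, manifest),
    })
    return f"{endpoint}?{query}"
```

Because the version value is part of the URL, a long cache lifetime on the response is safe: once the manifest reports a new hash for any requested module, the client builds a different URL, and the old cache entry is simply never asked for again.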

ResourceLoader also stores the contents of every module it loads in localStorage (except for very large modules), and tries to load modules from there before making an HTTP request. This cache is more granular than the HTTP cache (per-module rather than per-request), and does not expire. Cache invalidation is done using the same module version hashes: if the version hash of the module as stored in localStorage doesn't match the one in the manifest, the localStorage entry is purged and the new version is downloaded from the server.
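
A minimal sketch of this per-module cache is shown below, with a plain dictionary standing in for localStorage. The class name, the size limit, and the fetch callback are hypothetical and exist only to illustrate the lookup, purge, and store steps.

```python
class ModuleStore:
    """Per-module cache keyed by name and validated by version hash."""

    def __init__(self, fetch, max_size=100_000):
        self._entries = {}          # module name -> (version, content)
        self._fetch = fetch         # called on a miss: fetch(name) -> content
        self._max_size = max_size   # very large modules are not stored

    def load(self, name, manifest_version):
        entry = self._entries.get(name)
        if entry is not None:
            stored_version, content = entry
            if stored_version == manifest_version:
                return content       # hit: no request needed
            del self._entries[name]  # stale: purge before re-downloading

        content = self._fetch(name)
        if len(content) <= self._max_size:
            self._entries[name] = (manifest_version, content)
        return content
```

Entries never expire on their own here. The only thing that evicts one is a version mismatch against the manifest, which mirrors the invalidation rule described above.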

Because of unhelpful quota management behavior in Firefox, we don't use localStorage for ResourceLoader caching in Firefox. There are (vague) plans to move away from localStorage in the future, and instead use the  API in conjunction with a service worker. This would essentially allow us to use the browser's HTTP cache as a per-module cache, avoid cluttering localStorage, and re-enable per-module caching in Firefox.

How the version hashes used for cache invalidation are computed varies between module types. The simplest strategy is to generate the contents of the module (as it'd be shipped to the client), then hash that. This is used for some types of modules, but for performance reasons, most modules (including simple, file-based modules) instead hash the contents of the files that form the module. This avoids having to perform expensive steps like LESS compilation just to see if the module has changed.
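
The two strategies can be contrasted in a short sketch. The function names, the hash algorithm, and the idea of passing the build step as a callable are assumptions made for illustration; the point is only that the second function never runs the (potentially expensive) build.

```python
import hashlib
from pathlib import Path


def version_from_output(build_module):
    """Simplest strategy: build the module, then hash what would be shipped."""
    return hashlib.sha256(build_module().encode("utf-8")).hexdigest()


def version_from_sources(paths):
    """Cheaper strategy: hash the raw source files, skipping compilation."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in paths):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(Path(path).read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

The source-based hash changes whenever any listed file changes, even if the compiled output would have been identical. That trades an occasional unnecessary invalidation for not having to run steps like LESS compilation on every version check.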

Varying on skin and language
For caching reasons, ResourceLoader doesn't allow module contents to vary on the user, the page title, or almost anything else (because any variation fragments the cache). The only things  responses vary on are the UI language and skin the user is using. Varying on UI language is necessary for the bundling of i18n messages with modules to work, and to know when RTL flipping should be applied. Varying by skin is not strictly necessary, but is useful for loading skin-specific styles.
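
One way to picture this restriction is as a cache key that deliberately ignores everything except the requested modules, the language, and the skin. The sketch below is illustrative only: the type, the field names, and the short list of right-to-left languages are assumptions.

```python
from typing import NamedTuple, Tuple

# Illustrative subset only; a real list of right-to-left languages is longer.
RTL_LANGUAGES = frozenset({"ar", "fa", "he", "ur"})


class RequestContext(NamedTuple):
    modules: Tuple[str, ...]
    lang: str
    skin: str
    user: str
    page_title: str


def cache_key(ctx):
    """Only modules, language, and skin contribute to the key.

    The user and page title are ignored on purpose, so every user and
    every page shares the same cached response.
    """
    return (tuple(sorted(ctx.modules)), ctx.lang, ctx.skin)


def needs_rtl_flip(lang):
    """RTL flipping depends only on the UI language."""
    return lang in RTL_LANGUAGES
```

Two requests that differ only in user or page title map to the same key, while changing the language or the skin produces a distinct one.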

MediaWiki integration and non-file content
 * RLWikiModule
 * RLSkinModule
 * Config vars
 * User settings
 * Dynamic JSON files
 * LESS import path (in extensions, without needing to know the relative path to core)
 * i18n messages as LESS vars
 * User scripts/styles
 * Gadgets

Module loading and registration
MediaWiki integration gathers module registrations across core and extensions, gathers the list of modules to load on each page (from core and extensions), and allows lazy-loading.

ResourceLoader's strengths and limitations
Strength: integration with MW allows for wiki pages and config as content

Weakness: on-the-fly processing must be fast and be in PHP

Weakness: doesn't analyze JS to do automatic dependency detection / tree shaking; instead, developers have to manually define a module's files and dependencies

Getting a trusted build output
Build output is usually unreviewable. Committing build output to the repo is annoying, and it is also a security risk: the laptop of the developer who runs the build is not a very secure environment and could be compromised, and code review wouldn't notice compromised build output. Running the build on the deployment server is scary because it means executing arbitrary code from npm, and a package like Webpack could itself be compromised on npm. We need a controlled, sandboxed build environment that runs only trusted code and whose output we can trust. (This isn't needed when the output is reviewable.)

Duplicated bundler runtime
Bundlers like Webpack are intended to produce the entire bundle, not one of several sub-bundles. If multiple extensions each use Webpack to generate their own bundle, and more than one of those bundles is loaded on a page, the Webpack runtime is loaded more than once.

Non-global tree shaking leads to duplication
If two extensions use Webpack to tree-shake the same library, they'll duplicate parts of it.

Debugging is harder
Without source maps, or other special support, you won't be able to see the original (non-compiled) code in the browser debugger, or be able to tell which file it came from. RL has some of these problems already, but in-browser debugging is pretty workable even in non-debug mode thanks to deminification (stack traces are pretty useless in non-debug mode though). Loading a built file through RL makes these issues worse. This is partly solvable by loading files differently on dev setups (from a local nodejs server), but we also need to be able to debug in production.

You're going to use ResourceLoader anyway
You'll still need ResourceLoader for things like CSSJanus and i18n messages. You can't pre-build the actual output in most cases, because it varies by language and skin, and there are too many combinations of those. Your module is also probably only loaded on some pages, or it may need to be lazy-loaded in some cases.

File concatenation and management
This is not as good in RL as in Webpack, but it's fairly reasonable.

Tree shaking
Trying to solve a global problem locally doesn't work

The way forward for ResourceLoader
I don't really know. Using modern transformations written in JS would be nice. It would get us ES6 support, TypeScript, pre-compiled Vue templates, better JS minification, better RTL flipping (I believe postcss's RTL plugin is better than CSSJanus), more interesting CSS transformations with postcss, and who knows what else. For many reasons, I think we'll probably want/need to continue to avoid build steps and prefer on-the-fly processing. One way to do this might be to move most of RL into a nodejs service, which talks to a PHP endpoint for some of the MW-specific things it needs. Another way could be to keep most of RL in PHP, but have it call out to JS implementations of the transformation steps, either by shelling out or through the v8js PHP extension. For slow processing steps, we might need caching with pre-population.