User:Roan Kattouw (WMF)/ResourceLoader and build steps

History note
ResourceLoader was developed in 2010. It predates modern bundlers written in JavaScript, like Webpack, Rollup, etc. Webpack is sometimes described as "Makefiles for the web", and in fact, we used to use Makefiles for concatenating and minifying JavaScript in 2009-2010. We wrote ResourceLoader because that was a pain. An explicit design goal of ResourceLoader was to avoid needing a build step and to do everything on the fly. This was partly for developer productivity reasons: versioning build output in the repo is a pain, and at the time most front-end engineers didn't know how to use command-line build steps (because that wasn't a thing in front-end land back then).

Because of the integration with MediaWiki, ResourceLoader is written in PHP. Nowadays, software that processes JavaScript code and CSS (like bundlers, transpilers, etc) is almost always written in JavaScript. But in 2010, nodejs was this crazy new idea that only some people had heard of (and npm was only a few months old when we started planning out RL). It seemed reasonable at the time to think that we wouldn't need very advanced transformations, and that we could write them all in PHP (in fact, we ported CSSJanus from Python to PHP). Nowadays, that isn't a reasonable assumption anymore, because transformations have gotten way more advanced, and they're all written in JS.

Transformation and bundling
ResourceLoader bundles assets and does some transformation on them, all implemented in PHP:


 * Concatenating/bundling JS/CSS files within a module
 * Bundling different types of resources (JS, CSS, i18n messages, HTML templates) in the same request
 * Bundling all modules loaded on a page in the same request
 * Minification of JS (using jsminplus) and CSS (using CSSMin, a homegrown minifier)
 * Remapping of relative URLs in CSS (to account for everything being served from a single endpoint) and embedding of small images as data URIs (both part of CSSMin)
 * RTL flipping of CSS when needed (using a PHP port of CSSJanus, which was originally a Google project but has now been adopted by WMF)
 * LESS -> CSS compilation (using less.php, which was also started by someone else then adopted by WMF)

Notably, ResourceLoader doesn't do transformations that there aren't (good) PHP implementations for. These include ES6->ES5 transpilation (babel) and more advanced CSS transformations (postcss); the only good implementations for those things are in JavaScript. ResourceLoader also doesn't analyze JS code to e.g. automatically discover dependencies or do tree shaking, like Webpack does. Instead, it requires the developer to list each file in the module definition. (However, RL does do automatic dependency tracking for images and imports in CSS/LESS.)
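
For example, a file-based module definition in an extension's extension.json might look roughly like this (the extension and file names here are hypothetical; `mediawiki.api` is a real core module, used as an example dependency):

```json
{
	"ResourceModules": {
		"ext.myExtension.viewer": {
			"localBasePath": "resources",
			"remoteExtPath": "MyExtension/resources",
			"scripts": [ "init.js", "viewer.js" ],
			"styles": [ "viewer.less" ],
			"messages": [ "myextension-viewer-title" ],
			"dependencies": [ "mediawiki.api" ]
		}
	}
}
```

Every script, style and message has to be listed here by hand; nothing is discovered by analyzing the code.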

Dependency management
ResourceLoader modules can express dependencies on each other in their module definitions. This dependency graph is sent from the server to the client-side runtime, which then uses it to ensure that the right modules are loaded (when loading a module, all of its dependencies are loaded too) and to ensure the right execution order (a module is only executed after its dependencies have been executed). The client-side runtime also manages the execution (and relative file path resolution) for per-file  calls.
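
The resolution step can be sketched as a depth-first walk over the dependency graph, so that dependencies are always scheduled before their dependents. This is a toy illustration, not the actual client-side runtime; the module names and graph are made up:

```javascript
// Made-up registry: module name -> declared dependencies.
const registry = {
  'ext.charts': { dependencies: ['lib.vendor', 'mediawiki.util'] },
  'lib.vendor': { dependencies: [] },
  'mediawiki.util': { dependencies: ['lib.vendor'] },
};

function resolve(name, seen = new Set(), order = []) {
  if (seen.has(name)) {
    return order; // already scheduled (also guards against cycles)
  }
  seen.add(name);
  for (const dep of registry[name].dependencies) {
    resolve(dep, seen, order); // dependencies first
  }
  order.push(name); // then the module itself
  return order;
}

console.log(resolve('ext.charts'));
// → [ 'lib.vendor', 'mediawiki.util', 'ext.charts' ]
```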

Caching
ResourceLoader aggressively caches the contents of modules on the client side. HTTP responses to module requests have headers instructing the user's browser to cache the response for 30 days. When a module does change, we invalidate this cache using a cache-busting query parameter. The module manifest in the startup module contains the version hash of each module. When the client requests modules, it includes a version parameter, set to a hash of the hashes of the modules it's requesting. The manifest itself is cached for a short time (5 minutes), so when a module changes, the client will receive the new version hash at most 5 minutes later, and will use a different version parameter in the URL the next time it requests the module. The browser's HTTP cache considers that a different URL, and ignores the previous cache entry.

ResourceLoader also stores the contents of every module it loads in localStorage (except for very large modules), and tries to load modules from there before making an HTTP request. This cache is more granular than the HTTP cache (per-module rather than per-request), and does not expire. Cache invalidation is done using the same module version hashes: if the version hash of the module as stored in localStorage doesn't match the one in the manifest, the localStorage entry is purged and the new version is downloaded from the server.
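
A toy version of this per-module cache, with a Map standing in for localStorage and made-up module names and hashes:

```javascript
const store = new Map(); // stand-in for localStorage

// Made-up manifest shipped by the server.
const manifest = { 'ext.charts': 'v2hash' };

function getModule(name, fetchFromServer) {
  const cached = store.get(`module:${name}`);
  if (cached && cached.version === manifest[name]) {
    return cached.source; // cache hit: no HTTP request needed
  }
  // Missing or stale: purge the entry and fetch the new version
  store.delete(`module:${name}`);
  const source = fetchFromServer(name);
  store.set(`module:${name}`, { version: manifest[name], source });
  return source;
}

let fetches = 0;
const fetchFromServer = (name) => { fetches++; return `/* code for ${name} */`; };
getModule('ext.charts', fetchFromServer); // miss: goes to the server
getModule('ext.charts', fetchFromServer); // hit: served from the store
console.log(fetches); // → 1
```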

Because of unhelpful quota management behavior in Firefox, we don't use localStorage for ResourceLoader caching in Firefox. There are (vague) plans to move away from localStorage in the future, and instead use the  API in conjunction with a service worker. This would essentially allow us to use the browser's HTTP cache as a per-module cache, avoid cluttering localStorage, and re-enable per-module caching in Firefox.

How the version hashes used for cache invalidation are computed varies between module types. The simplest strategy is to generate the contents of the module (as it'd be shipped to the client), then hash that. This is used for some types of modules, but for performance reasons, most modules (including simple, file-based modules) instead hash the contents of the files that form the module. This avoids having to perform expensive steps like LESS compilation just to see if the module has changed.

Varying on skin and language
For caching reasons, ResourceLoader doesn't allow module contents to vary on the user, the page title, or almost anything else (because any variation fragments the cache). The only things module responses vary on are the UI language and the skin the user is using. Varying on UI language is necessary for the bundling of i18n messages with modules to work, and to know when RTL flipping should be applied. Varying by skin is not strictly necessary, but is useful for loading skin-specific styles.

MediaWiki integration and non-file content
ResourceLoader integrates with MediaWiki in a lot of ways. Obviously it uses MediaWiki's extension registration infrastructure to let extensions register modules, and uses MW's i18n system for the i18n messages it exports, but it goes deeper than that. Features that work thanks to RL magic that integrates tightly with MediaWiki include:


 * Modules can load JS/CSS from wiki pages rather than files. This is used for site JS/CSS, user JS/CSS, and Gadgets
 * Configuration settings and user preferences are available in JS through the client-side runtime
 * Package modules can include virtual JSON files with dynamic content, including configuration variables or other server-side data. This dynamic content is generated by a PHP callback that runs inside MediaWiki
 * The LESS import path is set such that important mixin and variable definitions from MW core can always be imported
 * A module can inject the contents of i18n messages as LESS variables; this is used for CSS rules that need localized message text

Module loading and registration
As mentioned before, MediaWiki allows both core and extensions to register modules. It also performs the mundane but important task of gathering the list of modules to load on each page. Both core and extension code can ask for a module to be loaded from various hook points. Most frequently, this happens in the execute method of a special page or in a BeforePageDisplay hook, but extensions can also indicate that they want to load a module in a parser hook: the set of modules needed to display content on a page (e.g. graphs or maps) is stored in the parser cache, and MediaWiki ensures those modules are loaded when the page is displayed.

ResourceLoader also supports lazy-loading and conditional loading from JavaScript, through its client-side loader API.

ResourceLoader's strengths and weaknesses
Strengths:


 * Tight integration with MW allows for many convenient features, which help bring client-side and server-side code closer together
 * No build step is needed, so you can't forget to run it, you don't have to configure it, you don't have to deal with built blobs in your repo, and code updates on reload without needing to rebuild
 * The caching system is designed to handle module contents changing at unpredictable times, which is necessary for our wide range of use cases (e.g. i18n messages can be changed by users on-wiki at any time)
 * Using modules rather than bundles lets us deal with the many different "kinds" of page views that need to load different mixes of JS/CSS assets

Weaknesses:


 * Because processing and transformation steps are done on-the-fly, they have to be relatively fast, and they pretty much have to be implemented in PHP; most modern transformation tools are written in JS
 * Files, dependencies and i18n messages must be listed explicitly in a module definition: there's no automatic detection of what depends on what, or what uses which message. If you forget to list something, it'll just break at runtime
 * Tree shaking isn't supported: you can't load only a small piece (or a single file) from a module, it's all or nothing
 * Having a lot of modules (thousands) negatively impacts the performance of the overall system (because it bloats the module manifest), so you can't approximate tree shaking by splitting libraries into small modules either

Getting a trusted build output
The build output is usually unreviewable. Committing built output to the repo is annoying, but it's also a security risk: a developer's laptop is not a very secure build environment, it could be compromised, and code review wouldn't notice compromised build output. Running the build on the deployment server is scary too, because it means executing arbitrary code from npm, and a tool like webpack (or any of its many dependencies) could be compromised on npm. What we'd need is a controlled, sandboxed build environment that runs only trusted code and whose output we can trust. (None of this is needed if the build output is reviewable.)

Duplicated bundler runtime
Bundlers like Webpack are designed to produce the entire bundle for a site, not one of several sub-bundles. If multiple extensions each use Webpack to generate their own bundle, and more than one of those bundles is loaded on the same page, the Webpack runtime is loaded multiple times.

Non-global tree shaking leads to duplication
If two extensions use Webpack to tree-shake the same library, they'll duplicate parts of it.

Debugging is harder
Without source maps, or other special support, you won't be able to see the original (non-compiled) code in the browser debugger, or be able to tell which file it came from. RL has some of these problems already, but in-browser debugging is pretty workable even in non-debug mode thanks to deminification (stack traces are pretty useless in non-debug mode though). Loading a built file through RL makes these issues worse. This is partly solvable by loading files differently on dev setups (from a local nodejs server), but we also need to be able to debug in production.

You're going to use ResourceLoader anyway
For things like CSSJanus flipping and i18n messages, you can't pre-build the actual output in most cases, because it varies by language and skin, and there are too many combinations. Your module is also probably only loaded on some pages, or it may need to be lazy-loaded in some cases.

File concatenation and management
This is not as good in RL as in Webpack, but it's fairly reasonable.

Tree shaking
Trying to solve a global problem locally doesn't work: per-extension tree shaking leads to the duplication described above.

The way forward for ResourceLoader
I don't really know. Using modern transformations written in JS would be nice. It would get us ES6 support, TypeScript, pre-compiled Vue templates, better JS minification, better RTL flipping (I believe postcss's RTL plugin is better than CSSJanus), more interesting CSS transformations with postcss, and who knows what else. For many reasons, I think we'll probably want/need to continue to avoid build steps and prefer on-the-fly processing. One way to do this might be to move most of RL into a nodejs service, which talks to a PHP endpoint for some of the MW-specific things it needs. Another way could be to keep most of RL in PHP, but have it call out to JS implementations of the transformation steps, either by shelling out or through the v8js PHP extension. For slower processing steps, we might need caching with pre-population.