ResourceLoader/Architecture

This page documents the architecture and features of ResourceLoader. ResourceLoader is Wikipedia's delivery system for JavaScript, CSS, interface icons, and localisation text.

See also Presentations for recorded tech talks and slide decks that explain these features in audio-visual form.

Principles
ResourceLoader principles, in order of their relative importance:
 * 1) Users.
 * 2) Developers.
 * 3) Servers.

These are inspired by the W3C Design Principles.

User experience
First and foremost is the end-user's experience. The user experience here is defined as the perceived performance of the overall system. This includes key metrics for the loading of content and interactive features.

Developer experience
Once a goal has been set for the user experience, the developer experience comes right after that. This means we strive for the best possible developer experience, as long as it doesn't compromise the user experience or make it less likely for that experience to be realised.

The public interface presented to developers should:


 * not expose its internals,
 * be easy to understand and comprehend,
 * provide good defaults,
 * make it easy to do the right thing for users,
 * make other things possible.

Backend performance
Lower backend latency directly improves the end-user experience. If that means the server can handle more load as a result, that's great. But if it means using the same or more resources in a shorter amount of time (e.g. more RAM, CPU, or an extra server), that may be acceptable as well. The frontend costs (bandwidth, computation, and latency) must take precedence over server-side optimizations.

Having said that, ResourceLoader is highly scalable. Its server responses are always cacheable and applicable to all users, allowing it to scale to a large deployment (such as Wikipedia) with only a handful of backend servers. This means that unlike for MediaWiki page views, ResourceLoader assets fully utilize the CDN even for registered users.

In July 2011, Wikipedia had about 400 servers (CDN edge servers and application servers). Our CDN served 90,000 requests per second at peak, of which 40,000 were for ResourceLoader (e.g. JS and CSS resources). These 40,000 requests were served worldwide by only 9 Varnish frontend servers and 4 backend application servers. The cache hit ratio was 99.82%, resulting in only 73 req/s cache misses toward the backends.

In September 2019, the cache hit ratio over a 2-day period was 99.86%, with a hit-peak of 31,000/s and a miss-peak of 55/s (in total the edge saw 7,500,000 cache hits and 9,800 cache misses over the 2 days). Our ongoing optimisations and cache defragmentation have thus lowered the overall request volume to ResourceLoader, despite Wikipedia's overall growth in traffic over the same decade.

Modules
ResourceLoader works with a concept of modules. A module is a bundle of resources identified by a symbolic name. They can contain any of the following types of resources:
 * Scripts
 * Styles
 * Messages

Aside from that a module may have several properties:
 * Dependencies
 * Group

All in all, this makes it possible to enqueue or load a module bundle by just using its name (instead of listing out all the resources and/or dependencies etc.).

Multiple module bundles are delivered to the client in a single request. More about this follows in the resource sections below. The response is unpacked by the Client.

Resource: Scripts

Wrapping
When one or more module bundles are sent to the client, the script components are wrapped in a closure. They are not immediately executed when the browser parses the script response. Instead, the closure is passed to the ResourceLoader client. This allows the client to control the order in which the closures execute according to the dependency tree (which the client has in memory), independently of the order in which they arrive from the server. See the Client section for more information about the loading procedure and a walkthrough of example scenarios.
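The effect of wrapping can be illustrated with a minimal Python sketch. It is not the real client: the class name `Loader` and the method `implement` are hypothetical stand-ins, and the closures are plain Python callables. The point it shows is that a script arriving before its dependency is held back until that dependency has run.

```python
from typing import Callable, Dict, List


class Loader:
    """Toy stand-in for the client: holds closures until their dependencies ran."""

    def __init__(self, dependencies: Dict[str, List[str]]) -> None:
        self.dependencies = dependencies  # dependency tree kept in memory
        self.pending: Dict[str, Callable[[], None]] = {}
        self.executed: List[str] = []  # execution order, for inspection

    def implement(self, name: str, closure: Callable[[], None]) -> None:
        """Receive a wrapped script. It is stored on arrival, not run."""
        self.pending[name] = closure
        self._run_ready()

    def _run_ready(self) -> None:
        """Run every pending closure whose dependencies have all executed."""
        progress = True
        while progress:
            progress = False
            for name in list(self.pending):
                deps = self.dependencies.get(name, [])
                if all(dep in self.executed for dep in deps):
                    closure = self.pending.pop(name)
                    closure()
                    self.executed.append(name)
                    progress = True


log: List[str] = []
loader = Loader({"app": ["lib"], "lib": []})
# "app" arrives first, but depends on "lib", so it has to wait.
loader.implement("app", lambda: log.append("app ran"))
loader.implement("lib", lambda: log.append("lib ran"))
```

After both calls, `loader.executed` is `["lib", "app"]` even though "app" arrived first.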

Minification
All scripts are minified before being put in the bundle. For this we use the JavaScriptMinifier library. In case of a cache-miss, the minification is done on-the-fly on the web server. See also the Caching section for more about the performance of packaging and the caching infrastructure around it.

Conditions
Scripts can be conditionally included in a module based on the context of the requesting client (e.g. language code and skin ID). This keeps responses relatively small by only including components relevant to the client context.

Example uses of conditions:
 * The bundle containing a language grammar parser includes a different implementation based on the user language.
 * The bundle containing the logic for rendering Notification includes an extra stylesheet file optionally provided by the user's currently preferred skin. The Vector skin component can register this custom stylesheet which is picked up by the Notification bundle owned by a different component.
 * Moment.js has regional definitions for 62 different languages. Only one of the regional definition files will be included at run-time. Language chaining and fallbacks are handled by MediaWiki's localization framework.
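As a rough sketch of how context-based composition could work, the function below picks the files for one request from a module definition. The dictionary keys (`scripts`, `languageScripts`, `skinScripts`) and the file names are illustrative assumptions, not a statement of the actual definition format.

```python
from typing import Any, Dict, List


def select_scripts(definition: Dict[str, Any], lang: str, skin: str) -> List[str]:
    """Return only the script files relevant to the requesting context."""
    files = list(definition.get("scripts", []))  # always included
    files += definition.get("languageScripts", {}).get(lang, [])  # per language
    files += definition.get("skinScripts", {}).get(skin, [])  # per skin
    return files


# Hypothetical module with one shared file and two language-specific files.
grammar_module = {
    "scripts": ["grammar.js"],
    "languageScripts": {"he": ["grammar.he.js"], "fi": ["grammar.fi.js"]},
}

hebrew_files = select_scripts(grammar_module, lang="he", skin="vector")
english_files = select_scripts(grammar_module, lang="en", skin="vector")
```

A Hebrew request receives the shared file plus the Hebrew implementation, while an English request receives the shared file only.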

Resource: Styles

Embedding


In order to reduce the number of HTTP requests for images used in the interface, ResourceLoader makes use of Data URI embedding. When enabled, images will be automatically base64-encoded and embedded into the stylesheet. While it will make the stylesheet larger (due to base64 inflation), it improves performance by removing the overhead associated with requesting all those additional files. The actual server response (which contains the minified result of all concatenated stylesheets and all embedded images in a single request) uses gzip compression. This enables the response to function a bit like a "super sprite" (more on this later). Regardless of the expansion caused by base64 encoding, the gzipped ResourceLoader response is still smaller than the sum of the individual CSS and image binary files.

To enable embedding for an image, place the embed annotation in a CSS comment directly above the relevant CSS declaration.
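The following Python sketch shows the general idea of annotation-driven embedding. It is not the CSSMin implementation: the annotation spelling `/* @embed */`, the fixed `image/png` media type, and the in-memory `files` mapping are all assumptions made for illustration.

```python
import base64
import re
from typing import Dict

# Assumed annotation spelling; matches an annotated declaration with a url().
EMBED_RULE = re.compile(r"/\*\s*@embed\s*\*/\s*([\w-]+\s*:\s*)url\(([^)]+)\)")


def embed_images(css: str, files: Dict[str, bytes]) -> str:
    """Replace annotated url(...) references with base64 data URIs."""

    def to_data_uri(match: re.Match) -> str:
        prop = match.group(1)
        path = match.group(2).strip("'\" ")
        payload = base64.b64encode(files[path]).decode("ascii")
        # The media type is hard-coded to PNG to keep the sketch short.
        return f"{prop}url(data:image/png;base64,{payload})"

    return EMBED_RULE.sub(to_data_uri, css)


css = ".icon { /* @embed */ background-image: url(icon.png); }"
embedded = embed_images(css, {"icon.png": b"\x89PNG"})
```

Declarations without the annotation are left untouched, so embedding stays an explicit per-image choice.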

Using this technique makes traditional sprites obsolete. While the motivation behind sprites is good (fewer HTTP requests, better compression), it does come with a few caveats:
 * Maintenance. If an image needs to be updated, one has to regenerate the sprite file, update the background positions in the CSS output, etc.
 * Produces overly complex CSS.
 * Imposes restrictions on image usage. Properties background-repeat, background-size or background-position may not be used due to these leaking other images in the same sprite.

These caveats aren't the end of the world (sprites are in wide use, clearly they do work). Some other resource delivery systems do use sprites, and some perhaps even handle some of the maintenance in an automated fashion. The automated embedding, however, provides the best of both worlds, without any of the caveats.

The advantages of sprites still hold up:
 * Reduced number of HTTP requests.
 * Improved compression by combining images in one file.

In addition to that:
 * No maintenance.
 * Clean CSS.
 * No restrictions or sprite "leakage" bugs.
 * Even smaller number of requests. The CSS and the images are now in the same request.
 * No download delay flash. Once the stylesheet is there, all the images are as well. This improves perceived performance. Browsers normally don't download files referenced in CSS until their selectors are active (e.g. a hover state). This would cause a flash when the user first hovers over a button. Embedding makes the image instantly available from the data URI.
 * Better gzip compression. Embedding allows unrelated images and stylesheets to be freely combined and compressed together - a "super sprite", if you will. Also, PNG headers enjoy better compression.

Remapping
For icons that are not embedded, ResourceLoader transforms the relative file path into an absolute one. This is necessary because file references in CSS are meant to be relative to where the stylesheet is served from, which changes meaning when these files are bundled and served from a different URL.

The URLs are also made immutable by appending a truncated content hash as a query parameter. This can be used by a cache proxy to disambiguate between multiple versions of the same file during a deployment, and to prevent a multi-server cluster from having the response from an "old" server populate the URL a browser got from a "new" server whilst mid-deployment (T102578, T47877).
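A minimal sketch of remapping, under stated assumptions: the hash function (SHA-1), the truncation length (8 hex characters), the paths, and the in-memory `files` mapping are all chosen for illustration and are not taken from the actual implementation.

```python
import hashlib
import posixpath
from typing import Dict


def remap_url(relative: str, stylesheet_dir: str, files: Dict[str, bytes]) -> str:
    """Turn a stylesheet-relative path into an absolute, versioned URL."""
    # Resolve the reference against the directory the stylesheet lives in.
    absolute = posixpath.normpath(posixpath.join(stylesheet_dir, relative))
    # Append a truncated content hash so the URL changes when the file does.
    digest = hashlib.sha1(files[absolute]).hexdigest()[:8]
    return f"{absolute}?{digest}"


path = "/w/skins/Example/images/arrow.png"  # hypothetical file location
url_v1 = remap_url("../images/arrow.png", "/w/skins/Example/styles", {path: b"v1"})
url_v2 = remap_url("../images/arrow.png", "/w/skins/Example/styles", {path: b"v2"})
```

Both URLs point at the same absolute path, but the query string differs once the file content changes, which is what lets caches treat each URL as immutable.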

Flipping

 * See also Directionality support for more information about directionality support in MediaWiki.



With the Flipping functionality it is no longer necessary to manually maintain a copy of the stylesheet for right-to-left languages. ResourceLoader automatically changes direction-sensitive CSS declarations (and more). Internally, the CSSJanus library provides that smart "flipping" logic.

Aside from flipping direction-oriented values, it also converts property names and shorthand values. It also converts references to filenames with a direction-specific ending into their counterpart for the opposite direction, thereby loading direction-specific iconography.

When a stylesheet is loaded by ResourceLoader, it is automatically flipped, without any additional changes or configuration, for users with a right-to-left interface language set.

Sometimes you may want to exclude a rule from being flipped. For that, one can use the noflip annotation. This instructs CSSJanus to skip the next CSS declaration. Or, when used in the selector part, it skips the entire following CSS ruleset.

Note: When using Less CSS and nested selectors, the noflip annotation must be placed above each individual rule, not above the selector.
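To give a feel for what flipping involves, here is a deliberately tiny Python sketch. It is not CSSJanus and covers far less: it only swaps the words "left" and "right" inside declarations, and it skips any declaration preceded by the annotation, whose spelling `/* @noflip */` is assumed here.

```python
import re

SWAP = re.compile(r"\b(left|right)\b")
# An optional annotation comment, followed by one "property: value" declaration.
DECLARATION = re.compile(r"(/\*\s*@noflip\s*\*/\s*)?([\w-]+\s*:[^;}]+)")


def flip(css: str) -> str:
    """Swap left/right in every declaration that is not annotated."""

    def flip_declaration(match: re.Match) -> str:
        if match.group(1):  # annotated: return the declaration unchanged
            return match.group(0)
        return SWAP.sub(
            lambda m: "right" if m.group(1) == "left" else "left",
            match.group(2),
        )

    return DECLARATION.sub(flip_declaration, css)


ltr = ".a { float: left; padding-left: 1em; /* @noflip */ text-align: left; }"
rtl = flip(ltr)
```

The first two declarations are mirrored in both property name and value, while the annotated `text-align` declaration keeps its original direction.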

Bundling
As mentioned, all resources are combined in a single bundle. The loader response from the server bundles both scripts and styles from the requested module(s) in the same request. The Client receives this and loads the stylesheets into the DOM at the right time, so that they are in memory when the scripts that use these CSS classes execute.

This means that neither the JS nor the CSS will load if JS is disabled. However, if you need the CSS to still apply, you can add one or more CSS-only modules.

Minification
All stylesheets are minified before being put in the bundle. For this we use the CSSMin library, which was especially developed for ResourceLoader.

Conditions

 * See the Conditions section under Scripts for more information.

Similar to scripts, style bundling also features the ability to compose the module dynamically based on the context.

Resource: Messages
Messages are exported as a JSON blob, mapping the message keys to the correct translation. They're fetched on the server from MediaWiki's localization framework (including its language fallback logic). Only message keys used by the module are included in the bundle.

Bundling
Again, all resources are bundled in the same request. The Client then takes the messages and registers them in the localization system on the client side, before the JavaScript body is executed.

Conditions
As with the other two resource types, the messages component is also optimized to load only what is necessary for the requesting context. This is especially important considering that MediaWiki is localized in over 300 languages. Only 1 unique set of messages is delivered to the client.
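The message export can be sketched as follows. The fallback table, the final fallback to English, and the data layout are assumptions for illustration; the real lookup is done by MediaWiki's localization framework.

```python
import json
from typing import Dict, List


def export_messages(
    keys: List[str],
    lang: str,
    translations: Dict[str, Dict[str, str]],
    fallbacks: Dict[str, List[str]],
) -> str:
    """Build the JSON blob for one module in one language."""
    # Assumed chain: requested language, its fallbacks, then English.
    chain = [lang] + fallbacks.get(lang, []) + ["en"]
    blob: Dict[str, str] = {}
    for key in keys:  # only the keys this module uses
        for code in chain:
            if key in translations.get(code, {}):
                blob[key] = translations[code][key]
                break
    return json.dumps(blob, ensure_ascii=False, sort_keys=True)


translations = {
    "en": {"save": "Save", "cancel": "Cancel", "unused": "Unused"},
    "de": {"save": "Speichern"},
}
blob = export_messages(["save", "cancel"], "de-at", translations, {"de-at": ["de"]})
```

The client receives exactly one value per key: "save" resolves through the fallback chain to German, "cancel" falls through to English, and the unused key is never shipped.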

Front-end
So, how does all this play out in the front-end? Let's walk through a typical page view in MediaWiki, focusing on the ResourceLoader Client.

Startup Module
The startup module is the first and only hardlinked script being loaded on every page from a script tag. It is a lightweight module that does three things:
 * 1) Sanity check. It starts by performing a quick sanity check that bails out if the current browser cannot support the base environment. This avoids incomplete interfaces and script errors, by preserving the natural non-JavaScript fallback behavior. For incompatible browsers, the startup module is the first and last script to be loaded.
 * 2) Module manifest. It exports the module manifest. This contains the dependency information of all modules, request groups (if any) and the current version hash for each module.
 * 3) Define the loader. It defines the ResourceLoader Client.

The use of this manifest allows ResourceLoader to naturally avoid the Cascading Cache Invalidation problem that some other bundlers suffer from. It also allows for "perfect" cache fragmentation and cache re-use through a defragmented module store.

Client
The ResourceLoader Client is a tiny JavaScript library in charge of loading and executing modules from the server. It reads the module manifest and dependency tree as its input. This client is instructed by the HTML to load modules for the current page.

The client defines mw.loader which can be given a list of module names to load. It automatically handles dependency resolution using the internal dependency map. It also naturally de-duplicates and will not start loading or executing any module more than once.

The loading process is fully asynchronous, and requests modules in batches from the server.
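Dependency resolution with de-duplication can be sketched as a depth-first walk over the dependency map. The function name `resolve` and the way already-loaded modules are tracked are assumptions for illustration, not the client's actual internals.

```python
from typing import Dict, List, Set, Tuple


def resolve(
    names: List[str], dependencies: Dict[str, List[str]], loaded: Set[str]
) -> List[str]:
    """Return the modules still to be requested, dependencies first."""
    order: List[str] = []

    def visit(name: str, trail: Tuple[str, ...]) -> None:
        if name in loaded or name in order:
            return  # de-duplicate: never queue a module twice
        if name in trail:
            raise ValueError(f"circular dependency involving {name}")
        for dep in dependencies.get(name, []):
            visit(dep, trail + (name,))
        order.append(name)

    for name in names:
        visit(name, ())
    return order


deps = {"A": ["C"], "B": ["C"], "C": []}
first_batch = resolve(["A", "B"], deps, loaded=set())
second_batch = resolve(["B"], deps, loaded={"A", "C"})
```

The shared dependency "C" appears once in the first batch and not at all in the second, since it is already loaded.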

Store

 * See also Research:Module storage performance on Meta-Wiki.

The ResourceLoader client caches the contents of individual modules within the web browser (using HTML5 localStorage). This drastically helps reduce cache fragmentation.

For example, imagine two unrelated modules A and B that both make use of a third module C that is exceptionally large. Module A is used on page "Foo", and module B on page "Bar". Without a module store, the following would happen:
 * 1) User views article Foo. Browser makes a network request for the batch "A+C".
 * 2) User views article Bar. Browser makes a network request for the batch "B+C". (Thus downloading a second copy of C)
 * 3) User views article Foo. Browser uses its cache for "A+C".
 * 4) User views article Bar. Browser uses its cache for "B+C".

On the second page view, the browser was unable to use C from its cache, because it is stored under a batch request URL. In the above scenario, the user would fully download the big "C" module multiple times, despite it not having changed, and it already being in the cache somewhere as part of "A+C".

With ResourceLoader's module store, the client caches each part of the response to the batch request separately in a local cache (backed by a localStorage blob). This is not affected by other modules in the same batch request. Let's reconsider the same scenario with these improvements:


 * 1) User views article Foo. Browser makes a network request for the batch "A+C". These two are unpacked on arrival and locally stored separately, as "A" and "C".
 * 2) User views article Bar. Browser executes "C" from local store, and makes a network request for "B". Then, "B" is also added to the store.
 * 3) User views article Foo. Browser executes "A" and "C" from store. No network request.
 * 4) User views article Bar. Browser executes "B" and "C" from store. No network request.
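The two scenarios can be replayed with a small simulation. This is a toy model, assuming a browser HTTP cache keyed by batch URL and a module store keyed by module name; the class and attribute names are hypothetical.

```python
from typing import Dict, List


class Client:
    """Toy browser: an HTTP cache keyed by batch URL, plus an optional module store."""

    def __init__(self, use_store: bool) -> None:
        self.use_store = use_store
        self.store: Dict[str, bool] = {}  # module name -> present in the store
        self.http_cache: Dict[str, List[str]] = {}  # batch URL -> modules in it
        self.downloaded: List[str] = []  # modules transferred over the network

    def view(self, needed: List[str]) -> None:
        missing = [m for m in needed if not (self.use_store and m in self.store)]
        if not missing:
            return  # everything came from the store, no network request
        url = "modules=" + "|".join(sorted(missing))
        if url not in self.http_cache:
            self.http_cache[url] = missing
            self.downloaded.extend(missing)
        if self.use_store:
            for module in missing:
                self.store[module] = True


def browse(use_store: bool) -> List[str]:
    client = Client(use_store)
    for page_modules in (["A", "C"], ["B", "C"], ["A", "C"], ["B", "C"]):
        client.view(page_modules)  # Foo, Bar, Foo, Bar
    return client.downloaded


without_store = browse(use_store=False)
with_store = browse(use_store=True)
```

Without the store, "C" crosses the network twice. With the store, each module is downloaded once and the last two page views need no request at all.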

Execution

 * This section is incomplete


 * Execution separated from loading/parsing.
 * Direct or delayed execution as appropriate based on module dependencies.
 * Insert messages and styles into memory before script execution.

Back-end

 * See also: § Caching

Bundle request validation
The backend encourages HTTP caching in web browsers and cache proxies.

For bundle responses it does this by ensuring all URLs are effectively immutable, allowing browsers to cache and unconditionally re-use their responses on subsequent page views (through far-future expires, or nowadays the "max-age" Cache-Control directive).

Startup request validation
For the startup module, only a relatively short cache age is tolerated. This relates to a number of key features and guarantees that ResourceLoader offers:


 * Deployments must take effect globally within 10 minutes.
 * The version of scripts and styles loaded for a user must not vary from page to page.
 * The page HTML must be highly-cacheable and served from a CDN.

Together this has led us to an approach where the startup module is the only script linked from the HTML and served from a URL that is version-agnostic.

To reduce repeat downloads of the startup module within the first few minutes of a browsing session, we give the response a 5 minute expiry that starts when the browser first downloads it (as opposed to when the CDN cached it from a backend). The CDN also caches its copy from the backend server for 5 minutes. These two sliding windows together ensure a 10 minute effective max-age.

The startup module also allows conditional requests (via the ETag and If-None-Match headers), which means that after the 5 minute expiry, browsers generally only need to download a small 304 Not Modified response to renew their existing copy.
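The caching arithmetic and the conditional request can be sketched together. The header values, the use of SHA-1 for the validator, and the function names are assumptions for illustration; only the two 5 minute windows come from the text above.

```python
import hashlib
from typing import Dict, Optional, Tuple

BROWSER_MAX_AGE = 5 * 60  # seconds, counted from the browser's download
CDN_MAX_AGE = 5 * 60  # seconds, counted from the CDN's fetch from a backend


def worst_case_staleness() -> int:
    """A browser may fetch a copy that the CDN is just about to expire."""
    return CDN_MAX_AGE + BROWSER_MAX_AGE


def respond(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a startup request, honouring a conditional If-None-Match header."""
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={BROWSER_MAX_AGE}"}
    if if_none_match == etag:
        return 304, headers, b""  # unchanged: renew without resending the body
    return 200, headers, body


status_first, headers_first, _ = respond(b"manifest v1", None)
status_renew, _, body_renew = respond(b"manifest v1", headers_first["ETag"])
status_changed, _, _ = respond(b"manifest v2", headers_first["ETag"])
```

A renewal of an unchanged manifest costs only headers, while a changed manifest is sent in full.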

Response
GET /load.php?modules=foo|bar|quux&lang=en&skin=vector&version=…

Balance
This section is incomplete


 * Batching.
 * Alphabetical order.
 * Combined version hash (to allow long-term static caching by CDNs, proxies, and web browsers).
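The three points above can be sketched as a function that builds a batch request URL of the shape shown under Response. How the combined version is derived (SHA-1 over the concatenated per-module versions, truncated to 7 characters) is an assumption for illustration, as are the sample version strings.

```python
import hashlib
from typing import Dict, List
from urllib.parse import urlencode


def batch_url(modules: List[str], versions: Dict[str, str], lang: str, skin: str) -> str:
    """Build one request URL for a batch of modules."""
    names = sorted(modules)  # alphabetical order gives one URL per combination
    joined = "".join(versions[name] for name in names)
    combined = hashlib.sha1(joined.encode("utf-8")).hexdigest()[:7]
    query = urlencode(
        {"modules": "|".join(names), "lang": lang, "skin": skin, "version": combined},
        safe="|",  # keep the separator readable
    )
    return "/load.php?" + query


versions = {"foo": "1a2b3", "bar": "4c5d6", "quux": "7e8f9"}
url_a = batch_url(["quux", "foo", "bar"], versions, "en", "vector")
url_b = batch_url(["bar", "foo", "quux"], versions, "en", "vector")
url_c = batch_url(["bar", "foo", "quux"], {**versions, "foo": "changed"}, "en", "vector")
```

The same set of modules always yields the same URL regardless of request order, and the URL changes as soon as any module in the batch gets a new version.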

Groups
The module request "group" can be used to optimise cache fragmentation. By default any two modules are allowed to be loaded together in the same batch request. The client store prevents most cache fragmentation automatically, which is why in general you do not need to use this option.

If fine-tuning is needed, then one or more module bundles can be forced to be split in a dedicated request group. Use these sparingly as they naturally cause additional HTTP requests, and thus reduce compression efficiency.

Any freeform string can be used as a group name. Modules with the same request group assigned may be loaded in the same request.

It is conventional to use a lowercase dashed name, typically derived from a substring of the related module names (e.g. "jquery-ui" or "ext.foo").
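A sketch of how request groups could partition a load queue into separate batches. The module names and the group assignment are made up for illustration; modules without a group share the default batch, represented here by `None`.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def split_by_group(
    modules: List[str], groups: Dict[str, str]
) -> Dict[Optional[str], List[str]]:
    """Partition modules into one batch per request group."""
    batches: Dict[Optional[str], List[str]] = defaultdict(list)
    for name in sorted(modules):
        batches[groups.get(name)].append(name)  # None means "no group assigned"
    return dict(batches)


groups = {"jquery.ui.button": "jquery-ui", "jquery.ui.dialog": "jquery-ui"}
batches = split_by_group(
    ["ext.foo", "jquery.ui.dialog", "jquery.ui.button", "site.bar"], groups
)
```

Each key in the result corresponds to one HTTP request, which is why every extra group costs an additional request.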

Beware of the reserved group names. Modules in a reserved group are disqualified from client store optimisations, and each reserved group has additional behaviour:
 * Reserved for modules that vary by username (e.g. user scripts). These HTTP requests get an extra query parameter. This parameter is available in the ResourceLoaderContext object passed to content methods (e.g. getScript, getStyles). Due to the extra parameter, they don't share cache with other users or logged-out users. The cache will be public. The stylesheets in this module group are loaded after all other modules (last cascading order), through the DOM's "ResourceLoaderDynamicStyles" marker.
 * Reserved for modules that are not allowed to be loaded from the public load.php endpoint (e.g. for CSRF tokens). Modules in this group are automatically embedded by OutputPage in the HTML when loaded. They cannot be loaded on demand.
 * Reserved for stylesheets that are user-generated content, but are not user-specific (rather for the entire site). The stylesheets in this module group are loaded after all other modules (last cascading order), using the "ResourceLoaderDynamicStyles" marker as separation.

On-demand package generation
ResourceLoader features on-demand generation of the module bundles. The on-demand generation is very important in MediaWiki because cache invalidation can come from many places. Here are a few examples:
 * Core and extensions: Core and extensions generally only change when a wiki is upgraded. But especially on large sites such as Wikipedia, deployments happen many times a day (even updates to core).
 * Users: Wiki users granted certain user rights (interface administrators by default) have the ability to modify the "site" module (which is empty by default and will be loaded for everybody when non-empty). This is all without server-side access; these scripts/styles are stored as wiki pages in the database. On top of that, each user also has their own module space that is only loaded for that user.
 * Translators: The interface messages are shipped with MediaWiki core and are generally considered part of core (and naturally update when upgrading/deploying core). However, wikis can customize their interface by using the MediaWiki message namespace to modify interface messages (or create new ones to use in their own modules).

Cache invalidation
Every module has a version hash. This version hash is how we decide whether to bust the cache (when a module would generate a different response than before), or to allow re-use (if it remains the same). For static file modules, it is generally based on the following factors:
 * Content of JavaScript and CSS files.
 * Content of indirect file references in CSS. When a stylesheet references an image, these are embedded or expanded into immutable URLs. In either case, the stylesheet would vary if these icons change (see also: Remapping).
 * Content of interface messages.
 * Module definition. The definition includes the order in which files are included, and other metadata that can influence the output of the response.

We track all these factors because it is considered too expensive to on-demand generate the actual module contents for all registered module bundles whilst computing the startup manifest (Wikipedia has over 1000 registered modules).
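A version hash built from such factors, rather than from the generated bundle, can be sketched like this. The hash function, the truncation to 8 characters, and the shape of the inputs are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict


def version_hash(
    definition: Dict[str, Any], file_contents: Dict[str, bytes], messages: Dict[str, str]
) -> str:
    """Summarise everything that can change a module's response."""
    summary = {
        # File order inside the definition is preserved, so reordering counts.
        "definition": definition,
        # Covers scripts, styles, and files referenced indirectly from CSS.
        "files": {p: hashlib.sha1(d).hexdigest() for p, d in file_contents.items()},
        "messages": messages,
    }
    encoded = json.dumps(summary, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()[:8]


definition = {"scripts": ["a.js", "b.js"], "styles": ["a.css"]}
contents = {"a.js": b"1", "b.js": b"2", "a.css": b"3", "icon.png": b"4"}
base = version_hash(definition, contents, {"save": "Save"})
icon_changed = version_hash(definition, {**contents, "icon.png": b"5"}, {"save": "Save"})
reordered = version_hash({**definition, "scripts": ["b.js", "a.js"]}, contents, {"save": "Save"})
```

Hashing inputs in this way avoids building the module just to learn whether it changed, at the cost of having to track every input that can influence the output.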

It is also considered infeasible to "build" all modules ahead of time due to the large number of variants supported. For example, as of August 2019 Wikimedia Foundation's deployment has 900+ wikis, 400+ languages, 5 skins, and ~1100 modules. Generating all these variants during deployment could take hours. In addition, the wikis are always in flux, with many modules having the capacity to vary their response based on the content of a publicly editable wiki page. (See also § On-demand package generation.)

Disable on a single page
To make it easier to debug a specific page without the influence of site-wide or user-specific gadgets, scripts, or styles, it is possible to temporarily disable them by setting the "safemode" query parameter on any page, e.g. https://www.mediawiki.org/w/index.php?title=Project:Sandbox&safemode=1.

Debug mode

 * This section is incomplete

To make development easier, there is a debug mode in ResourceLoader.

Differences:
 * Script resources: No longer minified, concatenated, or loaded from load.php. Instead, load.php will instruct the client to request each source file directly. This makes debugging scripts easier with your browser's developer tools. But, transformations such as the wrapping closure also don't apply; so scripts may execute in the global scope.
 * Style resources: No longer minified, concatenated, or loaded from load.php. Instead, load.php will instruct the client to request each stylesheet directly. Transformations such as URI embedding and RTL-flipping don't apply.

Toggle mode
The mode can be toggled in several ways. In order of precedence:


 * Query parameter "debug" (string): Set to "true" to enable, to "false" to disable. When absent, falls back to next step. Example: https://example.org/wiki/Main_Page?debug=true
 * Cookie (string): Set the cookie to enable, delete the cookie to disable (setting it to "false" does not work). When absent, falls back to next step. There is a user script available to simplify toggling this; add it to Special:MyPage/common.js on the wiki where you're debugging. A link to enable or disable debug mode will be added to the toolbox. In some browsers, you may need to hard refresh before the change takes effect.
 * Configuration setting (boolean): The default mode is determined by this configuration setting. Unless overridden in LocalSettings, this will be set to false. A production wiki should never set this to true, as debug mode would then be served to everybody, which is inefficient and likely to introduce bugs due to the nature of debug mode.
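The precedence described above can be sketched as a single function. The cookie name `example_debug` is a placeholder invented for this sketch, and treating any query value other than "true" or "false" as absent is likewise an assumption.

```python
from typing import Dict

DEBUG_COOKIE = "example_debug"  # placeholder name, not the real cookie name


def debug_enabled(
    query: Dict[str, str], cookies: Dict[str, str], config_default: bool
) -> bool:
    """Resolve debug mode: query parameter, then cookie, then configuration."""
    value = query.get("debug")
    if value == "true":
        return True
    if value == "false":
        return False
    # Assumption: any other query value is treated as absent.
    if DEBUG_COOKIE in cookies:
        return True  # presence enables it, whatever the cookie's value
    return config_default


by_query = debug_enabled({"debug": "false"}, {DEBUG_COOKIE: "1"}, config_default=True)
by_cookie = debug_enabled({}, {DEBUG_COOKIE: "false"}, config_default=False)
by_config = debug_enabled({}, {}, config_default=False)
```

The query parameter wins over both other sources, and a cookie set to "false" still enables debug mode because only its presence is checked.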

Conclusion


In conclusion we'd like to think of ResourceLoader as creating a development environment that is optimized for:


 * Happy developers: Easy to work with modules without worrying about optimization, maintenance, building, or what not.
 * Happy servers: The application itself scales well, and is optimized to run on-demand.
 * Happy users: Faster pages!

JavaScriptMinifier
Although the re-generation of a module bundle should be relatively rare (since cache is very well controlled), when it does happen it has to perform well from a web server.

For that reason it doesn't use the famous JSMin.php library (based on Douglas Crockford's JSMin), because it is too slow to run on-demand during a request response. Although JSMin.php only takes about 1 second (which is okay if you're on the command-line), when working on-demand in a web server response (with hundreds of large files needing to be minified) waiting that long is unacceptable. This is especially true when potentially thousands of requests could come in at the same time, all finding out that the cache isn't up to date; fast regeneration helps to avoid such a cache stampede.

Instead ResourceLoader uses a custom minifier called JavaScriptMinifier, contributed by Paul Copperman. This runs up to 4x faster than JSMin. In addition to the speed, time has told that JavaScriptMinifier interprets the JavaScript syntax more correctly and succeeds in situations where JSMin outputs invalid JavaScript. The output size of JavaScriptMinifier is slightly larger than JSMin (about 0.5%, based on a comparison by minifying jquery.js, where the difference was 0.8KB). This is not considered a loss when seen in the bigger picture. ResourceLoader doesn't aim to compress as small as can be no matter the cost. Instead it aims for a balance, getting large gains in a wide range of areas while also featuring instant cache invalidation, fast module generation, a transparent "build"-free environment for the developer, etc. The fact that it could be a little bit smaller then becomes an acceptable trade-off.

CSSMin
Features:
 * Minification
 * Remapping
 * Data URI Embedding