Design Systems Team/Code splitting

This is a proposal for how to approach code splitting in Codex, focused mainly on the impact that would have on the developer experience of using Codex in MediaWiki.

Current situation
Most features using Codex are encouraged to use the  ResourceLoader module. This module contains the entire Codex library, which is fairly large: 156 KB of JavaScript and CSS (transmitted over the network as 32.2 KB of compressed data); and this number will only grow as more components are added to Codex. Most features use only a subset of Codex components, so a substantial portion of this code is unused.

Some features use CSS-only components, and only load the  module, which contains the CSS without the JavaScript (68.8 KB of CSS, compressed to 9.6 KB). This module doesn't contain any JS, but it does contain the styles for all components in the library, including components that the feature might not use, and including styles that are only needed for the JS version of the components.

For the search feature in Vector, the Web team was very concerned about limiting the size of the code that is loaded, since the search feature appears on every page. To support this, the Design Systems Team created a special build of Codex, and made it available as the  and   modules in ResourceLoader. These modules only contain the TypeaheadSearch component and its dependencies. Together, they are about half the size of the full library: the styles module is loaded at page load time and is 29.4 KB of CSS (4.5 KB compressed); the JS module is loaded when the user interacts with the feature, and is 36.7 KB of JS (12.6 KB compressed).

These search-specific modules ensure that no unused code is loaded for users who use the search feature. However, unused styles are still loaded for users who don't interact with the feature (because the  module contains styles for components that only appear after the user types something). This is also a one-off way of addressing the problem that requires special configuration in the Codex library and publishing a separate NPM package, which doesn't scale well if we want to provide this treatment for multiple features.

Another problem with these search-specific modules is that they duplicate part of the full Codex library. If both the search feature and another feature load on the same page, causing both  and   to be loaded, the search-specific components are loaded twice. Our current system is not smart enough to deduplicate this double-loading of components.

Proposal
Features that use Codex would list the components they need in their ResourceLoader module definition. ResourceLoader would then embed the JS for these components (and the components they depend on) in the contents of that module as a packageFile, and add the CSS for these components to the module's styles. This ensures that each feature loads exactly the components it needs, and no more.
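As a rough illustration of the "components they depend on" step, the sketch below resolves a list of requested components against a dependency graph. The manifest shape, the dependency edges, and the resolveComponents function are assumptions made for this example; they are not the actual Codex manifest format or the CodexModule implementation.

```javascript
// Hypothetical manifest: each component lists the components it directly requires.
// The shape and the dependency edges are assumptions for illustration only.
const manifest = {
	Button: { requires: [] },
	Icon: { requires: [] },
	Thumbnail: { requires: [ 'Icon' ] },
	Card: { requires: [ 'Thumbnail', 'Icon' ] }
};

// Return the requested components plus everything they transitively require.
function resolveComponents( requested, manifest ) {
	const resolved = new Set();
	const visit = ( name ) => {
		if ( resolved.has( name ) ) {
			return; // already handled; this also guards against cycles
		}
		if ( !Object.prototype.hasOwnProperty.call( manifest, name ) ) {
			throw new Error( `Unknown Codex component: ${ name }` );
		}
		resolved.add( name );
		manifest[ name ].requires.forEach( ( dep ) => visit( dep ) );
	};
	requested.forEach( ( name ) => visit( name ) );
	return [ ...resolved ].sort();
}

console.log( resolveComponents( [ 'Card' ], manifest ) );
// → [ 'Card', 'Icon', 'Thumbnail' ]
```

A feature that only requests Card would therefore also receive Thumbnail and Icon, but not Button.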

Simple example
In extension.json (or Resources.php), use the CodexModule class for the RL module that uses Codex, and list the Codex components the module uses.

In App.vue, get the components from  instead of from , but otherwise use Codex normally.

See also this merge request in CodexExample for another usage example.
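As a rough sketch only, a module definition of this kind might have the following shape. It is written here as a JavaScript object that mirrors the JSON structure of an extension.json entry. The module name, the file names, the component names, and the class namespace are assumptions made for illustration; codexComponents is the key name proposed under Naming.

```javascript
// Hypothetical module definition mirroring the JSON shape of extension.json.
// 'ext.example.feature', the file names, and the class namespace are assumed.
const resourceModules = {
	'ext.example.feature': {
		class: 'MediaWiki\\ResourceLoader\\CodexModule',
		packageFiles: [ 'init.js', 'App.vue' ],
		// The Codex components this feature uses; their dependencies
		// would be resolved and embedded automatically.
		codexComponents: [ 'Button', 'Card' ]
	}
};

console.log( resourceModules[ 'ext.example.feature' ].codexComponents );
// → [ 'Button', 'Card' ]
```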

Deduplication
This approach doesn't address deduplication: if two features that are constructed this way load on the same page, any Codex components that are used by both features would be double-loaded. We propose solving this problem in a targeted way rather than a general way. We expect that most features that use Codex will fall in one of two categories: they're either used on a very limited number of pages (e.g. the UI on a special page, or the contents of a Wikifunctions page), or they're used on almost all pages (e.g. the Vector search bar, or a future Codex implementation of UniversalLanguageSelector or Echo). It should be rare for two features from the former category to be loaded on the same page, because their scopes are generally non-overlapping. If two features using Codex are loaded on the same page, it's safe to assume at least one of them is something that appears on (almost) every page. For this reason, we focus on addressing duplicate loading of the Codex components that are used by features that appear on every page.

We propose manually curating a list of core components that are likely to overlap between every-page features and limited-scope features, and creating a ResourceLoader module that embeds these core components. ResourceLoader modules that use Codex would then depend on this core components module. For ease of use for the developer, these modules would still request embedding of all the components they use, and use them the same way in JavaScript as they would non-core components, so that consumer code doesn't have to be updated if the list of core components changes. But internally, ResourceLoader would get these components from the core components module, rather than embed them.

Deduplication example
In Resources.php, we might define a core components module that embeds, among others, the Button and Icon components. A feature that uses Codex would then define a ResourceLoader module that requests the Button and Card components.

In this example, the feature module would embed the Card component (which is not in the core components module), but would not embed the Button component (it would instead get it from the core components module). It would also embed Thumbnail (which is needed by Card and is not a core component), but it would not embed Icon (also needed by Card, but it's in the core components module).
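The embedding decision in this example can be sketched as follows. This is a minimal illustration, not the CodexModule implementation: the planEmbedding function and the graph shape are hypothetical, the Thumbnail-to-Icon edge is assumed, and the sketch assumes that the core components module already contains everything its own components depend on.

```javascript
// Hypothetical dependency graph and core list, modelled on the example in the text.
const dependencyGraph = {
	Button: [],
	Icon: [],
	Thumbnail: [ 'Icon' ],
	Card: [ 'Thumbnail', 'Icon' ]
};
const coreComponents = new Set( [ 'Button', 'Icon' ] );

// Split the requested components (and their dependencies) into those the
// feature module embeds itself and those it takes from the core module.
function planEmbedding( requested, graph, core ) {
	const embed = new Set();
	const fromCore = new Set();
	const visit = ( name ) => {
		if ( core.has( name ) ) {
			// Provided by the core module; its dependencies are assumed
			// to be provided there as well, so do not descend further.
			fromCore.add( name );
			return;
		}
		if ( embed.has( name ) ) {
			return;
		}
		embed.add( name );
		( graph[ name ] || [] ).forEach( ( dep ) => visit( dep ) );
	};
	requested.forEach( ( name ) => visit( name ) );
	return { embed: [ ...embed ].sort(), fromCore: [ ...fromCore ].sort() };
}

console.log( planEmbedding( [ 'Button', 'Card' ], dependencyGraph, coreComponents ) );
// → { embed: [ 'Card', 'Thumbnail' ], fromCore: [ 'Button', 'Icon' ] }
```

Because the consumer still lists Button and Card in the same way, moving a component into or out of the core list only changes the result of this split, not the feature's own code.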

CSS-only modules
A feature that uses Codex CSS-only components could set the codexStyleOnly key in its module definition. For a module that requests the Card and Message components, this would embed only the CSS for those components (and the components they depend on).

A feature that uses CSS-only components initially, but then replaces them with Vue components when JS loads, could create a style-only module and a JS module. The JS module would embed the JS of the TypeaheadSearch component, but not its CSS, because it would detect that the CSS is already provided by the style-only module.
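The split between a style-only module and a JS module can be sketched like this. How CodexModule would actually detect styles provided by another module is not specified in this proposal, so the sketch simply passes that information in as an argument. The planModuleContents function and the file naming are hypothetical, and dependency resolution is left out for brevity.

```javascript
// Decide which files a module embeds. `stylesProvidedElsewhere` lists the
// components whose CSS is already shipped by a companion style-only module
// (an assumption: the real detection mechanism is not defined here).
function planModuleContents( definition, stylesProvidedElsewhere = [] ) {
	const provided = new Set( stylesProvidedElsewhere );
	const components = definition.codexComponents;
	return {
		// Style-only modules embed no JavaScript at all.
		scripts: definition.codexStyleOnly ?
			[] :
			components.map( ( name ) => `${ name }.js` ),
		// Skip CSS that another module already provides.
		styles: components
			.filter( ( name ) => !provided.has( name ) )
			.map( ( name ) => `${ name }.css` )
	};
}

const styleModule = { codexComponents: [ 'TypeaheadSearch' ], codexStyleOnly: true };
const scriptModule = { codexComponents: [ 'TypeaheadSearch' ] };

console.log( planModuleContents( styleModule ) );
// → { scripts: [], styles: [ 'TypeaheadSearch.css' ] }
console.log( planModuleContents( scriptModule, styleModule.codexComponents ) );
// → { scripts: [ 'TypeaheadSearch.js' ], styles: [] }
```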

Proof of concept implementation
The Design Systems Team has written the following proof of concept patches. These are not full implementations, but just serve to demonstrate the concept:


 * In Codex: a patch that makes the build system output the library as many small JS files that require each other (rather than one large JS file), as well as a manifest file describing the dependency graph between these files.
 * In MediaWiki core: a patch that implements part of the CodexModule functionality described above, by reading the manifest file from Codex and embedding the appropriate files. This only implements the simple example, not the dependency smartness or style-only handling.
 * Examples of how to use CodexModule in VueTest and CodexExample

Style-only modules
Style-only modules can't have dependencies; see T191652 (in particular this comment explaining why this restriction exists). This causes a problem for the deduplication strategy: we would like to create a style-only module for the core components and tell CSS-only feature modules to depend on it, but they can't. This means that CodexModule can't deduplicate them (unless we instruct it to do so in a different way), and that developers loading these modules have to remember to load both their own module and the core styles module. Working around this is probably doable, but the developer experience wouldn't be great. If we ever had multiple layers of style dependencies (e.g. because we have multiple modules with shared components that depend on each other), this would become a much bigger problem.

Naming
This proposal introduces the following new names, but we're not very attached to them and welcome ideas for better ones:


 * CodexModule: The subclass of ResourceLoader\Module that is used by modules that embed Codex components. This class already exists, but currently serves a different purpose (it's used for the  and   modules, and houses the   function)
 * codexComponents: The key in the module definition that lists the components used in the module. We could rename this to reflect the fact that things that are not components (composables and utility functions) can also be listed here; unfortunately we don't yet have a good generic term that covers "component, composable or utility function".
 * codexStyleOnly: The key in the module definition that indicates that this is a style-only module, and only the CSS of the requested Codex components should be embedded.
 * codex-subset.js: The name of the virtual file generated by CodexModule that contains the requested Codex components (in practice, this is a wrapper file that requires the requested components from other files).
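To make the idea of a generated wrapper file more concrete, here is a minimal sketch of how the contents of such a virtual file could be produced. The buildSubsetWrapper function, the ./_codex/ path prefix, and the component names are hypothetical; the real generated file may look quite different.

```javascript
// Build the source text of a wrapper file that re-exports the requested
// components by requiring each one from its own file.
// `pathFor` maps a component name to a file path (paths here are assumed).
function buildSubsetWrapper( components, pathFor ) {
	const entries = components.map(
		( name ) => `\t${ name }: require( ${ JSON.stringify( pathFor( name ) ) } )`
	);
	return `module.exports = {\n${ entries.join( ',\n' ) }\n};\n`;
}

const wrapperSource = buildSubsetWrapper(
	[ 'Button', 'Card' ],
	( name ) => `./_codex/${ name }.js`
);
console.log( wrapperSource );
// Prints:
// module.exports = {
// 	Button: require( "./_codex/Button.js" ),
// 	Card: require( "./_codex/Card.js" )
// };
```

Feature code would then require the wrapper file rather than the individual component files, which keeps the file layout an internal detail of CodexModule.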

Migration
Once this feature is introduced, we should deprecate and then remove the current  and   modules. But should we also deprecate and remove the main  and   modules, and force all uses of Codex in MediaWiki to use this system?

Magic behavior
Does it make sense for the  call in these modules to be  ? Or would it make more sense to use ? We chose the former because it seemed confusing to require from  when there is already an RL module by that name (and it would have required subverting some RL internals).

Does it make sense for the  file to magically appear, without being listed in  ? Should it always appear in the root directory of the module? Or should we automatically detect the right path for it, by making it a sibling of the entry point file? Or should we allow (or require?) the developer to specify the name/path of this file?

Should modules using CodexModule have to explicitly specify a dependency on Vue, or should this be added automatically, since the embedded Codex code already depends on Vue?

One module per component
If we made every Codex component its own ResourceLoader module, everything would be a lot simpler: features could just use ResourceLoader module dependencies to pull in exactly those components they want, and ResourceLoader's module loading system would ensure reuse and prevent duplicate loading. However, there are currently 29 components in Codex, 7 composables, and 4 other chunks of code that are shared between components, so we would need to create 29 + 7 + 4 = 40 modules. To support CSS-only use of Codex, each component would need to be split into two modules, a style-only module and a JS module that depends on it; this would increase the total number of modules to 29 × 2 + 7 + 4 = 69.

ResourceLoader is not designed to be used this way: there is a performance impact associated with creating this many modules, and much work has gone into reducing the number of modules. For this reason, we didn't think that creating 69 new modules (and more over time, as more Codex components are created) would be acceptable. The style-only modules would also have complex dependency relationships between each other, which ResourceLoader does not support.

Fundamentally, code splitting presents an iron triangle-style trade-off (a triple constraint). There are three desirable properties: tree-shaking (not loading unused code), deduplication (not loading code twice), and a low module count. Any two of these can be satisfied perfectly, but only by completely discarding the third. The "one module per component" approach achieves perfect tree-shaking and perfect deduplication, but requires the highest number of RL modules. Embedding components in the module that uses them achieves perfect tree-shaking and requires zero additional modules, but does not achieve deduplication at all. The status quo of every feature loading the entire Codex library achieves perfect deduplication and requires only two modules, but does not achieve tree-shaking at all. The proposed solution attempts to find a middle ground where all three properties are mostly but imperfectly satisfied: tree-shaking is achieved mostly but not perfectly (some features may load a core component but not use it), deduplication is achieved mostly but not perfectly (if two features appear on the same page and share a non-core component, that component will be loaded twice), and the number of additional RL modules required is low but not zero. We propose this solution because we think the theoretical imperfections will rarely come up in practice, and are an acceptable price to pay for significantly reducing the number of modules required.

More feature-specific builds within Codex
The current  module is built by Codex's build system, and published as a separate NPM package. It's designed to serve a particular use case where Codex appears on every page. We could expand Codex's build system to build more packages like this, with various subsets of the library needed for various use cases. We don't propose this because it scales poorly; because MediaWiki-specific usage details should not be embedded in Codex; and because these builds would duplicate parts of each other and of the full library.

The duplication issue could be addressed by making Codex build subsets of the library that  each other for deduplication, but this would substantially increase the number of ResourceLoader modules required, even for a relatively small number of subsets. This is because each subset needs 2 modules (one for JS, one for CSS-only), and because deduplicating chunks of code that are shared between subsets would require additional modules to be created.

Build step in MediaWiki
We could do tree-shaking of the Codex library in MediaWiki itself (and/or in extensions that use Codex), using a build tool like Rollup or Vite. But this is equivalent to the "embed components in the modules that use them" approach, with the same lack of deduplication; to avoid duplication, some sort of coordination between extensions that use Codex would have to take place. Introducing a build step in MediaWiki has also run into other problems and objections when proposed in the past.

Related efforts
Once the Vue 3 migration is completed and we can switch from the migration build of Vue to the regular build, this will reduce the size of Vue from ~57 KB compressed to ~50 KB compressed.

If we were able to use a build step or some other mechanism to compile Vue templates to JavaScript (or at least do so in performance-sensitive places), we could load the runtime-only build of Vue. This would reduce the size of Vue further, from ~50 KB compressed to ~33.5 KB compressed.