ResourceLoader/Requirements/Michael Dale

Existing Documentation
It will help the discussion if people read the existing documentation and summaries first.
 * The high-level overview (may be slightly outdated, but it communicates the overall goals): JS2_Overview
 * Documentation for API usage as extension developer: /extensions/JS2Support/README
 * mwEmbed library (JavaScript side to the script-loader) MwEmbed

High level goals
Basically we want a JavaScript framework that promotes modular, reusable components, with clean separation of core libraries, configuration, invocation binding and interface code. We want to make it easy to develop "this way" both for extension developers and gadget authors. The framework should promote code that is easy to optimize for client-side performance, and the script-loader should enable timed payload delivery for high "perceived" or "real" client-side performance as the set of JavaScript interfaces continues to grow. The framework should work with reasonable server-side resource consumption and ultimately assist in vastly reducing server load.

Balance considerations

Balance considerations can in many cases be data-driven decisions, based on industry norms and real-world information about present usage.

 * The design should balance "high performance" initial views with high performance on repeat views via more common cache hits: script grouping for a low number of initial requests vs. many requests for distinct cacheable resources across many contexts.
 * The present design leans towards a low number of requests.

 * The design should balance a cluster-wide multi-language script cache vs. extra round trips for interface-language messages.
 * The present design assumes little benefit from a multi-language script cache; it packages interface messages with the JS code to reduce the number of requests.

 * The design should balance ease of modular JavaScript development and debugging with scalability and ease of deployment.
 * The present design includes minimal JavaScript parsing to ascertain named JavaScript class paths, remove debug statements and swap in localized messages.

Features / Technical requirements
For example:
 * Tools for debugging: the ability to run the entire package with direct references to all underlying JavaScript, CSS and image assets.
 * JS transformation and packaging.
 * Removal of debug statements
 * Minification (either php-min or Google's Closure Compiler, which requires Java)
 * Include localized messages in requested language.
 * CSS transformation and packaging:
 * CSS grouping for page views (where the user may have JavaScript disabled, CSS with correct paths and content type is served)
 * CSS for use in dynamic interface loading (here CSS is emitted as a line of JavaScript so that it can be part of the JavaScript package and reduce round trips / requests)
 * CSS includes should be registered in the JavaScript name-space so the loader can easily identify if a defined style-sheet has been included.
 * Script Buckets - includes the concept of script buckets for PHP-side, on-page included scripts. This makes it easy to separate PHP-generated page script groupings into core / all-page scripts, page, page-set, user scripts, site configuration and user configuration. This avoids per-page and per-user scripts mangling the cache for "all-page" scripts.
 * JavaScript Modules - the concept of a module that includes some minimal "module loader code" that can be invoked in conjunction with the script-loader to dynamically load a set of JavaScript, CSS and interface messages. Examples are inline section editing, or the add-media-wizard button in the editor, which load only once users invoke them by pressing the edit-section or "insert image" button respectively. A minimal bit of "loader" code is always included that can be called to "load" a given module.
 * Modules provide points of encapsulation for localization and a means for easy enabling or disabling. An extension could include multiple "JavaScript modules"
 * JavaScript localization - a JavaScript localization system that parallels the PHP-based MediaWiki system for ease of use. This includes support for the standard message template transforms, and for inline jQuery-bound link substitution for conventional MediaWiki [$1 link name] message text.
 * Stand Alone usage - the script-loader should be able to be used in "stand-alone mode" for reusing these modules outside of Wikimedia.
 * Work with and without a reverse proxy in front. The script-loader should support "fast cache hits": the entire MediaWiki code base and localization system should not be invoked every time a client requests a pre-defined script group cache key. This is especially important for poor man's gzip support on shared-host deployments of MediaWiki.
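The module concept above can be sketched as a small client-side registry: each module registers its dependencies up front, and the loader builds a single concatenated script-loader request for whatever is not yet loaded. A minimal sketch, with hypothetical names throughout (`mw.register`, the URL format and the `modules=` parameter are illustrative, not the actual JS2/mwEmbed API):

```javascript
// Minimal sketch of a module-loader registry (illustrative, not the real API).
var mw = {
  modules: {},   // name -> { deps: [...], loaded: bool }
  register: function (name, deps) {
    this.modules[name] = { deps: deps || [], loaded: false };
  },
  // Resolve a module's full dependency list, depth-first, dependencies first.
  resolve: function (name, seen) {
    seen = seen || {};
    if (seen[name]) { return []; }
    seen[name] = true;
    var mod = this.modules[name], out = [];
    for (var i = 0; i < mod.deps.length; i++) {
      out = out.concat(this.resolve(mod.deps[i], seen));
    }
    out.push(name);
    return out;
  },
  // Build one concatenated script-loader URL for everything not yet loaded.
  loadUrl: function (name) {
    var all = this.resolve(name), needed = [];
    for (var i = 0; i < all.length; i++) {
      if (!this.modules[all[i]].loaded) { needed.push(all[i]); }
    }
    return '/index.php?action=scriptloader&modules=' + needed.join('|');
  }
};
```

A real loader would also inject the returned script and mark the modules loaded; the point here is that one round trip covers a module and all of its missing dependencies.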

Assist in vastly reducing server load / future work
Currently, logged-in users are served dynamically generated page content; this adds anywhere from half a second to two seconds in "before transport" costs. Sending cached pages to everyone with default settings, and dynamically requesting the "dynamic" parts of the page (user gadgets and the user menu) via JavaScript driven by a cookie, could vastly improve performance and reduce server load for logged-in users. This would let a much larger set of users stay logged in with fewer server-side resources. A special cookie would have to be recognized at the Squid level to enable serving logged-in users the cached "anonymous" page.
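On the client side, the cookie-driven approach could look roughly like the sketch below. The `mwLoggedIn` cookie name and the `action=userchrome` endpoint are invented for illustration:

```javascript
// Sketch: serve everyone the cached anonymous page, then let client-side JS
// fill in the logged-in parts. Cookie name and endpoint are hypothetical.
function getCookie(cookieStr, name) {
  var m = cookieStr.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
  return m ? decodeURIComponent(m[1]) : null;
}

function hydrateUserChrome() {
  if (!getCookie(document.cookie, 'mwLoggedIn')) {
    return; // anonymous user: the cached page is already correct as-is
  }
  // Fetch only the dynamic fragment (user menu, gadgets) for this user;
  // this request carries the session cookie, the cached page view did not.
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/index.php?action=userchrome', true);
  xhr.onload = function () {
    document.getElementById('p-personal').innerHTML = xhr.responseText;
  };
  xhr.send();
}
```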

Summary
Recommended features:
 * Concatenation
 * Event-driven script loading
 * File transformations
 * Server-side caching (even with no valid $wgCacheDirectory)
 * Short Squid expiry time and optimised server-side cache hits.
 * Timestamp-based version numbers

Optional features:
 * Footer script tag placement
 * Support for JavaScript modules which work without a MediaWiki installation

Concatenation
Concatenating objects is unequivocally good for the client as soon as the total number of objects exceeds the browser's concurrent connection limit. It's very likely that if we're not in this regime already, we soon will be. Concatenation is useful for scaling up the complexity of our client-side ecosystem.

However, concatenation must be balanced against:
 * 1) The tendency to include excessive amounts of rarely-used code.
 * 2) The overhead incurred when different modules concatenate the same code into different buckets.
 * 3) The need to serve items with different Vary headers (gen=js etc.).

My analysis suggests that to begin with, the number of buckets should be fairly small: say, one for page view, one for edit, and one bucket for each special page that needs its own JS. The page view bucket should contain code that is required for all page views. Scripts like OggPlayer.js that are required on occasional page views should be served separately. Duplication of code (issue 2 above) should be avoided even if it means adding additional requests. Remember that browsers do have mechanisms to mitigate the connection setup overhead.

It makes sense to include skin-specific JS and CSS in the page view bucket. Thus the skin name should be part of the URL, and when the user changes their skin, they will have to reload the common JS and CSS. We should have an associative array of such parameters, and allow them to be added by any module. Some parameters I've identified are:


 * User logged in: only then do they get ajaxwatch.js
 * disablesuggest user option: suppresses mwsuggest.js
 * editsectiononrightclick user option: sends rightclickedit.js

For performance reasons, it makes sense to have these user options be passed from the HTML back to the script loader via the URL. This allows us to send the data with no Vary header.

The other option would be to send out these scripts along with dynamically generated content like user script subpages. But all requests with Vary:Cookie must be forwarded back to Florida, damaging performance for people in Europe. By striving to keep Vary:Cookie requests small, we reduce the number of Florida RTTs and thus the overall latency.

This approach can be extended as long as the length of the parameter blob is not too large. It may even be possible to remove the Vary:Cookie requests completely.
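Building such parameterised URLs is straightforward; the sketch below canonicalises the option blob by sorting its keys, so that identical option sets always map to the same Squid cache entry (the parameter names and URL shape are illustrative):

```javascript
// Sketch: encode user options into the script-loader URL instead of relying
// on a Vary header, so identical option sets share one cache entry.
function scriptLoaderUrl(base, bucket, params) {
  var keys = [];
  for (var k in params) { keys.push(k); }
  keys.sort(); // canonical order: same options => same URL => same cache entry
  var query = ['bucket=' + encodeURIComponent(bucket)];
  for (var i = 0; i < keys.length; i++) {
    query.push(encodeURIComponent(keys[i]) + '=' + encodeURIComponent(params[keys[i]]));
  }
  return base + '?' + query.join('&');
}
```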

Deferred loading
There are two ways in which the loading of scripts can be deferred:


 * 1) By placing script tags in the page footer.
 * 2) By loading required JS in response to UI events such as button clicks.

For example, Drupal have chosen to move all their script tags except jQuery to the page footer. This improves the time it takes for the webpage to be displayed on the user's first visit. However, my analysis suggests that there are a large number of scripts which we would need to serve from the header:


 * For backwards compatibility and ease-of-use, user-defined scripts such as: MediaWiki:Common.js, MediaWiki:Skinname.js, User:name/skinname.js, JS subpage preview.
 * The result of the AjaxAddScript hook probably also needs to be there for b/c.
 * diff.js adjusts the display pre-render and so needs to be in the header.
 * ajax.js, some parts of gen=js (skin, stylepath variables), jQuery and wikibits are required to be loaded before other scripts, potentially including header-loaded user scripts.
 * common/IEFixes.js, edit.js, metadata.js, search.js and upload.js contain variables that are referenced from the HTML, such as event attributes, so loading them from the footer would cause JS errors if the user interacted with those elements before the script finished loading. This is fixable, but this may well be the initial situation.

Using footer script tag placement for just a few scripts, when the bulk of script loading is done in the header, may reduce the amount of concatenation which can be done, with little performance benefit to offset it.

So, I suggest making footer script tag placement an optional, low-priority part of the initial project.

Event-driven loading can be useful where a large amount of code is required to support a rarely-used feature. However, it needs to be implemented carefully. In particular, it is important to give feedback to the user to indicate that the module is loading. This reassures the user that their click did actually do something, and tells them that they are expected to wait. Footer script tag placement gives some automatic feedback via the browser's UI. With event-driven loading it needs to be entirely implemented by us.

Even with appropriate feedback, event-driven loading should only be used when the improvement in initial page view time outweighs the perceived reduction in responsiveness to events. The trade-off is not simply 1:1: the user expects to have to wait for an initial page view, and an extra 500ms there will not be as keenly noticed as a 500ms delay in response to a button click. Similarly, there is an argument for loading subfeatures of a dynamically-loaded feature immediately, instead of waiting for subsequent click events.

Several MetavidWiki modules utilise event-driven loading, and I recommend supporting them in the initial project. Where possible, event-driven loading should support concurrent and pipelined downloads, it should not serialise requests. It should also support concatenation and minification.
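An event-driven loader along these lines might look like the following sketch: feedback is shown before any network activity, the requests are issued concurrently rather than serialised, and the callback fires once everything has arrived. `requestScript` abstracts the actual script-tag injection; all names are illustrative:

```javascript
// Sketch of event-driven loading with user feedback. Requests are started
// concurrently, not serialised; `requestScript(url, done)` stands in for
// real <script> tag injection.
function loadOnDemand(urls, requestScript, feedback, onAllReady) {
  feedback.show();                 // immediate feedback: the click did something
  var remaining = urls.length;
  for (var i = 0; i < urls.length; i++) {
    requestScript(urls[i], function () {
      if (--remaining === 0) { feedback.hide(); onAllReady(); }
    });
  }
}
```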

CSS
My analysis suggests that CSS needs to be included in the script loader project as well as JavaScript. We are currently sending a large number of stylesheets to clients; this would benefit from concatenation.

It may be possible to use downlevel-revealed conditional comments to concatenate browser-specific CSS with general CSS. For example, instead of something like this (two requests, with the IE stylesheet in a downlevel-hidden conditional comment):

<link rel="stylesheet" href="main.css"/>
<!--[if IE]><link rel="stylesheet" href="IEFixes.css"/><![endif]-->

We could have something like this, with one combined stylesheet per browser class:

<!--[if IE]><link rel="stylesheet" href="combined.css?ie=1"/><![endif]-->
<![if !IE]><link rel="stylesheet" href="combined.css"/><![endif]>
Open source code for CSS minification exists.
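As a rough illustration of what such a minifier does (this is a naive sketch; real minifiers such as the YUI Compressor's CSS mode handle many more cases):

```javascript
// Naive CSS minifier sketch: strips comments and collapses whitespace.
// Real minifiers handle strings, hacks and many more edge cases.
function minifyCss(css) {
  return css
    .replace(/\/\*[\s\S]*?\*\//g, '')   // strip /* comments */
    .replace(/\s+/g, ' ')               // collapse runs of whitespace
    .replace(/\s*([{}:;,])\s*/g, '$1')  // drop spaces around punctuation
    .replace(/;}/g, '}')                // drop the last semicolon in a block
    .trim();
}
```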

Transformations
We would like to support the following transformations:


 * JavaScript minification
 * CSS Janus
 * CSS minification

These can all be done in pure PHP and cached in $messageMemc or in a new table in the database. Caching in $wgCacheDirectory could also be benchmarked. The usual gzip output handler (wfGzipHandler) can sit in front.

I assume Michael's demand that MediaWiki startup time be avoided is based on a misconception about how fast MediaWiki is to start up. Startup time for a default installation with no APC is 32ms on my laptop, as measured with a simple entry point and ab -c1. This should be fast enough, as long as our concatenation strategy is sufficiently aggressive. It doesn't load the localisation system unless it is requested, let alone the entire code base. MediaWiki has several high-traffic lightweight entry points, it has already been optimised for this role.

On Wikipedia, the performance loss due to the large number of extension setup files is offset by gains from APC and faster processors, giving a startup time of around 13ms. This overhead will be reduced by having Squid in front.

The registration interface
For PHP callers, there is some value in registering and grouping scripts. For event-driven loading, there is even more value in it.

I propose splitting up registration of core and extension scripts, similar to what we do with special pages and autoload classes, to avoid excessive performance overhead when loading DefaultSettings.php. Core scripts should be registered in a static member variable of the script loader class, and extension scripts should be registered in a global variable with the same format.

Like in AutoLoader.php, core filenames should be relative to $IP, and extension filenames should be absolute.

A possible format would be to have files and groups of files sharing the same namespace. A file could be registered with:

'jquery.ui.draggable' => array( 'file' => 'js/jquery/ui/draggable.js' ),

Or with shortcut notation:

'jquery.ui.draggable' => 'js/jquery/ui/draggable.js',

The type can be guessed from the filename. Type classification only needs to be done for requested files so the performance overhead would not be too onerous. However, dynamically generated content would need a type option:

'loader' => array( 'type' => 'js', 'callback' => array( 'ScriptLoader', 'getLoader' ) ),

Files can have dependencies which need to be loaded before them:

'uploadPage' => array( 'file' => 'js/uploadPage.js', 'deps' => array( 'loader' ) ),

Dependencies need not be of the same type. A complex module, then, can be defined as a list of dependencies with no file member:

'jquery.ui' => array( 'deps' => array( 'draggable', 'droppable', 'resizable', ... ) ),

Having registered all this data, the calling code to include both CSS and JS becomes very simple:

$wgOut->addResource( 'jquery.ui', array( 'bucket' => 'all' ) );

For non-concatenated requests, the bucket option would be omitted:

$wgOut->addResource( 'jquery.ui' );

The client side interface has no buckets, so adding resources would generally require only the resource name.

I suggest the terminology "resource", since it's rarely been used in MediaWiki before, so it's suitable as a new jargon word. It's general enough that it can apply to CSS, JS, and things that we might support in the future like CSS sprites. The configuration global could be $wgResourceList or $wgResourceConf, something like that. ResourceLoader would be a good class name. "Script" implies JavaScript, which is potentially confusing. The JS2 terminology "class" is ambiguous and confusing, since that word too is used for other things.

Localisation
As previously discussed with Trevor and Michael, localisation should be done by having a message key list in some special format at the top of the source file. The presence of such localisation in a file should be noted in the file registration, to avoid the overhead of scanning unlocalised files:

'foo' => array( 'file' => 'js/foo.js', 'l10n' => true ),

This would add a dependency on a message resource. Message resources would need a special caching and invalidation system. The cache should be in the database. Two tables are necessary:

 * A table called msg_resource, which stores JS blobs for each resource name.
 * A table called msg_resource_links, which has a row per message per resource and an index each way.

MessageCache::replace should load the list of message resources of which the given message is a part. Then each relevant message resource should be loaded, modified, and pushed back into msg_resource. Locking selects can be avoided using a blind conditional update followed by affectedRows. I can explain what that means if you need me to.
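The "blind conditional update" pattern is the usual compare-and-set idiom: update the row only if it still carries the timestamp we read, and treat zero affected rows as "someone else got there first, re-read and retry". A sketch against an in-memory stand-in for msg_resource:

```javascript
// Sketch of the "blind conditional update" pattern: update only if the row
// still holds the timestamp we read earlier, then check affectedRows instead
// of taking a locking select. `table` stands in for the msg_resource table.
function conditionalUpdate(table, name, oldTs, newBlob, newTs) {
  var affectedRows = 0;
  for (var i = 0; i < table.length; i++) {
    if (table[i].name === name && table[i].ts === oldTs) {
      table[i].blob = newBlob;
      table[i].ts = newTs;
      affectedRows++;
    }
  }
  return affectedRows; // 0 => a concurrent writer won; re-read and retry
}
```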

The message resource cache should also store the last modification timestamp of the script file. When the script file changes, the message resource can be rebuilt. There should be no need for this cache to expire.
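On the client side, the localized blob would be consumed by a small lookup-and-substitute helper. A sketch (the gM name follows mwEmbed convention; the message store format here is an assumption):

```javascript
// Sketch of a JS message function paralleling PHP's wfMsg(): looks up a
// localised string and substitutes $1, $2, ... parameters.
// The mwMessages store format is illustrative.
var mwMessages = {};
function gM(key /*, p1, p2, ... */) {
  var text = mwMessages[key];
  if (text === undefined) {
    return '<' + key + '>'; // missing-message marker, MediaWiki style
  }
  var args = Array.prototype.slice.call(arguments, 1);
  return text.replace(/\$(\d+)/g, function (m, n) {
    var v = args[parseInt(n, 10) - 1];
    return v === undefined ? m : v;
  });
}
```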

Compared to Michael's JS2 scheme, this scheme makes it more difficult to construct scripts which are useful both with and without MediaWiki being present. I assume this is what Michael means by "stand alone usage". Wikimedia's role is to support Wikimedia websites and MediaWiki users. I don't think non-MediaWiki users of Metavid scripts should be on our list of priorities at all.

The problem is still tractable: an interested developer could write a maintenance script which outputs the localised JavaScript for use by external installations. I just don't think that's something that Wikimedia should pay developers to do, unless we have contractual obligations.

Versioning and HTTP proxy caching
Version numbers like $wgStyleVersion have limited utility. They are useful when you have simultaneous updates of the HTML and the linked resource, and they are useful for discarding the client-side cache of a logged-in user. However, they are not useful to discard the cache of a logged-out user, since these users will have the version number cached for as long as possible in the HTML. This is because generating page view HTML is very expensive. It doesn't make sense to expire expensive HTML, when we can just expire cheap script loader requests instead.

Script loader requests will be cheap as long as the script loader supports 304 responses and cache hit ratios are high. I think we should aim to meet those conditions, and then set expiry headers on the order of one hour.

The highest-timestamp version scheme in JS2 was good; I think we should keep that general idea. However, using UNIX timestamps would make it very challenging for sysadmins to do a forced purge of a file from the Squid cache.

To assist sysadmins, I think the timestamps should be human-readable, say ISO 8601, and I think they should be rounded down to the nearest 10 seconds. We should have a maintenance script "purgeResource.php", similar to purgeList.php, which can purge every possible variant of a given URL within a given time period. Such a script should be able to do 10k req/s or so, so it would be able to purge a whole month of URLs in about 25 seconds. Without the 10s rounding, it would take 250 seconds, which would be much less convenient.