User:Brion VIBBER/EmbedScript 2019

This is a work in progress. Page is incomplete.

This is a re-conception of older ideas that went partway in this direction, with smaller, more tractable subprojects listed as a work plan.

Background
There are two main target areas:


 * a safe way to embed interactive HTML/JavaScript "widgets" inside wiki articles alongside images and videos
 * a safer alternative to shared user scripts and Gadgets that splits plugins between trusted and less-trusted code

The common thread is the use of sandboxed <iframe> elements and Content-Security-Policy to create a JavaScript sandbox for less-trusted code. There are further avenues of exploration for truly untrusted code that needs to limit CPU usage and memory allocation.

Terms

 * host: the main web page of the embedder, such as MediaWiki
 * sandbox: an isolated JavaScript context which has no direct access to host objects or data
 * trusted code: code that has access to host data -- can do almost anything the user could do
 * less-trusted code: code that is restricted from host data, but is able to trigger recoverable problems like hanging the main thread or crashing the browser tab
 * untrusted code: code that should be further prevented from hanging the main thread, crashing the browser tab, or over-allocating memory
 * host API: an asynchronous message-passing interface between trusted host code and less-trusted or untrusted widget/plugin code
 * widget: a bundle of less-trusted HTML, CSS, and JavaScript code that can be embedded in a sandboxed <iframe>, optionally with a trusted host API
 * plugin: a bundle of less-trusted or untrusted JavaScript code loaded via an <iframe> + Worker combination that communicates with a trusted host API

Widgets
Embedded content "widgets" have a visible area they can populate with HTML, style with CSS, and manipulate with JavaScript. They run in a sandboxed <iframe> element, which causes the browser's same-origin restrictions to prevent any direct access to the host page from guest code.

Further abilities such as fullscreen and camera/microphone access can be withheld through the iframe's sandbox and allow attributes (Permissions Policy, formerly Feature Policy), alongside the browser's Content-Security-Policy mechanism.

It's important to ensure that old browsers cannot load the iframe contents in the parent domain if it directly contains guest HTML or JavaScript, as that would turn it into host code! For safety it is best to use a separate domain and/or to have the frame HTML only contain a stub loader that accepts its input from the host over postMessage.
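As a minimal sketch of the stub-loader idea, the check below decides whether an incoming message should be accepted as a widget bundle. The message shape (`type: 'load-widget'` with `html`, `css`, `js` strings) and the name `acceptBundle` are hypothetical and not part of any existing API; the `origin`, `source`, and `data` fields mirror what a browser MessageEvent provides, modelled here as a small interface so the logic runs standalone.

```typescript
// Hypothetical shape of the bundle a host might send to the stub loader.
interface WidgetBundle {
  type: 'load-widget';
  html: string;
  css: string;
  js: string;
}

// The subset of a browser MessageEvent that the check relies on.
interface IncomingMessage {
  origin: string;
  source: unknown;
  data: unknown;
}

// Returns the bundle if the message came from the expected host window and
// has the expected shape; otherwise returns null.
function acceptBundle(
  event: IncomingMessage,
  hostOrigin: string,
  hostWindow: unknown
): WidgetBundle | null {
  if (event.origin !== hostOrigin) return null; // wrong sender origin
  if (event.source !== hostWindow) return null; // not the embedding window
  const d = event.data as Partial<WidgetBundle> | null;
  if (typeof d !== 'object' || d === null) return null;
  if (d.type !== 'load-widget') return null;
  const html = d.html;
  const css = d.css;
  const js = d.js;
  if (typeof html !== 'string' || typeof css !== 'string' || typeof js !== 'string') {
    return null;
  }
  return { type: 'load-widget', html, css, js };
}
```

In a browser, the stub would presumably call this from a `message` event listener, passing `window.parent` as `hostWindow` and the known host origin, and ignore anything for which it returns null.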

Cross-origin communication channels can be blocked off with Content-Security-Policy.
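One way to picture this is a small helper that assembles a restrictive policy string for the frame document. The directive choices below are assumptions for illustration, not a vetted policy: `default-src 'none'` blocks network loads by default, inline script and style are allowed on the assumption that the loader injects guest code inline, and the helper name `buildCsp` is hypothetical.

```typescript
// Builds a Content-Security-Policy value from an ordered list of
// [directive, sources] pairs.
function buildCsp(directives: Array<[string, string[]]>): string {
  return directives
    .map(([name, sources]) => [name, ...sources].join(' '))
    .join('; ');
}

// Assumed policy for the sandboxed frame document (illustrative only).
const widgetCsp = buildCsp([
  ['default-src', ["'none'"]],         // no network loads unless listed below
  ['script-src', ["'unsafe-inline'"]], // guest JS injected inline by the loader
  ['style-src', ["'unsafe-inline'"]],  // guest CSS injected inline
  ['img-src', ['data:', 'blob:']],     // images only from in-memory data
  ['form-action', ["'none'"]],         // no form submissions to other sites
  ['base-uri', ["'none'"]],            // guest cannot change the base URL
]);

// Tokens for the iframe's sandbox attribute: scripts may run, but without
// "allow-same-origin" the frame is treated as a unique opaque origin.
const sandboxTokens = ['allow-scripts'].join(' ');
```

The resulting `widgetCsp` string would be served as the Content-Security-Policy header of the frame document, and `sandboxTokens` set as the `sandbox` attribute on the iframe element.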

A message-passing host API may be used in the setup of the sandbox, and for some host<->guest interactions like clickable links, but is not required for guest code to operate within the iframe.

Widget components
A widget definition is similar to a jsfiddle.net "fiddle" -- three main components holding HTML, CSS, and JS. They can be separately stored as strings in the host environment (wiki pages, or slots in a single wiki page, for the MediaWiki embedding). Additionally there should be a fourth metadata component with descriptions, localizable string definitions, images and common libraries to load, etc.

These can be presented in a viewer/editor as a 4-up panel (HTML, CSS, JS, and a running example) with metadata in a sidebar.

For small screens / mobile, a tabbed view may be better.

Plugins
"Plugins" are intended for the user-interface side of MediaWiki, as a way for custom code to manipulate the host page's UI through a safe, restricted API.

The code for a plugin is split into two components: a trusted host API module which has full access to the host page, and a less-trusted or untrusted guest module which communicates with the host over a secure, asynchronous message-passing API. Multiple plugin guest modules may hook into the same host API.

For instance, an editor plugin may use a host API that sends in a representation of a text selection, then waits for a response with modified text or requests to update a dialog box state.

Plugins will come with metadata for what host APIs they require and what pages or actions they should be loaded on. Operations requiring trust, like cross-site network access, can be mediated by the host API for opt-in security permissions. Plugins are reliant on the host API to present a UI for them, depending on the hooks being plugged into.
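A rough sketch of how such metadata could gate loading without executing any guest code follows. The manifest fields (`requiresHostApis`, `loadOnActions`) and the function name are hypothetical; the actual registration format is not defined here.

```typescript
// Hypothetical plugin metadata; field names are illustrative only.
interface PluginManifest {
  name: string;
  requiresHostApis: string[]; // host API hooks the guest code talks to
  loadOnActions: string[];    // page actions the plugin applies to
}

// What the host knows about the current page view.
interface PageContext {
  action: string;
  availableHostApis: string[];
}

// Decides from metadata alone whether a plugin applies to this page view.
function shouldOfferPlugin(manifest: PluginManifest, ctx: PageContext): boolean {
  const actionMatches = manifest.loadOnActions.indexOf(ctx.action) !== -1;
  const apisPresent = manifest.requiresHostApis.every(
    (api) => ctx.availableHostApis.indexOf(api) !== -1
  );
  return actionMatches && apisPresent;
}
```

Because the decision uses only declarative data, the host can set up its UI and defer instantiating the sandbox until the user actually triggers the plugin.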

Plugin components
With no HTML or CSS interface, plugins have only two components: the JavaScript code, and a metadata component with descriptions, localizable string definitions, references to the host API hooks and any common libraries to load.

This can be presented in a viewer/editor as a single JS panel with metadata in a sidebar.

For small screens / mobile, a tabbed view may be better.

Host APIs
Host APIs are based on asynchronous message-passing, using the browser's postMessage facility between parent pages, iframes and (where needed) Worker threads. Messages are passed by structured clone, so they can carry not only the JSON data types but also ArrayBuffers, Blobs, and some other low-level types; this should be suitable for sending both text and binary data (but not HTML DOM elements).

Note it is very important that host API implementations never insert guest-supplied HTML strings into the host DOM, as this would allow injection!

The guest sandbox can set up a safe message-passing wrapper API on top of postMessage in cases where it's desirable to further restrict the guest API from what web or Worker contexts see.
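On the host side, the message-passing design could be sketched as a dispatcher that validates each incoming message and only calls handlers from an explicit allowlist. The envelope shape (`id`, `method`, `params`) and the name `dispatch` are assumptions made for this example, not a defined protocol.

```typescript
// Hypothetical request/response envelope for a host API.
interface HostRequest {
  id: number;
  method: string;
  params: unknown;
}

type HostResponse =
  | { id: number; ok: true; result: unknown }
  | { id: number; ok: false; error: string };

type Handler = (params: unknown) => unknown;

// Handles one guest message against an explicit allowlist of methods.
// Returns null for messages that are not well-formed requests.
function dispatch(msg: unknown, handlers: Record<string, Handler>): HostResponse | null {
  const m = msg as Partial<HostRequest> | null;
  if (typeof m !== 'object' || m === null) return null;
  const id = m.id;
  const method = m.method;
  if (typeof id !== 'number' || typeof method !== 'string') return null;
  // hasOwnProperty keeps inherited names such as "constructor" from matching.
  if (!Object.prototype.hasOwnProperty.call(handlers, method)) {
    return { id, ok: false, error: 'unknown method' };
  }
  try {
    return { id, ok: true, result: handlers[method](m.params) };
  } catch (err) {
    // Do not leak host exception details to guest code.
    return { id, ok: false, error: 'handler failed' };
  }
}
```

In a browser, the host would presumably call `dispatch` from its `message` handler and post the returned response back to the guest; handlers receive plain data and should treat it as untrusted text rather than markup.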

API components
Host APIs, like plugins, have two components: JavaScript code and metadata. They can be presented similarly as a 2-up panel or tabbed view.

Threat model
"Less-trusted code" running in an <iframe> cannot be prevented from causing some trouble, such as blocking the main thread for a few seconds with long-running code, or allocating a ton of memory which can crash the tab or even slow down the operating system.

Thus it's vital that widgets and plugins be recoverable in a fairly straightforward way: if you activate one and it causes trouble, you must be able to turn it off.

For content widgets, using a "click-to-play" model is safest as this will avoid running any dangerous code when an article is being viewed or edited, and reloading the page will clear things up by resetting state.

For plugins, ideally they won't be loaded until some interactive action happens, and there should be a way to turn it back off from preferences (or a context menu on the action trigger itself) without running it on that page view. Things like menu and button setup can be done in a declarative way that doesn't require pre-executing JS code.

Cross-site communication
With suitable CSP sandboxing, exposure of output data to other sites is not possible through "web bugs" (offsite image loads) or other direct techniques. This prevents any data leakage there might be from reaching an attacker. There is some possible danger of side-channel leakage through things that an attached host API permits, however -- for instance if a host API allows fetching a particular on-wiki image, media view counts might go up when the file is loaded by the browser, and this could be checked by the attacker.

Additionally, host API opt-in permissions might be abused by "Trojan horse" malicious code, such as an editor plugin that offers to do formatting cleanup but actually hides data in seemingly invisible adjustments to whitespace and formatting characters. This could then be retrieved by the malicious actor from afar.

As always, open-code and some sort of review system would help.

Requirements and compatibility
At a minimum, modern-ish <iframe> sandboxing, structured clone on postMessage, and CSP support are required. The exact minimum browser versions for these are not yet known.

Compatibility needs to be checked at runtime before injecting any untrusted HTML, CSS, or JS into an <iframe>. Recommend not instantiating the <iframe> until it's ready to roll; source documents should probably show a clickable thumbnail placeholder instead.

Resource limits
Because a malicious or auto-generated wiki page might include hundreds or thousands of widgets, it's best not to instantiate the <iframe>s until user activation even if there's no less-trusted code in them.

The ability to safely run fully "untrusted code" without blocking the CPU or over-allocating memory would be nice too, but it's difficult/impossible to enforce memory limits on JS. WebAssembly modules running in a Worker thread could be locked down further (long loops can be terminated from the main thread, and memory can be statically limited) but this requires a lot more investigation as well as more tooling to compile easy-to-write modules to Wasm.

Note that long-running JavaScript loops in a Worker could be terminated from the parent if it doesn't respond to pings during a long loop, but high sustained CPU usage in an async/await loop cannot be detected from the main thread.
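The ping-based termination idea might look like the following sketch. The class name `PingWatchdog` and its methods are hypothetical; timestamps are passed in explicitly so the logic can be exercised without real timers or a real Worker.

```typescript
// Tracks ping/pong timing for one Worker.
class PingWatchdog {
  private readonly timeoutMs: number;
  private lastPong: number;

  constructor(timeoutMs: number, startedAt: number) {
    this.timeoutMs = timeoutMs;
    this.lastPong = startedAt;
  }

  // Call when the Worker answers a ping.
  recordPong(now: number): void {
    this.lastPong = now;
  }

  // True once the Worker has been silent for longer than the timeout.
  shouldTerminate(now: number): boolean {
    return now - this.lastPong > this.timeoutMs;
  }
}
```

A host would presumably send pings on an interval and call the Worker's `terminate()` method once `shouldTerminate` returns true. As noted above, guest code that yields between chunks of work keeps answering pings, so this catches only loops that never yield, not sustained CPU use.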

Planned work areas
There are a few areas to work on:


 * <iframe> loader setup code and host API
   * There is existing prior art such as Oasis, which can be used for inspiration
 * bundling system for taking HTML, CSS, and JS "files" and injecting them into the iframe safely
   * Lots of prior art to examine. Would like to support native JS module syntax, but will probably start simpler.
 * sample content widgets
   * update the Mandelbrot fractal generator using
   * write a "Turtle World" interactive Logo interpreter using SVG graphics
 * sample UI plugins
   * pick something clever that hooks into the editor or page views and implement it
 * MediaWiki extension storing widget and plugin code as editable wiki pages
   * I hope to use multi-content revisions to bundle widget HTML, CSS, and JS together in an "atomic" page.
   * Plugins will have both JS code and a JSON module registration definition.
   * Host API implementations can be shared by multiple plugins.
   * There should be a common permissions-granting UX for host APIs to use.

Additional productization requirements
If this is ever to be used on Wikipedia and Commons, a good code-review system would be strongly, strongly recommended. A simple way to import/export between a git repo checkout and an on-wiki widget or plugin or host API definition would likely help for "serious maintenance" as well.

Centralized definitions that can be pulled on any wiki, and export of widgets over InstantCommons as well as locally will be required.

Localizable string definitions and a system for translation, or hooks into MediaWiki's existing translation systems, are a requirement for well-maintained tools in our context.

Further areas to examine
There are some more things to do later:


 * investigate Worker isolation further for avoiding main-thread jank
 * investigate global JS namespace cleanup for reducing the attack surface further
 * investigate WebAssembly-based isolation for truly untrusted code
 * investigate ways to automatically create thumbnail images

I'm really excited about the idea of WebAssembly sandboxing, because you can strictly limit memory usage. If the module runs in a Worker, you can also terminate long-running loops from the parent thread, and restrict the ability to schedule additional execution via timers or async functions.

But it requires a custom API for the message-passing, and tooling for compiling scripting languages to standalone Wasm modules isn't good yet. Implementing a full JS engine in Wasm is an idea I've considered, but it's a big project that isn't tractable at this time.

Memory over-allocation in JS can be limited somewhat if typed arrays can be hidden from view of guest code, depending on whether the browser engine handles large strings and arrays better than it handles typed arrays... but it's a shame to lose typed arrays, which are useful sometimes. Careful initialization of global state could help mitigate this and other issues with considered-unsafe native objects inside a Worker context.

Running code in a headless browser to make thumbnails for widgets should be possible, and mostly requires server-side tooling. It's less clear whether clients could create their own thumbnails on the client side.

Appendix 1: <iframe> and CSP details
blah

Appendix 2: de-fanging JavaScript intrinsics
blah

Appendix 3: alternate scripting engines
I did a couple of research spikes on custom-built scripting engines and how that might work, with potential for additional security checks beyond what the browser restrictions do.

In mid-2018 I examined implementation of a JS-like runtime in WebAssembly, confirming that NaN-boxing works for a compact value representation, and that a manually-maintained garbage collection root stack for locals and temporaries can function for automatic memory management within the WebAssembly linear memory region (which itself can be strictly limited). While the basics are sound, it would be a lot of work to implement a mostly spec-compliant JavaScript runtime and then maintain it in production.
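To illustrate what NaN-boxing means, here is a small sketch that packs either a double or a 32-bit integer into 64 bits. It is an illustration of the general technique, not the layout used in the prototype described above: the tag value `TAG_INT32`, the two-word `Boxed` representation (a Wasm implementation would more likely use a single i64), and all function names are assumptions.

```typescript
// A boxed value is 64 bits, modelled here as two unsigned 32-bit words.
interface Boxed {
  hi: number;
  lo: number;
}

// Assumed tag: a NaN bit pattern in the high word that boxDouble never emits,
// because it canonicalizes every NaN to CANONICAL_NAN_HI.
const TAG_INT32 = 0x7ff90000;
const CANONICAL_NAN_HI = 0x7ff80000;

// Scratch buffer for converting between doubles and raw words (big-endian).
const scratch = new DataView(new ArrayBuffer(8));

function boxDouble(value: number): Boxed {
  if (value !== value) return { hi: CANONICAL_NAN_HI, lo: 0 }; // canonicalize NaN
  scratch.setFloat64(0, value);
  return { hi: scratch.getUint32(0), lo: scratch.getUint32(4) };
}

function boxInt32(value: number): Boxed {
  // The integer payload lives in the low word; the high word carries the tag.
  return { hi: TAG_INT32, lo: value >>> 0 };
}

function unbox(b: Boxed): { kind: 'int32' | 'double'; value: number } {
  if (b.hi === TAG_INT32) return { kind: 'int32', value: b.lo | 0 };
  scratch.setUint32(0, b.hi);
  scratch.setUint32(4, b.lo);
  return { kind: 'double', value: scratch.getFloat64(0) };
}
```

Ordinary doubles pass through unchanged, while tagged values reuse bit patterns that would otherwise all read as NaN, so every value fits in one fixed-size slot.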

In January 2019 I looked at a JavaScript-in-JavaScript implementation, investigating both interpreter and transpiled modes. This allows you to leverage the native JavaScript engine for primitives, objects, strings, arrays, GC etc but makes it nearly impossible to limit memory usage. It also becomes more fraught with danger the closer you get to native-like transpiled code, meaning it can't be relied upon to enforce safety or memory limits.

I think the JS-in-JS case is not worth trying to implement fully, though JS-based interpreters inside the sandbox for custom languages in non-performance-critical applications would be fine -- for instance an interactive Logo interpreter I want to do as a widget demo.

The JS-in-WebAssembly case is more interesting because it can enforce both safety and memory limits; and as long as it is kept in a Worker thread, execution time limits can be enforced by the host. I plan to do some proof-of-concept poking on the side, merging ideas from the two projects into the Wasm version, but don't expect to be able to bring it to production quality without a lot more investment.

If it ever appears worth it, we can try to get more dev support; if it's not worth it, it'll stay a research project.