User:Brion VIBBER/EmbedScript 2019
This is a work in progress. Please feel free to provide feedback on the talk page or directly to me via mail/irc/etc!
- 1 Background
- 2 Terms
- 3 Widgets
- 4 Plugins
- 5 Host APIs
- 6 Threat model
- 7 Planned work areas
- 8 Appendix 1: <iframe> and CSP details
- 10 Appendix 3: alternate scripting engines
There are two main target areas:
- a safer alternative to shared user scripts and Gadgets that splits plugins between trusted and less-trusted code
- host: the main web page of the embedder, such as MediaWiki
- trusted code: code that has access to host data -- can do almost anything the user could do
- less-trusted code: code that is restricted from host data, but is able to trigger recoverable problems like hanging the main thread or crashing the browser tab
- untrusted code: code that should be further prevented from hanging the main thread, crashing the browser tab, or over-allocating memory
- host API: an asynchronous message-passing interface between trusted host code and less-trusted or untrusted widget/plugin code
Further abilities such as fullscreen, camera/microphone access, etc. can be prevented through the browser's Feature-Policy mechanism.
Cross-origin communication channels can be blocked off with Content-Security-Policy.
A message-passing host API may be used in the setup of the sandbox, and for some host<->guest interactions like clickable links, but is not required for guest code to operate within the iframe.
A widget definition is similar to a jsfiddle.net "fiddle" -- three main components holding HTML, CSS, and JS. They can be separately stored as strings in the host environment (wiki pages, or slots in a single wiki page, for the MediaWiki embedding). Additionally there should be a fourth metadata component with descriptions, localizable string definitions, images and common libraries to load, etc.
These can be presented in a viewer/editor as a 4-up panel (HTML, CSS, JS, and a running example) with metadata in a sidebar.
For small screens / mobile, a tabbed view may be better.
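As a rough sketch of the widget definition described above, the three code components could be combined into a single document string for the sandboxed iframe (for instance via srcdoc). The field names (html, css, js, metadata) and the helpers bundleWidget and escapeClosingTag are hypothetical, not an existing API, and the escaping shown is a simplification.

```javascript
// Hypothetical widget definition: field names are assumptions for illustration.
const exampleWidget = {
  html: '<p id="out">Hello</p>',
  css: '#out { color: navy; }',
  js: 'document.getElementById("out").textContent = "Hi from the widget";',
  metadata: { description: 'Greeting demo', libraries: [] },
};

// Keep guest CSS/JS from closing its own <style>/<script> element early.
// "<\/" is equivalent to "</" inside JS string literals and CSS strings,
// the usual places such text appears; other contexts are not handled here.
function escapeClosingTag(source, tagName) {
  const pattern = new RegExp('</(' + tagName + ')', 'gi');
  return source.replace(pattern, '<\\/$1');
}

// Build one HTML document string that could be assigned to iframe.srcdoc.
// The guest HTML is passed through as-is: it only ever runs inside the sandbox.
function bundleWidget(widget) {
  return [
    '<!DOCTYPE html>',
    '<html><head><meta charset="utf-8">',
    '<style>' + escapeClosingTag(widget.css, 'style') + '</style>',
    '</head><body>',
    widget.html,
    '<script>' + escapeClosingTag(widget.js, 'script') + '</script>',
    '</body></html>',
  ].join('\n');
}
```

Building a plain string keeps the bundling step independent of the DOM, so it could run on the server or in the host page before any iframe exists.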
"Plugins" are intended for the user-interface side of MediaWiki, as a way for custom code to manipulate the host page's UI through a safe, restricted API.
The code for a plugin is split into two components: a trusted host API module which has full access to the host page, and a less-trusted or untrusted guest module which communicates with the host over a secure, asynchronous message-passing API. Multiple plugin guest modules may hook into the same host API.
For instance, an editor plugin may use a host API that sends in a representation of a text selection, then waits for a response with modified text or requests to update a dialog box state.
Plugins will come with metadata for what host APIs they require and what pages or actions they should be loaded on. Operations requiring trust, like cross-site network access, can be mediated by the host API for opt-in security permissions. Plugins are reliant on the host API to present a UI for them, depending on the hooks being plugged into.
This can be presented in a viewer/editor as a single JS panel with metadata in a sidebar.
For small screens / mobile, a tabbed view may be better.
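To make the plugin metadata idea concrete, here is a minimal sketch of a registration record and a check the host could run before loading anything. Every field name (name, requiredHostApis, loadOn) and the pluginApplies helper are assumptions for illustration; no such format has been defined yet.

```javascript
// Hypothetical plugin registration; every field name here is an assumption.
const cleanupPlugin = {
  name: 'formatting-cleanup',
  requiredHostApis: ['editor-selection'],
  loadOn: { actions: ['edit'] },
};

// Decide whether a plugin applies to the current page view, without
// executing any of the plugin's own (less-trusted) code.
function pluginApplies(registration, context) {
  const actionMatches = registration.loadOn.actions.includes(context.action);
  const apisAvailable = registration.requiredHostApis.every(
    (api) => context.availableHostApis.includes(api)
  );
  return actionMatches && apisAvailable;
}
```

Because the decision uses only declarative data, the host can set up menus and buttons for a plugin without running its guest code first.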
Host APIs are based on asynchronous message-passing, using the browser's postMessage facility between parent pages, iframes and (where needed) Worker threads. Messages are structured: they can carry not only the JSON data types but also ArrayBuffers, Blobs, and some other low-level types, so this should be suitable for sending both text and binary data (but not HTML DOM elements).
It is very important that host API implementations never insert HTML strings into the host DOM, as this would allow <script> injection!
The guest sandbox can set up a safe message-passing wrapper API on top of postMessage in cases where it's desirable to further restrict the guest API from what web or Worker contexts see.
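A minimal sketch of the host side of such an API might look like the following. The message shape ({ id, method, params }), the createHostDispatcher helper, and the getSelection method are all assumptions made up for this example. The dispatcher itself is plain synchronous code; the asynchronous part is the postMessage transport, shown only as a comment since it needs a browser.

```javascript
// Build a dispatcher for one host API. `handlers` maps method names to
// trusted functions; both the message shape and the names are assumptions.
function createHostDispatcher(handlers) {
  return function dispatch(message) {
    const id = message && typeof message === 'object' ? message.id : undefined;
    const known =
      id !== undefined &&
      typeof message.method === 'string' &&
      Object.prototype.hasOwnProperty.call(handlers, message.method);
    if (!known) {
      return { id: id === undefined ? null : id, error: 'unsupported request' };
    }
    try {
      return { id, result: handlers[message.method](message.params) };
    } catch (err) {
      // Report only a message string, never host objects, back to the guest.
      return { id, error: String(err && err.message) };
    }
  };
}

// Example host API: the guest asks for the current text selection.
const editorDispatch = createHostDispatcher({
  getSelection: () => ({ text: 'some  selected text' }),
});

// In a browser, the host might wire this to a sandboxed iframe like so:
// window.addEventListener('message', (event) => {
//   if (event.source !== iframe.contentWindow) return;
//   iframe.contentWindow.postMessage(editorDispatch(event.data), '*');
// });
```

Only own properties of the handler table are callable, so a guest cannot reach inherited methods such as toString by naming them in a request.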
"Less-trusted code" running in an <iframe> cannot be prevented from causing some trouble, such as blocking the main thread for a few seconds with long-running code, or allocating a ton of memory which can crash the tab or even slow down the operating system.
Thus it's vital that widgets and plugins be recoverable in a fairly straightforward way: if you activate one and it causes trouble, you must be able to turn it off.
For content widgets, using a "click-to-play" model is safest as this will avoid running any dangerous code when an article is being viewed or edited, and reloading the page will clear things up by resetting state.
For plugins, ideally they won't be loaded until some interactive action happens, and there should be a way to turn it back off from preferences (or a context menu on the action trigger itself) without running it on that page view. Things like menu and button setup can be done in a declarative way that doesn't require pre-executing JS code.
With suitable CSP sandboxing, exposure of output data to other sites should not be possible through "web bugs" (offsite image loads) or other direct techniques, which keeps any leaked data from reaching an attacker directly. There is still some possible danger of side-channel leakage through things that an attached host API permits, however -- for instance if a host API allows fetching a particular on-wiki image, media view counts might go up when the file is loaded by the browser, and this could be checked by the attacker.
Additionally, host API opt-in permissions might be abused by "Trojan horse" malicious code, such as an editor plugin that offers to do formatting cleanup but actually hides data in seemingly invisible adjustments to whitespace and formatting characters. This could then be retrieved by the malicious actor from afar.
As always, open-code and some sort of review system would help.
Requirements and compatibility
At a minimum, modern-ish <iframe> sandboxing, structured clone on postMessage, and CSP support are required. Don't know the exact minimum version requirements for these yet.
Compatibility needs to be checked at runtime before injecting any untrusted HTML, CSS, or JS into an <iframe>. Recommend not instantiating the <iframe> until it's ready to roll; source documents should probably show an <img> with a clickable thumbnail.
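A rough sketch of the runtime check, covering only the parts that can be feature-detected cheaply. The function name canHostSandboxedGuests is hypothetical, and CSP and structured-clone support are not covered here and would need separate checks.

```javascript
// Check the pieces that can be feature-detected. `win` is the host window;
// it is passed in so the check can also be exercised with a stub object.
function canHostSandboxedGuests(win) {
  const frameProto = win.HTMLIFrameElement && win.HTMLIFrameElement.prototype;
  if (!frameProto || !('sandbox' in frameProto)) {
    return false; // no <iframe sandbox> support
  }
  if (typeof win.postMessage !== 'function') {
    return false; // no message-passing channel to the guest
  }
  return true;
}

// In a browser this would be called as canHostSandboxedGuests(window)
// before any <iframe> is created; on failure, keep showing the thumbnail.
```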
Because a malicious or auto-generated wiki page might include hundreds or thousands of widgets, it's best not to instantiate the <iframe>s until user activation even if there's no less-trusted code in them.
The ability to safely run fully "untrusted code" without blocking the CPU or over-allocating memory would be nice too, but it's difficult/impossible to enforce memory limits on JS. WebAssembly modules running in a Worker thread could be locked down further (long loops can be terminated from the main thread, and memory can be statically limited) but this requires a lot more investigation as well as more tooling to compile easy-to-write modules to Wasm.
Planned work areas
There are a few areas to work on:
- <iframe> loader setup code and host API
- There is existing prior art such as Oasis, which can be used for inspiration
- bundling system for taking HTML, CSS, and JS "files" and injecting them into the iframe safely
- Lots of prior art to examine. Would like to support native JS module syntax, but will probably start simpler.
- sample content widgets
- update the Mandelbrot fractal generator using <canvas>
- write a "Turtle World" interactive Logo interpreter using SVG graphics
- sample UI plugins
- pick something clever that hooks into the editor or page views and implement it
- MediaWiki extension storing widget and plugin code as editable wiki pages
- I hope to use multi-content revisions to bundle widget HTML, CSS, and JS together in an "atomic" page.
- Plugins will have both JS code and a JSON module registration definition.
- Host API implementations can be shared by multiple plugins.
- There should be a common permissions-granting UX for host APIs to use.
Additional productization requirements
If this is ever to be used on Wikipedia and Commons, a good code-review system would be strongly, strongly recommended. A simple way to import/export between a git repo checkout and an on-wiki widget or plugin or host API definition would likely help for "serious maintenance" as well.
Centralized definitions that can be pulled on any wiki, and export of widgets over InstantCommons as well as locally will be required.
Localizable string definitions and a system for translation, or hooks into MediaWiki's existing translation systems, are a requirement for well-maintained tools in our context.
Further areas to examine
There are some more things to do later:
- investigate Worker isolation further for avoiding main-thread jank
- investigate global JS namespace cleanup for reducing the attack surface further
- investigate WebAssembly-based isolation for truly untrusted code
- investigate ways to automatically create thumbnail images
I'm really excited about the idea of WebAssembly sandboxing, because you can strictly limit memory usage. If in a Worker you can also terminate long-running loops from the parent thread, and restrict the ability to schedule additional execution via timers or async functions.
But it requires a custom API for the message-passing, and tooling for compiling scripting languages to standalone Wasm modules isn't good yet. Implementing a full JS engine in Wasm is an idea I've considered, but it's a big project that isn't tractable at this time.
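The memory limit mentioned above can be seen with the standard WebAssembly.Memory API alone, without compiling any module. This is only a sketch of the limit itself; the tryGrow helper is a name made up for the example.

```javascript
// One 64 KiB page to start, and never more than two pages in total.
const guestMemory = new WebAssembly.Memory({ initial: 1, maximum: 2 });

// Try to grow by `pages`; report failure instead of throwing.
function tryGrow(memory, pages) {
  try {
    memory.grow(pages);
    return true;
  } catch (err) {
    if (err instanceof RangeError) return false;
    throw err;
  }
}

const firstGrow = tryGrow(guestMemory, 1);  // within the limit
const secondGrow = tryGrow(guestMemory, 1); // would exceed the maximum
```

A module instantiated with this memory cannot grow past the maximum either, since its own grow requests go through the same limit.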
Memory over-allocation in JS can be limited somewhat if typed arrays can be hidden from view of guest code, depending on whether the browser engine handles large strings and arrays better than it handles typed arrays... but it's a shame to lose typed arrays, which are useful sometimes. Careful initialization of global state could help mitigate this and other issues with considered-unsafe native objects inside a Worker context.
Running code in a headless browser to make thumbnails for widgets should be possible, and mostly requires server-side tooling. It's less clear whether clients could create their own thumbnails via <canvas> etc.
Appendix 1: <iframe> and CSP details
"sandbox" attribute must include:
- allow-scripts (required to run scripts)
Must not include:
- allow-same-origin (would be unsafe to allow this!)
For the CSP header,
- default-src should be 'none'
- img-src must include data: or blob: (written unquoted in the header, as they are scheme sources) to allow using images sent into the embedding
- font-src must include data: or blob: to allow using fonts sent into the embedding
- media-src must include data: or blob: to allow using audio or video files sent whole into the embedding
- style-src may need 'unsafe-inline'?
- script-src: 'unsafe-inline' is required to inject the scripts, unless srcdoc or a separate domain with server-side bundling can be relied on
- script-src: 'unsafe-eval' is optional (allows eval and Function constructor)
- use sandbox: 'allow-scripts' in CSP also to enforce protections on compliant browsers
- must not allow any offsite anything! (cross-site communication dangers)
- must be able to eval scripts
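Put together, the settings listed above might be assembled roughly as follows. This is an untested-in-browsers sketch: buildGuestCsp and its allowEval option are hypothetical names, including both data: and blob: is a choice made for the example, and style-src 'unsafe-inline' is assumed to be needed.

```javascript
// Sandbox tokens for the <iframe sandbox="..."> attribute.
const GUEST_SANDBOX_TOKENS = ['allow-scripts']; // never 'allow-same-origin'

// Assemble a CSP header value for the guest document.
// `allowEval` is a hypothetical option name.
function buildGuestCsp({ allowEval = false } = {}) {
  const scriptSources = ["'unsafe-inline'"];
  if (allowEval) scriptSources.push("'unsafe-eval'");
  const directives = [
    ['default-src', ["'none'"]],
    // Scheme sources such as data: and blob: are written without quotes.
    ['img-src', ['data:', 'blob:']],
    ['font-src', ['data:', 'blob:']],
    ['media-src', ['data:', 'blob:']],
    ['style-src', ["'unsafe-inline'"]],
    ['script-src', scriptSources],
    ['sandbox', GUEST_SANDBOX_TOKENS],
  ];
  return directives
    .map(([name, values]) => name + ' ' + values.join(' '))
    .join('; ');
}
```

With the default options this yields a policy with no network-capable sources at all, so everything the guest uses has to be sent in by the host.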
Note that while it might be "safe" in practice to allow img-src to include images from ourselves (upload.wikimedia.org etc), it would complicate reusing things offline significantly. Plugins should be able to be used self-contained if the necessary resources are made available to them from the host.
Need to do more testing to confirm all the CSP settings do what I think they do.
Some past sandboxing projects like Caja have tried to enforce a sandbox world by creating custom prototype chains and rewriting code; this way lies madness and incompleteness, with many potential sandbox escapes.
However within a browser-based sandbox -- where the JS engine itself enforces separation -- some additional things can be cut out using the same techniques.
How JS object prototypes work
When looking up property values, if an object does not have its own property for a given key the engine will check another object called the "prototype", and return the value from the prototype (or its prototype, and so on). A prototype object and a constructor function are usually associated together, so that objects created with a constructor inherit from its associated prototype. By convention (but not required), prototypes point back to their constructors too.
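A small illustration of the lookup described above, using a made-up Turtle constructor:

```javascript
// A constructor and its associated prototype object.
function Turtle(name) {
  this.name = name; // own property on each instance
}
Turtle.prototype.describe = function () {
  return 'turtle ' + this.name;
};

const leo = new Turtle('Leo');

// `describe` is not an own property of the instance...
const ownDescribe = Object.prototype.hasOwnProperty.call(leo, 'describe');
// ...it is found by following the link to the constructor's prototype.
const protoIsShared = Object.getPrototypeOf(leo) === Turtle.prototype;
// By convention the prototype points back to its constructor.
const pointsBack = Turtle.prototype.constructor === Turtle;
const description = leo.describe();
```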
In theory you can replace any (?) global value and even change prototype chains, even to the point of making your own custom constructors and prototype chains to replace the regular ones -- but it's important to note that native code in the browser engine may not respect your custom objects the way you think!
For instance when a string primitive is coerced to an object, it will have the "intrinsic" %StringPrototype% object as its prototype -- it will not necessarily have the current value of String.prototype as its prototype!
This means roughly that while you can replace global constructors with custom functions, and you can replace the properties and methods of global prototypes with custom ones, you can't necessarily replace the actual prototypes themselves.
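The difference can be observed directly. The sketch below temporarily replaces the global String constructor and one prototype method, records what the engine does, and then restores both; the function name demonstrateIntrinsicPrototype is made up for the example.

```javascript
function demonstrateIntrinsicPrototype() {
  const NativeString = globalThis.String;
  const intrinsicProto = NativeString.prototype;
  const nativeToUpperCase = intrinsicProto.toUpperCase;
  try {
    // Replace the global constructor with a custom function.
    globalThis.String = function FakeString() {};
    // Coercing a primitive to an object still uses the intrinsic prototype,
    // not the prototype of whatever the global `String` now points to.
    const boxed = Object('abc');
    const stillIntrinsic = Object.getPrototypeOf(boxed) === intrinsicProto;
    const ignoresFake =
      Object.getPrototypeOf(boxed) !== globalThis.String.prototype;
    // Replacing a method on the intrinsic prototype does take effect.
    intrinsicProto.toUpperCase = function () {
      return 'patched';
    };
    const patchedResult = 'abc'.toUpperCase();
    return { stillIntrinsic, ignoresFake, patchedResult };
  } finally {
    // Put everything back so the rest of the program is unaffected.
    intrinsicProto.toUpperCase = nativeToUpperCase;
    globalThis.String = NativeString;
  }
}
```

So for cleanup purposes, patching or deleting members of the existing prototype objects is the lever that works; swapping the global constructor alone leaves primitives untouched.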
As long as you're running in a separate "realm" from the host -- as we already are within the sandboxed <iframe> and/or Worker -- then this should be sufficient for fixing a lot of potential funky behavior.
One possible thing is replacing the ArrayBuffer and TypedArray constructors and accessor methods to track how much memory is being allocated (warning: there's no way to track frees, only allocations, because JS has no finalizers!) This could help with the poor behavior of Firefox and Chrome with respect to ArrayBuffer-backed memory allocations -- they don't seem to limit them, and a page can allocate memory until the machine OOMs as far as I can tell.
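A minimal sketch of the allocation-tracking idea, limited to ArrayBuffer itself. The createBudgetedArrayBuffer helper and its return shape are assumptions for illustration. Note that typed arrays created by length, such as new Uint8Array(n), allocate through the engine's intrinsic ArrayBuffer and would bypass this wrapper, so those constructors would need the same treatment.

```javascript
// Create an ArrayBuffer replacement that refuses to exceed `maxBytes` in
// total. Only allocations are counted; frees are never subtracted.
function createBudgetedArrayBuffer(maxBytes) {
  let allocatedBytes = 0;
  class BudgetedArrayBuffer extends ArrayBuffer {
    constructor(length = 0) {
      const bytes = Math.trunc(Number(length)) || 0;
      if (allocatedBytes + bytes > maxBytes) {
        throw new RangeError('guest allocation budget exceeded');
      }
      super(bytes); // may itself throw, in which case nothing is counted
      allocatedBytes += bytes;
    }
  }
  return {
    ArrayBuffer: BudgetedArrayBuffer,
    allocated: () => allocatedBytes,
  };
}

// Inside the guest realm, setup code could then install the wrapper:
// globalThis.ArrayBuffer = createBudgetedArrayBuffer(64 * 1024 * 1024).ArrayBuffer;
```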
One could also simply clean up the global namespace to present a more consistent interface between browsers, if desired...
Appendix 3: alternate scripting engines
I did a couple of research spikes on custom-built scripting engines and how that might work, with potential for additional security checks beyond what the browser restrictions do.
I think the JS-in-JS case is not worth trying to implement fully, though JS-based interpreters inside the sandbox for custom languages in non-performance-critical applications would be fine -- for instance an interactive Logo interpreter I want to do as a widget demo.
The JS-in-WebAssembly case is more interesting because it can enforce both safety and memory limits; and as long as kept in a Worker thread, execution time limits can be enforced by the host. I plan to do some proof of concept poking on the side, merging ideas from the two projects into the Wasm version, but don't expect to be able to bring it to production quality without a lot more investment.
If it ever appears worth it, we can try to get more dev support; if it's not worth it, it'll stay a research project.