Extension:WikiLambda/Jupyter kernel proposal


 * I've been doing some research on interpreters and debuggers which brought me right back around to Wikifunctions and how to implement them using multiple languages in a reasonable way that also lets function developers introspect their code as it runs. --Brion Vibber (WMF) (talk) 02:09, 26 February 2021 (UTC)

Wikifunctions is part of the Abstract Wikipedia family of projects. It's intended to create a programming library of functions, primarily aimed at processing and formatting data for display as raw values, tabular layout, or natural language description. There are a few pieces I'm looking at for the Wikifunctions backend and frontend story:
 * sandboxed function evaluator that supports shoving arbitrary sources into popular programming languages and returning output
 * that we can extend to launch calls to additional functions in other language runtimes, marshaling arguments
 * support stack traces with source, single-step debugging, and breakpoints where the runtime supports it
 * a debugger UI that integrates well into MediaWiki frontend
 * being able to use the same debugger UI for client-side interpreters (for other projects like interactive diagrams)

Jupyter
Jupyter is best known for the interactive "notebooks" system, where code fragments embedded in a web page are brought to life by executing them on the server and passing data to and from the web page, allowing interactive execution in a live programming environment using real programming languages and visualization of the resulting data.

I'm inclined to look at Jupyter specifically for the backend and the debug protocol; it's a project with years of experience sandboxing scripting languages for interactive execution from web pages.

A few things Jupyter does well:
 * already handles languages like Lua, Python, and JS (currently Python and C++ are the only languages with debugging support, which is new)
 * already deals with scaling and isolation/safety issues because it's often exposed to the web
 * has a pub-sub protocol between the kernels and the frontends, which we can probably adapt to allow code to invoke additional functions in a new kernel

How it generally works at the runtime/execution level:
 * "kernel" manages the programming language runtime, creating an interpreter context which can have source code fragments executed in it
 * sends messages back and forth to management & frontends over the Jupyter protocol to signal invocations, results, and errors
 * to support debugging, kernel interfaces with the runtime's internal debugging API and exposes events, sources, and data
 * there's a lot of support for scaling via Kubernetes clusters etc, so we should be able to create a reasonable scaling and isolation model

The frontend in a regular Jupyter notebook would be replaced by the service layer we expose to MediaWiki, allowing functions to be invoked for use in a wiki page rendering or for an interactive debugging session. For the latter, we'd want to make our own debugger frontend that integrates well with MediaWiki, showing source code and variables and letting you set breakpoints and such.

Alternately we could try to integrate Jupyter's debugger frontend into MediaWiki, which will likely require helping make some changes to that codebase to support our house style, localization interfaces, etc.

Two things it might not handle well:
 * startup time of kernels might not be a current priority, since sessions are usually interactive and long-running
   * this can be mitigated, at the cost of relaxing idempotency, by reusing kernels for additional invocations of a given function in the same execution session (such as the rendering of a single wikitext page, which may invoke many of the same functions)
 * the message bus abstraction means we may not be able to have multiple kernels call into each other in-process; many calls will require an RPC with a context switch and data passed through a socket
   * this can be mitigated by at least avoiding network round-trips by running all kernels for a synchronous call session on one server
   * simple "pure functions" could be interpreted in-process by our runtime, with no RPC

Performance
Runtime performance concerns include:
 * there will likely be some constant latency to spinning up a new execution environment for each function invocation.
   * reusing a running kernel for multiple invocations would leak global state across calls with different parameters, so isn't safe. ;_;
     * ...unless we have a specialized library for pure functions that have no mutable state, and can execute multiple calls in-process!
   * relaxing determinism requirements to allow leakage of global state across invocations of the same function could greatly decrease overhead here.
 * source parsing/compilation may be slow on larger modules.
   * some language runtimes may be able to cache compiled bytecode, which might reduce this cost somewhat if we can trigger that path by storing the source on the filesystem and having the injected source import it
   * again, relaxing determinism allows us to run multiple invocations of the same function on the same process without re-parsing or re-compiling code.
 * sending large data sets between functions will be slow, incurring serialization/deserialization time. Best practice for function authors should be to pass references to data sources around when possible instead of raw data blobs.
   * specific support for passing large buffers without a copy would be neat, but might be complex. At the least it would not interoperate transparently with common data types like strings, arrays, and dictionaries: you might have to decide at the API layer whether to pass a reference or a native object that must be serialized and deserialized at function invocation boundaries.

Languages like JS that use JIT compilation and do on-stack replacement will still have a chance to optimize code that runs long loops -- a Mandelbrot fractal generator would run reasonably fast, for instance, but this compilation would happen every time the fractal function was spun up in a new kernel. So unless per-call overhead on repeated calls can be minimized, it's better to invoke one function that returns a frame buffer's worth of fractal imagery than to invoke an iteration counter function once for each pixel.

Debugging
The Jupyter debugger protocol is based on the Debug Adapter Protocol (DAP) used by VS Code and other projects, extended a bit to cover Jupyter's model where there's a web service intermediary between the execution kernels and the web front-end. We could use this protocol with the existing Python and C++ kernels, and could perhaps help add debugging support to Node.js and Lua kernels, to get our primary work languages going.

We would have the additional complication that a virtual call stack may include functions running in multiple kernels, so we need to be able to stitch that into a cohesive debugging view of multiple sources and languages.
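A minimal sketch of that stitching step, assuming each kernel can report its own frames plus which kernel's call started it. The report shape (kernelId, calledFrom, frames) is a made-up placeholder, not part of the Jupyter or debug adapter protocols:

```javascript
// Hypothetical per-kernel reports: frames are listed innermost-first, and
// calledFrom names the kernel whose RPC call started this one (null at the root).
function stitchStack(kernelReports, innermostKernelId) {
  const byId = new Map(kernelReports.map((report) => [report.kernelId, report]));
  const stitched = [];
  const visited = new Set(); // guards against malformed, cyclic reports
  let current = byId.get(innermostKernelId);
  while (current && !visited.has(current.kernelId)) {
    visited.add(current.kernelId);
    for (const frame of current.frames) {
      stitched.push({ kernelId: current.kernelId, language: current.language, ...frame });
    }
    current = byId.get(current.calledFrom);
  }
  return stitched;
}

const reports = [
  { kernelId: "k1", language: "javascript", calledFrom: null,
    frames: [{ fn: "renderTable", line: 12 }] },
  { kernelId: "k2", language: "python", calledFrom: "k1",
    frames: [{ fn: "parse_row", line: 40 }, { fn: "load_rows", line: 7 }] },
  { kernelId: "k3", language: "lua", calledFrom: "k2",
    frames: [{ fn: "trim", line: 3 }] },
];

// Prints one virtual stack, innermost frame first, across all three kernels.
for (const frame of stitchStack(reports, "k3")) {
  console.log(`${frame.fn} (${frame.language}, line ${frame.line})`);
}
```

The debugger frontend would then only ever see the stitched list, and would route "show source" or "inspect variables" requests back to whichever kernel owns the selected frame.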

A MediaWiki-friendly debugger frontend could use our standard widget sets, pull sources from the wiki, etc, and manage the function invocation and debugging protocol via a persistent WebSocket connection. I'm not sure if we have anything else using WebSockets right now in primary production. This would hold a long-running connection to the client during the debug session, and similarly the kernel for the running function may stay open for minutes at a time or more as execution is paused & single-stepped, variables inspected, breakpoints set and cleared, sources and stack traces perused. So the execution kernels would hold memory over a much longer period of time than in a normal function invocation, even if they're not using CPU continuously.

If we wanted, we could potentially rework Scribunto on this framework and allow interactive debugging of Lua modules. Something to consider.

The debugger UI would also be useful for client-side widgets using a sandboxed interpreter, which I'm researching for a future project. The interpreter would need to expose the low-level Jupyter debugging API, and the debugger would just connect to a virtual event bus instead of a WebSocket.

Pure function kernel and optimizations
Early planning thoughts on functions have centered on the pure application of functions as calls with lists of arguments, which may be further calls; this essentially defines a very simple programming language based on call expressions, with no variables (but the promise of caching/memoization speeding multiple calls). Many functions may be implementable this way, especially those that mainly call other functions and filter their data.
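To make the "language of call expressions" concrete, here's a minimal sketch of an in-process evaluator with memoization. The definition format (native, params, body, call, param, literal) is invented for illustration and isn't a proposed Wikifunctions representation:

```javascript
// Hypothetical registry of function definitions. Native entries stand in for
// built-ins; composed entries are call expressions over their parameters.
const definitions = {
  add: { native: (a, b) => a + b },
  multiply: { native: (a, b) => a * b },
  // double(x) = multiply(x, 2)
  double: { params: ["x"], body: { call: "multiply", args: [{ param: "x" }, { literal: 2 }] } },
  // square(x) = multiply(x, x)
  square: { params: ["x"], body: { call: "multiply", args: [{ param: "x" }, { param: "x" }] } },
  // sumOfSquares(a, b) = add(square(a), square(b))
  sumOfSquares: {
    params: ["a", "b"],
    body: { call: "add", args: [
      { call: "square", args: [{ param: "a" }] },
      { call: "square", args: [{ param: "b" }] },
    ] },
  },
};

const memo = new Map(); // safe only because every function here is pure
let evaluations = 0;    // counts non-cached calls, to make memoization visible

function invoke(name, args) {
  const key = JSON.stringify([name, args]);
  if (memo.has(key)) return memo.get(key);
  const def = definitions[name];
  if (!def) throw new Error(`unknown function: ${name}`);
  evaluations++;
  let result;
  if (def.native) {
    result = def.native(...args);
  } else {
    // Bind arguments to parameter names; bindings are never reassigned.
    const env = {};
    def.params.forEach((param, i) => { env[param] = args[i]; });
    result = evaluate(def.body, env);
  }
  memo.set(key, result);
  return result;
}

function evaluate(expr, env) {
  if ("literal" in expr) return expr.literal;
  if ("param" in expr) return env[expr.param];
  return invoke(expr.call, expr.args.map((arg) => evaluate(arg, env)));
}

console.log(invoke("sumOfSquares", [3, 4])); // 25
```

Calling another function defined the same way is just a recursive invoke in the same process, and a repeated call with the same arguments is answered from the memo table.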

Because there's no global state that can be modified, this avoids some of the complications of general programming language implementations for the special case of calling another function that's implemented in the same way -- the same running interpreter process can simply load up an additional function definition and execute it in-process, without having to transfer arguments and results through the Jupyter protocol.

The language runtime extensions that expose the Wikifunctions service could also implement the pure-function interpreter in-process within a JavaScript or Python or Lua function, avoiding RPC overhead. This could be a large performance boost for common simple filter functions, for instance.

However this might complicate debugging; at the least it probably requires the intermediary that stitches debug info together into a coherent session to speak separately to the simple function interpreters and to the JS, Lua, Python debuggers.

Synchronous RPC calls
When making a call into another Wikifunction that requires spinning up a new execution state (possibly in another language):
 * caller's runtime sends an event up to the function service with the target function and the serialized arguments
 * caller blocks waiting for a response...
 * function service launches the callee kernel and invokes it with the serialized arguments
 * callee runs
 * function service receives the serialized result from the callee kernel, and shuts it down
   * optionally, can leave it running for reuse within the session if we relax determinism requirements to allow mutable global state across invocations
 * function service stores any cached data with suitable invalidation hooks registered
 * function service sends the result or error on to the caller's kernel
 * caller unblocks, translates the return value to a native programming language value, and returns.
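
The steps above can be sketched roughly as follows. This is a single-process simulation, not real Jupyter messaging: launchKernel, FunctionService, and callWikifunction are hypothetical names, the "kernels" are plain closures that only ever see JSON text, and the caching step is left out.

```javascript
// Hypothetical stand-in for a kernel: it owns one function and only exchanges
// serialized JSON, mimicking a process boundary.
function launchKernel(source) {
  let alive = true;
  return {
    invoke(serializedArgs) {
      if (!alive) throw new Error("kernel already shut down");
      const args = JSON.parse(serializedArgs);
      try {
        return JSON.stringify({ status: "ok", value: source(...args) });
      } catch (err) {
        return JSON.stringify({ status: "error", message: String(err.message) });
      }
    },
    shutdown() { alive = false; },
  };
}

class FunctionService {
  constructor(sources) {
    this.sources = sources;
    this.kernelsLaunched = 0;
  }
  // Handles one call event from a caller's runtime.
  call(target, serializedArgs) {
    const source = this.sources[target];
    if (!source) {
      return JSON.stringify({ status: "error", message: `no such function: ${target}` });
    }
    const kernel = launchKernel(source); // fresh execution state per call
    this.kernelsLaunched++;
    try {
      return kernel.invoke(serializedArgs);
    } finally {
      kernel.shutdown(); // no reuse, so no global state leaks between calls
    }
  }
}

// Caller-side stub: serialize, block on the reply, translate back to native values.
function callWikifunction(service, target, ...args) {
  const reply = JSON.parse(service.call(target, JSON.stringify(args)));
  if (reply.status === "error") throw new Error(reply.message);
  return reply.value;
}

const service = new FunctionService({
  capitalize: (s) => s.charAt(0).toUpperCase() + s.slice(1),
  explode: () => { throw new Error("boom"); },
});
console.log(callWikifunction(service, "capitalize", "wiki")); // Wiki
```

An error in the callee comes back as data and is re-thrown on the caller's side, so the caller sees something close to a native exception.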

Async code
It occurs to me that this system would allow calls to other functions to proceed as non-blocking async RPC calls rather than blocking for a reply.

For JS code, this would allow using async functions and the await operator, or Promises with callbacks; other languages have similar structures. This might be useful for getting some parallelism, but that's also dangerous -- if you run ten thousand invocations of a sync function in a loop you've only fired up one kernel at a time, running one function after another in sequence, but if you fire off ten thousand async calls in a row and only later wait on them, you're going to instantly fill the function execution engine queue.

For this reason I would recommend either using synchronous function invocation only, or very carefully defining how much processor time and how many simultaneously active subprocesses can be spawned to reduce the impact of fork-bombs.
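
If async calls were allowed, one way to cap "simultaneously active subprocesses" is a limiter in the runtime extension that holds excess calls back. A minimal sketch, where createLimiter and invokeAsync are hypothetical and a timer stands in for the RPC to another kernel:

```javascript
// Caps the number of in-flight async calls; extra calls wait in a local queue
// instead of each spawning a kernel immediately.
function createLimiter(maxActive) {
  let active = 0;
  let peak = 0;
  const waiting = [];
  const next = () => {
    if (active >= maxActive || waiting.length === 0) return;
    active++;
    peak = Math.max(peak, active);
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve().then(task).then(resolve, reject).finally(() => {
      active--;
      next(); // a slot freed up, so start the next queued call
    });
  };
  return {
    run(task) {
      return new Promise((resolve, reject) => {
        waiting.push({ task, resolve, reject });
        next();
      });
    },
    stats: () => ({ active, peak, queued: waiting.length }),
  };
}

// Hypothetical async invocation: stands in for an RPC to another kernel.
const invokeAsync = (n) => new Promise((resolve) => setTimeout(() => resolve(n * 2), 1));

async function demo() {
  const limiter = createLimiter(4);
  const calls = [];
  // Fire off 100 calls in a row and only wait on them afterwards.
  for (let i = 0; i < 100; i++) calls.push(limiter.run(() => invokeAsync(i)));
  const results = await Promise.all(calls);
  return { results, peak: limiter.stats().peak };
}

demo().then(({ peak }) => console.log(`peak concurrency: ${peak}`)); // peak concurrency: 4
```

This only bounds concurrency per caller; a real deployment would presumably also need limits on queue length and total processor time, enforced by the function service rather than trusted to the caller.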

Callbacks and first-class functions
Another possibility to consider is whether to support sending first-class function objects as arguments and return values, which enables callbacks and user-defined control structures such as loops over custom data structures.

This would potentially be a big addition to the model: it would make certain types of programming MUCH easier that otherwise must be split over multiple functions, with the "callback"'s name and any necessary closure state manually passed in as parameters to the control function. This sort of pattern is common in today's MediaWiki templates, and isn't great. Being able to just pop a closure inline in your main function would be super nice.

In contrast to the async model, where the runtime returns to a low layer of the stack and then waits on additional data for its event loop, the callback model would keep the stack where it is when making a call, but would accept either a return event or a call event while waiting. A call event would be handled by the runtime calling the registered callback function object, then passing its value back to the other side of the connection and waiting for the next message. Unlike async, there's no danger of using more than your share of allocated CPU time since calls block on one side of the connection or the other.

Kernels passing function references around would keep some sort of reference count; this should integrate with the runtime language's garbage collector, so when it gets freed the registered UUID or whatever can be freed on the caller's end. If passed from one kernel to another, ownership will be preserved and the call can be made from a third or fourth function kernel just as well as the one it was originally passed to.
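
A rough sketch of the owner's side of this, assuming opaque ids and explicit retain/release calls. CallbackRegistry and mapWithCallback are hypothetical, and the manual release stands in for whatever hook into the language's garbage collector a real runtime extension would use:

```javascript
// Owner side: maps opaque ids to live function objects, with a reference count.
class CallbackRegistry {
  constructor() {
    this.entries = new Map();
    this.nextId = 1; // stand-in for a UUID
  }
  register(fn) {
    const id = `callback-${this.nextId++}`;
    this.entries.set(id, { fn, refs: 1 });
    return id;
  }
  retain(id) {
    this.lookup(id).refs++;
  }
  release(id) {
    const entry = this.lookup(id);
    if (--entry.refs === 0) this.entries.delete(id);
  }
  lookup(id) {
    const entry = this.entries.get(id);
    if (!entry) throw new Error(`unknown or freed callback: ${id}`);
    return entry;
  }
  // Handles a call message: run the callback and serialize its value back.
  handleCall(id, serializedArgs) {
    const result = this.lookup(id).fn(...JSON.parse(serializedArgs));
    return JSON.stringify(result);
  }
}

// Other side of the connection: a control function that only holds the id.
// sendCall stands in for the blocking message round-trip to the owner.
function mapWithCallback(items, callbackId, sendCall) {
  return items.map((item) => JSON.parse(sendCall(callbackId, JSON.stringify([item]))));
}

const registry = new CallbackRegistry();
const factor = 3; // closure state stays in the owner's kernel
const id = registry.register((x) => x * factor);
const tripled = mapWithCallback([1, 2, 3], id, (cb, args) => registry.handleCall(cb, args));
console.log(tripled); // [ 3, 6, 9 ]
registry.release(id);
```

Because only the id crosses the connection, the closure's captured state never needs to be serialized, and any kernel holding the id can route a call back to the owner.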

Note that callbacks would have significantly less invocation overhead than invoking a fresh Wikifunction; there would still be data serialization and the costs of the RPC (maybe even local network transit between servers) but it would not have to set up a new execution state or parse or compile any new code. So calling a callback in a loop would have less per-iteration overhead than calling forward to another function written in JS/Python/etc.

(This last point changes if we relax determinism to allow invocations of the same function to have shared global state, which means we can reuse an execution kernel across multiple invocations in a loop even though they could return different values by, say, incrementing a counter.)

Not sure if this is a good idea, but I really like callbacks and I think a pure-function-based system will get a big usability boost from being able to use filter functions and custom control structures.

Alternatives: in-process calls, rolling our own
I think it would duplicate a lot of existing effort to try building our own similar language kernel and messaging bus from scratch, but a couple possibilities that one might do differently from Jupyter:

One could run multiple language kernels in the same process and thread, and have them invoke calls to each other directly via their respective kernels. This would still require spinning up fresh interpreter states for each non-cached function call but no RPC overhead to talk to another kernel, except for the marshalling of arguments.

In debug mode, all kernels running in the process would be mediated by a common control layer, exposing a consistent view.

This might turn out to be a big overhead savings, or it might not be that much. We might want to model and test this before investing a lot of effort.

A direct-call meta-kernel might be able to piggyback on existing Jupyter kernel work, or it might not.

Note that this model requires all kernels to execute code in the same process and thread, which means all supported runtime libraries would be linked together. This makes calls faster than communicating over a socket. However, all runtimes must be built into one filesystem image and executable, so they can't be implemented by separate Docker images. Also, security vulnerabilities in one runtime could result in access to memory belonging to other functions in the call stack.

A middle ground is to use the Jupyter protocol between the kernels and the function server, but with a topology that keeps all kernels within a session talking to each other (presumably over sockets) on the same server, but using Kubernetes or other scaling to farm separate sessions out to separate CPU and server nodes. This involves serialization overhead and waiting on socket transfer for all calls, but avoids adding network round-trips unless the call has to go fetch something from a database. This involves a two-level routing system, with Jupyter language kernels managed directly by the function server and a higher-level meta-kernel that proxies invocations and debug calls up to something the MediaWiki code calls down to through the Jupyter web service.

I'm a bit unclear whether one language kernel process would handle multiple instances through threads or by launching new processes, but if we have only one process per language then the per-function cost is only a new thread and interpreter state; the runtime itself is already in memory and ready to roll.

Wikidata queries and query building
Let's say we have two functions, one which returns a list of Q references based on a CSV page in Commons, and another which takes that list and filters it manually based on presence of some property:
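
A hypothetical sketch of such a pair. The page name, item ids, property id, and the two stubbed services are all made up, and both functions live in one file so the sketch runs on its own; in practice each would be a separate Wikifunction that the other references via an import.

```javascript
// Made-up stand-ins for Commons and Wikidata; ids and properties are placeholders.
const commonsPages = {
  "Example.csv": "item\nQ1\nQ2\nQ3\n",
};
const wikidataClaims = { Q1: ["P1"], Q2: [], Q3: ["P1", "P2"] };
let queryRoundTrips = 0;

// Stand-in for one batched request to the query service.
function queryItemsWithProperty(ids, property) {
  queryRoundTrips++;
  return ids.filter((itemId) => (wikidataClaims[itemId] || []).includes(property));
}

// Function 1: return a list of Q references from a CSV page.
function itemsFromCsv(pageName) {
  const [, ...rows] = commonsPages[pageName].trim().split("\n"); // skip the header row
  return rows.map((row) => row.split(",")[0]);
}

// Function 2: filter a list by presence of a property, sending the whole list at once.
function filterByProperty(ids, property) {
  return queryItemsWithProperty(ids, property);
}

const filtered = filterByProperty(itemsFromCsv("Example.csv"), "P1");
console.log(filtered, queryRoundTrips); // [ 'Q1', 'Q3' ] 1
```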

This at least avoids round-tripping for every item in the filter, but if we were going to use this list and either pop it back into Wikidata to fetch some properties for display, or add some more filters to the query to avoid transferring data we didn't need, it might be nice to do a single round-trip and avoid having to send the long list of IDs to and from the query server multiple times. Especially if one were to refactor the data-file provider to drive out of Wikidata or another database.

Might be worth thinking about good interfaces for reading portions of large files, making queries of large CSV or RDF data sets, and being able to compose the filtering to minimize roundtrips while remaining both ergonomic and performant in a multi-language, multi-process RPC scenario!

Idempotency and non-determinism
The general idea of Wikifunctions is to have idempotent, deterministic functions that are based on their input state. But in practice, there are likely to be many sources of non-determinism which need to either be shrugged at and left as-is, plugged/neutered, or taken into proper account with a suitable cache invalidation system.

Sources of non-determinism:
 * language features that are deliberately non-deterministic, like JS's Math.random
 * language features that return changing information about the world, like the current time from JS's Date.now
 * the state of the language runtime itself, like checking for a feature that isn't available yet in the version of the kernel that's running when the function is deployed
 * reading a sandboxed filesystem, if the language allows it, might vary over time depending on the container details
 * reading data from a service, such as loading a file from Wikimedia Commons or making a query to Wikidata

The last (using services that we provide) are the easiest to plan for, because we'll control the API for accessing them and can treat the state of the world as an input to the function for caching purposes. A function invocation that caches its output after reading a Commons file could register a cache-invalidation hook for when that particular filename gets re-uploaded or deleted, which invalidates the cached function results and bubbles up to anything that cached data based on it.
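
A minimal sketch of that bubbling behaviour, assuming cached results are stored along with the keys of everything they were computed from. DependencyCache and the key naming scheme are hypothetical:

```javascript
class DependencyCache {
  constructor() {
    this.values = new Map();     // cache key -> cached result
    this.dependents = new Map(); // dependency key -> Set of cache keys built on it
  }
  // Store a result together with the keys of everything it read while computing.
  set(key, value, dependsOn = []) {
    this.values.set(key, value);
    for (const dep of dependsOn) {
      if (!this.dependents.has(dep)) this.dependents.set(dep, new Set());
      this.dependents.get(dep).add(key);
    }
  }
  has(key) {
    return this.values.has(key);
  }
  // Drop everything cached on top of `dep`, then bubble up to whatever was
  // cached on top of those results.
  invalidate(dep) {
    const affected = this.dependents.get(dep);
    this.dependents.delete(dep); // removed before recursing, so cycles terminate
    this.values.delete(dep);
    if (!affected) return;
    for (const key of affected) this.invalidate(key);
  }
}

const cache = new DependencyCache();
cache.set("parseCsv(File:Data.csv)", ["Q1", "Q2"], ["commons:File:Data.csv"]);
cache.set("renderList(File:Data.csv)", "Q1, Q2", ["parseCsv(File:Data.csv)"]);
cache.set("parseCsv(File:Other.csv)", ["Q3"], ["commons:File:Other.csv"]);

// A re-upload or deletion of File:Data.csv would fire this hook.
cache.invalidate("commons:File:Data.csv");
console.log(cache.has("renderList(File:Data.csv)")); // false
console.log(cache.has("parseCsv(File:Other.csv)"));  // true
```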

Deliberate non-determinism and world state might be avoidable by patching runtimes (e.g. to make Math.random use a consistent seed number, or replace the JS Date constructor and Date.now etc with functions that return a constant stub value, or with functional versions that register a timeout-based cache invalidation hook) but this could be error-prone in that it's easy to miss something and end up with bad cached data.
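
A sketch of what such patching might look like in a JS kernel. makeDeterministic is a hypothetical helper, and the generator is a small mulberry32-style PRNG chosen only because it fits in a few lines:

```javascript
// Installs deterministic stand-ins for Math.random and Date.now, and returns
// a function that restores the originals.
function makeDeterministic(seed, fixedTimeMs) {
  const original = { random: Math.random, now: Date.now };
  let state = seed >>> 0;
  // Small seeded generator; repeatable for a given seed, not cryptographically secure.
  Math.random = () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
  Date.now = () => fixedTimeMs;
  return () => {
    Math.random = original.random;
    Date.now = original.now;
  };
}

const restore = makeDeterministic(42, 0);
const sample = [Math.random(), Math.random()]; // same two values on every run
const stubbedNow = Date.now();                 // 0
restore();
console.log(sample, stubbedNow);
```

Note that this sketch leaves `new Date()` untouched, so it still reads the real clock: exactly the kind of thing that's easy to miss.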

Wikidata generally though is tricky to cache on, since many things could change your result set. I don't know how hard it would be to devise a generic system for creating cache invalidation hook requests from a query, but it sounds hard.

If we're willing to budge on determinism enough to allow calls to a function to alter the global state of subsequent calls to it in the same session, then we could reuse kernels for, for instance, every invocation of a popular function used in a large wiki page rendering.

This will improve speed on three fronts:
 * Not having to create an interpreter context and parse/instantiate your function source code on every invocation
 * For JIT languages, more opportunity to run an optimized version of the function and less relative time spent compiling vs executing
 * Functions used multiple times can memoize/cache their own calculations to a global variable without the overhead of pushing cached data up to a service; this state would persist through the end of the invocation set, which might include not just one top-level function invocation, but multiple invocations throughout a wiki page parse operation.

This allows data leakage across invocations within a page, which essentially turns execution order and past inputs during a session into inputs to each invocation for idempotency purposes. It does not introduce non-determinism if the order of execution is well defined, such as if each session follows the processing order of a wikitext parse operation that will always be consistent for that page revision. However, it does mean that any automatic caching needs to be aware that anything goes!
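
A tiny illustration of the trade-off, with a closure standing in for a kernel's interpreter context. createKernel, labelFactory, and runSession are hypothetical names:

```javascript
// Hypothetical kernel: instantiates the function source once and keeps its context alive.
function createKernel(factory) {
  const fn = factory(); // stands in for parsing and instantiating the source
  return { invoke: (...args) => fn(...args) };
}

// Hypothetical function source with mutable module-level state.
const labelFactory = () => {
  let calls = 0;
  return (label) => `${label}#${++calls}`;
};

// Runs one session, either with a fresh kernel per call or one shared kernel.
function runSession(inputs, reuseKernel) {
  const shared = reuseKernel ? createKernel(labelFactory) : null;
  return inputs.map((input) => (shared || createKernel(labelFactory)).invoke(input));
}

console.log(runSession(["a", "b", "a"], false)); // [ 'a#1', 'b#1', 'a#1' ]
console.log(runSession(["a", "b", "a"], true));  // [ 'a#1', 'b#2', 'a#3' ]
console.log(runSession(["b", "a", "a"], true));  // [ 'b#1', 'a#2', 'a#3' ]
```

With a fresh kernel the output depends only on the input. With a shared kernel the same session always produces the same outputs, but the result for "a" depends on what ran before it, so a cache keyed on the input alone would be wrong.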

It would also be possible to run multiple functions that use the same language in the same interpreter context to avoid RPC cost on same-language calls, but that introduces more chances for cross-contamination. A JS function could literally have a built-in redefined out from under it by another function that was used elsewhere in the page. And that would seem like a very bad thing.

Cache updates and getting behind
For MediaWiki templates, we went with a system that re-renders all templates on demand in case of page cache invalidation, but the way we invalidate pages' caches is to go through the list of all the pages registered as using the template and update their cache-invalidation time. For very large numbers of affected pages this can be done in many batches and can take quite a bit of time.

I don't know if we want to plan a more general improvement to this sort of invalidation technique, but we might want to specifically plan for what happens when a large invalidation happens of pages using a certain function. For a common function this could be millions of pages, and if we have to re-invoke every function on every page as they get visited and re-rendered there might be a glut of new invocation calls.


 * can we continue to use invalid data under certain circumstances?
 * can we partition a particular function or set of functions to use a "fair" amount of resources and avoid starving other work?
 * would throttling distinguish between top-level functions and the functions they call?
   * or invocations from cache invalidations versus invocations with no cached data?

Things to consider. I do kind of fear a giant out of control spike of usage from some rogue function that's spamming a bunch of loops -- or worse, a simple legit tweak to the source code of a fundamental function that's used on low-level templates everywhere, causing a re-render of every function and every page of Wikipedia. :D

Cache invalidation hook registration
One model to think about is the explicit management of data caches by complex functions, with their own opportunity to specify keys and register hooks for when a cached result is invalidated.

This might be good, or might be complex. I'm less certain about this so far. I don't like that it's complex to have to specify things -- but it may be a good under-the-hood model if we can derive invalidation hook points and suitable values automatically during execution.

Linking and explicit imports
The JS code examples above reference other Wikifunctions via import statements, rather than using an API call with a string to look up the function at runtime. This allows static analysis of the source code to determine which functions can be invoked. Linked function invocations can be stored as metadata.

This allows usage tracking: a Whatlinkshere for function definitions!

It also allows pre-fetching the definitions of any simple functions that can be implemented in-process without making an RPC call to another language kernel. All of the simple functions that could be reached directly via other simple functions could be preloaded to reduce round-trips (unless there's a combinatorial explosion in which case you might stop preloading at some point).
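
A sketch of both ideas: pulling references out of import statements, then walking them breadth-first with a cap on how much gets preloaded. The "wikifunctions:" module specifier is a made-up convention, and the regular expression is a stand-in for a real parser:

```javascript
// Extracts referenced function names from ES-module-style import statements.
function findImports(source) {
  const pattern = /^\s*import\s+[^'"]*?from\s+['"]wikifunctions:([^'"]+)['"]/gm;
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

// Breadth-first walk of the import graph, stopping once `limit` definitions
// have been collected so a combinatorial explosion can't run away.
function preloadSet(root, sources, limit) {
  const seen = new Set([root]);
  const queue = [root];
  while (queue.length > 0 && seen.size < limit) {
    const name = queue.shift();
    for (const dep of findImports(sources[name] || "")) {
      if (seen.size >= limit) break;
      if (!seen.has(dep)) {
        seen.add(dep);
        queue.push(dep);
      }
    }
  }
  return [...seen];
}

// Hypothetical function sources keyed by function name.
const sources = {
  main: [
    "import { listItems } from 'wikifunctions:listItems';",
    'import { filterItems } from "wikifunctions:filterItems";',
    "export function main() { return filterItems(listItems()); }",
  ].join("\n"),
  filterItems: "import { hasProperty } from 'wikifunctions:hasProperty';",
  listItems: "",
  hasProperty: "",
};

console.log(preloadSet("main", sources, 10)); // [ 'main', 'listItems', 'filterItems', 'hasProperty' ]
console.log(preloadSet("main", sources, 2));  // [ 'main', 'listItems' ]
```

Inverting the same import lists (function name to the functions that import it) would give the Whatlinkshere-style usage index.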