Extension:WikiLambda

The WikiLambda extension is a MediaWiki extension in early development that forms the core of the Wikifunctions software stack, as part of the work towards Abstract Wikipedia.

Architectural concept


Wikifunctions will be a MediaWiki installation on which function content, but not output, is stored. This content takes the form of programmatic descriptions of each available function ("Functions"), actual user-written code for these functions ("Implementations"), test suites for these functions ("Testers"), and human-facing documentation about these functions, alongside the usual wiki community management content like village pumps, discussion areas, and policy pages.

Requests for function calls can come in directly via a Web request, or via MediaWiki (probably through a parser function, but this is not yet decided). The function orchestrator determines the specifics of the request, checks the cache to see whether the output value has been provided recently, and either returns that value or proceeds to trigger an evaluation. To do so, it fetches all relevant content from the cluster, whether that is published function content (or mid-edit, as-yet-unpublished function content being "previewed") from Wikifunctions or structured content on which functions can operate from Wikidata and Wikimedia Commons. The complete bundle of code to execute and the inputs on which to execute it is then passed to the appropriate function executor, which tries to execute the code securely and returns the output (or a failure error). The orchestrator then writes the successful result to the cache and transmits it to the consumer.
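The request flow described above (check the cache, fetch content, hand off to an executor, cache only successful results) can be sketched roughly as follows. This is a minimal illustration, not the actual orchestrator: the `Orchestrator` class, the `fetch_content` and `execute` callables, the `Z_example_add` identifier, and the in-memory dictionary cache are all hypothetical stand-ins, and the toy executor simply looks up a trusted Python callable rather than running user-written code in an isolated instance.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Orchestrator:
    """Hypothetical orchestrator: cache lookup, fetch, execute, cache write."""

    fetch_content: Callable[[str], str]              # assumed: function ID -> stored code
    execute: Callable[[str, Tuple[Any, ...]], Any]   # assumed: runs code on inputs, may raise
    cache: Dict[Tuple[str, Tuple[Any, ...]], Any] = field(default_factory=dict)

    def call(self, function_id: str, inputs: Tuple[Any, ...]) -> Any:
        key = (function_id, inputs)
        # 1. Return a previously computed output if one is cached.
        if key in self.cache:
            return self.cache[key]
        # 2. Fetch the content needed for the evaluation.
        code = self.fetch_content(function_id)
        # 3. Pass code and inputs to the executor; a failure propagates
        #    as an exception and is therefore never cached.
        result = self.execute(code, inputs)
        # 4. Cache only the successful result, then return it.
        self.cache[key] = result
        return result


# Toy stand-ins for the wiki content store and the executor.
STORED_IMPLEMENTATIONS = {"Z_example_add": "add"}
TOY_RUNTIME = {"add": lambda a, b: a + b}
executor_runs = []  # records each time the executor is actually invoked


def toy_fetch(function_id: str) -> str:
    return STORED_IMPLEMENTATIONS[function_id]


def toy_execute(code: str, inputs: Tuple[Any, ...]) -> Any:
    executor_runs.append(code)
    return TOY_RUNTIME[code](*inputs)


orchestrator = Orchestrator(fetch_content=toy_fetch, execute=toy_execute)
```

Calling `orchestrator.call("Z_example_add", (2, 3))` twice invokes the toy executor only once; the second call is served from the cache. A call that raises leaves the cache untouched, mirroring the point that only successful results are written back.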

The WikiLambda extension is responsible for two things. The first is content management on the Wikifunctions wiki (like the WikibaseRepo extension): providing editing interfaces, restricting users from or warning them about certain actions, and providing a reading and test execution interface. The second is the integration of content requests on all Wikimedia wikis (like the WikibaseClient extension). User-written code is never executed in the context of the production environment, and thus never has access to sensitive content; it is run only in isolated, disposable instances.

Big questions

 * Limits and throughput
   * What makes for a "good enough" service?
   * What limits (time, memory, inputs) are appropriate to ensure that individual requests aren't too burdensome?
   * What limits (time, memory, inputs, requesting user, use case) are appropriate to ensure that request load in aggregate isn't too burdensome?
   * Do we invalidate the cache on input change (e.g. Wikibase item edited), or do we wait for a request before re-evaluating?
 * Integration
   * How can we reconcile the asynchronous nature of the function service with the synchronous nature of MediaWiki's parser and the ParserCache? Can we add a UNIQ--QINU marker and replace it later? What do we show readers in that circumstance? Do we use a hybrid model for this, injecting inline if the response comes back within 50ms but otherwise deferring?
   * How can we reconcile the asynchronous nature of the function service with the synchronous nature of HTTP? Just reply with HTTP 202 (Accepted)? HTTP 503 (Service Unavailable) if the result is not immediately in the cache?
   * How can we ensure that our portion of the production memcached instance is limited so that it does not overload the main use case (painting MW pages)?
   * Will we want to cache input values for fetched content (Q1234, Z1234, M1234, etc.) inside the orchestrator, or should those also land in the main production memcached instance?
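To make the "hybrid model" question above more concrete, here is a rough sketch of one way it could behave: wait for the function result up to a small time budget, use it inline if it arrives in time, and otherwise hand back a pending handle to be resolved later. The names `resolve_with_budget`, `http_status`, and `slow_call` are hypothetical, a thread pool stands in for the asynchronous function service, and mapping a deferred result to HTTP 202 is only one of the options floated in the list, not a decision.

```python
import concurrent.futures
import time

INLINE_BUDGET_SECONDS = 0.05  # the 50ms figure mentioned in the question above


def resolve_with_budget(pool, call, budget=INLINE_BUDGET_SECONDS):
    """Run `call` on the pool and wait at most `budget` seconds for it.

    Returns ("inline", value) if the call finished within the budget,
    otherwise ("deferred", future) so the caller can substitute the value later.
    """
    future = pool.submit(call)
    done, _not_done = concurrent.futures.wait([future], timeout=budget)
    if future in done:
        # Finished in time; result() re-raises if the call itself failed.
        return ("inline", future.result())
    return ("deferred", future)


def http_status(outcome):
    """Assumed mapping for a direct Web request: 200 if inline, 202 if deferred."""
    return 200 if outcome == "inline" else 202


def slow_call():
    """Stand-in for an evaluation that exceeds the inline budget."""
    time.sleep(0.3)
    return "late value"
```

With a `concurrent.futures.ThreadPoolExecutor` as the pool, a fast call comes back tagged `"inline"`, while `slow_call` with the default budget comes back tagged `"deferred"` together with a future whose `result()` later yields the value. What readers see in the meantime is exactly the open question the list raises; this sketch does not answer it.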