User:Daniel Kinzler (WMDE)/Frontend Architecture

This document describes how MediaWiki's user interface should function. It is intended to provide constraints and guiding principles for feature development.

The front-end architecture defines the flow of information and control between the client device and the application servers (or, more generally, the data storage layer). In practice, this mostly means defining when, where and how HTML is generated, manipulated, and assembled.

Goals
''Note: this is a straw-man vision. It's here for the sake of discussion.''

User-facing goal: Provide a consistent, modern user interface across platforms.

1) A (mostly) data-driven single-page web application that relies heavily on client-side JS and uses a modern framework for state management and template rendering. This should be the default view for both desktop and mobile clients.

2) A (mostly) static view for use by client software that lacks the level of JS support needed for the single-page app. This view may still include some optional bits of "old school" JS. The static view is served as a full HTML page, rendered server-side from the same templates that the single-page app uses on the client side.

Both views share the same URLs, and the appropriate view is selected by detecting client capabilities. This could perhaps be done by initially serving the static view, which then gets replaced with the single page app if possible.
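A minimal sketch of the client-side decision described above, written in TypeScript. It assumes the static view is always served first and that a small bootstrap script then decides whether to replace it. The names `ClientCapabilities`, `selectView` and `detectCapabilities`, and the choice of Web Workers (per JS-WW below) and `fetch` as the deciding features, are illustrative assumptions, not existing MediaWiki code.

```typescript
// Hypothetical capability flags; the names are illustrative only.
interface ClientCapabilities {
  javascript: boolean;
  webWorkers: boolean; // JS-WW: no Web Workers is treated as "no JS"
  fetchApi: boolean;
}

type View = "single-page-app" | "static";

// The static view is always served first under the page URL; this decides
// whether the client-side bootstrap should replace it with the app.
function selectView(caps: ClientCapabilities): View {
  if (caps.javascript && caps.webWorkers && caps.fetchApi) {
    return "single-page-app";
  }
  return "static";
}

// Feature detection against a bag of globals (in a browser, pass globalThis).
// If this code runs at all, basic JS support is present.
function detectCapabilities(env: Record<string, unknown>): ClientCapabilities {
  return {
    javascript: true,
    webWorkers: typeof env["Worker"] === "function",
    fetchApi: typeof env["fetch"] === "function",
  };
}
```

In a browser, the bootstrap could call `selectView(detectCapabilities(globalThis as unknown as Record<string, unknown>))` and load the app bundle only when the result is `"single-page-app"`; clients that stay on the static page never download it.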

Architecture goal: Turn MediaWiki from a monolith into a language-agnostic framework.

Through the use of API routing, dependency injection, server-side template rendering and page composition at the edge, we should eventually allow APIs, HTML output and internal services to be implemented in PHP, JS, or some other language.

The freedom this provides has to be balanced against the overhead of crossing the language barrier, against the requirements of the installation environment, as well as the complexity of managing the deployed application. That is, we want to use this freedom, but use it wisely. While it should be possible to use an external standalone service to back most of the service interfaces used by MW core, this should only be done if worthwhile, considering all overhead in operations and maintenance.
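To illustrate the routing part of this goal: a hypothetical routing table that hides the implementation language of each API behind a public path. This is a sketch, not an existing MediaWiki or CDN component; `ApiRouter`, the `Backend` shape, the example paths and the base URLs are all assumptions. The same longest-prefix lookup could live in the load balancer or inside MW core.

```typescript
// Hypothetical description of where a route is implemented.
interface Backend {
  name: string;
  language: "php" | "js" | "other";
  baseUrl: string;
}

interface Route {
  prefix: string;
  backend: Backend;
}

// Longest-prefix-match router: clients only see public paths such as
// "/v1/page/Foo"; which backend (and language) serves them is a
// deployment detail recorded in this table.
class ApiRouter {
  private routes: Route[] = [];

  register(prefix: string, backend: Backend): void {
    this.routes.push({ prefix, backend });
    // Longest prefixes first, so more specific routes win.
    this.routes.sort((a, b) => b.prefix.length - a.prefix.length);
  }

  resolve(path: string): { backend: Backend; url: string } {
    for (const route of this.routes) {
      const p = route.prefix;
      // Match whole path segments only: "/v1/page" matches "/v1/page/Foo"
      // but not "/v1/pages".
      if (path === p || path.slice(0, p.length + 1) === p + "/") {
        return { backend: route.backend, url: route.backend.baseUrl + path };
      }
    }
    throw new Error(`No backend registered for ${path}`);
  }
}
```

Moving a route from PHP to a JS service then only changes a table entry; client URLs stay the same. The overhead discussed above still applies: every route that crosses a language boundary costs a network hop.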

Assumptions
These are straw-man assumptions, presented here to be challenged!
 * LAMP: The basic version of MediaWiki has to be installable on a plain LAMP stack without root access (shared hosting use case).
 * NO-V8: We cannot rely on being able to run PHP and JS code in the same process, i.e. without communicating via the network stack (no v8js in PHP). This implies that we can't call out from PHP to JS code for template rendering, nor can we call PHP code from JS for localization.
 * NO-JS-SW: We cannot rely on Service Workers as a mature technology.
 * JS-WW: We can rely on Web Workers being available on the client. It would be OK to treat clients that do not support Web Workers as having no JS support at all.
 * ESI: We can use Edge-Side Includes (ESI) for page composition on the WMF cluster. We want to use this to compose static pages from HTML fragments (see FRAG) to satisfy NO-JS. To also satisfy LAMP, we need an alternative composition mechanism (or ESI emulation) in index.php.
 * NODE: For optional functionality, we can execute JS on the server. In particular, optional API actions may be implemented in JS. However, calls between JS and PHP are expensive and should be avoided (because of NO-V8). Also, no critical core functionality may require JS execution on the server (because of LAMP).
 * NO-DATA: For the foreseeable future, some output (particularly of Special pages) may only be available in HTML, since it will take time for all extensions to be converted to a fully data driven approach. That is, some functionality will not be available via a JSON based REST interface, but instead as HTML (FRAG).
 * PREFER-JS: JS-enabled environments are preferred. Not all user workflows have to be supported on non-JS clients. However, see NO-JS.
 * NO-JS: For clients with no (or insufficient) JS support, we still need to support a) consumption of all content, including meta-data, b) editing of text-based (but not non-text) page content, and c) basic (but not all) curation.
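The ESI emulation that LAMP requires in index.php can be small if it only supports the include tag. A minimal sketch, written in TypeScript for consistency with the other examples (inside index.php it would be PHP, e.g. built on `preg_replace_callback`); `composePage`, `FragmentSource`, the depth limit and the fragment paths are hypothetical.

```typescript
// Hypothetical fragment lookup: given the src of an include, return its HTML.
// On the WMF cluster this role is played by the FRAG endpoint behind the
// edge cache; in a LAMP install it could be an in-process call.
type FragmentSource = (src: string) => string;

const MAX_INCLUDE_DEPTH = 5;

// Resolves <esi:include src="..."/> tags by substituting the referenced
// fragment. Only this one ESI tag is emulated; attributes such as alt or
// onerror are ignored.
function composePage(html: string, fetchFragment: FragmentSource, depth = 0): string {
  const include = /<esi:include\s+src="([^"]*)"\s*\/>/g;
  return html.replace(include, (_tag: string, src: string) => {
    if (depth >= MAX_INCLUDE_DEPTH) {
      throw new Error(`ESI include depth exceeded at ${src}`);
    }
    // Fragments may contain includes themselves, so resolve recursively.
    return composePage(fetchFragment(src), fetchFragment, depth + 1);
  });
}
```

Real ESI processors support more (alternative URLs, error handling, conditionals); the depth limit guards against fragments that include each other.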

Needs and Desires

 * JS-UI: We want to allow data-driven UIs, though it is acknowledged that some HTML will always be generated on the server (PARSER-API) and some HTML will continue to be generated on the server for a while (NO-DATA, FRAG). Eventually, all functionality shall be available via an API.
 * REST: The web interface (JS client) should use the same RESTful (in a broad sense) APIs as the native clients (apps). Client code should not need to know how or where a service is implemented. This should be hidden by appropriate API routing (at the load balancer and/or the application server).
 * FRAG: Fragments of HTML (page content, skin parts, special page forms, etc.) can be requested from a designated endpoint (which may be part of or separate from the REST API). Needed by ESI, which is needed by NO-JS. Things that expose HTML fragments are mainly: the skin, ContentHandlers, input forms for Special pages and action handlers, results (listings) of Special pages and action handlers, and dynamic content for some page types (file pages, category pages).
 * PARSER-API: We won't render actual Content (like wikitext) on the client. We want a single source of truth for rendering wikitext (and any other content type) as HTML.
 * ONE-UI: server-side rendering shall use the same templates as client-side rendering (needs 2L10N, TEMPLATES). The web interface (JS client) should be the same for desktop and mobile devices. We expect the line between mobile and desktop to blur and finally disappear over the next 5 years.
 * MULTI-DC: We want full Multi-Data-Center support. This means all information that is needed to decide whether a request needs to be routed to the master DC needs to be in the request. For the application servers, this mostly means "don't write to the database in GET requests".
 * JS-IFACE: We want to expose narrow, stable interfaces for client-side customization (gadgets).
 * AGNOSTIC: It should become possible to implement an API module and an associated special page purely in JS. It should remain possible to implement an API module and an associated special page purely in PHP.
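A sketch of the MULTI-DC rule as a pure function of the request, so it could run at the edge without consulting the application. It assumes that only GET, HEAD and OPTIONS are guaranteed not to write, and it introduces a hypothetical short-lived cookie (`mwUseMasterDC`) so that a user's reads right after an edit also go to the master DC; the cookie mechanism and all names here are assumptions.

```typescript
// Minimal request view: only what the routing layer can see.
interface IncomingRequest {
  method: string;
  cookies: Record<string, string>;
}

type DataCenter = "master" | "local";

// Methods that must not write to the database (see MULTI-DC).
const READ_ONLY_METHODS = ["GET", "HEAD", "OPTIONS"];

// Hypothetical cookie name, set for a short time after a write so that the
// same user's follow-up reads are also sent to the master DC.
const STICKY_DC_COOKIE = "mwUseMasterDC";

function chooseDataCenter(req: IncomingRequest): DataCenter {
  if (READ_ONLY_METHODS.indexOf(req.method.toUpperCase()) === -1) {
    return "master"; // any potentially writing request
  }
  if (req.cookies[STICKY_DC_COOKIE] === "1") {
    return "master";
  }
  return "local";
}
```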


 * REACTIVE: We do not want to serve different content to different kinds of clients, beyond the split between the single-page app and static pages. The content we serve should adapt to the device client-side.
 * MULTILANG: We want the ability to serve renderings/presentations in different target languages. URLs for different renderings (languages) of content should be different, to allow explicit linking to a specific rendering, and to simplify caching.
 * TEMPLATES: We'd like the ability to render declarative templates on the client as well as on the server. Template rendering needs to be available in PHP and JS. This probably means the template engine has to be implemented in PHP and in JS (implied by NO-V8).
 * 2L10N: We want consistent localization when rendering on the client or on the server, including message strings, parameter substitution, plural handling, and formatting for numbers and dates. This probably means maintaining a full JS port of the relevant formatting code, since proper data-driven UIs are blocked on this (they need client side formatting of data values, and NO-V8 implies no JS based server side L10N).
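To make 2L10N concrete: a minimal, simplified sketch of client-side message formatting that handles MediaWiki-style `$1` parameters and `{{PLURAL:...|...}}`. The plural rule is passed in as a function; only a two-form English rule is shown, and a real port would need the CLDR plural rules for every language, plus number and date formatting. `formatMessage` and `PluralRule` are hypothetical names, and parameters containing `|` or `}}` are not handled.

```typescript
// Maps a number to the index of the plural form to use. Hypothetical
// interface; a full port would generate these rules from CLDR data.
type PluralRule = (n: number) => number;

// English has two forms: index 0 for exactly 1, index 1 otherwise.
const englishPlural: PluralRule = (n) => (n === 1 ? 0 : 1);

function formatMessage(
  template: string,
  params: Array<string | number>,
  plural: PluralRule = englishPlural,
): string {
  // 1. Substitute $1, $2, ... (1-based, as in MediaWiki messages).
  //    Unknown parameters are left untouched.
  const substituted = template.replace(/\$(\d+)/g, (match: string, idx: string) => {
    const value = params[Number(idx) - 1];
    return value === undefined ? match : String(value);
  });
  // 2. Resolve {{PLURAL:n|form0|form1|...}}. If the rule asks for a form
  //    that was not provided, the last form is used.
  return substituted.replace(
    /\{\{PLURAL:([^|}]+)\|([^}]*)\}\}/gi,
    (_m: string, num: string, forms: string) => {
      const options = forms.split("|");
      const index = Math.min(plural(Number(num)), options.length - 1);
      return options[index] ?? "";
    },
  );
}
```

For example, `formatMessage("$1 {{PLURAL:$1|page|pages}} in $2", [3, "Category:Foo"])` yields `"3 pages in Category:Foo"`. The server-side PHP implementation would have to produce identical output for the same inputs, which is the consistency this requirement asks for.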

Components
Output synthesis (page composition/rendering) can be organized into the following layers:
 * 1) Full HTML page / DOM
 * 2) HTML snippets (exposed by API, used by JS and ESI)
 * 3) Pre-formatted data (view-model / annotated HTML, exposed by API, used for template rendering)
 * 4) External data model, exposed by API:
    * For meta-data: JSON, exposed by API, needs L10N-aware formatting
    * For page content: annotated HTML
 * 5) Internal data model:
    * For meta-data: PHP, not exposed
    * For page content: native format, e.g. wikitext, exposed by export. Exposed by API for text-based formats.

New system components / high-level services:
 * COMPOSE: Page composition at the network edge (see ESI), backed by an API for serving HTML fragments (see FRAG), needed to satisfy NO-JS while also supporting JS-UI.
    * This should be used at least to combine the parts of the skin with the page content.
    * Usage of this mechanism is optional (to satisfy LAMP). The index.php entry point still needs to deliver a fully composed page by default, so MW is usable without an ESI-capable caching layer.
    * This requires a dependency tracking and purging engine (using Kafka as a bus and some graph database for storing the dependency graph), which is used to re-generate HTML snippets (and other derived artifacts).
    * It would perhaps be useful to support additional massaging/hydration beyond what ESI supports. This would allow us to do localization here, as well as to adapt output to client devices.
    * ESI requires the caching layer to be able to predict what kind of content it will get by looking at the request. This means, e.g., that Special pages that serve non-HTML output would not be possible, or would have to be specially registered.
 * FRAG: Endpoint for serving HTML snippets (for use by ESI), see.
    * Must at least serve all bits needed for the skin and rendered page content (including special pages, action handlers, etc.)
    * Could also serve bits of composite page content (e.g. infoboxes)
    * The distinction from the template rendering API is blurry.
 * PURGE: Unified Dependency Tracking and Purging based on an Event Bus.
    * This makes it easy to introduce new kinds of artifacts or change granularity, without having to implement a tracking and updating mechanism for each use case. This is particularly important for the HTML snippets / ESI, as well as for enabling caching of the APIs that support a data-driven UI.
 * TEMPLATE: Template engine with a JS and a PHP implementation.
    * Templates can only be fully data driven if localization is implemented in both JS and PHP.
    * If PHP code is the single source of truth for L10N handling, templates can:
       * A) Use pre-formatted data: this would probably require an API mode that returns neither abstract JSON nor rendered HTML, but some kind of intermediate form (which may be annotated HTML). This could be interpreted as a "view-model" in the sense of an MVVM architecture.
       * B) Call back to PHP for formatting. That's probably too slow, but might be possible in some cases with appropriate batching.
    * Rendering of Special pages can be HTML based (legacy) or data driven (modern):
       * HTML based Special pages should generate annotated "semantic" HTML, somewhat similar to the output of Parsoid, that allows easy massaging for different target devices.
       * Data driven Special pages are just glue that applies template rendering to data returned by an API call. The template rendering would happen on the server or the client, as needed.
 * REST: A common REST API interface for functionality implemented in PHP, JS, or whatever other language.
    * Clients should not be aware of where and how an API is implemented.
    * Routing should be possible in the CDN layer / load balancer.
    * Routing should also be possible inside MW core, so it is available without a CDN layer.
    * The new API should map to the existing action API for most if not all cases, so we don't have to re-implement all API functionality.
 * MVC: JS framework that maps between a data model and the DOM, and manages API calls to the backend (MVC/MVVM). Needed for JS-UI.
    * This kind of framework is designed for a fully data-driven environment, with all rendering done in JS. However, we will still have HTML snippets coming from the backend, at the very least for rendered page content. These snippets need to be integrated into the DOM, and they may need massaging/hydration.
    * This requires JS template rendering; see TEMPLATE for L10N issues.
 * PARSER-API: A single REST API for accessing rendered page content (most importantly for wikitext), which yields output similar to the output of Parsoid.
    * This should probably be backed by a PHP-based Parsoid port, to avoid calling back from JS to PHP code for each template, parser function, etc.
 * ASSET: A REST API for asset delivery (next generation ResourceLoader)
    * For JS and CSS resources and associated icons
    * For localization resources (message bundles)
    * Should make aggressive use of caching on all levels.
    * Not for embedded media (probably)
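A sketch of the core of PURGE, which COMPOSE also relies on for re-generating HTML snippets: given a changed resource, find every derived artifact that must be purged, transitively. The in-memory `DependencyGraph` class and the artifact naming scheme (`wikitext:Foo`, `html:page/Foo`, `esi:page/Foo`) are assumptions standing in for the graph database and event bus described above.

```typescript
// In-memory stand-in for the dependency store. In the design above, edges
// would live in a graph database and change events would arrive via an
// event bus such as Kafka; both are out of scope for this sketch.
class DependencyGraph {
  // dependents[x] lists the artifacts derived directly from x.
  private dependents: Record<string, string[]> = Object.create(null);

  addDependency(artifact: string, dependsOn: string): void {
    const list = this.dependents[dependsOn] ?? [];
    if (list.indexOf(artifact) === -1) {
      list.push(artifact);
    }
    this.dependents[dependsOn] = list;
  }

  // All artifacts that must be purged or re-generated, directly or
  // transitively, when `changed` changes. Breadth-first; each artifact is
  // reported once, even if the graph contains cycles.
  affectedBy(changed: string): string[] {
    const seen: Record<string, boolean> = Object.create(null);
    seen[changed] = true;
    const queue: string[] = [changed];
    const affected: string[] = [];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      for (const next of this.dependents[current] ?? []) {
        if (!seen[next]) {
          seen[next] = true;
          affected.push(next);
          queue.push(next);
        }
      }
    }
    return affected;
  }
}
```

In a deployment, a consumer on the event bus would run a lookup like `affectedBy` for each change event and emit purge or re-render events for the result, so new artifact types only need to register their edges.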