User:Daniel Kinzler (WMDE)/Frontend Architecture

From mediawiki.org

This document describes how MediaWiki's user interface should function, if Daniel were king. It is intended to provide constraints and guiding principles for feature development.

The front-end architecture defines the flow of information and control between the client device and the application servers (or, more generally, the data storage layer). In practice, this mostly means defining when, where, and how HTML is generated, manipulated, and assembled.

Goals

Note: this is a straw-man vision. It's here for the sake of discussion.

Provide a consistent, modern user interface across platforms.

1) A (mostly) data driven single page web application that heavily relies on client side JS and uses a modern framework for state management and template rendering. This should be the default view for both desktop and mobile clients. [Alternative: provide the APIs needed for 3rd parties to build such an experience, and use the same APIs to drive our native mobile apps].

My implicit assumption: custom internet facing APIs will be needed for distribution channels other than classic (desktop) and mobile web UAs. ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
I don't like single-page applications for things as simple (in visual/UX terms) as wikis. They tend to be heavy, slow, and break down badly when the connection is not optimal. I would go as far as to say that we would probably benefit from having separate frontends for separate functions (mobile, desktop, app, etc). GLavagetto (WMF) (talk) 16:50, 28 February 2018 (UTC)
Agree we can expect multiple client implementations, though a JS-assembled client-side page should be more robust against intermittent connections than a multi-page CRUD app if it's properly written to handle when things fail. If the 'smarts' are oriented around service workers, then the two styles also theoretically combine well. --Brion Vibber (WMF) (talk) 17:49, 28 February 2018 (UTC)
I like this a lot, not just because it provides a good Web interface, but also because it enforces a level of isolation between presentation and business logic that allows other UIs (native, voice, VR/AR, telepathy) to work as well as the Web app does. --EvanProdromou (talk) 17:50, 18 December 2018 (UTC)

2) A (mostly) static view for use by client software without the necessary level of JS support for the single page App. This may still have some optional bits of "old school" JS. The static view is served as a full HTML page which is rendered server side based on the same templates also used on the client side by the single page experience.

Both views share the same URLs, and the appropriate view is selected by detecting client capabilities. This could perhaps be done by initially serving the static view, which then gets replaced with the single page app if possible.

In practical terms, this seems to suggest that minimally there will be two edge cache variants for a given page - one HTML and one HTML inside of a JSON payload. (I'm glossing over user-specific variants for the moment, as that's separable and discussed elsewhere in this document). ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
This would mean we will cache an initial page served to any new request, which will be the same, rendered server-side. The javascript code in that basic html version should then determine if the SPA approach would be taken. I do see such an approach as problematic in terms of caching though, as we would surely see a multiplication of cached objects and this should surely be taken into consideration when designing such an approach. GLavagetto (WMF) (talk) 16:50, 28 February 2018 (UTC)
What if the edge cache always contains precomposed pages, while the client page & service worker JS bypasses the cached pages if/when possible? We should be able to have the user see a "/wiki/Foobar" URL without that being what actually gets fetched by catching the navigation attempt on-page, or the request in the service worker, and returning the lightweight page harness plus client-side fetching and composing. --Brion Vibber (WMF) (talk) 17:49, 28 February 2018 (UTC)
(By bypass cache, I mean 'fetch the separate components, which themselves will be edge-cached separately'.) --Brion Vibber (WMF) (talk) 18:54, 28 February 2018 (UTC)
That's what I had in mind, yes -- Daniel Kinzler (WMDE) (talk) 11:23, 5 March 2018 (UTC)
One thing to keep in mind here is the User specific data that needs to be rendered on the client (i.e. Notifications). Currently we serve the entire page uncached for authenticated users. So if we want to serve more content from the cache, then that does necessitate decomposing the page into generic and user specific components. Which does lead to the need to edge cache more objects - I am assuming. CFloyd (WMF) (talk) 17:31, 9 March 2018 (UTC)
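The capability-based view selection described above (serve the static view first, then upgrade to the single page app if possible) could be sketched roughly as follows. The feature names checked and the `loadSpa()` entry point are illustrative assumptions, not existing MediaWiki interfaces:

```javascript
// Hypothetical bootstrap logic: the server always sends the static view,
// and a small inline script like this decides whether to upgrade to the
// single page app. The required-feature list is an illustrative assumption.

// Minimal capability check: every feature the SPA needs must be present.
function supportsSpa(env) {
  const required = ['fetch', 'Promise', 'history', 'serviceWorker'];
  return required.every((feature) => Boolean(env[feature]));
}

// In a browser this would run as: if (supportsSpa(window)) loadSpa();
// Here we simulate two environments to show the decision.
const modernBrowser = { fetch: true, Promise: true, history: true, serviceWorker: true };
const legacyBrowser = { Promise: true, history: true };

console.log(supportsSpa(modernBrowser)); // upgrade: replace static view with SPA
console.log(supportsSpa(legacyBrowser)); // stay on the server-rendered page
```

Because both views share the same URLs, a failed upgrade simply leaves the already-usable static page in place.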

Desires

Straw-man list of desired capabilities:

  • JS-UI: We want to allow data driven UIs, though it is acknowledged that some HTML will always be generated on the server (PARSER-API) and some HTML will be generated on the server for a while (NO-DATA, FRAGMENTS). Eventually, all functionality shall be available via an API.
  • REST: Client code should not need to know how or where a service is implemented. We offer a comprehensive REST API. The web interface (JS client) should use the same RESTful (in a broad sense) APIs as the native clients (apps).
  • PARSER-API: We want to offer access to rendered page content (for all revisions and slots, in all target languages) via the REST API.
  • ONE-UI: server-side rendering shall use the same templates as client-side rendering (needs L10N-JS, TEMPLATES). The web interface (JS client) should be the same for desktop and mobile devices. We expect the line between mobile and desktop to blur and finally disappear over the next 5 years.
Granted, mobility hardware for web consumption will advance. This said, I perceive lightweight markup and lightweight initial bootstrapping static assets to continue to be important for the foreseeable future. I see in ASSET a consideration for modernizing ResourceLoader. I support that, and from various discussions it seems to be desired and achievable. ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
This needs a bit of clarification/discovery… what is meant by "one UI"? Also it seems to be dependent on features. One line of thinking is that some features only make sense in one context and not the other. If we have "one UI", what does that mean for those features? Also one of the reasons that we are building REST as a way to support many UIs that are decoupled from the business logic. This can be seen as slightly in contradiction with that goal CFloyd (WMF) (talk) 16:29, 12 March 2018 (UTC)
The point here was that we use the same templates on the server side as on the client side. The standard UI should look the same, whether it was rendered on the server or the client. This is not intended to say that we can't have different entry points (apps, whatever) for different audiences or purposes.
I suppose the ONE-UI handle is misleading. Could change it to CONSISTENT-LOOK, though that's a bit long and clunky... -- Daniel Kinzler (WMDE) (talk) 16:10, 13 March 2018 (UTC)
  • MULTI-DC: We want full Multi-Data-Center support. This means all information that is needed to decide whether a request needs to be routed to the master DC needs to be in the request. For the application servers, this mostly means "don't write to the database in GET requests".
  • JS-IFACE: We want to expose narrow, stable interfaces for client-side customization (gadgets).
This to me also goes hand in hand with the REST API. If we have a JS API in the browser, I assume it maps to REST APIs. CFloyd (WMF) (talk) 16:29, 12 March 2018 (UTC)
But it's not just about gadgets interacting with the server side API. It's also about offering a clean interface (MVC/MVVM) for gadgets to interact with client side content, instead of messing with the DOM directly. -- Daniel Kinzler (WMDE) (talk) 16:10, 13 March 2018 (UTC)
We should also be talking about extensions here. Narrowing the interface for extensions is also important. Ensuring extensions expose their functionality as APIs is also important. How we do this needs discussion CFloyd (WMF) (talk) 16:29, 12 March 2018 (UTC)
Right. This, in my mind, follows from REST. Though perhaps it would make sense to mention extensions explicitly. -- Daniel Kinzler (WMDE) (talk) 16:10, 13 March 2018 (UTC)
  • AGNOSTIC: It should become possible to implement an API module and an associated special page purely in JS. It should remain possible to implement an API module and an associated special page purely in PHP. See also SOME-SOA.
To clarify, is this suggesting that extension authors can choose which server side language to use from a choice of JS (presumably, Node.js) or PHP (or whatever supported languages), provided they follow a particular specification? ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
Yes. However, for anything to be deployed on the WMF cluster, the choice needs to consider the overhead of communicating with other services across the network, and the overhead of deploying and running standalone services, as per SOME-SOA. In practical terms, for WMF, there would be a strong bias towards PHP, though JS may be selected in some cases. -- Daniel Kinzler (WMDE) (talk) 13:50, 2 March 2018 (UTC)
  • REACTIVE: We do not want to serve different content to different kinds of clients, beyond the split between the single-page app and static pages. The content we serve should adapt to the device client-side.
In other words, is this presuming the client side ServiceWorker tech will apply transforms for classic (desktop) and mobile web? ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
It's assuming that the transformation happens on the client. ServiceWorkers/WebWorkers are one option for this. -- Daniel Kinzler (WMDE) (talk) 13:50, 2 March 2018 (UTC)
  • MULTILANG: We want the ability to serve renderings/presentations in different target languages. URLs for different renderings (languages) of content should be different, to allow explicit linking to a specific rendering, and to simplify caching.
  • L10N-JS: we want to be able to use MediaWiki's L10N mechanisms (messages, parameter substitution and formatting, plural handling, etc) in JS on the client (and on the server). See also Messages API support in JavaScript.
  • SOME-SOA: Through the use of API routing, dependency injection, server side template rendering and page composition at the edge, eventually allow APIs, HTML output and internal services to be implemented in PHP, JS, or some other language.
    • The freedom this service oriented architecture (SOA) provides has to be balanced against the overhead of crossing the language barrier, against the requirements of the installation environment, as well as the complexity of managing the deployed application. That is, we want to have this freedom, but use it wisely. While it should be possible to use standalone services to back most of the service interfaces defined in MW core, this should only be done if it's actually worthwhile, considering all overhead in operations and maintenance.
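The L10N-JS desire above (message lookup, parameter substitution, plural handling in client-side JS) could look roughly like this. The message key and the tiny `{{PLURAL:...}}` handler are illustrative; they mimic MediaWiki's message conventions but are not the real implementation:

```javascript
// Sketch of client-side message formatting in the spirit of L10N-JS.
// The message key is invented, and this formatter only handles the
// two-form {{PLURAL:$n|singular|plural}} case for illustration.
const messages = {
  'watchers-count': 'This page has $1 {{PLURAL:$1|watcher|watchers}}.',
};

function formatMessage(key, ...params) {
  let text = messages[key];
  // Resolve PLURAL first, picking a form based on the referenced parameter.
  text = text.replace(/\{\{PLURAL:\$(\d+)\|([^|}]*)\|([^|}]*)\}\}/g,
    (match, idx, singular, plural) =>
      Number(params[idx - 1]) === 1 ? singular : plural);
  // Then substitute $1, $2, ... with the positional parameters.
  return text.replace(/\$(\d+)/g, (match, idx) => String(params[idx - 1]));
}

console.log(formatMessage('watchers-count', 1)); // "This page has 1 watcher."
console.log(formatMessage('watchers-count', 5)); // "This page has 5 watchers."
```

A real port would need full CLDR plural rules, grammar transformations, and number formatting per language, which is what makes L10N-X2 (below, under Needs) a substantial piece of work.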

Assumptions

These are straw-man assumptions, presented here to be challenged!

  • LAMP: The basic version of MediaWiki has to be installable on a plain LAMP stack without root access (shared hosting use case).
I am not sure counting on shared hosting needs in the mid- to long-term future really makes sense, IMHO. Mobrovac-WMF (talk) 19:42, 28 February 2018 (UTC)
I'm actually hoping that assumption will fall, but so far, there seems no consensus for that. So I have included it here as the status-quo assumption. -- Daniel Kinzler (WMDE) (talk) 14:02, 2 March 2018 (UTC)
  • PHP: The backbone of MediaWiki will remain PHP for the foreseeable future, and we will need to support a zoo of PHP based extensions for quite some time.
  • NO-V8: We cannot rely on being able to run PHP and JS code in the same process / without the need for communication via the network stack (no v8js in PHP). This implies that we can't call out from PHP to JS code from template rendering, nor can we call PHP code from JS for localization.
This seems to mostly depend on the mid-term roadmap for PHP, and given how client-centric web technologies have become, this might be a non-issue Mobrovac-WMF (talk) 19:42, 28 February 2018 (UTC)
I don't quite follow - are you saying that we will be able to rely on v8js, or that we won't need it? If the latter, what is the alternative? -- Daniel Kinzler (WMDE) (talk) 14:02, 2 March 2018 (UTC)
In other words - I don't think we'll get rid of a PHP based backbone for MediaWiki in the next 5 years. Probably never. We may end up moving more and more functionality into the client, or into standalone services. But that will be a gradual process, and we cannot assume that it will end with nothing being left in PHP land. I suppose this assumption is worth making explicit, I'll add an entry for it. -- Daniel Kinzler (WMDE) (talk) 14:54, 5 March 2018 (UTC)
  • ESI: we can use Edge-Side-Includes (ESI) for page composition on the WMF cluster. We want to use this to compose static pages from HTML fragments (see FRAGMENTS) to satisfy NO-JS. To also satisfy LAMP, we need an alternative composition mechanism (or ESI emulation) in index.php.
ESI is a mess, not properly implemented anywhere in caching software, and a nightmare for debugging. Been there, done that. I'm sure I express the sentiment of others in the operations team: if we need to create page compositions server-side from fragments, that should be done in a page composition service (which could also be internal to the frontend service/MediaWiki), not via the use of technology from the nineties that everyone has basically abandoned. GLavagetto (WMF) (talk) 16:55, 28 February 2018 (UTC)
I recall concern about ESI as the appropriate means of composition. Is there a reason to not use a more powerful technology independent of the edge caching architecture itself? If I understand correctly, the reason why ESI is considered for composition is to deal with variants principally related to authenticated user preferences (i.e., RL/gadgetry/origin-side variants). But it's conceivable you might want variance based on more sophisticated conditioning, in which case a more flexible technology may be sensible. I'm not saying ESI isn't an option, but was curious about the emphasis specifically on ESI and any corresponding architectural and maintenance risk. In a way this goes back to my question on the REACTIVE piece. ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
I have re-added the NO-SW assumption below, which got lost in the editing process. If NO-SW doesn't hold, Service Workers would be a more flexible and powerful alternative to ESI. -- Daniel Kinzler (WMDE) (talk) 14:02, 2 March 2018 (UTC)
Hm... a page composition service would have to sit behind the CDN, not in front of it, like ESI, right? In my mind, page composition should happen in a way that allows cached components to be composed so quickly that caching of the result is not needed. -- Daniel Kinzler (WMDE) (talk) 14:12, 6 March 2018 (UTC)
  • NO-SW: We can't rely on Service Workers as a mature technology for page composition or template rendering on the server or network edge.
I may be confused about this - Service Workers are mainly a client side technology, but I though we were considering a server side implementation of them. That's what I am referring to here.
My point is that as far as I know, we do not have a mature JS based technology for doing template rendering or DOM massaging on the fly at the network edge. With the available tech, server side rendering only scales behind a CDN, not at the edge of the CDN. How should that be phrased? -- Daniel Kinzler (WMDE) (talk) 12:15, 6 March 2018 (UTC)
  • NODE: For optional functionality, we can execute JS on the server. In particular, optional API actions may be implemented in JS. However, calls between JS and PHP are expensive and should be avoided (because of NO-V8). Also, no critical core functionality may require JS execution on the server (because of LAMP).
Currently, MW's start-up is much more expensive than the price of spawning processes or doing network I/O Mobrovac-WMF (talk) 19:42, 28 February 2018 (UTC)
Yes, that's a big part of the problem: while calling a node service from PHP has some overhead, calling api.php from within JS has a LOT of overhead. So node.js services are currently only sensible if they don't need to call back to PHP all the time (in particular for localization, see L10N-JS). -- Daniel Kinzler (WMDE) (talk) 14:02, 2 March 2018 (UTC)
  • NO-DATA: For the foreseeable future, some output (particularly of Special pages) may only be available in HTML, since it will take time for all extensions to be converted to a fully data driven approach. That is, some functionality will not be available via a JSON based REST interface, but instead as HTML (FRAGMENTS).
Is it possible to send the core HTML payload, bundled inside a JSON payload? I ask mainly for the simplicity of clients. Although clients may adapt their rendering strategy based on a registered property of the response's backing special page, should they have to know these details in order to support the default workflow? ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
Yes, that could be done, but the main problem with special pages is that they assume they are in control of all of the output, so while sticking it inside JSON just not to break the contract is doable, the question is whether that is useful to clients at all. I also believe that Daniel here implicitly meant that the page's output would not be broken out into pieces as it would be the case of normal pages. Mobrovac-WMF (talk) 19:42, 28 February 2018 (UTC)
No, I'm assuming that we can serve special pages' HTML as a fragment separate from the skin (wrapping in JSON, or "naked").
Special pages that control *all* output, like Special:Export, are a special case. They are essentially separate endpoints, and should probably use a different URL path. Daniel Kinzler (WMDE) (talk) -- 14:02, 2 March 2018 (UTC)
Some special pages already do have their MW API counterparts, so they should be easily converted. As for the rest of them, it probably wouldn't be a bad idea to go through them and assess their utility as APIs. Mobrovac-WMF (talk) 19:42, 28 February 2018 (UTC)
  • PREFER-JS: JS-enabled environments are preferred. Not all user workflows have to be supported on non-JS clients. However, see NO-JS.
  • NO-JS: For clients with no (sufficient) JS support, we still need to support a) consumption of all content, including meta-data, b) editing of text-based (but not non-text) page content, and c) basic (but not all) curation. Among other things, this means we need to be able to serve complete static HTML pages from index.php.
Curation in this context I believe refers to simplistic talk page and page revision workflow interactivity from a noscript/JS-impaired context. Do I understand correctly? Also, I think I heard someone say that search ought to be supported (I think the basis of this is for hi-sec browser configurations where the wiki is accessed directly instead of through an external discovery (e.g., search engine) service). ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
Yes, basic search should be supported.
"Curation" here should cover things like watch/unwatch, protect/unprotec, undo/rollback, and perhaps also user blocks. -- Daniel Kinzler (WMDE) (talk) 14:02, 2 March 2018 (UTC)
  • SOME-JS: We can require either server side JS support (NODE) or client side JS support. We do not have to provide full functionality in a situation where neither the server nor the client can execute JS.
  • NO-CLIENT-PARSE: We won't render actual Content (like wikitext) on the client. We want a single source of truth for rendering wikitext (and any other content type) as HTML.
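The ESI-emulation fallback mentioned under ESI/LAMP (index.php expanding includes itself when no ESI-capable cache is in front) could be sketched like this. The `<esi:include>` tag syntax follows the ESI specification; the fragment map standing in for the FRAGMENTS endpoint is an illustrative assumption:

```javascript
// Sketch of server-side ESI emulation: expand <esi:include src="..."/> tags
// by fetching the referenced fragments. In a LAMP install, index.php would
// do this itself; with an ESI-capable cache, the edge does it instead.
function composePage(template, fetchFragment) {
  return template.replace(/<esi:include src="([^"]+)"\s*\/>/g,
    (match, src) => fetchFragment(src));
}

// Illustrative fragments keyed by the URLs a real cache would fetch.
const fragments = {
  '/fragment/skin-header': '<header>Site header</header>',
  '/fragment/page/Foobar': '<main>Rendered content of Foobar</main>',
};

const shell =
  '<esi:include src="/fragment/skin-header"/>' +
  '<esi:include src="/fragment/page/Foobar"/>';

console.log(composePage(shell, (src) => fragments[src]));
```

The same page shell can then be cached once at the edge while the fragments it references are cached and purged independently.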

Needs

Straw-man list of needed high level components to fulfill the above goals and desires under the given assumptions:

  • COMPOSE: Page composition at the network edge (see ESI), backed by an API for serving HTML fragments (see FRAGMENTS), needed to satisfy NO-JS while also supporting JS-UI.
    • This should be used at least to combine the parts of the skin with the page content.
    • Usage of this mechanism is optional (to satisfy LAMP). The index.php entry point still needs to deliver a fully composed page per default, so MW is usable without an ESI capable caching layer.
    • This requires a dependency tracking and purging engine (see PURGE).
    • It would perhaps be useful to support additional massaging/hydration beyond what ESI supports. This would allow us to do localization here, as well as adapt for client devices.
    • ESI requires that the caching layer be able to predict what kind of content it will be getting by looking at the request. This means e.g. that Special pages that serve non-HTML output would not be possible, or would have to be specially registered.
    • If NO-SW wouldn't block it, we could also combine page composition and template rendering in a layer below the CDN.
  • FRAGMENTS: Fragments of HTML (page content, skin parts, special page forms, etc.) can be requested from a dedicated endpoint (may be part of or separate from the REST API). Needed by COMPOSE, which is needed by NO-JS. Things that expose HTML fragments are mainly: the skin, ContentHandlers, input forms for Special pages and action handlers, results (listings) for Special pages and action handlers, and dynamic content for some page types (file pages, category pages).
    • Must at least serve all bits needed for the skin and rendered page content (including special pages, action handlers, etc)
    • Could also serve bits of composite page content (e.g. infoboxes)
    • Could also be used as a template rendering API, if it accepts complex data structures as input
  • TEMPLATES: ONE-UI requires the ability to render declarative templates on the client as well as on the server, in PHP and JS. This probably means the template engine has to be implemented in PHP and in JS (implied by NO-V8 and REST and STATIC).
    • Needs L10N-JS for templates to be fully data driven.
    • Alternatively, templates could use pre-formatted data: this would probably require an API mode that doesn't return abstract JSON nor rendered HTML, but some kind of intermediate form (which may be annotated HTML). This could be interpreted as a "view-model" in the sense of an MVVM architecture.
    • If NO-SW wouldn't block it, we could also combine page composition and template rendering into a single service.
  • L10N-X2: To satisfy L10N-JS and TEMPLATES, we probably need a full JS port of the relevant l10n formatting code. NO-V8 implies that we'd otherwise need to call out to a PHP based L10N API. This is probably too slow, but should still be investigated. See also Messages API support in JavaScript.
  • PURGE: Unified dependency tracking and purging mechanism to enable FRAGMENTS and REST.
    • Using Kafka as a bus and some graph database for storing the dependency graph.
What about LAMP? Are the event-driven characteristics fulfilled in basic LAMP? My sense is the cross-project and deeply nested template needs of the Wikimedia projects are different than basic LAMP. But can dependency tracking and purging be supported only in the sophisticated installation without requiring it in LAMP or making the stock PHP MediaWiki installation dramatically more complex? ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
Ah, right, I didn't mention this explicitly: the idea is to have an SQL based baseline implementation for LAMP. That should be fine for up to 100k pages and a few edits per minute or so. -- Daniel Kinzler (WMDE) (talk) 14:06, 2 March 2018 (UTC)
    • Used to re-generate HTML snippets, JSON data, and other derived artifacts.
    • This makes it easy to introduce new kinds of artifacts or change granularity, without having to implement a tracking and updating mechanism for each use case.
  • MVC: JS framework that maps between a data model and the DOM, and manages API calls to the backend (MVC/MVVM). Needed for JS-UI, and should be backed by REST.
    • This kind of framework is designed for a fully data driven environment, with all rendering done in JS. However, we will still have HTML snippets coming from the backend. At the very least for rendered page content. These snippets need to be integrated into the DOM, and they may need massaging/hydration.
    • This requires TEMPLATES and L10N-JS.
    • NO-DATA and NO-CLIENT-PARSE means this needs FRAGMENTS.
  • ROUTE: A common REST API interface for functionality implemented in PHP, JS, or whatever other language.
    • Clients should not be aware of where and how an API is implemented (as per REST)
    • Routing should be possible in the CDN layer / load balancer.
    • Routing should also be possible inside MW core, so it is available without a CDN layer.
    • The new API should map to the existing action API for most if not all cases, so we don't have to re-implement all API functionality.
  • ASSET: An endpoint for asset delivery (next generation ResourceLoader) is needed for JS-UI / MVC.
    • For JS and CSS resources, associated icons
    • For localization resources (message bundles)
    • Should make aggressive use of caching on all levels.
    • Not for embedded media (probably); that should have its own endpoint (httpd, nginx, swift...)
I'd add here that asset compilation would be an important step to modernize at the same point. ABaso (WMF) (talk) 16:28, 28 February 2018 (UTC)
You mean minification and bundling? -- Daniel Kinzler (WMDE) (talk) 14:06, 2 March 2018 (UTC)
Yes, this general body of things and advancing closer to the state of the art (e.g., webpack). – ABaso (WMF) 2 March 2018
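The unified dependency tracking behind PURGE could be sketched as follows. A real implementation would back this with Kafka as a bus plus a graph store (or the SQL baseline for LAMP mentioned above); the in-memory map, key naming scheme, and example artifacts here are all illustrative assumptions:

```javascript
// Sketch of PURGE: derived artifacts register the inputs they were built
// from, and an edit walks the dependency graph transitively to find every
// artifact (HTML snippet, JSON blob, ...) that needs re-generation.
class DependencyGraph {
  constructor() {
    this.dependents = new Map(); // input → set of artifacts derived from it
  }
  track(artifact, inputs) {
    for (const input of inputs) {
      if (!this.dependents.has(input)) this.dependents.set(input, new Set());
      this.dependents.get(input).add(artifact);
    }
  }
  // Collect everything (transitively) derived from a changed input.
  purgeSet(changed) {
    const result = new Set();
    const queue = [changed];
    while (queue.length) {
      const node = queue.shift();
      for (const dep of this.dependents.get(node) || []) {
        if (!result.has(dep)) {
          result.add(dep);
          queue.push(dep); // a fragment may itself feed composed pages
        }
      }
    }
    return result;
  }
}

const graph = new DependencyGraph();
graph.track('fragment:infobox/Berlin', ['page:Template:Infobox', 'wikidata:Q64']);
graph.track('html:Berlin', ['fragment:infobox/Berlin', 'page:Berlin']);

// Editing the template invalidates the fragment and the composed page.
console.log([...graph.purgeSet('page:Template:Infobox')]);
```

This is what makes it cheap to introduce new kinds of artifacts or change granularity: a new artifact only has to register its inputs, not implement its own tracking.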

Notes

Output synthesis (page composition/rendering) can be organized into the following layers:

  1. Full HTML page / DOM
  2. HTML snippets (exposed by API, used by JS and ESI)
  3. Pre-formatted data (view-model / annotated HTML, exposed by API, used for template rendering)
  4. external data model, exposed by API
    1. For meta-data: JSON, exposed by API, needs L10N-aware formatting
    2. For page content: annotated HTML
  5. internal data model
    1. For meta-data: PHP, not exposed
    2. For page content: native format, e.g. wikitext, exposed by export. Exposed by API for text-based formats.
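As a rough illustration, a single FRAGMENTS response could expose layers 2 through 4 side by side. The field names and values here are invented for illustration, not an existing API shape:

```javascript
// Hypothetical shape of a fragment response covering three of the layers
// above: an HTML snippet (layer 2), a pre-formatted view-model (layer 3),
// and the external data model (layer 4). All names are assumptions.
const fragmentResponse = {
  html: '<div class="infobox">…</div>',                      // layer 2
  viewModel: { title: 'Berlin', population: '3,769,495' },   // layer 3: L10N-formatted
  data: { title: 'Berlin', population: 3769495 },            // layer 4: raw JSON
};

console.log(Object.keys(fragmentResponse));
```

A dumb client would use `html` directly, a template-rendering client would consume `viewModel`, and a fully data driven client would format `data` itself.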

Rendering Special pages can be HTML based (legacy) or data driven (modern):

  • HTML based special pages should generate annotated "semantic" HTML, somewhat similar to the output of Parsoid, that allows easy massaging for different target devices.
  • Data driven Special pages are just glue that applies template rendering to data returned by an API call. The template rendering would happen on the server or the client, as need be.
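The "just glue" pattern for data driven Special pages could be sketched like this. `renderTemplate()` stands in for the shared TEMPLATES engine and the API call is replaced by a stub; both are assumptions, not existing MediaWiki interfaces:

```javascript
// Sketch of a data driven special page: glue that feeds API data into a
// declarative template. The trivial {{name}} substitution stands in for a
// real template engine shared between PHP and JS (per TEMPLATES).
function renderTemplate(template, data) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => String(data[key]));
}

function renderSpecialPage(fetchData, template) {
  // The same glue can run server-side (static view) or client-side (SPA).
  return renderTemplate(template, fetchData());
}

// Stub standing in for a REST API call; the payload is invented.
const stubApi = () => ({ count: 42, wiki: 'examplewiki' });

console.log(renderSpecialPage(stubApi, '{{wiki}} has {{count}} unreviewed pages.'));
```

Because the glue has no rendering logic of its own, moving it between server and client is a deployment choice rather than a rewrite.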