Reading/Web/Projects/NewMobileWebsite/Technical overview

Introduction

This document explains the technical approach for meeting the project requirements outlined in the documentation. The following sections link to background information, state the goals, and give an overview of the proposal. At the end of the document, frequently asked questions and explanatory notes address common questions not covered elsewhere.

Background documentation

Project page on mediawiki.org/wiki/Reading/Web/Projects/NewMobileWebsite

Goals

The goal is to build a new mobile website for Wikimedia projects with a focus on better handling of poor connectivity scenarios and enabling possible offline reading experiences in the future.

In more detail, derived from the proposal:

  1. Optimizing for a modern browsing experience
  2. Client-side connection management and offline resiliency: remaining functional, with good UX and smooth transitions, during short bursts of low connectivity or loss of network connection
  3. Data efficiency for low-end network connections
  4. Providing the ability to implement a fully offline reading experience in the future
  5. Informing decisions for other Wikimedia platforms and products in the future, for example:
    1. Encouraging the creation and design of more and better content APIs and services
    2. API-driven frontends
    3. New performance patterns and frontend tooling
    4. Possibilities of evolution for the desktop website

Overview

Rendering and UI

In order to fulfill the client-side connection management and offline resiliency goal, the client side of the application in the browser needs to be able to render content and UI without relying on a server, since the user may interact with the UI without a network connection. On the other hand, rendering only on the client would mean a significant performance loss when loading the page for the first time, which is one of our most important use cases at the moment. To stay performant, server-side rendering is needed. On our current platform, we rely on JavaScript for rendering UIs and content on the client, and PHP for rendering UIs and content on the server.

Given we need to have rendering and UI code in both the client and the server, there are several options to consider in order to fulfill our objectives:

  • Duplicate the rendering code on the client
    • This entails keeping two full rendering paths in sync, which would roughly double the maintenance and development work for rendering features, with a significant risk of the paths desynchronizing or diverging in features or UI
  • Write the client rendering code so that it can be used in the server rendering
    • The client-side rendering path needs to exist for the offline goals anyway. Given that it must be implemented, if we can use that same render pipeline for the UIs and content in the server context, we can keep a single source of truth for the UI rendering, preventing the maintenance and development costs from doubling while still delivering our goals

We think that, given the client rendering needs to be implemented for these features in any case, we should focus on the second option. By writing the client rendering code to be aware from the start that it must run in both client and server environments, we avoid maintaining the same render code in two different languages. We can structure the client-side render and UI code in an environment-agnostic way, so that most of it is common to both (also called isomorphic or universal).

As proposed, we will set a clean separation of responsibilities (routing, rendering, API fetching and state management), all in the same language (JavaScript), so that the pieces can be reused in both server and client rendering. Effectively this means:

  • The website will be a server side rendered website, serving very fast and minimal HTML with inlined critical CSS, to provide content to clients as fast as possible
  • The website will be enhanced into a web application on the client after the initial content has loaded, to provide the basic connection management features, and...
    • Install service workers in capable clients to provide advanced connection management features and (maybe in the future) full offline support and an installable website (also called a Progressive Web App[1])
  • As much of the client code as possible needs to be reused for the server rendering (UI renderers, data models and fetching, etc.), as the sketch below illustrates
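
To make this concrete, here is a minimal sketch (not the project's actual code; the module and function names are illustrative) of what a shared, environment-agnostic render module could look like:

  // common/render.js -- illustrative shared render code. It touches neither
  // the DOM nor any Node-specific API, so the same module can be required
  // from the Node server and bundled into the browser client.
  function escapeHtml(text) {
    return String(text).replace(/[&<>"]/g, function (c) {
      return { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' }[c];
    });
  }

  function renderPage(page) {
    // Render to an HTML string: the server sends it as the response body,
    // the client injects it into the document.
    return '<main><h1>' + escapeHtml(page.title) + '</h1>' +
      '<article>' + page.contentHtml + '</article></main>';
  }

  module.exports = { renderPage: renderPage };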

See the related RFC in the #Links section.

Content sources

Both the server and the browser client will rely on the same services (REST and action API) for content consumption, and will be in charge of rendering the data. As much as possible, these services should be shared between platforms, like the MediaWiki desktop and mobile experiences and the mobile apps. By sharing content sources, all platforms benefit from the same improvements, and we avoid repeating the same effort for each platform.

Any client built this way needs a coherent set of APIs with good caching semantics built in (optimized for high volume reads).

This would allow us to more easily build modern browsing experiences and better UX. Some examples of user experiences powered by services are Page previews (desktop), Related articles (mobile web), and the Feed and the modern article reading experience (Android and iOS native apps).

Specifically regarding page content, we will use services that provide wiki content based on Parsoid HTML, which will be shared across the different reading experiences (iOS, Android and this mobile web) so that reading optimizations benefit all platforms equally.

For the prototype phase, the existing available service is MCS (the Mobile Content Service), which serves Parsoid HTML and page metadata for production usage by the Android apps. It serves and stores the data only as JSON, and we can use it during prototyping to get started and to measure things. In the future we will migrate to whatever service is blessed for serving Parsoid markup content for reads.
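
As an illustration (the endpoint shape follows the public REST API, but the exact service and URL the project ends up using may differ), fetching page content as JSON could look like this:

  // common/api.js -- illustrative content fetching, shared by client and
  // server. fetch() is native in browsers; on Node it can be provided by a
  // polyfill such as node-fetch.
  function fetchPageContent(domain, title) {
    var url = 'https://' + domain + '/api/rest_v1/page/mobile-sections/' +
      encodeURIComponent(title);
    return fetch(url).then(function (response) {
      if (!response.ok) {
        throw new Error('Content request failed: ' + response.status);
      }
      return response.json(); // Parsoid-derived HTML plus metadata, as JSON
    });
  }

  module.exports = { fetchPageContent: fetchPageContent };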

Servers

To run the JS rendering layer on the server to generate HTML, we consider using Node.js[2] for the following reasons:

  • The most widely used VM/environment for running JavaScript on the server
  • Widely used in the industry and open source communities
  • Wikimedia has experience developing and deploying Node.js[2] services that have proven useful
  • Enables sharing libraries between server and client via npm, using Node-based bundling tools

The servers ought to be stateless, acting just as a rendering layer for the content coming from other sources, with the same code and sources that the client will run, to ensure a consistent experience between the first HTML load and the subsequent client side renders.
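
A minimal sketch of such a stateless rendering layer, reusing the illustrative shared modules sketched above (service-template-node itself adds logging, metrics and configuration on top of something like this):

  // server.js -- illustrative stateless server: it holds no session or
  // storage state, it just fetches content from the services and renders
  // it with the common code.
  var express = require('express');
  var render = require('./common/render');
  var api = require('./common/api');

  var app = express();

  app.get('/wiki/:title', function (req, res, next) {
    api.fetchPageContent('en.wikipedia.org', req.params.title)
      .then(function (page) {
        res.send('<!doctype html>' + render.renderPage(page));
      })
      .catch(next); // let express handle upstream failures
  });

  app.listen(8080);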

Caching

From the infrastructure perspective

For the server rendered HTML of the full page, the caching story is the same as it is right now with MobileFrontend.

For subsequent visits, as the full page HTML may not be needed, the content could be consumed directly for caching and rendering on the client. This means there is a risk that we could end up storing the same content in two different formats (HTML and JSON), thus wasting storage space. This is a valid and important concern, and it depends on how the content sources are organized and how they provide the information. We will remain adaptable to the content sources while encouraging them to be better for the users and our infrastructure.

For example, one of the topics that will be proposed and discussed is keeping the HTML content separate from the page's metadata, so that they do not need to be merged (and thus duplicated in both HTML and JSON forms) but can instead be consumed separately, as needed, as HTML and as JSON. This will be the subject of further discussion, separate from this document, in future RFCs.

From the client's perspective

All resources should be effectively marked to leverage the browser's HTTP cache as much as possible.

For full page HTML, the cache (Varnish[3]) should follow the same techniques we use right now on the mobile website. The content sources should follow caching best practices for services, regarding Cache-Control and ETags, as existing services already do (for example, the ones consumed for Page previews), for better performance and caching for the clients.

Specifically, for this proposal, besides following good practices for headers on the Node server, we will improve the delivery of static assets. We plan to follow a content-hash-based naming scheme for the front-end static assets, so that their URLs can be treated as immutable, letting us leverage optimizations like Cache-Control: immutable for better caching in browsers and avoid the "stale HTML, fresh static assets"[4] issues we face in the current infrastructure. If the HTML refers to static assets by hash-versioned immutable names, stale HTML can never load mismatched new assets, as those will have new names derived from their contents.
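
For illustration (webpack is one possible bundler; the project has not committed to specific tooling here), content-hashed names plus immutable caching could be wired up like this:

  // webpack.config.js (excerpt): name every emitted bundle after a hash of
  // its contents, so a changed file always gets a new URL.
  module.exports = {
    output: {
      filename: '[name].[chunkhash].js'
    }
  };

  // On the Node server: a hashed URL never changes its content, so assets
  // can be served with a long-lived, immutable Cache-Control header.
  app.use('/assets', express.static('dist', { maxAge: '1y', immutable: true }));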

Further improving the client's autonomy by leveraging service workers, where available, to satisfy the connection management features will result in excellent caching and UX for recurring visitors. Repeat visits will benefit in both perceived and real speed, making the experience even better for logged-in users and other recurring readers. After the first visit, static assets and the client rendering application will be served immediately from the browser's local cache, while new versions are fetched in the background, ready for the next visit.
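
A sketch of that cache-first, update-in-background behavior for static assets in a service worker (the cache name and URL pattern are illustrative):

  // sw.js -- illustrative: answer asset requests from the local cache
  // immediately, while refreshing the cached copy for the next visit.
  self.addEventListener('fetch', function (event) {
    if (event.request.url.indexOf('/assets/') === -1) {
      return; // page and content requests proceed to the network as usual
    }
    event.respondWith(
      caches.open('static-v1').then(function (cache) {
        return cache.match(event.request).then(function (cached) {
          var refreshed = fetch(event.request).then(function (response) {
            cache.put(event.request, response.clone());
            return response;
          }).catch(function () {
            return cached; // offline: serve whatever we already have
          });
          return cached || refreshed;
        });
      })
    );
  });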

Data efficiency

As a project with a strong emphasis on New Readers countries, data efficiency is a very important topic to which we will need to pay continuous attention. In an offline-resilient application there is always a balance to strike between the most data-efficient solution and the most offline-capable one, and the specifics of the implementation vary. The range of variables is huge, and careful consideration, measurements and clear requirements are needed to make the right tradeoffs. Device capabilities (storage), network speeds, data costs, navigation patterns and user personas will need to be considered, balanced, and properly documented to drive the implementation and derive the most user value. In principle, we hope to create a versatile architecture that can be adapted to performance and product requirements as we develop. As measurement tooling is set up, and with research data and product requirements in hand, we will shape the features, optimizing for the required use cases.

Specifically, we can talk about the implications for the architecture and the assets, and then separately about the content sources.

Static assets

In order to have the client download as little code as possible for the specific page being visited, we will leverage code splitting and lazy loading aggressively. The resulting chunks of static assets will be named with a hash of their contents to leverage HTTP caching and avoid re-downloads of the same assets on subsequent visits (see Caching - From the client's perspective above). The client will thus only download the assets it needs initially, fetching more code as the user browses around; assets that are not used are never downloaded.

All of this should happen transparently through tooling, which will traverse the front-end assets (both JS and CSS), build the asset graph automatically, and create split bundles based on configuration and the import statements in the code.
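
For example (the route and module names are illustrative), dynamic import() gives the bundler its split points automatically:

  // Illustrative route table: each view becomes its own chunk, downloaded
  // the first time the user navigates to it.
  var routes = {
    page: function () { return import('./views/page'); },
    search: function () { return import('./views/search'); }
  };

  function show(routeName, params) {
    return routes[routeName]().then(function (view) {
      view.render(document.getElementById('root'), params);
    });
  }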

We will also set up CI tooling to check static asset sizes and changes in the deployables, to avoid merging changes that would silently unbalance the static asset sizes, inflate them, and worsen performance; for example, with a check like the sketch below.
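
A minimal sketch of such a check (the file names and budgets are hypothetical; real tooling would be more thorough):

  // check-sizes.js -- illustrative CI guard: fail the build when a bundle
  // exceeds its gzipped size budget.
  var fs = require('fs');
  var zlib = require('zlib');

  var budgets = { 'dist/main.js': 50 * 1024 }; // bytes after gzip

  var failures = Object.keys(budgets).filter(function (file) {
    var gzipped = zlib.gzipSync(fs.readFileSync(file)).length;
    if (gzipped > budgets[file]) {
      console.error(file + ': ' + gzipped + ' B gzipped exceeds budget of ' +
        budgets[file] + ' B');
      return true;
    }
    return false;
  });

  process.exit(failures.length ? 1 : 0);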

Content

How to deliver content to users (how much to deliver initially, how much in the background, how much after a user action) is something that will need to be considered along with the variables mentioned above. Depending on navigation patterns, connection speeds, UX and other factors, we will have to balance how and what to deliver. Some of the variables also change with the situation; for example, connection speed differs between users, so it is entirely possible that we could have different strategies for different connection scenarios, if the UX makes sense and it benefits the users. For lazy loading images, for instance, you could imagine keeping the automatic loading we have now on the mobile website for decent connection speeds, but switching to tap-to-load if the connection is very low grade, or even depending on user preferences. These kinds of feature decisions will have to be informed by performance metrics and user testing.
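
As a sketch of what such a strategy switch could look like (the Network Information API is not available in all browsers, so it would only inform the choice where present):

  // Illustrative: pick an image loading strategy per session. Where
  // navigator.connection exists, effectiveType is one of
  // 'slow-2g' | '2g' | '3g' | '4g'.
  function imageLoadingStrategy() {
    var connection = navigator.connection;
    if (connection && /2g$/.test(connection.effectiveType || '')) {
      return 'tap-to-load';  // very slow link: the user opts in per image
    }
    return 'lazy-automatic'; // default: load images as they scroll into view
  }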

Some examples of efforts to reduce the data sent to the client that we can expect to be discussed and maybe implemented would include:

  • With prior history and implementations:
    • Lazy loading images
    • Lazy loading references
  • With research but in need of more documentation and POC:

Of course, every implementation of a data-saving feature comes with its own set of tradeoffs against other features (for example, lazy loading images vs. printing articles), particularly around offline resiliency and offline browsing. If we remove content, such as images, in order to load it on demand, what happens when the user loses the connection midway through browsing an article? As mentioned before, there is a delicate balance to strike, and UX will be crucial in the cases where we opt into data efficiency at the cost of offline browsing.

Architecture

Taking into account what has been mentioned above in the Overview (an illustrative sketch of the client wrapper follows the list):

  • Server
    • Stateless Node.js server
      • Based on express, following WMF best practices (see service-template-node)
      • Thin wrapper around the common code
        • Provide server specific interfaces
  • Client
    • Thin wrapper around the common code
      • Provide client specific interfaces
    • Service worker features
  • Common code
    • Code that is environment agnostic and can run in Node.js[2] and browsers without failing
    • Composed of:
      • Router
      • Views (UI + Content rendering)
      • API client layer (REST + MW API)
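
Continuing the illustrative sketches above, the client's thin wrapper could look roughly like this (selectors and URL handling are simplified):

  // client.js -- illustrative thin client wrapper around the common code.
  // The bundler resolves the same modules the server uses.
  var render = require('./common/render');
  var api = require('./common/api');

  // After the server-rendered HTML has loaded, take over link navigation so
  // subsequent pages render on the client from the content APIs.
  document.addEventListener('click', function (event) {
    var link = event.target.closest('a[href^="/wiki/"]');
    if (!link) { return; }
    event.preventDefault();
    var title = decodeURIComponent(link.pathname.slice('/wiki/'.length));
    api.fetchPageContent(location.hostname, title).then(function (page) {
      document.getElementById('root').innerHTML = render.renderPage(page);
      history.pushState({ title: title }, '', link.pathname);
    });
  });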

Roadmap

This is a high-level operational roadmap/approach to the development of the project for the next year. More details about the product roadmap can be found in the Roadmap outline section of the product proposal. We are currently at step 1, having started not long ago. We consider steps 1 and 2 critical before any sort of production deployment, to deem the approach viable. Steps on the roadmap are subject to adjustment or restructuring based on other factors and the product roadmap.

  1. Prototype development
    1. Base infrastructure and prototype development
      1. Basic Node.js[2] server and client application sharing routing, rendering and data fetching code
      2. Page content rendering using existing services (POC)
    2. Staging server setup
      1. Cloud VPS server under wmflabs, auto-updating on master, to showcase and test the prototype
    3. Performance tooling setup
      1. Navigation timing dashboards
      2. WebPageTest runs with specific navigation patterns, devices and connection speeds
    4. Project documentation and socialization
      1. Narrow down the documentation
      2. Requests for comments and sharing
    5. Research plan for future production deployment
      1. How would the project be deployed
      2. What are the prerequisites and how/when should they happen (security review, etc)
  2. Prototype optimization and evaluation
    1. Based on the tooling set up, initial optimizations and documentation about the impact of the changes
    2. Initial evaluation phase.
      1. What works well
      2. What needs work
      3. Are there insurmountable obstacles
      4. Roadmap and product plan review
  3. Prototype user testing and evaluation
    1. Set up user testing tooling
      1. User feedback feature
      2. Application usage analytics
      3. Opt in and out flow
    2. Other basic reading/browsing features for the user testing (search)
    3. Controlled and limited user tests
    4. Evaluation phase
      1. What works well
      2. What needs work
      3. Are there insurmountable obstacles
      4. Roadmap and product plan review
  4. Focus on connection management features and user flows, and repeat 3) Prototype user testing and evaluation
  5. See product roadmap for next steps. All going well, feature development work should start following this kind of iterative model (work, measure, evaluate)

Q&A

Page rendering, document composition and JSON / HTML APIs in the client

Because we need to support the most used devices on the mobile web (a big percentage of them, notably those running Safari, do not implement service workers, and those devices probably won't stop being used for some time), right now we need to focus on a solution that gives the best possible experience on both non-service-worker devices (iOS) and service-worker devices (Android). We don't want to settle for server-side-only composition on non-service-worker devices. Modern iOS devices, for example, are capable and should be able to compose documents on the client too.

As such, we initially lean towards consuming JSON APIs, because JSON parsing is extremely well optimized in all browsers, which is very important on mobile devices.

Special attention will be paid to rendering the long-form content into the document as fast as possible, streaming it where we can, since this is our most common use case. Both first visits (cold and warm cache) and other navigation scenarios will need to be optimized once measurement tooling is set up. In non-service-worker-capable clients, there are approaches (the iframe hack [1], streaming ND-JSON [2]) for streaming content into the window, if the performance and UX measurements deem it necessary.
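
As an illustration of the streaming ND-JSON approach (assuming browser support for response body streams; the record handling is hypothetical):

  // Illustrative: parse a newline-delimited JSON stream record by record,
  // rendering each one as it arrives instead of waiting for the whole body.
  function streamNdJson(url, onRecord) {
    return fetch(url).then(function (response) {
      var reader = response.body.getReader();
      var decoder = new TextDecoder();
      var buffered = '';
      function pump() {
        return reader.read().then(function (result) {
          if (result.done) { return; }
          buffered += decoder.decode(result.value, { stream: true });
          var lines = buffered.split('\n');
          buffered = lines.pop(); // keep the trailing partial line
          lines.forEach(function (line) {
            if (line.trim()) { onRecord(JSON.parse(line)); }
          });
          return pump();
        });
      }
      return pump();
    });
  }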

Using Service Workers for document composition on server and client

In the future, when most devices support service workers, we will explore performance optimizations such as using HTML APIs instead, extracting metadata and content efficiently, and streaming and composing documents on the fly using service workers on both client and server.

Learnings and use for future improvements to the desktop site

The project focuses on the mobile website, given that its reduced scope and requirements make it feasible to improve over a short-to-medium timespan (1-2 years). The choices and technology will inform future improvements to the desktop site regarding API-driven frontends, use of service worker technology, inlining critical CSS, using Parsoid content for reads, and various other topics. Hopefully, we can end up bringing the same sorts of improvements, adapted for the desktop web, to improve that experience too.

What happens with MobileFrontend?

There is no specific plan at the moment, until this project advances and we know what it is good for and whether it is successful. Once we know how this project is going, there are many things we could do. I'll list a few in no particular order (and of course there are many other options):

  • Put it in maintenance mode, no new development will happen
  • Work on aggressively trimming Wikimedia specific parts, and give it to the community as the mobile experience for third party wikis
  • Refactor it to be the grade C experience for old browsers and Opera Mini / UC Browser, removing all the JS
  • Split out a Minerva skin (already happened T71366) and deprecate support for the MobileFormatter in favor of REST services/upstream changes to core (e.g. section wrapping in the parser)

Root domain and abstraction of language switching

Could we consider making the new website more “app-like” by abstracting away the project domains, much like the iOS and Android apps? In this model, the new website would be hosted somewhere generic like “https://mobile.wikipedia.org/” or “https://lite.wikipedia.org/”, and language switching would happen with a control (much as it does now on the website).

Related: Is it a goal to provide a unified experience across all Wikimedia projects? Only Wikipedias?

Would we provide a single app for all projects, e.g. using it to render content from all Wikimedia projects, including other languages and other projects such as Wiktionary? This is a topic that needs discussion and exploration as the project evolves. Refer to the parent proposal for discussion.

Which APIs will be used from REST services and which from API.php

This project is agnostic about which APIs or where they live. It definitely needs a coherent set of APIs with good caching semantics built in (optimized for high volume reads).

At this moment and in the near future, we will rely on existing API.php APIs and REST services. For example, the Page Content Service REST service, which is on the way, promises to deliver Parsoid content for reads, optimized and with well-structured metadata, so it seems like a great fit for this project, as it will be for the native apps.

General guidelines of “when necessary and when it makes sense” apply to choosing which APIs and services to use. In consultation with the Reading teams (infrastructure, apps) and other Wikimedia teams, like the Services, Platform and Editing teams, we will need to evaluate what we need, where to consume it from, and whether new services need to be developed. Some general principles:

  • High-volume read APIs will usually be consumed from the REST layer (things like page content, page summary, etc.), given its existing caching semantics optimized for high-volume reads.
  • Endpoints that may be needed but don't exist right now (like page history) will have to be created if the volume of reads is high enough.
  • PHP APIs that are ready for high-volume consumption (like search) may be consumed from the existing endpoint and only wrapped in a REST service if needed and sensible, always as a shared effort across the Reading teams so that the API can be used by the different platforms.
  • Non-cacheable APIs will be consumed from API.php unless a REST version exists (like /page/random).

How is this going to work with the editing experience?

We have met with the Editing department and discussed different options. Initially in the prototyping phases, we would use the existing mobile web editing experience, which gives the Editing team a stable target to aim at.

If the project advances successfully, when it starts getting out of prototype versions we will start collaborating to bring the mobile visual editing experience to this project. Given that VisualEditor core is a standalone, portable piece of software that has already been integrated into different targets (desktop, Flow, with mobile incoming), integrating it into the specifics of this project as a first-class part of it would be the best final outcome and experience for users.

Why do it this way? What alternatives have been considered?

See Reading/Web/Projects/NewMobileWebsite/Technical overview#Overview for the rationale behind this specific approach. Besides the Node.js[2] server renderer, the rest of the work (the common code and the client application) could certainly be used on top of existing infrastructure, such as an extension, existing or new. The problem would be having to duplicate the rendering/routing/fetching work on the server side: either accepting different rendering output between the first server render and subsequent renders, or making sure the client rendering follows the server render, enduring the maintenance work for every server change and a doubled development cost for new rendering changes on server and client. That is the main reason why a server sharing code with the client seems like a good approach; it entails having a server in JavaScript, which at this point in time means using Node.js[2], the best-supported, production-ready JavaScript runtime for servers.

Other approaches considered have been:

  • Work on this approach from within the existing mobile web
    • This would entail working within MobileFrontend and MinervaNeue, adapting and refactoring the existing rendering pipeline to consume Parsoid-based HTML from services, in addition to adding Parsoid-based HTML rendering to the client stack. A significant amount of additional work was identified to re-architect the existing client UI code for the offline resiliency and connection management features, essentially decoupling it from the server infrastructure, given that this code needs to run independently of the server when offline or on spotty connections. Based on estimates of the re-architecting work, the amount of future duplicate work on the two rendering stacks, and the complexity of the end result, we discussed and concluded that there are better approaches
  • Create a new extension+skin for this mobile website
    • Having the UI rendering pipeline in the server with PHP and in the client with JS
    • Having a single common rendering layer, implemented externally in JS
      • which the PHP server consumes (either via an external server, or by embedding it in the PHP server with v8js or similar tools) and the client also uses
        • This was considered, but it seemed too complex: by implementing the rendering layer in JS for server use, we would essentially be doing the same thing as the Node server proposal while adding more complexity and potential problems in the server layer for no clear benefit, so it was parked in order to explore the Node server idea.
  • Create a parallel offline resilient client-only UI on top of MobileFrontend+MinervaNeue which would be loaded in the background with service workers and only active on the second visit onwards
    • This was discussed, and it is viable, but it entails working on two rendering stacks. If kept in sync, they introduce a lot of duplicated work and maintenance cost; if left to become different experiences, they would introduce a big cognitive dissonance for mobile web users, who would interact with one experience some of the time and a different one at other times, leading to confusion and discomfort. It would also introduce unnecessary background data downloads, since assets are not shared between the server context and the client one, increasing data costs for users unnecessarily

Where is rendering done? How easily can we switch between client side and server side rendering of components?

The server’s job is to use the common code to render the full page HTML. Page components and dynamic behavior on the client will be rendered on the client. Right now we’re not thinking of rendering standalone parts of the page server side, although it would be possible given the code is common/isomorphic. The initial idea is the client will be in charge of rendering dynamic parts/components using the content sources and common code.

How do we offset the cost of maintaining multiple skinning/rendering systems?

We already effectively maintain multiple skinning/rendering systems. Desktop is different from mobile web, and there are another two systems, iOS and Android. This project creates another mobile web rendering system, so in the long term, if successful, it should replace the existing mobile web rendering system so that we don't incur duplicate costs (maintenance, hardware, etc.).

Also, being an API-driven front-end, it will use services common to the different rendering systems, which means work done to improve those content sources happens in one place and benefits all rendering systems at once. Currently, features using the same content have had to be implemented across iOS, Android and mobile web (and sometimes desktop), resulting in the same work being done three times. With this architecture, we all benefit from improvements to the content sources and de-duplicate the work. A recent example (of many) is page summaries, used for Page previews on desktop and link previews on all mobile platforms, where using the same APIs from all rendering systems unified a lot of the work for all platforms.

As for what could or would happen with the existing mobile web rendering layer (MobileFrontend + Minerva), we don't have clear answers right now, as it would just be guessing, but we have tried clarifying some possible options at #What_happens_with_MobileFrontend?

How do we handle content that is rendered depending on the user's interface language?

In this architecture the UI and the content are separate, as it is an API-driven rendering layer. As such, when requesting content from the sources/services we will be able to specify the language (by choosing which project to query, as well as sending the language the user's UI is in) to retrieve the content in the desired language, while the UI renders in whatever language the user has selected. Careful implementation will be needed to ensure different projects and multi-language wikis (Commons, mediawiki.org, etc.) are properly queried and handled.

How are mobile-specific special pages (like MobileSearch) handled?

These UI pages will need to be rewritten to adapt them to user needs in the connection management scenarios (so that they can render on the client only, and handle network interruptions or delays gracefully), in the same way that other mobile platform renderers need to reimplement rendering adapted to their use cases. There are a few of these, like search, the watchlist, etc., so we will need to ensure we have proper content sources for them, so that this project and other rendering systems like the native apps can render them too. In some cases the endpoints are already available and could be used directly from the MediaWiki action API; in others, new action APIs will need to be exposed, or REST services created (see #Which APIs will be used from REST services and which from API.php for more information).

How are other important MediaWiki special pages handled?

There are a lot of special pages with useful information that are exposed and rendered in MediaWiki. It would be naive to think that we will implement rendering for all of these pages; that would not make sense. As a website, we will redirect and link to the MediaWiki version of these pages on a mobile-friendly skin (MinervaNeue, or maybe a responsive Vector in the future), so that all existing content can be accessed. Special pages that are widely used and could benefit from specific improvements for mobile web users or offline resiliency will be considered for implementation on this rendering layer, if the improvements and user value justify the effort.

Case studies

  • Twitter Lite
    • Designed for "slow, unreliable, limited, or expensive" connectivity
    • Uses progressive loading to get quick initial experience and lower data costs
    • Uses ServiceWorker technology to account for users with bad, unreliable connections
  • WordPress Calypso
    • Single interface across platforms
    • Faster performance
    • Attracted volunteer developers
  • Treebo
    • 70%+ improvement in time to first paint
    • Loaded faster in slower connections

Notes and References

  1. PWA (Progressive Web App): a term used to denote web apps that use the latest web technologies. Progressive web apps are technically regular web pages (or websites) but can appear to the user like traditional (native) mobile applications.
  2. Node.js: an open-source, cross-platform JavaScript runtime environment for executing JavaScript code server-side. See https://nodejs.org/
  3. Varnish: an HTTP accelerator.
  4. "stale HTML, fresh assets": when an HTML document links to CSS and JS assets by name, without hashes that differentiate asset versions. If the HTML has a different TTL than the assets, as is the case in our MediaWiki infrastructure (where the HTML is cached longer, 2-14 days, than the assets, 5 minutes), the client browser can end up with old/stale HTML and new/fresh static assets, which can malfunction if they depend on a specific HTML structure and break the experience. This problem bites us often when writing features that build on server-rendered content, and it makes rolling out such changes quite painful and bug-prone.

Links