Page Content Service

From MediaWiki.org

The Page Content Service (PCS) is a set of Node.js-based services in Wikimedia production designed to deliver Wikimedia project page content and metadata for modern reading clients. It delivers:

  1. Optimized page content for modern clients to provide a highly polished full article reading experience
  2. A standard structured representation for pages that can be used for display within lists and previews
  3. Additional metadata about a page that can be used for navigation purposes, business logic, or constructing ancillary views in native code (like the table of contents)
  4. Aggregated common CSS used for styling articles
  5. Business logic as JS that clients can execute locally

Some additional features are:

  • Consolidates client logic for manipulating and styling page content on the server and executes it there, reducing code maintenance and technical debt for clients
  • Consolidates data from disparate services into a single purpose-built service for displaying page content

The PCS delivers content in both HTML and JSON formats. It consolidates data from the Wikipedia, Commons, and Wikidata MediaWiki APIs as well as the Parsoid and ORES APIs.

The service will supersede the mobile-sections endpoint of the Mobile Content Service (MCS). Currently, the PCS services code is part of the MCS Git repo. Eventually those will be separated so they can be deployed separately from MCS.

These services are maintained by the Wikimedia Reading Infrastructure team.

Endpoints

HTML endpoints

/page/mobile-html

This API is designed to be used with the JSON endpoints below to build a modern client experience. It serves Parsoid HTML optimized for the mobile apps, with as much preprocessing as possible done server-side:

  • Additions:
    • A few extra DOM elements are added to carry information such as the page title, (probably) the lead image, the Wikidata description, a footer, and edit icons for sections and the top of the page.
  • Removals:
    • Remove navboxes.
  • Changes/Lazy loading:
    • stripReferenceListContent: Reference lists at the end of articles are replaced with a placeholder. Their content can be retrieved separately using the References JSON API. The plan is to lazy load it in place of the placeholder, either inside the WebView or in a native component.
    • Images are replaced with placeholder elements. The client side has to turn these placeholders back into the original <img> elements.

Additional DOM transformations:

  • move first paragraph to the top
  • collapse infoboxes
  • classify DOM elements to make it easier to apply other background colors for theming support

A full list of DOM transformations, as they are currently implemented, can be found in the processing script processing/mobile-html.yaml.

The HTML also includes the other CSS and JS endpoints. It aims to leverage the wikimedia-page-library as much as possible.

Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
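As a rough illustration of how a client might address this endpoint, the sketch below builds a request URL for a page title. It rests on assumptions: the /api/rest_v1 prefix is taken from the offline-resources example output on this page and is assumed to apply to the /page/ endpoints as well, and the helper name pcs_page_url and the title handling (spaces to underscores, everything else percent-encoded) are illustrative rather than part of PCS.

```python
from urllib.parse import quote

# Assumption: same REST prefix as in the offline-resources example output.
REST_PREFIX = "/api/rest_v1"


def pcs_page_url(domain: str, endpoint: str, title: str) -> str:
    """Build a URL for a /page/<endpoint>/<title> request (hypothetical helper)."""
    # MediaWiki titles conventionally use underscores instead of spaces.
    normalized = title.strip().replace(" ", "_")
    # Percent-encode the rest, including "/", so the title stays one path segment.
    encoded = quote(normalized, safe="")
    return f"https://{domain}{REST_PREFIX}/page/{endpoint}/{encoded}"


print(pcs_page_url("en.wikipedia.org", "mobile-html", "San Francisco"))
```

The same helper would cover the JSON endpoints by passing a different endpoint name, for example "summary" or "references".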

JSON endpoints

/page/summary

The Summary serves two very important purposes:

  1. It provides the data needed to represent a page in a page/link preview, search results, or other lists.
  2. It provides basic metadata necessary for clients to make business logic and navigation decisions before displaying a page.

To accomplish number 1, it contains some basic metadata: an image/thumbnail, a description, the first paragraph of the page in plain text and HTML form (extract and extract_html), and the article language and directionality (RTL or LTR). Prefer extract_html over extract, since some complex formulas are handled better in HTML than in plain text.

To accomplish number 2, it contains some semantic information about the page, its namespace, and various URLs, so that clients can understand the content of the page before deciding how to display it.

Additionally, the Summary structure is provided in other APIs (like the feed) that return lists of pages.
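The sketch below shows one way a client might reduce a Summary response to the fields a preview needs, preferring extract_html over extract as recommended above. Only extract and extract_html are named on this page; the other keys (title, description, thumbnail.source, dir) are assumptions about the response shape, and the sample dictionary is hand-written, not real API output.

```python
from typing import Any, Dict


def preview_from_summary(summary: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields a link preview or list row would typically need."""
    # Prefer extract_html; fall back to the plain-text extract if it is absent.
    html = summary.get("extract_html")
    # Assumed shape: {"thumbnail": {"source": "<url>"}}; may be missing entirely.
    thumbnail = summary.get("thumbnail") or {}
    return {
        "title": summary.get("title", ""),
        "description": summary.get("description"),
        "text": html if html else summary.get("extract", ""),
        "text_is_html": bool(html),
        "thumbnail_url": thumbnail.get("source"),
        # Directionality decides how the preview is laid out (RTL or LTR).
        "direction": summary.get("dir", "ltr"),
    }


# Hand-written sample for illustration only.
sample = {
    "title": "Example",
    "description": "hypothetical page used for illustration",
    "extract": "Example is a placeholder.",
    "extract_html": "<p><b>Example</b> is a placeholder.</p>",
    "dir": "ltr",
}
preview = preview_from_summary(sample)
print(preview["text"])
```

Returning a flag alongside the text lets the caller decide whether to render it as HTML or as plain text.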

Page_Previews/API_Specification

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

For comparison, here is the action=query request this endpoint replaces: Prod. The current version no longer uses TextExtracts, though. Instead, PCS gets more of the information from the respective Parsoid HTML output and applies some transformations to it.

/page/metadata

The Metadata API returns additional metadata needed for updating the chrome around a page, like the edit icon, and for displaying ancillary views like the table of contents and "other languages" that the page is available in.

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

/page/media

We're about to publish the new /page/media-list endpoint.

Lists media items shown on a page: images, videos, and audio along with licensing information. This is useful for clients wishing to build a gallery interface for content within a page.

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

More details

/page/media-list

Coming soon. Unlike /page/media, this endpoint is faster because it reduces the amount of metadata it has to request from the backend.

Lists media items shown on a page: images, videos, and audio along with licensing information. This is useful for clients wishing to build a gallery interface for content within a page.

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

/page/mobile-html-offline-resources

Lists the schemeless URLs of the CSS and JS resources needed to use mobile-html offline. The motivation for this endpoint is to let native clients know which other files they have to download when saving a page for offline use, without having to parse the page. They may not have a WebView running when saving for offline use.

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

Example output:

[
  "//meta.wikimedia.org/api/rest_v1/data/css/mobile/base",
  "//meta.wikimedia.org/api/rest_v1/data/css/mobile/pagelib",
  "//meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pagelib",
  "//en.wikipedia.org/api/rest_v1/data/css/mobile/site"
]
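Because the URLs in this output are schemeless, a client needs to resolve them against a scheme before downloading. A minimal sketch, assuming HTTPS and using the standard library's urljoin; the function name resolve_offline_resources and the default base URL are illustrative choices, not part of the service:

```python
from typing import List
from urllib.parse import urljoin


def resolve_offline_resources(
    resources: List[str], base_url: str = "https://en.wikipedia.org/"
) -> List[str]:
    """Turn schemeless ("//host/path") URLs into absolute ones.

    urljoin keeps the host and path of each entry and takes only the
    scheme from base_url.
    """
    return [urljoin(base_url, resource) for resource in resources]


# Entries copied from the example output above.
offline_resources = [
    "//meta.wikimedia.org/api/rest_v1/data/css/mobile/base",
    "//en.wikipedia.org/api/rest_v1/data/css/mobile/site",
]
for url in resolve_offline_resources(offline_resources):
    print(url)
```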

/page/references

A structured output of reference lists. Useful for

  • allowing a quick lookup of the reference details of a particular reference on a page and
  • providing one or more lists of references.
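The two uses above could look roughly like the following on the client. This is a sketch under a loud assumption: the keys reference_lists, order, and references_by_id, and the sample data, are hypothetical stand-ins for the response structure, which should be checked against the API spec before relying on them.

```python
from typing import Any, Dict, List, Optional


def find_reference(doc: Dict[str, Any], ref_id: str) -> Optional[Dict[str, Any]]:
    """Quick lookup of a single reference by its id (assumed key names)."""
    return doc.get("references_by_id", {}).get(ref_id)


def ordered_references(doc: Dict[str, Any], list_index: int = 0) -> List[Dict[str, Any]]:
    """Return the references of one reference list in display order."""
    lists = doc.get("reference_lists", [])
    if not 0 <= list_index < len(lists):
        return []
    by_id = doc.get("references_by_id", {})
    # Skip ids that have no matching entry instead of raising.
    return [by_id[rid] for rid in lists[list_index].get("order", []) if rid in by_id]


# Hypothetical sample data, not real API output.
sample_doc = {
    "reference_lists": [{"order": ["ref-2", "ref-1"]}],
    "references_by_id": {
        "ref-1": {"html": "First source"},
        "ref-2": {"html": "Second source"},
    },
}
print(find_reference(sample_doc, "ref-1"))
print([ref["html"] for ref in ordered_references(sample_doc)])
```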

More details

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

CSS endpoints

Starting task: phab:T188919

/data/css/mobile/base

General CSS rules. Roughly the same rules are available through ResourceLoader modules, but this endpoint uses its own copy of the LESS files so that it is decoupled from upstream changes. Some of the files are modified to remove rules we don't need or want. More info in the repo and at Update base CSS.

Example URLs:

/data/css/mobile/site

Site-specific CSS. This CSS is community maintained and comes from the MediaWiki:Mobile.css page of the respective project.

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

/data/css/mobile/pagelib

CSS rules to complement the DOM transformations implemented in the wikimedia-page-library. (We could combine this with the base CSS endpoint. For now we keep them separate since their versioning and update cycles differ.)

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

JS endpoint

/data/javascript/mobile/pagelib

This is common JavaScript functionality available to clients. Clients can use it to change the display of the mobile-html page. To try this out, load any mobile-html example page and open the browser's DevTools console. Then run any of the commands listed in the PCS JS abstraction layer docs.

Example URLs: Prod | Beta cluster | Labs | Local RB | Local MCS

Clients

PCS can be used by any WMF or third-party client that wants to display page content for reading contexts. As mentioned above, the /page/summary endpoint is heavily used in other places and is already used by the native apps and the web PagePreview feature. /page/mobile-html has some coupling to the wikimedia-page-library and is somewhat tied to design decisions for the native WMF apps. If needed, there could be another HTML endpoint that sits somewhere between Parsoid HTML and /page/mobile-html.

Within the WMF, the following clients are expected to integrate /page/mobile-html in 2019:

  1. Wikipedia Android App (the Android app would be the first so we can transition from mobile-sections to mobile-html)
  2. Wikipedia iOS App

External links

  • Usage documentation can be found at the API spec