Page Content Service
The Page Content Service (PCS) is a set of Node.js-based services in Wikimedia production designed to deliver Wikimedia project page content and metadata for modern reading clients. It delivers:
- Optimized page content for modern clients to provide a full article reading experience
- A standard structured representation for pages that can be used for display within lists and previews
- Aggregated common CSS used for styling and theming articles
The PCS delivers content in both HTML and JSON formats. It consolidates data from the Wikipedia, Commons and Wikidata MediaWiki APIs, as well as from the Parsoid and ORES APIs.
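As a rough illustration of how a client might address these services, the Python sketch below builds request URLs and fetches JSON. It assumes the endpoints are reachable through the public REST API at `https://{domain}/api/rest_v1/page/...`; verify the exact paths against the API spec. The helper names, the endpoint keys and the User-Agent string are hypothetical.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed mapping of PCS endpoints to REST API paths; check the API spec.
PCS_ENDPOINTS = {
    "mobile-html": "page/mobile-html",  # HTML
    "summary": "page/summary",  # JSON
    "media-list": "page/media-list",  # JSON
    "offline-resources": "page/mobile-html-offline-resources",  # JSON
    "references": "page/references",  # JSON
}


def pcs_url(domain: str, endpoint: str, title: str) -> str:
    """Build the request URL for one PCS endpoint and page title."""
    # Titles use underscores for spaces and are percent-encoded as a single
    # path segment, so a "/" inside a title becomes %2F.
    segment = quote(title.replace(" ", "_"), safe="")
    return f"https://{domain}/api/rest_v1/{PCS_ENDPOINTS[endpoint]}/{segment}"


def fetch_json(url: str) -> Any:
    """Fetch and decode a JSON endpoint. The User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "ExampleReader/0.1 (you@example.org)"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, `pcs_url("en.wikipedia.org", "summary", "Albert Einstein")` gives `https://en.wikipedia.org/api/rest_v1/page/summary/Albert_Einstein`. Keeping the title encoding in one helper means every endpoint treats titles with spaces, slashes or non-ASCII characters the same way.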
The service will supersede the mobile-sections endpoint of the Mobile Content Service (MCS). Currently, the PCS code is part of the MCS Git repository. Eventually the two will be split so that PCS can be deployed separately from MCS.
These services are maintained by the Wikimedia Reading Infrastructure team.
The /page/mobile-html endpoint provides Parsoid HTML with a few key differences:
- Extra DOM elements are added to provide information such as the page title, the Wikidata description, a footer, and edit icons for each section and for the top of the page.
- Reference lists are replaced with placeholders to improve initial load time. When the client needs the references, it can retrieve them from the JSON references API, /page/references.
- Images are replaced with placeholder elements to improve initial load time, and are lazy loaded as the user scrolls to them.
- The lead content paragraph is moved above the first infobox
- Infoboxes are collapsed with an interface to allow users to expand them as they read
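PCS performs these rewrites on the server. As a toy illustration of the image placeholder technique only (not PCS's implementation or its actual markup), the Python sketch below uses the standard library's `html.parser` to copy HTML through unchanged, except that each `<img>` becomes an empty `<span>` whose `data-*` attributes keep the original `src`, `width` and so on. A client script can later turn these placeholders back into images as they scroll into view. The class name `lazy-image-placeholder` is hypothetical.

```python
from html import escape
from html.parser import HTMLParser
from typing import List


class LazyImageRewriter(HTMLParser):
    """Copy HTML through, replacing every <img> with a placeholder <span>."""

    PLACEHOLDER_CLASS = "lazy-image-placeholder"  # hypothetical class name

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.out: List[str] = []

    def _placeholder(self, attrs) -> str:
        # Move each img attribute into a data-* attribute so the browser
        # does not fetch the image until a script restores it.
        parts = [f'class="{self.PLACEHOLDER_CLASS}"']
        for name, value in attrs:
            parts.append(f'data-{name}="{escape(value or "", quote=True)}"')
        return f"<span {' '.join(parts)}></span>"

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.out.append(self._placeholder(attrs))
        else:
            self.out.append(self.get_starttag_text())  # original text, untouched

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img src="..."/> or <br/>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def lazify_images(html: str) -> str:
    """Return html with images swapped for lazy-load placeholders."""
    rewriter = LazyImageRewriter()
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out)
```

A production rewrite would more likely operate on a full DOM, which also makes it easy to keep image dimensions so the page layout does not jump when images load.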
The /page/summary endpoint (the Summary) serves two important purposes:
- It provides the data needed to represent a page in a page or link preview, search results, and other lists.
- It provides the basic metadata clients need to make business-logic and navigation decisions before displaying a page.
For the first purpose, it contains basic metadata: an image/thumbnail, a description, the first paragraph of the page in plain text (extract) and HTML (extract_html) form, and the article language and directionality (RTL or LTR). It is preferable to use extract_html, since some complex formulas are handled better in HTML than in plain text.
For the second purpose, it contains semantic information about the page, its namespace, and various URLs, so that clients can understand the page's content before deciding how to display it.
Additionally, the Summary structure is provided in other APIs (like the feed) that return lists of pages.
This endpoint replaces an earlier action=query request. In the current version, however, the TextExtracts extension is no longer used; instead, PCS derives more of the information from the page's Parsoid HTML output and transforms it as needed.
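The two purposes of the Summary can be illustrated with a small client-side sketch that turns a parsed Summary object into preview data and picks a view before the page itself is loaded. It assumes the response fields `title`, `description`, `extract`, `extract_html`, `thumbnail.source`, `lang`, `dir`, `type` (with values such as `"standard"` or `"disambiguation"`) and `namespace.id`; check these against the API spec. The `Preview` class and the view names returned by `choose_view` are hypothetical.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Preview:
    """Hypothetical preview model built from a Summary response."""
    title: str
    description: Optional[str]
    extract_html: str
    thumbnail_url: Optional[str]
    lang: str
    direction: str  # "ltr" or "rtl", taken from the summary's "dir" field


def build_preview(summary: dict) -> Preview:
    """Purpose 1: collect what a preview, search result or list row needs."""
    thumbnail = summary.get("thumbnail") or {}
    # Prefer the HTML extract; fall back to the escaped plain-text extract.
    extract_html = summary.get("extract_html") or escape(summary.get("extract", ""))
    return Preview(
        title=summary["title"],
        description=summary.get("description"),
        extract_html=extract_html,
        thumbnail_url=thumbnail.get("source"),
        lang=summary.get("lang", ""),
        direction=summary.get("dir", "ltr"),
    )


def choose_view(summary: dict) -> str:
    """Purpose 2: decide how to handle the page before fetching its content."""
    namespace_id = (summary.get("namespace") or {}).get("id", 0)
    if namespace_id != 0:
        return "open-in-browser"  # not an article, e.g. a talk page
    if summary.get("type") == "disambiguation":
        return "disambiguation-list"
    return "article-reader"
```

Because both functions read only the Summary, a client can render a list of results and route taps on them without requesting /page/mobile-html for every entry.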
The /page/media-list endpoint lists the media items shown on a page: images, videos, and audio. This is useful for clients that want to build a gallery interface for the content of a page, or to download images for offline reading.
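A client building a gallery might filter the media list and pick an image rendition suited to the device's pixel density. The sketch assumes the JSON has an `items` array whose entries carry `type`, `title`, `showInGallery` and a `srcset` list of `{src, scale}` objects, with `scale` as a string such as `"2x"`. These field names should be checked against the API spec, and the helper names are hypothetical.

```python
from typing import Iterable, List, Optional


def gallery_items(media_list: dict, kinds: Iterable[str] = ("image", "video", "audio")) -> List[dict]:
    """Keep the media items a gallery should show, in page order."""
    wanted = set(kinds)
    return [
        item
        for item in media_list.get("items", [])
        # Assumption: items flagged showInGallery=False (e.g. icons) are skipped.
        if item.get("type") in wanted and item.get("showInGallery", True)
    ]


def best_source(item: dict, device_scale: float = 1.0) -> Optional[str]:
    """Pick the smallest rendition whose scale covers the device's pixel ratio."""
    candidates = sorted(
        (float(entry.get("scale", "1x").rstrip("x")), entry["src"])
        for entry in item.get("srcset", [])
    )
    if not candidates:
        return None
    for scale, src in candidates:
        if scale >= device_scale:
            return src
    # No rendition is large enough: fall back to the largest one.
    return candidates[-1][1]
```

Choosing the smallest sufficient rendition keeps downloads small, which matters most when a client saves every gallery image for offline reading.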
The /page/mobile-html-offline-resources endpoint returns a list of schemeless URLs for the CSS and JS files that mobile clients need for offline use. It lets native clients know which other files they have to download when saving a page for offline reading, without having to parse the page.
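One way a native client could use this list when saving a page is sketched below: give each schemeless URL an explicit scheme, drop duplicates, and assign each file a local name. It assumes the response is a JSON array of URL strings; the `offline/` directory and the hash-based naming are hypothetical choices, not part of PCS.

```python
import hashlib
from typing import Dict, Iterable


def plan_offline_downloads(resource_urls: Iterable[str], scheme: str = "https") -> Dict[str, str]:
    """Map each CSS/JS resource URL to a local file name for an offline copy.

    Schemeless URLs ("//host/path") get an explicit scheme, and duplicates are
    dropped while the original order is kept (dicts preserve insertion order).
    """
    plan: Dict[str, str] = {}
    for url in resource_urls:
        absolute = f"{scheme}:{url}" if url.startswith("//") else url
        if absolute in plan:
            continue
        # Hypothetical naming scheme: a short hash of the URL, so distinct
        # resources do not collide on disk.
        digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()[:16]
        plan[absolute] = f"offline/{digest}"
    return plan
```

Keeping the scheme as a parameter reflects why the URLs are schemeless in the first place: the client decides how to resolve them.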
The /page/references endpoint provides a structured representation of a page's reference lists. It is useful for:
- quickly looking up the details of a particular reference on a page, and
- providing the page's reference lists (a page may have more than one).
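Both uses can be sketched against the parsed JSON. The sketch assumes the response has a `references_by_id` object mapping reference ids to entries with `content.html`, and a `reference_lists` array whose entries list reference ids in an `order` field; verify these names against the API spec. The helper functions are hypothetical.

```python
from typing import List, Optional


def reference_html(refs: dict, ref_id: str) -> Optional[str]:
    """Quick lookup: the HTML content of one reference, or None if unknown."""
    entry = refs.get("references_by_id", {}).get(ref_id)
    if entry is None:
        return None
    return entry.get("content", {}).get("html")


def reference_lists(refs: dict) -> List[List[str]]:
    """Every reference list on the page, as ordered lists of reference ids."""
    return [list(ref_list.get("order", [])) for ref_list in refs.get("reference_lists", [])]


def rendered_reference_lists(refs: dict) -> List[List[str]]:
    """Resolve each list's ids to HTML, using "" for ids with no entry."""
    return [
        [reference_html(refs, ref_id) or "" for ref_id in order]
        for order in reference_lists(refs)
    ]
```

Storing each reference once by id and having the lists refer to ids means a tap on a citation is a single dictionary lookup, however many lists the page has.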
PCS can be used by any WMF or third-party client that wants to display page content in a reading context. As mentioned above, the /page/summary endpoint is heavily used elsewhere and is already used by the native apps and by the web PagePreview feature.
/page/mobile-html is somewhat coupled to the wikimedia-page-library and tied to design decisions made for the native WMF apps. If needed, there could be another HTML endpoint that sits somewhere between Parsoid HTML and
Within the WMF, the following clients are expected to integrate /page/mobile-html in 2019:
- Wikipedia Android App
- Wikipedia iOS App
Usage documentation can be found in the API spec.