Parsoid/API

From mediawiki.org
On Wikimedia wikis, Parsoid's API is not accessible on the public Internet. On these wikis, you can access Parsoid's content via RESTBase's REST API (e.g. https://en.wikipedia.org/api/rest_v1/).

Parsoid provides the following REST API endpoints, which let clients convert MediaWiki wikitext to XHTML5 + RDFa and back.

Common HTTP headers supported in all entry points

Accept-encoding
Please accept gzip.
Cookie
Cookie header that will be forwarded to the MediaWiki API. Makes it possible to use Parsoid with private wikis. Setting a cookie implicitly disables all caching for security reasons, so do not send a cookie for public wikis if you care about caching.
x-request-id
A request id that will be forwarded to the MediaWiki API.
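
As an illustration of the headers listed above, the following minimal sketch assembles them into a dictionary. The helper name `build_headers` and the use of a random UUID as the request id are assumptions made for this example; Parsoid does not prescribe either.

```python
import uuid


def build_headers(cookie=None, request_id=None):
    """Return the common request headers described above as a dict.

    `build_headers` is a hypothetical helper, not part of Parsoid.
    """
    headers = {
        # Ask for gzip-compressed responses.
        "Accept-Encoding": "gzip",
        # Assumption: any unique string is acceptable as a request id.
        "X-Request-Id": request_id or str(uuid.uuid4()),
    }
    # Only send a cookie for private wikis, because it disables caching.
    if cookie is not None:
        headers["Cookie"] = cookie
    return headers
```

Note that Python's `urllib` does not decompress gzip responses automatically, so a client that sends `Accept-Encoding: gzip` with it has to decompress the body itself, for example with `gzip.decompress`.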

v3 API

Common path parameters across all requests

domain
The hostname of the wiki.
title
Page title. Must be percent-encoded (URL-encoded).
revision
Revision id of the title.
format
Input / output format of content - wikitext, html, or pagebundle
wikitext
Plain text that is treated as wikitext. Content type is text/plain.
html
Parsoid's XHTML5 + RDFa output, which includes inlined data-parsoid attributes. The HTML conforms to the MediaWiki DOM spec. Content type is text/html.
pagebundle
A JSON blob containing the above html with the data-parsoid attributes split out and ids added to each node. Content type is application/json.

Pagebundle blobs have the form,

{
  "html": {
    "headers": {
      "content-type": "text/html;profile='mediawiki.org/specs/html/1.0.0'"
    },
    "body": "<!DOCTYPE html> ... </html>"
  },
  "data-parsoid": {
    "headers": {
      "content-type": "application/json;profile='mediawiki.org/specs/data-parsoid/0.0.1'"
    },
    "body": {
      "counter": n,
      "ids": { ... }
    }
  }
}
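
To show how a client might consume a blob of this shape, here is a minimal sketch that separates the HTML from the data-parsoid map. The function name `split_pagebundle` and the sample id `mwAA` with its contents are made up for illustration; only the outer field layout is taken from the form shown above.

```python
import json


def split_pagebundle(blob):
    """Split a pagebundle JSON string into (html, ids, html_content_type).

    `split_pagebundle` is a hypothetical helper, not part of Parsoid.
    """
    bundle = json.loads(blob)
    html = bundle["html"]["body"]
    # Map from node id to that node's data-parsoid information.
    ids = bundle["data-parsoid"]["body"]["ids"]
    content_type = bundle["html"]["headers"]["content-type"]
    return html, ids, content_type


# A made-up bundle following the layout above; the id and its value
# are illustrative assumptions, not real Parsoid output.
SAMPLE = json.dumps({
    "html": {
        "headers": {
            "content-type": "text/html;profile='mediawiki.org/specs/html/1.0.0'"
        },
        "body": "<h2 id=\"mwAA\"> h2 </h2>",
    },
    "data-parsoid": {
        "headers": {
            "content-type": "application/json;profile='mediawiki.org/specs/data-parsoid/0.0.1'"
        },
        "body": {"counter": 1, "ids": {"mwAA": {"dsr": [0, 8, 2, 2]}}},
    },
})
```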

Common payload / querystring parameters across all formats

For wikitext -> HTML requests

body_only
Optional boolean flag. If set, only the HTML body.innerHTML is returned instead of a full document.

For HTML -> wikitext requests

scrub_wikitext
Optional boolean flag, which normalizes the DOM to yield cleaner wikitext than might otherwise be generated.

GET

Wikitext -> HTML

GET /:domain/v3/page/:format/:title/:revision?

revision
Optional. GET requests without a revision id should, however, be considered a convenience method: if no revision id is provided, the request redirects to the latest revision.
format
One of html or pagebundle

Some querystring parameters are also accepted: body_only
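
The GET route above can be assembled as in the following minimal sketch. The helper name `page_url` and the `base` argument (the address of the Parsoid server) are assumptions for this example. The colon is left unescaped so that the result matches the form used in the examples on this page.

```python
from urllib.parse import quote, urlencode


def page_url(base, domain, fmt, title, revision=None, body_only=False):
    """Build a GET /:domain/v3/page/:format/:title/:revision? URL.

    `page_url` is a hypothetical helper, not part of Parsoid.
    """
    if fmt not in ("html", "pagebundle"):
        raise ValueError("format must be 'html' or 'pagebundle'")
    # Percent-encode the title; '/' must be escaped so that it is not
    # read as a path separator.
    url = f"{base}/{domain}/v3/page/{fmt}/{quote(title, safe=':')}"
    if revision is not None:
        url += f"/{revision}"
    if body_only:
        url += "?" + urlencode({"body_only": "true"})
    return url
```

For instance, `page_url("http://localhost:8000", "en.wikipedia.org", "html", "User:Arlolra/sandbox", 696653152)` yields `http://localhost:8000/en.wikipedia.org/v3/page/html/User:Arlolra%2Fsandbox/696653152`.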

POST

The content type for the POST payload can be: application/x-www-form-urlencoded, application/json, or multipart/form-data

Wikitext -> HTML

POST /:domain/v3/transform/:from/to/:format/:title?/:revision?

from
wikitext
format
One of html or pagebundle

The payload can contain,

{
  "wikitext": "...",  // if omitted, a title is required to fetch wt source
  "body_only": true,  // optional
  "original": {
    "title": "...",  // optional, and instead of in the path
    "revid": n  // optional, and instead of in the path
  }
}

Some other fields exist (including previous for expansion reuse). See Parsoid's API test suite for their use.
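
A wikitext -> HTML request with a JSON payload could be prepared as in the sketch below. It only constructs the request object and does not send it. The helper name `wikitext_to_html_request` and the `base` argument are assumptions for this example.

```python
import json
from urllib.request import Request


def wikitext_to_html_request(base, domain, wikitext, body_only=False, fmt="html"):
    """Prepare (but do not send) a POST to the wikitext -> HTML route.

    `wikitext_to_html_request` is a hypothetical helper, not part of Parsoid.
    """
    if fmt not in ("html", "pagebundle"):
        raise ValueError("format must be 'html' or 'pagebundle'")
    url = f"{base}/{domain}/v3/transform/wikitext/to/{fmt}/"
    payload = {"wikitext": wikitext}
    if body_only:
        payload["body_only"] = True
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # application/json is one of the accepted payload content types.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the request against a running server.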

HTML -> Wikitext

POST /:domain/v3/transform/:from/to/:format/:title?/:revision?

from
One of html or pagebundle
format
wikitext

The payload can contain,

{
  "html": "...",
  "scrub_wikitext": true,  // optional
  "original": {
    "title": "...",  // optional, and instead of in the path
    "revid": n,  // optional, and instead of in the path
    "wikitext": "...",  // optional, but the following three provide original data used in the selective serialization strategy
    "html": "...",
    "data-parsoid": { ... }
  }
}

Parsoid serializes HTML to a normalized form of wikitext. To avoid "dirty diffs" (differences outside the edited region of content) when serializing HTML generated from a given wikitext source, pass in the revision, either as revision in the path or as original.revid in the payload. Optionally, also pass in the source (original.wikitext) and the unedited HTML (original.html and original['data-parsoid']). These are an optimization only, because Parsoid will fetch or generate them if they are missing. This strategy is known as "selective serialization"; an example can be seen in the test suite.
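
A payload for selective serialization might be assembled as in the following minimal sketch. It mirrors the field layout of the payload shown above; the helper name `selser_payload` and its argument names are assumptions for this example.

```python
def selser_payload(edited_html, revid, original_wikitext=None,
                   original_html=None, original_data_parsoid=None):
    """Build an HTML -> wikitext payload that enables selective serialization.

    `selser_payload` is a hypothetical helper, not part of Parsoid.
    """
    # The revision id is what lets Parsoid find the original content.
    original = {"revid": revid}
    # The remaining fields are optional; they only save Parsoid the work
    # of fetching or regenerating the original data.
    if original_wikitext is not None:
        original["wikitext"] = original_wikitext
    if original_html is not None:
        original["html"] = original_html
    if original_data_parsoid is not None:
        original["data-parsoid"] = original_data_parsoid
    return {"html": edited_html, "original": original}
```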

HTML -> HTML

POST /:domain/v3/transform/pagebundle/to/pagebundle/:title?/:revision?

Parsoid exposes an API which transforms Parsoid-format HTML (encapsulated as a page bundle) to itself, performing a number of possible transformations. T114413 discusses some of the transformations, both actual and potential.

The payload is of the form:

{
  original: {
    html: {
      headers: {
        'content-type': 'text/html; charset=utf-8; profile="https://mediawiki.org/wiki/Specs/DOM/1.2.1"'
      },
      body: '<html>...</html>'
    }
  },
  updates: {
    transclusions: ...,
    media: ...,   // Could specify the exact image to update later.
    redlinks: { ... },
    variant: { ... }
  }
}

The original field is a pagebundle blob, as described above.

The updates field specifies the desired transformations, which are described in more detail below.

Redlinks

XXX: write me

Variant

See T43716.

XXX: write me

Content up/downgrade

XXX: write me

Wikitext -> Lint

POST /:domain/v3/transform/wikitext/to/lint/:title?/:revision?

Parsoid also exposes an API to get wikitext "syntax" errors for a given page, revision or wikitext.

The payload can contain:

{
  "wikitext": "..."  // if omitted, a title or revision is required to fetch lint errors
}

Examples

For more intricate examples, see Parsoid's API test suite.

Wikitext -> HTML

GET

Some simple GET requests to a Parsoid HTTP server bound to localhost:8000.

http://localhost:8000/en.wikipedia.org/v3/page/html/User:Arlolra%2Fsandbox/696653152

Returns text/html

http://localhost:8000/en.wikipedia.org/v3/page/pagebundle/User:Arlolra%2Fsandbox/696653152?body_only=true

Returns application/json

POST

POSTing the following blob,

{
  "wikitext": "== h2 =="
}

to,

http://localhost:8000/localhost/v3/transform/wikitext/to/html/

returns,

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head ...>...</head><body data-parsoid='{"dsr":[0,8,0,0]}' lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki" dir="ltr"><h2 data-parsoid='{"dsr":[0,8,2,2]}'> h2 </h2></body></html>
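
The inlined data-parsoid attributes in output like the above are JSON strings, so a client can read them with a standard HTML parser. The following is a minimal sketch using only the Python standard library; the class and function names are assumptions for this example.

```python
import json
from html.parser import HTMLParser


class DataParsoidCollector(HTMLParser):
    """Collect (tag, data-parsoid) pairs from Parsoid HTML.

    A hypothetical helper class, not part of Parsoid.
    """

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The attribute value is a JSON object serialized as a string.
            if name == "data-parsoid" and value:
                self.found.append((tag, json.loads(value)))


def collect_data_parsoid(html):
    parser = DataParsoidCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# A fragment shaped like the response above.
FRAGMENT = (
    "<body data-parsoid='{\"dsr\":[0,8,0,0]}'>"
    "<h2 data-parsoid='{\"dsr\":[0,8,2,2]}'> h2 </h2></body>"
)
```

In this fragment the first two dsr entries of the h2 node, 0 and 8, coincide with the start and end of the eight-character source "== h2 ==".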

HTML -> Wikitext

POST

POSTing the following blob,

{
  "html": "<html><body>foo <b>bar</b></body></html>"
}

to http://localhost:8000/localhost/v3/transform/html/to/wikitext/ returns

foo '''bar'''

Wikitext -> Lint

POST

POSTing the following blob

{
  "wikitext": "<div/>"
}

to http://localhost:8000/localhost/v3/transform/wikitext/to/lint returns

[
  {
    "type": "self-closed-tag",
    "params": {
      "name": "div"
    },
    "dsr": [
      0,
      6,
      6,
      0
    ]
  }
]
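
A client could use a response like this to point at the offending source. The sketch below assumes that the first two entries of dsr are the start and end offsets of the error within the submitted wikitext, which is consistent with the example above (0 and 6 for the six-character input). For non-ASCII input the offsets may not correspond to Python string indices. The function name `lint_snippets` is hypothetical.

```python
def lint_snippets(wikitext, lint_results):
    """Return (error type, offending source text) for each lint result.

    Assumes dsr[0] and dsr[1] are start and end offsets into `wikitext`.
    `lint_snippets` is a hypothetical helper, not part of Parsoid.
    """
    snippets = []
    for item in lint_results:
        start, end = item["dsr"][0], item["dsr"][1]
        snippets.append((item["type"], wikitext[start:end]))
    return snippets


# The request and response from the example above.
SOURCE = "<div/>"
RESPONSE = [
    {"type": "self-closed-tag", "params": {"name": "div"}, "dsr": [0, 6, 6, 0]}
]
```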

Content Negotiation

Accept
text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.0.0"

When making a parse request (wikitext->HTML), passing an Accept header defining an acceptable spec version will induce Parsoid to return HTML that satisfies that version, following Semantic Versioning caret semantics, or error with a 406 status code.
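
To make the caret semantics concrete, the following minimal sketch builds the Accept header for a requested version and checks whether an available version would satisfy it. It follows the usual caret range convention (for example, ^2.0.0 accepts 2.x.y at or above 2.0.0 but not 3.0.0); it is an illustration of that rule, not Parsoid's own negotiation code, and the function names are assumptions.

```python
import re


def accept_header(version):
    """Build the Accept header value for a requested HTML spec version."""
    return ('text/html; charset=utf-8; '
            f'profile="https://www.mediawiki.org/wiki/Specs/HTML/{version}"')


def parse_version(text):
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {text!r}")
    return tuple(int(part) for part in match.groups())


def satisfies_caret(available, requested):
    """Return True if `available` falls in the caret range of `requested`."""
    have, want = parse_version(available), parse_version(requested)
    if have < want:
        return False
    if want[0] > 0:
        # ^1.2.3 allows any later version with the same major number.
        return have[0] == want[0]
    if want[1] > 0:
        # ^0.2.3 allows any later version with the same 0.minor prefix.
        return have[:2] == want[:2]
    # ^0.0.3 allows only that exact version.
    return have == want
```

Under this rule a server able to produce 2.1.0 would satisfy a request for 2.0.0, while a server that can only produce 3.0.0 or 1.9.0 would not, which corresponds to the 406 case described above.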

Older entry points

These versions have been deprecated.