Parsoid/API

From MediaWiki.org

Parsoid converts MediaWiki's Wikitext to XHTML5 + RDFa and back.

In addition to the API defined below, the Parsoid service provides some form-based debugging tools at /. These are subject to change and may disappear at any time.

Common HTTP headers supported by all entry points

Accept-Encoding
Please accept gzip (i.e. send Accept-Encoding: gzip).
Cookie
Cookie header that will be forwarded to the MediaWiki API. This makes it possible to use Parsoid with private wikis. Setting a cookie implicitly disables all caching for security reasons, so do not send a cookie to public wikis if you want responses to be cached.
x-request-id
A request ID that will be forwarded to the MediaWiki API.
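As an illustrative sketch (not part of Parsoid itself), the Python snippet below attaches these headers using only the standard library. The helper names build_parsoid_request and decode_body are hypothetical. Because urllib does not decompress gzip automatically, the body has to be decompressed by hand when the server answers with Content-Encoding: gzip.

```python
import gzip
import urllib.request
from typing import Optional


def build_parsoid_request(url: str, cookie: Optional[str] = None,
                          request_id: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build a GET request carrying the common Parsoid headers."""
    headers = {"Accept-Encoding": "gzip"}  # ask for gzip-compressed responses
    if cookie is not None:
        # Forwarded to the MediaWiki API; sending a cookie disables caching.
        headers["Cookie"] = cookie
    if request_id is not None:
        # Forwarded to the MediaWiki API, e.g. for request tracing.
        headers["x-request-id"] = request_id
    return urllib.request.Request(url, headers=headers)


def decode_body(raw: bytes, content_encoding: Optional[str]) -> str:
    """Hypothetical helper: gunzip the body if the server compressed it, then decode UTF-8."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")
```

A call against a running server would then look like: with urllib.request.urlopen(req) as resp: text = decode_body(resp.read(), resp.headers.get("Content-Encoding")).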

v3 API

Common path parameters across all requests

domain
The hostname of the wiki.
title
Page title; it must be URL-encoded (percent-encoded).
revision
Revision ID of the page.
format
Input or output format of the content: wikitext, html, or pagebundle.
wikitext
Plain text that is treated as wikitext. Content type is text/plain.
html
Parsoid's XHTML5 + RDFa output, which includes inlined data-parsoid attributes. The HTML conforms to the MediaWiki DOM spec. Content type is text/html.
pagebundle
A JSON blob containing the above html with the data-parsoid attributes split out and ids added to each node. Content type is application/json.
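A minimal sketch of encoding a title for the :title path segment with Python's urllib.parse.quote. Escaping "/" as %2F while leaving ":" unescaped is an assumption based on the examples on this page (e.g. User:Arlolra%2Fsandbox); encode_title and check_format are hypothetical helpers.

```python
from urllib.parse import quote

# The three formats listed above.
FORMATS = {"wikitext", "html", "pagebundle"}


def encode_title(title: str) -> str:
    """Hypothetical helper: percent-encode a title as a single path segment.

    "/" is escaped (as %2F) so a subpage title is not split into extra path
    segments; ":" is kept as-is, matching this page's example URLs (assumption).
    """
    return quote(title, safe=":")


def check_format(fmt: str) -> str:
    """Hypothetical helper: reject anything other than the documented formats."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMATS)}")
    return fmt
```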

Pagebundle blobs have the following form:

{
  "html": {
    "headers": {
      "content-type": "text/html;profile='mediawiki.org/specs/html/1.0.0'"
    },
    "body": "<!DOCTYPE html> ... </html>"
  },
  "data-parsoid": {
    "headers": {
      "content-type": "application/json;profile='mediawiki.org/specs/data-parsoid/0.0.1'"
    },
    "body": {
      "counter": n,
      "ids": { ... }
    }
  }
}
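The sketch below splits a pagebundle back into its two parts. It assumes the layout shown above and that the keys of "ids" are the ids Parsoid added to the HTML nodes, each mapping to that node's data-parsoid attributes; split_pagebundle is a hypothetical helper.

```python
import json
from typing import Any, Dict, Tuple


def split_pagebundle(blob: str) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return (html_body, data_parsoid_ids) from a pagebundle."""
    bundle = json.loads(blob)
    html_body = bundle["html"]["body"]
    # Assumption: "ids" maps node ids in the HTML to their split-out data-parsoid.
    ids = bundle["data-parsoid"]["body"]["ids"]
    return html_body, ids
```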

Common payload / querystring parameters across all requests

body_only
Optional boolean flag; if set, only the HTML body's innerHTML is returned instead of a full document.
scrub_wikitext
Optional boolean flag; if set, the DOM is normalized to yield cleaner wikitext than might otherwise be generated.

GET

Wikitext -> HTML

GET /:domain/v3/page/:format/:title/:revision?

revision
The revision is optional; however, GET requests without a revision ID should be considered a convenience method. If no revision ID is provided, the request is redirected to the latest revision.
format
One of html or pagebundle

The body_only querystring parameter is also accepted.
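A hypothetical helper, page_url, that assembles the GET path above. It is only a sketch: the title is percent-encoded with "/" escaped, and body_only is sent as body_only=true, matching the example URLs later on this page.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def page_url(base: str, domain: str, fmt: str, title: str,
             revision: Optional[int] = None, body_only: bool = False) -> str:
    """Hypothetical helper: build GET /:domain/v3/page/:format/:title/:revision?"""
    if fmt not in ("html", "pagebundle"):
        raise ValueError("format must be 'html' or 'pagebundle'")
    url = f"{base}/{domain}/v3/page/{fmt}/{quote(title, safe=':')}"
    if revision is not None:
        # Without a revision, Parsoid redirects to the latest revision.
        url += f"/{revision}"
    if body_only:
        url += "?" + urlencode({"body_only": "true"})
    return url
```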

POST

The POST payload can be sent as application/x-www-form-urlencoded, application/json, or multipart/form-data.
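One way to produce two of these encodings in Python. This is a sketch under stated assumptions: booleans are written as "true"/"false" in form-encoded bodies, nested objects such as original are only sent as JSON, and multipart/form-data is not covered. encode_payload is a hypothetical name.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import urlencode


def encode_payload(payload: Dict[str, Any], as_json: bool = True) -> Tuple[bytes, str]:
    """Hypothetical helper: return (body bytes, Content-Type) for a POST payload."""
    if as_json:
        return json.dumps(payload).encode("utf-8"), "application/json"
    # Assumption: this sketch only form-encodes flat payloads; nested objects
    # such as "original" should be sent as JSON instead.
    if any(isinstance(v, (dict, list)) for v in payload.values()):
        raise ValueError("nested fields require as_json=True in this sketch")
    # Assumption: booleans are spelled "true"/"false" in form bodies.
    flat = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in payload.items()}
    return urlencode(flat).encode("ascii"), "application/x-www-form-urlencoded"
```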

Wikitext -> HTML

POST /:domain/v3/transform/:from/to/:format/:title?/:revision?

from
wikitext
format
One of html or pagebundle

The payload can contain:

{
  "wikitext": "...",  // if omitted, a title is required to fetch the wikitext source
  "body_only": true,  // optional
  "original": {
    "title": "...",  // optional; an alternative to the title in the path
    "revid": n  // optional; an alternative to the revision in the path
}

Other fields also exist (including previous, for expansion reuse); see Parsoid's API test suite for how they are used.
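A sketch of building the wikitext to HTML request with a JSON body. wikitext_to_html_request is a hypothetical helper: it mirrors the payload fields listed above, puts title and revid under original rather than in the path, and sends nothing until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def wikitext_to_html_request(base: str, domain: str, wikitext: Optional[str] = None,
                             title: Optional[str] = None, revid: Optional[int] = None,
                             fmt: str = "html", body_only: bool = False) -> urllib.request.Request:
    """Hypothetical helper: build POST /:domain/v3/transform/wikitext/to/:format."""
    if wikitext is None and title is None:
        raise ValueError("without wikitext, a title is needed to fetch the source")
    payload: Dict[str, Any] = {}
    if wikitext is not None:
        payload["wikitext"] = wikitext
    if body_only:
        payload["body_only"] = True
    original: Dict[str, Any] = {}
    if title is not None:
        original["title"] = title  # given in the payload instead of the path
    if revid is not None:
        original["revid"] = revid
    if original:
        payload["original"] = original
    return urllib.request.Request(
        f"{base}/{domain}/v3/transform/wikitext/to/{fmt}/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```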

HTML -> Wikitext

POST /:domain/v3/transform/:from/to/:format/:title?/:revision?

from
One of html or pagebundle
format
wikitext

The payload can contain:

{
  "html": "...",
  "scrub_wikitext": true,  // optional
  "original": {
    "title": "...",  // optional; an alternative to the title in the path
    "revid": n,  // optional; an alternative to the revision in the path
    "wikitext": "...",  // optional; this and the next two fields supply the original data used by selective serialization
    "html": "...",
    "data-parsoid": { ... }
  }
}

Parsoid serializes HTML to a normalized form of wikitext. To avoid "dirty diffs" (differences outside the edited region of content) when serializing HTML that was generated from a given wikitext source, pass in the revision (either as revision in the path or as original.revid in the payload). Optionally, as an optimization, also pass the original source (original.wikitext) and the unedited HTML (original.html and original['data-parsoid']); Parsoid will fetch or generate them if they are missing. This strategy is known as "selective serialization"; an example can be seen in the test suite.
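A sketch of assembling a selective-serialization payload for the HTML to wikitext transform. html_to_wikitext_payload is a hypothetical helper; it follows the payload shape shown above and only includes the optional original fields the caller actually supplies.

```python
import json
from typing import Any, Dict, Optional


def html_to_wikitext_payload(html: str, revid: Optional[int] = None,
                             original_wikitext: Optional[str] = None,
                             original_html: Optional[str] = None,
                             original_data_parsoid: Optional[Dict[str, Any]] = None,
                             scrub_wikitext: bool = False) -> str:
    """Hypothetical helper: JSON body for POST .../transform/html/to/wikitext."""
    payload: Dict[str, Any] = {"html": html}
    if scrub_wikitext:
        payload["scrub_wikitext"] = True
    original: Dict[str, Any] = {}
    if revid is not None:
        original["revid"] = revid  # the revision enables selective serialization
    # Optional optimizations: Parsoid fetches or generates these if missing.
    if original_wikitext is not None:
        original["wikitext"] = original_wikitext
    if original_html is not None:
        original["html"] = original_html
    if original_data_parsoid is not None:
        original["data-parsoid"] = original_data_parsoid
    if original:
        payload["original"] = original
    return json.dumps(payload)
```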

Wikitext -> Lint

POST /:domain/v3/transform/wikitext/to/lint

Parsoid also exposes an API that reports wikitext "syntax" (lint) errors for a given page, revision, or wikitext string.

The payload can contain:

{
  "wikitext": "..."  // if omitted, a title or revision is required to fetch lint errors
}
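A minimal sketch of calling the lint endpoint with a wikitext string. lint_request and fetch_lint_errors are hypothetical helpers, and fetch_lint_errors needs a running Parsoid server.

```python
import json
import urllib.request
from typing import Any, List


def lint_request(base: str, domain: str, wikitext: str) -> urllib.request.Request:
    """Hypothetical helper: build POST /:domain/v3/transform/wikitext/to/lint."""
    body = json.dumps({"wikitext": wikitext}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{domain}/v3/transform/wikitext/to/lint",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_lint_errors(req: urllib.request.Request) -> List[Any]:
    """Hypothetical helper: send the request and decode the JSON array of lint errors."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```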

Examples

For more intricate examples, see Parsoid's API test suite.

Wikitext -> HTML

GET

Below are some simple GET requests to a Parsoid HTTP server listening on localhost:8000.

http://localhost:8000/en.wikipedia.org/v3/page/html/User:Arlolra%2Fsandbox/696653152

Returns text/html

http://localhost:8000/en.wikipedia.org/v3/page/pagebundle/User:Arlolra%2Fsandbox/696653152?body_only=true

Returns application/json

POST

POSTing the following blob

{
  "wikitext": "== h2 =="
}

to

http://localhost:8000/localhost/v3/transform/wikitext/to/html/

returns:

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head ...>...</head><body data-parsoid='{"dsr":[0,8,0,0]}' lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki" dir="ltr"><h2 data-parsoid='{"dsr":[0,8,2,2]}'> h2 </h2></body></html>
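The dsr values in data-parsoid can be read back against the source wikitext. The sketch below assumes dsr is [start offset, end offset, width of opening markup, width of closing markup], which is consistent with the example above (the h2 spans all 8 characters of "== h2 ==" with two "=" characters on each side). DsrCollector and dsr_slices are hypothetical names built on Python's html.parser.

```python
import json
from html.parser import HTMLParser
from typing import List, Tuple


class DsrCollector(HTMLParser):
    """Hypothetical helper: collect (tag, dsr) pairs from inlined data-parsoid attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, List[int]]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-parsoid" and value:
                dp = json.loads(value)
                if "dsr" in dp:
                    self.found.append((tag, dp["dsr"]))


def dsr_slices(source: str, dsr: List[int]) -> Tuple[str, str]:
    """Return (whole source span, inner content), assuming dsr = [start, end, open_w, close_w]."""
    start, end, open_w, close_w = dsr
    return source[start:end], source[start + open_w:end - close_w]
```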

HTML -> Wikitext

POST

POSTing the following blob

{
  "html": "<html><body>foo <b>bar</b></body></html>"
}

to http://localhost:8000/localhost/v3/transform/html/to/wikitext/ returns:

foo '''bar'''

Wikitext -> Lint

POST

POSTing the following blob

{
  "wikitext": "<div/>"
}

to http://localhost:8000/localhost/v3/transform/wikitext/to/lint returns:

[
  {
    "type": "self-closed-tag",
    "params": {
      "name": "div"
    },
    "dsr": [
      0,
      6,
      6,
      0
    ]
  }
]
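Lint results can be mapped back to positions in the wikitext, for example to report a line and column to an editor. This sketch assumes dsr[0] and dsr[1] are the start and end character offsets of the offending markup, as in the self-closed-tag example above; LintError, parse_lint and line_and_column are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class LintError:
    type: str
    params: Dict[str, Any]
    start: int  # dsr[0]: assumed start offset of the problem in the wikitext
    end: int    # dsr[1]: assumed end offset


def parse_lint(results: List[Dict[str, Any]]) -> List[LintError]:
    """Hypothetical helper: turn the JSON array returned by the lint endpoint into records."""
    return [LintError(r["type"], r.get("params", {}), r["dsr"][0], r["dsr"][1])
            for r in results]


def line_and_column(source: str, offset: int) -> Tuple[int, int]:
    """Return the 1-based (line, column) of a character offset in source."""
    line = source.count("\n", 0, offset) + 1
    col = offset - (source.rfind("\n", 0, offset) + 1) + 1
    return line, col
```

A report line could then be built as f"{line}:{col} {err.type} {err.params}" for each parsed error.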

Older entry points

These versions have been deprecated.