Parsoid/API

Parsoid provides the following REST API endpoints, which clients use to convert MediaWiki wikitext to XHTML5 + RDFa and back.

Common HTTP headers supported in all entry points

 * Accept-encoding : Please accept gzip.
 * Cookie : Cookie header that will be forwarded to the MediaWiki API. Makes it possible to use Parsoid with private wikis. Setting a cookie implicitly disables all caching for security reasons, so do not send a cookie for public wikis if you care about caching.
 * x-request-id : A request id that will be forwarded to the MediaWiki API.
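
As a rough illustration of the headers above, a client might assemble them as follows. This is a minimal sketch in Python; `build_headers` and its parameters are hypothetical names, and generating the request id with `uuid` is an assumption rather than something Parsoid requires.

```python
import uuid


def build_headers(cookie=None, request_id=None):
    """Assemble the common request headers described above.

    build_headers and its parameters are illustrative names, not Parsoid API.
    """
    headers = {
        # Ask for a gzip-compressed response.
        "Accept-Encoding": "gzip",
        # A request id that is forwarded to the MediaWiki API.
        # ASSUMPTION: a random UUID is used when the caller supplies none.
        "x-request-id": request_id or str(uuid.uuid4()),
    }
    # Only send a cookie for private wikis: any cookie disables caching.
    if cookie is not None:
        headers["Cookie"] = cookie
    return headers
```

Leaving `cookie` unset for public wikis keeps responses cacheable, as noted above.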

Common path parameters across all requests

 * domain: The hostname of the wiki.
 * title: Page title -- needs to be urlencoded (percent encoded).
 * revision: Revision id of the title.
 * format: Input / output format of content - wikitext, html, or pagebundle
   * wikitext: Plain text that is treated as wikitext. Content type is text/plain.
   * html: Parsoid's XHTML5 + RDFa output, which includes inlined data-parsoid attributes. The HTML conforms to the MediaWiki DOM spec. Content type is text/html.
   * pagebundle: A JSON blob containing the above html with the data-parsoid attributes split out and ids added to each node. Content type is application/json.
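
Percent-encoding the title can be sketched with Python's standard library. `encode_title` is a hypothetical helper name. Passing `safe=""` is a choice made here so that a "/" in a subpage title is encoded too and the title stays within one path segment.

```python
from urllib.parse import quote


def encode_title(title):
    """Percent-encode a page title for use as a single path segment."""
    # safe="" makes quote() encode "/" as well, so a subpage title such as
    # "User:Example/Sandbox page" does not spill into extra path segments.
    return quote(title, safe="")


print(encode_title("User:Example/Sandbox page"))
# prints: User%3AExample%2FSandbox%20page
```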

Pagebundle blobs have the form,

For wikitext -> HTML requests

 * body_only: Optional boolean flag, only return the HTML body.innerHTML instead of a full document.

For HTML -> wikitext requests

 * scrub_wikitext: Optional boolean flag, which normalizes the DOM to yield cleaner wikitext than might otherwise be generated.

Wikitext -> HTML

 * revision: Optional. GET requests without a revision id should be considered a convenience method: if no revision id is provided, the request redirects to the latest revision.
 * format: One of html or pagebundle

Some querystring parameters are also accepted: body_only
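
To make the parameters above concrete, here is a hedged sketch of how a client might build such a GET URL. The path layout `/{domain}/v3/page/{format}/{title}/{revision}` is an assumption that is not spelled out in this text, so verify it against your Parsoid version. `page_url`, the base URL in the test, and the use of `true` as the boolean value are likewise illustrative.

```python
from urllib.parse import quote, urlencode


def page_url(base, domain, fmt, title, revision=None, body_only=False):
    """Build a GET URL for a wikitext -> HTML request.

    ASSUMPTION: the "/{domain}/v3/page/{format}/{title}/{revision}" layout
    is not given in the text above; check it against your Parsoid version.
    """
    if fmt not in ("html", "pagebundle"):
        raise ValueError("format must be html or pagebundle")
    parts = [base.rstrip("/"), domain, "v3", "page", fmt, quote(title, safe="")]
    # The revision is optional; without it Parsoid redirects to the latest one.
    if revision is not None:
        parts.append(str(revision))
    url = "/".join(parts)
    if body_only:
        # ASSUMPTION: "true" is used here as the boolean value.
        url += "?" + urlencode({"body_only": "true"})
    return url
```

Because a request without a revision is answered with a redirect, an HTTP client using this URL needs to follow redirects to reach the content.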

POST
The content type for the POST payload can be:,  , or

Wikitext -> HTML

 * from: wikitext
 * format: One of html or pagebundle

The payload can contain,

Some other fields exist (including  for expansion reuse). See Parsoid's API test suite for their use.

HTML -> Wikitext

 * from: One of html or pagebundle
 * format: wikitext

The payload can contain,

Parsoid serializes HTML to a normalized form of wikitext. In order to avoid "dirty diffs" (differences outside the edited region of content) when serializing HTML generated from a given wikitext source, pass in the revision (either as  in the path or   in the payload) and optionally (as an optimization, because Parsoid will fetch / generate them if they're missing) the source, , and unedited html, (  and  ). This strategy is known as "selective serialization"; an example of which can be seen in the test suite.

HTML -> HTML
Parsoid exposes an API which transforms Parsoid-format HTML (encapsulated as a page bundle) to itself, performing a number of possible transformations. T114413 discusses some of the transformations, both actual and potential.

The payload is of the form:

The  field is a pagebundle blob, as described above.

The  field specifies the desired transformations, which are described in more detail below.

Redlinks
XXX: write me

Variant
See T43716.

XXX: write me

Content up/downgrade
XXX: write me

Wikitext -> Lint
Parsoid also exposes an API to get wikitext "syntax" errors for a given page, revision or wikitext.

The payload can contain:

Examples
For more intricate examples, see Parsoid's API test suite.

GET
Some simple GET requests to a Parsoid HTTP server bound to.

Returns text/html

Returns application/json

POST
POSTing the following blob,

to,

returns,

POST
POSTing the following blob,

to  returns

POST
POSTing the following blob

to  returns

This works well with curl. Replace "LinterTest" with the appropriate wiki page; the -L (follow redirects) option makes the request go to the most recent revision.

Produces:

Content Negotiation

 * Accept

When making a parse request (wikitext -> HTML), passing an Accept header defining an acceptable spec version will induce Parsoid to return HTML that satisfies that version, following Semantic Versioning caret semantics, or error with a   status code.
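
The caret rule mentioned above can be illustrated with a small version check. This is a sketch of caret semantics as commonly defined for Semantic Versioning ranges, not Parsoid's own negotiation code. `parse_version` and `satisfies_caret` are hypothetical names, and versions are assumed to be plain `major.minor.patch` strings.

```python
def parse_version(text):
    """Split a "major.minor.patch" string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def satisfies_caret(requested, available):
    """Return True if `available` satisfies the caret range ^requested."""
    req = parse_version(requested)
    got = parse_version(available)
    # The available version must never be older than the requested one.
    if got < req:
        return False
    # Caret allows changes that do not modify the left-most non-zero part.
    if req[0] > 0:
        return got[0] == req[0]
    if req[1] > 0:
        return got[:2] == req[:2]
    return got == req
```

Under this rule a client that asks for 1.2.0 would accept 1.6.1 but not 2.0.0, since a major version bump signals a breaking change.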

Older entry points
These versions have been deprecated.


 * Parsoid/API/v2
 * Parsoid/API/v1