Parsoid/Extension API

Raw thought dump about how extensions interface with Parsoid.


 * In this first pass, we are only looking at extensions that implement tag handlers. We have support for extensions that implement content handlers as well, and we will update this page to cover them.
 * In the Parsoid world, extensions will NOT get direct access to the wikitext engine. All interaction happens through an API and hooks (primarily transformation hooks and, to a lesser extent, pipeline event hooks). Ex: init, toDOM, fromDOM, postProcessDOM.
 * Extensions will interact with Parsoid via wikitext strings and DOM trees, i.e. they will convert wikitext strings to DOM trees or vice versa, or transform DOM trees in place.
 * Extensions should not expect to maintain global document state within the extension where ordering matters. Parsoid does not guarantee that the order in which repeated occurrences of the same extension tag are processed will match the order in which they appear on the page (for ex: because of concurrent / asynchronous execution). Nor should extensions assume that they will be invoked for every instance that is seen in wikitext (for ex: because we reuse parsed content from a cache). This means that global state like counters cannot be reliably maintained by the extension. Extensions can, however, get access to the fully processed DOM of the page, which they can inspect to reconstruct ordering.
 * Since Parsoid transforms wikitext to HTML and HTML to wikitext, extensions need to be aware of the HTML to wikitext transformation. Parsoid provides a basic default HTML to wikitext transformation based on information encoded in the data-mw attribute during the wikitext to HTML transformation. However, if extensions intend to provide custom editing support in editing clients like VisualEditor, they should provide handlers that can inspect the edited HTML for their extensions and convert it back to appropriate wikitext.
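
The point above about reconstructing ordering from the processed DOM, rather than keeping a counter inside the extension, can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not use any Parsoid API: the `hypothetical-cite` class name and the `number_citations` helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def number_citations(root):
    """Number citation markers by their position in the finished page.

    The number depends only on document order in the final DOM, not on
    the order in which the individual tag handlers happened to run.
    """
    # Element.iter() walks the tree in document order.
    markers = [el for el in root.iter("sup")
               if el.get("class") == "hypothetical-cite"]
    for index, el in enumerate(markers, start=1):
        el.text = f"[{index}]"
    return len(markers)


# A stand-in for the fully processed page DOM (hypothetical markup).
page = ET.fromstring(
    "<body>"
    "<p>A<sup class='hypothetical-cite'/> B<sup class='hypothetical-cite'/></p>"
    "<p>C<sup class='hypothetical-cite'/></p>"
    "</body>"
)
count = number_citations(page)
labels = [el.text for el in page.iter("sup")]
```

Because the numbering is derived in a single pass over the final tree, it stays correct even if some fragments came from a cache or were produced out of order.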

Extension registration and configuration
Parsoid will use the same extension registration interface that core uses. However, Parsoid will only recognize those extensions that implement the  interface. Currently, this interface has exactly one method:. The config object is an associative array that currently has the following fields:


 * : The name of the extension
 * : If an extension implements extension tags (ex: Cite implements and :  Style modules that this extension exports and that need to be included in the list of modules on the page.
 * FIXME: Should this be a per-extension-tag configuration, vs a per-extension configuration?
 * FIXME: Should this be a more generic modules property vs. being a styles property?
 * : If an extension needs to inspect the global document after it is constructed, it is expected to register a DOM processor that is invoked by Parsoid. This DOM processor is expected to transform the DOM in place as appropriate. To future-proof against new DOM transformations that Parsoid might support for extensions, domProcessors is an associative array that maps a transformation to an implementation class that implements an interface. Currently, only the  transformation is supported.
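
As a rough illustration of the registration model described above, here is a minimal Python sketch; it is not Parsoid code. Only the `domProcessors` key is taken from the text. The `ExtensionModule` protocol, the `get_config` method, the `name`, `tags` and `styles` keys, the `examplePass` transformation and all values are hypothetical stand-ins for names this page does not spell out.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ExtensionModule(Protocol):
    """Hypothetical one-method interface an extension must implement."""

    def get_config(self) -> dict: ...


class ExampleCiteModule:
    def get_config(self) -> dict:
        return {
            "name": "ExampleCite",                 # hypothetical key
            "tags": ["ref", "references"],         # hypothetical key
            "styles": ["ext.example.styles"],      # hypothetical key and value
            # Maps a transformation name to an implementation class name.
            "domProcessors": {"examplePass": "ExampleCiteProcessor"},
        }


def recognized(candidates: list) -> list:
    """Return names of candidates that implement the interface."""
    return [c.get_config()["name"] for c in candidates
            if isinstance(c, ExtensionModule)]


names = recognized([ExampleCiteModule(), object()])
```

Anything that does not implement the interface is simply skipped, which mirrors the statement that Parsoid only recognizes extensions implementing it.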

ExtensionTag abstract class
Implementations of extension tags should extend the  abstract class. This class provides four methods:,  ,  ,. At the very least, extensions are expected to implement the toDOM method (otherwise, what is this extension tag even doing?). Parsoid takes care of annotating the output DOM fragment returned by the toDOM method so that it can be appropriately converted back to wikitext. Please look at the docs for this class for more specific details about these methods.
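
The division of labour described above (the extension implements toDOM, the host annotates the result, and a default HTML to wikitext path exists when no custom handler is provided) might look roughly like the following Python sketch. The class and function names, the method signatures and the simplified `data-mw` layout are all assumptions for illustration, not the actual Parsoid interface.

```python
import json
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class ExtensionTagSketch(ABC):
    """Hypothetical base class: only to_dom is mandatory."""

    @abstractmethod
    def to_dom(self, source: str, attrs: dict) -> ET.Element: ...

    def from_dom(self, element: ET.Element):
        # Returning None means "use the host's default serialization".
        return None


class Shout(ExtensionTagSketch):
    def to_dom(self, source, attrs):
        el = ET.Element("span")
        el.text = source.upper()
        return el


def render(tag_name, handler, source, attrs):
    """Host side: call the handler, then annotate the fragment."""
    fragment = handler.to_dom(source, attrs)
    # Simplified, hypothetical data-mw payload.
    fragment.set("data-mw", json.dumps(
        {"name": tag_name, "attrs": attrs, "body": source}))
    return fragment


def serialize(handler, element):
    """Host side: prefer the custom handler, else rebuild from data-mw."""
    custom = handler.from_dom(element)
    if custom is not None:
        return custom
    info = json.loads(element.get("data-mw"))
    attr_text = "".join(f' {k}="{v}"' for k, v in info["attrs"].items())
    return f"<{info['name']}{attr_text}>{info['body']}</{info['name']}>"


rendered = render("shout", Shout(), "hi", {})
round_tripped = serialize(Shout(), rendered)
```

The extension author never touches the annotation step; an extension that wants custom editing support would override `from_dom` in this sketch.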

Configuring extension tags
The extension tag config object is an associative array with the following fields currently:


 * : The name of the tag
 * : The class that provides the implementation of this tag. This class should extend the  abstract class as above
 * : This options block dictates how the DOM fragment returned by the  method should be handled. Currently, only one option exists. The vast majority of extensions will not need this.
 * : By default, Parsoid takes the DOM fragment returned by the  method and splices it into the parent document in the appropriate place. However, if   is , Parsoid will instead leave a marker and store the fragment in a map. It is expected that the extension's   DOM processor will appropriately deal with these DOM fragments and manipulate them. For example, the Cite extension relies on this to migrate the ref's fragments to the references section and leave behind a citation that is appropriately globally numbered.
 * : This options block influences Parsoid's HTML to wikitext transformation. Given that extensions might implement their own  implementation, these options primarily influence how the generated wikitext interacts with its context. Currently, only one option exists. FIXME: Should this be called   instead?
 * : By default, the wikitext from converting the HTML is rendered inline. However, if extensions specify a  value for this property, the wikitext output is rendered on its own separate line.
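
The two options above can be sketched as follows in Python. This is an illustration under stated assumptions, not Parsoid's implementation: the option names `defer_splice` and `own_line`, the marker syntax, and both helper functions are hypothetical, and fragments are represented as plain strings for brevity.

```python
def splice_or_defer(fragments, tag_config, fragment_map):
    """Place each fragment inline, or leave a marker and store it in a map."""
    defer = tag_config.get("defer_splice", False)  # hypothetical option name
    page = []
    for index, fragment in enumerate(fragments):
        if defer:
            key = f"fragment-{index}"
            fragment_map[key] = fragment       # kept for a later DOM processor
            page.append(f"<marker:{key}>")     # hypothetical marker syntax
        else:
            page.append(fragment)              # default: splice in place
    return page


def emit_wikitext(before, wikitext, tag_config):
    """Append serialized wikitext inline, or on its own separate line."""
    if tag_config.get("own_line", False):      # hypothetical option name
        if before and not before.endswith("\n"):
            before += "\n"
        return before + wikitext + "\n"
    return before + wikitext


store = {}
deferred = splice_or_defer(["<a/>", "<b/>"], {"defer_splice": True}, store)
inline = splice_or_defer(["<a/>"], {}, {})
```

In the deferred case the page holds only markers, and a later DOM processor is responsible for deciding where the stored fragments end up, as Cite does with references.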

Parsoid API for extensions
As part of implementing the various methods (toDOM, fromDOM, etc.) for extension tags and the DOM post processors, extensions might need access to certain kinds of information or functionality. For example, extensions that intend to handle wikitext as part of their implementation will rely on Parsoid to convert that wikitext to HTML. Or, they might need access to configuration information for the wiki or the page. Or, they might need to log error messages or metrics. The  class provides this API. Please look at the docs for this class for specific details about the interface. But, here are a few high-level observations.
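
To make the shape of such an API concrete, here is a small Python sketch of a facade object handed to extension code. Everything in it is hypothetical (the class name, the three methods, the `lang` config key), and the wikitext conversion is a stand-in that does no real parsing; it only shows that the extension calls back into the host instead of reaching into the parser itself.

```python
import xml.etree.ElementTree as ET


class ExtensionApiSketch:
    """Hypothetical facade: parsing, config lookup and logging."""

    def __init__(self, site_config):
        self._site_config = dict(site_config)
        self.messages = []

    def wikitext_to_dom(self, wikitext):
        # Stand-in for the host parser: wraps the text without parsing it.
        el = ET.Element("p")
        el.text = wikitext
        return el

    def get_site_config(self, key, default=None):
        return self._site_config.get(key, default)

    def log(self, level, message):
        self.messages.append((level, message))


def note_to_dom(api, source):
    """Example handler that relies only on the facade."""
    if not source.strip():
        api.log("warn", "empty note body")
    wrapper = ET.Element("div", {"lang": api.get_site_config("lang", "en")})
    wrapper.append(api.wikitext_to_dom(source))
    return wrapper


api = ExtensionApiSketch({"lang": "de"})
dom = note_to_dom(api, "hello")
```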

... to be completed ...