Topic on Talk:Parsoid/Extension API

Summary by SSastry (WMF)

Config now uses ObjectFactory spec

Anomie (talk | contribs)

getConfig() seems too early a point to instantiate processors that may not actually be needed. I'd expect either:

  • getConfig() signals which kinds of processors the extension will provide, and some other method is called to supply those processors when they actually are needed.
  • getConfig() includes ObjectFactory specs, which are instantiated when Parsoid actually needs them.
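A minimal sketch of the second option, written in Python for brevity rather than in Parsoid's PHP. The spec shape (`factory`, `args`), the `LazyProcessors` class, and the `LinkMangler` processor are hypothetical names chosen for illustration; they are not the Parsoid or ObjectFactory API. The only point is that getConfig() can return inert data, with construction deferred until a processor is first requested.

```python
from typing import Any, Dict


class LinkMangler:
    """Hypothetical processor; counts how many times it has been constructed."""
    instances = 0

    def __init__(self, suffix: str) -> None:
        LinkMangler.instances += 1
        self.suffix = suffix

    def run(self, doc: str) -> str:
        return doc + self.suffix


def get_config() -> Dict[str, Dict[str, Any]]:
    # Returns specs only (plain data); nothing is instantiated here.
    return {
        "wt2htmlPostProcessor": {"factory": LinkMangler, "args": ["!"]},
    }


class LazyProcessors:
    """Builds a processor from its spec on first use, then caches it."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        self._specs = config
        self._cache: Dict[str, Any] = {}

    def get(self, kind: str) -> Any:
        if kind not in self._cache:
            spec = self._specs[kind]
            self._cache[kind] = spec["factory"](*spec.get("args", []))
        return self._cache[kind]


registry = LazyProcessors(get_config())
constructed_before = LinkMangler.instances  # nothing built yet
output = registry.get("wt2htmlPostProcessor").run("doc")
constructed_after = LinkMangler.instances   # built on first request
```

If a given parse never asks for the processor, the constructor never runs, which is the saving the second bullet is after.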

Something else to consider is whether an extension might want to provide multiple processors for a transformation. It may be more logical to do that than to have to do multiple transformations within just one class.

SSastry (WMF) (talk | contribs)

Good thought about eager instantiation of processors. I'll have to ponder which one is more appropriate.

As for multiple processors, I'm trying to understand the use case. This touches on another point that I am in the process of adding to the page: the ordering of global DOM processors across extensions. I'm not sure how MediaWiki core handles ordering issues among hooks, but we haven't yet figured out how to tackle that. We allude to this problem in a long code comment in DOMPostProcessors.php but haven't really thought it through.

But, assuming all processors registered by an extension are run at the same time, the extension can internally orchestrate the ordering and which processors it wants to run instead of registering multiple processors and having the API orchestrate the order. One use case I can imagine for your proposal is if extensions get a mechanism to specify priority, then processors registered by the same extension might get interleaved with those of other extensions. But, barring that, it seems simpler to provide a single entry point per global DOM transformation.
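To make the single-entry-point idea concrete, here is a rough sketch in Python (not Parsoid's PHP). The class `ExtensionPostProcessor`, the two step functions, and the use of a list of strings as a stand-in for a DOM are all assumptions made for illustration. The extension registers one object and decides the order of its internal steps itself.

```python
from typing import Callable, List

# A "DOM" is modelled as a list of strings purely for illustration.
Step = Callable[[List[str]], List[str]]


def strip_whitespace(nodes: List[str]) -> List[str]:
    return [n.strip() for n in nodes]


def drop_empty(nodes: List[str]) -> List[str]:
    return [n for n in nodes if n]


class ExtensionPostProcessor:
    """Hypothetical single entry point that one extension registers."""

    def __init__(self, steps: List[Step]) -> None:
        # The extension, not the API, fixes the order of its own steps.
        self._steps = list(steps)

    def run(self, nodes: List[str]) -> List[str]:
        for step in self._steps:
            nodes = step(nodes)
        return nodes


processor = ExtensionPostProcessor([strip_whitespace, drop_empty])
result = processor.run([" a ", " ", "b"])  # ["a", "b"]

# Swapping the steps changes the outcome, so ordering is the extension's call.
swapped = ExtensionPostProcessor([drop_empty, strip_whitespace])
swapped_result = swapped.run([" a ", " ", "b"])  # ["a", "", "b"]
```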

Anomie (talk | contribs)

MediaWiki mostly ignores ordering issues among hooks, unfortunately. As you observed elsewhere, it's usually the case that hooks don't actually collide. And for Parser.php hooks in particular, extensions most often just maintain internal state and produce output during the first pass rather than producing placeholder output and cleaning it all up in a later pass, which is exactly what we don't want for Parsoid. But the ordering question seems more relevant to the "Can domProcessors generate new DOM that might need processing?" topic rather than this one.

As a use case for multiple processors... Maybe MobileFrontend might serve as an example. One processor that runs through all the links to mangle them from "xx.wikipedia.org" to "xx.m.wikipedia.org", one to reorder the lead paragraph and infobox, one to hack out navboxes, and so on. It might make for cleaner code for those to actually be separate processors, rather than having one processor that does all of those things at once (or one processor that internally calls multiple processors, with every extension reinventing its own way of doing that).

From Parsoid's point of view, MobileFrontend having multiple processors would be no different from multiple different extensions having one processor each. The only difference would be that "wt2htmlPostProcessor" would hold an array of implementations (usually a 1-element array) rather than specifying only one.
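A small sketch of what that could look like on the consuming side, in Python rather than PHP. The helper names (`as_list`, `collect_processors`), the processor names, and the "OtherExtension" entry are hypothetical; only the "wt2htmlPostProcessor" key comes from the discussion above. The idea is to normalise each extension's value to a list, so a single implementation and an array of implementations go through the same code path.

```python
from typing import Any, Dict, List, Tuple


def as_list(value: Any) -> List[Any]:
    """Normalise a missing value, a single spec, or a list of specs to a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def collect_processors(
    configs: List[Tuple[str, Dict[str, Any]]], kind: str
) -> List[Tuple[str, Any]]:
    """Flatten every extension's implementations of `kind`, in registration order."""
    collected = []
    for extension_name, config in configs:
        for impl in as_list(config.get(kind)):
            collected.append((extension_name, impl))
    return collected


configs = [
    # One extension supplying several processors (names are made up).
    ("MobileFrontend", {"wt2htmlPostProcessor": [
        "mangleLinks", "reorderLead", "removeNavboxes"]}),
    # Another extension supplying a single processor, not wrapped in a list.
    ("OtherExtension", {"wt2htmlPostProcessor": "tidyFootnotes"}),
    # An extension that registers no processor of this kind.
    ("ThirdExtension", {}),
]

processors = collect_processors(configs, "wt2htmlPostProcessor")
```

After flattening, the caller sees four (extension, processor) pairs and has no need to care how many came from each extension.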