Topic on Talk:Parsoid/Extension API

Can domProcessors generate new DOM that might need processing?

Anomie (talkcontribs)

For example, Cite would certainly need to collect all the <ref> "placeholders" from the DOM, inject ref numbers into them (setting hrefs and ids, if not inserting whole new nodes), and then produce new <ol> and <li> nodes (plus nodes for the backlinks) to inject into the DOM for the <references> tag. It might even need to generate new DOM for error messages, like "reference Foo was used but never defined". What if some other extension wants to transform all the <ol>s, or collect all the anchors in the page, or all error messages, or something? If that extension's processor happens to run before Cite's, it wouldn't find the ones Cite adds.
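To make the ordering hazard concrete, here is a toy sketch (plain Python over an XML tree, not the Parsoid API): a collector processor runs before a Cite-like processor, so it never sees the <ol> that the later processor creates.

```python
# Hypothetical sketch of the DOM-processor ordering problem described above.
# Neither function is part of Parsoid; they just model two registered
# processors running in a fixed order over a shared document.
import xml.etree.ElementTree as ET

def collect_ol(doc, found):
    # Runs first: records every <ol> visible at this moment.
    found.extend(doc.iter("ol"))

def cite_like(doc):
    # Runs second: replaces the <references/> placeholder with a new <ol>.
    for parent in list(doc.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == "references":
                ol = ET.Element("ol")
                ol.append(ET.Element("li"))   # one backlink-bearing list item
                parent[i] = ol

doc = ET.fromstring("<body><ref/><references/></body>")
seen = []
collect_ol(doc, seen)   # registered order: collector before Cite
cite_like(doc)

print(len(seen))                    # 0 -- the Cite-generated <ol> was missed
print(len(list(doc.iter("ol"))))    # 1 -- yet it exists after Cite ran
```

Running the processors in the opposite order (or re-running the collector after Cite) would find the node, which is exactly why ordering needs a defined answer.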

And it's possible that Cite might want to be even smarter: if there's no <references> tag, there's not much point in doing toDOM on the content of all the <ref>s. Or if multiple <ref>s collide, there's not much point in doing toDOM on both when only one will be used. So it might like to wait on doing the toDOM for each ref's contents until it knows that ref will actually be going into the page output. Is that allowed? Or does it have to do toDOM on all the refs' contents anyway even though some might be thrown away?
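The lazy scenario can be sketched in a few lines (again hypothetical, with `to_dom` standing in for the expensive wt2html step): colliding ref bodies are parsed only once, and nothing is parsed at all if the page output never uses it.

```python
# Hypothetical sketch of lazy ref processing. `to_dom` and the data shapes
# are illustrative, not Parsoid's API.
calls = []

def to_dom(wikitext):
    calls.append(wikitext)            # stand-in for the expensive toDOM step
    return f"<li>{wikitext}</li>"

# Two <ref name="a"> uses collide; only one body can win.
refs = {"a": ["''first''", "''second''"]}   # name -> competing bodies
page_has_references_tag = True

output = []
if page_has_references_tag:           # no <references/> tag => parse nothing
    for name, bodies in refs.items():
        # Lazy: parse only the winning body. An eager implementation
        # would also have parsed bodies[1] and thrown the result away.
        output.append(to_dom(bodies[0]))

print(len(calls))   # 1 -- the colliding duplicate was never parsed
```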

Anomie (talkcontribs)

I see the documentation for tags touches on this, by mentioning that Parsoid's implementation of Cite uses "sealFragment" to keep the contents of the reference in a map that seems not to be part of the parent document. So what if the sealed fragment contains content that some other extension's domProcessor needs to process? Or, for that matter, the same extension's domProcessor (e.g. nested refs)?

SSastry (WMF) (talkcontribs)

Not quite ... I think you are asking about DOM fragments in the map, not just in internal data-mw attributes. I think that is probably a bug / gap in Parsoid right now. Interestingly, we found all these gaps during the porting: we were using hybrid testing and had to sniff out all the places HTML was hiding so that we could properly update offsets. But we didn't get the sealed-fragments bit covered in the extension API itself.

SSastry (WMF) (talkcontribs)

The first one is the hook ordering / global-transforms ordering problem that I mentioned in the other topic, which we need to resolve separately. I haven't thought about it yet; we first need to understand what the current behavior is. https://github.com/wikimedia/parsoid/blob/90d0f45209175f8313540c15a5be37a658fcc0a1/src/Wt2Html/DOMPostProcessor.php#L254-L312 is a longish comment hinting at one possible way to solve this.

As for the second one, we could conceivably support this lazy processing scenario, and it is possible that Cite can do it today without changes by adding additional smarts. It would, for example, potentially have to deal with `shiftDSROffsets`. To be conservative, I will say that we haven't considered this lazy processing scenario carefully, but I think it is doable, since the model we are going for here is to be able to take the output of an extension and plop it into the top-level document. It shouldn't matter how or when the output was generated, as long as suitable DSR offset shifts are handled properly.
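The DSR bookkeeping alluded to above can be illustrated with a toy sketch (DSR = DOM source ranges, i.e. offsets into the page's wikitext source; `shift_dsr` is a hypothetical stand-in, not Parsoid's `shiftDSROffsets`): a fragment parsed in isolation has offsets relative to its own source, so plopping it into the top-level document means shifting every range by the fragment's position in the page.

```python
# Hypothetical sketch of shifting source offsets when inserting a
# separately-built fragment into the top-level document. Data shapes are
# illustrative only.
def shift_dsr(nodes, delta):
    # Rebase each node's (start, end) source range by `delta` bytes.
    return [{**n, "dsr": (n["dsr"][0] + delta, n["dsr"][1] + delta)}
            for n in nodes]

# Fragment parsed in isolation: offsets are relative to the fragment source.
fragment = [{"tag": "b", "dsr": (0, 9)}, {"tag": "i", "dsr": (9, 16)}]

# Suppose the fragment's wikitext starts at byte 120 of the page source.
shifted = shift_dsr(fragment, 120)
print(shifted[0]["dsr"])   # (120, 129)
print(shifted[1]["dsr"])   # (129, 136)
```

Under this model it indeed should not matter when the fragment was built, only that the rebase happens at insertion time.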

Anomie (talkcontribs)

Yes, that long comment is exactly what I was asking about! I'm satisfied to see it's already on the radar.

Tgr (WMF) (talkcontribs)

You can't really do lazy processing if you want to support partial renderings / context-free-ness, can you? If a page with a bunch of refs gets transcluded into another one which has a references tag, that should work without re-rendering the transcluded page.

Anomie (talkcontribs)

Why would that be a problem for lazy processing? You'd just have to make sure that the "map" containing the unprocessed wikitext for each ref came along with the transclusion somehow, so that when the domProcessor runs over the transcluding page's DOM and finds those refs, it can still get their wikitext to process.
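As a toy sketch of that suggestion (field names are made up for illustration): each page's rendered output carries its map of unprocessed ref wikitext, and transclusion merges the maps, so the host page's processor can still reach the transcluded refs' source without re-rendering the transcluded page.

```python
# Hypothetical sketch: a transcluded page's serialized DOM travels together
# with its map of unprocessed ref wikitext. Nothing here is Parsoid's API.
transcluded = {"dom": ["<ref #a/>"], "ref_map": {"a": "see [[Foo]]"}}
host = {"dom": ["<references/>"], "ref_map": {}}

# Transclusion pulls in both the serialized DOM and the ref map.
host["dom"] = transcluded["dom"] + host["dom"]
host["ref_map"].update(transcluded["ref_map"])

# The host-side domProcessor now finds ref #a's wikitext without
# re-rendering the transcluded page.
print(host["ref_map"]["a"])   # see [[Foo]]
```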

And that's assuming the transclusion works by pulling in a serialized DOM rather than processing the transcluded page's wikitext to DOM afresh for the transclusion.
