Topic on User talk:Daniel Kinzler (WMDE)/MCR-SlotRoleHandler

SlotRoleHandler::getRawHtml()

5
Duesentrieb (talkcontribs)

During yesterday's meeting, concerns where raised that the proposed logic of getRawHtml() is backwards: we really want SlotRoleHandler::getSlotParserOutput() to generate ParserOutput from a Content object, and RenderedRevision::getSlotParserOutput() to provide lazy/cached access to that.

That does seem much more straight forward, but it conflates two things:

  • (A) the ParserOutput object for the slot in isolation, which is used to get the necessary DataUpdates and a combined LinksUpdate for updating the database. This ParserOutput would also be used for single-slot display.
  • (B) and the HTML representing the slot's content to be used in the combined ParserOutput for the revision, and any associated meta-information that needs to go into the HTML head or HTTP header.

In the the trivial case, (B) is based directly on (A). The HTML comes from getRawText(), and the meta-info comes from getHeadItems() and friends. Also trivially, (B) could simply be empty, hiding the output.

But we want to give SlotRoleHandler full control over how the content is represented, e.g. by wrapping it. We would need a mechanism to get the "solo" ParserOutput for a slot's content as well as the "for-combined-output" ParserOutput, which may or may not be based on the "solo" output. And at least the "solo" output needs to be cached, which cannot (easily) be done inside the SlotRoleHandler.

The current approach is to provide (lazy, cached) access to the solo output to SlotRoleHandler by passing a SlotParserOutputProvider, and then let the SlotRoleHandler decide if the HTML for use in the combined output should be based on the solo output at all, and if yes, how that HTML should be modified.

Part of the confusion stems from the fact that ParserOutput represents (at least) two very different kinds of information: output for display (HTML, but also head items, HTTP cache parameters, etc) and output for storage in the database (secondary data updates, dependency tracking info, backlinks). It may be useful to have separate interfaces for these aspects. Let's for now call them RenderedContent and ContentTrackingData.

If we want SlotRoleHandler to be the source of ParserOutput objects, we'd expect method signatures like this:

  • getContentTrackingData( Content $cont, ParserOptions $po ): ContentTrackingData
  • getRenderedContentForSoloDisplay( Content $cont, ParserOptions $po ): RenderedContent
  • getRenderedContentForCombining( Content $cont, ParserOptions $po ): RenderedContent

In most cases, all three would actually return the same ParserOutput object. And typically, at least two of them would be called when rendering a revision. We'd want to re-use that ParserOutput object, so we'll have to implement caching logic inside SlotRoleHandler (or in a ParserCache-like service used by SlotRoleHandler).

Also, passing the Content object is not enough: in the future, the rendering of one slot may depend on the content of another slot (cross-slot translcusion). So we would have to pass in the RevisionRecord instead of a single Content object. And since RevisionRecord::getContent() does audience checks, we'd have to pass in parameters for that as well.

It seems to me like having the SlotRoleHandler act as the source of ParserOutput seems straight forward, it's not feasible with the in-place caching approach: The ParserOutput is also needed for a different purpose in a different context (recording tracking info in the DB), and the same ParserOutput object should be re-used when generating combined HTML.

The issue boils down to the question of whether we want to introduce a ParserCache-like caching service for rendered content. Unlike ParserCache, that new service will have to be aware of slot names and revision IDs as well as per-user private/privileged content, and will have to work for unsaved content too. Only with such a mechanism in place can we have a more straight forward interface for SlotRoleHandler. As long as we stick with in-place caching, we need getRawHtml to take some kind of SlotParserOutputProvider that does the caching and lazy initialization.

EDIT: I'll poke at the code a bit, maybe it's not so bad afterall...

Anomie (talkcontribs)

Another possibility is that SlotRoleHandler is used to produce the ParserOutput for combined display, and for solo display you go to the Content object directly. That, of course, means that we'd limit solo display of a slot's content to be just a display of the Content rather than anything more fancy. Do we have any use cases that require something more fancy?

And since RevisionRecord::getContent() does audience checks, we'd have to pass in parameters for that as well.

We don't have to. As I pointed out elsewhere, such checks could be defined as being done at a higher level and SlotRoleHandler could just use RevisionRecord::RAW.

It seems to me like having the SlotRoleHandler act as the source of ParserOutput seems straight forward, it's not feasible with the in-place caching approach: The ParserOutput is also needed for a different purpose in a different context (recording tracking info in the DB), and the same ParserOutput object should be re-used when generating combined HTML.

I don't see why you say those two uses of the same ParserOutput object require the SlotParserOutputProvider approach.

The issue boils down to the question of whether we want to introduce a ParserCache-like caching service for rendered content. Unlike ParserCache, that new service will have to be aware of slot names and revision IDs as well as per-user private/privileged content, and will have to work for unsaved content too.

Why would it have to do all that?

ParserCache is already aware of revision IDs, and adding roles wouldn't be very difficult. "per-user private/privileged content" and "unsaved content" seem unnecessary to me.

Duesentrieb (talkcontribs)

So, after poking at the code a bit, it seems like we can have the SlotRoleHandler as the source of ParserOutput objects that are then cached by RenderedRevision if we can make the following assumptions:

  • The per-slot ParserOutput we want to cache in-place is always the one to be used for the combined view. The ParserOutput for a single slot view needs no caching, nor does the no-html ParserOutput.
  • The ParserOutput returned for the combined view contains all the tracking info, not just the tracking info for things visible in that view. In particular, if the content is invisible in the combined view (the HTML is empty), all tracking info should still be present in the ParserOutput.

For the code in DerivedPageDataUpdater, the first assumption holds. The second assumption will need to be enforced via the contract of SlotRoleHandler.

I will update the spec accordingly. I'm still a bit worried that we may miss things if the "solo" ParserOutput isn't available via RenderedRevision. But so far, it doesn't seem to be a problem.

Tgr (WMF) (talkcontribs)

IMO the easiest approach is to discourage dealing with the handler directly. Everything should go through RenderedRevision, which caches (for a specific ParserOptions, which also defines the audience) the revision, the raw ParserOutputs, the wrapped ParserOutputs and the combined ParserOutput. Putting getContentTrackingData on RenderedRevision (or making RenderedRevision a parameter for it) is inelegant but not the end of the world.

Duesentrieb (talkcontribs)

Yea, this is looking good in a code experiment.

Reply to "SlotRoleHandler::getRawHtml()"