User talk:Daniel Kinzler (WMDE)/MCR-SlotRoleHandler

About this board

Comments on the May 30 rewrite

22
Anomie (talkcontribs)

SlotRoleHandler seems mostly ok. What is a "MessageLocalizer", and what exactly is the "heading text" that it is supposed to determine? ... Ah, apparently it's a new micro-interface for IContextSource for some reason. It's still not clear what "heading text" it's supposed to determine.

SlotRoleRegistry seems mostly ok, although it seems to be different in a few ways from what was discussed at the Hackathon: we had "getHandler" taking a SlotRecord and "getDefaultHandler" taking $role and $title, and didn't have "getDefaultContentModel" at all (presumably leaving that to the handler itself?).

SlotParserOutputProvider seems rather pointless to me. If you really want a micro-interface for that, let's drop the talk about having an implementation other than RevisionRenderer. I'd also say there's little point in making some non-ParserOutput interface around ParserOutput.

RenderedRevision probably also needs getters for the RevisionRecord and ParserOptions. Why is "RevisionRenderer" not the obvious answer to "what constructs this"?

RevisionRenderer: $forUser can't be just a User object. I doubt it's needed at all (access restrictions should probably be checked before getting a rendering rather than being the responsibility of the renderer), but if it is needed we'd need options for "public" and "raw" too (cf. CentralIdLookup's $audience or RevisionRecord's $audience + $user).

SlotOutputCombiner seems like another useless micro-interface to me, and even more unnecessary as a concrete class.

Tgr (WMF) (talkcontribs)
  • SlotRoleHandler::getRawHtml: the way it is defined seems somewhat overcomplicated compared to just returning a ParserOutput containing both the HTML and the metadata.
    Also I don't understand the point of passing in a SlotParserOutputProvider - isn't that exactly what the SlotRoleHandler's purpose is, to contain role-specific logic about how a given slot should be rendered? I don't get what circularity is being avoided here.
    Also, the "raw" HTML should not be wrapped in a div etc. There should probably be a separate method for that. The raw HTML will be needed for visual diffs, probably some AJAX calls...
  • SlotRoleHandler::__construct / MessageLocalizer: would the MessageLocalizer include the language or is it just a stateless service? In the first case, different pages might require different rolehandler instances, which seems ugly. In the second case, where does it get passed in?
  • SlotRoleRegistry::defineRole: $defaultModel seems unnecessary. I suppose you want to make it easy to use the same generic instantiator for different roles; I would rather see something more flexible used in place of a callable (maybe an ObjectFactory configuration array, or an extended callable as in Hooks) which can take that extra argument when needed. Same goes for defineCustomRole.
    Also, how does it handle multiple extensions sharing a role? There would have to be some kind of priority parameter (or would it enforce that exactly one instantiator can claim the slot?)
  • SlotRoleRegistry::getRoleHandler: the older draft passed a SlotRecord, which can take the content model into account. I'm not sure that can be avoided.
    Also, as in the older draft, a getDefaultRoleHandler(role, LinkTarget) would be needed, for cases where the SlotRecord does not yet exist. Which also means two different instantiators.
  • SlotRoleRegistry::getDefaultContentModel: should be a method of the SlotRoleHandler IMO.
  • SlotRoleRegistry::isSupportedContentModel: if it's not supported, SlotRoleRegistry::getRoleHandler can just return false (as long as it gets a content model or full SlotRecord). OTOH we might need getSupportedConentModels for something like Special:ChangeContentModel - and for that SlotRoleHandler would need a similar method too.
  • SlotRoleRegistry::getAllowedRoles/getRequiredRoles: these are good but are they enough for the UI layer? It might make sense to add some of the optional roles to the initial edit form (say, categories are not required for a new page, but we might want to hint to the user that they should be provided), which would in addition require something like getSuggestedRoles or getDefaultRoles.
    Also, should these only depend on the title and not the contents of the current revision?
  • RenderedRevision::getParserOutput: nitpick: I liked the name getCombinedParserOutput better.
  • RevisionRenderer::combineSlotOutput: there doesn't seem to be any need to expose this. RenderedRevision can just use a private callback.
  • RevisionRenderer::getRenderedRevision: (nitpick: I'd just call this renderRevision) having both $forUser and ParserOptions::getUser seems like unnecessary duplication.
  • SlotOutputCombiner: seems unnecessary (but given that this is an internal detail of RevisionRenderer and not a public interface, can be discussed later).
Duesentrieb (talkcontribs)

getRawHtml(): If getRawHtml does not usually modify the HTML, then it could just return the ParserOutput object. If it does, constructing yet anouther ParserOutput, with the need to copy over complex state (which is quite tricky, as it turns out), seems overkill. It would make the interface prettier, but it would make the code more complicated, both in inside the method and in the caller.

SlotParserOutputProvider is passed in because it serves as an in-place cache. I added an explanation to the spec.

"raw" here refers to the fact that the modifications that ParserOutput::getText() applies have not been applied. We could also leave wrapping and adding the header to the caller. But then, not much is left of the idea to let the SlotRoleHandler control how the slot is represented on the page.

Perhaps getRawHtml should not exist, and all of this should be left to RevisionRenderer? The SlotRoleHandler could still indicate via getOutputPlacementHints() that the slot should not be shown, or where the output is placed. It could however not provide different output. But perhaps that's a good thing?

MessageLocalizer: MessageLocalizer exists since 1.30 and is compatible with IContentSource. It does not take a language parameter but uses the user interface language. However, the output language can be controleld by the caller using Message::inLanguage().


SlotRoleRegistry::defineRole: If you don't like the arguments, let's just have one method that takes a callable. The idea was indeed that most roles can be defined using the SlotRoleRegistry base class.

Multiple extensions cannot define the same role. defineRole() should fail for a role that is already defined.

A generic handler for the "main" role is pre-defined. Overriding that for specific pages (namespaces) could be supported later, but I don't think we need to work that out in detail just yet. But it does mean that getRoleHandler() needs to know the page title.

getRoleHandler: I decided to try out Brad's idea to let SlotRoleHandler be agnostic of the content model. That means the SlotRoleHandler always depends on the role and the title, but never on the concrete model.

getDefaultContentModel is now in SlotRoleHandler.

isSupportedContentModel is now in SlotRoleHandler. We may have specialized SlotRoleHandlers that can not handle all kinds of content. But then, maybe it's easier to implement fallback behavior inside such slot handlers. Then this method would not be needed.

getAllowedRoles/getRequiredRoles are indeed basic, but should be sufficient for now. In my mind, more advanced control over the edit form should reside in the page type handler.

By the logic that the role handler only depends on the role and the title, the allowed/required roles only depend on the title, not the concrete model.

getParserOutput renamed to getRevisionParserOutput.

combineSlotOutput: good point on not exposing this, and passing it as a callback instead. Thank you, done!

getRenderedRevision: the user used for rendering, and the user used for audience checks, is not necessarily the same. In particular, when passing no user for audience checks (that is, the public audience), we still need a way to say what user should be used when resolving ~~~~.

SlotOutputCombiner: replaced by an anonymous callback.

Anomie (talkcontribs)

I think we'll need to keep isSupportedContentModel on the SlotRoleHandler. Consider TemplateStyles, if it allows for storing the CSS in a non-main slot on the template that slot should only contain sanitized-CSS. Some slot that stores structured data might require a JSON content model (and do something other than Content->getParserOutput() to make the output too) instead of making its own content model.

we still need a way to say what user should be used when resolving ~~~~

Tilde-expansion is part of PST. If something does need a user for some reason, it should get the user from the ParserOptions object.

Tgr (WMF) (talkcontribs)

Aside: what do you think about using mock gerrit patches for interface/architecture discussion in the future? Inline gerrit comments (with their "mark as resolved" option) seem like a better way of keeping track of many small threads of discussion than Flow or wikitext.

SlotRoleHandler::getRawHtml:

> If [SlotRoleHandler::getRawHtml] does [modify the HTML], constructing yet another ParserOutput, with the need to copy over complex state (which is quite tricky, as it turns out), seems overkill. It would make the interface prettier, but it would make the code more complicated, both in inside the method and in the caller.

If getRawHtml were replaced with getParserOutput, it would not need to merge anything AFICS, just clone the ParserOutput it gets from the ContentHandler and modify the HTML. And the caller can just fetch the raw HTML from it and be no worse off than using getRawHtml. And the method would be able to add ResourceLoader modules and whatnot, which might be easily a requirement (and might be different for different roles using the same content type so cannot be punted to the content handler).

> SlotParserOutputProvider is passed in because it serves as an in-place cache. I added an explanation to the spec.

The spec says "This follows the in-place caching approach of the current architecture." but I don't get what that refers to. Is the current equivalent $content->getParserOutput()->getRaw()? I don't see any caching in that. I would expect the new classes to behave similarly - if you want caching, use getRenderedRevision::getParserOutput, not the SlotRoleHandler directly.

> "raw" here refers to the fact that the modifications that ParserOutput::getText() applies have not been applied.

So the assumption is that those transformations will always be applied to the combined slot, and consequently the same options apply to all slots (e.g. no disabling section edit links for one slot but not the others)? That seems reasonable, just want to make it explicit.

Anyway my original point was that we need both a way to get the "naked" HTML (for visual diffs and maybe action=render&slot=foo) and to get the wrapped one.

> Perhaps getRawHtml should not exist, and all of this should be left to RevisionRenderer?

I don't think we can avoid having a method for slot-specific wrapping of the raw content HTML, for adding things like the documentation header (and possibly associated styles/scripts). I can't think of a use case for slot-specific generation of the raw content HTML, I think it's fine to have callers use the Content for that. (This is the kind of limitation that should be called out in the RfC though, to see if someone can spot a use case it would break.)

SlotRoleRegistry::getRoleHandler:

The spec says "Note that the same role may have different handlers depending on the page title" and here you say "defineRole() should fail for a role that is already defined." which seems contradictory. I agree that the name "SlotRoleHandler" kind of implies you get one for each role; but as it is specified now, that seems problematic.

SlotRoleHandler::getRawHtml gets a SlotRecord, so it doesn't make much difference whether different role handlers are returned for different content types, or always the same one and it adapts behavior to different content types via some internal behavior (proxying to a subhandler or whatever). That is not the case for getSlotHeadingText/getOutputPlacementHints. I'm not sure about the latter but changing the heading text / wrapping logic based on content type (or title, if that approach is prefered) seems like a reasonable use case. So either that should be passed to all methods, or it should be passed to getRoleHandler (and then instead of singleton-like handler objects we could have a separate one for each revision slot and it could hold the SlotRecord or LinkTarget as an internal variable).

As for content model agnosticism: using a SlotRecord instead of a LinkTarget is more generic as you can always get a LinkTarget from a SlotRecord (in theory it might require a DB lookup, but it's hard to imagine a situation where the slot record is loaded but the revision record isn't). And whatever use case you have in mind for behavior splitting, it seems like a fragile assumption that it will always be deterministic based on the title.

(Unless your goal is to discourage/disallow using the content model as "mutable" state for a slot. If that's the case, it should be more explicit as other parts of the system will have to be adapted - e.g. what will happen to the content type when the page gets renamed? Plus, you need to special-case the main slot again. )

supported content model:

As I said above, I see two options:

  • content model is expected to be a deterministic option of the role name and title (with special casing for things like Flow in the main slot). No need for isSupportedContentModel as it's basically the same thing as getDefaultContentModel, but there should be something like updateContentOnPageMove($oldTitle, $newTitle) which converts the old content into a new one that has the expected content model.
  • content model is sort of a parameter to the role which acts as mutable state (so an editor can change the slot from documentation<wikitext> to documentation<markdown>). We'll need getSupportedConentModels (not isSupportedConentModel) to tell users trying to change the content model what they can change it to.

The difference between the two options is not huge - you can achieve the same thing with having separate documentation-wikitext and documentation-markdown roles. It will result in different diff behavior though, so IMO the question is, what behavior do we want there? Having a single role (so that diffs work through content type changes) seems like the more user-friendly option to me.

getAllowedRoles/getRequiredRoles:

> In my mind, more advanced control over the edit form should reside in the page type handler.

Anything that can be customizable per role must be in the role handlers, as the page type handler cannot know what roles are added by extensions it is not aware of.

> By the logic that the role handler only depends on the role and the title, the allowed/required roles only depend on the title, not the concrete model.

That is unrelated. Allowed/required flags of course can't depend on their role slot's content model (since these flags are only relevant for roles that do not exist in the current revision) but they could depend on the main content model and on the existence/nonexistence of other roles. None of the use cases listed on the RfC seem to require that, but in general it's not hard to imagine situations in which it is useful.

Also, if these flags depend on the title, what happens when the page is moved and existing slots become forbidden / missing slots become required? The spec addresses that but only superficially. IMO it's one of the questions where we should involve UX experts and consider what kind of user interactions would be needed. Shouldrole handlers be able to prevent a page move when the slot does not make sense under the new title? Do we need some role handler function that provides a nice error message when that happens? (Or not prevent, but show a warning?) Should they be able to do some kind of auto-conversion?

getRenderedRevision:

> In particular, when passing no user for audience checks (that is, the public audience), we still need a way to say what user should be used when resolving ~~~~.

Only for PST. Is there any use case when we pass an unsaved revision to the renderer, and the audience is not the user who created it?

Anomie (talkcontribs)

A danger with mock gerrit patches is that it might encourage getting too far into the code level. We'd have to watch out for that.

I can't think of a use case for slot-specific generation of the raw content HTML,

A slot storing structured data might use a generic JSON content type instead of subclassing it to customize the standalone display. OTOH, there's no reason it couldn't subclass it either.

We'll need getSupportedConentModels (not isSupportedConentModel) to tell users trying to change the content model what they can change it to.

We could do array_filter( ContentHandler::getContentModels(), [ $slotRoleHandler, 'isSupportedContentModel' ] ) to get that list. Or isSupportedConentModel() could be implemented with in_array( $model, $this->getSupportedConentModels(), true ).

Tgr (WMF) (talkcontribs)

> A danger with mock gerrit patches is that it might encourage getting too far into the code level.

I meant mock patches with zero code - just interfaces and abstract classes with signatures and comment docblocks only, and maybe docs/*.txt files. The amount of information presented on MCR-SlotRoleHandler is about right, it's just hard to have granular discussions about it and know which comment is still relevant for the current version of the text.

Duesentrieb (talkcontribs)
Aside: what do you think about using mock gerrit patches for interface/architecture discussion in the future? Inline gerrit comments (with their "mark as resolved" option) seem like a better way of keeping track of many small threads of discussion than Flow or wikitext.

Absolutely! In fact, I need to write code in order to write specs, so that mock already exists: https://gerrit.wikimedia.org/r/c/434544/

I was a bit shy about sharing in, sins Brad complained about me writing code before we have agreed on architecture... it's the hen and the egg all over.

SlotRoleHandler::getRawHtml

I completely changed that, see my comment to brad (or better, see the code). I hope you like it better now.

So the assumption is that those transformations will always be applied to the combined slot, and consequently the same options apply to all slots (e.g. no disabling section edit links for one slot but not the others)?

yes, that's the idea.

The spec says "Note that the same role may have different handlers depending on the page title" and here you say "defineRole() should fail for a role that is already defined." which seems contradictory.

It's really about the main slot being special. This is not completely finished in my head yet, since all the options are a bit annyoing. But my current inclination is to say that you can register title based overrides for the main slot, but only for the main slot.

changing the heading text / wrapping logic based on content type (or title, if that approach is prefered) seems like a reasonable use case.

In my mind, different wrapping/hading means it should be a different SlotRoleHandler instance. Though internal dispatching is also possible.

I still have to update the proposal for SlotRoleRegistery. I'll probably make two versions, one where the handle depends on the model, and one where it depends on the title. Actually, I think what I really want is that the handler depends solely on the role, except for the main slot, where is solely depends on the page type.

you can always get a LinkTarget from a SlotRecord

No you can't. But with the updated proposal, SlotRoleHandle::getSlotParserOutput gets the full RevisionRecord, and you can get the title from that.

We'll need getSupportedConentModels (not isSupportedConentModel) to tell users trying to change the content model what they can change it to.

I'd want that per page type

Having a single role (so that diffs work through content type changes) seems like the more user-friendly option to me.

Diffs across content types are not required to work. Though they generally do work if both types are text based. It's nto a big deal, since content type changes are rare.

Anything that can be customizable per role must be in the role handlers, as the page type handler cannot know what roles are added by extensions it is not aware of.

I see it a bit differently: we give role handlers some controle. The default page type handler replies on that. Specialized page type handlers can override for some roles they know about. For other roles, they just use the default behavior provided by the slot hander.

what happens when the page is moved and existing slots become forbidden / missing slots become required?

Required / allowed is always only about new revisions. The code must handle non-conformat revisions gracefully.

Shouldrole handlers be able to prevent a page move when the slot does not make sense under the new title?

OMHO a page type handler would control whether the page can be moved to a given other title. I don't see what that has to do with slots.

Which of these considerations do you believe to be essential for specifying the interface now?

Is there any use case when we pass an unsaved revision to the renderer, and the audience is not the user who created it?

I seem to recall running into something like that, but I don't remember where... I'll look.

Tgr (WMF) (talkcontribs)

> I completely changed [SlotRoleHandler::getRawHtml], see my comment to brad (or better, see the code). I hope you like it better now.

Yup, thanks!

The assumption here is that we won't need to call getSlotOutput with more than one mode in the same request (as that would result in parsing multiple times). I think that's reasonable: the solo and combined views can't both appear in the same request, and while HTML and tracking can, in most cases, those processes are either completely intertwined (so requesting one always results in the other happening as well) or completely separate (so there is no overhead in doing two separate calls). Also the situations in which tracking information is needed are fairly limited.

> my current inclination is to say that you can register title based overrides for the main slot, but only for the main slot. > Actually, I think what I really want is that the handler depends solely on the role, except for the main slot, where is solely depends on the page type.

Having the main role handler depend on (or be the same as) the page type handler makes sense. For the rest, you have three options:

  • make the role handler depend on the content model (or title, or both)
  • pass in the content model (or title, or both) to all methods which should depend on it, like getSlotHeading, getOutputPlacementHints, supportsRedirects, possibly all the functionality discussed in "ContentHandler functionality that might be missing in SlotRoleHandler"
  • accept that all these functions cannot depend on the content model and title (well, some of them will need a content model anyway, so on the title in those cases).

The third option is not the end of the world but it does seem a bit restrictive to me. Varying slot headings (and edit textarea title and similar chrome that's presumably also going to be supplied by the role handler) on the content model seems like a nice ability to have.

> I'd want [to tell users trying to change the content model what they can change it to] per page type

For the main slot, that's fine. What about the rest? The page type handler won't know about most slots. Do you want to disallow changing non-main slot content types (unless maybe the page type handler specifically handles that one slot)? Otherwise, you'll need a SlotRoleHandler method for getting the list of supported content types.

> Diffs across content types are not required to work.

That depends on the use case. E.g. if a documentation slot can be either wikitext or Markdown, and someone converts it from one to the other (and changes the text accordingly), I'd want to be able to get a diff of that. (I'm not saying that's a must-have, but preferred.) Diffs between two arbitrary content types are not required to work, of course.

> we give role handlers some controle. The default page type handler replies on that. Specialized page type handlers can override for some roles they know about. For other roles, they just use the default behavior provided by the slot hander

That sounds fine. But it still means that anything that can be customizable per role must be in the role handlers, otherwise the page type handler cannot defer to them.

>> Is there any use case when we pass an unsaved revision to the renderer, and the audience is not the user who created it? > I seem to recall running into something like that, but I don't remember where... I'll look.

A better question would have been, is there any use case where we pass an unsaved revision and the audience matters? Audience checks are only relevant for revdeleted/suppressed revisions, and those are always post-PST content.

Anomie (talkcontribs)
I was a bit shy about sharing in, sins Brad complained about me writing code before we have agreed on architecture

That's only because of the issues we've run into so far:

  • Resistance to rewriting the code if it turns out to fall short in some way, because of deadlines.
  • It's easy to get caught up in code-level stuff rather than looking at the architecture. Missing the forest for the trees, so to speak.
  • The whole architecture isn't visible because only pieces are written, and are spread across multiple patches implementing the details.
  • Waiting for new code slows down the discussion.

On the other hand, I agree it can be hard to know what architecture is needed without actually trying things out. I don't know how to properly balance this.

We'll need getSupportedConentModels (not isSupportedConentModel) to tell users trying to change the content model what they can change it to.
I'd want that per page type

We need supported content models for each slot, not just for the main slot. That could be done with either getSupportedConentModels or with isSupportedConentModel filtering the list of all content models.

Shouldrole handlers be able to prevent a page move when the slot does not make sense under the new title?
OMHO a page type handler would control whether the page can be moved to a given other title. I don't see what that has to do with slots.

If title "Foo" allows role X, and title "Bar" does not allow role X, what should happen if someone tries to move the page at "Foo" (that has role X) to "Bar"? Keep in mind that the "page type handler" may not even exist, and if it does it may not actually know about some extension's role X.

The other half of the question is the reverse: if "Quux" requires role X and someone tries to move a page from "Bar" (that doesn't have role X) to "Quux", what should happen?

Tgr (WMF) (talkcontribs)

> $forUser can't be just a User object. I doubt it's needed at all (access restrictions should probably be checked before getting a rendering rather than being the responsibility of the renderer), but if it is needed we'd need options for "public" and "raw" too

Is there such a thing as rendering content for a raw audience? I can't think of a use case.

Anomie (talkcontribs)

I don't think there's any need for permissions checks at this level at all, permissions checks should probably happen before the content gets rendered.

If anything in the rendering does depend on the user somehow, it should probably be using ParserOptions::getUser() anyway.

Tgr (WMF) (talkcontribs)

> I don't think there's any need for permissions checks at this level at all, permissions checks should probably happen before the content gets rendered.

How would you do that when the parser can fetch the content of arbitrary titles for transclusion?

I agree ParserOptions should be used for that (semantically it's a bit ugly, but there isn't really a choice since parser extensions need to be able to do permission checks themselves, and they already receive the Parser object, and we surely don't want to add a User to all those hook interfaces). Which is why the question about raw audience is important as ParserOptions cannot express that.

Anomie (talkcontribs)
How would you do that when the parser can fetch the content of arbitrary titles for transclusion?

How does the parser do it now? Is there any need to change the mechanism?

Duesentrieb (talkcontribs)

RenderedRevision calls getContent() on the RevisionRecord. getContent() does an audience check. I quite dislike that fact, but I decided to not change this when replacing Revision by RevisionRecord. Changing this would have meant changing a lot of logic throughout the code.

Anomie (talkcontribs)

It could pass RevisionRecord::RAW for the audience, if the contract of RevisionRenderer/RenderedRevision specifies that the caller is responsible for such permission checks.

Tgr (WMF) (talkcontribs)

The hackathon meeting notes say we'll use message key conventions for i18n (role names and descriptions). Which is a good idea in general, but I'm not sure it removes the need for interface methods for getting those messages - other code still needs to be able to get them, while keeping the message key construction logic in one place.

Anomie (talkcontribs)

Construction logic is just `"some-prefix-$role"`, isn't it? I wouldn't go further than a static[1] method somewhere, along the lines of User::getRightDescription() or User::getGrantName().

[1]: Or I suppose in a DI world if it returns the Message or Message->text() instead of just the message key it'd be some mostly-stateless method on a service somewhere, where the only state used is the MessageLocalizer or whatever.

Tgr (WMF) (talkcontribs)

Yeah, we shouldn't push the burden of message construction to the caller - passing around message key strings is something we need to move away from. The methods will be trivial, but the SlotRoleRegistry still needs to expose them. (Also, it is not completely inconceivable that the description or even name of some role would depend on system configuration, in which case there is need for some way for handlers to override the default or pass parameters.)

Duesentrieb (talkcontribs)

Why have a method for getting the message key, if we can have method for getting the Message object?

Duesentrieb (talkcontribs)

Hm... I want to be logged in with my work account here, but with my private account on wikidata.org...

Reply to "Comments on the May 30 rewrite"

ContentHandler functionality that might be missing in SlotRoleHandler

7
Tgr (WMF) (talkcontribs)

Looking through ContentHandler/Content, there are also these things:

  • ContentHandler:
    • makeEmptyContent. Do we want to have role-based default values? (Specifically I'm thinking of roles holding JsonContent.)
    • getActionOverrides: these will effectively go away (maybe the main slot can use them but even that seems problematic), presumably replaced by something in PageTypeHandler. Do we need some new slot-level funcionality to replace them? E.g. customize the edit interface or do extra work on rollback/purge? Or just prevent certain actions? That's off-scope for the RfC but it should mention that this part will be filled in elsewhere.
    • createDifferenceEngine: do we want to tweak diff calculation or presentation for some roles? Make it more compact for example, or provide special handling for content model changes?
    • getPageLanguage/getPageViewLanguage: do we want or need a slot-level equivalent? Using a different language makes sense for some slots (e.g. source code is usually written in English, some slot might be in the interface language) but outputting lang attributes can be done in the HTML wrapping step, and I'm not sure what else the language would be useful for.
    • merge3: maybe some roles can do more intelligent conflict resolution by considering the usage patterns. E.g. a SAL-like slot where the content type is wikitext but conflicts where both edits append to the end can be special-cased.
    • getAutosummary: seems quite useful. Meaningful autosummary is rarely possible at the content level, while the slot can say things like "added <role>".
    • getChangeTag: seems useful for changes which do not normally happen (e.g. deleting a documentation slot) or are potentially disruptive (e.g. adding a templatestyle slot). But then, can be easily done via AbuseFilter as well so maybe it's needless complication (nothing seems to override the default ContentHandler behavior either).
    • getFieldsForSearchIndex/getDataForSearchIndex/getParserOutputForIndexing and similar methods in Content: some of this will probably move to the page level. Slot existence/nonexistence should probably be searchable but no need for any functionality in SlotRoleHandler for that. But slots will need to provide some sort of weight factor (finding a word in the template documentation slot is highly relevant; in the template itself, probably less relevant; in the template unit test slot, not relevant at all) at the very least. Maybe even more customization is needed.
  • Content:
    • getWikitextForTransclusion: how would this work for non-main slots? Mostly they shouldn't be transcluded, but e.g. for TemplateStyles the current workflow is that you add a <templatestyles src="SomePage.css"/> tag in the template; when SomePage.css becomes a slot on the template, we probably want to get rid of the tag as well and prepend automatically on transclusion.
    • isValid: do we want to add extra validation based on role? E.g. for the Wikidata description override, the content type would probably be TextContent, but an empty string or one containing newlines is not valid. It would be overkill to require a new content type just for that.
    • more generally, having an empty slot vs. a missing slot wasn't a meaningful difference before MCR (as "missing" would have meant page deletion) so little thought was given to it (although a long, long time ago there was much discussion about semi-deletion, and IIRC Flow also had something like that for a while), but now we should consider whether we want to keep the UI simple and use blanking as a slot deletion mechanism (or give role handlers some mechanism to decide whether they want that).
    • isCountable: should role handlers be able to suppress countable-ness of the content?
    • getSecondaryDataUpdates/getDeletionUpdates: role handlers should probably be able to add stuff. For example, various use cases that have been suggested involve CSS/JS slots. When slots change, the ResourceLoader module needs to be invalidated (currently that's not a data update but hardcoded in doEditUpdates, but we probably want to move away from that). The Content can't do that because it does not know its role name and the cache purge logic needs to differentiate between multiple CSS/JS slots on the same page.
Anomie (talkcontribs)
  • makeEmptyContent, merge3, isValid: Let's not if we can avoid it. If it's rare enough, an extension could just subclass the Content as they would now.
  • getActionOverrides: That strikes me as a use case for a page type rather than anything that makes sense at a role level.
  • createDifferenceEngine: I don't see much need for it to be role-specific, and handling changed models is out of scope IMO.
  • getAutosummary, getChangeTag: Except "added role" and such wouldn't be determined at the slot level. That happens at the level of the revision as a whole, e.g. in EditController.
  • Search stuff: Someone should probably ask the Search team.
  • getWikitextForTransclusion: For the first version of MCR, I believe the plan is to have transclusion only transclude the main slot. When we add cross-slot stuff like the possibility of TemplateStyles automatically adding the stylesheet, that would be the time to consider expanding transclusion as well.
  • Blanking as deletion: Let's not. We'll already need UI for adding a slot, also having UI for removing an existing slot doesn't seem like it would add too much more complexity.
Duesentrieb (talkcontribs)
makeEmptyContent. Do we want to have role-based default values? (Specifically I'm thinking of roles holding JsonContent.)

I think not. JsonContent is under-specified, it should really not be used directly. I'd love to make it abstract. JSON is a serialization format, it's not a content model at all.

getActionOverrides: these will effectively go away (maybe the main slot can use them but even that seems problematic), presumably replaced by something in PageTypeHandler.

Yes, this is among the main reasons I want a PageTypeHandler.

createDifferenceEngine: do we want to tweak diff calculation or presentation for some roles?

Maybe later. Since SlotRoleHandler is a base class, we don't need to have everything we may eventually need in there right away.

getPageLanguage/getPageViewLanguage

getPageViewLanguage should not be here at all. getPageLanguage() belongs to the Content.

merge3: maybe some roles can do more intelligent conflict resolution by considering the usage patterns.

That would indicate that it's a separate content model. SAL probably should be.

getAutosummary: seems quite useful. Meaningful autosummary is rarely possible at the content level, while the slot can say things like "added <role>".

I'm not convinced. "Added role" could come from PageUpdater. Also, something like "page blanked" needs to consider all slots at once.

getChangeTag

Potentially useful, but a bit unclear. What would this take as a parameter?

getFieldsForSearchIndex/getDataForSearchIndex/getParserOutputForIndexing

Perhaps the SlotHandler should be able to override getFieldsForSearchIndex, to provide boosts and the like. The rest is fine in ContentHandler, I think.

getWikitextForTransclusion:

For now, only the main slot can be translcuded directly. Non-wikitext transclusion is unclear anyway. I experimented with HTML-based transclusin a while ago. We'd need to re-open that can of works in order to get this right.

isValid: do we want to add extra validation based on role? E.g. for the Wikidata description override, the content type would probably be TextContent, but an empty string or one containing newlines is not valid. It would be overkill to require a new content type just for that.

It really doesn't make much of a difference whwther you define a new SlotRoleHandler subclass or a ContentHandler subclass...

isValid() needs to return a Status object anyway. Maybe this is an opportinity for fixing that, and to get rid of prepareSave().

more generally, having an empty slot vs. a missing slot

I'm not sure we want to offer any UI for directly removing a slot from a page.

isCountable: should role handlers be able to suppress countable-ness of the content?

Yes. And also redirects. I added:

  • supportsRedirects()
  • getArticleCountMethod()
getSecondaryDataUpdates/getDeletionUpdates: role handlers should probably be able to add stuff. For example, various use cases that have been suggested involve CSS/JS slots. When slots change, the ResourceLoader module needs to be invalidated (currently that's not a data update but hardcoded in doEditUpdates, but we probably want to move away from that). The Content can't do that because it does not know its role name and the cache purge logic needs to differentiate between multiple CSS/JS slots on the same page.

That sounds more like a PageTypeHandler use case. The CSS or JS may be used just for pages containing code, with no special meaning for the wiki. However, some pages have a special meaning for the wiki, and because of that special purging etc is needed. This has nothign to do with the slot (which is 'main' here, anyway).

Tgr (WMF) (talkcontribs)
  • Role handlers of course should not be able replace the Action class, but they should probably be able to prevent complex undos, and do special handling on purges and rollbacks. So far this was available via action overrides, for roles it won't be so they need something else.
  • Role handlers could generate nicer autosummaries (well, autosummary fragments). "Added template documentation", "Added 2 categories" etc...
  • getChangeTag would take old and new content, like before. It's just that the tag might be specific to that role. (Yeah, not sure how useful that would be in practice. Even the Content-level ability to customize it does not seem to be used anywhere.)
  • Search is the biggest issue IMO, and I agree with Brad that we should ask someone from that team.
  • getWikitextForTransclusion: I think transcluding just the main slot is what we want 99% of the time anyway, and the remaining 1% could be handled easily by prepending/appending some tag or parser function that refers to the content of the other slots and having the extension's paerser function callback taking over from there. We just need the prepend/append ability.
  • Having a UI for adding slots but not for removing them seems to be against wiki fundamentals. Unless we make the difference between no slot and empty slot very obscure, which goes back to my question.
  • What does a redirect in a non-main slot do? Force the browser to visit the other page instead, like traditional redirects? Or just replace the contents of that slot with the contents of the target slot? There are use cases for the latter (e.g. shared template documentation) but they could be handled with transclusion as well, and maybe that would be less confusing.
  • You can have CSS/JS in a non-main slot (TemplateStyles, CentralNotice, maybe gadgets). But the problem exists with any other kind of purge-able resource that can have multiple slots on a page. getSecondaryDataUpdates( Title, ...) is just not an appropriate interface when you are not using titles to address content objects anymore.
Anomie (talkcontribs)
but they should probably be able to prevent complex undos

"Complex undos" are basically the same thing as resolving an edit conflict: if you have revisions A→B→C→D, and you want to undo B, it's the same as if you started editing at B and submitted the content of A as your edit, then try to merge3() the resulting edit conflicts due to C and D. I'm not sure there's too much point in trying to prevent such undos unless you're also preventing auto-fixing edit conflicts in general.

What does a redirect in a non-main slot do? Force the browser to visit the other page instead, like traditional redirects?

That's the one use case I can think of, instead of having every Content support some in-content syntax for redirects we could have a redirect slot or redirect as part of a "metadata" slot.

Duesentrieb (talkcontribs)

I see several interesting fine points and some rabbit holes here. While I'm tempted to dive in, I want to try to keep things simple in the name of finalizing the spec by Monday. So:

Which of these things do you believe need to be decided for a first version of SlotRoleHander?

Since we have a base class, we can easily add more methods later. What we can't easily do is remove methods or change signatures.

Tgr (WMF) (talkcontribs)

All of these are things we have in Content or ContentHandler right now which we might want to operate on the role level instead, so the only code change required would be to make sure a SlotRoleRegistry and a SlotRecord or such is available where the Content is available. That does not seem hard to do in the future, so I agree, we can easily do any of these later.

Reply to "ContentHandler functionality that might be missing in SlotRoleHandler"

SlotRoleHandler instance should depend only on the role

1
Duesentrieb (talkcontribs)

After some back and forth, I changed my mind on this. The SlotRoleHandler instance to use should depend only on the role name. The handler's behavior can then differ based on content model or based on the page's title, in particular for the main slot. Extensions may need a way to hook into that in the future, but I see no need to spec that out just now.

Reply to "SlotRoleHandler instance should depend only on the role"
Tgr (WMF) (talkcontribs)

Skinning is generally poorly supported in MediaWiki; there's lots of stuff that should be under the control of the skin but is not (e.g. thumbnail framing). It would be nice to get that (more) right with MCR.

Combining content should depend on the role (and maybe the page type), but also the skin. I like the idea of getOutputPlacementHints for dealing with positioning the content within the skin, but the role handler needs to wrap the content HTML, and there should be a way for the skin to replace that with its own wrapping mechanism, IMO.

Anomie (talkcontribs)

As I said the last time you brought this up, that seems like something to put off until we have the basics of MCR working. If we want to do it at all.

Duesentrieb (talkcontribs)

All of this is under the control of RevisionRenderer now. We can later change how that works internally, shifting responsibility between SlotRoleHandler, PageTypeHandler and Skins as we like.

Reply to "Skin integration"

SlotRoleHandler::getRawHtml()

5
Duesentrieb (talkcontribs)

During yesterday's meeting, concerns where raised that the proposed logic of getRawHtml() is backwards: we really want SlotRoleHandler::getSlotParserOutput() to generate ParserOutput from a Content object, and RenderedRevision::getSlotParserOutput() to provide lazy/cached access to that.

That does seem much more straight forward, but it conflates two things:

  • (A) the ParserOutput object for the slot in isolation, which is used to get the necessary DataUpdates and a combined LinksUpdate for updating the database. This ParserOutput would also be used for single-slot display.
  • (B) and the HTML representing the slot's content to be used in the combined ParserOutput for the revision, and any associated meta-information that needs to go into the HTML head or HTTP header.

In the the trivial case, (B) is based directly on (A). The HTML comes from getRawText(), and the meta-info comes from getHeadItems() and friends. Also trivially, (B) could simply be empty, hiding the output.

But we want to give SlotRoleHandler full control over how the content is represented, e.g. by wrapping it. We would need a mechanism to get the "solo" ParserOutput for a slot's content as well as the "for-combined-output" ParserOutput, which may or may not be based on the "solo" output. And at least the "solo" output needs to be cached, which cannot (easily) be done inside the SlotRoleHandler.

The current approach is to provide (lazy, cached) access to the solo output to SlotRoleHandler by passing a SlotParserOutputProvider, and then let the SlotRoleHandler decide if the HTML for use in the combined output should be based on the solo output at all, and if yes, how that HTML should be modified.

Part of the confusion stems from the fact that ParserOutput represents (at least) two very different kinds of information: output for display (HTML, but also head items, HTTP cache parameters, etc) and output for storage in the database (secondary data updates, dependency tracking info, backlinks). It may be useful to have separate interfaces for these aspects. Let's for now call them RenderedContent and ContentTrackingData.

If we want SlotRoleHandler to be the source of ParserOutput objects, we'd expect method signatures like this:

  • getContentTrackingData( Content $cont, ParserOptions $po ): ContentTrackingData
  • getRenderedContentForSoloDisplay( Content $cont, ParserOptions $po ): RenderedContent
  • getRenderedContentForCombining( Content $cont, ParserOptions $po ): RenderedContent

In most cases, all three would actually return the same ParserOutput object. And typically, at least two of them would be called when rendering a revision. We'd want to re-use that ParserOutput object, so we'll have to implement caching logic inside SlotRoleHandler (or in a ParserCache-like service used by SlotRoleHandler).

Also, passing the Content object is not enough: in the future, the rendering of one slot may depend on the content of another slot (cross-slot translcusion). So we would have to pass in the RevisionRecord instead of a single Content object. And since RevisionRecord::getContent() does audience checks, we'd have to pass in parameters for that as well.

It seems to me like having the SlotRoleHandler act as the source of ParserOutput seems straight forward, it's not feasible with the in-place caching approach: The ParserOutput is also needed for a different purpose in a different context (recording tracking info in the DB), and the same ParserOutput object should be re-used when generating combined HTML.

The issue boils down to the question of whether we want to introduce a ParserCache-like caching service for rendered content. Unlike ParserCache, that new service will have to be aware of slot names and revision IDs as well as per-user private/privileged content, and will have to work for unsaved content too. Only with such a mechanism in place can we have a more straight forward interface for SlotRoleHandler. As long as we stick with in-place caching, we need getRawHtml to take some kind of SlotParserOutputProvider that does the caching and lazy initialization.

EDIT: I'll poke at the code a bit, maybe it's not so bad afterall...

Anomie (talkcontribs)

Another possibility is that SlotRoleHandler is used to produce the ParserOutput for combined display, and for solo display you go to the Content object directly. That, of course, means that we'd limit solo display of a slot's content to be just a display of the Content rather than anything more fancy. Do we have any use cases that require something more fancy?

And since RevisionRecord::getContent() does audience checks, we'd have to pass in parameters for that as well.

We don't have to. As I pointed out elsewhere, such checks could be defined as being done at a higher level and SlotRoleHandler could just use RevisionRecord::RAW.

It seems to me like having the SlotRoleHandler act as the source of ParserOutput seems straight forward, it's not feasible with the in-place caching approach: The ParserOutput is also needed for a different purpose in a different context (recording tracking info in the DB), and the same ParserOutput object should be re-used when generating combined HTML.

I don't see why you say those two uses of the same ParserOutput object require the SlotParserOutputProvider approach.

The issue boils down to the question of whether we want to introduce a ParserCache-like caching service for rendered content. Unlike ParserCache, that new service will have to be aware of slot names and revision IDs as well as per-user private/privileged content, and will have to work for unsaved content too.

Why would it have to do all that?

ParserCache is already aware of revision IDs, and adding roles wouldn't be very difficult. "per-user private/privileged content" and "unsaved content" seem unnecessary to me.

Duesentrieb (talkcontribs)

So, after poking at the code a bit, it seems like we can have the SlotRoleHandler as the source of ParserOutput objects that are then cached by RenderedRevision if we can make the following assumptions:

  • The per-slot ParserOutput we want to cache in-place is always the one to be used for the combined view. The ParserOutput for a single slot view needs no caching, nor does the no-html ParserOutput.
  • The ParserOutput returned for the combined view contains all the tracking info, not just the tracking info for things visible in that view. In particular, if the content is invisible in the combined view (the HTML is empty), all tracking info should still be present in the ParserOutput.

For the code in DerivedPageDataUpdater, the first assumption holds. The second assumption will need to be enforced via the contract of SlotRoleHandler.

I will update the spec accordingly. I'm still a bit worried that we may miss things if the "solo" ParserOutput isn't available via RenderedRevision. But so far, it doesn't seem to be a problem.

Tgr (WMF) (talkcontribs)

IMO the easiest approach is to discourage dealing with the handler directly. Everything should go through RenderedRevision, which caches (for a specific ParserOptions, which also defines the audience) the revision, the raw ParserOutputs, the wrapped ParserOutputs and the combined ParserOutput. Putting getContentTrackingData on RenderedRevision (or making RenderedRevision a parameter for it) is inelegant but not the end of the world.

Duesentrieb (talkcontribs)

Yea, this is looking good in a code experiment.

Reply to "SlotRoleHandler::getRawHtml()"
Anomie (talkcontribs)

I guess this is assuming every SlotRoleHandler only handles exactly one content model. IMO that's a bad idea, see Topic:Ucdj20quoylnprcf.

Daniel Kinzler (WMDE) (talkcontribs)

I think that for a given slot on a given page, only one content model should be possible. The same slot on different pages may use different models, but the only good use case I can see for this is the main slot.

I'm ok with turning getContentModel() into isContentModelSupported( $model ) though.

Tgr (WMF) (talkcontribs)

That would mean that for things where the content model is somewhat flexible (e.g. wikicode vs. HTML vs. some other markup language), changing the content model would require moving it to another slot and thus breaking diffs. In some cases that's probably a good thing (wikicode to HTML; although even then not breaking visual diffs would be nice), in other cases probably not (wikicode to markdown).

Reply to "getContentModel()"

"there may be several different SlotRoleHandlers for the same role"

5
Anomie (talkcontribs)

That strikes me as a bad idea, and in particular the implication that role+model determines the SlotRoleHandler. If some slot actually wants to do such a thing, IMO that would be better with composition: having a SlotRoleHandler based on the role, which uses some per-model subhandler.

I see two general types of slots:

  • Slots that do something special with the Content and require a particular content model to do that.
  • Slots that deal with things at the ParserOutput level, that don't care what content model produced it.

It seems unlikely to me that we'll have some sort of hybrid where the same role does different particular things with different kinds of Content, and if it does it can easily enough be done with composition as mentioned above.

Daniel Kinzler (WMDE) (talkcontribs)

In my mind, role+title determines the SlotRoleHandler, and the SlotRoleHandler determines the content model.

I don't really see a use case for your second type of slots. Maybe a "documentation" slot that supports wikitext but also markdown, or something?

What you propose wrt per-role sub-handlers is actually pretty close to letting the handler for the main slot be determined by the current main slot's model on a given page. The only difference is whether the factory selects and returns the sub-handler up front, or whether you do dispatching/delegation based on the content model inside the main slot handler.

Anomie (talkcontribs)

More generally, a "documentation" slot that doesn't care what the content model is, it just takes whatever HTML the Content produces and displays it wrapped in a documentation box. Chances are in most cases it'll be wikitext, but there's nothing really forcing that.

The same could be done with an "infobox" slot, with expected content types being wikitext or some structured data but any HTML output could be accepted. Or a "metadata" slot that exists solely to produce categories, set the display title, and so on.

The problem with "role+title" is that you're then having to have a bunch of configuration and logic to decide which of multiple handlers for the role based on the title, and you can more easily wind up in a situation where no handler exists for a role+title despite the page having content for that slot.

Daniel Kinzler (WMDE) (talkcontribs)

I see the complication, but I don't see the solution.

Your approach just seems to move the problem from SlotRoleRegistery into SlotRoleHandler. Am I missing something?

Anomie (talkcontribs)

The SlotRoleHandler can be smarter about the branching (e.g. doing it on the user-specified content model rather than guessing based on the title), and knows more context by virtue of being the role handler rather than a generic dispatcher, and doesn't have to be generic one-size-fits-all by the same virtue.

Reply to ""there may be several different SlotRoleHandlers for the same role""

SlotRoleHandler for the main slot

4
Anomie (talkcontribs)

I still think that the main slot doesn't really need a special SlotRoleHandler. The very name "main" implies that it's the main content of the page that other slots extend, work with, or augment.

What are you thinking that any SlotRoleHandler for the main slot would actually do that wouldn't be handled as well by setting $combinedOutput to the main slot's ParserOutput before processing all the rest of the slots?

Daniel Kinzler (WMDE) (talkcontribs)

I want to avoid as much special casing for the "main" slot as possible. In my mind, the concept of a main slot only exists for backwards compatibility, and it should become less and less special over time.

We will probably want SlotRoleHandler to take on several other minor responsibilities in addition to combining output - e.g. determine countability, and perhaps other things that currently reside in ContentHandler.

Essentially, not having a SlotRoleHandler for the main slot would be as awkward as not having a ContentHandler for wikitext. Lot's of "if main do this, otherwise do that" all over the code.

Anomie (talkcontribs)

It seems highly unlikely that we'd have "if main do this, otherwise do that" all over the code. At worst we'd have a "getSlotRoleHandlerForSlot()" method of some sort that would return a generic handler for the main slot.

Daniel Kinzler (WMDE) (talkcontribs)

That generic "getSlotRoleHandlerForSlot()" method and generic slot handler is exactly what I'm proposing. The only question is whether it's the same handler regardless of content model. And I would leave that decision as an implementation detail. Code that uses the slot handler should not know or care.

Reply to "SlotRoleHandler for the main slot"

wrapOutput/getOutputPlacement/placeOutput/registerOutput

4
Anomie (talkcontribs)

It's not clear to me what model you're envisioning here.

My best guess is that wrapOutput() + getOutputPlacement() + registerOutput() would be used for a model that has a central combiner: wrapOutput() applies framing to the Content's ParserOutput, getOutputPlacement() hints to the combiner where to put that ParserOutput in the combined page, and registerOutput() I guess is to override the modules and such embedded in the Content's ParserOutput (bikeshed: that name sucks. registerOutputMetadata()?).

I'm not sure what placeOutput() is supposed to be as an alternative. Allowing for splitting one ParserOutput across multiple "locations"? That makes some sense, and may be needed if we want to slot-ify the existing (non-SDC) File page stuff (render the File itself at the top and the upload history below, assuming we don't move that to the history page). I'm not clear as to what the $layout parameter is supposed to do/be though.

Another option I don't see here would be to only have a method like registerOutput(), and let it use utility methods (or directly getRawText() and setText()) to do the placing of HTML blobs into $combinedOutput.

Daniel Kinzler (WMDE) (talkcontribs)

Your guess of my intention with wrapOutput() + getOutputPlacement() + registerOutput() is correct.

The idea of placeOutput() is that it directly puts the desired HTML into the $layout (somewhere/somehow), which is essentially an array of HTML chunks. This avoids the need to specify some kind of abstract layout constants to be returned by getOutputPlacement().

But in the end, this mechanism won't work for anything beyond a simple list anyway. More sophisticated layouts, e.g. a side-by-side view for translations/transcriptions, would need a more powerful controller (in my mind, a PageTypeHandler).

I intentionally didn't mention the third option. Manipulating a ParserOutput in that way seems extremely scary to me. manipulation of HTML strings should be avoided with strong prejudice, even if it's just appending - especially if there is no way to make sure that it's really just appending.

Anomie (talkcontribs)

A "side-by-side view for translations/transcriptions" seems like a very specialized use case, at least if you want a display that tries to keep corresponding sentences side-by-side instead of just dumping two articles. Even with a PageTypeHandler it would still have to be doing the "extremely scary" manipulation of HTML strings to properly line things up, or else it would have to be bypassing the normal ParserOutput objects entirely in favor of doing its own parsing.

Daniel Kinzler (WMDE) (talkcontribs)

I'd go for the 80% solution and let people scroll and align manually. Not as nice, but still a lot better than having to deal with separate browser windows.

The translation view is just a very obvious examples. Many use cases could benefit from departing from the "one column" model.

But if and how that should be done doesn't have to be decided here. Here,we only need to decide how a slot indicates roughly where its content should be shown.

Reply to "wrapOutput/getOutputPlacement/placeOutput/registerOutput"

getNameMessageKey() and getDescriptionMessageKey()

2
Anomie (talkcontribs)

Translatewiki already deals with non-extension-prefixed messages for things like user rights, and the default behavior for action API messages doesnt encourage extension-prefixing either (although with some work it can be overridden).

IMO it would be better to drop these methods and just go with a convention for building keys.

Daniel Kinzler (WMDE) (talkcontribs)

Fair enough. Both solutions seem kind of ugly to me, so I don't care too much.

Reply to "getNameMessageKey() and getDescriptionMessageKey()"