Topic on User talk:Daniel Kinzler (WMDE)/MCR-PageUpdater

Further refactoring

7 comments • 15:12, 7 June 2018 5 years ago

7

Anomie (talkcontribs)

It might be useful to divide the descriptions of the various components into low-level (with correspondence to the green boxes or blue ovals) and high-level (with correspondence to the various arrows), and/or make more direct reference to which pieces of the diagram each class is handling and which pieces of the diagram are handled by which classes.

From your description of EditController and PageUpdater, it seems like you're thinking that the three arrows currently going from the "Sections, Edit conflicts" box to the "PST, Make rev" box would instead go through the "Data input" blue-oval, with the result that the Red dataflow and the Purple, Black, and Yellow would use the same method to enter the "PST, Make rev" box? If that's what you mean, it sounds reasonable to me. I note only the Black could actually reuse the Red dataflow itself though, since Yellow and Purple don't have the same endpoints.

EditRevisionFactory (better name pending) builds an unsaved RevisionRecord from a RevisionSlotsUpdate, along with a RenderedRevision [...] It tries to take PST content and rendered content from th edit stash.

Why "along with a RenderedRevision"?

I wouldn't put the stash handling in there either. The high-level class for handling the Black/Grey dataflow should check the stash and follow Black or Grey as appropriate, and EditRevisionFactory shouldn't have to know anything about the stash.

I know that's a bit of a break from what WikiPage::doEditContent() does now. IMO that's ok.

DerivedPageDataUpdater takes a (safed) RevisionRecord and a RenderedRevision, and takes care of updating any secondary data and cached artifacts (optionally, recursively). It may re-use or re-render ParserOutput as appropriate.

It's a technicality, but it takes care of collecting the secondary data updates and scheduling them. The updates/jobs themselves actually do the updating.

"re-rendering" should be done by RevisionRenderer, probably via lazy callbacks from the RenderedRevision.

the job of discarding ParserOutput objects that depend on the revision ID may be handled by RenderedRevision. Perhaps it could have a setRevisionId() or updateRevision() method for this.

That seems like a lot of logic to be putting inside RenderedRevision. Why not have the code pass the existing RenderedRevision (for the unsaved revision) and the saved RevisionRecord back to RevisionRenderer, which would return a new RenderedRevision that has state copied from the old RenderedRevision when appropriate? Or I suppose you could have RevisionRenderer mutate the passed-in RevisionRecord if you want.

It feels odd that I'm telling you this instead of the other way around.

RenderedRevision should not know RevisionRenderer directly to avoid circular dependencies. We can make RevisionRenderer implement a ParserOutputCombiner interface to isolate them.

At the level of concrete objects, RenderedRevision::__construct() should be //given// $this when RevisionRenderer creates one. Personally I don't see much point to have a separate partial interface for RevisionRenderer to implement instead of just letting RenderedRevision not call the unneeded methods.

PageStore is a stateless service that can load PageRecords from the page table, and can update the page table when pages are created, updated, or deleted.

I note that "PageRecord" isn't defined anywhere in here.

Reply 17:52, 31 May 2018 5 years ago

Duesentrieb (talkcontribs)

I udpated the text to more directly refer to the diagram and fixed some other things you mentioned.

From your description of EditController and PageUpdater, it seems like you're thinking that the three arrows currently going from the "Sections, Edit conflicts" box to the "PST, Make rev" box would instead go through the "Data input" blue-oval, with the result that the Red dataflow and the Purple, Black, and Yellow would use the same method to enter the "PST, Make rev" box?

Indeed. More precisely, Black (edit) feeds into red (saveRevision), and lime (purge) is mostly the middle part of red (saveRevision). Green (null aka dummy rev) would just be a special case of red.

EditRevisionFactory (better name pending) builds an unsaved RevisionRecord from a RevisionSlotsUpdate, along with a RenderedRevision [...] It tries to take PST content and rendered content from th edit stash.

Why "along with a RenderedRevision"?

I wouldn't put the stash handling in there either.

I'd love to push stash handling to the black/yellow flows resp the "Sections" block. But since PST content and rendered output come from the stash, the stash needs to be checked by the code that otherwise creates those things. Or where you thinking to have a setPreComputedData() method on the EditRevisionFactory object, to be called by the EditControlelr? That's not very pretty, but would work too, I think.

As to why EditRevisionFactory should also create a RenderedRevison: because otherwise, we'd need a separate factory for that, and I don't see the point. EditRevisionFactory does PST, so this is the earliest time we can construct a RenderedRevision. EditRevisionFactory also triggers hooks that potentially need the RenderedRevision. But we could pull this "up" into the controller for the red flow, which is PageUpdater. I'll have to check the code, but I think that should work fine as well.

Why not have the code pass the existing RenderedRevision (for the unsaved revision) and the saved RevisionRecord back to RevisionRenderer, which would return a new RenderedRevision that has state copied from the old RenderedRevision when appropriate? Or I suppose you could have RevisionRenderer mutate the passed-in RevisionRecord if you want.

To me it seems odd to have RevisionRenderer know about that. RevisionRenderer's job is to compose revision output from slot output, and to provide a lazy wrapper around that process. The output-depends-on-revision-id stuff seems to be an arcane detail of the caching logic - and the caching logic is in RenderedRevision. So in the name of encapsulation, any knowledge about when and why to discard the cached data should be in there as well.

At the level of concrete objects, RenderedRevision::__construct() should be //given// $this when RevisionRenderer creates one. Personally I don't see much point to have a separate partial interface for RevisionRenderer to implement instead of just letting RenderedRevision not call the unneeded methods.

Indeed: RevisionRenderer::renderRevision should pass $this to RenderedRevision::__construct(). But since RevisionRenderer knows RenderedRevision, the opposite should not be true. In my experience, circular static binding is just evil. Always. If it doesn't do any harm immediately, it'll burn down your hose when you don't look.

"Micro interfaces" don't do any harm - indeed, "narrow interfaces" are generally considered a good thing, they are easier to mock, replace, and combine. I agree that implementations shouldn't be split into too many parts - knowledge locality is a thing. Splitting the implementations should be avoided if it means spreading knowledge abotu implementation details across multiple classes.

Reply 15:50, 4 June 2018 5 years ago

Anomie (talkcontribs)

I'd love to push stash handling to the black/yellow flows resp the "Sections" block. But since PST content and rendered output come from the stash, the stash needs to be checked by the code that otherwise creates those things. Or where you thinking to have a setPreComputedData() method on the EditRevisionFactory object, to be called by the EditControlelr?

It depends on what exactly is needed to key the stash, and what information exactly is in the stash.

We might key the stash on the token returned by ApiStashEdit, by having the JS that made the API call inject the it into the edit form. We'd validate that the stash entry isn't stale by including in it the page's current revision ID at stash time plus hashes of the data making up the "User input", which we can compare to the now-current revision ID and the data in the current "User input". Then the rest of the data in the stash entry would be whatever state is necessary to recreate the RevisionRecord and the RenderedRevision without having to do the potentially-expensive processing done by EditRevisionFactory.

As to why EditRevisionFactory should also create a RenderedRevison: because otherwise, we'd need a separate factory for that, and I don't see the point.

We already have RevisionRenderer, which is the factory for RenderedRevison.

The output-depends-on-revision-id stuff seems to be an arcane detail of the caching logic - and the caching logic is in RenderedRevision. So in the name of encapsulation, any knowledge about when and why to discard the cached data should be in there as well.

My mental model was that RenderedRevision holds the cache state and is externally seen as immutable (internally the cache gets populated on access, but external users can't see that). When the unsaved revision is saved which creates a new, saved RevisionRecord, RenderedRevision would never even know it. Thus the proposed method on RevisionRenderer to copy state from the old RenderedRevision based on the unsaved RevisionRecord to a new one based on the corresponding saved RevisionRecord.

If you want to make RenderedRevision's revision mutable, then sure, the cache invalidation logic should be in the setNewRevision() method that mutates it. I don't really mind that, but I thought you liked immutable value objects better than mutable ones ;)

Reply 18:28, 6 June 2018 5 years ago

Duesentrieb (talkcontribs)

Re having the creation of the RenderedRevision in the EditRevisionFactory: you are right, conceptually. I have changed the spec. How exactly this will work with EditStash is still a bit unclear to me, but we can figure that out later.

If you want to make RenderedRevision's revision mutable, then sure, the cache invalidation logic should be in the setNewRevision() method that mutates it. I don't really mind that, but I thought you liked immutable value objects better than mutable ones ;)

I do like immutable objects, but I'm not religious about it. Having a way to selectively invalidate the cache when the revision ID becomes known seems the best approach here.

Reply 15:12, 7 June 2018 5 years ago

Tgr (WMF) (talkcontribs)

This looks generally good to me (although I agree with Brad that ParserOutputCombiner seems superfluous; I'm not sure I understand the remark about circular references - is that about circular dependency, or some kind of GC optimization?).

The one thing that's missing that's been discussed in the past is page types (which would influence slot combining at the very least) - would there be some PageTypeRegistry that RevisionRenderer uses internally to pass control to the right PageTypeHandler? Or have we abandoned the idea of page-type-specific composition completely? (Or just out of scope for this?)

Reply 08:45, 1 June 2018 5 years ago

Duesentrieb (talkcontribs)

I'm not worried about circular references at runtime. Just about circular static dependencies.

As to PageTypeHandler: we'll come back to that concept when we thing about refactoring the Article class. And we may end up using it in RevisionRenderer. But I indeed consider that out of scope for this document.

Reply 15:52, 4 June 2018 5 years ago

Anomie (talkcontribs)

I think it's about having RenderedRevision take a RevisionRenderer, and RevisionRenderer having a method that returns a RenderedRevision even though RenderedRevision doesn't use that method. But Daniel might correct me.

I think we agreed to not worry about page types for now. If it turns out we do need different combining logic by page type then it can be added in easily enough as you suggest.

Reply 19:39, 1 June 2018 5 years ago

Reply to "Further refactoring"