User:Brooke Vibber/MCR alternative thoughts

From mediawiki.org

Some late-night thoughts on the MCR model and possible alternatives, just thinking out loud. Might be a little rambling.


Two major areas to look at in alternate proposals to multi-content revisions, based on various discussions.

  1. Separate tables for separate content types
  2. Separating out the atomic bundling concern

Tables

Separate tables for separate content types doesn’t sound so exciting if every type is a blob reference, but there are good reasons why they might not always (or often) be such.

Some content types will be singletons per page, but others allow multiple items, such as ideas for storing categories, metadata, citation data, etc. With blobs, each type still has to reinvent array storage and addressing for the items within its blobs, or else find a way to address an open-ended set of slot entries. With tables, each type gets a choice of one-to-one or one-to-many tables, and of how to key and sort them.

Structured data may also choose to store directly in db tables rather than as blobs. This might be more compact, or avoid the need to ‘compile’ summary tables from source blobs. Depending on how the schema works, it might also encode versioning more efficiently than blob-per-version.


  • page: title “Foobar”
    • revision 1:M:
      • text 1:1: ES blob for “hello this is text”
      • catlinks_rev 1:M: references to target titles (title strings can be indirected to make these int-int pairs for each rev)
      • citations_rev 1:M: ES blobs containing JSON of revisions

Etc
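
As a rough illustration of the layout above, here is a minimal sketch using an in-memory SQLite database. Every table and column name (text_rev, cat_title, clr_rev and so on) is made up for the example and is not MediaWiki's actual schema; the blob address string is a placeholder too. It just shows a 1:1 text table next to a 1:M category table whose title strings are indirected into int-int pairs.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not MediaWiki's real tables.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY,
    rev_page INTEGER NOT NULL REFERENCES page(page_id)
);
-- 1:1 with revision: one external-storage blob address per revision
CREATE TABLE text_rev (
    tr_rev INTEGER PRIMARY KEY REFERENCES revision(rev_id),
    tr_blob_address TEXT NOT NULL
);
-- Title strings are stored once, so the link table holds int-int pairs
CREATE TABLE cat_title (ct_id INTEGER PRIMARY KEY, ct_title TEXT UNIQUE NOT NULL);
-- 1:M with revision: any number of category links per revision
CREATE TABLE catlinks_rev (
    clr_rev INTEGER NOT NULL REFERENCES revision(rev_id),
    clr_target INTEGER NOT NULL REFERENCES cat_title(ct_id),
    PRIMARY KEY (clr_rev, clr_target)
);
"""


def build_demo():
    """Create the demo database with one page, one revision, two categories."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO page VALUES (1, 'Foobar')")
    db.execute("INSERT INTO revision VALUES (10, 1)")
    db.execute("INSERT INTO text_rev VALUES (10, 'es:placeholder-address')")
    db.executemany(
        "INSERT INTO cat_title VALUES (?, ?)",
        [(1, "Examples"), (2, "Greetings")],
    )
    db.executemany("INSERT INTO catlinks_rev VALUES (?, ?)", [(10, 1), (10, 2)])
    return db


def categories_for_revision(db, rev_id):
    """Return the category titles attached to one revision, sorted by title."""
    rows = db.execute(
        "SELECT ct_title FROM catlinks_rev "
        "JOIN cat_title ON ct_id = clr_target "
        "WHERE clr_rev = ? ORDER BY ct_title",
        (rev_id,),
    )
    return [row[0] for row in rows]
```

Note the join in categories_for_revision: that is the kind of extra query work the limitations below refer to, while a plain text pull only touches text_rev.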

Limitations:

  • still have to separately figure out how to extend import/export and other things to include such data -- the MCR model adds blobs that will 'just work' in import/export
  • if you are doing 1:1 blobs, then each extension has to do more work to support them...
  • still may or may not need to add joins to various queries (but queries that just do 'plaintext' pulls don't change)
  • doesn't by itself do anything to compact the existing revision table (since it leaves that table unchanged)

Bundling

Separating out the atomic bundling concern reminds me of the elegance, and yet the frequent practical difficulties, of “resource forks” and multiple data streams in old Mac OS’s filesystem.

Each file in the old Mac OS came with a “data fork” which was a traditional file -- a linear stream of bytes -- plus a “resource fork” which could be read separately in a structured fashion. This allowed any file to bear a custom icon, or applications to have user interface elements stripped and replaced with localized versions.

But despite the obvious benefits of combining these additional resources into a single addressable file, there were difficulties, in particular with interchange with other operating systems. Network disk shares, copying files to a DOS diskette, or transferring a file over a modem could either destroy the resource data entirely, or add a bunch of weird extra encoding that itself caused compatibility difficulties.

Eventually Mac OS merged with NeXTStep to become Mac OS X (now “macOS” as of the new 10.12 release) which took a very different approach to bundling resources: the directory.

“Bundles” of files contained under a subdirectory could be treated in the user interface as a single element. Mostly this was already provided by the view of dirs/folders in a file manager; one need only hide the ability to dive into the directory.

A network or removable disk filesystem didn’t need to understand bundle directories -- you just had to copy in the files and directories and they worked on the other end.

This still wasn’t perfect; attaching a “file” that’s really a bundle to an email might require the extra step of compressing it to an archive, but this could usually be done in a good UI, and for the most part things don’t explode even when they look weird.


In comparison to “bundling” wiki pages of different content types into a single item that can be created, viewed, edited, moved, exported, imported etc atomically... we do have the existing usage of “subpages” similar to subdirectories in a filesystem.

Our page-move system, for instance, includes support for moving the “subpages” along with the parent page… but others, like Special:Export, may not automatically handle such magic.

Should we fight against the existing usage in our effort to provide something better? Or could we adopt it and improve on it…?

  • Figure out how to distinguish reliably between “titles that include a / in them” and “subpage roots”
  • Make sure various things that address subpage roots have reasonable behavior with respect to the subpages
    • Export of a template should include its doc subpages, etc
  • Good UI magic for navigating within a subpage tree
  • Figure out how to do atomic versioning of a bundle for things that need that
    • Doc pages should be consistent with their templates
    • Gadget or userscript CSS and JS should be consistent with each other
    • Metadata blobs should reflect the file pages they come from
    • File description pages should reflect the file versions they describe
  • Do we need a metadata field that refers to a bundle version?
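
One possible heuristic for the first two bullets, sketched in Python. The rule used here is an assumption for illustration, not existing MediaWiki behavior: a title counts as a subpage only if some shorter prefix before a "/" exists as a page, and the shortest such prefix is taken as the bundle root. The function names bundle_root and bundle_members are hypothetical.

```python
def bundle_root(title, existing):
    """Return the bundle root for a title, given the set of existing titles.

    Assumed rule: the shortest existing ancestor (split on "/") is the root.
    A title containing "/" with no existing ancestor is its own root.
    """
    parts = title.split("/")
    for i in range(1, len(parts)):
        candidate = "/".join(parts[:i])
        if candidate in existing:
            return candidate
    return title


def bundle_members(root, existing):
    """Return the root plus every existing title that resolves to it, sorted."""
    return sorted(t for t in existing if bundle_root(t, existing) == root)


existing_titles = {
    "Template:Foo",
    "Template:Foo/doc",
    "Template:Foo/doc/examples",
    "AC/DC",  # a slash in the title, but no page "AC", so not a subpage
}
```

Under this rule an export of Template:Foo could call bundle_members to pick up its doc subpages, while AC/DC stays a standalone page. The rule would misfire if someone later created a page named AC, which is part of why the distinction probably needs to be stored rather than guessed.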

Versioning thoughts. Compare with git: a commit contains a reference to the current version of its top-level tree, which in turn references the current versions of its subtrees and files; because we enter from the top level, we always see a consistent set of versions -- there might have been 6 files updated in the revision we look at, but we treat those updates as belonging to the same event and never pull inconsistent revisions of different files.

In wiki land, we could have each revision of the root page associated with a subpage bundle versioning set, which is consistently updated.
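
A toy sketch of that idea, loosely modelled on how a git commit pins a whole tree. The Wiki class and its method names are hypothetical, and storage is just in-memory dicts. Each bundle version is a manifest mapping title to revision id, copied forward from the previous manifest so that untouched subpages stay pinned to the revision they already had.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    # title -> list of revision texts; a revision id is its index in the list
    pages: dict = field(default_factory=dict)
    # root title -> list of manifests, one per bundle version
    bundles: dict = field(default_factory=dict)

    def save(self, title, text):
        """Store a new revision of one page and return its revision id."""
        revisions = self.pages.setdefault(title, [])
        revisions.append(text)
        return len(revisions) - 1

    def commit_bundle(self, root, changes):
        """Save all changed pages as one event and record a new manifest."""
        history = self.bundles.setdefault(root, [])
        # Start from the previous manifest so unchanged pages stay pinned.
        manifest = dict(history[-1]) if history else {}
        for title, text in changes.items():
            manifest[title] = self.save(title, text)
        history.append(manifest)
        return len(history) - 1

    def read_bundle(self, root, version):
        """Return title -> text for one consistent bundle version."""
        manifest = self.bundles[root][version]
        return {title: self.pages[title][rev] for title, rev in manifest.items()}
```

For example, committing Template:Foo together with Template:Foo/doc, then committing a change to Template:Foo alone, gives two bundle versions; reading either one always returns a matching template and doc pair, never a mix drawn from different events.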

If the root page is of a special content type then it might encode those versions itself. But how to associate, say, a bundle of a template page with its documentation page without changing how editing the template works?

Is this a problem that isn’t specific to subpage bundles, though? Is this the same problem with, say, versioning of an image file or template used in a wiki document? -> dependency tracking/management

Maybe that’s something we should consider in more detail, separately from the bundling issue.