Topic on Talk:Requests for comment/Schema update for multiple content objects per revision (MCR) in XML dumps

Notes on reviewing the new schema

EpochFail (talk | contribs)

I'm looking at https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/docs/export-0.11.xsd#241

A few notes:

  • The "sha1" field seems to be missing from the new ContentTextType that appears inside ContentType, yet it shows up in the examples on this page.
  • It appears as though every <content> tag (ContentTextType) has a "deleted" field. Is it possible to delete individual slots? Or is the deleted status just duplicated across all of the slots?
  • Can we add DELETED_RESTRICTED to the schema while we're refactoring?

A general note:

It's probably too late for this, but I would find the following change way more intuitive:

Original:

<revision>
  <id>123</id>
  <text>Some text</text>
  <sha1>abc123</sha1>
</revision>

Current:

<revision>
  <id>123</id>
  <text sha1="abc123">Some text</text>
  <sha1>ebf234</sha1>
  <content>
    <role>wd_entity</role>
    <format>text/json</format>
    <text sha1="cc23de">{"QID":1103379, ...}</text>
  </content>
</revision>

Proposed:

<revision>
  <id>123</id>
  <text>Some text</text>
  <sha1>abc123</sha1>
  <slots sha1="ebf234">
    <content role="wd_entity" sha1="cc23de" format="text/json">{"QID":1103379, ...}</content>
  </slots>
</revision>

This strategy preserves backwards compatibility and puts the slots-level sha1 in a more obvious location.
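To make the backwards-compatibility point concrete, here is a rough Python sketch of how a consumer written for the original layout would behave on the proposed one. The helper names `read_legacy` and `read_slots` are made up for illustration, and the snippet leaves out the XML namespace that real dump files declare, so treat it as a sketch under those assumptions rather than a working dump reader.

```python
import xml.etree.ElementTree as ET

# The "proposed" layout from this comment, copied as-is (the JSON body is
# left elided; it is only treated as opaque text here).
# Assumption: no XML namespace, unlike real dump files.
PROPOSED = """
<revision>
  <id>123</id>
  <text>Some text</text>
  <sha1>abc123</sha1>
  <slots sha1="ebf234">
    <content role="wd_entity" sha1="cc23de" format="text/json">{"QID":1103379, ...}</content>
  </slots>
</revision>
"""


def read_legacy(revision):
    """Hypothetical pre-MCR consumer: reads only the direct children it knows."""
    return {
        "id": revision.findtext("id"),
        "text": revision.findtext("text"),
        "sha1": revision.findtext("sha1"),
    }


def read_slots(revision):
    """Hypothetical MCR-aware consumer: reads the slots-level sha1 and each slot."""
    slots = revision.find("slots")
    if slots is None:
        # Old-style revision with no extra slots.
        return None, []
    contents = [
        {
            "role": c.get("role"),
            "format": c.get("format"),
            "sha1": c.get("sha1"),
            "body": c.text,
        }
        for c in slots.findall("content")
    ]
    return slots.get("sha1"), contents


revision = ET.fromstring(PROPOSED)
legacy = read_legacy(revision)
slots_sha1, extra = read_slots(revision)
```

The legacy reader still sees the main text and its sha1 exactly where they were in the original layout, because it never looks at the unknown <slots> element. Under the current layout, the same reader would pick up the revision-level <sha1> (ebf234) and pair it with the main text, whose own hash has moved to an attribute.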

EpochFail (talk | contribs)

One other note. It appears there's a new "id" attribute for the old <text> tag but it doesn't appear in the new <content> tag. Should the <content> tag have a sub-tag for "id"? Or maybe the new <text> tag should have an attribute? I'm unclear why "format" is a new tag but "sha1" remains an attribute of the new <text> tag.

<!-- This isn't a good idea; we should be using "ID" instead of "NMTOKEN" -->
<!-- However, "NMTOKEN" is strictest definition that is both compatible with existing -->
<!-- usage ([0-9]+) and with the "ID" type. -->
<attribute name="id" type="NMTOKEN" />