Topic on Talk:Requests for comment/Schema update for multiple content objects per revision (MCR) in XML dumps

Duesentrieb (talk | contribs)

The proposed dump format still uses numeric text IDs. That cannot be guaranteed to work: text blobs are now identified by URL-like blob addresses. "tt:12345" is the address of text row 12345, and we may start using "ext:DB:..." for ExternalStore soon.

So, instead of <text id="305112983" bytes="143" /> we need to use <text id="tt:305112983" bytes="143" />. The numeric form could still be supported for backwards compatibility, with the prefix "tt:" being assumed if none is given.
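
To illustrate the backwards-compatible reading, here is a minimal Python sketch of how a dump consumer might normalize the id attribute. The function names (normalize_blob_address, text_address) and the validation rules are assumptions made for illustration only; they are not part of MediaWiki or of any existing dump tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: "tt" is the scheme implied when the id carries no prefix.
DEFAULT_SCHEME = "tt"


def normalize_blob_address(text_id: str) -> str:
    """Return a prefixed blob address, assuming "tt:" for bare numeric IDs."""
    value = text_id.strip()
    if ":" in value:
        # Already a blob address; only check that both parts are non-empty.
        scheme, _, rest = value.partition(":")
        if not scheme or not rest:
            raise ValueError(f"malformed blob address: {text_id!r}")
        return value
    if value.isascii() and value.isdigit():
        # Legacy numeric form: treat it as a text-table row ID.
        return f"{DEFAULT_SCHEME}:{value}"
    raise ValueError(f"unrecognized text id: {text_id!r}")


def text_address(xml_fragment: str) -> str:
    """Read the id attribute of a <text> element and normalize it."""
    element = ET.fromstring(xml_fragment)
    return normalize_blob_address(element.get("id", ""))
```

With this reading, <text id="305112983" bytes="143" /> and <text id="tt:305112983" bytes="143" /> resolve to the same address, while an address with any other prefix passes through unchanged.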

Tgr (WMF) (talk | contribs)

Or it could be hidden as an internal detail. That's a B/C break, but it seems some kind of break is necessary anyway? The blob IDs do not seem to serve any useful purpose that's not already served by the sha1.
