Topic on User talk:GWicke

OutputPage Microdata API

Dantman

GWicke, I've been going over the idea of APIs for adding global Microdata and RDFa data to a page. After thinking about it more, I can't see any value at all in an API that adds anything beyond one single page-wide Microdata item.

  • The only value itemprops have in the <head> is for specifying properties of a page-wide itemscope, i.e. http://schema.org/WebPage.
  • Anything other than that can be done inside the body.
  • Microdata has no way for two itemscopes to coexist on the same element, so no more than one item can apply to the head. Any other page-wide itemtype could only work by completely replacing schema.org's WebPage type.
  • schema.org's WebPage is page-wide both so that things like the name, image and description can be defined in the head, and so that in-body things like breadcrumbs, significant links and the primary image can be defined on it. Other itemtypes don't have that kind of value and have no reason to be page-wide inside the head.
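A minimal Python sketch of that single page-wide case, assuming a hypothetical render_page helper (it is not part of MediaWiki's OutputPage): the itemscope sits on <html>, so every <meta itemprop> in the head becomes a property of the one WebPage item, and anything else would have to live in the body.

```python
from html import escape

# The schema.org type used for the single page-wide item.
WEBPAGE = "http://schema.org/WebPage"


def render_page(props: dict, body: str) -> str:
    """Render a page whose only page-wide item is a schema.org WebPage.

    The itemscope sits on <html>, so each <meta itemprop> in <head>
    becomes a property of that one item. Any other item would have to
    be marked up inside <body>.
    """
    metas = "\n".join(
        f'    <meta itemprop="{escape(name)}" content="{escape(value)}">'
        for name, value in props.items()
    )
    return (
        f'<html itemscope itemtype="{WEBPAGE}">\n'
        "  <head>\n"
        f"{metas}\n"
        "  </head>\n"
        f"  <body>{body}</body>\n"
        "</html>"
    )
```

Because there is exactly one itemscope attribute in the output, there is no second page-wide item for head metadata to attach to.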
GWicke

I agree, which is also why I wouldn't mind adding that itemtype statically in the skin.

I had a conversation with Ian Hickson about the missing multi-itemtype support on Monday. It does not seem to be easy to convince him of the benefits. A set of very concrete and well-presented use cases seems to be about the only thing that might move him to consider it. He also stated that Google et al would adapt their custom indexing to anything we do anyway.

RDFa handles the multiple-itemtype case well, and has the additional advantage of hooking into RDFS et al. if needed. It might be something to consider for the parser DOM too, if we can't convince Hixie. The differences in DOM structure between the two mostly come down to slightly different attribute names and contents. Manu Sporny (the W3C RDFa-in-HTML5 WG chair) offered his support as well.
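To illustrate how close the two are, here is a rough Python sketch that maps one element's Microdata attributes onto RDFa Lite 1.1 attributes (vocab, typeof, property, resource). The function names are hypothetical; it assumes every itemscope carries an itemtype whose types share one vocabulary, and it drops itemref, which has no direct RDFa Lite counterpart.

```python
def split_iri(iri: str) -> tuple:
    """Split an IRI after its last '#' or '/' into (vocabulary, term)."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def microdata_to_rdfa(attrs: dict) -> dict:
    """Map one element's Microdata attributes onto RDFa Lite 1.1 ones.

    Assumes every itemscope carries an itemtype whose types share one
    vocabulary. itemref has no RDFa Lite counterpart and is dropped.
    """
    out = {}
    types = attrs.get("itemtype", "").split()
    if types:
        # itemscope + itemtype  ->  vocab + typeof
        out["vocab"] = split_iri(types[0])[0]
        out["typeof"] = " ".join(split_iri(t)[1] for t in types)
    if "itemprop" in attrs:
        # itemprop -> property (resolved against the vocab in scope)
        out["property"] = attrs["itemprop"]
    if "itemid" in attrs:
        # itemid -> resource (the item's global identifier)
        out["resource"] = attrs["itemid"]
    # Keep non-Microdata attributes such as content or href unchanged.
    for name, value in attrs.items():
        if not name.startswith("item"):
            out[name] = value
    return out
```

For example, itemscope itemtype="http://schema.org/WebPage" becomes vocab="http://schema.org/" typeof="WebPage", and itemprop="name" simply becomes property="name".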

Dantman

My initial thought was to create an API that could produce something like User:Dantman/Output Metadata/Example Output: an API that would let you define multiple Microdata items, where anything beyond the one global item would be appended to the <body> so that it would be parsed. However, WebPage also uses the body. So I thought of a hack that would give WebPage priority: if WebPage was used as an itemtype, the engine would implicitly make it the item on the body even if another extension inserted something else. But then where's the value in that, especially if another vocabulary does have a good reason to do the same? So it all fell apart.
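The placement rule described (and then rejected) above can be sketched as a small Python function. The name place_items is hypothetical; it only models which item type gets the single page-wide slot and which ones are appended to the body as separate itemscopes.

```python
from typing import List, Optional, Tuple

# Type that the hypothetical rule always prefers for the page-wide slot.
WEBPAGE = "http://schema.org/WebPage"


def place_items(itemtypes: List[str]) -> Tuple[Optional[str], List[str]]:
    """Split registered item types into (page-wide item, in-body items).

    WebPage wins the single page-wide slot whenever it is registered;
    otherwise the first registered type gets it. Everything else is
    appended to <body> as its own itemscope.
    """
    if not itemtypes:
        return None, []
    page_wide = WEBPAGE if WEBPAGE in itemtypes else itemtypes[0]
    in_body = [t for t in itemtypes if t != page_wide]
    return page_wide, in_body
```

The rule has to hardcode WebPage, so an extension whose vocabulary also has a good reason to be page-wide silently loses the slot. That is where the design falls apart.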

Originally I was only adding itemprop="" for completeness' sake. It's something <meta> supports, and I was adding support for the property="" attribute.

But right now I can't even find a good reason for its actual use anywhere at all. The Google +1 button that suggests its use actually parses OpenGraph data already. I doubt WebPage's name and description properties give much value, given they're already part of standard metadata. Heck, WebPage doesn't even have a way to separate name and site name like OpenGraph does. And I haven't quite figured out what some of the itemprops that actually belong to WebPage are for. Hell, WebPage has a mainContentOfPage that's supposed to indicate the primary contents, but it uses a WebPageElement, and most pages don't have types that fit into that at all.

So I'm inclined to leave out Microdata support entirely until someone comes up with a hard use case they want to support, at which point we can hardcode something in for it.

For skins, doing it the way you suggest sounds like a good idea. In that case we should add another hook or two to make sure that skins can tweak the attributes of the <html> tag, etc.
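A minimal sketch of such a hook, written in Python for illustration rather than MediaWiki's PHP. The names register_html_attribs_hook, render_html_open_tag and webpage_skin_hook are hypothetical, not existing MediaWiki hooks: each hook receives the <html> attribute dict and may modify it in place before the start tag is rendered.

```python
from html import escape
from typing import Callable, Dict, List

# A hook receives the <html> attribute dict and may modify it in place.
HtmlAttribsHook = Callable[[Dict[str, str]], None]

_html_attribs_hooks: List[HtmlAttribsHook] = []


def register_html_attribs_hook(hook: HtmlAttribsHook) -> None:
    """Hypothetical registration point for skins and extensions."""
    _html_attribs_hooks.append(hook)


def render_html_open_tag(lang: str) -> str:
    """Build the <html> start tag after every hook has had its turn."""
    attrs: Dict[str, str] = {"lang": lang}
    for hook in _html_attribs_hooks:
        hook(attrs)
    rendered = " ".join(f'{name}="{escape(value)}"' for name, value in attrs.items())
    return f"<html {rendered}>"


def webpage_skin_hook(attrs: Dict[str, str]) -> None:
    """Example skin hook: declare the static page-wide WebPage item."""
    attrs["itemscope"] = ""
    attrs["itemtype"] = "http://schema.org/WebPage"
```

A skin would call register_html_attribs_hook(webpage_skin_hook) once, and every rendered page would then open with the static WebPage itemscope on <html>.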


I actually spent most of yesterday reading through RDFa and getting a full understanding of its rules and properties. I actually kind of like it. Microdata still has its own likeable taste, and RDFa can notably be hard for general users to understand, but RDFa is still pretty good.

If Google et al. are going to work around us, then perhaps in the cases where we do find something we want to do with metadata on the global page, we should try using RDFa even if the data is actually supposed to be Microdata.

(I wonder what Google would think if we started embedding RDFa data in articles that let them parse out category information.)

For RDFa support I haven't thought up the full API yet. However, I did consider part of the sub-API. What do you think of this class: User:Dantman/Output Metadata/Namespace API? Its goal is to let us use XML/RDFa namespaces in a clean way that guarantees two extensions will never collide by using the same prefix for different namespaces.
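The collision-free prefix idea can be sketched as a small Python registry. This is not the class on the linked wiki page; PrefixRegistry and its methods are hypothetical. Each namespace URI gets exactly one prefix, and if an extension asks for a prefix already bound to a different URI, it gets a numbered variant instead.

```python
from typing import Dict


class PrefixRegistry:
    """Hypothetical registry of RDFa prefixes declared on <html>.

    Each namespace URI gets exactly one prefix. If an extension asks for
    a prefix already bound to a different URI, a numbered variant is
    handed out instead, so two extensions can never collide.
    """

    def __init__(self) -> None:
        self._by_prefix: Dict[str, str] = {}
        self._by_uri: Dict[str, str] = {}

    def register(self, preferred: str, uri: str) -> str:
        # Reuse the existing prefix if this namespace is already registered.
        if uri in self._by_uri:
            return self._by_uri[uri]
        prefix, n = preferred, 1
        # Append a counter until we find a prefix nobody has claimed.
        while prefix in self._by_prefix:
            n += 1
            prefix = f"{preferred}{n}"
        self._by_prefix[prefix] = uri
        self._by_uri[uri] = prefix
        return prefix

    def prefix_attribute(self) -> str:
        # Value for the RDFa 1.1 prefix="" attribute: "p1: uri1 p2: uri2".
        return " ".join(f"{p}: {u}" for p, u in self._by_prefix.items())
```

The important design point is that callers must emit CURIEs with the prefix that register() returns, never with the prefix they asked for.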

GWicke

The need for such a class illustrates why microdata tries so hard to avoid XML namespaces ;) In general it looks quite useful to me, at least for the management of global prefixes defined on the html element.

An alternative would be to discourage manually defining or using prefixes, and simply use full URIs. For compression, we could either rely on gzip to do its thing or convert URIs to prefixed names for known (and/or globally defined) prefixes.
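The second compression option, turning full URIs back into prefixed names for known prefixes, might look like this in Python. The compact function and the prefix table are illustrative assumptions. It picks the longest matching namespace and leaves the URI untouched when nothing matches, so the output is always either a CURIE or the original IRI.

```python
from typing import Dict


def compact(uri: str, prefixes: Dict[str, str]) -> str:
    """Turn a full URI into prefix:reference if a known namespace matches.

    Picks the longest matching namespace; returns the URI unchanged if
    none matches.
    """
    best = None
    for prefix, ns in prefixes.items():
        if uri.startswith(ns) and len(uri) > len(ns):
            if best is None or len(ns) > len(prefixes[best]):
                best = prefix
    if best is None:
        return uri
    return f"{best}:{uri[len(prefixes[best]):]}"
```

Any prefix used this way still has to be declared on the page (for example in the html element's prefix="" attribute) for an RDFa processor to expand it again.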

Dantman

Actually Microdata just wants to cater to the lowest common denominator. ;)

XML namespaces aren't that bad. In fact, this class is only useful at the global scope, because we'd be shoving things from multiple extensions into the head. Outside that narrow case, namespaces work fine without conflicts: even if the rest of the app is disconnected and could be using different namespaces, all you do is define the namespaces you actually use locally.

The class isn't really even needed; it's only there to avoid an extremely, extremely, extremely remote edge case which in all reality we will never run into. But it's a bug and it's fun to fix ;). It makes the code cleaner too.

Full URIs would be a bad idea. One note: property="" is a SafeCURIEorCURIEorIRI. Technically that means that if you define xmlns:http or prefix="http: ...", a full URI could be evaluated as a CURIE instead of an IRI.

But more importantly, OpenGraph is shit. OpenGraph even states "We've based the initial version of the protocol on RDFa". In other words, it's actually NOT RDFa. Then there's the fact that OpenGraph ignores <link> and uses <meta> to define URLs. And the way OpenGraph uses og:image and og:image:width, requiring the properties to appear in the right order, is technically incompatible with RDFa if an implementation doesn't preserve that order. Most OpenGraph implementations work by looping over <meta> elements and looking for property attributes starting with og:, and there doesn't seem to be anything on the OGP page saying they're wrong to do that. Facebook works fine even when you don't define an og: namespace, so I wonder what they're even doing to parse things. In other words, if we switch from og: to absolute URIs, practically every system that uses OpenGraph will likely stop understanding the data we output.
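The naive consumer behaviour described above can be sketched with Python's standard html.parser. This illustrates the pattern only; it is not Facebook's or any other real parser. It keys on the literal "og:" string, never reads prefix="" or xmlns:og, and attaches og:image:width to the most recent og:image, so document order matters.

```python
from html.parser import HTMLParser
from typing import Dict, List


class NaiveOpenGraphParser(HTMLParser):
    """Collect og:* properties the way many consumers reportedly do.

    Scans <meta> elements for property attributes that start with "og:"
    and ignores any namespace declarations. Structured properties such
    as og:image:width attach to the most recent og:image.
    """

    def __init__(self) -> None:
        super().__init__()
        self.props: Dict[str, str] = {}
        self.images: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop, content = a.get("property"), a.get("content")
        # Anything not literally prefixed "og:" is invisible to this parser.
        if not prop or not prop.startswith("og:") or content is None:
            return
        if prop == "og:image":
            self.images.append({"url": content})
        elif prop.startswith("og:image:") and self.images:
            # Order-dependent: belongs to whichever og:image came last.
            self.images[-1][prop[len("og:image:"):]] = content
        else:
            self.props[prop] = content
```

Fed a page that writes property="http://ogp.me/ns#description", this parser silently drops that property, which is the breakage to expect from switching og: to absolute URIs.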

GWicke

I do not care very much about OpenGraph. RDFa should have no trouble with absolute URIs.
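A simplified Python sketch of how an RDFa-style processor might resolve a single property="" value. It is not a full RDFa processor: the name resolve_property is hypothetical, and it ignores safe CURIEs, terms, @vocab and the initial context. Absolute URIs pass through untouched unless someone declares a prefix named like a URI scheme, which is the edge case raised earlier in the thread.

```python
from typing import Dict


def resolve_property(value: str, prefixes: Dict[str, str]) -> str:
    """Simplified resolution of one property="" token.

    If the part before the first ':' is a declared prefix, the value is
    expanded as a CURIE; otherwise it is kept as an absolute IRI.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        # Declared prefix: expand as prefix mapping + reference.
        return prefixes[prefix] + reference
    # No matching prefix: treat the value as an absolute IRI.
    return value
```

With no "http" prefix declared, "http://schema.org/name" comes back unchanged. Declaring prefix="http: http://example.org/ns#" would instead turn it into "http://example.org/ns#//schema.org/name".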
