Topic on Extension talk:WikibaseMediaInfo/RDF mapping

Smalyshev (WMF) (talkcontribs)

Looks good to me. One thing that seems to be missing is the link to the file URL: either the wiki page or the actual media file (or maybe both?).

Smalyshev (WMF) (talkcontribs)

Oops, I missed the extended part; that looks good. I just wonder if we want the wiki page too. It can probably be derived from the content URL (or not?), but it may be useful to have it explicitly.

Tpt (talkcontribs)

Thank you for taking a look at it. Indeed, having the wiki page would be great. What about using the same structure as item sitelinks?

<https://commons.wikimedia.org/wiki/File:Wikidata_time-latitude_visualization_-_2016-10-24.png> a schema:Article ;
     schema:about wd:M22222 ;
     schema:isPartOf <https://commons.wikimedia.org/> ;
     schema:name "Wikidata time-latitude visualization - 2016-10-24.png"@und .

I also missed some important points:

  1. We need a URI prefix for MediaInfo entity URIs. We should probably use the namespace http://commons.wikimedia.org/entity/, but we need a name for the prefix.
  2. It would be nice to have a relation between the MediaInfo entity and the file name. schema:name seems to be the right schema.org property for that, but it is already used for item labels, which are more similar to MediaInfo captions.
  3. In my proposal, none of the entity URI, the wiki page URI, and the URI of the file itself is the one used by the commons media datatype, which makes it very hard to, e.g., write a query over both MediaInfo and the Wikidata "image" property. We could change the URI of the file in my proposal to the one used by the commons media datatype, but that is not very nice because it is not the actual final file URL (the URI that commons media uses redirects to it). I believe it would be better to introduce a normalized value for the commons media datatype that gives the URI of the MediaInfo entity. Generating it would require an SQL query against the images metadata table. Tpt (talk) 12:33, 8 March 2019 (UTC)
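
To make point 3 concrete, here is a rough sketch of how such a normalized value could be computed. It is only an illustration under several assumptions: that commons media values are Special:FilePath redirect URIs, that the entity namespace is the http://commons.wikimedia.org/entity/ one suggested in point 1, and that the file-name-to-entity-ID lookup (stubbed here as a plain dict named entity_ids, a hypothetical stand-in for the SQL query) is available. The function names are made up for this example.

```python
from urllib.parse import unquote, urlsplit

# Namespace suggested in the thread; the prefix name was still undecided.
ENTITY_NS = "http://commons.wikimedia.org/entity/"
# Assumption: commons media values are Special:FilePath redirect URIs.
FILEPATH_MARKER = "/wiki/Special:FilePath/"


def file_name_from_media_uri(media_uri: str) -> str:
    """Extract the plain file name from a commons media value URI."""
    path = urlsplit(media_uri).path
    if not path.startswith(FILEPATH_MARKER):
        raise ValueError(f"not a Special:FilePath URI: {media_uri}")
    # Decode percent-escapes and treat underscores as spaces, as in page titles.
    return unquote(path[len(FILEPATH_MARKER):]).replace("_", " ")


def normalized_media_value(media_uri: str, entity_ids: dict):
    """Return the MediaInfo entity URI for a commons media value, or None.

    entity_ids is a hypothetical file-name -> entity-ID mapping standing in
    for the database lookup mentioned in the thread.
    """
    entity_id = entity_ids.get(file_name_from_media_uri(media_uri))
    return ENTITY_NS + entity_id if entity_id else None


if __name__ == "__main__":
    uri = ("http://commons.wikimedia.org/wiki/Special:FilePath/"
           "Wikidata%20time-latitude%20visualization%20-%202016-10-24.png")
    ids = {"Wikidata time-latitude visualization - 2016-10-24.png": "M22222"}
    print(normalized_media_value(uri, ids))
```

With a value like this, a query could join the Wikidata "image" property directly to the MediaInfo entity; the cost is the extra lookup at dump generation time.
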
Smalyshev (WMF) (talkcontribs)

Not sure we need schema:name - does it add anything really? It's basically repeating the URL.

  1. Note that the /entity/ URL requires some redirect setup too; look at how it works on Wikidata. But good point that Commons entities live on Commons, so they can't use Wikidata URLs.
  2. I think schema:about already links the page to the URL, which is essentially the file name, so I'm not sure it's necessary. It depends on the use cases: string manipulation slows things down, so it ultimately depends on the queries we're going to need. Also note that file names, unlike most article names, tend to be long, so duplicating them would have a non-negligible performance cost (we have 50M of them!).
  3. That's an excellent point; we probably want to harmonize this one way or another. Ideally, the sitelink URL and the commons media URL should be the same.
Tpt (talkcontribs)

Thank you for your feedback!

Indeed, schema:name does not add any extra information. But I believe it is nice to have it on the sitelink description or the file description, because it is highly likely that external tools using Commons content rely on it. What we could do is leave it out of the first version and see if there are requests for it. What do you think?

  1. Indeed. Thanks!
  2. OK. At least there is no point in having it on both the sitelink node and the MediaInfo node. Do you have a preference for the prefix, so that I can update the example?
  3. The commons media URL currently points to a special page that redirects to the file itself (the target of schema:contentUrl in my proposal), so it would be a big breaking change. If you plan to do that, I would prefer pointing to the MediaInfo entity, because it is the root of the structured data description of the file and so enables more interesting queries without an extra triple pattern in the query. But with the current way data is stored, it would make RDF dump generation more costly, so it may not be a good idea.
  4. Another, minor point: what do you think the datatype of the value of schema:duration should be? xsd:integer or xsd:duration?
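
For comparison, the two candidate encodings in point 4 could look like the output of the sketch below. It assumes the duration is available as a whole number of seconds, which the thread does not state, and the helper names are hypothetical. The xsd:duration form follows the ISO 8601 style lexical format (e.g. PT1M30S).

```python
def seconds_to_xsd_duration(total_seconds: int) -> str:
    """Format a non-negative number of seconds as an xsd:duration lexical value."""
    if total_seconds < 0:
        raise ValueError("negative durations are not handled in this sketch")
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    parts = ""
    if hours:
        parts += f"{hours}H"
    if minutes:
        parts += f"{minutes}M"
    # Always emit at least one component so that zero becomes "PT0S".
    if seconds or not parts:
        parts += f"{seconds}S"
    return "PT" + parts


def duration_literals(total_seconds: int) -> tuple:
    """Return the same duration as both candidate typed literals (Turtle syntax)."""
    return (
        f'"{total_seconds}"^^xsd:integer',
        f'"{seconds_to_xsd_duration(total_seconds)}"^^xsd:duration',
    )


if __name__ == "__main__":
    for literal in duration_literals(90):
        print("schema:duration", literal)
```

The integer form is easier to compare and sum in queries, while the xsd:duration form carries its unit with it; which one fits better depends on the queries people end up writing.
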
Reply to "File URL?"