User:Tpt/WikibaseMediaInfo RDF Dump Format
This is an experiment on how to represent MediaInfo Wikibase entities in RDF.
It extends the Wikibase RDF model and proposes to reuse the schema.org vocabulary as much as possible. Schema.org is already used extensively in the Wikibase RDF representation and provides an important set of properties and types for media content. This proposal also aims to be consistent with the Wikibase Lexeme RDF model.
This section proposes a "basic" representation of the MediaInfo entities, aiming at providing a full mapping of the entity data but without information derived from other sources (MediaWiki file metadata...).
    wd:M222222 a wikibase:MediaInfo , schema:MediaObject ;
        # caption
        schema:caption "a boat"@en ;
        rdfs:label "a boat"@en ;
        # statements
        wdt:P2 wd:Q3 ;
        wdt:P7 "value1" , "value2" ;
        p:P2 wds:Q3-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
        p:P7 wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
            wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c .
- The media info concept of Wikibase aligns well with schema:MediaObject. Having a class wikibase:MediaInfo would be convenient for consistency with the other entity types (wikibase:Lexeme...). It would be meaningful to have wikibase:MediaInfo rdfs:subClassOf schema:MediaObject in the ontology definition.
- The closest schema.org relation is schema:caption, which has the advantage of having the same name as the Wikibase feature and of being specific to media content. It would allow writing SPARQL queries that look for media files based on their caption without having to filter out e.g. Wikidata items. It is also worth adding rdfs:label to the RDF output (but probably not to the query service) for interoperability, similarly to what has been done for lexemes.
- For consistency and simplicity we could use the same schema as the other entity types.
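The caption handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Wikibase implementation: the function names `escape_literal` and `caption_triples` and the input shape (a dict from language code to caption text) are hypothetical. It shows each caption being emitted twice, once as schema:caption and once as rdfs:label.

```python
def escape_literal(text: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def caption_triples(entity_id: str, captions: dict[str, str]) -> str:
    """Build the type and caption triples of one MediaInfo entity as Turtle.

    `captions` maps a language code to the caption text (hypothetical input shape).
    """
    lines = [f"wd:{entity_id} a wikibase:MediaInfo , schema:MediaObject"]
    for lang, text in sorted(captions.items()):
        literal = f'"{escape_literal(text)}"@{lang}'
        # Same value under both predicates, as proposed for interoperability.
        lines.append(f"    schema:caption {literal}")
        lines.append(f"    rdfs:label {literal}")
    return " ;\n".join(lines) + " ."


print(caption_triples("M222222", {"en": "a boat"}))
```

Emitting both predicates from the same loop keeps the two values in sync, so a consumer that only understands rdfs:label sees exactly the captions that schema:caption exposes.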
This section proposes to extend the basic representation with other metadata already stored in the MediaWiki database to enable more SPARQL queries. Some of the properties proposed here only apply to some file types and should not appear on the other files.
Example (all properties are displayed here even if some would never appear together like
    wd:M222222 a wikibase:MediaInfo , schema:MediaObject , schema:VideoObject ;
        # basic file metadata
        schema:contentUrl <https://upload.wikimedia.org/wikipedia/commons/f/f7/Boat_movie.webm> ; # URL to the file itself
        schema:encodingFormat "video/webm" ; # File mime type
        schema:contentSize 123445 ; # File size in bytes
        schema:height 1024 ; # Image/video height in px
        schema:width 2048 ; # Image/video width in px
        schema:duration "PT123S"^^xsd:duration ; # Video duration
        schema:numberOfPages 12 ; # Number of pages in a multi-pages document
        # caption
        schema:caption "a boat sailing"@en ;
        rdfs:label "a boat sailing"@en ;
        # statements
        wdt:P2 wd:Q3 ;
        wdt:P7 "value1" , "value2" ;
        p:P2 wds:Q3-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
        p:P7 wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
            wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c .

    <https://commons.wikimedia.org/wiki/File:Boat_movie.webm> a schema:Article ;
        schema:about wd:M222222 ;
        schema:isPartOf <https://commons.wikimedia.org/> .
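The class list and the duration literal shown in the example above could be derived along the following lines. This is a minimal Python sketch, not part of the proposal: the `MEDIA_TYPE_TO_CLASS` table, the function names, and the choice of which MediaWiki media type names map to which class are assumptions made for illustration. schema:ImageObject and schema:AudioObject are the schema.org counterparts of schema:VideoObject for images and audio.

```python
# Hypothetical mapping from MediaWiki media type names to schema.org classes.
# Which media types should map to which class is an assumption, not a decision
# made by the proposal.
MEDIA_TYPE_TO_CLASS = {
    "BITMAP": "schema:ImageObject",
    "DRAWING": "schema:ImageObject",
    "AUDIO": "schema:AudioObject",
    "VIDEO": "schema:VideoObject",
}


def rdf_classes(media_type: str) -> list[str]:
    """Return the RDF classes for a file of the given media type."""
    classes = ["wikibase:MediaInfo", "schema:MediaObject"]
    specific = MEDIA_TYPE_TO_CLASS.get(media_type.upper())
    if specific is not None:
        # Only files with a known, more specific type get an extra class.
        classes.append(specific)
    return classes


def duration_literal(seconds: int) -> str:
    """Format a whole number of seconds as a typed xsd:duration literal."""
    if seconds < 0:
        raise ValueError("duration must not be negative")
    return f'"PT{seconds}S"^^xsd:duration'


print(" , ".join(rdf_classes("VIDEO")))
print(duration_literal(123))
```

The duration is left as a plain number of seconds inside the literal (for example `PT123S`) rather than split into minutes and hours. That is a valid xsd:duration lexical form and matches the value used in the example.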
- In addition to the wikibase:MediaInfo and schema:MediaObject classes we could add the classes schema:ImageObject, schema:AudioObject and schema:VideoObject to allow easy querying of only images, audio or video. These classes would be assigned based on the MediaWiki media type returned by
- schema:contentUrl would provide the direct canonical URL of the file itself. Could be provided by
- schema:encodingFormat would provide the MIME type of the file, making it possible to query only files of a given MIME type, compute statistics based on it... Could be provided by
- schema:contentSize would provide the size of the file in bytes. It would allow statistics on file size joined with data stored in statements (e.g. the size of all the uploads from a given partnership...). Could be provided by
- schema:height and schema:width would provide the height and width of the file if it is an image or a video. Could be provided by
- schema:duration would provide the duration of a video. Could be provided by File::getLength(). We need to choose whether to use the xsd:duration datatype as suggested by schema.org or just an integer containing the number of seconds.
- schema:numberOfPages would provide the number of pages of a multi-page file. Could be provided by File::pageCount(). It is a slight abuse to use this property here: in schema.org it is supposed to be used on