User:Tpt/WikibaseMediaInfo RDF Dump Format

This is an experiment on how to represent MediaInfo Wikibase entities in RDF.

It is an extension of the Wikibase RDF model. It proposes to reuse the schema.org vocabulary as much as possible. schema.org is already used intensively in the Wikibase RDF representation and provides an important set of properties and types for media content. This proposal also aims to be consistent with the Wikibase Lexeme RDF model.

Basic representation

This section proposes a "basic" representation of the MediaInfo entities, aiming at providing a full mapping of the entity data but without information derived from other sources (MediaWiki file metadata...).


wd:M222222 a wikibase:MediaInfo , schema:MediaObject ;
     # caption
     schema:caption "a boat"@en ;
     rdfs:label "a boat"@en ;

     # statements
     wdt:P2 wd:Q3 ;
     wdt:P7 "value1" , "value2" ;
     p:P2 wds:Q3-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
     p:P7 wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
          wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c .


The MediaInfo concept of Wikibase aligns well with schema:MediaObject. Having a class wikibase:MediaInfo would be convenient for consistency with the other entity types (wikibase:Item, wikibase:Lexeme...). It would be meaningful to state wikibase:MediaInfo rdfs:subClassOf schema:MediaObject in the ontology definition.
For captions, the closest relation is schema:caption, which has the advantage of having the same name as the Wikibase feature and of being specific to media content. It would allow writing SPARQL queries that look for media files based on their caption without having to filter out e.g. Wikidata items. It is also worthwhile to add rdfs:label to the RDF output (but probably not to the query service) for interoperability, similarly to what has been done for lexemes.
For statements, for consistency and simplicity, we could use the same schema as the other entity types.
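
The caption mapping described above can be illustrated with a small Python sketch that emits both a schema:caption and an rdfs:label triple for every caption. The helper names (escape_turtle, caption_triples) and the dictionary-based input are assumptions made for this illustration; they are not part of Wikibase, and the wd: prefix simply follows the example above.

```python
def escape_turtle(text):
    """Escape backslashes, double quotes and line breaks for a Turtle string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def caption_triples(entity_id, captions):
    """Return Turtle lines mapping each caption to schema:caption and rdfs:label.

    entity_id: hypothetical MediaInfo ID such as "M222222".
    captions: dict of language code -> caption text.
    """
    lines = []
    # Sort by language code so the output order is deterministic.
    for lang, text in sorted(captions.items()):
        literal = '"{}"@{}'.format(escape_turtle(text), lang)
        lines.append("wd:{} schema:caption {} .".format(entity_id, literal))
        lines.append("wd:{} rdfs:label {} .".format(entity_id, literal))
    return lines
```

Calling caption_triples("M222222", {"en": "a boat"}) yields the two caption lines of the example above, written as standalone triples. A dump meant only for the query service could skip the rdfs:label line.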

Extended representation

This section proposes to extend the basic representation with other metadata already stored in the MediaWiki database to enable more SPARQL queries. Some of the properties proposed here only apply to some file types and should not appear on the other files.

Example (all properties are displayed here even if some would never appear together like schema:numberOfPages and schema:duration):

wd:M222222 a wikibase:MediaInfo , schema:MediaObject , schema:VideoObject ;
     # basic file metadata
     schema:contentUrl <> ; # URL to the file itself
     schema:encodingFormat "video/webm" ; # File mime type
     schema:contentSize 123445 ; # File size in bytes
     schema:height 1024 ; # Image/video height in px
     schema:width 2048 ; # Image/video width in px
     schema:duration "PT123S"^^xsd:duration ; # Video duration
     schema:numberOfPages 12 ; # Number of pages in a multi-page document

     # caption
     schema:caption "a boat sailing"@en ;
     rdfs:label "a boat sailing"@en ;

     # statements
     wdt:P2 wd:Q3 ;
     wdt:P7 "value1" , "value2" ;
     p:P2 wds:Q3-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
     p:P7 wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
          wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c .

<> a schema:Article ;
     schema:about wd:M222222 ;
     schema:isPartOf <> .

In addition to the schema:MediaObject and wikibase:MediaInfo classes, we could add the classes schema:AudioObject, schema:ImageObject and schema:VideoObject to allow easy querying of only images, audio files or videos. These classes would be assigned based on the MediaWiki media type returned by File::getMediaType().
schema:contentUrl
would provide the direct canonical URL of the file itself. Could be provided by File::getFullUrl().
schema:encodingFormat
would provide the MIME type of the file, making it possible to query only files of a given MIME type, compute statistics based on it... Could be provided by File::getMimeType().
schema:contentSize
would provide the size of the file in bytes. It would be useful for statistics on file size joined with data stored in statements (e.g. the total size of all the uploads from a given partnership...). Could be provided by File::getSize().
schema:height and schema:width
would provide the height and width of the file if it is an image or a video. Could be provided by File::getHeight() and File::getWidth().
schema:duration
would provide the duration of a video. Could be provided by File::getLength(). We need to choose whether to use the xsd:duration datatype as suggested by schema.org or just an integer containing the number of seconds.
schema:numberOfPages
would provide the number of pages of a multi-page file. Could be provided by File::pageCount(). It is a slight abuse to use this property here: in schema.org it is supposed to be used on schema:Book.
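
Two of the points above, the choice of extra classes from the media type and the open question about how to serialize durations, can be sketched in Python as follows. This is only an illustration under stated assumptions: the keys are the media type strings that File::getMediaType() is expected to return, the mapping to schema.org classes is a guess at what the proposal intends, and the function names are hypothetical.

```python
# Assumed mapping from MediaWiki media type strings to schema.org classes.
# Media types without an obvious match (e.g. "OFFICE") get no extra class.
MEDIA_TYPE_CLASSES = {
    "BITMAP": "schema:ImageObject",
    "DRAWING": "schema:ImageObject",
    "AUDIO": "schema:AudioObject",
    "VIDEO": "schema:VideoObject",
}


def rdf_types(media_type):
    """Return the RDF classes for a file, given its MediaWiki media type."""
    types = ["wikibase:MediaInfo", "schema:MediaObject"]
    extra = MEDIA_TYPE_CLASSES.get(media_type)
    if extra is not None:
        types.append(extra)
    return types


def duration_literal(seconds, use_xsd_duration=True):
    """Serialize a duration as xsd:duration or as a plain integer of seconds.

    The value is rounded to whole seconds in both variants, which is a
    simplification chosen for this sketch.
    """
    whole_seconds = int(round(seconds))
    if use_xsd_duration:
        return '"PT{}S"^^xsd:duration'.format(whole_seconds)
    return str(whole_seconds)
```

With these helpers, rdf_types("VIDEO") gives the three classes used in the example above, and duration_literal(123) gives the "PT123S"^^xsd:duration literal, while duration_literal(123, use_xsd_duration=False) gives the integer alternative.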