Topic on Talk:Wikibase/Indexing/RDF Dump Format

Triple representation of labels

Pfps (talkcontribs)

The description of the RDF dumps says:

Entity labels - the main name of the entity. Labels are defined as schema:name, rdfs:label and skos:prefLabel predicates with objects being language-tagged string literals.

Why say the same thing thrice, particularly as there are lots of labels for many items?
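To make the redundancy concrete, here is a minimal Python sketch that writes out the three triples the quoted description implies for a single label. Only the three predicates come from the quoted text; the entity IRI, the label text, and the function name `label_triples` are illustrative assumptions, and the output is written as N-Triples purely for readability.

```python
# The three label predicates named in the quoted description, as full IRIs.
LABEL_PREDICATES = [
    "http://schema.org/name",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
]


def label_triples(entity_iri, text, lang):
    """Return one N-Triples line per label predicate for a single label."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'<{entity_iri}> <{pred}> "{escaped}"@{lang} .'
        for pred in LABEL_PREDICATES
    ]


# Hypothetical entity and label, for illustration only.
for line in label_triples(
    "http://www.wikidata.org/entity/Q42", "Douglas Adams", "en"
):
    print(line)
```

Each label in each language yields three lines that differ only in the predicate, which is the repetition being asked about.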

JanZerebecki (talkcontribs)

The idea was to be compatible with all 3 ontologies, so that things work if you support one of them.

Pfps (talkcontribs)

The problem is that this triple representation adds a *lot* of redundant information. The same is true of the multiple representations of values.

This would not be a problem if it were easy to ignore the "other" versions of the information one wants, but this dump has everything in one very large file. Because the file is in Turtle format, simple textual methods cannot be trusted to find and remove the pieces that are not needed.
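For contrast, a line-based serialization would make such filtering simple. The following Python sketch assumes the data has first been converted to N-Triples (one complete triple per line), which is not something this thread describes; the function name `drop_redundant_labels` and the sample lines are hypothetical. It would not be safe on Turtle, where a single subject can span many lines through `;` and `,` abbreviations.

```python
# Predicates to discard, written as they appear in N-Triples.
# Assumption: the consumer only needs rdfs:label.
DROP = {
    "<http://schema.org/name>",
    "<http://www.w3.org/2004/02/skos/core#prefLabel>",
}


def drop_redundant_labels(lines, drop=DROP):
    """Yield N-Triples lines whose predicate is not in `drop`."""
    for line in lines:
        # Subject and predicate contain no whitespace in N-Triples,
        # so the first two whitespace-separated tokens identify them.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] in drop:
            continue
        # Blank lines, comments and all other triples pass through.
        yield line


# Hypothetical sample input, for illustration only.
sample = [
    '<http://www.wikidata.org/entity/Q42> <http://schema.org/name> "Douglas Adams"@en .',
    '<http://www.wikidata.org/entity/Q42> <http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
    '<http://www.wikidata.org/entity/Q42> <http://www.w3.org/2004/02/skos/core#prefLabel> "Douglas Adams"@en .',
]
kept = list(drop_redundant_labels(sample))
for line in kept:
    print(line)
```

Because the function is a generator, it can be applied to a very large file one line at a time without loading the whole file into memory.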
