Architecture Repository/Artifacts/Knowledge store


Free knowledge data model based on schema.org

Status: v1 published May 2021 to inform the creation of the schema for Wikimedia Enterprise. For the current Wikimedia Enterprise schema, see the data dictionary on enterprise.wikimedia.com.

The purpose of this document is to define a predictable structure for distributing Wikimedia content. To do this, we’ve chosen to use standard types and properties from schema.org. This model is not meant to replace existing data structures within MediaWiki; instead, these structures can act as part of a distribution layer that consumes, structures, and serves knowledge beyond Wikimedia.

Using this model
We encourage Wikimedia projects to make use of this model, either as a whole or as a base to build on. Services currently using this model include Phoenix (structured content proof of value) and Wikimedia Enterprise.

Adding a property
As defined here, the model is restricted to properties that are meaningful outside the context of MediaWiki. To suggest a new property, leave a comment on the talk page. New properties should conform to the applicable schema.org type whenever possible.

Feedback and questions
To share feedback and questions, leave a comment on the talk page. Note that there are often several unknowns associated with each type; these unknowns are tracked in the notes and questions subsections.

Patterns
Canonical data modeling

Capabilities
Serve and distribute

Distribute predictably structured knowledge to products and platforms

Language
A human language. Based on schema.org Language.

Notes and questions

Project
A wiki in a single language. Based on schema.org CreativeWork (not on schema.org Project).

Notes and questions
 * How should we handle inLanguage for multi-lingual projects? (Commons, Wikispecies, Wikidata, etc.)

Page
A wiki page. Based on schema.org Article.

Notes and questions


 * Consider using display title for  instead of reading-friendly title
 * How should we handle media files associated with a page? Schema.org has audio, video, thumbnailUrl, and primaryImageOfPage (MediaObject). Note that primaryImageOfPage belongs to the WebPage type, not to Article.
 * How should we handle licenses for images embedded in a page? (Check with legal.)
 * Should we include other URLs (mobile, edit, talk, etc.)? Schema.org has discussionUrl but no others.
 * We’ve intentionally not included content at the page level in favor of providing content at the section level.
 * Is it a problem that isPartOf would be inconsistent between objects?
 * Properties to consider:
   * about - Rosette or other set of page subjects (Wikidata items)
   * interactionStatistic seems like the most logical place for pageviews, number of edits, etc. What types of stats should we include? (array of InteractionCounter)
   * mentions - array of Thing, links included within the page
   * abstract: Is there a way we could get the first two sentences of the article?
   * citation (references used on the page)
   * schemaVersion (https://schema.org/docs/releases.html#v12.0) seems like a good idea, but I’m struggling to see the value. These releases seem to come out every few months.
   * page quality score (aggregateRating?)
   * copyrightHolder - “The text of Wikipedia is copyrighted (automatically, under the Berne Convention) by Wikipedia editors and contributors and is formally licensed to the public under one or several liberal licenses.”[1] (Covered by license?)
   * dateCreated (page’s initial publication date)
   * creativeWorkStatus
   * creditText (attribution text)
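
To make the open questions above more concrete, here is a minimal sketch of how a page might be serialized as a schema.org Article in JSON-LD. The helper name build_page, the selection of properties (name, url, inLanguage, isPartOf, license, interactionStatistic), and all example values are assumptions for illustration, not the published model. In particular, using ViewAction to represent pageviews is only one possible reading of interactionStatistic.

```python
import json


def build_page(name, url, language_code, project_url, license_url, pageviews=None):
    """Build a JSON-LD dict describing a wiki page as a schema.org Article.

    Hypothetical helper: the chosen properties are assumptions for
    illustration, not the published model.
    """
    page = {
        "@context": "https://schema.org",
        "@type": "Article",
        "name": name,
        "url": url,
        # Language, Project, and License are nested as their own objects.
        "inLanguage": {"@type": "Language", "identifier": language_code},
        "isPartOf": {"@type": "CreativeWork", "url": project_url},
        "license": {"@type": "CreativeWork", "url": license_url},
    }
    if pageviews is not None:
        # Assumption: pageviews are modeled as a ViewAction counter.
        page["interactionStatistic"] = [
            {
                "@type": "InteractionCounter",
                "interactionType": "https://schema.org/ViewAction",
                "userInteractionCount": pageviews,
            }
        ]
    return page


page = build_page(
    name="Earth",
    url="https://en.wikipedia.org/wiki/Earth",
    language_code="en",
    project_url="https://en.wikipedia.org",
    license_url="https://creativecommons.org/licenses/by-sa/3.0/",
    pageviews=1000,  # made-up value for illustration
)
print(json.dumps(page, indent=2))
```

Omitting pageviews leaves interactionStatistic out entirely, which keeps the object valid while the question of which statistics to include remains open.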

Section
Content grouped under a heading, or as an introduction before the first heading on a page. Based on schema.org CreativeWork.

Notes and questions
 * Properties to consider:
   * - Rosette or other set of page subjects (Wikidata items)
   * - Rosette or other set of page subjects (Wikidata items)
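
The definition of a section (content under a heading, plus an introduction before the first heading) can be sketched as a small grouping function. This is a simplified illustration, not how any existing service does it: the function name split_sections, the use of raw wikitext `== Heading ==` lines as input, and the output properties (name, isPartOf, text) are all assumptions.

```python
import re

# Matches wikitext headings such as "== History ==" or "=== Early life ===".
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")


def split_sections(page_name, wikitext):
    """Group lines of wikitext into sections (hypothetical helper).

    The first group has no heading and holds the introduction.
    """
    groups = [(None, [])]  # (heading name, lines); None marks the introduction
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            groups.append((match.group(2), []))
        else:
            groups[-1][1].append(line)

    sections = []
    for name, lines in groups:
        text = "\n".join(lines).strip()
        if name is None and not text:
            continue  # the page starts with a heading, so there is no introduction
        sections.append(
            {
                "@type": "CreativeWork",
                "name": name,
                "isPartOf": page_name,
                "text": text,
            }
        )
    return sections


sample = (
    "Earth is a planet.\n"
    "== History ==\n"
    "It formed long ago.\n"
    "== Orbit ==\n"
    "It orbits the Sun."
)
sections = split_sections("Earth", sample)
for section in sections:
    print(section["name"], "->", section["text"])
```

Representing the introduction as a section with no name is one design option; a fixed label would work equally well and is left open here.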

License
A content license. Based on schema.org CreativeWork.

Notes and questions

Entity
A subject of a page. Based on schema.org Thing.

Notes and questions
 * Connection with Wikidata
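
As a recap, the type mapping described in this document can be written down as a small lookup table. The mapping itself comes from the sections above; the helper name stub, the JSON-LD output shape, and the use of identifier and url to point an Entity at a Wikidata item are assumptions, since the connection with Wikidata is still an open question.

```python
# Model type -> schema.org type, as stated in the sections above.
SCHEMA_ORG_TYPE = {
    "Language": "Language",
    "Project": "CreativeWork",  # deliberately not schema.org Project
    "Page": "Article",
    "Section": "CreativeWork",
    "License": "CreativeWork",
    "Entity": "Thing",
}


def stub(model_type, **properties):
    """Return a JSON-LD stub for a model type (hypothetical helper)."""
    if model_type not in SCHEMA_ORG_TYPE:
        raise KeyError(f"unknown model type: {model_type}")
    return {
        "@context": "https://schema.org",
        "@type": SCHEMA_ORG_TYPE[model_type],
        **properties,
    }


# Assumption: an Entity points at its Wikidata item via identifier and url.
entity = stub(
    "Entity",
    name="Earth",
    identifier="Q2",
    url="https://www.wikidata.org/entity/Q2",
)
print(entity)
```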