User:Tpt/RFC

The goal of this proposal is to find:
 * 1) a good way to distribute book metadata between Wikidata, Commons and Wikisource so as to avoid duplication as much as possible.
 * 2) a way to implement book metadata storage on Wikisource.

Wikisource
Currently, book metadata is stored on Wikisource in a very unstructured way: metadata for books without scans is stored in templates such as the Header template of the English Wikisource, and metadata for books with scans is often duplicated between the Index: page and a header template in the main namespace, despite the existence of the "header" parameter of the tag.

Wikidata
see Wikidata books task force

Commons
Currently, scan metadata is stored in the Book template but, in the future, Commons will probably adopt a Wikidata-like metadata namespace that will allow machines to read file data easily.

Summary
So, for a book with a scan, work and manifestation metadata are currently duplicated between two and four times: on the File: page on Commons, on the Index: page of Wikisource, on the main namespace pages of Wikisource and on Wikidata.

Assumptions
In the following proposal we assume that:
 * metadata duplication is bad and should be avoided as much as possible.
 * we should distinguish between the work and manifestation/expression entities defined by FRBR. This distinction is needed because the Wikisources often host more than one edition of the same work.
 * we should distinguish between a book scan with its transcription and the works as presented in the Wikisource main namespace, because one scan may contain more than one work, as in collections, and one work may span more than one scan, as in books split into several volumes.

Proposal
 Work item (Wikidata)
          |
 Manifestation item (Wikidata)
     |                 |
     |        Scan data (Commons)
     |                 |
     |  Transcription data (Index:, Wikisource)
     |                 |
     Book: page (Wikisource)

BookManager
During Google Summer of Code 2013, Molly White, with help from ...., began building a new MediaWiki extension that adds the notion of a book to MediaWiki. ... (link to the presentation...).

I (Tpt) believe that the new Book: pages should be used alongside Index: pages because there is no one-to-one relationship between the two kinds of pages: Index: pages are about a printed book and its transcription, while Book: pages are about a version of a book as an abstract entity. So, there may be more than one Book: page for a single Index: page, as in collections (recueils) (TODO), or more than one Index: page for a single Book: page, as in books split into several volumes.
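The many-to-many relationship described above can be sketched as follows. This is only an illustration: the `BookIndexLinks` class and the page titles are hypothetical and are not part of BookManager or ProofreadPage.

```python
from collections import defaultdict


class BookIndexLinks:
    """Hypothetical in-memory store of Book: <-> Index: links."""

    def __init__(self):
        self._indexes_by_book = defaultdict(set)
        self._books_by_index = defaultdict(set)

    def link(self, book, index):
        # Record the relation in both directions so either side can be queried.
        self._indexes_by_book[book].add(index)
        self._books_by_index[index].add(book)

    def indexes_for(self, book):
        return sorted(self._indexes_by_book.get(book, ()))

    def books_for(self, index):
        return sorted(self._books_by_index.get(index, ()))


links = BookIndexLinks()
# A collection: one scan (Index:) that contains two works (Book:).
links.link("Book:Poem A", "Index:Collected Poems.djvu")
links.link("Book:Poem B", "Index:Collected Poems.djvu")
# A work split into two volumes: one Book: page, two Index: pages.
links.link("Book:Long Novel", "Index:Long Novel, vol. 1.djvu")
links.link("Book:Long Novel", "Index:Long Novel, vol. 2.djvu")
```

Neither side can simply store a single reference to the other, which is why the two page types are kept separate here.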

But I (Tpt) believe that the chosen metadata format has some strong disadvantages:
# it isn't really extensible or configurable, and so goes against the adaptability that is at the core of how Wikimedia projects are built.
# it is not compatible with the Wikidata data model, which will make integration with Wikidata, and probably Commons, very difficult.

Requirements
So, I believe that we should find a good metadata system that meets these requirements:
# it is usable by both the BookManager and ProofreadPage extensions, in order to have only one metadata system in Wikisource and to avoid spreading the modest development resources too thin.
# it allows data sharing between Wikisources and with Wikidata and Commons.
# it allows as smooth a transition as possible from the old metadata system.

The metadata system and its implementation are called MetadataBase in the rest of the document.

Wikibase
A first solution may be to use the Wikibase extension that powers Wikidata, with some small modifications to support BookManager- or ProofreadPage-specific features like the book structure. It would give us a very powerful metadata system that integrates very easily with Wikidata. The drawbacks are that it wouldn't allow us to have...

Custom metadata system
The second solution is to implement a subset of the Wikibase data model. The idea is that an "entry" (better name?), i.e. an Index: or a Book: page, is composed of a list of claims (as in Wikibase entities) that each contain only a main snak, i.e. a (property, value) tuple. The values are stored, like Wikibase values, in the format provided by the DataValues library (see the data types section for the beginning of a list of datatypes).
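As a rough illustration of this reduced data model, here is a minimal sketch in Python. The class names (`Snak`, `Claim`, `Entry`), the field names and the plain-string property identifiers are assumptions made for the example, not a proposed implementation.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Snak:
    property_id: str  # assumed identifier format, e.g. "author"
    value: Any        # would be a DataValues-style value in practice


@dataclass(frozen=True)
class Claim:
    # Only a main snak: no qualifiers, references or ranks in this subset.
    mainsnak: Snak


@dataclass
class Entry:
    entry_type: str  # "index" or "book"
    title: str
    claims: List[Claim] = field(default_factory=list)

    def __post_init__(self):
        if self.entry_type not in ("index", "book"):
            raise ValueError(f"unknown entry type: {self.entry_type!r}")

    def add(self, property_id, value):
        self.claims.append(Claim(Snak(property_id, value)))

    def values(self, property_id):
        # All values claimed for one property, in insertion order.
        return [c.mainsnak.value for c in self.claims
                if c.mainsnak.property_id == property_id]


entry = Entry("book", "Book:Example")
entry.add("author", "First Author")
entry.add("author", "Second Author")
entry.add("year", 1901)
```

A property may appear in several claims, so `entry.values("author")` returns a list rather than a single value.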

The storage and API output formats will be compatible with the Wikibase ones, in order to allow a possible future migration to the first solution and as much code sharing as possible.
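To make the compatibility goal more concrete, the sketch below serialises claims into a structure modelled on the Wikibase JSON layout (claims grouped by property, each with a `mainsnak` carrying a `datavalue`). The function names, the use of "index"/"book" as the top-level `type`, and the restriction to string values are assumptions for the example; the exact output format would have to be checked against the Wikibase documentation.

```python
import json


def claim_to_json(property_id, value):
    """Serialise one (property, value) pair as a Wikibase-style claim."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            # Only string values are handled in this sketch.
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def entry_to_json(entry_type, claims):
    """Group claims by property, as Wikibase does in its entity JSON."""
    grouped = {}
    for property_id, value in claims:
        grouped.setdefault(property_id, []).append(
            claim_to_json(property_id, value))
    # "type" holding "index" or "book" is an assumption of this sketch.
    return {"type": entry_type, "claims": grouped}


doc = entry_to_json("index", [("P1", "a"), ("P1", "b"), ("P2", "c")])
text = json.dumps(doc, sort_keys=True)
```

Because the output is plain JSON, it can be stored and returned by an API without a dedicated serialisation layer.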

 File data (Commons)    bibliographical data (Wikidata)
         |                 |       |
       Index page (Wikisource)     |
                 |                 |
               Book: page (Wikisource)

Unlike Wikibase, there won't be a single view and edit system implemented in JavaScript. Instead, there will be a PHP-based editing system that would be an improved version of the current ProofreadPage and BookManager editing systems, and a view managed by a template (or, maybe, by a Scribunto module), as is done for Index: pages. With that system we don't break the current site structure and user workflows.

Here is a formalisation in Backus-Naur form:
 <entry> := <type> <claim>*
 <type> := "index" | "book"
 <claim> := <property> <value>
 <property> := ...
 <value> := ...

Book: page
The index field holds the name of the main related index.

Data types

 * string (without wikitext, as Wikibase)
 * monolingual strings (without wikitext, as Wikibase)
 * wiki links within Wikisource (really needed?)
 * wikibase item
 * date (as Wikibase)
 * number (as the unitless numbers of Wikibase)
 * wikitext
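
The list above could be backed by simple per-type value checks, as in the following sketch. The datatype keys, the `is_valid` helper and the value shapes are assumptions for illustration: dates are reduced to ISO `YYYY-MM-DD` strings (Wikibase time values are richer), items are checked only against the `Q<number>` pattern, and the "without wikitext" constraint on strings is not enforced.

```python
import re
from datetime import date


def _is_iso_date(value):
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_monolingual(value):
    # Assumed shape: {"language": "fr", "text": "Les Misérables"}
    return (
        isinstance(value, dict)
        and set(value) == {"language", "text"}
        and all(isinstance(part, str) for part in value.values())
    )


CHECKS = {
    "string": lambda v: isinstance(v, str),
    "monolingual-string": _is_monolingual,
    "wikilink": lambda v: isinstance(v, str) and v != "",
    "wikibase-item": lambda v: isinstance(v, str)
    and re.fullmatch(r"Q[1-9][0-9]*", v) is not None,
    "date": _is_iso_date,
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "wikitext": lambda v: isinstance(v, str),
}


def is_valid(datatype, value):
    """Return True if value has the shape expected for datatype."""
    try:
        check = CHECKS[datatype]
    except KeyError:
        raise ValueError(f"unknown datatype: {datatype!r}") from None
    return check(value)
```

For example, `is_valid("wikibase-item", "Q42")` is true, while `is_valid("number", True)` is false.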