User:Tpt/RFC

I plan to implement in Extension:ProofreadPage a metadata storage system based on an SQL table, in order to provide an API and metadata in the header of each page of the Index, Page and main namespaces. But there are a lot of technical choices to make before beginning an implementation.

Abstract
Wikisource needs to have a powerful metadata system. The ProofreadPage extension already does so with the Index namespace.

Storage
The metadata will be stored in a JSON array for each page in the main or Index namespace. In the main namespace, the array contains only the data that are set on the page itself; the others are linked via the pf_index_id field, so that all the pages don't have to be refreshed when the index is updated. If there is multi-index transclusion, the index used is the first one with header=1.

This array of data looks like:
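A minimal sketch of what the stored array could contain, assuming the base property IDs suggested below (lang, title, publisher, pubdate); the values are purely illustrative:

```json
{
    "lang": ["fr"],
    "title": ["Les Misérables"],
    "author": ["Author:Victor Hugo"],
    "publisher": ["A. Lacroix, Verboeckhoven & Cie"],
    "pubdate": ["1862"]
}
```

Each property maps to a list of values, so that multi-valued entries (several authors, for example) fit naturally.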

List of properties
= Old =

Way of getting metadata?
We can create a new parser function for this. It could be added directly in MediaWiki:Proofreadpage index template and MediaWiki:Proofreadpage header template. New data would be stored by the parser, as Extension:Geodata does. We can also imagine that it displays the metadata.
 * Advantages: compatibility, flexibility...
 * Disadvantages: it's not KISS.
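The syntax of such a parser function is not defined yet; a hypothetical #metadata function (the name is an assumption) could be used in the templates like this:

```
{{#metadata:lang|fr}}
{{#metadata:author|Author:Victor Hugo}}
```

The parser would then store these values in the table when the page is saved, much as Extension:Geodata does for coordinates.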

We can also rewrite the Index: form so that it stores metadata directly. I don't know how to implement it. Maybe an editmetadata API? I don't think it's a good idea, because we would have to be fully compatible with the existing system. We can also rewrite the form maker later in order to improve the editing system.

Which IDs for metadata?
We can provide a base set of metadata (lang, title, publisher, pubdate...) and allow custom ones to be added with a configuration system. We must provide a common base set in order to let tools use it across all Wikisources. We can also imagine internationalizing the names of the IDs in the parser function.

Which format for the data?
There are two ways:
 * Be very flexible in the input and let users set the fields to whatever they want. The advantage is that it's fully compatible with the existing system, but it's not fun for tools, which have to check the type of the input before using it.
 * Be strict. For example, the language must be an ISO language code like en or fr, the author must be a link to the author page on Wikisource... In this case a data entry may hold multiple values, in order to allow setting several authors... The transition will be hard, but I think now is the moment. With this system we can imagine a good interconnection with Wikidata. Bots can help us with the basic cases during the transition.
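To illustrate the strict option, here is a small sketch (in Python, purely illustrative; the property names and validation rules are assumptions, not part of this RFC) of the kind of type checking that a strict format would make possible:

```python
import re

# Hypothetical strict-format rules for two base properties.
ISO_CODE = re.compile(r"^[a-z]{2,3}(-[A-Za-z]{2,8})*$")  # e.g. "en", "fr", "pt-BR"
AUTHOR_LINK = re.compile(r"^Author:.+$")                 # link to an author page

def validate(prop, values):
    """Return the values of a multi-valued property that fail its rule."""
    checks = {"lang": ISO_CODE.match, "author": AUTHOR_LINK.match}
    check = checks.get(prop)
    if check is None:
        return []  # property without a strict rule: accept anything
    return [v for v in values if not check(v)]
```

A tool can then reject or flag invalid entries instead of guessing what each field contains.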

Is there communication of metadata from the Index namespace to the main namespace?
I think we mustn't change the way the Index and main namespaces communicate, and the header template may be used to set metadata for pages that use transclusion.

Interaction with other metadata formats
We can add Dublin Core tags in the header of HTML pages; it's a widely used, flexible and XHTML-compatible format. We can imagine an output of the API in RDF using the schema.rdfs.org system. The system must be interconnected with Wikidata in order to, for example, link the author property to the author's resource on Wikidata.
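As a sketch of the Dublin Core output (the property-to-DC mapping below is an assumption, and the metadata layout follows the multi-valued array described above):

```python
# Map our hypothetical base property IDs to Dublin Core element names.
DC_MAP = {
    "title": "DC.title",
    "lang": "DC.language",
    "author": "DC.creator",
    "publisher": "DC.publisher",
    "pubdate": "DC.date",
}

def dublin_core_tags(metadata):
    """Build the <meta> tags to emit in the HTML page header."""
    tags = []
    for prop, dc_name in DC_MAP.items():
        for value in metadata.get(prop, []):
            tags.append('<meta name="%s" content="%s" />' % (dc_name, value))
    return "\n".join(tags)
```

With a strict data format, this mapping is a simple table lookup; with free-form input it would need per-value cleanup first.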

Draft of SQL structure
The problem with this system is that the values aren't typed.
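A possible draft (the table and column names are assumptions, following the pf_ prefix of the pf_index_id field mentioned above); every value ends up in a blob column, which is where the typing problem comes from:

```sql
CREATE TABLE /*_*/pf_metadata (
    pfm_page_id INT UNSIGNED NOT NULL,  -- page the value belongs to
    pfm_name VARBINARY(255) NOT NULL,   -- property ID, e.g. 'lang' or 'title'
    pfm_value BLOB NOT NULL             -- the value itself, stored untyped
) /*$wgDBTableOptions*/;
CREATE INDEX /*i*/pfm_page_name ON /*_*/pf_metadata (pfm_page_id, pfm_name);
```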