Files and licenses concept

This page is a collection of technical information regarding storing certain file properties separately from wikitext. Currently the aim is to store only basic copyright information (author and license), but this could later be extended.

Current situation / Introduction
Every page has an entry in the  table, primarily uniquely identified by. Every file has an entry in the  table, identified by.

Every revision of every page has an entry in the  table. A revision is made when the page is created, edited, renamed, or protected.

Every text version of every page has an entry in the  table. If a revision didn't change the text, it keeps referring to the same  row.

Licenses allowed to be chosen during upload are defined at MediaWiki:Licenses. That page is kept as a list: the last piped part describes the license, and everything before it is wrapped in {{ }} on the File-page under a.
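The list convention described above can be read with a small parser. This is a minimal sketch, assuming each entry is a single "* " line and that the last "|" separates the template part from the description; other details of the real MediaWiki:Licenses syntax (such as heading entries without a pipe) are simply skipped. The function name parse_licenses_page and the sample entries are hypothetical.

```python
def parse_licenses_page(text: str) -> list[tuple[str, str]]:
    """Return (file-page wikitext, description) pairs from a MediaWiki:Licenses-style list.

    Assumption: each entry looks like '* Template|Description'; the last piped
    part is the description and everything before it is wrapped in {{ }}.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("*"):
            continue  # blank lines and non-list text are ignored
        body = line.lstrip("*").strip()
        if "|" not in body:
            continue  # no description part (e.g. a heading); skipped in this sketch
        template, description = body.rsplit("|", 1)
        entries.append(("{{" + template.strip() + "}}", description.strip()))
    return entries


# Hypothetical sample content for the page.
sample = """
* self|GFDL|cc-by-sa-3.0|Own work, multi-license
* cc-by-sa-3.0|Creative Commons Attribution Share-Alike 3.0
"""
for wikitext, label in parse_licenses_page(sample):
    print(wikitext, "->", label)
```

Splitting on the last pipe (rsplit) is what lets the template part itself contain pipes, as in the first sample entry.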

Information about the file is stored in the Information-template.

Viewing a file:
 * Page namespace/title is looked up in, and.
 * Information from other tables is retrieved by the page ID

Purpose and use cases
It should be easy to obtain the copyright information for a file (bug 25624). Use cases include:
 * A data consumer (like the pdf creation tool, or a mobile app) wants to transform a page into a new format, and get all the information (esp. author(s) and license(s)) that is necessary to do proper attribution in the new format.
 * Someone importing CC BY text from another source wants to ensure that the authors, source, and license of the original work are stored in a consistent way, so that the license can be complied with later in a standardized way.
 * Getting information from the API for re-use (e.g. a WordPress plugin that searches images and automatically attributes the authors in the article)
 * Standardizing file-pages centrally (either by core or by mediawiki-message but not per-page with templates)
 * Possibly helping search engines index files properly, using xmlns:cc attributes or tags that are currently based on the wiki's general text license instead of the file's own license.
 * A special file search with the ability to filter by license (just like ns0=&ns1&, lic1&lic2); individual wikis could edit messages (1/2) and put links to search through certain sets of licenses
 * Automatically attributing authors in articles (for CC-BY-*) and perhaps mentioning the license (for CC-SA-*)
 * Name author in search (See Flickr)

The following (sub)requirements can be identified:
 * 1) Author and licensing information should be stored outside the wikitext, since parsing wikitext is often difficult for third-party data consumers (like the pdf creator)
 * 2) A list of files by copyright holder should be obtainable
 * 3) Copyright information should be provided by the user at upload
 * 4) Copyright information should be editable
 * 5) Copyright information should be versioned
 * 6) Administrators should be able to define a set of canonical licenses
 * 7) For each license, basic information should be defined (name, full title, URL of the legal code/deed)
 * 8) The display of a license should be localizable

Proposed situation
Every page has an entry in the  table, primarily uniquely identified by.

Every file has an entry in the  table, identified by. Special properties about this file are stored in.

Every revision of every page has an entry in the  table. A revision is made when the page is created, edited, renamed or protected, or when its file properties change. Every revision contains a reference to a file-properties version, which may be NULL.

Every text version of every page has an entry in the  table. If a revision didn't change the text, it keeps referring to the same  row.

For every file-properties version of a file there are one or more entries in the  table. If a revision didn't change the properties, it keeps referring to the same  row.
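The reference-sharing rule for text and file-properties versions can be modelled in a few lines. This is a minimal sketch using in-memory lists as stand-ins for the revision, text and file-properties tables; the class and field names (Revision, Store, text_id, fileprops_id) are hypothetical, not actual MediaWiki schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    text_id: int
    fileprops_id: Optional[int]  # None plays the role of NULL


@dataclass
class Store:
    texts: list = field(default_factory=list)      # stand-in for the text table
    fileprops: list = field(default_factory=list)  # stand-in for the file-props table
    revisions: list = field(default_factory=list)  # stand-in for the revision table

    def save(self, text: str, props: Optional[dict]) -> Revision:
        """Save a revision, reusing the previous text/props row when unchanged."""
        prev = self.revisions[-1] if self.revisions else None

        if prev is not None and self.texts[prev.text_id] == text:
            text_id = prev.text_id  # text unchanged: keep referring to the same row
        else:
            self.texts.append(text)
            text_id = len(self.texts) - 1

        if props is None:
            props_id = None
        elif (prev is not None and prev.fileprops_id is not None
              and self.fileprops[prev.fileprops_id] == props):
            props_id = prev.fileprops_id  # props unchanged: same row
        else:
            self.fileprops.append(dict(props))
            props_id = len(self.fileprops) - 1

        rev = Revision(text_id, props_id)
        self.revisions.append(rev)
        return rev


store = Store()
r1 = store.save("A sunset", {"license": ["CC-BY-SA-3.0"]})
r2 = store.save("A sunset", {"license": ["CC-BY-SA-3.0", "TASL"]})               # props edit
r3 = store.save("A sunset over the sea", {"license": ["CC-BY-SA-3.0", "TASL"]})  # text edit
print(r1, r2, r3)
```

After these three saves there are only two text rows and two property rows: each edit adds a row only for the part that actually changed.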

Licenses valid on the wiki are defined in the  table, which is managed from Special:LicenseManager. Since there could potentially be many licenses, the ones choosable from the upload form are fetched from that table. The drop-down would contain one group for "Most used licenses" (top 5 or 10, ordered by lic_count) and one for all licenses in alphabetical order.

Links to information about the file (such as author and license) are stored in the -table and displayed on the File-page through a centrally determined layout. Description, source, date, location and additional wikitext (like user templates and categories) are stored in wikitext; only author and license are stored separately.

Table structure
Licenses have their own table (say ), with columns like:
 lic_id		PRI UNIQ AI
 lic_name	VARBINARY 255
 lic_url		VARBINARY 255
 lic_count	INT

 * The text of each license is stored in [[MediaWiki:License-NAME-text]], which contains wikitext (where NAME is ).
 * The title of each license is stored in [[MediaWiki:License-NAME-title]], which is plain text.

When used on the File-page, the following parameters are passed:
 * $1: author
 * $2: attribution (, if NULL same as author)
 * $3: title

Example:
 # Database-entry
 lic_name	TASL
 lic_url	http://tasl.org/licensedeed.html
 # Message
 MediaWiki:License-TASL-text
 This file by $1 is licensed under $3. Please attribute the author as: $2
 MediaWiki:License-TASL-title
 The Awesome Something License

Example:
 # Database-entry
 lic_name	CC-BY-SA-3.0
 lic_url	http://creativecommons.org/licenses/by-sa/3.0/legalcode
 # Message
 MediaWiki:License-CC-BY-SA-3.0-text
 MediaWiki:License-CC-BY-SA-3.0-title
 Creative Commons Attribution Share-Alike 3.0 License
 MediaWiki:License-CC-BY-SA-3.0-url
 http://creativecommons.org/licenses/by-sa/3.0/deed.en
 // These values are kept in messages rather than in the database to allow easier translation. Example for Dutch:
 MediaWiki:License-CC-BY-SA-3.0-title/nl
 Creative Commons Naamsvermelding Gelijk-Delen 3.0 licentie
 MediaWiki:License-CC-BY-SA-3.0-url/nl
 http://creativecommons.org/licenses/by-sa/3.0/deed.nl
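To make the message lookup and parameter passing concrete, here is a sketch that resolves a license message with a language fallback and substitutes $1 to $3. It assumes messages are available as a plain dict keyed by page title, that a "/xx" subpage overrides the base message, and that a NULL attribution falls back to the author as described above; get_message and render_license are hypothetical helpers, not MediaWiki functions.

```python
import re
from typing import Optional


def get_message(messages: dict, key: str, lang: str) -> Optional[str]:
    # Prefer the language-specific subpage (e.g. ".../nl"), else the base message.
    return messages.get(f"{key}/{lang}", messages.get(key))


def render_license(messages: dict, lic_name: str, lang: str,
                   author: str, attribution: Optional[str]) -> str:
    prefix = f"MediaWiki:License-{lic_name}"
    title = get_message(messages, f"{prefix}-title", lang) or lic_name
    text = get_message(messages, f"{prefix}-text", lang) or ""
    params = {
        "1": author,
        "2": attribution if attribution is not None else author,  # NULL -> author
        "3": title,
    }
    # Substitute $1..$3 in one pass so inserted values are not re-scanned.
    return re.sub(r"\$([123])", lambda m: params[m.group(1)], text)


messages = {
    "MediaWiki:License-TASL-text":
        "This file by $1 is licensed under $3. Please attribute the author as: $2",
    "MediaWiki:License-TASL-title": "The Awesome Something License",
    "MediaWiki:License-CC-BY-SA-3.0-title":
        "Creative Commons Attribution Share-Alike 3.0 License",
    "MediaWiki:License-CC-BY-SA-3.0-title/nl":
        "Creative Commons Naamsvermelding Gelijk-Delen 3.0 licentie",
}
print(render_license(messages, "TASL", "en", "Catrope", "Roan"))
```

A language without its own subpage (say "de") simply gets the base English title.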

License management



 * Lists all licenses (may be viewed by anyone ([*])). Editing is done on pages like [[Special:LicenseManager/12]] (by id, like AbuseFilter).
 * The actual texts are stored in MediaWiki:-messages, so they could contain a template to allow editing by non-sysops. Editing is limited to users with the  right.
 * Changes, creations and removals of licenses are publicly logged at Special:Log/licensemanager.
 * Removal is only possible if the license is not in use. If a file that previously used the license is reverted to a state where it uses that license again, the page would display  and be categorized internally into a category like Category:Files with previously deleted licenses.
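The removal rule can be sketched as a soft delete: a license whose lic_count is non-zero cannot be removed, and a file that refers to a removed license again is flagged with the tracking category. This is a minimal sketch; the License class, its deleted flag, and the function names are assumptions, and the category name is the one suggested above.

```python
from dataclasses import dataclass

TRACKING_CATEGORY = "Category:Files with previously deleted licenses"


@dataclass
class License:
    lic_id: int
    lic_name: str
    lic_count: int = 0     # number of files currently using the license
    deleted: bool = False  # soft-delete flag (assumption)


def delete_license(lic: License) -> None:
    """Mark a license as deleted, refusing if any file still uses it."""
    if lic.lic_count > 0:
        raise ValueError(f"{lic.lic_name} is still used by {lic.lic_count} file(s)")
    lic.deleted = True  # keep the row so old revisions can still refer to it


def tracking_categories(file_licenses: list) -> list:
    # A reverted file may point at a deleted license again; flag it.
    if any(lic.deleted for lic in file_licenses):
        return [TRACKING_CATEGORY]
    return []
```

Keeping the row instead of physically deleting it is what makes the revert case detectable at all.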

Drop-down menu
During upload, a license must be chosen from the drop-down menu, which is populated from the license table. The drop-down menu would contain one group for "Most used licenses" (top 5 or 10, ordered by lic_count) and one for all licenses in alphabetical order. Licenses that are marked as deleted are not shown and can't be used.
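Populating the two groups from the license table can be sketched as follows. This assumes rows carry lic_name, lic_count and a deleted flag, and that "alphabetical" means sorting by lic_name (sorting by localized title would be an alternative); LicenseRow and dropdown_groups are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LicenseRow:
    lic_name: str
    lic_count: int
    deleted: bool = False


def dropdown_groups(licenses: list, top_n: int = 5) -> dict:
    """Build the "most used" and "all" option groups from license rows."""
    usable = [lic for lic in licenses if not lic.deleted]  # deleted ones are hidden
    most_used = sorted(usable, key=lambda lic: lic.lic_count, reverse=True)[:top_n]
    return {
        "Most used licenses": [lic.lic_name for lic in most_used],
        "All licenses": sorted(lic.lic_name for lic in usable),
    }


rows = [
    LicenseRow("TASL", 3),
    LicenseRow("CC-BY-SA-3.0", 120),
    LicenseRow("GFDL", 40),
    LicenseRow("CC-BY-2.0", 0, deleted=True),
]
print(dropdown_groups(rows, top_n=2))
```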

Table structure
Metadata about the file itself is still kept in the  table. Page content information is still kept in  and.

Information about the file as a work is kept as a property in the new.

The  is similar to the   table in that it is kept per revision and only updated when needed.

A reference to the current file props is kept in the appropriate  rows, just like it keeps a reference to. When either the text or the file props don't change, the existing reference is kept, so no duplicate sets are made in mw_file_props. mw_file_props contains:
 fp_id		INT (mw_revision.rev_fileprops_id is a key to this column)
 fp_key		VARBINARY(255)
 fp_value_int	INT
 fp_value_text	VARBINARY(255)

EXAMPLE

This file has three authors: Krinkle, Catrope (attributed as Roan) and John Doe (not a wiki user), and is dual-licensed.
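The example above can be expressed as rows of the mw_file_props key/value table. This is a minimal sketch using an in-memory SQLite database, with SQLite types standing in for the VARBINARY columns. The fp_key names (author, attribution, license), the use of fp_value_int for a lic_id, and the choice of TASL (id 1) and CC-BY-SA-3.0 (id 2) as the two licenses are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mw_file_props (
        fp_id         INTEGER NOT NULL,  -- shared by all rows of one property set
        fp_key        TEXT    NOT NULL,
        fp_value_int  INTEGER,
        fp_value_text TEXT
    )
""")

FP_ID = 1  # the set that mw_revision.rev_fileprops_id would point at
rows = [
    (FP_ID, "author", None, "Krinkle"),
    (FP_ID, "author", None, "Catrope"),
    (FP_ID, "author", None, "John Doe"),  # not a wiki user
    (FP_ID, "attribution", None, "Krinkle, Roan, John Doe"),
    (FP_ID, "license", 1, None),          # assumed lic_id of TASL
    (FP_ID, "license", 2, None),          # assumed lic_id of CC-BY-SA-3.0
]
conn.executemany("INSERT INTO mw_file_props VALUES (?, ?, ?, ?)", rows)

authors = [r[0] for r in conn.execute(
    "SELECT fp_value_text FROM mw_file_props "
    "WHERE fp_id = ? AND fp_key = 'author' ORDER BY rowid", (FP_ID,))]
license_ids = [r[0] for r in conn.execute(
    "SELECT fp_value_int FROM mw_file_props "
    "WHERE fp_id = ? AND fp_key = 'license' ORDER BY fp_value_int", (FP_ID,))]
print(authors, license_ids)
```

Because multiple rows may share a key, multiple authors and dual licensing need no extra columns.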

Management
An example of what the Wikitext of a File-page could/would look like:

The following elements are kept outside of wikitext in their own respective fields (fetched from and saved to ):
 * Author (small textarea)
 * Date (date picker; should eventually contain a 14-digit timestamp. The jQuery datepicker features a way to prevent other characters from being entered; also do a server-side check that this is a valid timestamp)
 * License (drop-down + button to add/remove licenses; multiple licenses are allowed)
 * Attribution (small textarea)

These separate input fields could be editable in two ways:
 * Either on [[Special:FileProperties/File:Example.jpg]]
 * Or in additional form elements above or below the textbox on the action=edit page

When saving properties, a revision is saved (as when moving or protecting the page) with the same   but with a new reference in   to the added row in  . Likewise, when saving altered wikitext, a revision is saved with the same   but with a new reference in   to the added row in.
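The server-side check for the Date field can be sketched with the standard library. It assumes the 14-digit value uses MediaWiki's YYYYMMDDHHMMSS timestamp layout; is_valid_mw_timestamp is a hypothetical helper name.

```python
import re
from datetime import datetime

MW_TIMESTAMP = re.compile(r"[0-9]{14}")


def is_valid_mw_timestamp(value: str) -> bool:
    """Return True if value is a 14-digit YYYYMMDDHHMMSS timestamp of a real date/time."""
    # Check the shape first: strptime alone also accepts unpadded fields.
    if not MW_TIMESTAMP.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y%m%d%H%M%S")
    except ValueError:
        return False  # e.g. month 13 or 30 February
    return True
```

The client-side datepicker only restricts input; this check is what actually rejects impossible dates.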

Idea: A user setting in the preferences decides whether users see the description in their own language (if available, like {{LangSwitch}}) or all descriptions.

Display
When viewing a file-page, the page is built from [[MediaWiki:Fileinformation-template]] with these parameters:
 * $1: description (tag-hook)
 * $2: source (tag-hook)
 * $3: author (file props)
 * $4: license titles (file props)
 * $5: additional wikitext, i.e. all other page text (such as problem tags, user templates, etc.) that wasn't filtered out (for example, category and interwiki links are filtered from the output)

 * Automated page generated like this.
 * Layout is fixed (perhaps allow editing the layout in a MediaWiki-message, or don't, and instead provide sufficient CSS-hooks).

__EMBEDMETADATA__
$1
...
 * $3
 * $4
 * $2
...
$5
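Assembling the page from [[MediaWiki:Fileinformation-template]] amounts to substituting $1 to $5. This sketch assumes the parameters have already been extracted (description and source from the tag hooks, authors and license titles from file props) and that the template is available as a string; build_file_page is hypothetical, only category links are filtered here (interwiki filtering is not shown), and the template text below is a simplified stand-in for the layout above.

```python
import re

# Matches [[Category:...]] links (optionally followed by a newline), case-insensitively.
CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:[^\]]*\]\]\n?", re.IGNORECASE)


def build_file_page(template: str, description: str, source: str,
                    authors: list, license_titles: list, extra_wikitext: str) -> str:
    extra = CATEGORY_LINK.sub("", extra_wikitext).strip()  # categories not shown inline
    params = {
        "1": description,
        "2": source,
        "3": ", ".join(authors),
        "4": ", ".join(license_titles),
        "5": extra,
    }
    # Single-pass substitution of $1..$5; inserted values are not re-scanned.
    return re.sub(r"\$([1-5])", lambda m: params[m.group(1)], template)


template = "__EMBEDMETADATA__\n$1\n * $3\n * $4\n * $2\n$5"
page = build_file_page(template, "A sunset", "Own work",
                       ["Krinkle", "Catrope"], ["The Awesome Something License"],
                       "{{Quality image}}\n[[Category:Sunsets]]")
print(page)
```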

imageinfo
'prop=imageinfo' result format needs to be extended to include this metadata.
 * format of results?
 * iiprops key for including/excluding it?

This is a requirement for getting the metadata to pass cleanly to InstantCommons (ForeignAPIRepo) clients.

upload
'action=upload' needs to be able to pass metadata with a new upload, just as a user uploading directly to the web site needs to be able to add license info on the web UI.


 * parameter names?
 * parameter format?
 * what's the best way to pass sets of arbitrary data like this to an api thingy? array should work?
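One possible answer to the parameter-format questions: the MediaWiki API conventionally takes multi-value parameters as a single "|"-separated string (as in prop=info|revisions), so lists of authors or licenses could be passed the same way. The sketch only builds the POST body. action, filename and token are existing upload parameters; authors, licenses and attribution are hypothetical parameter names for this proposal, and "TOKEN" is a placeholder.

```python
from typing import Optional
from urllib.parse import urlencode


def join_multi(values: list) -> str:
    # "|"-separated multi-value convention; reject values that contain the separator.
    if any("|" in v for v in values):
        raise ValueError("values containing '|' need a different separator")
    return "|".join(values)


def upload_params(filename: str, token: str, authors: list,
                  licenses: list, attribution: Optional[str] = None) -> dict:
    params = {
        "action": "upload",
        "format": "json",
        "filename": filename,
        "token": token,
        "authors": join_multi(authors),    # hypothetical parameter
        "licenses": join_multi(licenses),  # hypothetical parameter
    }
    if attribution is not None:
        params["attribution"] = attribution  # hypothetical; omitted means "same as author"
    return params


body = urlencode(upload_params("Example.jpg", "TOKEN",
                               ["Krinkle", "John Doe"], ["TASL", "CC-BY-SA-3.0"]))
print(body)
```

A real upload would also carry the file itself (or a url or filekey), which is left out here.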

In order for an API client upload tool to present available license options, it also needs to be able to query:

Query license options
In order to add license metadata to a new or modified upload, an upload client will need to be able to query the available settings. This same interface could/should probably also be used in things like the UploadWizard that supplement Special:Upload with more client-side ajaxy stuff.


 * module name?
 * parameters?
 * result format?

Editing
In addition to initial uploads, license metadata on an existing file may need to be altered. Changed license metadata needs to be passable either to the existing page editing method, or a dedicated one for file metadata.


 * existing or new module? module name?
 * parameter format?
 * what's the best way to pass sets of arbitrary data like this to an api thingy? array should work?

Export/import and dumps
If license metadata lives outside the page text, it may also need to be added to the Special:Export & data dump format.


 * data structure
 * add to special:export
 * make sure dump tools handle it without exploding
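For the data structure, one option is to add a child element holding the property set to each revision in the export XML. The sketch uses xml.etree.ElementTree on a simplified revision without the export namespace; the fileprops, author and license element names are hypothetical and not part of the current export schema.

```python
import xml.etree.ElementTree as ET


def add_fileprops(revision: ET.Element, authors: list, licenses: list) -> ET.Element:
    """Append a hypothetical <fileprops> element to an export <revision> element."""
    props = ET.SubElement(revision, "fileprops")
    for name in authors:
        ET.SubElement(props, "author").text = name
    for lic in licenses:
        ET.SubElement(props, "license").text = lic
    return props


revision = ET.fromstring("<revision><id>42</id><text>Description</text></revision>")
add_fileprops(revision, ["Krinkle"], ["CC-BY-SA-3.0"])
print(ET.tostring(revision, encoding="unicode"))
```

Appending a new element rather than changing existing ones means a reader that looks up only the elements it knows (such as text) keeps working.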

Hm...
Some kind of detection is required to fall back to the old way (i.e. don't render the new file-page layout, but render it as a normal wiki page). The easiest way to detect whether a page has been converted from the old wikitext (e.g. the Information template) to the new system is to check whether a file page has NULL in  (i.e. has no entry in  ); if so, it is an old-style file. In that case, we don't generate the File-page layout, but just parse the wikitext the good ol' way and display it on the page.
 * Transition
 * Sounds good afaik. Krinkle 17:00, 22 January 2011 (UTC)
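The fallback rule can be written as a small dispatch on the file-props reference. This is a minimal sketch; rev_fileprops_id follows the column name used in the table structure, while render_new_layout and parse_wikitext are hypothetical placeholders for the two rendering paths.

```python
from typing import Callable, Optional


def render_file_page(rev_fileprops_id: Optional[int], wikitext: str,
                     render_new_layout: Callable[[int, str], str],
                     parse_wikitext: Callable[[str], str]) -> str:
    if rev_fileprops_id is None:
        # NULL reference: old-style file page, so parse the wikitext as before.
        return parse_wikitext(wikitext)
    return render_new_layout(rev_fileprops_id, wikitext)


print(render_file_page(None, "{{Information}}",
                       lambda fp_id, text: "new layout",
                       lambda text: "old-style parse"))
```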