Wikidata/2005 proposal

Wikidata is a proposed wiki-like database for various types of content. The project as proposed here requires significant changes to the software (or possibly entirely new software), but it has the potential to centrally store and manage data from all Wikimedia projects and to radically expand the range of content that can be built using wiki principles.

Imagine that you can edit the content of an infobox on Wikipedia (e.g. Germany) with one click, that you get an edit form specific to the infobox you are editing, and that other Wikipedias automatically and immediately use the same content (unless it is specific to your locale).

Imagine that some data in an article can be automatically updated in the background, without any work from you, whether it is the price of a company's stock or the number of lines of code in an open source project.

Imagine that you can easily search wiki-databases on a variety of subjects, without knowing anything about wikis.

This project is separate from the Wikimedia Commons, because a Wikidata database does not necessarily have to be useful for another Wikimedia project, and because it is larger in scope.

Applications
Astronomy - space.wikidata.org (spc.wikidata.org)
 * astronomical objects
 * constellations
 * craters
 * observatories and telescopes
 * surveys
 * space missions

Economy - economy.wikidata.org
 * Products
 * Corporations, companies
 * Currency exchange rates
 * Oil prices & other commodities prices
 * Stock Exchanges indices
 * Interest rates

Events - time.wikidata.org
 * News
 * People biography
 * Timetables

Society - society.wikidata.org
 * Schools and universities
 * Cities, Countries, Subdivisions
 * Ethnic groups
 * Radio and television stations

Military - military.wikidata.org
 * Battles
 * Army divisions
 * Air Squadrons

Technology - tech.wikidata.org
 * Planes
 * Rockets
 * Ships
 * Weapons

Nature - nature.wikidata.org
 * Plants
 * Animals
 * Species
 * Mountains
 * Rivers
 * Protected areas
 * Weather

Chemistry - chemistry.wikidata.org
 * Elements
 * Rocks and minerals, compounds

Content - works.wikidata.org
 * Books
 * Journal articles
 * Newspaper articles
 * Movies (IMDB is not open content)
 * Music

Locations - geo.wikidata.org (see Wikimaps)
 * Cities
 * Countries
 * Regions
 * Geo-located pictures

Science (various) - science.wikidata.org
 * Pharmacology

Stamps, Coins and bank notes
 * Postage stamps
 * Coins
 * Bank notes

Requirements
Wikidata has the following technical requirements to be useful:
 * easy setup of data groups, and of new structures within a group
 * data structure editor
 * tables
 * fields
 * field types (text, number, textarea, localizable enumerations ..)
 * field constraints (optional, unique etc.)
 * relationships between fields (parents, brothers)
 * wiki-style syntax for describing view layouts and edit layouts
 * placement of fields in a form
 * per-field difference engine to show changes to fields in a more precise manner
 * per-field history, recent changes etc.
 * transclusion of content from other Wikimedia projects
 * default link destination, so that, for example, any link in an entry on movies points to Wikipedia
 * easy localization
 * flag certain types of data as international (with possible auto-conversion routines) and not in need of localization
 * single login and Wikimedia Commons functionality should be in operation before this project goes live
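
The per-field difference engine listed above could be sketched roughly as follows. This is a minimal illustration, not part of the proposal: it assumes a revision of a data entry can be represented as a plain mapping from field name to value, and the function name `field_diff` and the `country_capital` field are hypothetical.

```python
from typing import Any, Dict, Tuple

# Assumption: one revision of a data entry is a mapping of field name -> value.
Record = Dict[str, Any]


def field_diff(old: Record, new: Record) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (old_value, new_value)} for every field that changed.

    A field missing from one revision is reported with None on that side.
    """
    changes: Dict[str, Tuple[Any, Any]] = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name)
        after = new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Illustrative revisions of a country entry (field names are hypothetical).
old_rev = {"country_name": "Germany", "country_population": 75000000}
new_rev = {
    "country_name": "Germany",
    "country_population": 80000000,
    "country_capital": "Berlin",
}
diff = field_diff(old_rev, new_rev)
```

Unchanged fields such as `country_name` do not appear in the result, which is what lets a history view or recent-changes list show only the fields that were actually edited.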

Licensing considerations
Share-alike is not very fair when a much larger work includes a very small piece of data. Individual pieces of data are not copyrightable. Claiming copyright on the database itself and on the structure we create could help to bolster similar copyright claims by corporations (which in turn could harm Wikimedia), and could be difficult to enforce.

A very simple attribution license or the public domain may be a better option for data-projects.

Graphical mock-ups


For an idea of how we would do this in Kendra Base, see the Wikidata mockup in Kendra Base.

Fixed set of tables
We distinguish between wiki-pages and data through the namespace. We can define certain namespaces to be pages, and other namespaces to be data. In the following examples, namespace 0 is for articles, and namespace 402 is for data on countries.

We presume that we have a revisions table that is used both for regular wiki-pages and for pieces of data:

 revision_id  revision_comment   user_id  page_id
 2042         created monkey     52       300
 2043         added monkey info  203      300
 2044         created country    593      301
 ...

A pages table:

 page_id  page_name  page_namespace  top_revision
 300      Monkey     0               2043
 301      Germany    402             2044
 302      Poland     402             4893

=> an article on Monkeys, two sets of country data

A relations table:

 source_page_id  destination_page_id  relation_type
 301             302                  1

=> Germany is a neighbour of Poland

relation_types: 0=parent, 1=brothers/neighbours, 3=aunt ... whatever is useful
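
As a rough sketch of how the relations table might be queried, the following uses Python's standard `sqlite3` module with an in-memory database. It assumes that relation type 1 denotes neighbours, as in the list above, and that a neighbour relation is stored once but should be readable in both directions; the constant and function names are hypothetical.

```python
import sqlite3

# Assumption: numbering follows the relation_types list (1 = brothers/neighbours).
RELATION_NEIGHBOUR = 1

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pages (
        page_id        INTEGER PRIMARY KEY,
        page_name      TEXT NOT NULL,
        page_namespace INTEGER NOT NULL
    );
    CREATE TABLE relations (
        source_page_id      INTEGER NOT NULL,
        destination_page_id INTEGER NOT NULL,
        relation_type       INTEGER NOT NULL
    );
    INSERT INTO pages VALUES (301, 'Germany', 402), (302, 'Poland', 402);
    INSERT INTO relations VALUES (301, 302, 1);
    """
)


def neighbours(conn: sqlite3.Connection, page_id: int) -> list:
    """Return the names of all pages related to page_id as neighbours.

    The relation is stored in one direction only, so both directions are searched.
    """
    rows = conn.execute(
        """
        SELECT p.page_name
          FROM relations r JOIN pages p ON p.page_id = r.destination_page_id
         WHERE r.source_page_id = ? AND r.relation_type = ?
        UNION
        SELECT p.page_name
          FROM relations r JOIN pages p ON p.page_id = r.source_page_id
         WHERE r.destination_page_id = ? AND r.relation_type = ?
         ORDER BY 1
        """,
        (page_id, RELATION_NEIGHBOUR, page_id, RELATION_NEIGHBOUR),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `neighbours(conn, 301)` finds Poland through the stored row, and `neighbours(conn, 302)` finds Germany through the reverse lookup. Asymmetric types such as parent would only need the first half of the query.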

A data-longtext table:

 page_id  revision_id  name          value
 300      2042         article_text  A monkey is an animal...

A data-shorttext table:

 page_id  revision_id  name          value
 301      2044         country_flag  ]

A data-numbers table:

 page_id  revision_id  name                value
 301      2044         country_population  80000000
 301      2040         country_population  75000000

And so on, for the different types.

Now we can structure our data in arbitrary ways and do smart SELECTs:

 SELECT page_id, top_revision FROM pages WHERE page_name='Germany' AND page_namespace=402
 => 301, 2044

 SELECT value FROM data_numbers WHERE page_id=301 AND revision_id=2044 AND name='country_population'
 => 80000000 - the country population
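
The two lookups can also be combined into a single join. The sketch below recreates the pages and data-numbers tables in an in-memory SQLite database using Python's standard `sqlite3` module. The column types, the underscore spelling `data_numbers`, and the helper name `current_number` are assumptions made for illustration, not part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pages (
        page_id        INTEGER PRIMARY KEY,
        page_name      TEXT NOT NULL,
        page_namespace INTEGER NOT NULL,
        top_revision   INTEGER
    );
    CREATE TABLE data_numbers (
        page_id     INTEGER NOT NULL,
        revision_id INTEGER NOT NULL,
        name        TEXT NOT NULL,
        value       NUMERIC
    );
    -- Sample rows copied from the tables above.
    INSERT INTO pages VALUES
        (300, 'Monkey', 0, 2043),
        (301, 'Germany', 402, 2044),
        (302, 'Poland', 402, 4893);
    INSERT INTO data_numbers VALUES
        (301, 2044, 'country_population', 80000000),
        (301, 2040, 'country_population', 75000000);
    """
)


def current_number(conn, page_name, namespace, field):
    """Return the numeric field stored at the page's top revision, or None."""
    row = conn.execute(
        """
        SELECT d.value
          FROM pages p
          JOIN data_numbers d
            ON d.page_id = p.page_id AND d.revision_id = p.top_revision
         WHERE p.page_name = ? AND p.page_namespace = ? AND d.name = ?
        """,
        (page_name, namespace, field),
    ).fetchone()
    return row[0] if row else None


population = current_number(conn, "Germany", 402, "country_population")
```

The join on `top_revision` assumes that every field of an entry is stored again with each new revision. If only changed fields were stored, the lookup would instead have to pick the latest revision that contains the field.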

Dynamic table creation
We could create a sophisticated data manager application that allows tables to be created without much technical know-how. It could automatically manage revision storage and revision associations. Advantage: more efficient, with constraints enforced at the database level. Disadvantage: less flexible, since all code has to be aware of which tables exist.
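
A very small sketch of what such a data manager might do internally: turn a field description into a CREATE TABLE statement, so that the "optional" constraint is enforced by the database itself. Everything here is hypothetical (the function name `build_create_table`, the type mapping, and the choice of SQLite); it only illustrates the idea under those assumptions.

```python
import re
import sqlite3

# Assumed mapping from the proposal's field types to SQLite column types.
TYPE_MAP = {"text": "TEXT", "textarea": "TEXT", "number": "NUMERIC"}

# Identifiers are interpolated into SQL, so only plain lowercase names are allowed.
IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_create_table(table, fields):
    """Build a CREATE TABLE statement for one data structure.

    fields is a list of (name, field_type, optional) tuples. Every generated
    table gets page_id and revision_id columns so revisions can be stored.
    """
    if not IDENT.match(table):
        raise ValueError(f"invalid table name: {table!r}")
    columns = ["page_id INTEGER NOT NULL", "revision_id INTEGER NOT NULL"]
    for name, field_type, optional in fields:
        if not IDENT.match(name):
            raise ValueError(f"invalid field name: {name!r}")
        column = f"{name} {TYPE_MAP[field_type]}"
        if not optional:
            column += " NOT NULL"  # constraint enforced at database level
        columns.append(column)
    columns.append("PRIMARY KEY (page_id, revision_id)")
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n)"


ddl = build_create_table(
    "country",
    [("country_name", "text", False), ("country_population", "number", True)],
)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute("INSERT INTO country VALUES (301, 2044, 'Germany', 80000000)")
```

With this layout, inserting a row whose required `country_name` is NULL is rejected by the database rather than by application code, which is the advantage described above. The disadvantage is also visible: any code reading the data has to know that a `country` table with these columns exists.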

Related projects

 * Kendra Initiative is developing a semantic data publishing/querying system called Kendra Base.
 * Input currently works in two ways: wiki-style free text and more structured form input.
 * Also reviewed at Kendra_evaluation


 * jot.com seems to be doing something similar according to Jimbo, who has seen beta screenshots