Wikidata/2005 proposal

This proposal is outdated. For recent developments, see Kennisnet/Structured data, Wikidata/Timetable, Ultimate Wiktionary data design and Wikidata/Notes. The OmegaWiki project will be the first to use Wikidata.

Wikidata is a proposed wiki-like database for various types of content. This project as proposed here requires significant changes to the software (or possibly completely new software) but has the potential to centrally store and manage data from all Wikimedia projects, and to radically expand the range of content that can be built using wiki principles.

Imagine that you can edit the content of an infobox on Wikipedia (e.g. Germany) with one click, that you get an edit form specific to the infobox you are editing, and that other Wikipedias automatically and immediately use the same content (unless it is specific to your locale).

Imagine that some data in an article can be updated automatically in the background, without any work from you, whether it is a company's stock price or the number of lines of code in an open source project.

Imagine that you can easily search wiki-databases on a variety of subjects, without knowing anything about wikis.

This project is separate from the Wikimedia Commons, because a Wikidata database does not necessarily have to be useful for another Wikimedia project, and because it is larger in scope.

Applications
Astronomy - space.wikidata.org (spc.wikidata.org)
 * astronomical objects
 * constellations
 * craters
 * observatories and telescopes
 * surveys
 * space missions

Economy - economy.wikidata.org
 * Products
 * Corporations, companies
 * Governments and local administrative bodies, complete with analysis of statistical/aggregate parameters
 * Macroeconomic data (current and historical), indexes
 * Currency exchange rates
 * Oil and other commodity prices
 * Stock exchange indices
 * Interest rates

Events - time.wikidata.org
 * News
 * Biographies
 * Timetables

Languages - language.wikidata.org
 * Translation tables (multilingual)
 * Dictionaries (multilingual)

Society - society.wikidata.org
 * Schools and universities
 * Cities, Countries, Subdivisions
 * Ethnic groups
 * Radio and television stations

Military - military.wikidata.org
 * Battles
 * Army divisions
 * Air Squadrons

Technology - tech.wikidata.org
 * Planes
 * Rockets
 * Ships
 * Weapons
 * Computer hardware

Nature - nature.wikidata.org
 * Plants
 * Animals
 * Species
 * Mountains
 * Rivers
 * Protected areas
 * Weather

Chemistry - chemistry.wikidata.org
 * Elements
 * Rocks and minerals, compounds

Content - works.wikidata.org
 * Books
 * Journal articles
 * Newspaper articles
 * Movies (IMDB is not open content)
 * Music

Locations - geo.wikidata.org (see Wikimaps)
 * Cities
 * Countries
 * Regions
 * Geo-located pictures

Physics - physics.wikidata.org
 * Physical constants
 * Physics equations
 * Tables of wavefunctions etc.

Science (various) - science.wikidata.org
 * Pharmacology

Stamps, Coins and bank notes
 * Postage stamps
 * Coins
 * Bank notes

Calendar - calendar.wikidata.org
The project should support translations, so that calendar events from all the different Wikipedias can be included and the calendars become easier to update: existing events only need to be translated, and only new events need to be added. This would save a lot of time. OmegaT is a useful tool for translation.
 * Events
 * Births
 * Deaths
 * Holidays and observances
    * National
    * Religious
    * Secular

Requirements
Wikidata has the following technical requirements to be useful:
 * easy setup of data groups, and of new structures within a group
 * data structure editor
    * tables
    * fields
    * field types (text, number, textarea, localizable enumerations, ...)
    * field constraints (required, unique, etc.)
    * relationships between fields (parents, siblings)
 * edit mechanisms
    * modify more than one cell at once (e.g. search/replace)
 * export of data in suitable formats (HTML, XML, CSV)
 * import from suitable formats
 * search mechanisms
    * limit the table to the interesting subset
    * use nested and/or/not requests
    * make use of field types (date ranges, number ranges)
 * sort mechanisms
    * by one or more fields, ascending or descending
    * make use of field types (numbers, user-defined sort orders)
 * wiki-style syntax for describing view layouts and edit layouts
    * placement of fields in a form
 * per-field difference engine to show changes to fields more precisely
 * per-field history, recent changes, etc.
 * transclusion of content from other Wikimedia projects
 * default link destination, so that, for example, any link in an entry on movies points to Wikipedia
 * easy localization
    * flag certain types of data as international (with possible auto-conversion routines) and not in need of localization
 * single login and Wikimedia Commons functionality should be in operation before this project goes live
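
As a rough illustration of the field-type, field-constraint and per-field difference requirements, here is a minimal Python sketch. The `Field` class, the `COUNTRY_FIELDS` structure and both helper functions are hypothetical names invented for this example; the proposal does not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field of a data structure (hypothetical, for illustration only)."""
    name: str
    kind: type              # a Python type standing in for a Wikidata field type
    required: bool = False
    unique: bool = False


# Hypothetical structure for country data; the field names are assumptions.
COUNTRY_FIELDS = [
    Field("name", str, required=True, unique=True),
    Field("population", int, required=True),
    Field("motto", str),
]


def validate(record, fields, existing=()):
    """Return a list of constraint violations for one record."""
    errors = []
    for f in fields:
        value = record.get(f.name)
        if value is None:
            if f.required:
                errors.append(f"{f.name}: required")
            continue
        if not isinstance(value, f.kind):
            errors.append(f"{f.name}: expected {f.kind.__name__}")
        elif f.unique and any(r.get(f.name) == value for r in existing):
            errors.append(f"{f.name}: must be unique")
    return errors


def field_diff(old, new):
    """Per-field difference: map each changed field to (old value, new value)."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

A per-field diff of this kind would let a history view report "population: 75000000 to 80000000" instead of a line-based text diff of the whole entry.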

Licensing considerations
Share-alike is not very fair when a much larger work includes only a very small piece of data. Individual pieces of data are not copyrightable. Claiming copyright on the database itself and on the structure we create could be difficult to enforce, and could help to bolster similar copyright claims by corporations, which in turn could harm Wikimedia.

A very simple attribution license or the public domain may be a better option for data projects.

Graphical mock-ups
This mock-up illustrates form-based editing. Note that we need easy ways to enter relations - in this illustration, the movie-actor relation must be parsed by the backend after saving. Autolinking means that on viewing, we get a link both to Wikipedia and to Wikidata itself for the autolinked word (e.g. a link to Wikipedia about the United States, and a link to Wikidata showing movies made in the United States).
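
One way the backend might parse such a relation after saving is sketched below in Python. It assumes, purely for illustration, that the form stores the relation as a comma-separated list of names and that a name-to-page-id lookup is available; `parse_relation_field` and its parameters are hypothetical.

```python
def parse_relation_field(source_page_id, raw_value, page_ids, relation_type):
    """Split a comma-separated form value into relation rows.

    Returns (rows, unresolved). Each row is a
    (source_page_id, destination_page_id, relation_type) tuple;
    unresolved lists the names for which no page was found.
    """
    rows, unresolved = [], []
    for name in (part.strip() for part in raw_value.split(",")):
        if not name:
            continue  # skip empty entries such as a trailing comma
        destination = page_ids.get(name)
        if destination is None:
            unresolved.append(name)
        else:
            rows.append((source_page_id, destination, relation_type))
    return rows, unresolved
```

Unresolved names could be shown back to the editor, which is one reason an easier way of entering relations would be preferable to free-text parsing.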

For an idea of how we'd do this in Kendra Base, see Wikidata mock-up in Kendra Base.

Fixed set of tables
We distinguish between wiki-pages and data through the namespace. We can define certain namespaces to be pages, and other namespaces to be data. In the following examples, namespace 0 is for articles, and namespace 402 is for data on countries.

We presume that we have a revisions table that is both used for regular wiki-pages and pieces of data:

revision_id  revision_comment   user_id  page_id
2042         created monkey     52       300
2043         added monkey info  203      300
2044         created country    593      301
...

A pages table:

page_id  page_name  page_namespace  top_revision
300      Monkey     0               2043
301      Germany    402             2044
302      Poland     402             4893
=> an article on monkeys, two sets of country data

A relations table:

source_page_id  destination_page_id  relation_type
301             302                  2
=> Germany is a neighbour of Poland

relation_types: 1=parent, 2=brothers/neighbours, 3=aunt ... whatever is useful
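
A minimal sketch of how such a relation could be queried, using Python's built-in `sqlite3` module. The column types, the `neighbours` helper and the decision to treat the neighbour relation as symmetric (stored once, looked up in both directions) are assumptions made for this example.

```python
import sqlite3

NEIGHBOUR = 2  # the relation_type used in the example row above

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (
    page_id        INTEGER PRIMARY KEY,
    page_name      TEXT NOT NULL,
    page_namespace INTEGER NOT NULL,
    top_revision   INTEGER
);
CREATE TABLE relations (
    source_page_id      INTEGER NOT NULL REFERENCES pages(page_id),
    destination_page_id INTEGER NOT NULL REFERENCES pages(page_id),
    relation_type       INTEGER NOT NULL
);
INSERT INTO pages VALUES (301, 'Germany', 402, 2044), (302, 'Poland', 402, 4893);
INSERT INTO relations VALUES (301, 302, 2);
""")


def neighbours(conn, page_id):
    """Names of pages related to page_id as neighbours, in either direction."""
    rows = conn.execute(
        """
        SELECT p.page_name
        FROM relations r
        JOIN pages p ON p.page_id = CASE
            WHEN r.source_page_id = :id THEN r.destination_page_id
            ELSE r.source_page_id END
        WHERE r.relation_type = :rel
          AND :id IN (r.source_page_id, r.destination_page_id)
        ORDER BY p.page_name
        """,
        {"id": page_id, "rel": NEIGHBOUR},
    ).fetchall()
    return [name for (name,) in rows]
```

Directed relation types such as parent would need a one-directional lookup instead.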

A data-longtext table:

page_id  revision_id  name          value
300      2042         article_text  A monkey is an animal...

A data-shorttext table:

page_id  revision_id  name          value
301      2044         country_flag

A data-numbers table:

page_id  revision_id  name                value
301      2044         country_population  80000000
301      2040         country_population  75000000

And so on, for the different types.
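
To illustrate how a value might be routed to the matching typed table, here is a small Python sketch using the built-in `sqlite3` module. The underscore table names, the `SHORTTEXT_LIMIT` cut-off and the `table_for` and `store` helpers are assumptions for this example. A real implementation would more likely choose the table from the declared field type than from the value itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_longtext  (page_id INTEGER, revision_id INTEGER, name TEXT, value TEXT);
CREATE TABLE data_shorttext (page_id INTEGER, revision_id INTEGER, name TEXT, value TEXT);
CREATE TABLE data_numbers   (page_id INTEGER, revision_id INTEGER, name TEXT, value NUMERIC);
""")

SHORTTEXT_LIMIT = 255  # assumed cut-off between short and long text


def table_for(value):
    """Pick the typed data table for a Python value."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return "data_numbers"
    if isinstance(value, str):
        return "data_shorttext" if len(value) <= SHORTTEXT_LIMIT else "data_longtext"
    raise TypeError(f"no data table for {type(value).__name__}")


def store(conn, page_id, revision_id, name, value):
    """Insert one field value into its typed table and return the table name."""
    table = table_for(value)  # one of three fixed names, so safe to interpolate
    conn.execute(
        f"INSERT INTO {table} (page_id, revision_id, name, value) VALUES (?, ?, ?, ?)",
        (page_id, revision_id, name, value),
    )
    return table
```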

Now we can structure our data in arbitrary ways and do smart SELECTs:

SELECT page_id, top_revision FROM pages WHERE page_name='Germany' AND page_namespace=402;
=> 301, 2044

SELECT value FROM data_numbers WHERE page_id=301 AND revision_id=2044;
=> 80000000 - the country population
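
The two lookups can also be combined into a single join on `top_revision`. The sketch below runs against an in-memory SQLite database loaded with the example rows. The underscore table name `data_numbers`, the column types and the two helper functions are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, page_name TEXT,
                    page_namespace INTEGER, top_revision INTEGER);
CREATE TABLE data_numbers (page_id INTEGER, revision_id INTEGER,
                           name TEXT, value NUMERIC);
INSERT INTO pages VALUES (301, 'Germany', 402, 2044);
INSERT INTO data_numbers VALUES (301, 2044, 'country_population', 80000000),
                                (301, 2040, 'country_population', 75000000);
""")


def current_number(conn, page_name, namespace, field):
    """Value of a numeric field at the page's top revision, or None."""
    row = conn.execute(
        """
        SELECT d.value
        FROM pages p
        JOIN data_numbers d
          ON d.page_id = p.page_id AND d.revision_id = p.top_revision
        WHERE p.page_name = ? AND p.page_namespace = ? AND d.name = ?
        """,
        (page_name, namespace, field),
    ).fetchone()
    return row[0] if row else None


def number_history(conn, page_id, field):
    """All stored (revision_id, value) pairs for a field, oldest first."""
    return conn.execute(
        "SELECT revision_id, value FROM data_numbers "
        "WHERE page_id = ? AND name = ? ORDER BY revision_id",
        (page_id, field),
    ).fetchall()
```

Note that the join only finds a value if the top revision stores a row for that field. A design in which a revision stores only the fields it changed would instead need to look up the latest revision that carries the field.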

Dynamic table creation
We could create a sophisticated data manager application that allows tables to be created without much technical know-how. It could automatically manage revision storage and revision associations. Advantage: more efficient, with constraints enforced at the database level. Disadvantage: less flexible, since all code has to be aware of which tables exist.
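
A minimal sketch of what such a data manager might do internally: generate a `CREATE TABLE` statement from a field description, so that constraints are enforced by the database. The `create_table_sql` helper, the field-type mapping and the choice of one row per page revision are assumptions for this example, shown here with Python's built-in `sqlite3` module.

```python
import re
import sqlite3

SQL_TYPES = {"text": "TEXT", "number": "NUMERIC"}  # assumed field-type mapping
IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement; fields are (name, type, required) tuples."""
    for identifier in [table] + [name for name, _, _ in fields]:
        # Identifiers cannot be bound as parameters, so restrict them strictly.
        if not IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"invalid identifier: {identifier!r}")
    columns = ["page_id INTEGER NOT NULL", "revision_id INTEGER NOT NULL"]
    for name, kind, required in fields:
        column = f"{name} {SQL_TYPES[kind]}"
        if required:
            column += " NOT NULL"
        columns.append(column)
    # One row per revision of a page, so revisions live in the same table.
    columns.append("PRIMARY KEY (page_id, revision_id)")
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n)"
```

With a table created this way, an insert that leaves a required field empty is rejected by the database itself, which is the advantage named above. The cost is also visible: every query has to know the generated table and column names.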


Related projects

 * The Semantic MediaWiki extension to MediaWiki extends wiki link syntax to represent two kinds of properties of articles: relations between articles and attribute values of articles. It supports inline query of these properties and export of them as RDF.
 * Kendra Initiative is developing a semantic data publishing/querying system called Kendra Base.
    * Input is currently possible in two ways: wiki-style free text and more structured form input.
    * Also reviewed at Kendra evaluation
 * TWiki is a wiki which features form-based input as well as metadata which adds a structure to the entered data.
 * jot.com seems to be doing something similar according to Jimbo, who has seen beta screenshots
 * w:Wikipedia:Proposal for intuitive table editor and namespace
 * I have yet to see software that does this, but I have always thought that Wikipedia would be an excellent project for an object-oriented database (OODB). Instead of only allowing certain data, have a generic article object, then classify each article as, say, a person or a place. These would be subclasses of the article object with fixed fields (start and end of the believed birth date range, bio, believed birthplace latitude/longitude, etc.). This information could be shared across all articles, making the different languages simply different chunks of text on an object with a unique ID. Referencing would improve greatly, because you could refer definitively to a person or place regardless of language. It would also allow points of view to be added, as each could be just another block of text associated with the unique ID (I like to call these lenses). Finally, an object could inherit from actor, with even more specific fields, or, with multiple inheritance, from both actor and director. I have looked at OODB software and found that the commercial products allow multiple inheritance, though I would assume the performance would be terrible.
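
The object model described in the last comment could look roughly like the following Python sketch. It uses plain dataclasses rather than an actual object database, and all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Article:
    """Generic object with a unique ID; each language is just a block of text."""
    uid: int
    texts: Dict[str, str] = field(default_factory=dict)   # language code -> text
    lenses: Dict[str, str] = field(default_factory=dict)  # point of view -> text


@dataclass
class Person(Article):
    birth_earliest: Optional[str] = None  # start of the believed birth date range
    birth_latest: Optional[str] = None    # end of the believed birth date range


@dataclass
class Actor(Person):
    acted_in: List[int] = field(default_factory=list)     # uids of works


@dataclass
class Director(Person):
    directed: List[int] = field(default_factory=list)     # uids of works


@dataclass
class ActorDirector(Actor, Director):
    """Multiple inheritance: carries the fields of both Actor and Director."""
```

Because every field after `uid` has a default, the combined class can be built with keyword arguments, for example `ActorDirector(uid=1, acted_in=[10], directed=[10, 11])`.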