Extension:JsonConfig/Tabular

From mediawiki.org

Tabular content is machine-readable data, similar to the CSV and TSV formats. It allows any user to create a page, e.g. "Data:List of interesting facts.tab", and keep it as a table rather than as wiki text. Tabular storage allows strings, numbers, booleans (true/false), and "localized strings": strings whose value depends on the language. Eventually, it would be good to also implement Q-number support, allowing direct links to Wikidata.

Additionally, tabular data can store metadata, such as a localized description and the data source. More metadata can be added as needed.

Tabular storage greatly simplifies storing data for lists and tables. On-wiki tables and lists can be created by using simple Lua scripts. This storage is fundamentally different from Wikidata, because it works with "blobs" (batches) of data, whereas Wikidata works with tiny "facts".

After a long discussion, Commons seemed to be the best fit for such data, and over 70% of the Commons community supported hosting tables there. The Commons community is already experienced with international, multi-licensed content.

Usage

All tabular data is stored in the Data namespace on Commons, with a ".tab" page title suffix, e.g. Data:Example.tab.

The data will be accessible from all other wikis by:

  • Lua scripts in Scribunto modules, via the mw.ext.data.get("Example.tab") function. The data is returned as the parsed JSON of the raw page content, so a Scribunto module can also access all the metadata fields. This function is not specific to tabular data. We might also want to introduce mw.ext.data.getTabularData() to get the data with localized strings resolved for a specific language.

To access data directly on a wiki page, you can import the tabular data module (which requires the navbar module) and, optionally, the tabular query template (which requires the tabular data module), if you don't already have them. With these tools you can easily get the value of a single cell.

Documentation

See Help:Tabular Data and Help:Map Data for in-depth field descriptions.

TBD / Questions / Ideas

  • Licenses: if requested per the licensing discussion, how should the license be stored so that it is machine-readable and avoids untranslatable, unparsable free text?
    • We may choose to deploy without license support (public domain only), and later add licensing capability. Yurik (talk) 18:20, 30 April 2016 (UTC)
  • What metadata is needed? Current proposal has "source" (string) and "info" (localized string), but we might need more.
    • Support for specifying data source(s). This is to avoid WW3 about what data is "right" (the truth). Some sources will be used frequently and might need a shortcut; others will be used less often.
      Is it enough to have one source for the whole table, or should we introduce a new data type called "source" to allow per-row sourcing? Per-row sourcing could of course be added later. Ideally, we should also support multiple references, just like Wikidata. It would also be good to have multiple pairs of "source type" and "source value", similar to Wikidata's property->value structure. Yurik (talk) 18:20, 30 April 2016 (UTC)
  • Cross-datacenter cache invalidation: JsonConfig supports remote cache invalidation, but it uses a MediaWiki API call for that. Which server should Commons contact to notify it of a data change?

Internal cross-wiki usage

Cross-wiki data usage is based on the existing JsonConfig mechanisms, which have been in production use (Wikipedia Zero) for the past few years. JsonConfig supports multiple content handlers and can easily be used for a shared cross-wiki data namespace.

The JCSingleton::getContent() implementation gets the content for a given page title, even if that title is located on another wiki. It first checks whether the content is stored in memcached (JCCache::get()). The memcached key is not wiki-specific, allowing different wikis to share the same content object. In case of a cache miss, the page is loaded locally (when the current wiki is the storage wiki for that title) or remotely via a query API call, and then cached.
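The lookup order described above can be sketched as follows. This is only an illustration of the cache-first pattern, not JsonConfig's actual code: the function name, the `tabular:` key prefix, the plain array standing in for memcached, and the two loader callbacks are all hypothetical.

```php
<?php
// Hypothetical sketch of the lookup order; not JsonConfig's real implementation.

/**
 * Look up content for a title, trying the shared cache first.
 *
 * @param array    $cache       Stand-in for memcached, passed by reference.
 * @param string   $title       Page title, e.g. "Example.tab".
 * @param bool     $isStoreWiki True when the current wiki stores this title.
 * @param callable $loadLocal   Loads the page from the local wiki.
 * @param callable $loadRemote  Loads the page through a remote API query.
 */
function getSharedContent(
	array &$cache,
	string $title,
	bool $isStoreWiki,
	callable $loadLocal,
	callable $loadRemote
): string {
	// The key carries no wiki prefix, so every wiki resolves the same entry.
	$key = 'tabular:' . $title;
	if ( array_key_exists( $key, $cache ) ) {
		return $cache[$key];
	}
	// Cache miss: load locally on the storage wiki, remotely everywhere else.
	$content = $isStoreWiki ? $loadLocal( $title ) : $loadRemote( $title );
	$cache[$key] = $content;
	return $content;
}

// Usage: two lookups from a non-storage wiki trigger only one remote load.
$cache = [];
$remoteCalls = 0;
$local = fn ( string $t ): string => "local:$t";
$remote = function ( string $t ) use ( &$remoteCalls ): string {
	$remoteCalls++;
	return "remote:$t";
};

$first = getSharedContent( $cache, 'Example.tab', false, $local, $remote );
$second = getSharedContent( $cache, 'Example.tab', false, $local, $remote );
```

Because the second call is served from the cache, the remote loader runs only once.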

When the page changes (JCSingleton::onArticleChangeComplete), the memcached entry is updated with the new value, and optionally an API call is made to a remote server to notify it that its cache should be updated. This could help with cross-datacenter cache invalidation.
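A minimal sketch of this update path, under the same kind of assumptions: the function name, the `tabular:` key prefix, the array used in place of memcached, and the notification callback are hypothetical and only illustrate the order of operations.

```php
<?php
// Hypothetical sketch of the update path; names are illustrative, not JsonConfig's API.

/**
 * Refresh the shared cache entry after an edit and optionally notify a remote server.
 */
function onContentChanged(
	array &$cache,
	string $title,
	string $newContent,
	?callable $notifyRemote = null
): void {
	// Overwrite rather than delete, so the next reader gets a cache hit.
	$cache['tabular:' . $title] = $newContent;
	if ( $notifyRemote !== null ) {
		// In a real deployment this would be an API request to the remote server.
		$notifyRemote( $title );
	}
}

// Usage: record which titles the (fake) remote server was told about.
$cache = [ 'tabular:Example.tab' => 'old' ];
$notified = [];
onContentChanged(
	$cache,
	'Example.tab',
	'new',
	function ( string $t ) use ( &$notified ): void {
		$notified[] = $t;
	}
);
```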

Configuration

JsonConfig uses a very flexible (and somewhat complicated) settings system. Both the Commons wiki and all other wikis need this code block to set up cross-wiki shareable storage:

$wgJsonConfigModels['Tabular.JsonConfig'] = 'JsonConfig\JCTabularContent';
$wgJsonConfigs['Tabular.JsonConfig'] = array(
	'namespace' => 486, // === NS_DATA, but the constant is not defined yet
	'nsName' => 'Data',
	'isLocal' => false,
	'pattern' => '/.\.tab$/'
);
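To see which page titles the 'pattern' setting would accept, the regular expression can be tried on its own. The sample titles below are made up for illustration, and exactly which form of the title JsonConfig passes to the pattern is not stated here, so this only demonstrates the behaviour of the expression itself.

```php
<?php
// Same expression as the 'pattern' setting: at least one character, then ".tab" at the end.
$pattern = '/.\.tab$/';

// Illustrative titles only.
$titles = [ 'Example.tab', '.tab', 'Example.tab.bak', 'Example.TAB' ];

$results = [];
foreach ( $titles as $title ) {
	// preg_match() returns 1 on a match and 0 otherwise.
	$results[$title] = preg_match( $pattern, $title ) === 1;
}
```

Only 'Example.tab' matches: '.tab' has no character before the suffix, 'Example.tab.bak' does not end in ".tab", and the expression is case-sensitive.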

Commons wiki will need to specify that data should be stored locally:

$wgJsonConfigs['Tabular.JsonConfig']['store'] = true;

Other wikis will need to set how to access remote data:

$wgJsonConfigs['Tabular.JsonConfig']['remote'] = 'https://commons.wikimedia.org/w/api.php';
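
If all wikis share one settings file, the three snippets above could be combined into a single helper. This is a sketch, not a recommended setup: tabularConfigFor() is a hypothetical function, and using the database name 'commonswiki' to detect the storage wiki is an assumption.

```php
<?php
// Hypothetical helper: builds the Tabular.JsonConfig settings for one wiki.
// Treating 'commonswiki' as the storage wiki's database name is an assumption.
function tabularConfigFor( string $dbName ): array {
	$config = [
		'namespace' => 486,
		'nsName' => 'Data',
		'isLocal' => false,
		'pattern' => '/.\.tab$/',
	];
	if ( $dbName === 'commonswiki' ) {
		// The storage wiki keeps the data locally.
		$config['store'] = true;
	} else {
		// Every other wiki reads the data through the Commons API.
		$config['remote'] = 'https://commons.wikimedia.org/w/api.php';
	}
	return $config;
}

$commons = tabularConfigFor( 'commonswiki' );
$other = tabularConfigFor( 'enwiki' );
```

In a settings file, the result would be assigned with $wgJsonConfigs['Tabular.JsonConfig'] = tabularConfigFor( $wgDBname );, where $wgDBname is the wiki's database name.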
