User:Tgr (WMF)/structured article data

Project ideas

MediaWiki's text-based nature puts a huge strain on all software that needs to support modern experiences, whether in content consumption or curation; infobox data, navigation structure, citation data, discussion structure and so on are all buried under a mountain of wikitext. In theory, Wikidata is a solution to these problems at least some of the time. In practice, there hasn't been much uptake of Wikidata, especially in the more conservative large wikis; it seems pretty clear that the move from wikitext to structured data and the move to shared ownership of data are, taken together, too large a change to be realistically doable at the same time. We should bring local structured data to MediaWiki (now that Multi-Content Revisions has built the platform for that) and make the sharing/unification of that data a separate future step.

A first version of this could consist of a simple flat list of page - property - (string) value semantic triples, like the data that DBpedia parses from infoboxes: a dedicated MCR slot could be created for article properties, such as a "population" property for articles about cities, and infoboxes and similar templates could pull information from the local structured data instead of / alongside Wikidata. (A zeroth version, for ease of migration, could populate virtual MCR slots dynamically via some parser function used in the wikitext; see T156876.) Later the data could be easily indexed in the database or in Elasticsearch, and extended with more complex values, constraints, source claims etc.
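
As a rough illustration of the flat triple model described above, the following Python sketch keeps page - property - string value data in memory and reads one value back the way an infobox template might. The names `Triple` and `PropertySlot` and the sample values are hypothetical, chosen only for this example; nothing here reflects an actual MediaWiki or MCR API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Triple:
    """One page - property - value statement; values are plain strings."""
    page: str
    prop: str
    value: str


class PropertySlot:
    """Hypothetical stand-in for a per-page slot holding article properties."""

    def __init__(self, page: str) -> None:
        self.page = page
        self._values: Dict[str, str] = {}

    def set(self, prop: str, value: str) -> None:
        # A flat model: one string value per property, last write wins.
        self._values[prop] = value

    def get(self, prop: str, default: Optional[str] = None) -> Optional[str]:
        # What a template would call instead of parsing wikitext.
        return self._values.get(prop, default)

    def triples(self) -> List[Triple]:
        # Export as a flat list, sorted by property name for stable output.
        return [Triple(self.page, p, v) for p, v in sorted(self._values.items())]


slot = PropertySlot("Example City")
slot.set("population", "12345")
slot.set("country", "Exampleland")
print(slot.get("population"))
```

Keeping values as plain strings mirrors the "first version" idea; richer value types, constraints, and source claims would be later extensions.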

For developers, this would provide a sane interface for interacting with granular article data (infoboxes, navboxes, problem templates, quality assessments, userboxes and so on), including the ability to change it without having to parse wikitext. That would result in more powerful tools for editors (in-place infobox editing, granular change tracking, microcontributions, just to name a few) and the ability to use all the power of structured data without having to adapt to both new workflows and new power dynamics at the same time. For Wikidata editors, it would make the tracking of differences between local and central data more effective. For third-party MediaWiki users, it would provide a more maintainable and performant alternative to the existing structured data extensions (Semantic MediaWiki, Cargo) which, with wikitext being the only available storage medium at the time of their creation, ended up with some crippling design choices (but are very popular even so).
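
To make the point about tracking differences between local and central data concrete, here is a minimal Python sketch that compares two flat property maps. The function name `diff_properties` and the sample data are assumptions made for illustration; real local and central stores would need more than string equality (units, ranks, sources).

```python
from typing import Dict, Optional, Tuple


def diff_properties(
    local: Dict[str, str], central: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {property: (local_value, central_value)} for every mismatch.

    A value of None means the property is missing on that side.
    """
    differences: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for prop in sorted(set(local) | set(central)):
        local_value = local.get(prop)
        central_value = central.get(prop)
        if local_value != central_value:
            differences[prop] = (local_value, central_value)
    return differences


# Hypothetical sample data for one article.
local_data = {"population": "12345", "country": "Exampleland"}
central_data = {"population": "12400", "country": "Exampleland", "area": "42"}
print(diff_properties(local_data, central_data))
```

With wikitext as the only storage, the same comparison would first require parsing template parameters out of each revision.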