Extension:Wikibase Quality Extensions

Welcome to WikidataQuality!

We are a team of students from the Hasso Plattner Institute in Potsdam, Germany. For our Bachelor's project, we are working with the Wikidata team to improve the quality of Wikidata's data.

In consultation with the Wikidata community, two projects have emerged. One part of our team is working on a tool that validates Wikidata by comparing it against external databases, while the other part wants to improve the usage and visualization of constraints.

Improving Constraint Reports
When we started working on this project, the only way to define constraints was on the talk page of a property, and only by editing templates. This is neither user-friendly nor easy to maintain. During our research we found that there are over 4000 hand-written constraints, and some of them do not exactly match the template definitions, e.g. Single_value instead of Single value. It is very difficult for a bot to check the data against the corresponding constraints when some of them are written incorrectly.
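
To illustrate the naming problem, here is a minimal Python sketch of how a checker could tolerate variants such as Single_value. It is not the code used by the existing bot or by our extension; the function names and the tiny set of known constraint names are assumptions made for this example only.

```python
# Illustrative subset only; the real list of constraint templates is longer.
KNOWN_CONSTRAINTS = {"Single value", "Symmetric"}


def normalize_constraint_name(raw: str) -> str:
    """Return a canonical spelling of a hand-written constraint name."""
    # Treat underscores like spaces and collapse repeated whitespace.
    name = " ".join(raw.replace("_", " ").split())
    # Capitalize the first letter, leaving the rest untouched.
    return name[:1].upper() + name[1:]


def is_known_constraint(raw: str) -> bool:
    """Check a hand-written name against the known names after normalizing."""
    return normalize_constraint_name(raw) in KNOWN_CONSTRAINTS
```

Normalizing in this way only covers spacing and capitalization differences. Real misspellings would still be reported as unknown, which is one reason hand-written templates are hard to maintain.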

So this is the status quo: constraints live on the talk pages of properties, and a bot checks the data it finds in Wikidata dumps against those constraints and generates the constraint reports. While this definitely adds value, the reports are not easy to read, the underlying constraints are a pain to maintain, and checking against a dump is of course not as accurate as checking against live data.

Luckily, it is now possible to create statements on properties. Based on that feature, we are planning to migrate the constraints from the talk pages, enabling us to generate meaningful constraint reports right where they are needed.

Our vision for this project is that every user who visits an item page sees a small indicator whenever there is a constraint violation. Clicking on it should show a short text explaining which constraint has been violated and offer the opportunity to fix it, or to add it as an exception if the user is sure that it is not a violation.

We also want to assist the user in correcting the violation. For example, when the symmetric constraint is violated, the user should get a prompt asking them to add the missing statement to the other item. Of course, we have to make sure that this does not cause errors to spread. We are therefore considering filling in the missing value automatically only when there is a reference proving its correctness. This would have the nice side effect that the number of references in the system eventually grows.
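
The idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not our implementation: the Statement class, the function name, and the IDs in the usage note are hypothetical, and a real check would work on Wikibase data rather than on a flat list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str          # item ID the statement is on
    prop: str             # property ID
    value: str            # item ID the statement points to
    has_reference: bool = False


def missing_symmetric_statements(statements, prop):
    """Find statements whose reverse statement is missing.

    Returns tuples (item_to_edit, value_to_add, may_auto_fill). The
    last element is True only if the existing statement has a reference.
    """
    existing = {(s.subject, s.value) for s in statements if s.prop == prop}
    suggestions = []
    for s in statements:
        if s.prop != prop:
            continue
        # Symmetric constraint: if A -> B exists, B -> A must exist too.
        if (s.value, s.subject) not in existing:
            suggestions.append((s.value, s.subject, s.has_reference))
    return suggestions
```

For a referenced statement from Q1 to Q2 with the placeholder property P100, the function would suggest adding the reverse statement to Q2 and mark it as safe to fill in automatically. An unreferenced statement would only lead to a prompt for the user.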

Currently, we are building a special page where you enter an item ID and we generate a table with the constraint report. Right now, we do this based on the constraints that were defined on the talk pages. To do this on a special page in reasonable time, and particularly on live data, we parsed every talk page and built a table containing every constraint with its corresponding parameters.
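
A rough Python sketch of such a parsing step is shown below. It is not our actual parser: it assumes that constraints appear in the wikitext as templates of the form {{Constraint:Name|key=value}}, it ignores nested templates, and the function name, row layout, and the property ID in the usage note are made up for this example.

```python
import re

# Assumed template syntax: {{Constraint:Name|key=value|...}} without nesting.
TEMPLATE_RE = re.compile(r"\{\{\s*Constraint:([^|{}]+)((?:\|[^{}]*)?)\}\}")


def parse_constraints(property_id, wikitext):
    """Turn the constraint templates of one talk page into table rows."""
    rows = []
    for match in TEMPLATE_RE.finditer(wikitext):
        # Treat "Single_value" and "Single value" as the same name.
        name = " ".join(match.group(1).replace("_", " ").split())
        params = {}
        # The first element of the split is empty, so skip it.
        for part in match.group(2).split("|")[1:]:
            key, sep, value = part.partition("=")
            if sep:  # ignore fragments that are not key=value pairs
                params[key.strip()] = value.strip()
        rows.append(
            {"property": property_id, "constraint": name, "parameters": params}
        )
    return rows
```

Calling parse_constraints("P100", text) on the wikitext of a talk page would yield one row per template, which could then be stored in a database table and queried when a report is requested.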

In the end, the result of this check should be displayed right beside the statements when you visit an item page, but this will take a while. Until then, we want to migrate the constraints to statements on properties, so that our special page can work without the table we generated from the talk pages.

And here, we need your help
There are several possible approaches for representing constraints with the statements-on-properties model. After discussing them with several members of the Wikidata community, we agree with the proposal Ivan A. Krestinin made on the property proposal discussion page. This approach requires many new properties to be created, and every new property needs the approval of some community members. It would be really great if you could read this suggestion, discuss it, and, if you like the approach, give your approval for the suggested properties and perhaps create them.

We do not want to be the ones making the important decision of how constraints are handled in Wikidata in the future, but we really need them represented as statements on properties to continue our work and generate meaningful constraint reports that, in the end, hopefully improve the overall quality of the data in Wikidata.

External Validation
tbd: Description of this project

For further information, please visit our GitHub wiki.