Talk:Wikidata Bridge

About this board

This is the talk page about the Wikidata Bridge project. Feel free to give feedback or ask questions. Some threads are dedicated to specific questions or feedback loops.

Test environment for the Wikidata Bridge

Lea Lacroix (WMDE) (talkcontribs)

Hello all,

If you want to follow the progress of the development team on the first version of the Wikidata Bridge, you can have a look at the test system.

First of all, a disclaimer: this is a live test environment, and the developers are working in real time on it, which means:

  • it is not the final version that you will see onwiki
  • some features are missing
  • it may be broken or behave strangely at times
  • the test wikis are almost empty and plenty of data is not there or needs to be created (items, properties)

This being said, here are the links that you can look at:

Feel free to try it and click on the pen icon to open the Bridge pop-up. Please note that in order to edit data, you need to be logged in on Beta. For this you will need to use or create a dedicated account that is not connected to your centralized Wikimedia account.

To go into more detail, here's an overview of which features are already working on the test system, which ones are not working yet, and which ones will not be present in the first version of the Bridge.

What’s already working:

  • When clicking on the edit pen, instead of going to the Wikidata item, the Bridge pop-up opens
  • When the editor opens the Wikidata Bridge for a value with datatype string, they can edit it from the Bridge pop-up. For all other datatypes, the editor is told that editing is not possible yet and is offered a link to Wikidata.
  • For a value with datatype string, the editor can make an edit and save it; the change takes place on the Wikidata item. However, the new value is not immediately displayed (we're working on caching issues), so you will have to refresh the page to see it.
  • It is only possible to edit statements having one value. When trying to edit a statement with several values, an error message appears.
  • All edits made through the Bridge are tagged on Wikidata with the tag “Data Bridge” so they can be easily spotted in Recent Changes, etc. The edit summary is not very descriptive for now.
  • References can be displayed but the format is ugly for now
  • Template editors can adapt an infobox to add the “Bridge layer”. A draft of documentation is available here.
  • Permissions: when the page is protected, the edit pen will not be shown. For all the other situations where editing should not be allowed (user blocked, etc.), an error message will be shown.

What we are currently working on in order to be ready for the first version:

  • Ask the editor if they are updating or fixing a value and make the edit accordingly (for an update, a new value will be added and the rank will be changed; for a fix, the value will be overwritten).
  • Inform editors that they are editing a different project under a different license.
  • When trying to edit multiple values, special values (novalue and somevalue) or deprecated values, we will show editors an explanation and send them to the Wikidata item.
  • Fix caching issues (update the page immediately with the new value).
  • Link to the revision history of the Item to give editors a path to check and revert vandalism.
  • Support for datatypes external ID and URL.
  • Offer a path to Wikidata for adding and editing references there.
  • Better edit summary
  • Better formatting of references
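
The restrictions described in the lists above (supported datatypes only, a single value, no special values, no deprecated values) can be sketched as a small check. This is only an illustration and not taken from the Bridge code: the `Statement` class, its fields and the function name are hypothetical, and the datatype strings are assumed to follow the identifiers used in Wikibase JSON.

```python
from dataclasses import dataclass
from typing import List

# Datatypes planned for the first version according to the list above.
# The identifier strings are assumptions made for this sketch.
SUPPORTED_DATATYPES = {"string", "url", "external-id"}


@dataclass
class Statement:
    """Hypothetical, simplified view of one Wikidata statement."""
    datatype: str
    snaktype: str = "value"   # "value", "novalue" or "somevalue"
    rank: str = "normal"      # "preferred", "normal" or "deprecated"


def bridge_decision(statements: List[Statement]) -> str:
    """Return "edit" if the pop-up could edit the value, else "redirect"
    (explain the reason and send the editor to the Wikidata item)."""
    if len(statements) != 1:
        return "redirect"            # several values (or none at all)
    statement = statements[0]
    if statement.snaktype != "value":
        return "redirect"            # special values: novalue / somevalue
    if statement.rank == "deprecated":
        return "redirect"            # deprecated values
    if statement.datatype not in SUPPORTED_DATATYPES:
        return "redirect"            # datatype not supported yet
    return "edit"
```

For example, `bridge_decision([Statement("string")])` returns `"edit"`, while a statement group with two values returns `"redirect"`.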

What will not be included in the first version, but potentially later:

  • Editing of datatypes other than string, URL and external ID
  • Editing of qualifiers
  • Editing of special values (no value / some value)
  • Editing, adding or deleting references on existing values
  • Adding a reference to new values
  • Multi-value editing and viewing

The development will continue over the next few months, as will the discussions here to gather feedback. We will also update you about the wikis where we plan to roll out the first version of the feature, if these communities volunteer to try it. See also: how to get involved.

Cheers, Léa

Reply to "Test environment for the Wikidata Bridge"
Vriullop (talkcontribs)

I see a potential confusion with the pencil icon: does it edit the value of the statement or the label of the value? Currently on Catalan Wikipedia, and on other projects using the same Wikidata module, it is used for labels missing in the local language. See w:ca:Angela Merkel, with an icon for the English fallback "Polytechnic Secondary School Templin" and even for "Q56230686", which has no English label, or similarly w:se:Angela Merkel and w:ast:Angela Merkel.

Should we use two icons, for labels and values?

Lea Lacroix (WMDE) (talkcontribs)

The Wikidata Bridge will edit the value of the statement, not the label of the value. In the future, we could add a link to the item of the value on Wikidata (e.g. Polytechnic Secondary School Templin), but I don't see the tool being able to change or add labels: this would add more levels of complexity and make it difficult to understand for the users we're targeting (casual Wikipedia editors).

I agree that the identical icon is confusing, and we should think about how we want to organize it: maybe a pencil to edit the value, and something else to edit the labels?

Amadalvarez (talkcontribs)

Hi, Léa. I understand and agree that the Bridge is aimed at changing infobox values, not infobox labels.

However, when the value shown is in another language (because the local label of the value doesn't exist), Bridge editors will click to change it, because from their point of view "the value I'm seeing is not correct". The Bridge will then show "the correct value in the incorrect language". What can the editor do to enter the correct value in the correct language?

In other words: how can missing labels in your language be entered using the Bridge?

If I have not explained this well, just tell me and I will prepare an example.

Salut !

Lea Lacroix (WMDE) (talkcontribs)

Thanks for describing the use case, it is quite clear :)

For the first version of the Bridge, this case will not happen, because the tool will not be able to edit values that are other entities; it will only handle things like dates and numbers.

However, we will keep this use case in mind for the next steps, when we start editing entity values.

Vriullop (talkcontribs)

Thanks. Then we should think about changing the pencil icon, maybe indicating the language code of the fallback used: Polytechnic Secondary School Templin (en), with "(en)" linked to labels and the pencil used by the Wikidata Bridge for the value.

Reply to "Editing labels or values"
Jura1 (talkcontribs)

Where will the definitions for mapping infobox fields to Wikidata properties be stored?

Is there a MediaWiki/Wikibase installation where a sample can be seen?

VIGNERON (talkcontribs)

Interesting. Wouldn't it be just the existing mapping?

Jura1 (talkcontribs)

Which is defined in various places ..

Lea Lacroix (WMDE) (talkcontribs)

It will be just like it is now: each type of infobox of each wiki can have a defined list of fields, usually defined in the Lua code. The Bridge will not change that, as it will only add a layer for editing the data.

Jura1 (talkcontribs)

How does "Bridge" guess which property/statement to edit?

Lea Lacroix (WMDE) (talkcontribs)
Jura1 (talkcontribs)

Does it work specifically with the format of the Catalan infobox or would it also work with other formats?

I think Krbot had some problems identifying all the different ways infoboxes define this.

Lea Lacroix (WMDE) (talkcontribs)

The Catalan infobox is just an example. In the long run, the Bridge will be able to work with any format of infobox defined by the maintainers. During the first steps of the deployment, we may focus on the infoboxes used by the Wikipedia communities who will beta-test the feature.

Jura1 (talkcontribs)

How can I see if an infobox is supported or not?

Jura1 (talkcontribs)

@Lea_Lacroix_(WMDE): would you check with the developers to find out where/how it's defined to be recognized by the extension?

Lea Lacroix (WMDE) (talkcontribs)

I don't understand what you want to know. It's way too early to check if an infobox is supported or not, as the feature is not even finished. Later, it will be deployed on some wikis, and the infobox maintainers will be able to activate the Bridge features for some or all infoboxes. What exactly should be recognized by the extension?

Jura1 (talkcontribs)

How does the extension know which property or statement goes with a value displayed in the infobox?

e.g. how does the extension get from the mother at ca:Harry Potter (personatge) to d:Q3244512#P25.

It must already be defined somewhere.

Would one need to insert a manual edit link in the infobox?

Lydia Pintscher (WMDE) (talkcontribs)

Yes. In the first version the Bridge will hook into the links that are used in edit pens like on this article's infobox. The template editor will need to make a small adjustment (adding a parameter to the link) for it to work but it's a minimal change.
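
To illustrate why a link target such as d:Q3244512#P25 is enough to identify what to edit, here is a minimal sketch that extracts the item and the property from the target of an edit-pen link. The function name and the assumed URL shape (item ID as the last path segment, property ID as the fragment) are hypothetical choices for this example; the actual Bridge code may read the link differently.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

ITEM_RE = re.compile(r"^Q[1-9]\d*$")
PROPERTY_RE = re.compile(r"^P[1-9]\d*$")


def parse_edit_link(href: str) -> Optional[Tuple[str, str]]:
    """Return (item_id, property_id) for an edit-pen link, or None."""
    parsed = urlparse(href)
    # The last path segment is expected to be the item, e.g. "/wiki/Q3244512".
    item_id = parsed.path.rsplit("/", 1)[-1]
    # The fragment is expected to be the property, e.g. "P25".
    property_id = parsed.fragment
    if ITEM_RE.match(item_id) and PROPERTY_RE.match(property_id):
        return item_id, property_id
    return None
```

Under these assumptions, `parse_edit_link("https://www.wikidata.org/wiki/Q3244512#P25")` yields the pair `("Q3244512", "P25")`, and a link without a property fragment yields `None`.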

Amadalvarez (talkcontribs)

I agree with @Lydia Pintscher (WMDE) that we must assume something has to be changed or added in the infobox code to tell the Bridge which properties (plural) are involved in a "conceptual block". The current telescope demo infobox is not the best example for understanding which properties and qualifiers are involved in a block of information, because each piece of information in this demo corresponds to one property. I suggest you look at these three cases as a reference (not as the objective for the first version of the Bridge!).

Although solving every situation is not part of the first phase (a smart decision, by the way), we need to know the full complexity of the structures shown in infoboxes, in order to identify the new internal tools (mapping?) or rules we need to build into infoboxes to make them understandable to other consumers, such as the Bridge.

Reply to "property def"

Link to a new version of the prototype

Lea Lacroix (WMDE) (talkcontribs)

Hello all,

I just updated the link to the prototype to a new version (v 3.0). Thanks to the feedback we received during and after Wikimania, we've been able to understand better what people need and improve some features.

Please note that this is still a "click dummy", with only certain reactive paths and areas you can click on, not a fully developed test system. If you get stuck, you can press "R" or click on "Restart" in the bottom right corner.

What changed:

  • the reference section was updated to be more visible
  • we added a new type of screen, "data type not supported", to warn users when a data type is not (yet) supported by the Bridge. You can access it by clicking on the coordinates.
  • another new type of screen is the permission screen, which you can test by clicking on the map. This kind of screen will appear only when the user is trying to edit something without having the permission. The text and display are still to be improved, so feel free to give us feedback.

What we're still working on:

  • cancel and save buttons for references
  • path to Wikidata not included yet
  • path to history not included yet
  • license notice not included yet

Feel free to try this updated prototype and give us feedback!

For your information, we're running a bit late on the timeline presented on the overview page. I'll give you a more detailed update at the end of November.

Thanks, Léa

Bouzinac (talkcontribs)

When someone types in "new" values (with or without a reference, I don't care; I don't understand why there is such a debate), will the new value be "upranked" and the other pre-existing value(s) be "downranked"/deprecated?

Lea Lacroix (WMDE) (talkcontribs)

Thanks for your feedback. Editing ranks will not be part of the first version of the feature, but that's an idea we keep for later.

Bouzinac (talkcontribs)

So there will be multiple values each time someone edits a property?

Lea Lacroix (WMDE) (talkcontribs)

If the user states that the value is wrong, the value will be changed in Wikidata - no new value will be added.

If the value is outdated, then yes the outdated value will be kept in Wikidata and a new one will be added.

Ayack (talkcontribs)

But if the outdated value had a preferred rank, what will happen? The new value will be created with a normal rank and so won't be displayed in the infobox? Or the old value will be downranked and the new upranked? Or created with a normal rank?

Bouzinac (talkcontribs)

So the outdated value should automatically be downranked... let's hope it will finally be implemented.

Ayack (talkcontribs)

No, the outdated value should not be downranked (it was true at a point in time); it's the new value that should be upranked.

Charlie Kritschmar (WMDE) (talkcontribs)

Hello @Ayack and @Bouzinac, the idea is that we will downrank the previously preferred value to normal and set the newly added value to preferred instead. We will not be deprecating values from the Bridge. If the former value was already at normal rank, then the new value will be set to preferred.

There is currently no option to just add an additional value of the same rank via the bridge.
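
The two edit flows discussed in this thread (fixing a wrong value versus updating an outdated one) can be sketched as follows. This is an illustration based only on the descriptions above, not the Bridge implementation; the function names and the dictionary layout of a statement are hypothetical.

```python
from typing import Dict, List

Statement = Dict[str, str]  # hypothetical shape: {"value": ..., "rank": ...}


def fix_value(statements: List[Statement], index: int, new_value: str) -> None:
    """The value was wrong: overwrite it in place, no statement is added."""
    statements[index]["value"] = new_value


def update_value(statements: List[Statement], new_value: str) -> None:
    """The value was outdated: keep it and add the new value as preferred."""
    for statement in statements:
        if statement["rank"] == "preferred":
            # The former best value goes back to normal; it is not deprecated.
            statement["rank"] = "normal"
    statements.append({"value": new_value, "rank": "preferred"})
```

In this sketch an update always leaves exactly one preferred statement (the new one), whether the former value was preferred or normal, which matches the behaviour described for both cases.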

Jsamwrites (talkcontribs)

References are now well highlighted. Is it possible to add multiple references from multiple sources?

Charlie Kritschmar (WMDE) (talkcontribs)

Hi @Jsamwrites, It will be possible to add multiple references from multiple sources. The current prototype doesn't represent this functionality well yet.

Reply to "Link to a new version of the prototype"
Alsee (talkcontribs)

I see your UX Research page already has a link to EnWiki 2018 Infobox RfC. However I don't think you fully appreciate what that RFC says and what it means.

A very oversimplified summary would be that about half the community support Wikidata in infoboxes, and about half the community oppose use of Wikidata.

A somewhat more nuanced explanation would be that 1/3 of the community support Wikidata in infoboxes, 1/3 want it GONE, and 1/3 find the current usage of Wikidata problematical but they would be willing to support Wikidata in infoboxes if their concerns can be met. Some people would argue that it is impossible for Wikidata to meet those conditions. It could be argued that the 1/3 in the middle haven't followed the issue closely enough yet, and that they haven't yet figured out that Wikidata can't satisfy those requirements.

There have been a number of other Wikidata-related RFCs since then, denying or removing Wikidata from other locations.

  • There is a substantial likelihood that your project will provoke a new Wikidata RFC on EnWiki.
  • There is a very real possibility that your project will effectively trigger a ban of Wikidata on EnWiki. I would hesitate to claim any particular percentage chance on that outcome. The issue is precariously undecided among the community, and my impression is that it may be tilting against Wikidata.

I anticipate little chance this post is going to affect the course of your project. However I did want to alert you of the situation and the potential effect.

And as someone else noted, this tool would be literally unusable for unsourced items on EnWiki. There is an overwhelming rejection of unsourced items. The range of debate on EnWiki is whether sourced items are permissible.

Wargo (talkcontribs)

This tool's purpose is to edit Wikidata. Infoboxes could still use values set locally (in the usual ways), and I think this tool could work on infoboxes and their properties without enabling the display of Wikidata values in them.

Jc86035 (talkcontribs)

Was it really necessary to immediately assume that the team had no idea what was said in the 2018 RfC? Many of the WMDE staff have worked on Wikidata for nearly six years, so I doubt that they would be unaware of Wikidata's reception on the English Wikipedia.

I doubt that this will result in an outright ban on Wikidata transclusion. Although most properties are rarely used in English Wikipedia articles, consider the authority control template – it would take almost a million edits for the Wikidata values to be replaced, and it would be needlessly difficult (if not completely pointless) to replicate Wikidata's semi-automatic identifier matching tools. (Right now, 60.6% of English Wikipedia articles use at least some Wikidata data; this is the 44th-highest percentage out of all Wikipedia editions' usage percentages.) Even if you assume that Wikidata data will never be usable on the English Wikipedia, I think it would probably require a significant amount of work to completely remove all Wikidata data at this point, and it could be potentially difficult to find consensus for this.

Another RfC could be a useful opportunity; a possible outcome could be to hide the pencil icons for (e.g.) unregistered users by using CSS classes. And by that time, I think Wikidata's ratio of sourced to unsourced statements would have improved since 2018, especially considering the DBpedia data-sync project and the other ongoing initiatives.

Furthermore, the WikidataIB module can ignore Wikidata values if they aren't sourced properly (and the pencil icon can also be trivially enabled or disabled, since it's just an image with a link), so I'm not sure why it's necessary to focus on unsourced data so much. Given that the draft technical documentation shows that the Wikidata-editing pop-up will be enabled through the addition of a single HTML attribute to a wrapper around the pencil icon, this would only be an issue where some values for a particular property on an item are sourced and some of them aren't (all the values would be shown in the pop-up).

In any case, it doesn't necessarily even matter if it's not going to be usable on the English Wikipedia for some years. The English Wikipedia is not the only wiki that matters, and the software will still be ready for whenever Wikidata gets those mass imports of sources.

Alsee (talkcontribs)

"I'm not sure why it's necessary to focus on unsourced data so much."

It was barely mentioned. And it was mentioned because the documentation says they are building something with no support for sources at all. That means it won't even function on EnWiki unless/until they do add support for sourcing.

Devs: I just realized something. When you say you're building something with no support for sources, does that mean the software will update the value and:

  1. Delete any source information that was attached? Or...
  2. Not-touch any preexisting source that was attached?

Neither option is particularly good, but I really hope you already considered this issue and already realized why one of those options would be extremely bad. Angry mob bad.

"Another RfC could be a useful opportunity; a possible outcome could be to hide the pencil icons for (e.g.) unregistered users by using CSS classes."

No, that's not credibly possible as an outcome. The debate was whether to use Wikidata in infoboxes at all, and there were a lot of complaints that the RFC was overly complicated. We're not going to include trivial details like the pencil icon in the same RFC debating whether to fully deploy or fully rollback use of Wikidata in infoboxes. Small details like the icon would have to be addressed separately. The icon is irrelevant if the community decides Wikidata doesn't belong in infoboxes at all.

I expect the next RFC will follow up directly from the result of the last RFC. The result was that Wikidata might be acceptable for use if the relevant concerns are satisfied. I expect the RFC will ask whether Wikidata content is a sufficiently reliable source in particular, and sufficiently compliant with Wikipedia policies and guidelines in general, for automated import of Wikidata content into Wikipedia.

"Wikidata's ratio of sourced to unsourced statements would have improved since 2018"

The ratio of unsourced statements is irrelevant because the unsourced statements are irrelevant. They are already blocked. The only problem is the sourced statements.

"WikidataIB module can ignore Wikidata values if they aren't sourced properly"

No it can't. I discussed this with the module developer. The module makes a simplistic attempt, and it does filter (most) unsourced items. However, it can't even reliably filter out items circularly sourced to Wikipedia. (It pattern-matches for "Wikipedia" in the source field, but millions of items are sourced to Wikipedia without having that text string anywhere in the source field.) The WikidataIB maintainer also refused to even attempt to filter out items which are circularly sourced to Wikidata. (Those items may be unsourced, but pass through the filter because there is a circular source claim attached.) The filter can't reasonably detect which items are circularly sourced to Wikidata; any such filter would have to overblock a massive percentage of all sourced items on Wikidata. And that's not even considering the universe of general bad sources, and other issues.

Snipre (talkcontribs)

I have a problem with your assertion "No it can't." It is possible to filter values to avoid values with no source or values sourced to a Wikimedia project (using property P143 "imported from Wikimedia project"). I know this because the French WP has developed a Lua module for that (see the parameters withsource and sourceproperty).

Then there are still some wrongly formatted references using P248 "stated in" linked to a Wikimedia project. But this can be handled by curating the data in WD with a bot.

Now, if you want to filter values according to a small set of references you consider reliable, this can be done: you just need to create a filter which analyzes the references linked to a value and matches them against the references you listed.

Finally, I don't understand your assertion "WikidataIB maintainer also refused to even attempt to filter out items which are circularly sourced to Wikidata." As references are defined as items, any reference link will point to a WD item. You should explain what your problem with "circularly sourced to Wikidata" is.

The main solution for your problem is on the WP side: develop the filter that extracts the data you want. WD has a model for references; some contributors still do not respect that model in WD, but the structured data of WD allows you to eliminate everything you don't want. Just be aware that the more you tighten your constraints, the more complex the filter becomes and the less data will meet them. That's all.

Nikkimaria (talkcontribs)

It's very easy to filter values that use P143 "imported from Wikimedia project". It's also very easy to filter values that only use a very defined set of reliable sources. It is not easy, however, to get anywhere in between, because you'd need either a comprehensive set of all reliable sources or a comprehensive set of all unreliable sources, and such a thing simply doesn't exist.
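
The two easy filters mentioned here can be sketched on the JSON form of a statement, where each reference carries a `snaks` mapping keyed by property ID. This is a minimal illustration, not the code of the French module or of WikidataIB: the function names are hypothetical, the allowlist has to be supplied by whoever uses the filter, and the sketch assumes item values carry an `id` field.

```python
from typing import Any, Dict, Set

Reference = Dict[str, Any]
Statement = Dict[str, Any]

IMPORTED_FROM = "P143"  # "imported from Wikimedia project"
STATED_IN = "P248"      # "stated in"


def is_real_reference(reference: Reference) -> bool:
    """A reference counts unless it is empty or has an "imported from" snak."""
    snaks = reference.get("snaks", {})
    return bool(snaks) and IMPORTED_FROM not in snaks


def has_usable_source(statement: Statement) -> bool:
    """Easy filter 1: drop unsourced and Wikimedia-imported statements."""
    return any(is_real_reference(r) for r in statement.get("references", []))


def cited_items(reference: Reference) -> Set[str]:
    """Item IDs named by the "stated in" snaks of one reference."""
    ids = set()
    for snak in reference.get("snaks", {}).get(STATED_IN, []):
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def has_allowlisted_source(statement: Statement, allowlist: Set[str]) -> bool:
    """Easy filter 2: keep statements citing a source from a fixed list."""
    return any(cited_items(r) & allowlist
               for r in statement.get("references", []))
```

Neither function addresses the hard middle ground described above: a reference that is neither imported from a Wikimedia project nor on the allowlist is simply accepted by the first filter and rejected by the second.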

Jeblad (talkcontribs)

I have experimented with reusing sources from Wikidata, and it is quite easy to make a minimal implementation. It is, however, a bit hard to make something that truly looks like the current cite templates. It is not hard at all to filter out statements that use P143, but it is strangely difficult to explain to users that this is easy and in fact was done in the experiment.

Snipre (talkcontribs)

Thank you for the comment. This is my point too: we can filter out everything which doesn't comply with the English WP rules concerning missing sources or circular WP references. But as the English WP is not able to provide a list of reliable sources for the whole WP, there is no reason to ask WD to do that task. If someone wants to extract only values from reliable sources, then they have to provide the list of reliable sources. This can then be added to the infobox code (if the infobox is coded in Lua, of course).

Jc86035 (talkcontribs)

Infoboxes are mostly irrelevant on the English Wikipedia for this use case, at least for now. Most of the Wikidata usage on the English Wikipedia still comes from the authority control, official website and Wikimedia Commons templates (as well as some other external identifier templates). It's still exceptionally rare to see infoboxes that use any substantial amounts of data from Wikidata, and I don't expect this to suddenly change.

Conveniently, if this remains the case until Wikidata's sourcing situation generally improves, it means that sourcing is also almost entirely irrelevant for the English Wikipedia, because external links and Commons categories don't require sources. I do also think it would ultimately be necessary for the software to show sources if it's ultimately intended for everything in infoboxes to be based on Wikidata data, but it might take a lot of user testing to make it more usable than the figure-all-of-it-out-yourself style of the current interface. I don't think this sort of user testing would be appropriate for the minimum viable product, though, especially if it'd be initially implemented as a beta feature (i.e. lower risk of vandalism and errors).

It could be possible to make the case that because the English Wikipedia convention is to omit sources from infoboxes altogether, it wouldn't make sense for sources to be shown in the pop-up anyway (though in my view it would probably make more sense to require sources for all infobox data).

I also take issue with your suggestion that Wikidata would have to be considered as a whole. As an example, if census data were imported regularly for a particular country (making it acceptable to use that data in infoboxes), it wouldn't make sense to require a complete evaluation of Wikidata just to use that census data.

I'm still not sure why you're implying that this software being unable to handle references would cause a huge fuss, given that it would be the conscious and deliberate decision of template authors to enable the software for the appropriate parameters; presumably, if it's considered absolutely necessary for the end user to view the sources, then the pop-up would just be disabled for the relevant data. I'm sure it's perfectly possible for this to become contentious if it's not handled properly, but it doesn't seem likely to me because it's (presumably) not going to be forced on anyone and editors are (presumably) to be completely in control of where and when this gets enabled.

Jc86035 (talkcontribs)

Of course, Wikidata data is actually used in infoboxes in a significant number of Wikipedia editions, so (as Snipre notes below) this would not be applicable for some of the larger wikis like the French Wikipedia, and in most situations it would be necessary to be able to view, add and modify references through the pop-up.

Lydia Pintscher (WMDE) (talkcontribs)

Hey :)

Yes. It'll be up to the template editor to decide where to use the Wikidata Bridge. Also our designer is currently figuring out how to handle references. So we'll have that as well.

ChristianKl (talkcontribs)

That sounds like a state of affairs where you would get a template editor to decide to use the Wikidata Bridge on EnWiki and then we get drama and an RfC which might put us in a worse position than we are in currently.

Jc86035 (talkcontribs)

Wikidata usage itself is already limited to non-infobox data that doesn't require references on almost all English Wikipedia articles. Only a few specialized infoboxes like w:en:Template:Infobox telescope use Wikidata extensively (though w:en:Template:Infobox person/Wikidata does have more than 1,800 transclusions). Given that it wouldn't make sense to use Wikidata Bridge without also using the relevant Wikidata data, I don't think this would in and of itself cause any drama; it's already discouraged to add Wikidata-transcluded data to begin with, so I doubt that anyone is going to be running around adding a beta version of Wikidata Bridge to high-use infoboxes.

Logically, there are only four scenarios for Wikidata/Wikidata Bridge usage for infobox data:

  • The status quo, at least on the English Wikipedia, is that infobox parameters generally have neither Wikidata data nor the Bridge pop-up.
  • If one were to add only the Bridge code but not the data from Wikidata, the pop-up would probably be removed (or no one would notice) since it wouldn't be possible to use it to modify anything displayed in the article. This would probably cause disruption to the Wikidata data, but wouldn't in itself cause disruption to the Wikipedia article.
  • The situation for adding data transcluded from Wikidata without the Bridge pop-up would presumably remain the same.
  • If both data from Wikidata and the Bridge pop-up were added, would this be worse than if only the Wikidata data were added? The references not being shown, I think, would only be detrimental to Wikidata and not to the English Wikipedia, because references aren't normally shown in English Wikipedia infoboxes, and it's still not possible for w:en:Module:WikidataIB to display references in the first place.
RexxS (talkcontribs)

It's not difficult to import the references, if desired, along with the values they support. The problem lies with how to format those references so that they match the style of the references used in the rest of the article. Unfortunately, the English Wikipedia allows an editor to make up whatever style of referencing they choose and once that becomes the established variant for the article, you need lengthy discussion to be able to change it. It's therefore a near impossible task to hold a list of all possible styles of reference formatting that imported references would need to match. The outcry from editors that "these references have appeared and I can't change them to fit my scheme" would set back the adoption of Wikidata on the English Wikipedia by years.

Jc86035 (talkcontribs)

Maybe I'm stating something that's already obvious or I've asked you this question already in some other discussion, but could you not have a parameter like |ref-style=cs1 and disable citations/Wikidata on pages without the parameter? Would there be a drawback in doing so, other than necessitating the use of the parameter?

RexxS (talkcontribs)

The disadvantage is that editors will copy others who use the parameter, even on pages that don't use CS1. That's when the outcry starts and the blame goes to "Wikidata" or whoever implements the code, rather than the editor who made the mistake. Nevertheless, I suppose that we'll have to have something like it eventually, so I'll sandbox an implementation when I get back from this weekend's meeting.

Snipre (talkcontribs)

I share some fears of Alsee about the risk that this tool will bring more criticism about WD. The main criticisms are the quality of the data and the integrity of the data. And the first draft of the tool didn't take account of these two aspects so there is a legitimate question to know if the developement team is aware of the opposition to the use of WD.

About the quality of data, the main element is the sourcing and some WP have some strong policy about that aspect. Proposing a tool which is not able to deal with that principle is just a casus belli because it encourages a bad behaviour in those WPs and can increase the risk to see the display of that data later by the WP if the infoboxes are not proper coded to filter the data.

Then about the data integrity, the fact to propose to correct "wrong data"is a bad understanding of the sourcing principle. If someone did a mistake when adding a sourced data with the property "retrieved" (P813), then by changing the value of the statment but not the value of that property in the reference part, we generate complete chaos in terms of chronology.

And finally no answer about the vandalism problem was proposed since the last RfC: no stronger policy about authorization of contributing (like excluding IP), no stronger policy about sourcing (like obligation to add a source), no increase of the protection of existing data (it is still possible to change the value of a sourced data without having any blocking or deletion of the reference part).

Just increasing access to WD will only increase the possibilities for vandalism, and WPs with a strong concern about that problem won't be interested in implementing that tool, let alone in using data from WD.

Jc86035 (talkcontribs)

Wouldn't sources presumably be made mandatory on a per-property basis using constraint violations (which Wikidata Bridge would have to be able to handle at least partially anyway)? I don't think it would be appropriate to implement such a requirement specifically for this tool, given that many properties (a majority if we include identifiers) shouldn't need sources in the first place.

I agree that it would be essential to manage vandalism, although if all the edits from the tool are tagged then it shouldn't be technically difficult to assess whether most edits are constructive and whether allowing unregistered users to access the tool is beneficial. As noted, the tool would also be enabled on a case-by-case basis (and it would be possible to mix non-interactive and interactive pencil icons in the same template), so if a template is disproportionately enabling vandalism then the pencil icon could be removed for that template.

Snipre (talkcontribs)

Jc86035: "given that many properties (a majority if we include identifiers) shouldn't need sources in the first place". On what basis can you assert that the majority of properties don't require references? It is the opposite: the majority of properties require a reference in order to manage the chronology of information (even identifiers can change following merges in the original databases) and the differences between several points of view.

In my opinion only instance of/subclass of and properties used to describe a reference shouldn't have a source.

People think that identifiers don't require references, but this is wrong. Some databases are not open, so we can know those identifiers only through other sources. Identifiers can also change over time due to merges in the original databases, so it is necessary to have at least a retrieved date with an identifier in order to understand why two references can have different values for the same identifier (one from before the change and the other after). Some data can differ and still all be correct: there can be differences in the precision, in the method, and so on.

And finally, to come back to your comment, you explain that tagging will be used to assess the vandalism risk, but this assumes post-processing of the edits (who will do that, is it a permanent action or only a temporary analysis, ...) and it relies on our being able to distinguish vandals from contributors easily. So again the risk is not reduced at the origin but through permanent actions, which will increase the workload for "good" contributors and will depend on the effort of these contributors. Wikipedians don't like this kind of argument: they don't want to know what we will do to improve WD, but what is currently in place to maintain quality.

So again, with the announcement of the Wikidata Bridge, we have a process similar to the one for WD in the past: we are trying to sell a product with identified drawbacks, with the promise that the tool will be improved in the future to take account of the customer's needs. And the customer's answer is "come back later when your tool does what we need".

Jc86035 (talkcontribs)

Even assuming that it would be good practice for users to add metadata for identifier statements, given that "our designer is currently figuring out how to handle references", it would presumably be technically possible to make it easier to add retrieved (P813) and the UTC date as a reference for identifiers using a checkbox (of course, this is hypothetical, and it would be better if this and similar actions were possible in the Wikibase interface as well). Even if this isn't possible, it wouldn't be difficult to check an item's revision history if a conflict arises (and presumably this is usually necessary anyway, since most identifier statements don't have such references).

Although I agree that the prospect of increased vandalism is probably detrimental, it's possible that it would also be easier for readers who've never edited before to correct obvious vandalism (and easier for them to be introduced to editing Wikidata, and so on). Anecdotally, I've seen this happen on both Wikipedia and Wikidata. We would need to wait for the live trial to know if this actually has a measurable benefit, of course.

Snipre (talkcontribs)

I agree that every problem can be solved, and this discussion is not about pointing out things that are impossible to achieve. But the development team decided to find a solution to one particular problem (contributing to WD from WP) and forgot to propose a solution for all the other problems that are currently the sources of opposition to the use of WD in WP: poor-quality data (data without a source, data sourced from WP), the fight against vandalism (no possibility to use the WP article history to see modifications in WD affecting the WP article), protection of the integrity of data in WD, ...

We all know this is a question of priorities and resources; that's not the point. The problem is in the process used to propose a new tool that solves one problem (contribution to WD) without having from the start a solution for the other problems. We can't change the current WD framework, but any new modification should take account of as many of the criticisms mentioned previously as possible, in order to show that the problems were understood and that there is a will to solve them.

To summarize a little:

  • the first beta version has to include a feature to handle reference. There is no interest from the big WPs (or at least no majority) to assess a tool without that feature
  • no correction of an existing statement should be possible without an update of all relevant qualifiers or reference properties. The critical point is the retrieved property, which has to be updated in any case. And finally, whether a statement is correct or wrong can't be decided without a look at the reference. Some data were correct once and not more currently. This doesn't mean that value was wrong all the time and perhaps in a certain period the data was correct. WD has to be able to take account of the fourth dimension (time). The same goes for data with different units, or based on different determination methods. Unless the full data set including value, qualifiers and references can be displayed by the interface, there is only one acceptable action when contributing to WD from WP: adding a new statement.
  • finally, even if the interface tool allows new contributors from WP to edit WD, I can already predict that this source of contributions will be a problem for WD: the interface is not the main problem, the main problem is the addition of the whole data set linked to a value. Just look at the complaints of WD contributors: to add a reference, I need to create a new item with 2-5 properties, or how data should be modeled (see the problem of date accuracy). How will the interface solve that? I hope that the interests of WD will be taken into account and everything will be done to avoid a bunch of new statements without references and without mandatory qualifiers, because this will just transfer the problem from WP contributors to WD contributors, and I would not like to have to correct badly structured statements edited in WP.
Jeblad (talkcontribs)

I doubt this is correct: "the first beta version has to include a feature to handle reference. There is no interest from the big WPs (or at least no majority) to assess a tool without that feature"

Most users (and communities) are willing to explore any solution to editing statements from Wikipedia, with or without sources from Wikidata.

The second point would set the bar much higher for Wikidata than Wikipedia presently does, and I'm not even sure it is possible to enforce this at all. Not on Wikipedia, and not on Wikidata. It is possible to trigger warnings in some cases, like when a sourced document changes.

Snipre (talkcontribs)

Sorry but please add a reference to your affirmation "Most users (and communities) are willing to explore any solution to editing statements from Wikipedia, with or without sources from Wikidata". The three main WPs (en, fr, de) had RfC about WD use and put some constraints to the use of WD and the constraints were not related to the lack of possibility to edit WD from WP.

With Wikidata Bridge, there will be no change in the reasons that led to limiting the use of WD in WP. Some contributors are always ready to test new functionalities, but if you don't convince the majority who expressed strong reservations in the RfCs, then you will just see new constraints on the use of Wikidata Bridge. Is that what you want? What are you proposing to avoid the criticisms already written in the previous discussions?

For the second point, there is no difference in treatment between WP and WD: WP always recommends adding sources, so WD tools have to be able to edit references. You are confusing the possibility of adding references with the obligation to add references. Wikidata Bridge doesn't need to force contributors to add sources, but it has to provide a way to edit sources.

By delivering a beta version without the possibility to edit sources, Wikidata Bridge will not comply with WP's recommendation to add sources and will not take account of previous criticisms about the poor data quality in WD, which is mainly due to a lack of sources.

Jeblad (talkcontribs)

I have been involved in several discussions about Wikidata Bridge, and Wikidata in general, on several projects. (Beta testing at nowiki.) The discussions go roughly like this: about ¼ is very vocally pro-Wikidata, ¼ is very vocally against Wikidata, and ½ is pretty much indifferent but accepts the use of Wikidata. The quarter of users against Wikidata usually claim they have far more support than they actually do, and more or less consistently claim their fringe problems are major problems. No, I don't buy it.

To recommend use of sources is one thing, but what you imply with "Some data were correct once and not more currently. This doesn't mean that value was wrong all the time and perhaps in a certain period the data was correct." is way more than simply recommending the use of sources. Okay, so you back down and say there is no obligation to add references, good. It is a beta and will provide some functionality. More features will come in later updates, and will be defined by the devs and the communities involved in the beta testing. If that does not include enwiki, so be it.

Jc86035 (talkcontribs)

While I generally agree with what you've said, these issues are shared by the vast majority of existing Wikidata editing interfaces. Harvest Templates and Mix-n-match, for example, also do not add the "retrieved" property, and presumably have a much higher editing volume than this software will; and the interface is only limited by the existing property constraints and edit filters. Perhaps it would be inappropriate to make these things absolute prerequisites for the software's deployment, given that this wouldn't actually resolve the issues and that all the work to prevent these things could have almost no effect (given the existing editing volume). I think it would be much more effective to resolve these issues through a separate project that doesn't just involve one interface, but clearly the issues haven't yet been prioritized enough across the board, not just within the development of this software.

Moreover, given that existing software already suffers from these issues and that the situation doesn't change in this regard, the Bridge software could be a net positive in terms of vandalism/bad edits, since users would become less likely to click through to Wikidata, due to most pencil icons no longer linking directly to Wikidata items.

(From the Wikipedia point of view, even if a user adds just a plain URL as a reference, it's better than nothing and can almost always be fixed later. I would think this is pretty much the same for Wikidata.)

Jc86035 (talkcontribs)

While vandalism is a relevant subject here, I think it would be appropriate to address label/description vandalism, especially since it seems to be more common than statement vandalism. Addressing it properly would probably save much more time for experienced Wikidata contributors than adding an elaborate and foolproof citation system to the Bridge software would, and would probably require less development time and less user testing.

Perhaps the easiest way to address this would be to enable the SHORTDESC magic word on all Wikipedias (if not all Wikimedia wikis) – it does bring a number of benefits, particularly that more detailed/helpful generic descriptions (e.g. for disambiguation pages, lists, categories) can be enabled with one edit to a high-use template, rather than a bot run over a million items every time a word should be changed. At least on the English Wikipedia, I think the template also makes it seem easier (even if it isn't actually easier) to make the descriptions for individual articles more detailed. Of course, it also pushes the liability for fixing a lot of description vandalism back to the Wikipedias, which is convenient in several ways.

Stjn (talkcontribs)

I hope there will never be a decision to turn bad hacks for individual wikis like SHORTDESC into fully supported features. We should work on better (and by default) tracking of changes from Wikidata and deprecating those hacks, not on setting them in stone.

Jc86035 (talkcontribs)

I do think it helps to some degree, even if only because most English Wikipedia editors would never have otherwise noticed the descriptions' existence (e.g. w:en:Talk:Witness (Katy Perry album)#"Best album"?, as well as #Brief article description vandalism on the same talk page – the vandalism edits actually stayed on the Wikidata item for almost half a year in total, which is concerning) because most English Wikipedia editors don't use the portal or the mobile site and don't edit Wikidata. On the other hand, I wouldn't advocate for the use of SHORTDESC on every project, simply because there are only a few wikis where maintaining the descriptions could be plausibly doable.

Alsee (talkcontribs)

Remotely exporting Wikidata item labels as if they were article-descriptions was an extremely bad hack. The Foundation needs to ask the community before pulling stupid stunts like that.

Wikidata can happily go on its own way as its own project, or it can be shoved down other wikis' throats and a mob will show up with torches and pitchforks wanting it burned to the ground.

Jeblad (talkcontribs)

Please moderate your post. Thank you.

Jeblad (talkcontribs)

While I understand some of the feature requests, a lot of the features described in the above thread have very little to do with Wikidata Bridge. They add a considerable feature creep which may make the project unfeasible.

Wikidata Bridge should allow editing of a statements value in the first implementation. That is the core feature, and that is what should be the beta.

ChristianKl (talkcontribs)

Why is adding statements with sources a requirement that makes the project unfeasible? It makes the project more complicated but it actually means that it costs less political capital to deploy the feature. The whole Wikidata Bridge project is to make interaction with Wikipedia easy. It won't be if you burn political capital for the sake of making the project a bit easier on the technical level.

Jeblad (talkcontribs)

Note that I wrote “They add a considerable feature creep” in plural.

Snipre (talkcontribs)

"Wikidata Bridge should allow editing of a statements value in the first implementation." References and qualifiers are part of the statement so these fields have to be editable in the first version of Wikdata Bridge.

Jeblad (talkcontribs)

Statement's value (Wikibase/DataModel#Values), or more precisely the object in the w:semantic triple (subject–predicate–object). References and qualifiers are part of the statement, but not part of the value (object).

There are several layers in a statement, and most users do not have a clear understanding of how they relate. I should have been clearer.
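The layering can be sketched in a few lines of Python. This is an illustrative model only, not the Wikibase data model itself: the class and function names are hypothetical, and values, qualifiers and references are reduced to plain strings and dicts.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    subject: str    # the item the statement is about
    predicate: str  # the property
    value: str      # the object of the triple
    qualifiers: dict = field(default_factory=dict)  # part of the statement, not of the value
    references: list = field(default_factory=list)  # part of the statement, not of the value

    def triple(self) -> tuple:
        """The bare subject-predicate-object triple, without the outer layers."""
        return (self.subject, self.predicate, self.value)


def edit_value(statement: Statement, new_value: str) -> Statement:
    """Change only the value (object); qualifiers and references are carried over."""
    return Statement(
        statement.subject,
        statement.predicate,
        new_value,
        dict(statement.qualifiers),
        list(statement.references),
    )
```

In this reading, "editing a statement's value" touches only the third element of the triple. The disagreement above is about whether a first version may stop there.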

Jc86035 (talkcontribs)

By "beta", are we referring to the first usable version (e.g. something that would presumably be enabled as a demonstration on one of the test Wikipedias), or the first version to be enabled on a real Wikimedia project through the Beta Features preferences tab (which would actually be able to modify Wikidata)?

Lydia Pintscher (WMDE) (talkcontribs)

We will have support for adding references in the first version that goes live on a Wikipedia. Before that there will be iterations on a test system, the first few of which will probably not have it. We will not make references mandatory in the first release because I think that adds a burden that we should not add unless absolutely necessary (and I can be swayed by feedback on the first releases). We will start with rollouts on small to medium sized wikis that ask for it. I believe these are the right ways to go forward because at the end of the day it is in the hands of the template editors on-wiki to decide where they enable the functionality and where not so they are ultimately in control. I hope that helps clarify things.

ChristianKl (talkcontribs)

Why don't we start with design iterations until we have a design that properly supports sources and thus doesn't come with the risk of alienating Wikipedians? WMF and EnWiki relations aren't at a particular high currently, and it's prudent to avoid actions that have the potential to inflame matters further.

Why code up a design that's not intended to go live on a Wikipedia?

Reply to "EnWiki reception of Wikidata"
GreenReaper (talkcontribs)

I wasn't aware of this when posting my feedback to Wikidata's federation input discussion, but if we are using data from (and potentially donating data to) Wikidata on federated projects, it stands to reason that we may want to use a bridge to edit Wikidata within our UI (as well as any data items in our own Wikibase instance), just as you foresee it being done on your own sister projects. While I appreciate this isn't the focus at this time, perhaps it could be a possibility that you consider for the future?

Lea Lacroix (WMDE) (talkcontribs)

Hello and thanks for your feedback!

Although we're developing the tool with the usecase of editing Wikidata from Wikipedia for now, we have in mind that this extension could be used on any Wikibase instance in the future. This could be used to edit the content of a Wikibase instance from one of its client wikis.

The usecase of being able to edit Wikidata's data from an external wiki (not part of the Wikimedia projects) is not part of our short-term roadmap, but it could definitely be considered in a more or less distant future :)

GreenReaper (talkcontribs)

Thank you, Lea. As outlined in our policies, we have no wish to duplicate Wikipedia or other Wikimedia projects; rather, our goal is to complement them, and such a feature may assist in that.

Reply to "Use on federated projects"

New prototype (integrating references)

Lea Lacroix (WMDE) (talkcontribs)

Hello all,

Based on your previous feedback, we built a new version of the prototype that includes showing and editing references. This v2 was tested by our UX researchers during Wikimania with about a dozen people. We are already working on a v3 that will integrate the interviewees' suggestions, but we still wanted to share the v2 with you so you can keep track of the evolution. So here it is!

As usual, please keep in mind that this is only a "click-dummy", faking real interactions and with only one user path available.

If you have any remarks or questions, feel free to reply in this thread.

Jsamwrites (talkcontribs)

This new v2 of the prototype is clearer than v1. However, I am wondering whether the retrieved date is autofilled (on clicking, as in Wikidata) or whether editors have to fill it in. Maybe an example could be shown.

Charlie Kritschmar (WMDE) (talkcontribs)

Hi @Jsamwrites, thank you for your feedback. Currently retrieved is not automatically filled, but that is on our list of features we'd still like to implement.

Jeblad (talkcontribs)

Seems like the template type is part of the workflow; see image 8 and image 9. This must be stored at Wikidata as abstract types, as there are a lot of different ways to implement references. Local code must also be prepared to handle typeless references, as there will probably be a lot of references that lack a type for a long time.

I'm attempting to make code for figuring out which template is a likely match in w:no:Modul:Property map, but it isn't completed yet.

Note that some references have an implicit type by lack of specific information, by slight differences in some information, or even because the information is lacking completely. A well-known example is newspapers that publish short versions of the articles on the net, and complete articles on paper. When users find the story in the paper they try to look the articles up on the web, and link to them, without realizing the web version is shorter, and that specific information (the reason for using the article as a reference) is missing.

Note also that a lot of references at Wikidata do not have required pieces of information, like “title”. We could run a citoid-bot to fill in titles, and perhaps some other fields, but until that is done (not sure if that will ever be finished) a fallback for cite templates is necessary. This can be implemented quite easily in Lua, but it must be done.
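As a rough illustration of such a fallback, here is a minimal sketch written in Python rather than Lua. The reference is assumed to be a flat dict with optional "type", "title" and "url" keys; those keys, the "news" type label and the function name are made up for the example and are not how Wikidata stores references.

```python
def format_reference(ref: dict) -> str:
    """Pick the richest rendering the available fields allow.

    Falls back step by step: typed cite template, generic cite template,
    bare external link, bare title, and finally an empty string.
    """
    ref_type = ref.get("type")
    title = ref.get("title")
    url = ref.get("url")

    if ref_type == "news" and title and url:
        return "{{cite news |title=" + title + " |url=" + url + "}}"
    if title and url:
        # Typeless reference: use a generic template.
        return "{{cite web |title=" + title + " |url=" + url + "}}"
    if url:
        # No title available: a bare external link is still better than nothing.
        return "[" + url + "]"
    if title:
        return title
    return ""
```

A typeless reference with only a URL therefore still renders, which covers the case where the titles have not been filled in yet.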

Reply to "New prototype (integrating references)"
Mohammad (talkcontribs)

Hi, I'm working on creating a local page and a local announcement on Persian Wikipedia, explaining everything that is explained here, and I will keep that page updated accordingly. I'll also request discussions both locally and here. Since not all editors on Persian Wikipedia are fluent in English, would it be possible for me to occasionally post a translation of user discussions on this page and relay WMDE answers locally?

Lea Lacroix (WMDE) (talkcontribs)

Hello Mohammad, and thanks for all the work you're doing! Of course, we'll be happy to answer the questions from the Persian Wikipedia community. Feel free to add a summary of the discussions here, and I'll answer any questions you have.

When the page is ready, you can also add it in the list on this page.

Jeblad (talkcontribs)

I wonder if there should be a page with links to local pages on the individual projects. That could make it easier to track down important implementations, and also differences.

Mohammad (talkcontribs)

Isn't there already one? @Jeblad

Jeblad (talkcontribs)

Not that I know of…

Lea Lacroix (WMDE) (talkcontribs)

I started a list on Wikidata Bridge/Updates, section "Pages about Wikidata Bridge on other wikis", is that what you were looking for?

Reply to "Feedbacks from a local Wiki"

Qualifiers as additional information

Jeblad (talkcontribs)

In some cases qualifiers are important. One notable use is in entries for spouse (d:Property:P26), or “ektefelle” at nowiki. See for example w:no:Knut Hamsun, where the row reads “Ektefelle Marie Hamsun (1908–1952)”. (Which comes from Wikidata in this case.)
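A minimal Python sketch of how such a row could be built from qualifiers, assuming the statement's qualifiers arrive as a plain dict keyed by property ID with year strings as values (a simplification, not the real Wikibase format). P580 (start time) and P582 (end time) are the usual qualifiers for this; the function name is hypothetical.

```python
START_TIME = "P580"  # start time qualifier
END_TIME = "P582"    # end time qualifier


def format_with_period(label: str, qualifiers: dict) -> str:
    """Append "(start–end)" to the label when at least one date qualifier exists."""
    start = qualifiers.get(START_TIME)
    end = qualifiers.get(END_TIME)
    if start is None and end is None:
        return label
    return label + " (" + (start or "") + "–" + (end or "") + ")"
```

With the values from the row quoted above, `format_with_period("Marie Hamsun", {"P580": "1908", "P582": "1952"})` gives "Marie Hamsun (1908–1952)".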

Lea Lacroix (WMDE) (talkcontribs)

You're absolutely right, qualifiers often carry some meaningful information. We will investigate to find the best way to display them when it's necessary.

Reply to "Qualifiers as additional information"
Jeblad (talkcontribs)

During preparations for using references from Wikidata for dates, by extending w:no:Module:WikidataDato, we found that some items have a lot of references for birth dates. One of the “worst” is d:Q4295, which has 30 references. That is slightly over the top, and we (I) implemented a simple scoring mechanism for prioritizing a few (4) important references. If such a mechanism is in use in an infobox, then an edit to a reference might not be visible. It may also move a reference into a visible slot, or vacate a visible slot. If the resulting number of references is not above the max number of entries, then a change to a preexisting reference will not make it invisible. The problem arises when the number of entries goes above the max number of entries. I believe the solution is to score and limit the entries in well-defined functions instead of a project-specific module. That makes it possible to figure out whether a specific entry will be visible in the infobox after editing in the gadget.

Current code uses a best score of language of title, root domain of reference URL, linked entity, and property use. Code without too much doc is at w:no:Module:References. The code will probably be extended with a few other functions.
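As a rough illustration of scoring and limiting references in a well-defined function, here is a minimal Python sketch. It is not the code in w:no:Module:References: the field names (`title_language`, `root_domain`, `linked_entity`, `property_use`), the equal weights and the `nb` default are assumptions made up for the example. Only the limit of 4 visible references comes from the description above.

```python
MAX_VISIBLE = 4  # number of references shown, as in the description above


def score(ref, preferred_language="nb", trusted_domains=frozenset()):
    """One point per matching criterion; equal weights are an assumption."""
    points = 0
    if ref.get("title_language") == preferred_language:
        points += 1
    if ref.get("root_domain") in trusted_domains:
        points += 1
    if ref.get("linked_entity"):
        points += 1
    if ref.get("property_use"):
        points += 1
    return points


def visible_references(refs, limit=MAX_VISIBLE):
    """Highest scores first; ties keep their original order (stable sort)."""
    ranked = sorted(refs, key=score, reverse=True)
    return ranked[:limit]


def is_visible(refs, ref_id, limit=MAX_VISIBLE):
    """Would the reference with this (hypothetical) id land in a visible slot?"""
    return any(r.get("id") == ref_id for r in visible_references(refs, limit))
```

Because the ranking is a pure function of the reference list, an editing gadget could call `is_visible` on the edited list before saving and tell the user whether the change will show up in the infobox, or whether it pushes another reference out of a visible slot.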

Lea Lacroix (WMDE) (talkcontribs)

Thanks for letting us know, this scoring mechanism is indeed very interesting! I'm not sure at what point we can integrate it, but we'll keep in mind the need for sorting and selecting references.

Jeblad (talkcontribs)

About “I'm not sure at what point we can integrate it”, perhaps I don't understand you, but when the scoring and prioritizing is done in a module it is outside the gadget's knowledge and it can't predict how it will be done. If the same thing is done in the Wikibase Client lib for Lua, or a parser function, then the gadget can predict how the infobox will look and act accordingly.

An alternate solution to automatic ordering could be to manually order the references, but that ordering must then be project specific, which would create additional work load on the users.

Note that my scoring solution is not especially good, it is just one of many ways to do this. It can be viewed as a winner-takes-all on Manhattan metrics, pretty straightforward. If we had better statistics a Bayesian estimator could give better results.

Reply to "Limiting use of references"