Topic on Talk:Cross-wiki Search Result Improvements

Suggestion: Use Wikidata to fetch the main image for multimedia results

197.218.80.233

Problem: For non-English readers, search often returns irrelevant or inappropriate images because it relies on the image caption/description or on an unsuitable page image.

Background: Suppose someone searches for "monkey" and a picture of a man shows up because of the image's label or description. In certain contexts this may be insulting. Beyond that, on non-English wikis the multimedia results often show the wrong images because the image descriptions are not translated.

Proposed solution:

When the search query matches a Wikidata item label or alias, use the item's image property (P18) as the primary image.

Compare:

https://fr.wikipedia.org/wiki/Special:Search?search=poisson&fulltext=1&cirrusUserTesting=recall_sidebar_results&searchToken=4dyah5jglulkped63h48gu00m

(Shows these: https://fr.wikipedia.org/wiki/Fichier:Poisson%20distribution%20lambda%20001.svg, https://fr.wikipedia.org/wiki/Fichier:Poisson%20distribution%20CMF.png, https://fr.wikipedia.org/wiki/Fichier:Poisson%20distribution%20PMF.png)

VS

https://www.wikidata.org/wiki/Q152#P18

(https://commons.wikimedia.org/wiki/File:Abramis_brama_drawing.jpg)
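A minimal sketch of the lookup, assuming the query has already been matched to a Wikidata item. It builds a request for the Wikibase `wbgetclaims` API module and reads the Commons file names out of the P18 ("image") claims. The helper names are hypothetical, and the sample response is a hand-abbreviated stand-in shaped like the standard Wikibase JSON, not a capture of the live API.

```python
from __future__ import annotations

from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def p18_request_url(item_id: str) -> str:
    """Build a wbgetclaims request for the P18 (image) property of one item."""
    params = {
        "action": "wbgetclaims",
        "entity": item_id,
        "property": "P18",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def p18_filenames(response: dict) -> list[str]:
    """Return the Commons file names stored in the P18 claims of a wbgetclaims response."""
    names = []
    for claim in response.get("claims", {}).get("P18", []):
        snak = claim.get("mainsnak", {})
        # "novalue"/"somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") != "value":
            continue
        names.append(snak["datavalue"]["value"])
    return names


# Hand-written, abbreviated response shaped like the one for Q152 (fish).
sample = {
    "claims": {
        "P18": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P18",
                    "datavalue": {"value": "Abramis brama drawing.jpg", "type": "string"},
                },
                "rank": "normal",
            }
        ]
    }
}

print(p18_request_url("Q152"))
print(p18_filenames(sample))  # ['Abramis brama drawing.jpg']
```

The file name returned this way could then be shown alongside, or ahead of, the ordinary multimedia hits.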

DTankersley (WMF)

Thanks for the suggestion, but using Wikidata to search Commons for images that match a query is a bit different from what we're trying to do right now.

Sometimes the images that are returned could indeed be offensive, but we don't censor results based on how an image is tagged. We simply run the search and display the most relevant results.

On the other hand, this presents an excellent opportunity to edit mis-tagged or mislabeled images so that they stop showing up in search results where they don't belong. :)

Using your sample URL: if I were on frwiki searching for a fish and saw some math diagrams, I'd try a different search term. According to fr.wiktionary, the more common term for 'fish' in French is 'poiscaille', and https://fr.wikipedia.org/wiki/Special:Search?search=poiscaille&fulltext=1&cirrusUserTesting=recall_sidebar_results&searchToken=7bt1at9xmpdr5s4pch1ksd904 does return one image of a fish.

Alternatively, a redundant query using both 'poisson' and 'fish' returns an image of a fish, a fossilized fish and a large rock formation shaped like a fish: https://fr.wikipedia.org/wiki/Special:Search?search=poisson+fish&fulltext=1&cirrusUserTesting=recall_sidebar_results&searchToken=cioy7r5b1uycvbaoczumkffvz Still not optimal. Clicking the multimedia link on the search results page does show more images of fish: https://fr.wikipedia.org/w/index.php?title=Sp%C3%A9cial:Recherche&profile=images&search=poisson+fish&fulltext=1&searchToken=chdtvl1pm555wt1d03fk7ojxb but only with both query terms. Again, not optimal at this time.

A new project starting soon will help with this in the long term. It covers structured data on Commons, and as part of it we will be updating the API for searches like this.

197.218.91.5

Thanks for the response.

It might be prudent not to let the perfect be the enemy of the good, and to implement something like https://phabricator.wikimedia.org/T95026 as a short-term solution until the structured metadata project arrives and eventually replaces it. At the moment, people seem to ignore inaccurate images because they are barely visible: the search interface surfaces them only on mobile and on wikipedia.org.

Once this is widely deployed, you are likely to receive frequent "bug reports" about inaccurate multimedia content, because every search can potentially surface an image, audio file or video, whereas currently only text is shown. Relabeling these images won't help, because the search engine seems to weight the image filename more heavily than its description.

The idea is not to rely solely on Wikidata, but simply to pick one image from it in addition to the ones that are already displayed.
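The "one extra image" idea can be sketched as a small merge step. This assumes the search backend already returns its ordinary multimedia hits as a list of file names and that at most one Wikidata image has been looked up separately; the function names and the three-slot limit (matching the three images shown for 'poisson' earlier) are illustrative assumptions, not part of CirrusSearch.

```python
from __future__ import annotations


def _normalise(name: str) -> str:
    """Compare file names the way MediaWiki compares titles: underscores and
    spaces are equivalent, and the first letter is capitalised."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def merge_multimedia_results(
    search_hits: list[str], wikidata_image: str | None, limit: int = 3
) -> list[str]:
    """Show the Wikidata image (if any) first, then fill the remaining slots
    with the ordinary search hits, skipping duplicates."""
    candidates = ([wikidata_image] if wikidata_image else []) + list(search_hits)
    merged: list[str] = []
    seen: set[str] = set()
    for name in candidates:
        key = _normalise(name)
        if key in seen:
            continue  # same file already shown
        seen.add(key)
        merged.append(name)
        if len(merged) == limit:
            break
    return merged


hits = [
    "Poisson distribution lambda 001.svg",
    "Poisson distribution CMF.png",
    "Poisson distribution PMF.png",
]
print(merge_multimedia_results(hits, "Abramis brama drawing.jpg"))
# ['Abramis brama drawing.jpg', 'Poisson distribution lambda 001.svg', 'Poisson distribution CMF.png']
```

In this sketch the Wikidata image displaces the last ordinary hit; keeping all three hits and adding a fourth slot would be an equally reasonable design choice.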

197.218.91.214

Another option is simply to use the page image of the matching article. For the "poisson" query above, the first article in the search results is an exact match, and its page image would illustrate the fish perfectly. Currently, the search engine seems to just search the File namespace or Commons for the article title, which yields worse results than the first image in the article itself:

https://fr.wikipedia.org/wiki/Poisson

https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Georgia_Aquarium_-_Giant_Grouper_edit.jpg/330px-Georgia_Aquarium_-_Giant_Grouper_edit.jpg

Even articles without images could get an illustration from Wikidata, if the task above is resolved.
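A rough sketch of the page-image-first approach, with Wikidata as the fallback. It assumes the wiki runs the PageImages extension (which provides `prop=pageimages` in the Action API); the helper names are hypothetical, and the sample response is abbreviated by hand with a placeholder page id rather than copied from frwiki.

```python
from __future__ import annotations

from urllib.parse import urlencode


def pageimage_request_url(wiki_host: str, title: str, thumb_width: int = 330) -> str:
    """Build a PageImages query asking for one article's page image name and thumbnail."""
    params = {
        "action": "query",
        "prop": "pageimages",
        "titles": title,
        "piprop": "thumbnail|name",
        "pithumbsize": thumb_width,
        "format": "json",
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


def pick_illustration(query_response: dict, wikidata_image: str | None) -> str | None:
    """Return the article's page image if it has one, else the Wikidata image (or None)."""
    pages = query_response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages without a suitable image simply lack the "pageimage" key.
        if "pageimage" in page:
            return page["pageimage"]
    return wikidata_image


sample = {
    "query": {
        "pages": {
            "123": {  # placeholder page id
                "title": "Poisson",
                "pageimage": "Georgia_Aquarium_-_Giant_Grouper_edit.jpg",
            }
        }
    }
}

print(pageimage_request_url("fr.wikipedia.org", "Poisson"))
print(pick_illustration(sample, "Abramis brama drawing.jpg"))
```

The article's own lead image wins when present, so Wikidata is only consulted for articles that have no images.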

197.218.91.148

This is pretty bad even on English Wikipedia, for a query as common as "leg":

https://en.wikipedia.org/w/index.php?search=leg&title=Special:Search&uselang=en&fulltext=1&cirrusUserTesting=recall_sidebar_results

Images:

https://en.wikipedia.org/wiki/File:Wooden%20Leg%201913.jpg

https://en.wikipedia.org/wiki/File:Las%20Limas%20left%20leg.svg

https://en.wikipedia.org/wiki/File:Las%20Limas%20right%20leg.svg

The article itself also contains a rather strange image (without the caption, it was hard to tell what it shows):

https://en.wikipedia.org/wiki/Leg

https://upload.wikimedia.org/wikipedia/commons/thumb/3/30/InsectLeg.png/220px-InsectLeg.png

But Wikidata has more easily recognizable images of legs:

https://www.wikidata.org/wiki/Q133105

Images:

https://commons.wikimedia.org/wiki/File:Legs_of_woman.jpg

https://commons.wikimedia.org/wiki/File:Beine.JPG
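When an item has more than one P18 statement, as Q133105 does, something still has to choose a single image. One plausible rule, sketched here as an assumption rather than anything proposed in this thread, is to follow Wikibase statement ranks: preferred beats normal, deprecated is never used, and ties keep the item's own order. The function and helper names are hypothetical, and the ranks in the sample are made up for illustration.

```python
from __future__ import annotations

# Lower number = better. Deprecated statements are deliberately absent.
RANK_ORDER = {"preferred": 0, "normal": 1}


def best_p18(claims: list[dict]) -> str | None:
    """Pick one file name from a list of P18 claims using statement ranks."""
    usable = []
    for position, claim in enumerate(claims):
        rank = claim.get("rank", "normal")
        snak = claim.get("mainsnak", {})
        if rank not in RANK_ORDER or snak.get("snaktype") != "value":
            continue
        usable.append((RANK_ORDER[rank], position, snak["datavalue"]["value"]))
    if not usable:
        return None
    # Tuples compare rank first, then original position, so ties keep item order.
    return min(usable)[2]


def claim(filename: str, rank: str) -> dict:
    """Hypothetical helper that builds a minimal P18 claim for the examples."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "datavalue": {"value": filename, "type": "string"},
        },
        "rank": rank,
    }


# Ranks are invented for illustration; the live item may differ.
claims = [claim("Legs of woman.jpg", "normal"), claim("Beine.JPG", "preferred")]
print(best_p18(claims))  # Beine.JPG
```

Relying on ranks means editors on Wikidata, not the search code, decide which image represents the concept.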

DTankersley (WMF)

I agree that there could be better images for 'leg', but without more context it's hard to get really good results.

For instance, 'human leg' works pretty well:

https://en.wikipedia.org/w/index.php?search=~human+leg&title=Special:Search&cirrusUserTesting=recall_sidebar_results&searchToken=5hsh03wb9249tpoued9vs818s

Or 'dog leg', which shows a dogleg (zig-zag) and two images of a dog: https://en.wikipedia.org/w/index.php?search=~dog+leg&title=Special:Search&cirrusUserTesting=recall_sidebar_results&searchToken=d80f0qxtteiygpfabs0c0dmu9

197.218.90.68

Well, that's only partly true. Although "leg" is a pretty simple term, it would be reasonable to expect it to at least show a "table leg". There are even more cases where it fails for common terms:

Keep in mind that many non-native English speakers use these resources. A native speaker may find it easy to narrow a search, but that isn't always possible for someone with limited English, especially when no Wikipedia or other Wikimedia project has content in their native language. Relevant images would make it far easier to judge whether results are relevant before clicking any of them. Consider this forum discussion:

http://ell.stackexchange.com/questions/87976/is-there-an-english-word-for-the-fruit-we-call-paterna-in-el-salvador

That person would likely recognise the image faster than the text of the article; in fact, the article description might just confuse them. A picture is worth a thousand words, after all.

Anyway, hopefully structured data on Commons will arrive eventually.

DTankersley (WMF)

Thanks for the real-life examples - they're always helpful. :)

I chatted with the Search team about this topic this morning to make sure I wasn't missing anything. Creating a new method to search for content on Commons right now would be something of an exercise in futility: once the Structured Data team ramps up and establishes its new metadata format, the Search team will incorporate it into the widely used CirrusSearch API, and any extra work we do now would be thrown away.

The goal behind the sister project search results is to give our readers and editors more information about their search query, and to enable discovery of other projects they may not have known about.

I'm confident that adding the new sister project search results will aid that discovery for millions of users, even though a better way of using our search APIs is coming in the near future.

As for the 'paterna' example: if the term 'Inga feuilleei' were used instead, the search would indeed show lovely images of the fruit the user was hoping to find. Maybe some very kind contributors could tag those images with the term 'paterna' to make them easier for everyone to find?

https://en.wikipedia.org/w/index.php?search=~Inga+feuilleei&title=Special:Search&cirrusUserTesting=recall_sidebar_results&searchToken=20m0hy0wegstk363rj4jmb7no
