Talk:Cross-wiki Search Result Improvements
Do we want these new search results to work across all Wikimedia projects?
I wonder if specific projects have a given relevance for other projects; for example, Wiktionary might have a higher relevance for Wikipedia and a lower one for Wikispecies. It will probably also change given the categorization of pages within the projects. Wikispecies has a high relevance for articles in Wikipedia within biography, but would have a low relevance for art.
If you do a search in a project, then the categories could be used as an indicator for how relevant (likely) some other project would be, given this specific result set. If a project is highly relevant, then the number of hits could be increased from 1 to 3 (just an example, use whatever number).
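The idea of scaling the number of sister-project hits by category could be sketched roughly as follows. This is only an illustration: the relevance table, the category names, the threshold and the function name are all invented for the example and are not taken from any existing search code.

```python
# Hypothetical relevance of a sister project for a given category (0.0 to 1.0).
# The values are made up for illustration, not measured.
RELEVANCE = {
    "botany": {"wikispecies": 0.9, "wiktionary": 0.5},
    "painting": {"wikispecies": 0.1, "wiktionary": 0.5},
}


def hits_to_show(categories, project, base=1, boosted=3, threshold=0.7):
    """Return how many hits to show for `project`, given the categories
    found in the local result set."""
    scores = [RELEVANCE.get(c, {}).get(project, 0.0) for c in categories]
    if not scores:
        return base  # no category information: keep the default
    average = sum(scores) / len(scores)
    return boosted if average >= threshold else base
```

With these assumed numbers, a result set dominated by "botany" pages would show three Wikispecies hits, while a "painting" result set would keep the default of one.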
It really depends on the nature of the question. If someone is looking for the meaning of the Latin word ''vicesimanus'', Wiktionary information will be of most use, and it may not matter which language Wiktionary the results come from, as the word may only appear in a few projects, and might be illustrated with a picture, with a list of translations into other languages, or at least with an explanation in another language besides Latin. Likewise if someone is looking up the pronunciation of a word, or its syllabification for the purposes of hyphenating it, or synonyms. All of these features of a word may be presented on any Wiktionary, and may be found independently of the project language.
I don’t think the average user searching English Wiktionary would be happy with a definition of a Latin term that was in Finnish, Russian, or Chinese—generally in any non-Indo-European language or any language that doesn't use the Latin alphabet. The lack of readable cognates makes those pages useless. Look at the Russian page for gato (Spanish "cat"). If you don't at least know some Cyrillic, you can't get much out of that page. Finnish gato is actually better than I expected, but only because there are some cognates (Espanja, Portugali, and substantiivi). You can translate those pages using your browser or online tools, but I think that's getting into the realm of “power users” unfortunately.
My intuition is that what most people want is results in the language of the project they are on, or projects in the same language. (Exception: when their query is clearly in another language. Exception to the exception: when they are on Wiktionary—which is where I often go for words I don’t know even when they are not in English.) Users could also use results in other languages they can read (which they need to specify or we need to surmise, say, based on browser settings). Only power users and researchers are going to dig into results for languages they don't know. This may change over time as machine translation gets better and people become more sophisticated about handling text in other languages—but I think most people aren't there yet.
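As a rough sketch of "surmising" readable languages from browser settings, the HTTP `Accept-Language` header could be parsed like this. The function names and the limit of three languages are assumptions made for the example; the exceptions mentioned above (queries clearly in another language, Wiktionary) are not handled.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into base language codes,
    ordered by descending q-value (ties keep header order)."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat malformed q-values as "not wanted"
        base = tag.strip().lower().split("-")[0]
        if base and base != "*" and quality > 0:
            weighted.append((-quality, position, base))
    ordered = []
    for _, _, base in sorted(weighted):
        if base not in ordered:
            ordered.append(base)
    return ordered


def candidate_languages(wiki_language, accept_language, limit=3):
    """The wiki's own language first, then languages the browser advertises."""
    languages = [wiki_language]
    for code in parse_accept_language(accept_language):
        if code not in languages:
            languages.append(code)
    return languages[:limit]
```

For a reader on an English wiki whose browser sends `fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7`, this would suggest searching English first, then French, then German.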
I’m open to other opinions on user preferences and the typicality of any given use cases, of course!
However, there may be some technical limitations. We can’t index English Wiktionary both with all the other English projects and with all the other Wiktionary projects. Searching across all Wiktionaries without a shared index is probably too resource intensive to be practical.
Re: "Only power users and researchers are going to dig into results for languages they don't know." I disagree. During the time I was seriously active on Wiktionary, requests for translations into languages the user did not know were very common. We had daily requests for assistance.
Interesting! Requests for translations into, say, Russian, seem very different from using a Wiktionary page in Russian (without machine translation).
- For example, if I'm on Wikiquote, do I want to also see relevant search results from Wikivoyage, Wikipedia or Wikinews?
- Or, if I'm on Wikipedia, just show me results from other projects?
This answers it better than anything (in short both):
To be clear, this "problem" should be expanded to most projects so that anyone can keep jumping from wiktionary or commons to others, and back again in a perfect loop. If nothing else it helps with cross-wiki vandalism detection.
Suggestion: Reduce the weight of the description page and add a reporting tool
One of the drawbacks of simply searching the file description and title is that it can be very unreliable and can easily be used to vandalize. This means that searching for "whore" surfaces obvious vandalism: the image of a famous woman being "shown in results" for whore, or for "molester" (https://en.wikipedia.org/wiki/File:Photo_of_molester.jpg, Carl_Sagan).
- Reduce the weight of the file description page - it can often be misleading and just plain bad especially on non-English wikis.
- Add a reporting tool to make it easier for people to report / remove vandalism - this will also help get more eyes on commons, and potentially somewhat reduce their workload.
- Prioritize media used locally in multimedia results - images used locally might be more relevant to the project and to the search.
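To make the first and third suggestions concrete, here is a toy scoring function. It is not how CirrusSearch ranks files; the weights, the boost factor and the simple substring match are all assumptions chosen only to show the effect of down-weighting the description and boosting locally used media.

```python
# Hypothetical field weights: a match in the description counts for much
# less than a match in the title.
WEIGHTS = {"title": 1.0, "description": 0.3}
LOCAL_USE_BOOST = 1.5  # assumed multiplier for files used on the local wiki


def score_file(query, title, description, used_locally):
    """Score one file for a single-term query using naive substring matching."""
    term = query.lower()
    score = 0.0
    if term in title.lower():
        score += WEIGHTS["title"]
    if term in description.lower():
        score += WEIGHTS["description"]
    if used_locally:
        score *= LOCAL_USE_BOOST
    return score
```

Under these assumed weights, a file that only matches through a (possibly vandalized) description scores well below one that matches by title and is actually used in a local article.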
We'll be launching the sister project snippets soon and, based on an RfC on enwiki, we will not display the multimedia results on the English Wikipedia search results pages. You can test the new look by using this url: https://en.wikipedia.org/wiki/Special:Search?search=~test&fulltext=1&cirrusUserTesting=recall_sidebar_results
Suggestion: Show page image (thumbnail) for wiktionary results
Search results don't provide enough visual clues to make it easy to find content.
While normal search results from Commons are unstructured and may produce a lot of false positives, Wiktionary's narrow focus makes it quite useful as a "visual" dictionary, and it often avoids controversial images.
When Wiktionary returns search results, show the page image used in the page as a thumbnail, either as:
- A small icon near the wiktionary results (the pageimage)
- A multimedia result at the top of the side box
This seems to give quite good results compared to Commons, e.g.:
|Term||Commons||Wiktionary (see image)|
|Honey (mel, honig)|
As demonstrated by the last row, it also returns useful images for the same term in various languages. Until a better system comes around this seems to be a reasonable alternative.
Thanks for the suggestion and samples! We'll take it into account when we start working on the thumbnail icons next to the search results. :)
Ignore image annotations on commons
From https://en.wikipedia.org/wiki/Special:Search?search=ugly&fulltext=1&cirrusUserTesting=recall_sidebar_results&searchToken=362e1qz8z5cc9qtw1y48rb7gj it appears that Cross-wiki search tries to read the (history of) annotations on commons. It finds strings like " ugly stitching error" here: https://en.wikipedia.org/w/index.php?title=Special:Search&profile=images&search=ugly&fulltext=1
Hi @Mduvekot, the testing URL that you're using in your sample is currently awaiting a new code update to not display the commons / multimedia results in the sister project snippets on English Wikipedia.
Suggestion: Use Wikidata to fetch the main image for multimedia results
Problem: As a non-English reader, search may often return irrelevant and inappropriate results due to using the image caption / description or inappropriate page-image.
Background: Consider if someone searches for monkey, and a man shows up due to the label or description of the image. In certain contexts this may be insulting. Aside from that, in non-English wikis, the multimedia results will often yield wrong images because the image descriptions are not translated.
When the search matches a wikidata item / alias use the image property as the primary image.
(Shows these: https://fr.wikipedia.org/wiki/Fichier:Poisson%20distribution%20lambda%20001.svg, https://fr.wikipedia.org/wiki/Fichier:Poisson%20distribution%20CMF.png, https://fr.wikipedia.org/wiki/Fichier:Poisson%20distribution%20PMF.png)
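For reference, reading the image property (P18) of a matched Wikidata item could look roughly like the sketch below. It uses the Wikidata `wbgetclaims` API module; the helper names are made up for this example, the item ID passed in is a placeholder, and the step of matching a search query to an item is assumed to happen elsewhere.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def image_claims_url(item_id):
    """Build a wbgetclaims request for the image property (P18) of an item."""
    params = {
        "action": "wbgetclaims",
        "entity": item_id,
        "property": "P18",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def first_image_filename(response):
    """Return the first P18 file name from a parsed wbgetclaims response,
    or None if the item has no image claim."""
    for claim in response.get("claims", {}).get("P18", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None
```

The returned value is a Commons file name, which would still need to be turned into a thumbnail URL before it could be shown next to the other multimedia results.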
Thanks for the suggestion, but using wikidata to search Commons for images matching a search query is a bit different than what we're trying to do - right now.
Sometimes the images that are returned could indeed be offensive but we don't censor things in regards to how an image is tagged. We just do the search and then display the top most relevant results.
On the other hand, this presents an excellent opportunity to edit images that are mis-tagged or mis-labeled, to keep them from showing in search results where they really don't belong. :)
Using your sample URL, I think if I was on frwiki and I was trying to do a search for a fish and I saw some math diagrams, I'd try a different search term. According to fr.wiktionary, the more common term for 'fish' in French is 'poiscaille' - https://fr.wikipedia.org/wiki/Special:Search?search=poiscaille&fulltext=1&cirrusUserTesting=recall_sidebar_results&searchToken=7bt1at9xmpdr5s4pch1ksd904 which does return one image of a fish.
Or, being redundant in your search query by using 'poisson' and 'fish' returns an image of a fish, a fossilized fish and a large rock structure in the shape of a fish: https://fr.wikipedia.org/wiki/Special:Search?search=poisson+fish&fulltext=1&cirrusUserTesting=recall_sidebar_results&searchToken=cioy7r5b1uycvbaoczumkffvz Still not optimal. Clicking on the multimedia link (on the search results page) does indeed show more images of fishes: https://fr.wikipedia.org/w/index.php?title=Sp%C3%A9cial:Recherche&profile=images&search=poisson+fish&fulltext=1&searchToken=chdtvl1pm555wt1d03fk7ojxb but it needs to have the dual search query terms. Again, not optimal at this time.
We have a new project that is starting soon that will help with this in the long term. The new project will be for structured data in Commons and we will be updating the API for searches like this.
Thanks for the response.
Perhaps it might be prudent to not let perfect be the enemy of good by implementing something like https://phabricator.wikimedia.org/T95026. This would be a short term solution until the structured metadata project comes along and eventually replaces it. Currently people seem to ignore inaccurate images because they aren't really visible. The search interface doesn't surface them except on mobile, and on wikipedia.org.
Once this becomes widely deployed, you're very likely to frequently receive a bunch of "bug reports" of inaccurate multimedia content showing up as every search will potentially show up some image, audio or video, whereas currently only text is shown. Labeling these images won't work because the search engine seems to emphasize the image filename, instead of its description.
The idea is not to only use wikidata, but simply choose 1 image from it in addition to the normal ones that are already displayed.
One alternative idea is simply to pull the pageimage of the matching page. For example, for the "poisson" search string above, the first article in the search results is an exact match, and its page image would be perfect to illustrate the fish. Currently it seems that the search engine relies on simply searching the file namespace or commons for the article title, and it yields worse results than the first image in the article itself:
Even articles with no images may yield some illustration picked up from wikidata, if the task above is solved.
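The page image alternative could be prototyped against the existing PageImages API (`prop=pageimages`). The sketch below only builds the request URL and reads a parsed response; the helper names and the thumbnail size are assumptions for the example, and no claim is made that the search backend would fetch images this way.

```python
from urllib.parse import urlencode


def pageimage_url(wiki_host, title, size=100):
    """Build a query for the page image thumbnail of one article."""
    params = {
        "action": "query",
        "prop": "pageimages",
        "titles": title,
        "piprop": "thumbnail",
        "pithumbsize": size,
        "format": "json",
    }
    return "https://" + wiki_host + "/w/api.php?" + urlencode(params)


def thumbnail_source(response):
    """Return the thumbnail URL from a parsed response, or None if the
    page has no page image."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        thumbnail = page.get("thumbnail")
        if thumbnail:
            return thumbnail["source"]
    return None
```

For the "poisson" case, the first (exact-match) article title would be passed as `title`, and a `None` result would be the cue to fall back to another source such as Wikidata.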
This is pretty bad even on English Wikipedia, searching for something as common as a "leg":
The article itself contains a pretty strange image too (without the caption it was hard to tell what it is):
But Wikidata has more easily recognizable images of legs:
I agree that there could be better images for 'leg' but unfortunately, without more context about 'leg' it's hard to get really good results back.
For instance - using 'human leg' works pretty well:
or using 'dog leg' which shows a dogleg (zig-zag) and two images of a dog: https://en.wikipedia.org/w/index.php?search=~dog+leg&title=Special:Search&cirrusUserTesting=recall_sidebar_results&searchToken=d80f0qxtteiygpfabs0c0dmu9
Well, that's partly true. Although leg is a pretty simple term, it wouldn't be that surprising if it at least managed to show a "table leg". There are even more cases where it fails for common terms:
- silver mineral - https://en.wikipedia.org/w/index.php?search=~silver+mineral&title=Special:Search&cirrusUserTesting=recall_sidebar_results&searchToken=cl9rccdv6ojlr3d34mijhmv9q
- human tears - https://en.wikipedia.org/w/index.php?search=~human+tears&title=Special:Search&cirrusUserTesting=recall_sidebar_results&searchToken=14t8ew06qrr90dzyz95mod3kq
The thing to remember is that many non-native English speakers use these resources. It may be quite easy for a native speaker to try and narrow their search but this isn't always possible for someone with limited knowledge especially in cases where a wikipedia / wikimedia project doesn't have resources in their native language. Relevant images would make it far easier to ensure that search results are relevant even before clicking any of them. Consider the discussion in this forum:
That individual would likely recognise the image faster than the text in the article. In fact the article description might just confuse them. A picture is worth a thousand words after all.
Anyway, hopefully structured commons will come eventually.
Thanks for the real-life examples - they're always helpful. :)
I chatted with the Search team about this topic this morning - to be sure there wasn't anything that I was missing. Creating a new method, right now, to search for content on Commons will be a bit of an exercise in futility. Once the Structured Data team ramps up and gets their new format of metadata established, the Search team will incorporate it into the widely used CirrusSearch API and any extra work we do now will be trashed.
The goal behind the sister project search results is to give our readers and editors more information about their search query - to enable discovery into the other projects that maybe they didn't know about.
I'm confident that adding in the new sister project search results will aid in that discovery for millions of users - even though a better method of utilizing our search APIs will be coming in the near future.
For the example about 'paterna' -- if the 'inga feuilleei' term was used instead, it would indeed show lovely images of the fruit the user was hoping to find. Maybe those images could be tagged with the term 'paterna' by some very kind contributors, to make it easier for all to find?
Should we limit the number of languages we search in?
- i.e.: only use the top 50 languages to implement this in?
- Or, only use the languages we detect when a query is in a language other than that of the wiki the user is on?
''within the same language'' is stated clearly in Cross-wiki Search Result Improvements#A New Goal. So start simple.
If no results are found in the current language, a next step could be to search other languages, starting with those the user seems to understand (for instance: other language wikis they have contributed to). Top languages may yield more results, but also require more capacity (larger) and not everyone is capable of reading the top 5-50 languages.
Another option is to stick to related languages, for instance Romance languages if the request comes from a Spanish or Italian wiki, or Germanic languages if the request comes from a Danish or Norwegian site.
If the number of languages is not limited, then smaller languages (and projects) will be swamped by hits from larger ones, and that would be very unproductive.
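One way to keep larger languages from swamping smaller ones, sketched here purely as an illustration, is to cap the hits per language and interleave them round-robin. The function name and the per-language cap are assumptions, and the input is assumed to be an already-ranked list of hits per language.

```python
from itertools import zip_longest


def interleave_by_language(results_by_language, per_language=2):
    """Cap each language's hits, then merge them round-robin so that
    no single language dominates the top of the list."""
    capped = [hits[:per_language] for hits in results_by_language.values()]
    merged = []
    for row in zip_longest(*capped):
        merged.extend(hit for hit in row if hit is not None)
    return merged
```

With a large language returning many hits and a small one returning a single hit, the small language's hit still lands in second position instead of being pushed off the page.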
The actual languages should be the ones the user knows, i.e. the Babel list of languages. I don't think it should be limited to the language of the source wiki.
My comment above applies here as well: It really depends on the nature of the question. If someone is looking for the meaning of the Latin word ''vicesimanus'', Wiktionary information will be of most use, and it may not matter which language Wiktionary the results come from, as the word may only appear in a few projects, and might be illustrated with a picture, with a list of translations into other languages, or at least with an explanation in another language besides Latin. Likewise if someone is looking up the pronunciation of a word, or its syllabification for the purposes of hyphenating it, or synonyms. All of these features of a word may be presented on any Wiktionary, and may be found independently of the project language.
For myself (and perhaps most logged-in users), I think it would be ideal to allow quick searching from a specified list of wikis. (either by default, or as a new supplementary button in the Special:Search page).
For myself, as a somewhat meta-oriented and monolingual editor, I often want to be able to search (all of, or specific namespaces in):
- English Wikipedia
for example when I'm looking for documentation, or an essay, or an old presentation, or an old discussion.
Speculation: For multilingual logged-in users, I wonder if we could somehow re-use the info that Compact Language Links stores about languages we've purposefully visited? Or use a new shared storage location with a user-configurable override? (i.e. so that I can manually add a list of n languages.)
Yes, we've discussed this in the past and we have a few tickets in Phabricator to investigate further, at least on the wikipedia.org portal page. One ticket would add in a button or a link to switch the query string to search in different languages. Another ticket would display regional language links, but that probably wouldn't fit your need here. Just as an FYI, on the mobile apps, there is a way to preset languages to use for searching and it's fairly easy to switch between multiple languages.
However, you're requesting searching by project or namespace all at once, based on a pre-determined set of sites. I've added that idea into a new ticket to investigate how we could do this. I can envision a variety of ways we could do this, but we'd need to test and see which is more effective and intuitive to users (logged in or not).
Thanks for the suggestion!
Preview of audio files
I tried the example of searching for ogg files, and some results are shown in a "multimedia card". Since all results are audio files, they all look the same (a speaker icon); I have to hover to get some more details based on the file name, and make several clicks to actually listen to a file.
Maybe we should consider a more informative way to preview results when they are audio files. This may involve including part of the name of the file, showing a relevant related image in addition to the speaker, playing the audio on click/hover, or something else.
“The Plan” needs update
The Plan section looks outdated, showing items (that should be done by now) in the future tense. Can someone from the Engineering or Discovery teams update it?
Will do, thanks for the reminder! :)
How can we help users better understand the functionality of other wikis through verb-driven language?
Suggestion for language on boxes: Is there a way to indicate to users what they might find in each wiki through a verb-driven phrase that prompts them to take an action? This could be hover-text on added tags or banners, or specific verb-driven language. For example, instead of "free dictionary" we may want to say "Look up the meaning of a word", which drives the user to take a specific action on Wiktionary and also tells them what Wiktionary does, instead of what it is. We might be able to test this, which would indicate whether it helps users understand what these other resources are.
Yup - those all sound like great suggestions!
We've got several initial design mocks on the /Design page and these suggestions sound like a new mock candidate(s).
Thanks @DTankersley (WMF)! I'm not sure why additional letters were added to that question! Apologies for that!
I'm happy to write up verb-driven action sentences for each platform, although I don't know if we have standards floating around somewhere.
Hi! A few examples would be cool to put on a mock, but I don't know if there are standards written up anywhere. :)
Awesome. Here are some examples, but these are drafts (and I am very open to feedback and edits...)
Wiktionary - Look up the meaning of a word
Wikiquote - Discover quotes from thousands of people
Wikibooks - Read free textbooks about a variety of subjects
Wikipedia - Read a free encyclopedia that anyone can edit
Happy to contribute more if these are useful!
Will the display of the additional search results from other wikis encourage contributions from editors?
- i.e.: if you search for "Piazza del Duomo" and don't see a Wikivoyage article about it (while you're searching on Wikiquote), would that encourage you to start an article for it?
I believe so, but this is a research issue. (The problem is formulated as a question about convictions and feelings.)
If the intention is to replace the existing search function with the proposed new one, then the answer to the question will vary by project. If someone is searching from a Wikipedia, it may encourage work on the other projects. If someone is searching on Wiktionary, it may promote contributions to Wiktionaries in other languages. But a search from projects like Wikisource would pull contributors away from a project that is already struggling to gain new contributors, and would be irritating to users who are trying to search Wikisource itself.
Hi @EncycloPetey - thanks for the feedback.
We're initially thinking of only putting the additional relevant search results that are gathered from the sister projects on the Wikipedia project itself. We'd like to see how this new feature is used by the broader community before determining if the new search results would work on the other projects (i.e.: searching Wikisource might not need or want results from Wikipedia or Wikivoyage).