
Topic on Talk:Reading/Multimedia/Structured Data/Technical requirements

Search reqs for file captions

Abittaker (WMF)
Smalyshev (WMF)

When we search for a file, which fields will we be matching against? Currently I see:

  • Page title
  • Page text
  • File caption
  • (?) File description
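As a minimal sketch of the field list above, assuming a hypothetical in-memory `FileDoc` record (not the actual search index schema) and plain substring matching instead of real tokenized analysis, a query could be checked against each field to report which ones matched:

```python
from dataclasses import dataclass, field


@dataclass
class FileDoc:
    """Hypothetical record for one File: page; not the real index schema."""
    title: str
    text: str
    captions: dict[str, str] = field(default_factory=dict)      # language code -> caption
    descriptions: dict[str, str] = field(default_factory=dict)  # language code -> description


def matched_fields(doc: FileDoc, term: str) -> list[str]:
    """Return the names of the fields whose text contains the term (case-insensitive)."""
    needle = term.casefold()
    hits = []
    if needle in doc.title.casefold():
        hits.append("title")
    if needle in doc.text.casefold():
        hits.append("text")
    if any(needle in c.casefold() for c in doc.captions.values()):
        hits.append("caption")
    if any(needle in d.casefold() for d in doc.descriptions.values()):
        hits.append("description")
    return hits


doc = FileDoc(
    title="File:Sunset over Lake Geneva.jpg",
    text="A photo taken in summer.",
    captions={"en": "Sunset over the lake", "fr": "Coucher de soleil sur le lac"},
)
print(matched_fields(doc, "sunset"))  # ['title', 'caption']
```

Knowing which fields matched (not just whether the page matched) is what the ranking and display questions below depend on.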

Unclear: if we search in a specific language, should captions/descriptions in other languages match too? I.e. if we're searching for "gift", should we get entries with the German word "Gift" (which means "poison"), or should we exclude them?
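The trade-off in the "gift" question can be illustrated with a hypothetical `match_captions` helper that either searches captions in every language or only in the searcher's language. The captions and language codes are made up for illustration, and the match is a simple case-insensitive substring test:

```python
def match_captions(captions: dict[str, str], term: str,
                   user_lang: str, all_languages: bool) -> list[str]:
    """Return the language codes of captions containing `term` (case-insensitive).

    If all_languages is False, only the caption in user_lang is considered.
    """
    needle = term.casefold()
    langs = captions.keys() if all_languages else [user_lang]
    return [lang for lang in langs
            if lang in captions and needle in captions[lang].casefold()]


captions = {"en": "A wrapped birthday present", "de": "Gift in einer Flasche"}
print(match_captions(captions, "gift", "en", all_languages=True))   # ['de']
print(match_captions(captions, "gift", "en", all_languages=False))  # []
```

Searching all languages surfaces the German "poison" file for an English "gift" query; restricting to the user's language avoids that but also hides legitimate matches from files captioned only in other languages.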

What are the criteria for ranking the results, i.e. what is the hierarchy of matches against title, text, caption and description? Are there any boosts, such as incoming link count, file type, description length, or anything else? Note that popular terms can produce lots of matches, so we need a meaningful order for the results to look good.
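One way to express a "hierarchy of matches" is a weighted sum over the fields that matched. The ordering and the weights below are placeholders for illustration, not values anyone in this thread has agreed on:

```python
# Hypothetical field weights expressing title > caption > description > text.
FIELD_WEIGHTS = {"title": 3.0, "caption": 2.0, "description": 1.5, "text": 1.0}


def field_score(matched: list[str]) -> float:
    """Sum the weights of the fields that matched; unknown fields count as 0."""
    return sum(FIELD_WEIGHTS.get(name, 0.0) for name in matched)


# Made-up results: page name -> list of fields the query matched in.
results = {
    "File:A.jpg": ["text"],
    "File:B.jpg": ["caption"],
    "File:C.jpg": ["title"],
}
ranked = sorted(results, key=lambda page: field_score(results[page]), reverse=True)
print(ranked)  # ['File:C.jpg', 'File:B.jpg', 'File:A.jpg']
```

A sum rewards matching in several fields at once; taking the maximum instead would rank purely by the single best field.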

I am not sure what the relationship is between WikibaseMediaInfo and multilingual captions. As far as I understand, WMI creates a separate entity type and then associates a page in the File: namespace with a separate Wikibase page in the MediaInfo: namespace. Apart from the association created by this extension, these two are completely different MediaWiki pages. Are ML captions part of this setup (and if so, which part), or are they separate? How does this relate to MCR, in which one MediaWiki page contains more than one piece of content?

What do we want to display in search results when a page has been matched? We usually display the page title and a snippet of the matched page text, but here we have two additional fields that could contain the match. How would that be reflected in the display? If we match in different languages, how would that be reflected? The Wikidata patch now in review displays interlingual links/matches with the language name in superscript (e.g. "Gift" followed by a superscript "German"); maybe that would work here too?
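A minimal sketch of the superscript idea, assuming a hypothetical `render_caption_match` helper and a tiny hard-coded language-name table (a real implementation would presumably use the wiki's own localized language names):

```python
import html

# Small hypothetical lookup table, for illustration only.
LANGUAGE_NAMES = {"en": "English", "de": "German", "fr": "French"}


def render_caption_match(caption: str, caption_lang: str, user_lang: str) -> str:
    """Return an HTML snippet for a matched caption.

    Captions in the user's own language are shown as-is; captions in another
    language get the language name in superscript, e.g. Gift<sup>German</sup>.
    """
    text = html.escape(caption)
    if caption_lang == user_lang:
        return text
    name = LANGUAGE_NAMES.get(caption_lang, caption_lang)  # fall back to the code
    return f"{text}<sup>{html.escape(name)}</sup>"


print(render_caption_match("Gift", "de", "en"))  # Gift<sup>German</sup>
print(render_caption_match("Gift", "en", "en"))  # Gift
```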

CParle (WMF)

Hi Stas

> if we search in specific language, should captions/descriptions in other languages match too

The short answer is 'yes'. We haven't considered what to do about cases like 'gift', and tbh initially we want to do this as simply as possible and see how it behaves, so we probably don't need to consider it just yet. We haven't considered ranking at all; just off the top of my head, I'd say we'd like to weight words in the user's language more heavily. Does the work you've done for T178851 cover this a little?
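Weighting the user's language more heavily could look something like the sketch below. The function name and the 1.0/0.5 weights are hypothetical placeholders:

```python
def caption_language_score(matched_langs: list[str], user_lang: str,
                           own_weight: float = 1.0, other_weight: float = 0.5) -> float:
    """Score caption matches, counting the user's language more than others.

    Only the best single match counts (max, not sum), so a file captioned in
    many languages does not outrank one captioned only in the user's language.
    """
    if not matched_langs:
        return 0.0
    return max(own_weight if lang == user_lang else other_weight
               for lang in matched_langs)


print(caption_language_score(["en"], "en"))        # 1.0
print(caption_language_score(["de", "fr"], "en"))  # 0.5
print(caption_language_score([], "en"))            # 0.0
```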

You're right about MediaInfo and File pages. ML captions are just MediaInfo labels. I don't know yet how this relates to MCR, because I don't know enough about MCR; hopefully I'll be remedying this soon.
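Since captions are MediaInfo labels, the per-language text to index could be pulled straight from the entity's labels. This sketch assumes the standard Wikibase JSON shape for labels (`{"labels": {"en": {"language": "en", "value": ...}}}`); the entity and its ID `M12345` are made-up examples:

```python
import json

# Made-up MediaInfo entity in Wikibase JSON form.
entity_json = """
{
  "type": "mediainfo",
  "id": "M12345",
  "labels": {
    "en": {"language": "en", "value": "Sunset over the lake"},
    "de": {"language": "de", "value": "Sonnenuntergang am See"}
  }
}
"""


def captions_from_labels(entity: dict) -> dict[str, str]:
    """Map language code -> caption text from a MediaInfo entity's labels."""
    return {lang: label["value"] for lang, label in entity.get("labels", {}).items()}


captions = captions_from_labels(json.loads(entity_json))
print(captions)  # {'en': 'Sunset over the lake', 'de': 'Sonnenuntergang am See'}
```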

Initially we want to display search results exactly as they're displayed now.

Smalyshev (WMF)

OK, but if we search all labels and keep the display as is, there will be many (or some, depending on the query) search results where the matched term does not appear anywhere in what is displayed. It might look a bit weird.

T178851 helps with ranking, but only a little: Wikidata criteria like sitelink count and label count would not work very well on Commons. We might use something like incoming link count; that's all that comes to mind right now.
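If incoming link count were used as a boost, one common choice is to damp it logarithmically so heavily linked files do not swamp text relevance. The multiplier form and the `weight` of 0.1 below are assumptions for illustration, similar in spirit to a `log1p` modifier in an Elasticsearch `function_score` query:

```python
import math


def boosted_score(text_score: float, incoming_links: int, weight: float = 0.1) -> float:
    """Multiply a text relevance score by a damped incoming-link boost.

    log1p keeps the boost at exactly 1.0 for files with no incoming links and
    grows slowly, so a file with 10,000 links is not 10,000 times more relevant.
    """
    return text_score * (1.0 + weight * math.log1p(incoming_links))


print(boosted_score(2.0, 0))                     # 2.0
print(round(boosted_score(2.0, 10_000), 3))      # 3.842
```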

RIsler (WMF)

I like how technically detailed this conversation is :)

Just to add a couple of product-level answers:

Ranking:

As Cormac said, we don't know enough yet to decide on ranking rules. We still have to delve into a lot more of the technical stuff before we can answer that. We'd love to hear any ideas on this, and I think that once we're at the prototyping phase we can start testing out what would work best.

Search results layout:

Yes, it might look a little weird if we don't display the text that matched the search query. However, it's important to keep in mind that we're only planning to keep the current search results layout *temporarily*. By the end of the year, it's going to be different (maybe dramatically different). So it may not make much sense to put effort into tweaking the current search results system, since it's only going to be around for a few months after the initial Caption implementation.

We plan to have at least some high-level wireframes for the new-and-improved search results within the next 4 weeks and we can all have a better idea of direction then.

Smalyshev (WMF)

@RIsler (WMF) thanks for the explanations! That makes sense. I recognize that I am highlighting some things that may not be implemented right now, but it is still important to keep them in mind as TODOs, both so we cover them later and so we design the code in a way that makes them easy to cover when the time comes.

RIsler (WMF)

Yes! Absolutely agree, and thank you for pointing these things out. Feel free to keep on doing that :)
