User:TJones (WMF)/Notes/Analysis of Abandoned Queries—English, Spanish, French
March/April 2025 — See TJones_(WMF)/Notes for other projects. See also T375554.
Summary
There's no obvious single smoking gun to explain why searchers abandon their fulltext queries, which is not really a surprise. English, Spanish, and French have very broadly similar patterns of queries and detectable searcher behavior. German, Italian, Portuguese, and Russian do not have obviously wildly different patterns of queries, though Russian does have more queries that are not just straightforward "human queries in the home language of the wiki".
There are a lot of well-formed queries by people who seem to understand how to search Wikipedia. Some of those queries don't find anything (because the thing they seek does not (yet) exist on the wiki). Other times, the obvious result is given as a suggestion and as one of the top fulltext results, but people don't click on them.
Some people don't know how to search particularly well, and search with questions, homework problem statements, etc. Sometimes these sub-optimal queries still get pretty good search results.
Some people seem to be looking at a series of related things (European explorers, car models, insect species, whatever) and they just stop. Maybe they got tired or bored and went to go do something else, and their trip down the wiki rabbit hole just happened to end on the Special:Search page.
"One-shot" queries—those with one search in a session of a fully formed query, with no autocomplete—may be bots/tools/scrapers that are using the results for something other than navigation.
Our notion of "abandoning" a fulltext query (i.e., a query with no clicks) may need some refinement, since our autocomplete is so good and handles so much of our search traffic. Some users clearly navigate away from "abandoned" fulltext queries by using autocomplete to take them to the next page (or several pages) of interest.
- We might want to reconsider whether we should count a search session as abandoned if the searcher clicks on an autocomplete suggestion after a fulltext search.
- If we look at abandoned queries again, we might want to try to get a more complete version of the searcher/reader's path through our pages.
Trying to automate the detection of some of these situations and/or searcher intents seems like it would be fairly difficult!
General improvements to search quality, query handling, second-try searching, and presentation may help people find information more easily, but we are always going to have people just ending their wiki spree on the search page.
Analysis of Abandoned Queries—English, Spanish, French
Background
When a searcher gets a page of fulltext search results, doesn't click on any of them, and then leaves their search session, that is an abandoned search. The simplest interpretation of search abandonment is that searchers didn't find what they are looking for and the search "failed", but "good abandonment" is also a possibility. For example, if you want to know Mister Rogers' middle name, you can search for Mister Rogers and get the answer you wanted from the snippet on the fulltext result page without clicking on anything.
There are different kinds of "failed" search abandonment, too. Sometimes information a searcher wants just isn't in Wikipedia. If you search for the name of a moderately popular YouTuber and no page with that title comes up, you can be pretty sure there is no page on that person. There may be some information about them in other articles, but not the full article you were hoping for. Other queries on Wikipedia can be classified as "not encyclopedic", like Alton Brown's aged eggnog recipe, which is not something you'd expect to be on Wikipedia.
The worst failures are when the information is there, but the searcher can't find it. This could be because their search strategy doesn't work very well (e.g., asking how many people climbed the Eiffel Tower before the elevators were operational?) or because the answer is presented to them in a way they don't recognize (e.g., searching for corn, getting the article for "Maize", not realizing that it is the main article about corn, and so not clicking on it).
Other potential reasons for "abandoning" queries include just being done with your current journey down the Wiki Rabbit Hole, leaving your computer for too long (we arbitrarily define a "search session" as ending after 10 minutes without a new search), etc., etc., etc.
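The definitions above (a session ends after 10 minutes without a new search; a fulltext search with no click afterward is abandoned) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Event` record and its `kind` labels ("autocomplete", "fulltext", "click") are hypothetical and are not the schema of the actual search logs, and "click" here means a click on a fulltext result.

```python
from dataclasses import dataclass

SESSION_GAP_SECONDS = 10 * 60  # the 10-minute cutoff described above


@dataclass
class Event:
    timestamp: float  # seconds; hypothetical field
    kind: str         # hypothetical labels: "autocomplete", "fulltext", "click"


def split_sessions(events, gap=SESSION_GAP_SECONDS):
    """Split one searcher's events into sessions.

    A new session starts when more than `gap` seconds pass between
    consecutive searches; clicks never start a session on their own.
    """
    sessions = []
    last_search_time = None
    for event in sorted(events, key=lambda e: e.timestamp):
        is_search = event.kind in ("autocomplete", "fulltext")
        if is_search and (last_search_time is None
                          or event.timestamp - last_search_time > gap):
            sessions.append([])
        if not sessions:  # a stray click before any search
            sessions.append([])
        sessions[-1].append(event)
        if is_search:
            last_search_time = event.timestamp
    return sessions


def is_abandoned(session):
    """True if the last fulltext search in the session has no click after it."""
    last_fulltext = None
    for i, event in enumerate(session):
        if event.kind == "fulltext":
            last_fulltext = i
    if last_fulltext is None:
        return False
    return not any(e.kind == "click" for e in session[last_fulltext + 1:])


# Made-up example: one satisfied session, then an abandoned one 15 minutes later.
events = [Event(0, "autocomplete"), Event(5, "fulltext"),
          Event(20, "click"), Event(900, "fulltext")]
sessions = split_sessions(events)
flags = [is_abandoned(s) for s in sessions]
```

Note that this simple rule counts any later autocomplete activity as part of an abandoned session, which is exactly the case the summary suggests reconsidering.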
Our Task
In an attempt to understand why searchers abandon queries and what we might be able to do to improve their experience, I set out to review 50 search sessions that ended in query abandonment from each of English, Spanish, and French Wikipedias. Thanks to Erik for pulling samples of search sessions for a bunch of languages/wikis! The three specific languages for the first analysis were chosen because I can read them at least reasonably well.
Note: Fifty search sessions is not a lot, but we are looking for big patterns, like we have seen with zero-results queries. All our measurements are around ±10% (maybe less if they are all similar across languages). We don't need to know specifically whether 58.4% or 73.1% of abandoned queries seem to be looking for titles.. the fact that it is roughly ⅔ is very useful to know. Similarly, it doesn't matter whether it's precisely 0.9% or 2% or 6.1% of abandoned queries that are complete gibberish.. it's not "most" or "nearly half" or even "a good chunk", so they are not worth looking into.
User intent and search "success" are notoriously hard to pin down, and I think even more so for Wikipedia than for, say, e-commerce sites, which can assume most searchers are looking for something to buy, and can define success (for the site, if not the searcher) in terms of purchases made. On Wikipedia, we can't necessarily tell that a searcher wanted to learn Mister Rogers' middle name (and succeeded without a click), or wanted to know how many people climbed the Eiffel Tower (and clicked the right article but didn't read far enough to actually get the answer).
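As a rough sanity check on the "around ±10%" figure, the usual normal-approximation margin of error for a proportion can be computed for a sample of 50. The note does not state a confidence level, so the `z` values below (1.0 for one standard error, 1.96 for roughly 95%) are assumptions, and the approximation itself is crude for proportions near 0% or 100%.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)


# Proportions in the range discussed above: rare, about half, about two thirds.
for p in (0.02, 0.5, 2 / 3):
    one_sigma = margin_of_error(p, 50, z=1.0)
    ci_95 = margin_of_error(p, 50)
    print(f"p={p:.2f}: one standard error ±{one_sigma:.1%}, ~95% ±{ci_95:.1%}")
```

For proportions near one half or two thirds, one standard error is about 7 points and a 95% interval is about 13 to 14 points either way, so "around ±10%" is a fair shorthand for the size of pattern this sample can and cannot distinguish.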
The description of the analysis task in T375554 is much too ambitious. It isn't always possible to discern the searcher's intent, let alone their reason for abandoning their query. Some reasons—like a sudden loss of Wi-Fi, or being called to dinner—are inherently impossible to determine from search logs.
I struggled at first to even find any patterns in the data that I could identify and label. Retracing the search sessions helped me understand what searchers were probably thinking about and searching for—especially visiting the pages the searchers were on when they searched for something new, and seeing snapshots of their queries (and how many suggestions they were possibly seeing) as they typed them in the search box.
I settled on labelling fairly objective attributes of searches, suggestions, results, and searcher behavior, and often digging deeper into their topics than the searchers had (at least while on Wikipedia... like me, they may have turned to internet search engines for more information or context).
Retracing search sessions also gave me a fresh perspective on our search results. Even when casually reading Wikipedia, I can't help but be aware of good search strategies for finding information, and I think I tend to search for topics I either have some familiarity with (making results easier to navigate) or for which I expect a specific page on the topic as a result. Looking at results from less expert searchers and on unfamiliar topics resulted in some observations and possible future task ideas that may not necessarily be about abandoned searches in particular.
Observations and Patterns
Some observations and recurring patterns I've found in the search sessions. So far, these are only for English. I'll update with observations and examples from Spanish (marked ESWIKI) and French (marked FRWIKI) as I work on them. (Note that ESWIKI and FRWIKI observations and ideas aren't necessarily specific to those data sets; I've just marked them so people following the page can find new additions without having to scour the page diffs.)
- People seem to see something on a page (like the name of a famous person's relative), search for that thing, and find that there is no specific page for that thing, and the page they were on was probably the best one.
- Alternatively, in some cases it may be that they don't notice the thing is on the page they are on (because it is not the main topic of the page, as with a disambiguation page) and they search for the thing rather than look for the info or link on the current page.
- Sometimes a reasonable result will not be obvious because the snippet doesn't actually include the relevant bit of the article.
- I found one example where the results are mostly "sidebar spam", where a large portion of matches to a fairly rare term (e.g., 74 out of the first 100) matched fixed text from a template of some sort. (Hence, "sidebar spam" and not "template spam".. some templates generate focused, custom text based on their params.) This decreases effective precision since the spammy results are not very good.
- Once I realized I had missed a couple of good DYM suggestions, I started paying more attention. Suggestions for "title searches", which look like Wikipedia titles, are often good or okay. Suggestions for "non-title searches", which are often phrases or short descriptions, are often pretty bad. Unfortunately, it is hard to say why the suggestions aren't great or how to make them better... but "usb pope" just doesn't feel like it should have been a thing.
- They aren't super easy to come up with, but there are examples where not being able to autocomplete against "Lastname, Firstname" is especially frustrating. Hoeing is an uncommon surname, and there is one exact match (a redirect to "Hoe (tool)"), but other than that, all the autocomplete suggestions are typo matches ("Homing", "Hoengseong", "Hodding", "Hovingham", "Honing", "Horinger"..) despite there being a partial title match for "Bryan Hoeing", and partial redirect match for "Bernd Hoeing" and "Ketil Hoeing" (all of which do show up in the fulltext results, so it's far from a disaster).
- This is in contrast with searches like Smith, which doesn't match "Smith, John", but at least that is because there are a gajillion good matches to titles that actually start with "Smith" or "Smith..." ("Smithsonian", "Smithfield", "Smithtown") rather than a bunch of typo-looking suggestions.
- Autocomplete nukes links to sections within pages from redirects. For example, the redirect "Default (film)" links to "Default#Other_uses", but following it from autocomplete takes you to the top of the "Default" page.
- Sometimes people ignore good autocomplete suggestions, which is sad when they then get poor fulltext results.
- FRWIKI: While looking at the French Wikipedia data, I properly recognized a pattern where people will continue using autocomplete after "abandoning" their last fulltext query. Sometimes they navigate to other pages and continue using autocomplete on those other pages. I'm not sure how to interpret additional autocomplete searches with no clicks; maybe a last gasp before really giving up. However, when they visit other pages—sometimes several more pages—these technically count as "abandoned" searches but they aren't necessarily "unsatisfied" in finding the information they are looking for.
- I think some of these abandoned fulltext searches ''could'' be interpreted as the result of relying too heavily on autocomplete, in that either the searcher isn't looking at the list of results, or the top result is the page they are after—and then they just hit "enter" or click "search". However, because of a typo, they go to the fulltext search results rather than the page they meant to. Then they just go back to autocomplete—either by going to the search box or going back a page in their browser—and then they type a little more carefully, either the same query or something else.
- I went back and looked more carefully at the English and Spanish samples for "post-abandonment autocomplete" and found it in 14% and 8% of sessions, though page visits are much rarer (0% and 8%) with no cases of multiple page visits after abandoning their fulltext query (which is why it didn't register earlier). French matches English with 14% "post-abandonment autocomplete" and 8% with page visits.
- FRWIKI: I'm seeing more evidence that ignoring certain prefixes in titles would be helpful. We've talked more about things like ''List of'' in English, but in French I'm seeing that just simple stop words like ''le'' ("the") would be helpful. I expect it would show up in any statistically derived list, but it might make sense to include some/all stop words. (I can imagine that ''le/la/les'' in French and ''el/la/los/las'' in Spanish are not all equally common in actual usage.)
- We'd definitely have to think about corner cases like "The The".. and "To be, or not to be". (And I just learned that "List of" is a page on enwiki!) Maybe not allowing arbitrary combinations of stop words as prefixes, just single stop words, plus statistically derived multi-word phrases, and maybe having some limit on percentage of chars or tokens that can be stripped... lots of tweaking and corner cases to be considered for ignoring title prefixes.
- FRWIKI: What I call "snippet hints" are something I only started recording when I got to the French Wikipedia data, though I had seen them a few times in English and Spanish. This happens when the fulltext results don't really include a reasonable target page (no "good" or "decent" results), but one or more of the top snippets includes information that is useful in getting to the target page, because it has a correct spelling, or a synonym, or other information that is helpful.
- On French Wikipedia, searching for ''platisme'' ("flat-eartherism") gets a somewhat related page ("Figure de la Terre") with a snippet that includes "Le platisme (mythe de la Terre plate)...". "Mythe de la Terre plate" is the target page, and ''platisme'' is a synonym that is apparently rarely used. (There is now a redirect from "Platisme" to "Mythe de la Terre plate".)
- Also on French Wikipedia, the typo ''Kouang-Tchou-Wan'' gets as a result a snippet that includes the correct spelling "Kouang-Tchéou-Wan". This is totally random, because ''Tchou'' just happens to appear in the article somewhere else, and the snippet just happened to focus on ''Kouang'' and ''Wan''.
- I sort of ignored this category at first because it isn't at all a reliable method of finding information or reformulating queries. It's happenstance (the synonym expansion of ''platisme'') or dumb luck (the ''Kouang-Tchou-Wan'' example).
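The title-prefix idea in the list above (single stop words, plus statistically derived multi-word phrases, plus a cap on how much of the title may be stripped) can be sketched as follows. Everything here is an assumption for illustration: the stop word list, the single phrase prefix, the 50% character cap, and the rule that the remainder must contain a non-stop-word are stand-ins, not a proposal that has been tested against real titles.

```python
STOP_WORDS = {"le", "la", "les", "el", "los", "las", "the"}  # illustrative only
PHRASE_PREFIXES = [("list", "of")]   # stand-in for statistically derived phrases
MAX_STRIPPED_FRACTION = 0.5          # hypothetical cap on stripped characters


def strip_title_prefix(title):
    """Return a lowercased matching key for the title with at most one prefix removed.

    The title is left intact (apart from lowercasing) when stripping would
    leave nothing, leave only stop words, or remove too much of it.
    """
    tokens = title.lower().split()
    candidates = [p for p in PHRASE_PREFIXES if tuple(tokens[:len(p)]) == p]
    if tokens and tokens[0] in STOP_WORDS:
        candidates.append((tokens[0],))
    total_chars = sum(len(t) for t in tokens)
    for prefix in candidates:
        rest = tokens[len(prefix):]
        if not rest or all(t in STOP_WORDS for t in rest):
            continue  # e.g. "List of" or "The The"
        stripped_chars = sum(len(t) for t in prefix)
        if stripped_chars / total_chars > MAX_STRIPPED_FRACTION:
            continue
        return " ".join(rest)
    return " ".join(tokens)


examples = {t: strip_title_prefix(t)
            for t in ["Le Monde", "The The", "List of", "List of explorers"]}
```

With these made-up settings, "Le Monde" and "List of explorers" lose their prefixes while "The The" and "List of" are left alone, which is the kind of corner-case behavior the tweaking would need to get right.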
Additional Miscellaneous Observations:
- Event ordering is hard. It looks like the timestamps are server-based, which are more accurate, but less precise. (Server-side time is less precise because of internet lag. Client-side time can be less accurate because it can be set to 1972.) Some events can be re-ordered by assuming reasonable editing/typing order. Others are a mystery.
- FRWIKI: It may be too difficult to collect or to tie it all together, but I wonder if we'd have a clearer idea of what people were doing or thinking if we had more information about their click path on the rest of the site. Do people also abandon Wikipedia after abandoning their search, or do they go back in their browser to read a page they had already visited and then click through to something else? There may not be anything there, or it may be hard to interpret, but I have found that seeing the pages people were on before their autocomplete and fulltext queries does ''sometimes'' make it easier to interpret ambiguous queries and/or typos. More context might make intent/satisfaction clearer.. though not necessarily in an automatable way.
- Main pages have IDs that are very big numbers! The enwiki main page is ID 15580374. Weird.
- 10 minutes can be a loooooong time for a search session; new queries 5 minutes apart often seem like unrelated topics—though other times there's a theme that runs for more than ten minutes.
- There exist redirects that differ only by plurals—"security camera" (the device) vs "security cameras" (the general idea of surveillance)... that's a very fine-grained distinction!
- As a searcher myself who issues Go box searches from my browser bar, and looking at some examples, I think it could help to try to match queries to titles with all "non-text" removed (e.g., "long legs" == "longlegs" or "avengers comics" == "Avengers (comics)" or "john smith architect" == "John Smith (architect)" or even "long legged buz zard" == "long-legged buzzard" == "long—legged buzzard"... did you catch the subtle dash difference?).
- I've noticed that DYM suggestions, and the fact that you are getting DYM results, can be kind of hard to see. This seems to be more of a problem for me when searching for things I am unfamiliar with, because there is more unfamiliar stuff on the page pulling away my attention.
- ESWIKI: DYM suggestions can be blatantly ungrammatical. For example, on Spanish Wikipedia, the DYM suggestion changed an adjective from feminine to masculine, making it no longer agree with the noun it modified.
- ESWIKI: DYM suggestions can be practically identical to the original query. On Spanish Wikipedia, when the DYM suggestion changed an adjective from feminine to masculine, the underlying tokens after stemming are the same. (The plain field could come up with different matches between the two queries, affecting ranking a fair amount if matches are in the title, for example. Intuitively, though, it feels like the DYM suggestion is essentially the same query.)
- I think it may be confusing to some searchers, but as a developer/data analyst/power user I like the fact that the Special:Search big search box shows redirect titles in the autocomplete suggestions instead of the resolved page titles. I want to see what's being matched, not where the link will take me. Both are good info, but the current matching string is more valuable in understanding the results, especially with partial word matches.
- Some queries are well-formed, wiki-savvy queries for something that just doesn't exist on Wikipedia yet, for example the medieval monster / drug called the cefusa. These can be niche topics, but that's why people are looking to Wikipedia. But some (people, organizations, possibly medieval monsters) are not notable enough to be included, or high-priority enough to be included yet, which is fair.
- Maybe a full thesaurus is too much to ask for, but the old lsearchd hack that equated movie and film was a winner!
- Google, without notability requirements or limits to encyclopedic content, often has more information.
- Google's AI Overview does not help nearly as often as its fulltext results. Mostly it helps on queries that look more like phrases or sentences than keywords.
- I've noticed that DuckDuckGo has a smaller index than Google. I have not checked how often it gets results.
- FRWIKI: Some of the minor nobility of the 19th century have lots of variation in their names in different languages. Karl Theodor, Duke in Bavaria (enwiki) is Carl Theodor in (his presumably native) German, Carlos Teodoro in Spanish & Portuguese, Charles-Théodore in French, Carlo Teodoro in Italian, Karel Theodoor in Dutch, and Karol Teodor in Polish. Other than German, Polish is the only Wikipedia that correctly lists his German name. A few others list his German name as part of the eye clinic he opened, but several don't have "Carl Theodor" anywhere on the page. English Wikipedia uses "Karl Theodor" in the title and throughout the article, "Carl Theodor" in the German name of the eye clinic, and "Charles Theodore" in the English name of the clinic. This kind of thing can happen even more so with names in transliteration, such as Russian, Chinese, or Arabic names. Even old C/Karl T⁽ʰ⁾eo has variations in his name in Cyrillic, depending on the language being transliterated into.
- Someone on French Wikipedia searched for his name using the German version of his name—they never had a chance! The Wikidata widget did make the match, but there were a lot of not-so-good fulltext matches pushing it down the screen.
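The idea above of matching queries to titles with all "non-text" removed can be illustrated with a small lookup keyed on letters and digits only. This is a sketch, not how the Go box or autocomplete currently works; the `squash` and `build_index` names and the tiny title list are made up for the example.

```python
import unicodedata
from collections import defaultdict


def squash(text):
    """Case-fold and keep only letters and digits (drops spaces, dashes, parentheses)."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def build_index(titles):
    """Map each squashed key to every title that collapses to it."""
    index = defaultdict(list)
    for title in titles:
        index[squash(title)].append(title)
    return index


# Hypothetical titles, taken from the examples in the list above.
titles = ["Longlegs", "Avengers (comics)", "John Smith (architect)",
          "Long-legged buzzard"]
index = build_index(titles)

queries = ["long legs", "avengers comics", "john smith architect",
           "long legged buz zard", "long\u2014legged buzzard"]
matches = {q: index.get(squash(q), []) for q in queries}
```

Each key maps to a list because distinct titles can collapse to the same squashed form, so a real version would still need ranking, and probably should only be tried as a fallback when the exact query has no title match.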
Tags—A Sad Ontology
This is the current vaguely ontology-like system of tags I'm using to annotate abandoned queries. The list may expand as I continue tagging queries in Spanish and French.
- page
- main/search page—searching from the main page, search page, or other page with no page ID
- on relevant page—searching from a page that seems fairly relevant to their search; target info may be on or linked to from that page
- on non-relevant page—searching from a page that does not seem relevant to their search (may be in the same broad category, like songs or films, though)
- missing link—the obvious text for a link is there, but there is no link, making a potentially relevant page non-relevant
- on the same page—searching for a page that seems to be the page they are on (which may not be obvious until after searching, e.g. Patrícia Bündchen)
- search type
- title search—this seems like a reasonable name for a Wikipedia page (john smith)
- non-title search—this is not a reasonable name for a Wikipedia page (john nasa moonwalk space shuttle)
- title + details—a reasonable title plus extra details (john smith height)
- question—phrased as a question or partial question (how tall is john smith)
- cut-n-paste—query seems to be a chunk of text cut-n-pasted from some other source; may include punctuation, formatting, or special characters
- homework—query seems to be the copied text of a homework problem (special case of cut-n-paste)
- non-encyc—query is non-encyclopedic
- wrong part of speech—query is an encyclopedic topic, but not a typical title noun (geological rather than geology); also possible that search is incomplete (should be geological engineering)
- url—query is a URL, and so it looks like searching the wiki was accidental
- wrong site—query includes a keyword (like google) that seems to be intended as a browser shortcut or as a keyword to get results from a specific site, and so it looks like searching the wiki was accidental
- more specific info—seems like query is a follow-up to the current page or previous query
- wdqs—seems like the kind of question that could be answered by a WDQS query if the relevant page/list doesn't exist
- autocomplete suggestions
- no auto—this session didn't include autocomplete
- zero suggestions—by the end of the query, there were no autocomplete suggestions
- no obvious suggestion—none of the suggestions look relevant
- decent suggestion—based on current and previous queries, the target is in the top 10 suggestions
- good suggestion—based on current and previous queries, the target is in the top 3 suggestions
- intermediate—there was a good or decent suggestion for a prefix of the final query (e.g., too many typos gave no results)
- hidden redirect—there is a good or decent suggestion, but if you don't know something about what you are searching for, it might not look relevant
- fulltext results
- zero results—fulltext had no results
- no obvious result—query got results, but there's nothing obvious that addresses the query
- decent result—based on current and previous queries, the target is in the top 10 results
- good result—based on current and previous queries, the target is in the top 3 results
- cross-language—cross-language results are provided, and they are good or decent
- wrong guess—language detection made the wrong guess (but still got some results)
- cross-project—cross-project (sister search) results are provided, and they are good or decent
- everything—searching "Everything" in the search interface or "All" in the Advanced Search widget gets good or decent results, because the target is something like a Category, rather than an article
- dym—did you mean gave a suggestion that gave a good or decent result
- bad dym—did you mean rolled over on zero results to bad results
- snippet hint—while there was no result that was really the target result, at least one top-three snippet provides useful information (typo correction, another name for what's being searched, etc.) [French only]
- page status
- page exists—there seems to be a page/redirect for the main topic of the query
- section exists—there seems to be a sub-part of a page (section, table, paragraph) for the main topic of the query
- unclear snippet—the snippet does not make it clear that the desired info is on the page
- page does not exist—there does not seem to be a page for the main topic of the query
- wrong language—there is an obvious good/exact match on another Wikipedia, where the target is more linguistically or culturally relevant
- wrong script—search is in another language, but also non-Latin script, greatly decreasing the likelihood of suggestions or results
- wrong project—there is an obvious good/exact match on another project—same language (like Wiktionary), or Commons or Wikidata
- query details
- typos—there are one or more obvious typos in the query
- space—there is a typo that is either an extra space or a missing space (can be harder to detect and correct)
- spelling—misspelling is a reasonable word (john smits vs john smith) (can be harder to detect and correct)
- reformulation—there is some query reformulation, either in the autocomplete sequence, or in subsequent fulltext queries
- non-terminal editing—looks like the searcher edited the middle (or, less often, the beginning) of the autocomplete query string, which makes good early autocomplete suggestions much harder
- needs quotes—quoting the query string would give notably better fulltext results
- unclear—query is very hard to interpret; can be not exactly gibberish, but has no clear meaning (57.82334)
- gibberish—might be actual gibberish (fdhkds), but not something unknown to Wikipedia (XKCD) or Google
- typo—possible typo/possible spelling variant (e.g., flickr, if Flickr weren't well-known)—especially without much context
- incomplete—query is a phrase that seems incomplete, making the intent unclear (where did john smith)
- ambiguous—query is so ambiguous that intent is unclear (without more context)
- search features—the query uses search features, like quotes
- external sources
- google—Google understands or corrects the query well enough, or indexes non-encyclopedic or non-notable content (note that I only check Google when I can't find the information on the wiki using the given query or obvious related queries, like corrected typos)
- non-encyc—Google results are not encyclopedic information, even though the query could be encyclopedic
- AI—best results were in AI overview
- no google—Google results are the same or worse than Wikipedia results
- enwiki—English Wikipedia has a page for the topic/subject (without translation), even though the subject's apparent "home" wiki does not (e.g., a Latin American politician has an article on enwiki, but not eswiki)
- searcher behavior
- one-shot—session is one query, directly to fulltext, no autocomplete, no page visits
- rabbit hole—searcher seems to be looking at a number of related pages (≥3)
- post-abandonment autocomplete—searcher keeps typing autocomplete queries after "abandoning" fulltext search
- page visits—including visits to one or more pages
Tag Stats
Here is the percentage of queries (out of 50) tagged with each tag.
| fr | es | en | tag |
| page | |||
| 44% | 32% | 50% | main/search page |
| 38% | 44% | 18% | on non-relevant page |
| — | 2% | — | --missing link |
| 10% | 18% | 20% | on relevant page |
| 8% | 6% | 12% | on the same page |
| search type | |||
| 58% | 50% | 66% | title search |
| 2% | — | — | --wrong part of speech |
| 40% | 38% | 32% | non-title search |
| 2% | 8% | — | --(non-title search) |
| 6% | 2% | 14% | --non-encyc |
| 14% | 18% | 14% | --title + details |
| 12% | 2% | 4% | --question |
| — | 2% | — | --homework |
| — | 4% | — | --cut-n-paste |
| — | 2% | — | --wrong part of speech |
| 4% | — | — | --url |
| 2% | — | — | --wrong site |
| 8% | 6% | 4% | more specific info |
| — | 2% | — | wdqs |
| autocomplete suggestions | |||
| 12% | 12% | 20% | no auto |
| 60% | 72% | 44% | zero suggestions |
| 6% | 4% | 16% | no obvious suggestion |
| 2% | — | 2% | decent suggestion |
| 18% | 12% | 12% | good suggestion |
| — | — | 6% | good suggestion--hidden redirect |
| — | 6% | 6% | intermediate good suggestion |
| — | 4% | — | intermediate decent suggestion |
| fulltext results | |||
| 28% | 16% | 24% | zero results |
| 36% | 34% | 38% | no obvious result |
| 4% | 4% | 4% | decent result |
| 28% | 40% | 34% | good result |
| 2% | 8% | 4% | cross-language--good result |
| — | 2% | — | cross-language--wrong guess |
| 8% | — | 4% | cross-project--good result |
| 2% | — | — | everything--good result |
| 2% | — | — | everything--decent result |
| 10% | 6% | 6% | dym |
| 10% | 8% | 6% | bad dym |
| 12% | ** | ** | snippet hint |
| page status | |||
| 24% | 26% | 22% | page does not exist |
| 50% | 52% | 36% | page exists |
| 22% | 16% | 20% | section exists |
| 4% | 8% | 12% | unclear snippet |
| 14% | 4% | 4% | wrong language |
| 4% | 8% | 4% | wrong project |
| 2% | — | 6% | wrong script |
| query details | |||
| 18% | 20% | 24% | typos |
| 16% | 16% | 14% | --typos |
| — | — | 4% | --space |
| 2% | 4% | 6% | --spelling |
| 4% | 10% | 10% | reformulation |
| 2% | 4% | 10% | non-terminal editing |
| 2% | 2% | 2% | needs quotes |
| 14% | 12% | 8% | unclear |
| 8% | 8% | 2% | --ambiguous |
| 2% | — | 2% | --gibberish |
| — | — | 2% | --incomplete |
| — | 4% | 2% | --typo |
| — | — | 2% | search features |
| external sources | |||
| 24% | 36% | 50% | google |
| 16% | 28% | 42% | --(google) |
| 8% | 6% | 4% | --AI |
| — | 2% | 4% | --non-encyc |
| 18% | 8% | 6% | no google |
| — | 2% | — | enwiki |
| searcher behavior | |||
| 18% | 10% | 26% | one-shot |
| 6% | 4% | 4% | rabbit hole |
| 14% | 8% | 14% | post-abandonment autocomplete |
| 8% | — | 8% | --page visits |
** I wasn't tracking snippet hints for English and Spanish and I have not gone back to figure them out for those wikis.
Below is a table of suggestions vs results for the sample of English abandoned queries.
| enwiki | no auto | zero sugg | no obv sugg | decent sugg | good sugg |
| zero res | 4 | 4 | 2 | | 2(1)*‡ |
| no obv res | 2 | 12 | 3 | 1‡ | 1‡ |
| decent res | | 1† | | | 1 |
| good res | 4 | 5† | 3† | | 5(2)* |
Columns show the quality/presence of autocomplete suggestions, rows show the quality/presence of fulltext results. Cells have the number of queries in each intersection of categories. The "no auto" column is for queries (10/50) that didn't use the search bar and so had no autocomplete suggestions.
- 3/40 queries had good suggestions, but they were somewhat hidden by resolved redirects (counts in parens, marked with *).
- 9/40 queries had poor suggestions, but worthwhile fulltext results (marked with †)
- 4/40 queries had worthwhile suggestions, but poor fulltext results (marked with ‡)
The last two groups are highlighted in the table in yellow, since one of the two methods (autocomplete suggestions and fulltext results) provided worthwhile options.
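The † and ‡ counts for English can be recomputed from the table cells. In this sketch, empty cells are read as 0, and the column placement of cells in the sparser rows is an assumption inferred from the per-tag percentages in the tag stats (for example, 44% zero suggestions is 22 of 50 queries); the Spanish and French tables are not included.

```python
SUGGESTION_COLS = ["no auto", "zero sugg", "no obv sugg", "decent sugg", "good sugg"]

# enwiki counts per (fulltext result row, suggestion column); empty cells are 0.
ENWIKI = {
    "zero res":   [4, 4, 2, 0, 2],
    "no obv res": [2, 12, 3, 1, 1],
    "decent res": [0, 1, 0, 0, 1],
    "good res":   [4, 5, 3, 0, 5],
}

POOR_SUGG = {"zero sugg", "no obv sugg"}
WORTHWHILE_SUGG = {"decent sugg", "good sugg"}
POOR_RES = {"zero res", "no obv res"}
WORTHWHILE_RES = {"decent res", "good res"}


def count(rows, cols):
    """Sum the cells at the intersection of the given rows and columns."""
    return sum(ENWIKI[r][SUGGESTION_COLS.index(c)] for r in rows for c in cols)


total = sum(sum(cells) for cells in ENWIKI.values())
with_autocomplete = sum(sum(cells[1:]) for cells in ENWIKI.values())
dagger = count(WORTHWHILE_RES, POOR_SUGG)          # poor suggestions, worthwhile results
double_dagger = count(POOR_RES, WORTHWHILE_SUGG)   # worthwhile suggestions, poor results

print(f"{dagger}/{with_autocomplete} marked with a dagger, "
      f"{double_dagger}/{with_autocomplete} marked with a double dagger, "
      f"{total} queries in total")
```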
The same table for Spanish abandoned queries:
| eswiki | no auto | zero sugg | no obv sugg | decent sugg | good sugg |
| zero res | | 8 | | | |
| no obv res | 2 | 12 | 1 | | 3‡ |
| decent res | | 1† | | | 1 |
| good res | 4 | 12† | 1† | | 3 |
| unclear | | 3 | | | |
- 14/41 queries had poor suggestions, but worthwhile fulltext results (marked with †)
- 3/41 queries had worthwhile suggestions, but poor fulltext results (marked with ‡)
The same table for French abandoned queries:
| frwiki | no auto | zero sugg | no obv sugg | decent sugg | good sugg |
| zero res | 4 | 8 | | | 2‡ |
| no obv res | 2 | 12 | 2 | | 2‡ |
| decent res | | 2† | | | |
| good res | | 7† | 1† | 1 | 5 |
| unclear | 1 | 1 | | | |
- 10/42 queries had poor suggestions, but worthwhile fulltext results (marked with †)
- 4/42 queries had worthwhile suggestions, but poor fulltext results (marked with ‡)
Possible Tasks
Below are some possible follow-up tasks (improvements, projects, etc.) suggested by the analysis above. Not all are directly related to abandoned queries. This is the list after reviewing queries from English Wikipedia. It will be updated after I review the Spanish (marked ESWIKI) and French (marked FRWIKI) Wikipedia samples. (Note that ESWIKI and FRWIKI task ideas aren't necessarily specific to those data sets; I've just marked them so people following the page can find new additions without having to scour the page diffs.)
- Improve DYM
- Don't show a DYM suggestion, or results for a DYM suggestion, when the suggestion gets no results! Definitely prevent a zero-result DYM suggestion from leading to another zero-result DYM suggestion. (I found a chain of a query plus 3 suggestions, all of which get zero results.)
- Make DYM suggestions (and the fact that we are showing DYM results) more prominent (but only after fixing the problem above).
- General DYM improvements (unclear how).
- Especially for longer queries. Maybe have a length limit, by words or characters? Interesting test: what percentage of DYM suggestions are clicked on, aggregated by query length (or suggestion length... though they should be similar). There may be a cutoff where clicks are ~10x less likely, and maybe we should stop showing suggestions beyond it. Even if there aren't many of them, the rare really dumb suggestion could decrease confidence in suggestions (or on-wiki search) overall.
- Look into better snippeting? Nothing really concrete to suggest, but some random thoughts:
- Stats could be page-specific. For example, Bündchen is in only 354 articles on English Wikipedia. Patrícia matches over 88K articles, so is ranked less in retrieval. However, on the page for "Gisele Bündchen", Bündchen occurs 286 times, while Patrícia/Patricia occurs 7 times, so for the query Patrícia Bündchen, maybe Patrícia is more important in the "Gisele Bündchen" snippet. (And if Patrícia Bündchen was listed on the page for the name "Patricia", then Bündchen would be more important to the snippet.)
- We could prioritize snippets with more individual keywords in them.
- We could look into sentence- or paragraph-level embeddings for matching text to get a snippet from.
- Maybe the size of the snippet affects which snippet is chosen (easy to test in vitro whether it changes the stats and affects snippet location, and straightforward to A/B test in vivo whether people click on snippets of different sizes).
- Better formatting for lists in snippets. Maybe not possible, but putting some divider character (• comes to mind) between list items would make reading some listy snippets easier.
- Improve autocomplete
- Make section anchors work from autocomplete suggestions that come from resolved redirects.
- Make showing better redirect resolution easier (maybe by doing something in our API to pre-format "Borscht (redirect from Barszcz)").
- Create a "text-only" title index (needs a better name) for the Go feature: index each title with only "letters" or "text characters" (no spaces, no punctuation) and match on that if other title-matching fails. Autocomplete already effectively does this. There will be some clashes, but we already have case-folding and diacritic-folding clashes and handle those fine by rolling over to fulltext results.
- Look at ways to rank autocomplete matches from Lastname to "Firstname Lastname" ahead of typo matches for "Lattename", etc. Default sort has been mentioned and is used on some wikis, but has idiosyncratic uses on others. Maybe partial matches (starting at a token boundary?) or other approaches. Quite possibly this isn't feasible, because it is either too expensive or gets too many false positives.
- Add partial autocomplete suggestions, especially in the Special:Search big search box, to correct typos (jhon), join or split tokens (johnsmith, sm ith), and, more expensively and expansively, make suggestions for the next word in the query. Maybe some of these should only happen after title matches dwindle to zero? (Though that could make fixing early errors harder...)
- Improve search results
- Think again about showing cross-language "round-trip" results. For example, Schreibtafel (German for "blackboard") gets 1 result on English Wikipedia, and cross-language results from German Wikipedia. The top result is an exact title match. It would be cool if we could figure out how to get from the German "Schreibtafel" article back to the English "Blackboard" article (via Wikidata, or the language links on the German article, etc.)—even if only for a small subset of cross-language results (top 3? top 1? even top 1 only if it is an exact title match!).
- Think again about implementing a thesaurus... whether hard-coding a very small hand-curated list (movie/film for enwiki) or a fully developed community-driven thesaurus... or something in between.
- Look into a way to filter "sidebar spam". It's important to differentiate between a template that generates fixed text (like a topic sidebar or a box with links to a director's every film) from a customizable template. Though some templates generate small bits of fixed but hard-to-type or hard-to-format text, which should not be ignored. Maybe the dividing line is length of output? The distinction may be impossible to determine automatically, so a list of wiki-specific ignorable templates might be the only approach, and may also be not worth it.
- FRWIKI: The "non-title" search of "title + details" makes me wonder what would happen if we matched the longest substring (or the longest prefix, or the longest suffix) against titles. (Need to review the features of such searches to see how often fulltext returns decent results anyway.)
- FRWIKI: We could search for very good matches (exact, or maybe only differ by pluralization, or similar) for non-article titles, and show them as suggestions or other alternatives. Then categories like "Ukrainian musicians" (enwiki) or "Musicien ukrainien" (frwiki) would be good matches for exact or nearly exact queries.
- Improve Sister Search / Wikidata Widget
- ESWIKI: Sister search results are displayed in a confusing way when there are zero fulltext results. (e.g., "number of wholesale houses" on English Wikipedia gets zero results, but gets sister search results from Wikisource.)
- ESWIKI: How are sister search results ordered? It doesn't seem to be consistent, and sometimes a clearly better result was second. Ranking across projects is very hard, but maybe we could do something to improve it.
- ESWIKI: The Wikidata Widget sometimes has good results. For example, it can have higher precision when the search is a title of a thing that's just not in the current wiki, and it can have higher recall when the search is in a foreign language. However, I did not systematically review its results because (i) it is not on every Wikipedia and (ii) it is way down at the bottom of the page where I didn't usually scroll if I had 20+ results. Maybe we should consider moving the Wikidata widget to a more prominent position, perhaps with sister search? On mobile, there is so little room... and it looks like it is suppressed there.
- FRWIKI: Another vote for either figuring out how and when to elevate Wikidata widget results, or expanding exact or near exact matching beyond fulltext titles comes from names, which can have wild variation across languages. (See notes on minor nobility above.)
- FRWIKI: We might undertake a survey of gadgets and widgets that various wikis are using and see if there are any we can repurpose more generally. Sister search came from a widget on Italian Wikipedia. DWIM is/was a gadget on ruwiki and hewiki. The Wikidata widget is really good at finding exact matches of variants or other languages, and is on several wikis. French Wikipedia has a widget (that I only see when I am logged out) that offers links to Wikipedias in English, German, and Spanish, and nine other projects (French projects plus Wikidata and Commons). What other neat ideas are out there?
- Help for editors
- Maybe mine abandoned queries—they seem to be a better source of mining for possible new articles or redirects than zero-results queries.
- There are still lots of issues with mining queries, but this might be worth looking into more to see if there is enough signal here to be worth thinking about how to deal with the other issues. One issue that abandoned queries have that zero-results queries don't have is that they are easier to game. Not just any query will get zero results, but any query can be abandoned.
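The "text-only" title index idea from the autocomplete list above could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names (`text_only_key`, `build_index`, `lookup`) and the sample titles are hypothetical, digits are kept alongside letters, and case and diacritic folding are folded into the same key, none of which is specified in the notes above.

```python
import unicodedata
from collections import defaultdict

def text_only_key(title):
    """Fold case and diacritics, then keep only letters and digits."""
    # NFKD splits accented letters into base letter + combining mark;
    # combining marks, spaces, and punctuation all fail isalnum() and are dropped.
    decomposed = unicodedata.normalize("NFKD", title.casefold())
    return "".join(ch for ch in decomposed if ch.isalnum())

def build_index(titles):
    """Map each text-only key to every title that collapses to it."""
    index = defaultdict(list)
    for title in titles:
        index[text_only_key(title)].append(title)
    return index

def lookup(index, query):
    """Return candidate titles for a query; more than one means a clash."""
    return index.get(text_only_key(query), [])

# Hypothetical titles chosen to show one clean match and one clash.
titles = ["AC/DC", "Re-creation", "Recreation", "Gisele Bündchen"]
index = build_index(titles)

print(lookup(index, "ac dc"))           # one candidate
print(lookup(index, "recreation"))      # two candidates: a clash
print(lookup(index, "giselebundchen"))  # diacritics and spacing ignored
```

When `lookup` returns more than one title, that is the clash case mentioned above, which could roll over to fulltext results the same way existing case-folding and diacritic-folding clashes do.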
Raw Tags
Below are the raw tags for the sampled abandoned queries. The queries themselves are not included for privacy reasons: queries can contain personally identifiable information and must be reviewed before being released. The tags are generally grouped first by "title search" vs "non-title search" (or neither, for really gibberishy queries), and within those groupings the lists are just sorted alphabetically.
English Wikipedia (enwiki):
- main/search page | no auto | zero results | unclear--gibberish | one-shot
- main/search page | non-title search | no auto | no obvious result | one-shot | unclear
- main/search page | non-title search--non-encyc | zero suggestions | zero results | dym | section exists | typos--space | typos--spelling | reformulation | non-terminal editing
- main/search page | non-title search--question | no auto | good result | page exists | one-shot
- main/search page | non-title search--title + details | zero suggestions | intermediate good suggestion | good result | section exists | unclear snippet | google
- main/search page | non-title search--title + details | no auto | good result | page exists
- main/search page | non-title search--title + details | zero suggestions | no obvious result | page exists | typos | reformulation | non-terminal editing | google | post-abandonment autocomplete
- on non-relevant page | non-title search | no obvious suggestion | zero results | cross-language--good result | wrong script
- on non-relevant page | non-title search | zero suggestions | no obvious result | unclear--incomplete
- on non-relevant page | non-title search--title + details | zero suggestions | no obvious result | page does not exist | google
- on relevant page | non-title search | zero suggestions | no obvious result | google--AI
- on relevant page | non-title search | zero suggestions | zero results | wrong script | google--non-encyc
- on relevant page | non-title search--non-encyc | zero suggestions | zero results | google--AI
- on relevant page | non-title search--question | more specific info | zero suggestions | no obvious result | google
- on relevant page | non-title search--title + details | more specific info | zero suggestions | no obvious result | google
- on relevant page | non-title search--title + details | zero suggestions | good result | cross-project--good result | section exists | unclear snippet
- on relevant page | non-title search--title + details | zero suggestions | no obvious result | page does not exist | typos | reformulation | google
- main/search page | title search | decent suggestion | no obvious result | page exists | typos | unclear--typo | one-shot | no google
- main/search page | title search | good suggestion | good result | page exists | reformulation
- main/search page | title search | good suggestion | good result | section exists | unclear--ambiguous | no google | one-shot
- main/search page | title search | good suggestion | zero results | bad dym | section exists | typos | one-shot | google
- main/search page | title search | good suggestion--hidden redirect | good result | page exists | unclear | wrong project
- main/search page | title search | good suggestion--hidden redirect | zero results | bad dym | page exists | typos | google
- main/search page | title search | no auto | good result | one-shot | google--non-encyc
- main/search page | title search | no auto | good result | page exists | one-shot
- main/search page | title search | no auto | no obvious result | page does not exist | wrong language | one-shot | google
- main/search page | title search | no auto | zero results | page does not exist | bad dym | google | one-shot
- main/search page | title search | no auto | zero results | page does not exist | google | one-shot
- main/search page | title search | no auto | zero results | page does not exist | google | search features
- main/search page | title search | no obvious suggestion | good result | section exists | unclear snippet | no google | one-shot
- main/search page | title search | zero suggestions | intermediate good suggestion | no obvious result | dym | page exists | typos--spelling | google
- main/search page | title search | zero suggestions | intermediate good suggestion | no obvious result | page exists | google
- main/search page | title search | zero suggestions | no obvious result | page does not exist | google
- main/search page | title search | zero suggestions | no obvious result | page does not exist | typos--spelling | google
- main/search page | title search | zero suggestions | zero results | page does not exist | wrong language | typos | one-shot
- on non-relevant page | title search | good suggestion | good result | page exists | post-abandonment autocomplete--page visits
- on non-relevant page | title search | good suggestion | no obvious result | dym | page exists | typos--space | google
- on non-relevant page | title search | no obvious suggestion | no obvious result | page does not exist | rabbit hole | post-abandonment autocomplete--page visits
- on non-relevant page | title search | no obvious suggestion | no obvious result | page does not exist | non-terminal editing | google
- on non-relevant page | title search | no obvious suggestion | no obvious result | page exists | typos | reformulation | google | post-abandonment autocomplete
- on non-relevant page | title search | no obvious suggestion | zero results | cross-language--good result | wrong script
- on relevant page | title search | no obvious suggestion | good result | section exists | non-terminal editing | google
- on relevant page | title search | zero suggestions | good result | section exists | unclear snippet | post-abandonment autocomplete--page visits
- on relevant page | title search | zero suggestions | no obvious result | page exists | wrong project | needs quotes | rabbit hole | google | post-abandonment autocomplete
- on the same page | title search | good suggestion | decent result | page exists | non-terminal editing
- on the same page | title search | good suggestion--hidden redirect | good result | page exists | post-abandonment autocomplete--page visits
- on the same page | title search | no obvious suggestion | good result | cross-project--good result | page exists
- on the same page | title search | zero suggestions | decent result | page exists
- on the same page | title search | zero suggestions | good result | section exists
- on the same page | title search | zero suggestions | good result | section exists
Spanish Wikipedia (eswiki):
- main/search page | title search | good suggestion | decent result | page exists | unclear--typo | no google
- main/search page | title search | no auto | good result | cross-project--good result | page exists | one-shot
- main/search page | title search | no auto | good result | page exists | one-shot
- main/search page | title search | no auto | good result | page exists | one-shot
- main/search page | title search | no obvious suggestion | no obvious result | page does not exist | google
- main/search page | title search | zero suggestions | good result | page exists | reformulation
- main/search page | title search | zero suggestions | no obvious result | page does not exist | google
- main/search page | title search | zero suggestions | no obvious result | page does not exist | reformulation | non-terminal editing | google | enwiki
- main/search page | title search | zero suggestions | zero results | bad dym | page does not exist | google
- on non-relevant page | title search | good suggestion | good result | page exists
- on non-relevant page | title search | good suggestion | no obvious result | dym | page exists | typos
- on non-relevant page | title search | good suggestion | good result | section exists | unclear snippet | reformulation | google--AI | rabbit hole
- on non-relevant page | title search | more specific info | zero suggestions | good result | page exists
- on non-relevant page | title search | no obvious suggestion | good result | page exists | needs quotes | unclear--ambiguous | google--AI
- on non-relevant page | title search | wdqs | zero suggestions | no obvious result | page does not exist
- on non-relevant page | title search | zero suggestions | decent result | section exists | unclear snippet
- on non-relevant page | title search | zero suggestions | no obvious result | page does not exist | google
- on non-relevant page | title search | zero suggestions | no obvious result | page exists | typos--spelling | reformulation
- on non-relevant page | title search | zero suggestions | zero results | bad dym | page does not exist | typos | no google | post-abandonment autocomplete
- on non-relevant page | title search | zero suggestions | zero results | page does not exist | google
- on relevant page | title search | good suggestion | good result | page exists | post-abandonment autocomplete
- on relevant page | title search | zero suggestions | good result | page exists | wrong language
- on relevant page | title search | zero suggestions | intermediate decent suggestion | no obvious result | page exists | typos--spelling | unclear--typo | google
- on relevant page | title search | zero suggestions | no obvious result | cross-language--wrong guess | cross-project--good result | page does not exist | wrong language | wrong project
- on the same page | title search | zero suggestions | intermediate good suggestion | zero results | dym | page exists | typos
- main/search page | non-title search | zero suggestions | good result | page exists
- main/search page | non-title search | zero suggestions | good result | section exists
- main/search page | non-title search--cut-n-paste | zero suggestions | intermediate good suggestion | good result | section exists | unclear snippet | google--non-encyc | one-shot
- main/search page | non-title search--non-encyc | no auto | no obvious result | cross-project--good result | wrong project | one-shot
- main/search page | non-title search--title + details | no auto | no obvious result | page exists | google
- main/search page | non-title search--title + details | zero suggestions | good result | page exists | reformulation | non-terminal editing | rabbit hole | post-abandonment autocomplete
- main/search page | non-title search--title + details | zero suggestions | no obvious result | page does not exist | typos | no google
- on non-relevant page | non-title search | more specific info | zero suggestions | good result | section exists | unclear snippet | google--AI
- on non-relevant page | non-title search | zero suggestions | page does not exist | wrong project | typos | unclear--ambiguous | google
- on non-relevant page | non-title search | zero suggestions | unclear--ambiguous
- on non-relevant page | non-title search | zero suggestions | unclear--ambiguous
- on non-relevant page | non-title search--cut-n-paste | zero suggestions | no obvious result | section exists | typos | google
- on non-relevant page | non-title search--question--homework | zero suggestions | zero results | page exists | google
- on non-relevant page | non-title search--title + details | zero suggestions | intermediate decent suggestion | good result | page exists
- on non-relevant page | non-title search--title + details | zero suggestions | no obvious result | dym | page exists | typos
- on non-relevant page | non-title search--title + details | zero suggestions | no obvious result | page exists | google
- on non-relevant page | non-title search--title + details | zero suggestions | no obvious result | section exists | typos | google
- on non-relevant page--missing link | non-title search--question | zero suggestions | good result | page exists
- on relevant page | non-title search | more specific info | zero suggestions | zero results | bad dym | page does not exist | no google
- on relevant page | non-title search--question | no auto | good result | cross-project--good result | page exists
- on relevant page | non-title search--question | zero suggestions | zero results | page exists | google | post-abandonment autocomplete
- on relevant page | non-title search--title + details | zero suggestions | zero results | bad dym | page does not exist | wrong project
- on relevant page | non-title search--wrong part of speech | good suggestion | no obvious result | page exists
- on the same page | non-title search | zero suggestions | good result | page exists
- on the same page | non-title search--title + details | zero suggestions | intermediate good suggestion | good result | section exists
French Wikipedia (frwiki):
- main/search page | title search | decent suggestion | good result | page exists | one-shot
- main/search page | title search | good suggestion | good result | page exists
- main/search page | title search | good suggestion | good result | page exists | unclear--ambiguous
- main/search page | title search | good suggestion | good result | snippet hint | page exists | rabbit hole | post-abandonment autocomplete
- main/search page | title search | no auto | no obvious result | page does not exist | google | one-shot
- main/search page | title search | no obvious suggestion | no obvious result | page exists | wrong language | google
- main/search page | title search | zero suggestions | decent result | section exists | needs quotes | google | one-shot
- main/search page | title search | zero suggestions | no obvious result | everything--good result | snippet hint | page exists
- main/search page | title search | zero suggestions | no obvious result | page does not exist | no google
- main/search page | title search | zero suggestions | no obvious result | page does not exist | unclear--ambiguous | no google
- main/search page | title search | zero suggestions | no obvious result | snippet hint | everything--decent result | page exists | post-abandonment autocomplete--page visits
- main/search page | title search | zero suggestions | no obvious result | page does not exist | non-terminal editing | no google
- main/search page | title search | zero suggestions | zero results | page does not exist | wrong language | google
- main/search page | title search--wrong part of speech | no auto | zero results | bad dym | wrong project | google | one-shot
- on non-relevant page | title search | good suggestion | no obvious result | cross-project--good result | snippet hint | page exists
- on non-relevant page | title search | good suggestion | no obvious result | dym | cross-project--good result | typos--spelling
- on non-relevant page | title search | good suggestion | zero results | dym | page exists | typos
- on non-relevant page | title search | good suggestion | zero results | page exists | typos | rabbit hole | post-abandonment autocomplete--page visits
- on non-relevant page | title search | no obvious suggestion | no obvious result | cross-project--good result | page does not exist | wrong project
- on non-relevant page | title search | zero suggestions | good result | cross-language--good result | wrong language | post-abandonment autocomplete--page visits
- on non-relevant page | title search | zero suggestions | good result | section exists
- on non-relevant page | title search | zero suggestions | no obvious result | dym | snippet hint | page exists | typos | rabbit hole | post-abandonment autocomplete--page visits
- on non-relevant page | title search | zero suggestions | no obvious result | page does not exist | no google
- on non-relevant page | title search | zero suggestions | no obvious result | page does not exist | wrong language
- on non-relevant page | title search | zero suggestions | zero results | bad dym | page exists | typos | google
- on non-relevant page | title search | zero suggestions | zero results | cross-project--good result | page exists | wrong script
- on relevant page | title search | good suggestion | good result | page exists | post-abandonment autocomplete
- on relevant page | title search | zero suggestions | good result | page exists | post-abandonment autocomplete
- on relevant page | title search | zero suggestions | zero results | dym | page exists | typos
- main/search page | non-title search--non-encyc | no auto | no obvious result | page does not exist | unclear | no google | one-shot
- main/search page | non-title search--non-encyc | zero suggestions | unclear--ambiguous | no google | one-shot
- main/search page | non-title search--question | zero suggestions | no obvious result | page exists | google--AI
- main/search page | non-title search--title + details | more specific info | zero suggestions | no obvious result | snippet hint | page does not exist | reformulation | google
- main/search page | non-title search--title + details | zero suggestions | no obvious result | page does not exist | reformulation | no google
- main/search page | non-title search--url | no auto | zero results | page does not exist | one-shot
- main/search page | non-title search--url | zero suggestions | good result | page exists | typos | one-shot
- main/search page | non-title search--wrong site | no auto | zero results | page exists | wrong language | one-shot
- on non-relevant page | non-title search | zero suggestions | zero results | unclear--gibberish | no google
- on non-relevant page | non-title search--non-encyc | no auto | zero results | bad dym | page exists | wrong language | google
- on non-relevant page | non-title search--question | zero suggestions | no obvious result | unclear
- on non-relevant page | non-title search--question | zero suggestions | zero results | dym | section exists | typos
- on non-relevant page | non-title search--title + details | no obvious suggestion | good result | page exists
- on non-relevant page | non-title search--title + details | zero suggestions | decent result | section exists | unclear snippet | google--AI
- on relevant page | non-title search--title + details | good suggestion | good result | page exists | typos
- on relevant page | non-title search--title + details | more specific info | zero suggestions | good result | page exists
- on the same page | non-title search--question | more specific info | zero suggestions | zero results | bad dym | page exists | wrong language | google--AI
- on the same page | non-title search--question | zero suggestions | good result | page exists
- on the same page | non-title search--question | zero suggestions | zero results | bad dym | section exists | google--AI
- on the same page | non-title search--title + details | more specific info | zero suggestions | good result | section exists | unclear snippet
- on non-relevant page | unclear--ambiguous | no google
Dipping a Toe into German, Italian, Portuguese, and Russian
I took a very brief look at the fulltext searches in sessions with abandoned queries in four of the other languages we have taken samples from. I looked at German (de), Italian (it), and Portuguese (pt) because I can kind of read them, and Russian (ru) because I can kind of read the script (but not the language). I skipped Persian (fa), Japanese (ja), and Chinese (zh) as too much effort for this super-brief review.
I did not look at page context, autocomplete, or anything else to divine the searcher's intent, nor the number of suggestions or results, or anything else—just the text of the fulltext queries.
I was just looking for anything that is obviously wildly different from the general patterns we've seen in English, Spanish, and French (like abandoned queries being 90% obvious gibberish).
I did not separate out names written in the script of the language of the wiki like I often do, and I may have missed some queries in reasonably closely related languages. (For Russian, I didn't look closely at anything in Cyrillic, and just lumped it together as "Russian/Cyrillic". I recognize a few Cyrillic characters as non-Russian, but didn't look too hard for them.)
Notes on tags: non-search queries don't seem like "real" queries in human language, and include things like "Do Not Track", and "test_thing12347". numbers are all digits. urls are mostly web domains. Russian wrong keyboard queries come from typing a word in Russian while using an English keyboard.
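To make the "wrong keyboard" tag concrete: such a query is Russian text typed with the keyboard left in the English layout, so each Latin key can be mapped back to the Cyrillic letter on the same physical key. The sketch below assumes the standard QWERTY and ЙЦУКЕН layouts and handles only lowercase keys; the function name `fix_wrong_keyboard` is made up for illustration and is not the name of any existing search feature.

```python
# Same physical keys, row by row: English QWERTY vs. standard Russian ЙЦУКЕН.
LATIN_KEYS = "qwertyuiop[]" "asdfghjkl;'" "zxcvbnm,." "`"
CYRILLIC_KEYS = "йцукенгшщзхъ" "фывапролджэ" "ячсмитьбю" "ё"

# Build a character-for-character translation table (both strings are 33 long).
WRONG_KEYBOARD_TABLE = str.maketrans(LATIN_KEYS, CYRILLIC_KEYS)

def fix_wrong_keyboard(query):
    """Re-map a query typed on an English layout to the Russian layout."""
    # Assumption: lowercase only; shifted keys and other layouts are ignored.
    return query.lower().translate(WRONG_KEYBOARD_TABLE)

print(fix_wrong_keyboard("ghbdtn"))  # привет
print(fix_wrong_keyboard("djqyf"))   # война
```

A real detector would also need to decide when to apply the mapping, since ordinary English queries would be mangled by it. That decision is outside this sketch.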
Raw tags
de.wiki
- German/Latin/Names: 84%
- English: 8%
- non-search: 6%
- number: 2%
it.wiki
- Italian/Latin/Names: 94%
- English: 4%
- url: 2%
pt.wiki
- Portuguese/Latin/Names: 92%
- English: 6%
- Chinese: 2%
ru.wiki
- Russian/Cyrillic: 62%
- English: 12%
- English + Russian/Cyrillic: 6%
- Spanish: 4%
- Bulgarian: 2%
- German: 2%
- wrong keyboard: 4%
- non-search: 4%
- url: 2%
- gibberish: 2%
Observations
The distributions of German, Italian, and Portuguese are roughly similar to those in English, Spanish, and French. English queries in other languages are sometimes obviously titles of books, movies, etc. (e.g., Back to the Future).
Russian has a lot more queries that are not just normal-looking language in Russian/Cyrillic. Wrong-keyboard queries are 4% (2 of 50), which is in line with our previous findings that wrong-keyboard queries are about 1% of all Russian queries.
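As a rough sanity check on how 2 of 50 compares with a 1% base rate, one can ask how likely it is to see at least two wrong-keyboard queries in a sample of 50. The sketch below assumes the 50 queries are independent draws and that the ~1% rate for all Russian queries also applies to abandoned ones; both are assumptions, not findings from this analysis.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below

# 2 or more wrong-keyboard queries out of 50, at an assumed 1% base rate.
chance = prob_at_least(2, 50, 0.01)
print(f"{chance:.3f}")
```

Under those assumptions the probability comes out at roughly 9%, so a sample this small cannot distinguish a 1% rate from a somewhat higher one.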
Overall, nothing shocking here.