Help:CirrusSearch/CompletionSuggester

Please let us know what is and is not working well with the new completion suggester. Bugs can be filed directly in Phabricator, raised on our [mailto:discovery@lists.wikimedia.org mailing list], or reported on IRC (freenode, #wikimedia-discovery).

Additional word terminators needed?
When typing in CirrusSearch the completion search only looks for pages in the main namespace and seems blind to other namespaces. Many communities utilise namespaces for a logical categorisation or behavioural difference, and this seems to be an unnecessary limitation. Similarly the suggester seems to be inhibited by a forward slash in a page name, so searching for CompletionSearcher shows nothing in the type ahead unless you have started your search term with Extension:...

Both cases almost seem not to identify the colon or the forward slash as logical word terminators in a nomenclature construct. — billinghurst  sDrewth  12:58, 21 December 2015 (UTC)
 * You're correct that the completion suggester is limited to the main namespace; this was explicitly noted in the announcement and is for purely technical reasons during the initial rollout of the beta feature. Any fuller rollout of the beta feature would not have this limitation. The reason the completion suggester can't find this page is this very limitation; the page is not in its search index. Otherwise, colons and slashes in page titles seem to work just fine for me, provided that the page is in the main namespace. Can you give me another example of this problem so that I can verify it? Thanks! --Dan Garry, Wikimedia Foundation (talk) 04:19, 22 December 2015 (UTC)


 * I was (typically) searching at enWS for a biographical work set at a subpage, so for this example search there for the word "Bickerton". It shows results for the DNB work, though not The Dictionary of Australasian Biography/Bickerton, Alexander William.  That said, I, subsequently, did a  search for "Australasian" and neither the root page of the work, nor its subpages, show. So how far into pagename does CompletionSuggester look for a match? — billinghurst  sDrewth  04:08, 23 December 2015 (UTC)
 * To note that I did look for a work with a shorter root pagename, and with something shorter it is able to detect a forward slash. Interestingly, there is something odd with partial split searches, e.g. "Biography/c" gives some results, though "Biography/Ch" only gives one. Searching for "Legendary" when looking for Australian Legendary Tales gives no success, nor does the word "Mayamah" when looking for Australian Legendary Tales/The Mayamah. I am finding it hard to find a book title short enough to allow looking up subpage names. — billinghurst  sDrewth  06:14, 23 December 2015 (UTC)
 * This new beta feature is still, fundamentally, a completion suggester. It's not intended that it's able to find those pages with the search query you're entering, as the page titles don't begin with the query you entered. Building out something to do that is significantly more complex, which is why we've started with what we've got here. --Dan Garry, Wikimedia Foundation (talk) 00:37, 24 December 2015 (UTC)
 * As Deskana stated, we can't really do any kind of word termination with the completion suggester; it is still, at its base, a prefix search, the same as it is without the beta feature enabled. This new algorithm has the added benefit of allowing fuzzy search results (typos) along with more programmatic control over the result sorting. Results that show up that are not fuzzy prefixes are done through user-generated redirects that do match the prefix. We can do some analysis of memory usage when indexing both Australian Legendary Tales/The Mayamah and The Mayamah, but I'm nervous: we are already using over 100G of Java heap (>13% of the total memory available to Elasticsearch in our codfw cluster) to power the existing completion beta feature, without yet indexing the other namespaces and without splitting on subpages. We can look into it, but I'm doubtful we have the hardware necessary to support this use case.
 * The limitations are necessary due to the sheer number of search as you type queries we serve. More advanced usage is supported with the full text search (press enter after typing query) because we only have to run the query once, instead of once for each character the user types (in the worst case). Search as you type has to run incredibly quickly to support several thousand queries per second against a couple dozen servers. Note that another required performance limitation restricts the searches to 50 characters. Our analysis of existing query patterns shows prefix search above that length makes up a fraction of a fraction of search traffic. EBernhardson (WMF) (talk) 19:54, 28 December 2015 (UTC)
 * Thanks for the comments. Something that you might be able to consider is a simpler, distinct typeahead, based on the forward slash as the terminator, even without all the fuzzy searching. As I expressed in a Phabricator ticket (and in a post to the mailing list), the Wikisources make heavy use of subpages. For compilation works (biographies, poetry, etc.) the title of the parent work is less important than the sub-component. Regarding your commentary about the namespaces, please be aware that such a comment has a very Wikipedia-centred focus. While the Wikipedias have their content in the main namespace, the sister wikis are quite different in their use of content namespaces, e.g. a number of the Wikisources use an Author: ns. So maybe the concern about broadening namespace inclusion can be focused on content namespaces; that would not broaden the search for the WPs, though it would suit the sister wikis. Having some scope around how the sister wikis differ, and what their needs are, would be useful to explore. — billinghurst  sDrewth  11:46, 29 December 2015 (UTC)
 * With the completion suggester we tried to keep the same behaviors regarding namespaces, which is why we excluded everything that involves writing a namespace prefix. On Wikisource, with the default algorithm, you have to type Author: in order to switch to this content namespace. I'd like to find a solution that addresses your comments: all content namespaces (no need to type Author:) and subpages, but this would be a breaking change. Leonardo da Vinci will suggest Author:Leonardo da Vinci on Wikisource. Another problem will be to make sure that we correctly sort the suggestions in case of collisions or ambiguities between namespaces and/or subpages, and as EBernhardson said, the solution will have to be very performant. DCausse (WMF) (talk) 10:41, 31 December 2015 (UTC)
 * Perfectly understood, and I am hoping that I am relaying the intimate experience of the Wikisource community. We believe that many people don't understand namespaces, at least not in depth, so they come to our site and type a name into the search box, desiring a result. Often they will want both what is in our main namespace (printed biographical works) and what is in the Author: ns (compiled bibliographies and linkages); we know that there can be multiple hits for the same person, and we do not know which they desire. So when someone is typing Smith, ..., presenting a biography from the Dictionary of National Biography, the Encyclopaedia Britannica (9th or 11th ed.), ..., or a component from the Alumni Oxonienses, based on the subpage, is one part of what is desired, rather than presenting them with a short form of the book title that takes up all the visible characters and gives them nothing of purpose. We know that they can still hit the search button and come back with results, so it is not about presenting perfection; it is about the usefulness of the typeahead. I understand that there are limitations, though I don't fully grok the complexities you face. I believe that I do understand the usefulness of a functioning typeahead for the WS communities and where we would like it to be. The community's reflection is that developments often halt once they work for the WPs; sometimes that is due to the initial focus, and sometimes due to the sister communities not being suitably descriptive or persistent. I am trying to ensure that we are doing enough from our side. — billinghurst  sDrewth  13:50, 31 December 2015 (UTC)
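
To make the prefix behaviour discussed in this thread concrete, here is a minimal sketch in Python. It is not the CirrusSearch or Elasticsearch implementation: the function names `build_index` and `complete` are hypothetical, fuzzy (typo-tolerant) matching is left out, and a sorted in-memory list stands in for the real index. It only illustrates why a title is suggested when it begins with the typed text, why a subpage name such as "Mayamah" is not found on its own, and how a redirect whose title matches the prefix can surface the subpage.

```python
from bisect import bisect_left


def normalize(text):
    """Case-insensitive comparison key (an assumption for this sketch)."""
    return text.casefold()


def build_index(titles, redirects=None):
    """Return a sorted list of (normalized key, target title) pairs.

    `redirects` maps a redirect title to the page it points at, so a
    redirect's own title becomes an extra prefix for its target.
    """
    entries = [(normalize(title), title) for title in titles]
    for source, target in (redirects or {}).items():
        entries.append((normalize(source), target))
    return sorted(entries)


def complete(index, query, limit=10):
    """Return up to `limit` titles whose index key starts with `query`."""
    key = normalize(query)
    if not key:
        return []
    # Binary search for the first entry whose key is >= the typed prefix.
    start = bisect_left(index, (key,))
    results = []
    for entry_key, target in index[start:]:
        if not entry_key.startswith(key):
            break  # sorted order: no later entry can match the prefix
        if target not in results:
            results.append(target)
        if len(results) == limit:
            break
    return results


titles = [
    "Australian Legendary Tales",
    "Australian Legendary Tales/The Mayamah",
    "Attack on Titan",
]

plain = build_index(titles)
print(complete(plain, "Austral"))    # both "Australian Legendary Tales" pages
print(complete(plain, "Legendary"))  # nothing: no title begins with this word
print(complete(plain, "Mayamah"))    # nothing: the subpage name is not a prefix

# A redirect titled "The Mayamah" makes the subpage reachable by prefix.
with_redirect = build_index(
    titles, {"The Mayamah": "Australian Legendary Tales/The Mayamah"}
)
print(complete(with_redirect, "The May"))
```

Indexing every subpage name as an extra key, as the redirect does here, is the kind of index growth the memory concern above refers to.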

Different order
I think it would be a good idea for it to recognize page titles entered in a different order; for example, if I'm searching for a name, I could write the name in another order. PS: searching the aliases in Wikidata might also be a good idea.--Martinligabue (talk) 15:18, 21 December 2015 (UTC) PPS: can you reply to me on it.wiki?

@Whoever yes, I think that there should be an algorithm that can still determine search results even if the order of the words is mixed up. Let's say that I am searching for Attack on Titan: it would be really useful for those days when a person cannot remember all of the details of the title and can only remember something like "anime, titans", and the algorithm would show Attack on Titan! WHOKNEWABOUTTHAT? (talk) 20:22, 27 December 2015 (UTC)

Fundamentally, the search as you type serves too many queries to perform very advanced searches, such as answering `anime, titans` or answering with the words out of order. We serve over a hundred million search-as-you-type queries in a day, and our servers would melt if we asked them to perform much more than a search against the title itself. We do have an algorithm that does a pretty decent job of answering these queries, though: the full text search is designed specifically for them. Searching for anime, titans on enwiki brings up Attack on Titan as the eighth result, which seems fairly reasonable. Note that even Google only does autocompletion from a prefix (in their case to other searches, rather than titles) for search as you type, and not a full-on search query. EBernhardson (WMF) (talk) 19:29, 28 December 2015 (UTC)
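
As a rough back-of-the-envelope check of the load described above, the daily figure can be converted into an average rate. This is only a sketch: the function name `average_qps` is hypothetical, "over a hundred million" is taken as exactly 100,000,000 (a lower bound), and reading "a couple dozen servers" from the earlier thread as 24 is an assumption. Peak traffic would be higher than the average computed here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(queries_per_day):
    """Average queries per second for a given daily query count."""
    return queries_per_day / SECONDS_PER_DAY


queries_per_day = 100_000_000  # lower bound taken from the comment above
servers = 24                   # assumption: "a couple dozen" read as 24

overall = average_qps(queries_per_day)
per_server = overall / servers

print(f"average: {overall:.0f} queries/s across the cluster")
print(f"average: {per_server:.0f} queries/s per server")
```

Since each typed character can trigger a query, any extra work per query is multiplied by this rate, which is why the suggester is kept to a cheap title-prefix lookup.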

great
nice great article