Topic on Extension talk:CirrusSearch

Searching with partial word/ngram matching

Longphile (talk | contribs)

Does CirrusSearch currently support partial word/ngram matching? For example, the page name I am looking for is PageName2017. When searching, I would like 'ame20' to match this page as a suggestion. I dug into the AnalysisConfigBuilder.php file and it does not appear to support ngram tokenizing. I'm looking for an analyzer setting like this:

https://keyholesoftware.com/2015/11/02/anatomy-of-setting-up-an-elasticsearch-n-gram-word-analyzer/

If I update the analyzer settings in AnalysisConfigBuilder.php to include an ngram tokenizer, and perhaps also add a corresponding setting in MappingConfigBuilder.php, will partial word matching work in the search bar?

EBernhardson (WMF) (talk | contribs)

We currently only use ngram tokenizing for the insource regex search; everything else is tokenized either by words or not at all (keywords). The regex search uses a custom trigram analyzer, and this is set up by the SourceTextIndexField class. Something similar could be done to give a title field a trigram index. The search query would also have to be adjusted to query this trigram field and weight it appropriately.
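To illustrate why a trigram index would let 'ame20' find PageName2017, here is a minimal Python sketch of trigram matching. It is not how CirrusSearch or Elasticsearch implement it; the function names (trigrams, build_index, search) and the sample titles are made up for the example, and lowercasing is an assumption about how the analyzer would normalize text.

```python
def trigrams(text, n=3):
    """Return the set of lowercase character n-grams in text (hypothetical helper)."""
    lowered = text.lower()
    return {lowered[i:i + n] for i in range(len(lowered) - n + 1)}


def build_index(titles):
    """Map each trigram to the set of titles that contain it."""
    index = {}
    for title in titles:
        for gram in trigrams(title):
            index.setdefault(gram, set()).add(title)
    return index


def search(index, query):
    """Return titles that contain every trigram of the query."""
    grams = trigrams(query)
    if not grams:
        # Queries shorter than three characters produce no trigrams.
        return set()
    candidates = [index.get(gram, set()) for gram in grams]
    return set.intersection(*candidates)


# Sample titles are invented for the illustration.
titles = ["PageName2017", "PageName2016", "OtherPage"]
index = build_index(titles)
print(search(index, "ame20"))
```

Note that sharing every trigram only makes a title a candidate: the trigrams could occur in a different order, so a real query would still need to verify or score the candidates, which is the weighting step mentioned above.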

You mention the search bar and title suggestions. The title suggestions (in the top right corner of the Vector skin) are provided by the completion suggester, if enabled. This is different from standard search and uses its own special index. You would probably need to make sure it is disabled if you want to provide filtering/scoring based on title trigrams.
