Extension:CirrusSearch/Profiles

CirrusSearch has many tunable parameters that influence various aspects of its behavior, such as search ranking and indexing. These parameters are organized into named data sets called "profiles"; each profile defines the settings for a given profile type and context. Each profile type and context pair has a default profile name, which can be overridden by config variables, URL parameters, or user settings.

Profile type
A profile type is a kind of data used for configuration or tuning, such as rescore configuration, similarity configuration, or a set of ranking functions. Different profile types contain different data and are usually not compatible with each other. The following profile types are defined in the code:


 * COMPLETION
 * Defines settings for the completion suggester.
 * Defaults in file : profiles/SuggestProfiles.config.php
 * Configuration variable : $wgCirrusSearchCompletionProfiles


 * CROSS_PROJECT_BLOCK_SCORER
 * Defines settings for merging results from cross-wiki searches.
 * Defaults in file : profiles/CrossProjectBlockScorerProfiles.config.php
 * Configuration variable : $wgCirrusSearchCrossProjectBlockScorerProfiles


 * FT_QUERY_BUILDER
 * Defines settings for building the Elasticsearch query during fulltext searches.
 * Defaults in file : profiles/FullTextQueryBuilderProfiles.config.php
 * Configuration variable : $wgCirrusSearchFullTextQueryBuilderProfiles


 * PHRASE_SUGGESTER
 * Defines settings for building the Elasticsearch phrase suggest query ("did you mean?" suggestions).
 * Defaults in file : profiles/PhraseSuggesterProfiles.config.php
 * Configuration variable : $wgCirrusSearchPhraseSuggestProfiles


 * RESCORE
 * Defines configuration for ranking search results.
 * Defaults in file : profiles/RescoreProfiles.config.php
 * Configuration variable : $wgCirrusSearchRescoreProfiles


 * RESCORE_FUNCTION_CHAINS
 * Defines functional expressions to be used in scoring search results.
 * Defaults in file : profiles/RescoreFunctionChains.config.php
 * Configuration variable : $wgCirrusSearchRescoreFunctionScoreChains


 * SANEITIZER
 * Defines settings for the sanitization process that runs in the background and checks for missing updates.
 * Defaults in file : profiles/SaneitizeProfiles.config.php


 * SIMILARITY
 * Defines similarity configurations
 * Defaults in file : profiles/SimilarityProfiles.config.php
 * Configuration variable : $wgCirrusSearchSimilarityProfiles

Note that profile names should be unique within a profile type, regardless of whether a profile is defined in a default file, in a config setting, or in another repository. Extensions can define their own profile types and can add profiles to the existing types, either by using the variables above or by defining their own profile repositories.
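The uniqueness requirement can be illustrated with a minimal sketch. This is not how CirrusSearch implements its repositories; the class `ProfileRegistry`, its methods, and the profile names and fields below are all hypothetical and exist only to show why two sources must not define the same name within one type.

```python
class ProfileRegistry:
    """Hypothetical registry: collects named profiles per profile type."""

    def __init__(self):
        # profile type -> {profile name -> (source label, profile data)}
        self._profiles = {}

    def add_repository(self, profile_type, source, profiles):
        """Register all profiles from one source, rejecting duplicate names."""
        known = self._profiles.setdefault(profile_type, {})
        # Check for clashes first so a failed call registers nothing.
        duplicates = sorted(known.keys() & profiles.keys())
        if duplicates:
            raise ValueError(
                f"{source!r} redefines {profile_type!r} profiles: {duplicates}"
            )
        for name, data in profiles.items():
            known[name] = (source, data)

    def get(self, profile_type, name):
        """Return the data of the named profile of the given type."""
        return self._profiles[profile_type][name][1]


registry = ProfileRegistry()
# Placeholder profile names and fields, not real CirrusSearch profiles.
registry.add_repository("rescore", "defaults file", {"example_a": {"weight": 1}})
registry.add_repository("rescore", "extension", {"example_b": {"weight": 2}})
# The same name is acceptable under a different profile type.
registry.add_repository("similarity", "defaults file", {"example_a": {"k1": 1.2}})
```

In this sketch, registering a second "example_a" under "rescore" would raise an error, while reusing the name under another type is accepted.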

Wikibase
The Wikibase extension defines its own profile types and also adds profiles to some of the types listed above:
 * RESCORE
 * Added rescore profiles that are used for Wikibase entities.
 * Defaults in file: repo/config/ElasticSearchRescoreProfiles.php
 * Configuration variable : $wgWBRepoSettings['entitySearch']['rescoreProfiles']


 * RESCORE_FUNCTION_CHAINS
 * Added functional expressions to be used in scoring search results.
 * Defaults in file : repo/config/ElasticSearchRescoreFunctions.php

Wikibase types:
 * WIKIBASE_QUERY_BUILDER_PROFILE_TYPE
 * Configuration for the Wikibase query builder used in prefix search.
 * Defaults in file: repo/config/EntityPrefixSearchProfiles.php
 * Configuration variable : $wgWBRepoSettings['entitySearch']['prefixSearchProfiles']

Context
Context defines the kind of environment in which a profile is used. For example, a rescore profile can be applied to regular search, prefix search, Wikibase search, etc., each of which may require different settings (though still the same type of settings, and thus the same data structure). Context is secondary to profile type: the same profile type always uses the same data structure, but can use different profile names, and thus different settings, in different contexts.
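The relationship between type and context can be pictured as a lookup table keyed by (profile type, context). The sketch below is only an illustration: the context names, profile names, and profile fields are placeholders, not values used by CirrusSearch.

```python
# (profile type, context) -> default profile name. All names are placeholders.
DEFAULT_PROFILE_NAMES = {
    ("rescore", "context_a"): "profile_for_a",
    ("rescore", "context_b"): "profile_for_b",
}

# profile type -> {profile name -> profile data}.
# Profiles of one type share a data structure, whichever context selects them.
PROFILES = {
    "rescore": {
        "profile_for_a": {"window_size": 100, "functions": ["f1", "f2"]},
        "profile_for_b": {"window_size": 10, "functions": []},
    },
}


def default_profile(profile_type, context):
    """Return the default profile data for a (profile type, context) pair."""
    name = DEFAULT_PROFILE_NAMES[(profile_type, context)]
    return PROFILES[profile_type][name]


in_a = default_profile("rescore", "context_a")
in_b = default_profile("rescore", "context_b")
```

Here `in_a` and `in_b` have the same keys but different values, which mirrors "same type, different context".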

The following contexts are defined out of the box:
 * - default context that is applied unless some other context is specified
 * - used when prefix search is performed
 * - Wikibase prefix search (wbsearchentities).

Profile selection
The profile to use for a specific operation is determined by the following procedure:
 * 1) Determine the profile type and context in which we are operating (see above).
 * 2) For the (profile type, context) pair, scan the available overrides, such as URI overrides, user preference overrides, config overrides, etc., in order of priority. By default, the URI override has the highest priority, followed by user preference, then config.
 * 3) If an override is set, use its value as the profile name.
 * 4) Otherwise, use the default name for the (profile type, context) pair.
 * 5) Fetch the profile with this name. If the profile with the overridden name does not exist, use the default profile (i.e., the profile with the default name).
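The procedure above can be sketched in a few lines. This is a simplified model under stated assumptions, not the CirrusSearch implementation: the function `resolve_profile`, the helper `override_from`, the parameter key `example_profile_param`, and all profile and context names are hypothetical.

```python
def resolve_profile(profile_type, context, profiles, defaults, overriders):
    """Return (name, data) of the profile to use.

    profiles:   {profile type: {profile name: profile data}}
    defaults:   {(profile type, context): default profile name}
    overriders: callables in priority order; each takes (profile_type, context)
                and returns a profile name, or None if it is not set
    """
    default_name = defaults[(profile_type, context)]
    # Steps 2-4: the first override that is set wins, else the default name.
    name = default_name
    for overrider in overriders:
        candidate = overrider(profile_type, context)
        if candidate is not None:
            name = candidate
            break
    # Step 5: fall back to the default profile if the name is unknown.
    available = profiles[profile_type]
    if name not in available:
        name = default_name
    return name, available[name]


def override_from(settings, key):
    """Hypothetical helper: build an overrider that reads one key from a mapping."""
    return lambda profile_type, context: settings.get(key)


profiles = {"rescore": {"standard": {"window_size": 100},
                        "experimental": {"window_size": 500}}}
defaults = {("rescore", "context_a"): "standard"}

uri_params = {"example_profile_param": "experimental"}
user_prefs = {}
config = {"example_profile_param": "no_such_profile"}

# Default priority order: URI, then user preference, then config.
overriders = [override_from(source, "example_profile_param")
              for source in (uri_params, user_prefs, config)]

chosen_name, chosen = resolve_profile(
    "rescore", "context_a", profiles, defaults, overriders)
```

With the URI parameter set, `chosen_name` is "experimental". If only the config override were consulted, its value names a profile that does not exist, so the default "standard" would be used.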

Note that one profile type can use the same override setting in several contexts, in particular for URI overrides. This is good practice for URI overrides, since they apply to a single request and are therefore used in only one context, but it is not good practice for persistent overrides, such as user or config overrides.