Extension:CirrusSearch/Profiles

From mediawiki.org

CirrusSearch has many tunable parameters that influence various aspects of its behavior, such as search result ranking and indexing. These parameters are organized into data sets called "profiles": named sets of data that define the settings for a given profile type and context. Each profile type and context has a default profile name, which can be overridden through configuration variables, URL parameters, or user preferences.

Profile type

A profile type is a kind of data used for configuration or tuning, such as rescore configuration, similarity configuration, or a set of ranking functions. Different profile types contain different data and are usually not compatible with each other. The following profile types are defined in the code:

COMPLETION
Defines settings for the completion suggester.
Defaults in file[1]: profiles/SuggestProfiles.config.php
Configuration variable[2]: $wgCirrusSearchCompletionProfiles
CROSS_PROJECT_BLOCK_SCORER
Defines settings for merging results from cross-wiki searches.
Defaults in file[1]: profiles/CrossProjectBlockScorerProfiles.config.php
Configuration variable[2]: $wgCirrusSearchCrossProjectBlockScorerProfiles
FT_QUERY_BUILDER
Defines settings for building the Elasticsearch query during full-text searches.
Defaults in file[1]: profiles/FullTextQueryBuilderProfiles.config.php
Configuration variable[2]: $wgCirrusSearchFullTextQueryBuilderProfiles
PHRASE_SUGGESTER
Defines settings for building the Elasticsearch phrase suggest query ("Did you mean?" suggestions).
Defaults in file[1]: profiles/PhraseSuggesterProfiles.config.php
Configuration variable[2]: $wgCirrusSearchPhraseSuggestProfiles
RESCORE
Defines configuration for ranking search results.
Defaults in file[1]: profiles/RescoreProfiles.config.php
Configuration variable[2]: $wgCirrusSearchRescoreProfiles
RESCORE_FUNCTION_CHAINS
Defines functional expressions to be used in scoring search results.
Defaults in file[1]: profiles/RescoreFunctionChains.config.php
Configuration variable[2]: $wgCirrusSearchRescoreFunctionScoreChains
SANEITIZER
Defines settings for the sanitization process that runs in the background and checks for missing updates.
Defaults in file[1]: profiles/SaneitizeProfiles.config.php
SIMILARITY
Defines similarity configurations.
Defaults in file[1]: profiles/SimilarityProfiles.config.php
Configuration variable[2]: $wgCirrusSearchSimilarityProfiles

Note that profile names should be unique within a profile type, across the default files, configuration settings, and any other repositories. Extensions can define their own profile types and add profiles to existing types, either by using the variables above or by defining their own profile repositories.
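The uniqueness rule can be illustrated with a small model of per-type profile repositories. This is a minimal Python sketch, not the CirrusSearch implementation (which is written in PHP): the class ProfileRegistry, its methods, and DuplicateProfileError are hypothetical, and rejecting a clashing repository with an error is an assumption made here; how CirrusSearch itself reacts to a duplicate name is not described on this page. The repository labels reuse file and variable names from this page, with empty placeholder profile bodies.

```python
from collections import defaultdict


class DuplicateProfileError(ValueError):
    """Hypothetical: raised when a profile name already exists for the same type."""


class ProfileRegistry:
    """Hypothetical model: profiles grouped by type, names unique per type."""

    def __init__(self):
        # profile type -> {profile name: (repository label, profile data)}
        self._profiles = defaultdict(dict)

    def add_repository(self, profile_type, repo_label, profiles):
        """Add every profile from one repository (defaults file, config variable, ...)."""
        existing = self._profiles[profile_type]
        clashes = sorted(set(profiles) & set(existing))
        if clashes:
            # Reject the whole repository so no partial state is left behind.
            raise DuplicateProfileError(
                f"{profile_type}: {clashes} already defined in "
                f"{existing[clashes[0]][0]!r}, rejected from {repo_label!r}"
            )
        for name, data in profiles.items():
            existing[name] = (repo_label, data)

    def get(self, profile_type, name):
        """Return the profile data, or None if this type has no such profile."""
        entry = self._profiles.get(profile_type, {}).get(name)
        return None if entry is None else entry[1]

    def names(self, profile_type):
        """Return the sorted profile names known for a type."""
        return sorted(self._profiles.get(profile_type, {}))


if __name__ == "__main__":
    registry = ProfileRegistry()
    # Profile bodies are empty placeholders; which file holds which profile is assumed.
    registry.add_repository("RESCORE", "profiles/RescoreProfiles.config.php",
                            {"classic": {}})
    # An extension adds a profile to an existing type ...
    registry.add_repository("RESCORE", "src/config/ElasticSearchRescoreProfiles.php",
                            {"wikibase_prefix": {}})
    # ... and registers profiles under a type of its own.
    registry.add_repository("WIKIBASE_PREFIX_QUERY_BUILDER",
                            "src/config/EntityPrefixSearchProfiles.php",
                            {"default": {}})
    print(registry.names("RESCORE"))  # ['classic', 'wikibase_prefix']
    try:
        registry.add_repository("RESCORE", "$wgCirrusSearchRescoreProfiles",
                                {"classic": {}})
    except DuplicateProfileError as err:
        print("rejected:", err)
```

Because profiles are grouped by type, adding a profile to an existing type and introducing a new type go through the same call in this sketch; only the name clash within one type is treated as an error.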

Wikibase

Wikibase extensions, such as WikibaseCirrusSearch, define profile types of their own and also add profiles to the types listed above:

RESCORE
Additional rescore profiles used for Wikibase entities.
Defaults in file: src/config/ElasticSearchRescoreProfiles.php
Configuration variable[2]: $wgWBCSRescoreProfiles
RESCORE_FUNCTION_CHAINS
Additional functional expressions used in scoring search results.
Defaults in file[1]: src/config/ElasticSearchRescoreFunctions.php

Profile types defined by Wikibase:

WIKIBASE_PREFIX_QUERY_BUILDER
Defines settings for the Wikibase prefix search query builder.
Defaults in file: src/config/EntityPrefixSearchProfiles.php
Configuration variable[2]: $wgWBCSPrefixSearchProfiles

Context

Context defines the kind of environment in which a profile is used. For example, a rescore profile can be applied to regular search, prefix search, Wikibase search, etc., and each of these may require different settings (though still the same type of settings, and thus the same data structure). Context is secondary to profile type: a given profile type always uses the same data structure, but it can use different profile names, and thus different settings, in different contexts.

The following contexts are defined out of the box:

  • CONTEXT_DEFAULT: the default context, applied unless another context is specified.
  • CONTEXT_PREFIXSEARCH: used when a prefix search is performed.
  • CONTEXT_WIKIBASE_PREFIX: Wikibase prefix search (wbsearchentities).
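The pairing of profile type and context can be modeled as a lookup table of default profile names. The sketch below is hypothetical Python (the dictionary and function names are invented for illustration); the profile names are taken from the override table on this page, and only a few pairs are included. It shows one profile type, RESCORE, resolving to different profile names in different contexts, with CONTEXT_DEFAULT used when no context is given.

```python
# Hypothetical lookup: default profile name per (profile type, context).
# Names are taken from the override table; only a few pairs are shown.
DEFAULT_PROFILE_NAMES = {
    ("RESCORE", "CONTEXT_DEFAULT"): "classic",
    ("RESCORE", "CONTEXT_PREFIXSEARCH"): "classic",
    ("RESCORE", "CONTEXT_WIKIBASE_PREFIX"): "wikibase_prefix",
    ("FT_QUERY_BUILDER", "CONTEXT_DEFAULT"): "default",
    ("WIKIBASE_PREFIX_QUERY_BUILDER", "CONTEXT_WIKIBASE_PREFIX"): "default",
}


def default_profile_name(profile_type, context="CONTEXT_DEFAULT"):
    """Return the default profile name; CONTEXT_DEFAULT applies if no context is given."""
    try:
        return DEFAULT_PROFILE_NAMES[(profile_type, context)]
    except KeyError:
        raise KeyError(f"no default {profile_type} profile in {context}") from None


if __name__ == "__main__":
    for ctx in ("CONTEXT_DEFAULT", "CONTEXT_PREFIXSEARCH", "CONTEXT_WIKIBASE_PREFIX"):
        # Same profile type (same data structure), possibly a different profile per context.
        print(ctx, default_profile_name("RESCORE", ctx))
```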

Profile selection

The profile used for a specific operation is determined by the following procedure:

  1. Determine the profile type and the context in which we are operating (see above).
  2. For this profile type/context pair, scan the available overrides (such as URI overrides, user preference overrides, and config overrides) in order of priority. By default, URI overrides have the highest priority, followed by user preferences and then config.
  3. If an override is set, use its value as the profile name.
  4. Otherwise, use the default profile name for the profile type/context pair.
  5. Fetch the profile with this name. If no profile with the overridden name exists, use the default profile (i.e., the profile with the default name).
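The five steps can be expressed as a single function. This is a hedged Python sketch of the procedure as written on this page, not the actual CirrusSearch code: select_profile and the OverrideSource callable type are hypothetical, and each override source (URI, user preference, config) is modeled as a function that returns a profile name or None. The sources must be passed in priority order, and the profile name "custom" in the example is invented.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

# Hypothetical: an override source (URI, user preference, config, ...) returns
# the profile name it sets for (profile type, context), or None if unset.
OverrideSource = Callable[[str, str], Optional[str]]


def select_profile(
    profile_type: str,
    context: str,
    overrides: Sequence[OverrideSource],
    defaults: Mapping[Tuple[str, str], str],
    profiles: Mapping[str, Mapping[str, dict]],
) -> Tuple[str, dict]:
    """Return (name, data) of the profile to use, following steps 1-5.

    `overrides` must be ordered by priority, highest first.
    """
    # Step 1: the (profile_type, context) pair is supplied by the caller.
    default_name = defaults[(profile_type, context)]
    name = default_name  # step 4: used if no override is set
    for source in overrides:  # step 2: scan overrides by priority
        candidate = source(profile_type, context)
        if candidate is not None:  # step 3: the first override that is set wins
            name = candidate
            break
    available = profiles.get(profile_type, {})
    if name not in available:  # step 5: unknown name falls back to the default profile
        name = default_name
    return name, available[name]


if __name__ == "__main__":
    profiles = {"RESCORE": {"classic": {}, "custom": {}}}
    defaults = {("RESCORE", "CONTEXT_DEFAULT"): "classic"}
    sources = [
        lambda t, c: "does_not_exist",  # URI override naming a missing profile
        lambda t, c: None,              # no user preference set
        lambda t, c: "custom",          # config override
    ]
    print(select_profile("RESCORE", "CONTEXT_DEFAULT", sources, defaults, profiles))
    # ('classic', {})
```

Note the example: because step 3 stops at the first override that is set and step 5 falls back to the default profile, a high-priority override naming a missing profile yields classic rather than the lower-priority config value.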

The set of overrides is as follows:

| Type | Context | Default[3] | URI Override[4] | User override | Config override |
|---|---|---|---|---|---|
| COMPLETION | CONTEXT_DEFAULT | fuzzy | | cirrussearch-pref-completion-profile | $wgCirrusSearchCompletionSettings |
| CROSS_PROJECT_BLOCK_SCORER | CONTEXT_DEFAULT | static | | | $wgCirrusSearchCrossProjectOrder |
| FT_QUERY_BUILDER | CONTEXT_DEFAULT | default | cirrusFTQBProfile | | $wgCirrusSearchFullTextQueryBuilderProfile |
| PHRASE_SUGGESTER | CONTEXT_DEFAULT | default | | | $wgCirrusSearchPhraseSuggestSettings |
| RESCORE | CONTEXT_DEFAULT | classic | fulltextQueryIndepProfile, cirrusRescoreProfile | | $wgCirrusSearchRescoreProfile |
| RESCORE | CONTEXT_PREFIXSEARCH | classic | cirrusRescoreProfile | | $wgCirrusSearchPrefixSearchRescoreProfile |
| RESCORE_FUNCTION_CHAINS | CONTEXT_DEFAULT | n/a[5] | | | |
| SANEITIZER | | n/a[6] | | | |
| SIMILARITY | CONTEXT_DEFAULT | default | | | $wgCirrusSearchSimilarityProfile |
| RESCORE | CONTEXT_WIKIBASE_PREFIX | wikibase_prefix | cirrusRescoreProfile | | $wgWBCSDefaultPrefixRescoreProfile |
| WIKIBASE_PREFIX_QUERY_BUILDER | CONTEXT_WIKIBASE_PREFIX | default | cirrusWBProfile | | $wgWBCSDefaultPrefixProfile |
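URI overrides are ordinary query-string parameters. The following Python sketch, using only the standard library, builds a search URL carrying cirrusRescoreProfile and reads back which override applies to a given type/context pair. The parameter names are copied from the table above; the host example.org is a placeholder, uri_override is a hypothetical helper, and where a pair lists two parameters (RESCORE in CONTEXT_DEFAULT) the sketch assumes the first one listed wins, which this page does not specify.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# URI override parameter names per (profile type, context), copied from the table.
URI_OVERRIDE_PARAMS = {
    ("FT_QUERY_BUILDER", "CONTEXT_DEFAULT"): ["cirrusFTQBProfile"],
    # Assumption: listed order is treated as precedence order.
    ("RESCORE", "CONTEXT_DEFAULT"): ["fulltextQueryIndepProfile", "cirrusRescoreProfile"],
    ("RESCORE", "CONTEXT_PREFIXSEARCH"): ["cirrusRescoreProfile"],
    ("RESCORE", "CONTEXT_WIKIBASE_PREFIX"): ["cirrusRescoreProfile"],
    ("WIKIBASE_PREFIX_QUERY_BUILDER", "CONTEXT_WIKIBASE_PREFIX"): ["cirrusWBProfile"],
}


def uri_override(url, profile_type, context):
    """Hypothetical helper: return the profile name requested in `url`, or None."""
    query = parse_qs(urlsplit(url).query)
    for param in URI_OVERRIDE_PARAMS.get((profile_type, context), []):
        if param in query:
            return query[param][0]  # first value if the parameter is repeated
    return None


if __name__ == "__main__":
    url = "https://example.org/w/index.php?" + urlencode(
        {"title": "Special:Search", "search": "profiles", "cirrusRescoreProfile": "classic"}
    )
    print(uri_override(url, "RESCORE", "CONTEXT_DEFAULT"))       # classic
    print(uri_override(url, "RESCORE", "CONTEXT_PREFIXSEARCH"))  # classic
    print(uri_override(url, "SIMILARITY", "CONTEXT_DEFAULT"))    # None
```

Because cirrusRescoreProfile serves RESCORE in several contexts, the single URL value is picked up in whichever context the request runs.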

Note that a type can use the same override setting in different contexts, which is especially common for URI overrides: for example, cirrusRescoreProfile overrides RESCORE in all three contexts. This is good practice for URI overrides, since they apply per request and thus to only one context at a time, but not for persistent overrides such as user or config overrides, which would then apply to every context that shares them.

  1. This file contains basic profiles of that type.
  2. This configuration variable can contain additional profiles of this type.
  3. The name of the profile used by default.
  4. This is entered in the URI of the request, e.g., cirrusOverride=profileName.
  5. These profiles are always referenced explicitly by RESCORE data, so there is no default.
  6. The sanitizer chooses the best profile to use at runtime, based on wiki size.