Extension:CirrusSearch/Query Construction

This page describes how the user query is transformed into a structured Elasticsearch query.

Overview
CirrusSearch interacts with MediaWiki core by extending SearchEngine. This class exposes three main ways to query the index and find pages (called SearchEngine entry points in CirrusSearch):
 * fulltext: the classic full text search provided by Special:Search or the search module of API:Query
 * near match: not directly exposed through an interface or an API, this call is responsible for the "go" feature: when the typed text matches a page nearly perfectly, the user is taken directly to that page instead of Special:Search.
 * completion: used by all autocomplete features (search as you type).

When the query string and its associated metadata enter Cirrus, they go through the following transformation steps:
 * 1) Parsing
 * 2) Profile selection
 * 3) Elasticsearch query building
 * 4) Elasticsearch responses transformation
 * 5) Fallback methods evaluation
 * 6) CrossProject searches

Parsing
Parsing is responsible for extracting features from the user query string. While parsing is particularly important for fulltext search queries, it is also present for the other search entry points: for instance, namespace prefix extraction is performed in all searches and can be considered a parsing step.
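
The namespace prefix extraction mentioned above can be illustrated with a minimal Python sketch. It is not CirrusSearch code: the function name `extract_namespace_prefix` and the small `NAMESPACES` table are assumptions made for illustration, and a real wiki defines its own namespace names and aliases.

```python
from typing import Optional, Tuple

# Illustrative namespace table (assumption); real wikis configure their own.
NAMESPACES = {"talk": 1, "user": 2, "help": 12, "category": 14}


def extract_namespace_prefix(query: str) -> Tuple[Optional[int], str]:
    """Split a leading 'Namespace:' prefix off the query string.

    Returns (namespace_id, remaining_query). If the text before the first
    colon is not a known namespace, the query is returned unchanged.
    """
    prefix, sep, rest = query.partition(":")
    if not sep:
        # No colon at all, so there is nothing to extract.
        return None, query
    ns_id = NAMESPACES.get(prefix.strip().lower())
    if ns_id is None:
        # Unknown prefix (e.g. a search keyword): leave the query as-is.
        return None, query
    return ns_id, rest.strip()


print(extract_namespace_prefix("Help: editing pages"))
print(extract_namespace_prefix("plain query"))
```

The point of the sketch is only that the prefix is recognized and removed before the rest of the query is interpreted, whichever entry point is used.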

Parsing produces a  instance that contains all the information known about the query and its context:
 * the search engine entry point
 * all its metadata (size, offset, ...)
 * contextual filters (e.g. the prefix option provided by Extension:InputBox)
 * the parsed query (AST)

The  is immutable.
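
As a rough illustration of what such an immutable container might look like, here is a Python sketch using a frozen dataclass. The class name `ParsedSearchRequest` and all field names are hypothetical and chosen only to mirror the list above; they are not the names used by CirrusSearch.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ParsedSearchRequest:  # hypothetical name, for illustration only
    entry_point: str                     # e.g. "fulltext", "near_match", "completion"
    limit: int                           # metadata: number of results requested
    offset: int                          # metadata: pagination offset
    contextual_filters: Tuple[str, ...]  # e.g. a prefix filter supplied by the caller
    parsed_query: Tuple[str, ...]        # stand-in for the parsed query (AST)


request = ParsedSearchRequest(
    entry_point="fulltext",
    limit=20,
    offset=0,
    contextual_filters=("prefix:Help",),
    parsed_query=("word:elasticsearch",),
)

# An immutable object is never modified in place; a changed copy is derived instead.
next_page = replace(request, offset=20)
```

With this design, later steps can pass the object around freely, since none of them can alter what the parser produced.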

Profile selection
Profile selection is the process responsible for deciding which profiles are best to use for a given.

Elasticsearch query building
This is the process of building the Elasticsearch search request body.

Retrieval query
The retrieval query is meant to extract a first set of documents from the index. It is split into two parts.

Scoring part
Elements of the query that affect scoring. Changing something here should not change the set of hits found by the retrieval query. This section of the query must only affect the initial ranking of the results. The scoring part of a query is controlled by a FullTextQueryBuilder, which is currently only supported by the fulltext SearchEngine entry point.

Filtering part
Elements of the query that do not affect ranking. Changing something here does not affect ranking but changes the set of hits found by the retrieval query. Filtering is also controlled by FullTextQueryBuilder but will change similarly to have a  as input.
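
One way to picture the split between the two parts is an Elasticsearch bool query, where `filter` clauses select documents without contributing to the score and `should` clauses placed next to a `filter` only influence the score. The Python sketch below builds such a body. It is an assumption about how the separation could be expressed, not a description of what FullTextQueryBuilder actually emits; the helper name `build_retrieval_query` and the field names `text` and `namespace` are hypothetical.

```python
import json
from typing import Any, Dict, List


def build_retrieval_query(
    scoring: List[Dict[str, Any]],
    filtering: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble a bool query from a scoring part and a filtering part."""
    return {
        "bool": {
            # Filtering part: decides which documents match, no score impact.
            "filter": filtering,
            # Scoring part: with a filter present, should clauses are optional
            # by default, so they change the ranking but not the set of hits.
            "should": scoring,
        }
    }


retrieval = build_retrieval_query(
    scoring=[{"match": {"text": "query construction"}}],  # hypothetical field
    filtering=[{"term": {"namespace": 0}}],               # hypothetical field
)
print(json.dumps({"query": retrieval}, indent=2))
```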

Rescore query
Fine-tuning of the ranking. Depending on the needs, multiple rescore queries can be assembled, and their scores can be combined as well. Some searches may prefer to combine the score from the scoring part of the retrieval query with some rescore components.
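
To make this concrete, the sketch below assembles an Elasticsearch request body with two rescore components, using the `rescore` section of the search API (`window_size`, `rescore_query`, `query_weight`, `rescore_query_weight`, `score_mode`). The helper `rescore_component`, the window sizes, the weights and the field names `text` and `incoming_links` are illustrative assumptions, not values taken from CirrusSearch profiles.

```python
from typing import Any, Dict

# Score combination modes accepted by the Elasticsearch query rescorer.
VALID_SCORE_MODES = {"total", "multiply", "avg", "max", "min"}


def rescore_component(
    rescore_query: Dict[str, Any],
    window_size: int,
    query_weight: float = 1.0,
    rescore_query_weight: float = 1.0,
    score_mode: str = "total",
) -> Dict[str, Any]:
    """Build one entry of the 'rescore' list of a search request."""
    if score_mode not in VALID_SCORE_MODES:
        raise ValueError(f"unsupported score_mode: {score_mode}")
    return {
        "window_size": window_size,  # number of top hits per shard to rescore
        "query": {
            "rescore_query": rescore_query,
            "query_weight": query_weight,                  # weight of the original score
            "rescore_query_weight": rescore_query_weight,  # weight of the rescore score
            "score_mode": score_mode,                      # how the two are combined
        },
    }


request_body = {
    "query": {"match": {"text": "query construction"}},  # hypothetical field
    "rescore": [
        # Phrase rescore, added to the retrieval score.
        rescore_component(
            {"match_phrase": {"text": "query construction"}},
            window_size=512,
            rescore_query_weight=2.0,
        ),
        # Popularity-style signal, multiplied into the score so far.
        rescore_component(
            {
                "function_score": {
                    "field_value_factor": {
                        "field": "incoming_links",  # hypothetical field
                        "modifier": "log2p",
                    }
                }
            },
            window_size=256,
            score_mode="multiply",
        ),
    ],
}
```

Setting `query_weight` to a non-zero value is what lets a search keep the score from the scoring part of the retrieval query alongside the rescore components.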