Extension:CirrusSearch/Scoring

This page describes the scoring functions and techniques that CirrusSearch uses to rank search results.

Basics
Cirrus follows a basic concept used by many search engines: a document score combines two types of sub-scores:
 * 1) A score that measures the similarity between the query and the document
 * 2) Scores that depend only on the document metadata (e.g. recency, number of incoming links, language...)
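
The two types of sub-scores can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names, the term-overlap similarity, the log-dampened link count, and in particular the multiplicative combination. It is not the formula Cirrus uses.

```python
import math


def similarity_score(query_terms, doc_terms):
    """Type 1: fraction of query terms found in the document (toy measure)."""
    if not query_terms:
        return 0.0
    doc = set(doc_terms)
    return sum(1 for term in query_terms if term in doc) / len(query_terms)


def metadata_score(incoming_links):
    """Type 2: depends only on the document, here a log-dampened link count."""
    return 1.0 + math.log1p(incoming_links)


def document_score(query_terms, doc_terms, incoming_links):
    # Assumption: the sub-scores are multiplied. Other combinations
    # (sums, weighted sums) are equally plausible for a sketch like this.
    return similarity_score(query_terms, doc_terms) * metadata_score(incoming_links)
```

With this sketch, two documents that match the query equally well are ordered by their metadata score, while a document that does not match the query at all scores 0 regardless of its metadata.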

Query Architecture
The purpose of CirrusSearch is to translate the user query into an ElasticSearch query, using the features available in the ElasticSearch Query DSL.

ElasticSearch Query
This is the query we send to the cluster in order to retrieve ranked results. The full query is rather large, even for a single-word query. It can be broken down into several components, each serving a different purpose. Note that the small number label (1 or 2) on the diagram indicates whether a component produces a sub-score of one of the types mentioned in the Basics section.

Retrieval
The purpose of this step is to retrieve the documents in the index that match the user query. There are two different ways to retrieve documents:
 * full-text queries, which compute a score for each document.
 * filters, which do not compute any score.
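
The difference between the two can be sketched with a `bool` query from the ElasticSearch Query DSL, where clauses under `must` contribute to the score and clauses under `filter` only restrict the result set. This is a minimal sketch and not the query Cirrus actually builds; the helper name `build_query` and the field names `text` and `namespace` are assumptions chosen for illustration.

```python
def build_query(user_query, namespaces):
    """Build a request body with one scored clause and one unscored filter."""
    return {
        "query": {
            "bool": {
                # Scored: each matching document gets a similarity score.
                "must": [{"match": {"text": user_query}}],
                # Not scored: documents are only included or excluded.
                "filter": [{"terms": {"namespace": namespaces}}],
            }
        }
    }
```

For example, `build_query("index", [0])` returns a body whose only scoring clause is the `match` on the hypothetical `text` field.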

The full-text query in Cirrus is currently composed of a specific query on the title and redirects. The query must contain all the words of the title or redirect, in the same order, to match. Its impact on the score is very high (boost 2). Example for the query index: