Help:CirrusSearch/Logical operators

CirrusSearch, the MediaWiki extension that uses Elasticsearch to provide enhanced search features over the default MediaWiki search, does not currently support classic boolean queries, and the logical operators AND and OR should be used with great care, if at all.

Negation and parentheses
CirrusSearch does support several ways of indicating negation. The following forms are all equivalent: prefixing a term with - (minus sign), prefixing it with ! (exclamation point), or preceding it with NOT (the NOT operator).

CirrusSearch does not support parentheses; they are removed from the query.
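The two rules above can be illustrated with a small sketch. This is not how CirrusSearch parses queries internally; the `normalize` function is a hypothetical helper written for this page that rewrites the three negation forms to a single one and drops parentheses, so that equivalent queries become visibly identical.

```python
def normalize(query):
    """Rewrite !term and NOT term as -term, and remove parentheses.

    Illustrative only: a toy normalizer, not the CirrusSearch parser.
    """
    # Parentheses are not supported, so they are simply removed.
    tokens = query.replace("(", "").replace(")", "").split()
    out = []
    negate_next = False
    for tok in tokens:
        if tok == "NOT":
            # NOT negates the term that follows it.
            negate_next = True
            continue
        if tok.startswith("!") and len(tok) > 1:
            tok = "-" + tok[1:]
        if negate_next and not tok.startswith("-"):
            tok = "-" + tok
        negate_next = False
        out.append(tok)
    return " ".join(out)


print(normalize("-dog"))                # -dog
print(normalize("!dog"))                # -dog
print(normalize("NOT dog"))             # -dog
print(normalize("(cat fish) NOT dog"))  # cat fish -dog
```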

Lucene, MUST and SHOULD
CirrusSearch is built on top of Elasticsearch, which in turn is built on Lucene. Lucene does not support classic boolean AND or OR, though it does offer those words as binary operators.

Instead, Lucene converts AND and OR to a different formalism (unary MUST and SHOULD operators), giving results that sometimes mimic the expected boolean results, but which can also be wildly divergent from them.

In Lucene, MUST indicates that a search term is required and must be present in any results. So, a query like MUST dog would only return results that contain some form of "dog" in them.

On the other hand, SHOULD terms are optional but should be present if possible; while they are not strictly required, they do affect ranking. So MUST dog SHOULD cat requires "dog" in every result, but will rank those that also contain "cat" as better.

The one exception to SHOULD terms not being required is that if there are no MUST terms, then at least one SHOULD term will be present in each result. Thus, SHOULD dog SHOULD cat SHOULD fish will actually give results that have at least one of "dog", "cat", or "fish" present, though any results with all three will likely rank higher.
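The matching rule described above can be sketched in a few lines of Python. This is an illustrative model only, not Lucene's implementation: the `matches` function and the `(occur, term)` clause representation are assumptions made for this example, documents are modeled as plain sets of terms, and ranking is ignored.

```python
def matches(clauses, doc_terms):
    """Return True if a document (a set of terms) satisfies the clauses.

    clauses: list of (occur, term) pairs, where occur is "MUST" or "SHOULD".
    """
    must = [term for occur, term in clauses if occur == "MUST"]
    should = [term for occur, term in clauses if occur == "SHOULD"]
    # Every MUST term is required.
    if not all(term in doc_terms for term in must):
        return False
    # The exception: with no MUST terms, at least one SHOULD term is needed.
    if not must and should:
        return any(term in doc_terms for term in should)
    return True


dog_cat = [("MUST", "dog"), ("SHOULD", "cat")]
print(matches(dog_cat, {"dog"}))  # True: "cat" is optional
print(matches(dog_cat, {"cat"}))  # False: "dog" is required

only_should = [("SHOULD", "dog"), ("SHOULD", "cat"), ("SHOULD", "fish")]
print(matches(only_should, {"fish"}))  # True: one SHOULD term is present
print(matches(only_should, {"bird"}))  # False: none of the terms is present
```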

Classic boolean operators often have an implicit AND, meaning that any query terms without an explicit boolean operator between them are assumed to have an AND between them. In Lucene, any query term without an explicit MUST or SHOULD is assumed to have an implicit MUST applied to it.

Converting AND and OR
Lucene converts AND and OR to MUST and SHOULD in a way that sometimes gives the expected results, but often leads to very unexpected results.

When Lucene encounters AND, it applies MUST to the terms on each side of the AND. When it encounters OR, it applies SHOULD to the terms on each side of the OR. The query is processed left to right, and later AND/OR operators override earlier ones.

This effectively gives a "backward order precedence" to the operators, and the results can be quite unexpected if you are used to classic boolean operators.
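The conversion just described can be sketched as follows. This is a simplified model written for this page, not Lucene's parser: the `convert` function is hypothetical, it handles only flat queries of whitespace-separated terms and the words AND and OR, and the example queries are chosen for illustration.

```python
def convert(query):
    """Convert a flat AND/OR query into a list of (occur, term) clauses."""
    terms = []      # search terms, in query order
    occurs = []     # occur for each term; None means "not set yet"
    pending = None  # occur that the last operator applies to the next term
    for tok in query.split():
        if tok in ("AND", "OR"):
            occur = "MUST" if tok == "AND" else "SHOULD"
            if occurs:
                # Term before the operator: a later operator overrides
                # whatever an earlier one assigned.
                occurs[-1] = occur
            pending = occur  # term after the operator
        else:
            terms.append(tok)
            occurs.append(pending)
            pending = None
    # Any term left without an operator gets the implicit MUST.
    occurs = [occur or "MUST" for occur in occurs]
    return list(zip(occurs, terms))


# The AND overrides the SHOULD that the earlier OR gave to "green".
print(convert("red OR green AND blue"))
# [('SHOULD', 'red'), ('MUST', 'green'), ('MUST', 'blue')]

# Without the explicit AND, only "blue" ends up required.
print(convert("red OR green blue"))
# [('SHOULD', 'red'), ('SHOULD', 'green'), ('MUST', 'blue')]
```

In a classic boolean system with an implicit AND, these two queries would be expected to behave identically; in this model they do not.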

Examples that go wrong
These are some examples where the conversion from AND/OR to MUST/SHOULD gives wildly divergent results from the expectations of classic boolean operators.

A few worked examples are below:


 * convert OR to SHOULD before and after, giving:
 * convert AND to MUST before and after (in this case overriding an existing SHOULD), giving:
 * The result set is thus the same as requiring only the two MUST terms, with the remaining SHOULD term being not required, and only affecting ranking.


 * convert OR to SHOULD before and after, giving:
 * apply an implicit MUST to any term without a MUST or SHOULD, giving:
 * In a classic boolean system with implicit AND, we would expect this query and the one in the example above to be the same, but compare the two to see the difference: only one term is required here, while two terms are required above.


 * convert  to   before and after, giving:
 * convert  to   before and after, giving:
 * The result set is thus the same as simply searching for blue, with red and green only affecting ranking. This also means that if, in this example, there are no documents with either red or green in them, you will get the same results searching for the full query as you would for just searching for blue, which is not what you would expect from any classic boolean system.

In general, mixing OR with AND (including implicit AND) in one query gives results that are unintuitive in a classic boolean framework. It can also be very difficult to detect the cases where the boolean logic goes awry, unless you already know exactly how many documents contain each possible combination of your query terms.
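One way to see where the two interpretations part ways is to enumerate every combination of term presence and compare them. The sketch below is a toy model under stated assumptions: `lucene_match` follows the left-to-right conversion described earlier, `classic_match` assumes the usual convention that AND (explicit or implicit) binds more tightly than OR, documents are plain sets of terms, and all function names and example queries are hypothetical.

```python
from itertools import product


def lucene_match(query, doc):
    """Match using the MUST/SHOULD conversion (later operators override)."""
    terms, occurs, pending = [], [], None
    for tok in query.split():
        if tok in ("AND", "OR"):
            occur = "MUST" if tok == "AND" else "SHOULD"
            if occurs:
                occurs[-1] = occur
            pending = occur
        else:
            terms.append(tok)
            occurs.append(pending or "MUST")  # implicit MUST
            pending = None
    must = [t for t, o in zip(terms, occurs) if o == "MUST"]
    should = [t for t, o in zip(terms, occurs) if o == "SHOULD"]
    if not all(t in doc for t in must):
        return False
    if not must and should:
        return any(t in doc for t in should)
    return True


def classic_match(query, doc):
    """Match as classic boolean logic: OR of groups of ANDed terms."""
    groups = [[]]
    for tok in query.split():
        if tok == "OR":
            groups.append([])
        elif tok != "AND":
            groups[-1].append(tok)
    return any(all(t in doc for t in g) for g in groups if g)


def divergences(query):
    """List the term combinations on which the two interpretations differ."""
    terms = sorted({t for t in query.split() if t not in ("AND", "OR")})
    found = []
    for present in product([False, True], repeat=len(terms)):
        doc = {t for t, p in zip(terms, present) if p}
        if lucene_match(query, doc) != classic_match(query, doc):
            found.append(sorted(doc))
    return found


print(divergences("red OR green AND blue"))
# [['red'], ['green', 'red'], ['blue', 'red']]
print(divergences("red AND green AND blue"))  # []
print(divergences("red OR green OR blue"))    # []
```

Whether a divergence shows up in practice depends on whether the index actually contains documents with those term combinations, which is why such cases are hard to notice from search results alone.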

Useful use cases
If you have no explicit operators, then the boolean default is AND and the Lucene default is MUST, which are equivalent if they are the only operators present in the query:


 * —all three terms must be present in any results
 * —all three terms must be present in any results
 * —all three terms must be present in any results

However, since the requirement is already implicit, nothing is gained by making it explicit by using AND, other than the potential for later boolean confusion.

If the only operator in the query is OR (crucially meaning that there are no implicit MUSTs, either) then it is the same as everything having a SHOULD (recall that if a query has no MUST terms, then at least one of the SHOULD terms will be in any result):


 * —at least one of the three terms must be present in any results
 * —at least one of the three terms must be present in any results

Be very careful with implicit AND/MUST! In the worked example above that combines OR with an implicit operator, the implicit MUST means that neither of the terms joined by OR is strictly required to be in the results.
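The two safe patterns can be checked mechanically. The function below is a hypothetical helper, not part of CirrusSearch: it treats a flat query as safe when it contains no OR at all (only AND or implicit operators), or when every pair of adjacent terms is joined by an explicit OR.

```python
def is_safe(query):
    """Return True if a flat query avoids mixing OR with AND or implicit AND."""
    tokens = query.split()
    if "OR" not in tokens:
        # Only AND and/or implicit operators: every term is required.
        return True
    if "AND" in tokens or len(tokens) % 2 == 0:
        # OR mixed with AND, or a missing operator somewhere.
        return False
    # Tokens have to alternate strictly: term OR term OR term ...
    return (all(tok == "OR" for tok in tokens[1::2])
            and all(tok != "OR" for tok in tokens[0::2]))


print(is_safe("dog cat fish"))          # True
print(is_safe("dog AND cat AND fish"))  # True
print(is_safe("dog OR cat OR fish"))    # True
print(is_safe("dog OR cat fish"))       # False: implicit operator after OR
print(is_safe("dog OR cat AND fish"))   # False: OR mixed with AND
```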

Future plans
Of course we are not very happy with this state of affairs.

In the short term we are creating this document and updating the Help:Searching documentation to reflect the reality of our current system.

Longer term, we plan to implement a new layer in CirrusSearch that will properly construct a Lucene MUST/SHOULD query that is equivalent to a given classic boolean query (including proper support for parentheses!) and return the expected results. (For those who are interested, it is possible to specify in Lucene that at least one of a set of query terms or clauses is required to match, which is equivalent to a boolean OR; requiring that all of a set of query terms or clauses match is the same as a boolean AND.)
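The equivalence mentioned in the parenthetical can be illustrated with nested groups. This is only a sketch of the idea, not the planned CirrusSearch layer, whose design is not described here; the `("all", [...])` and `("any", [...])` tuple representation and the `evaluate` function are assumptions made for this example.

```python
def evaluate(node, doc):
    """Evaluate a nested query against a document (a set of terms).

    A node is either a term (str) or a pair (kind, children), where kind is
    "all" (every child has to match, like boolean AND) or "any" (at least
    one child has to match, like boolean OR).
    """
    if isinstance(node, str):
        return node in doc
    kind, children = node
    results = [evaluate(child, doc) for child in children]
    return all(results) if kind == "all" else any(results)


# blue AND (red OR green), written as nested groups.
query = ("all", ["blue", ("any", ["red", "green"])])

print(evaluate(query, {"blue"}))          # False: neither red nor green
print(evaluate(query, {"blue", "red"}))   # True
print(evaluate(query, {"red", "green"}))  # False: blue is missing
```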

Beyond that, we may also make explicit the MUST and SHOULD operators, possibly using the unary syntax shown in this document, but also possibly using some other syntax, to be determined.