Help:CirrusSearch/Logical operators/ja

From mediawiki.org
CirrusSearch currently does not support classic boolean searching, and the logical operators AND and OR should be used with great care, if at all.

Negation and parentheses

CirrusSearch does support several ways of indicating negation. The following queries are all equivalent: -dog (minus sign), !dog (exclamation point), and NOT dog (NOT operator).

CirrusSearch does not support parentheses, and they are removed from the query.
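
The equivalence of the three negation forms, and the removal of parentheses, can be illustrated with a small toy model. This is only a sketch for simple whitespace-separated queries, not CirrusSearch's actual parser; the function name `normalize_negation` is hypothetical, and replacing parentheses with spaces is an assumption about how "removed" behaves.

```python
import re


def normalize_negation(query: str) -> list[tuple[bool, str]]:
    """Return (negated, term) pairs for a simple whitespace-separated query."""
    # Assumption: parentheses are dropped by replacing them with spaces.
    tokens = re.sub(r"[()]", " ", query).split()
    result = []
    negate_next = False
    for token in tokens:
        if token == "NOT":
            # The NOT keyword negates the term that follows it.
            negate_next = True
            continue
        negated = negate_next
        negate_next = False
        if len(token) > 1 and token[0] in "-!":
            # A leading minus sign or exclamation point also negates the term.
            negated = True
            token = token[1:]
        result.append((negated, token))
    return result


# All three forms produce the same normalized result.
print(normalize_negation("-dog") == normalize_negation("!dog") == normalize_negation("NOT dog"))
```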

Lucene, MUST, and SHOULD

CirrusSearch is built on top of Elasticsearch, which in turn is built on Lucene. Our Lucene implementation does not support the classic boolean AND or OR operators, though it does offer those keywords as binary operators.

Instead Lucene converts AND and OR to a different formalism—unary MUST and SHOULD operators—giving results that sometimes mimic the expected boolean results, but which can also be very divergent from them. (Note that CirrusSearch does not currently support MUST or SHOULD operators in user queries. They are used here only to demonstrate the internal workings of Lucene.)

In Lucene, MUST indicates that a search term is required and must be present in any results. So, a query like MUST dog would only return results that contain some form of dog in them (note that this would also be equivalent to just searching for dog).

On the other hand, SHOULD terms are optional but should be present if possible; while they are not strictly required, they do affect ranking. So MUST dog SHOULD cat would require dog in every result, but would generally rank those that also contain cat as better matches.

The one exception to SHOULD terms being optional is that if there are zero MUST terms, then at least one SHOULD term must be present in each result. Thus, SHOULD dog SHOULD cat SHOULD fish would actually give results that have at least one of dog, cat, or fish present, though any results with all three would generally rank higher.
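
The MUST and SHOULD rules described above can be sketched as a small matching function. This is a toy model over sets of terms, not Lucene code: the names `matches` and `toy_score` are hypothetical, and counting matched SHOULD terms is only a stand-in for Lucene's real relevance scoring.

```python
def matches(doc_terms: set[str], must: set[str], should: set[str]) -> bool:
    """Decide whether a document satisfies the MUST/SHOULD clauses."""
    # Every MUST term is required.
    if not must <= doc_terms:
        return False
    # With zero MUST terms, at least one SHOULD term has to be present.
    if not must:
        return bool(should & doc_terms)
    # Otherwise SHOULD terms are optional.
    return True


def toy_score(doc_terms: set[str], should: set[str]) -> int:
    """Toy ranking signal (an assumption): number of SHOULD terms present."""
    return len(should & doc_terms)


# MUST dog SHOULD cat: dog is required, cat only improves the toy score.
print(matches({"dog"}, {"dog"}, {"cat"}))         # True
print(matches({"cat"}, {"dog"}, {"cat"}))         # False
# SHOULD dog SHOULD cat SHOULD fish: at least one term is required.
print(matches({"bird"}, set(), {"dog", "cat", "fish"}))  # False
```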

Classic boolean search often has an implicit AND, meaning that any query terms without an explicit boolean operator between them are assumed to have an AND between them. In Lucene, any query term without an explicit MUST or SHOULD is assumed to have an implicit MUST applied to it.

<span id="Converting_AND_and_OR">

Converting AND and OR

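Lucene converts AND and OR to MUST and SHOULD, which sometimes gives the expected results, but often gives very unexpected ones.

When Lucene encounters an AND, it applies MUST to the terms before and after the AND. When it encounters an OR, it applies SHOULD to the terms before and after the OR. The query is processed from left to right, and later AND or OR operators override earlier ones (see the examples below).

This gives the operators an unusual "reverse precedence", which can produce quite unexpected results compared with classic boolean searching.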
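
The left-to-right conversion can be sketched in a few lines. This is an illustrative model of the rules described on this page, not Lucene's actual parser; the function name `convert` is hypothetical, and the sketch handles only flat queries of plain terms separated by whitespace.

```python
def convert(query: str) -> list[tuple[str, str]]:
    """Convert a flat AND/OR query to (operator, term) pairs, left to right."""
    terms = []      # search terms in query order
    labels = []     # "MUST", "SHOULD", or None while still unlabelled
    pending = None  # label waiting for the term to the right of an operator
    for token in query.split():
        if token in ("AND", "OR"):
            label = "MUST" if token == "AND" else "SHOULD"
            if labels:
                # Label the term before the operator, overriding any
                # label applied by an earlier operator.
                labels[-1] = label
            pending = label
        else:
            terms.append(token)
            labels.append(pending)
            pending = None
    # Terms with no explicit operator get the implicit MUST.
    return [(label or "MUST", term) for label, term in zip(labels, terms)]


for q in ("blue OR red AND green", "blue OR red green", "blue AND red OR green"):
    print(" ".join(f"{op} {term}" for op, term in convert(q)))
# SHOULD blue MUST red MUST green
# SHOULD blue SHOULD red MUST green
# MUST blue SHOULD red SHOULD green
```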

Failing examples

Below are some examples in which the conversion from AND/OR to MUST/SHOULD gives results that differ from what is expected of classic boolean operators.


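  • blue OR red AND green
    • Converting the terms on either side of the OR to SHOULD gives:
  • SHOULD blue SHOULD red AND green
    • Converting the terms on either side of the AND to MUST (which here overrides the SHOULD previously applied to red) gives:
  • SHOULD blue MUST red MUST green
    • The result set is therefore the same as for red green, and blue is optional (it only affects ranking).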

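  • blue OR red green
    • Converting the terms on either side of the OR to SHOULD gives:
  • SHOULD blue SHOULD red green
    • Applying the implicit MUST to terms without an explicit MUST or SHOULD gives:
  • SHOULD blue SHOULD red MUST green
    • In a classic boolean system with an implicit AND, blue OR red AND green and blue OR red green would be expected to be the same, but comparison with the example above shows the difference: here only green is required, whereas above both red and green are required.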

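  • blue AND red OR green
    • Converting the terms on either side of the AND to MUST gives:
  • MUST blue MUST red OR green
    • Converting the terms on either side of the OR to SHOULD (which here overrides the MUST previously applied to red) gives:
  • MUST blue SHOULD red SHOULD green
    • The result set is therefore the same as for a search on blue alone; red and green only affect ranking. This also means that if no documents contain either red or green, searching for blue AND red OR green returns the same results as searching for just blue, which is not what is expected of a classic boolean system.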

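In general, mixing OR and AND in a single query, or including an implicit AND, gives results that are unintuitive within the classic boolean framework. It is also very difficult to detect cases where the boolean logic has broken down unless you know exactly how many documents contain each positive and negative combination of the query terms.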
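
One way to see where the two interpretations diverge is to enumerate every combination of present and absent terms and compare the outcomes. The sketch below is a toy model, not Lucene code: the function names are hypothetical, and the classic evaluator assumes AND (explicit or implicit) binds more tightly than OR.

```python
from itertools import product


def lucene_like(query: str, present: set[str]) -> bool:
    """Match using the MUST/SHOULD conversion described on this page."""
    labelled = []   # [label, term] pairs
    pending = None
    for tok in query.split():
        if tok in ("AND", "OR"):
            pending = "MUST" if tok == "AND" else "SHOULD"
            if labelled:
                labelled[-1][0] = pending  # later operators override earlier ones
        else:
            labelled.append([pending or "MUST", tok])  # implicit MUST
            pending = None
    must = {t for label, t in labelled if label == "MUST"}
    should = {t for label, t in labelled if label == "SHOULD"}
    if not must <= present:
        return False
    return True if must else bool(should & present)


def classic(query: str, present: set[str]) -> bool:
    """Match as a classic boolean query (assumption: AND binds tighter than OR)."""
    groups, current = [], []
    for tok in query.split():
        if tok == "OR":
            groups.append(current)
            current = []
        elif tok != "AND":
            current.append(tok)
    groups.append(current)
    return any(bool(g) and all(t in present for t in g) for g in groups)


def disagreements(query: str) -> list[set[str]]:
    """List the term combinations on which the two interpretations differ."""
    terms = sorted({t for t in query.split() if t not in ("AND", "OR")})
    diffs = []
    for flags in product([False, True], repeat=len(terms)):
        present = {t for t, flag in zip(terms, flags) if flag}
        if lucene_like(query, present) != classic(query, present):
            diffs.append(present)
    return diffs


for q in ("blue AND red AND green", "blue OR red OR green", "blue OR red AND green"):
    print(q, "->", disagreements(q))
```

Under these assumptions, the pure AND and pure OR queries show no disagreements, while the mixed query disagrees on the documents that contain blue but not both red and green.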

Common use cases

If you have no explicit operators, then the boolean default is AND and the Lucene default is MUST, which are equivalent if they are the only operators present in the query:

  • blue red green (user intent): all three terms must be present in any results
  • blue AND red AND green (explicit classic boolean query): all three terms must be present in any results
  • MUST blue MUST red MUST green (Lucene interpretation): all three terms must be present in any results

However, since MUST is implicit, nothing is gained by making it explicit by using AND, other than the potential for later boolean confusion.

If the only operator in the query is OR (crucially, meaning that there is no implicit AND), then it is the same as every term having a SHOULD. Recall that if a query has SHOULD terms but no MUST terms, then at least one of the SHOULD terms must be present in any result:

  • blue OR red OR green (classic boolean query): at least one of the three terms must be present in any results
  • SHOULD blue SHOULD red SHOULD green (Lucene interpretation): at least one of the three terms must be present in any results

Be very careful with implicit AND/MUST! In the earlier example blue OR red green, the implicit MUST applied to green means that neither blue nor red is strictly required to be in the results.

Booleans, keywords, and prefixes

AND and OR do not interact predictably with special keywords (like insource: or hastemplate:) or with namespaces (like Talk: or User:) and probably should not be used in conjunction with either.

Future plans

Of course, the Search Platform team is not very happy with this state of affairs.

In the short term we are creating this document and updating the Help:CirrusSearch documentation to reflect the reality of our current system.

Longer term, we plan to implement a new layer in CirrusSearch that will properly construct a Lucene MUST/SHOULD query equivalent to a given classic boolean query, including proper support for parentheses, and that returns the expected results. (It is possible to specify in Lucene that at least one of a set of query terms or clauses is required to match, which is equivalent to a boolean OR; requiring that all of a set of query terms or clauses match is the same as a boolean AND.)
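
As a rough illustration of the idea in the parenthetical note, a classic boolean expression tree can be mapped onto nested Elasticsearch bool queries, using must for AND and should with minimum_should_match set to 1 for OR. This is only a sketch and not the Search Platform team's design: the function name `to_bool_query`, the tuple-based expression format, and the field name "text" are all assumptions made for the example.

```python
def to_bool_query(expr, field: str = "text") -> dict:
    """Translate a nested ("AND"|"OR", operand, ...) tuple into a bool query dict."""
    if isinstance(expr, str):
        # A bare term becomes a simple match clause on the assumed field.
        return {"match": {field: expr}}
    op, *operands = expr
    clauses = [to_bool_query(operand, field) for operand in operands]
    if op == "AND":
        # All clauses are required.
        return {"bool": {"must": clauses}}
    if op == "OR":
        # At least one clause is required.
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    raise ValueError(f"unknown operator: {op!r}")


# (blue OR red) AND green, with the grouping made explicit in the tree.
query = to_bool_query(("AND", ("OR", "blue", "red"), "green"))
print(query)
```

Because the grouping is carried by the tree structure, each OR group keeps its own "at least one" requirement even when it is nested inside an AND.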

Beyond that, we may also make explicit the MUST and SHOULD operators, possibly using the unary syntax shown in this document, but also possibly using some other syntax, as yet to be determined.

Further information

  • BooleanQuerySyntax: a summary of a mailing list discussion about the problem, going back to 2005, with a link to a bug report on the problem from 2003. (The 2003 bug was closed in 2009, and claims there is a different Lucene query parser that does the right thing with boolean queries, but we don't have access to it in CirrusSearch.)