Help:CirrusSearch/RegexTooComplex

The insource:// syntax implements reasonably efficient regular expression searches written in Lucene's dialect. For efficiency reasons, there is a limit to how complex these regexes can be.

What syntax is considered complex?
The biggest increases in complexity come from non-determinism followed by repetition. That looks like this: insource:/[ac]*a[ac]{50,200}/ The [ac]*a part is non-deterministic and the [ac]{50,200} is a repeat. On the other hand, this is ok: insource:/[ac]*a[de]{50,200}/ because [ac] doesn't overlap with [de].

Generally you can avoid complexity by limiting the acceptable characters. So: insource:/[ac]*a.*[^"]+\"/ is much less complex than: insource:/[ac]*a.*[^"]{50,100}/

Why?
Lucene compiles regular expressions to DFAs. It does this by converting the regular expressions to NFAs and then converting those to DFAs. The worst-case complexity of that conversion is exponential in the number of states in the NFA, and the NFA's number of states grows with the size of the regex, including the counts in bounded repeats such as {50,200}. Non-determinism followed by repetition triggers that exponential state growth. We limit the number of states to 20,000 to prevent them from eating all of our memory.
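
The state growth described above can be illustrated with a small sketch. This is a textbook subset construction written in Python, not Lucene's actual implementation, and it makes some simplifying assumptions: the function name count_dfa_states is hypothetical, the repeat is a fixed count {n} rather than a range like {50,200}, and the dead (empty) state is not counted. It compares the overlapping form [ac]*a[ac]{n} with the non-overlapping form [ac]*a[de]{n}.

```python
from collections import deque


def count_dfa_states(tail_chars, n):
    """Count reachable DFA states for the regex [ac]*a[tail_chars]{n}.

    The NFA has states 0..n+1: state 0 is the [ac]* loop, state 1 is
    reached after the literal 'a', and each further state consumes one
    character of the repeat. State n+1 is the accepting state.
    """
    alphabet = set("ac") | set(tail_chars)

    def step(states, ch):
        # Advance every active NFA state on the character ch.
        nxt = set()
        for s in states:
            if s == 0:
                if ch in "ac":
                    nxt.add(0)  # stay in the [ac]* loop
                if ch == "a":
                    nxt.add(1)  # or guess that this 'a' is the literal a
            elif s <= n and ch in tail_chars:
                nxt.add(s + 1)  # consume one character of the repeat
        return frozenset(nxt)

    # Breadth-first search over sets of NFA states; each distinct
    # non-empty set becomes one DFA state.
    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


print(count_dfa_states("ac", 10))  # overlapping repeat: 2048 states
print(count_dfa_states("de", 10))  # non-overlapping repeat: 12 states
```

In this model the overlapping form needs 2^(n+1) DFA states, because the DFA has to remember which of the last n+1 characters were an a, while the non-overlapping form needs only n+2. With n = 14 the overlapping form would already reach 32,768 states, past the 20,000 limit, although Lucene's own state counts for a given regex may differ from this simplified sketch.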