Speed performance tuning for Lucene-search extension
Hi, we're trying to build a highly scalable Wikipedia search mirror that needs to handle 10,000 search requests per second. I tried using the Lucene-search extension, but just couldn't get the average full-text search time to go down; our current average search time is around 500 ms.
The variation in search time is large too, with some searches taking 4 ms and others 2,000 ms. These performance tests were carried out using JMeter with 10 user request threads and 500 loops. We've tried tuning the JVM memory, storing the whole index (built from the 20111007 Wikipedia snapshot XML dump) in RAM, and increasing/decreasing the SearcherPool.size parameter. So our question is: can this extension be expected to perform faster than this? Thanks very much.
Have you split your index into frequently searched namespaces and other namespaces (e.g. main vs. rest)? This should make the index much smaller and significantly decrease the median time. Some searches will take longer, however (e.g. if a user searches all of the content). Also note that 10k req/s is quite a lot. At Wikimedia we get maybe 500 req/s.
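To put the numbers in this thread side by side, here is a rough back-of-the-envelope sketch using Little's law (requests in flight = throughput × latency). It assumes the JMeter threads send requests back to back with no think time, which is not stated above, and the function names are made up for illustration. It only relates the figures quoted in the question; it says nothing about what Lucene-search itself can sustain.

```python
def max_throughput(concurrency: int, mean_latency_s: float) -> float:
    """Closed-loop ceiling from Little's law: throughput = concurrency / latency."""
    return concurrency / mean_latency_s


def required_concurrency(target_rps: float, mean_latency_s: float) -> float:
    """Requests that must be in flight at once to reach target_rps at this latency."""
    return target_rps * mean_latency_s


# Figures quoted in the question above.
jmeter_threads = 10      # JMeter user request threads
mean_latency_s = 0.5     # ~500 ms average search time
target_rps = 10_000      # required requests per second

# Assumption: no think time between requests in the JMeter test plan.
ceiling = max_throughput(jmeter_threads, mean_latency_s)
in_flight = required_concurrency(target_rps, mean_latency_s)

print(f"Ceiling for this test setup: {ceiling:.0f} req/s")
print(f"In-flight requests needed for {target_rps} req/s: {in_flight:.0f}")
```

Under those assumptions, a 10-thread test at 500 ms per search cannot measure more than about 20 req/s, and reaching 10,000 req/s at the same latency would need roughly 5,000 searches in flight at once across all search servers. Lowering the median latency (for example via the namespace split) reduces that number proportionally.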