Topic on Extension talk:Lucene-search

Index step slows to an unusable speed on full Wikipedia dump

64.236.163.23

Hi,

I am trying to run the 'build' step from the instructions on a full dump of the English Wikipedia site.

It runs at a reasonable rate until it reaches what appears to be a spell-correction step. That step starts at ~50,000 terms/second but slows steadily. I killed it after about a week, by which point it was down to ~600 terms/second and only about halfway through the terms (at "mo...").

Are there configuration settings I should be changing to run the build step against such a big corpus?

Thanks, Barry

64.236.163.23

We removed the 'spell correction' step from the indexer, and the build time came down to a manageable level.

We would rather keep this step in, so if anyone has a better solution than stripping out functionality, I would still love to hear it.

Barry
