Topic on User talk:MWJames

Svemir Brkic (talkcontribs)
MWJames (talkcontribs)

Thanks for the effort. We are testing it right now and it looks good. Maybe there should be a note that this change affects indexing time: the main index takes around four to five times longer to build.

For the incremental index we changed the cutoff to DATE_FORMAT(CURRENT_TIMESTAMP() - INTERVAL 12 HOUR, '%Y%m%d%H%i%s') to keep the incremental indexing time short, while running the merge function more often.
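
To make the 12-hour cutoff concrete, here is a minimal Python sketch of the same calculation. It assumes the wiki stores timestamps as 14-digit YYYYMMDDHHMMSS strings with a 24-hour hour field, which in MySQL is `%H` (00..23), whereas `%h` is the 12-hour form (01..12). The function name `incremental_cutoff` is made up for illustration and is not part of the extension.

```python
from datetime import datetime, timedelta


def incremental_cutoff(now: datetime, hours: int = 12) -> str:
    """Return the cutoff as a 14-digit YYYYMMDDHHMMSS string.

    Mirrors CURRENT_TIMESTAMP() - INTERVAL 12 HOUR, formatted with a
    24-hour hour field (strftime's %H).
    """
    return (now - timedelta(hours=hours)).strftime("%Y%m%d%H%M%S")


# A run at 15:00 on 12 September 2011 selects changes made since 03:00.
print(incremental_cutoff(datetime(2011, 9, 12, 15, 0, 0)))  # 20110912030000
```

Note that this only illustrates the arithmetic; the time zone used by the database session is not covered here and may need checking on a real setup.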

/path/to/sphinx/bin/indexer --config /path/to/sphinx.conf --rotate --merge wiki_main wiki_incremental --merge-dst-range deleted 0 0
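
For anyone who wants to script this, the following is a rough sketch of one cycle: rebuild the incremental index, then run the merge command shown above. The paths are the same placeholders as in that command, the function names are hypothetical, and the assumption that the incremental index is rebuilt right before the merge is mine. Whether a pause is needed between the two steps so that the rotation can finish depends on the setup and is not handled here.

```python
import subprocess

# Placeholder paths, exactly as in the command above; adjust to the real install.
INDEXER = "/path/to/sphinx/bin/indexer"
CONFIG = "/path/to/sphinx.conf"


def build_commands(indexer=INDEXER, config=CONFIG):
    """Return the argv lists for one incremental-rebuild-plus-merge cycle."""
    common = [indexer, "--config", config, "--rotate"]
    # Step 1 (assumed): rebuild the incremental index.
    rebuild_incremental = common + ["wiki_incremental"]
    # Step 2: merge it into the main index, as in the command above.
    merge = common + [
        "--merge", "wiki_main", "wiki_incremental",
        "--merge-dst-range", "deleted", "0", "0",
    ]
    return [rebuild_incremental, merge]


def run_cycle(dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in build_commands():
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # stop if a step fails


if __name__ == "__main__":
    run_cycle(dry_run=True)
```

With `dry_run=True` the script only prints the two command lines, which makes it easy to compare them with what is already in the crontab before letting it run for real.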

Svemir Brkic (talkcontribs)

There are probably faster ways to gather the same data. For example, the temporary table could be changed into a regular table and updated on a separate schedule. Or it could be replaced with a different kind of join. I need to update some of the larger wikis I am managing to be able to test with enough data.

BTW, why are you merging the two indexes? For me Sphinx works just fine when I tell it to use both of them. The merge process needs to load both indexes into RAM, so it is probably slow on a large wiki. What is the benefit for you?

MWJames (talkcontribs)

For us the incremental index is just a temporary index that contains the content changed in the last 12 hours. To avoid re-indexing the main index (which happens once a month for a 2 GB wiki DB), the merge process runs every six hours and stores the delta in the main index. The merge may cause more I/O, but it is relatively fast and has only a minimal impact on the server workload.
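
The overlap between the 12-hour window and the 6-hour merge interval can be checked with a small sketch. It assumes merges fall on a fixed 6-hour grid and that the incremental index is rebuilt immediately before each merge; both are simplifications, and the function name `merges_covering` is hypothetical.

```python
from datetime import datetime, timedelta


def merges_covering(change, first_merge,
                    merge_every=timedelta(hours=6),
                    window=timedelta(hours=12),
                    horizon=timedelta(days=2)):
    """Return the merge times whose window [t - window, t] contains `change`."""
    covering = []
    t = first_merge
    while t <= first_merge + horizon:
        if t - window <= change <= t:
            covering.append(t)
        t += merge_every
    return covering


# A change at 01:30 is picked up by the 06:00 merge and again by the 12:00 merge.
first = datetime(2011, 9, 12, 0, 0)
for t in merges_covering(datetime(2011, 9, 12, 1, 30), first):
    print(t.strftime("%Y-%m-%d %H:%M"))
```

Under these assumptions each change is seen by two consecutive merges, so a single skipped or failed run does not lose it.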

Svemir Brkic (talkcontribs)

That makes sense. How long do these three processes take in your case (main index, incremental index, and merge)? Svemir Brkic 11:02, 12 September 2011 (UTC)

MWJames (talkcontribs)

On the test system with the new ranking method

main index = 15 ~ 20 min
incremental index = 30 sec ~ 2 min (depending on the number of pending changes, or on a template change)
merge index = < 30 sec