Extension:SphinxSearch/Page rank

This page documents work on better sorting of SphinxSearch results. By default, results are sorted by Sphinx's internal weight, which is calculated from the number of matches within the text, whether they occur in the title or in the body, and so on. The sections below show how to also sort by the number of incoming links to an article and by article popularity.

Default SphinxSearch results
Here is an example of the default SphinxSearch search results pulled via SphinxQL. Notice that in this wiki many of the pages ended up with the same weight. The examples below show how to improve the ordering of the results.

Basic example - Adding sort parameters
The example below will sort results first by Sphinx weight, but after that by number of links TO that article and by number of article views. Feel free to experiment with the order of these arguments, or to add additional ones.
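The effect of a multi-key sort like this can be sketched in Python. The rows below are made-up values, not results from this wiki, and the tuple layout (page id, weight, incoming links, views) is an assumption chosen for illustration only.

```python
# Hypothetical search hits: (page_id, sphinx_weight, incoming_links, views).
# The values are invented; note that three pages share the same weight.
rows = [
    (101, 1500, 2, 40),
    (102, 1500, 9, 10),
    (103, 1500, 9, 75),
    (104, 1700, 0, 3),
]


def sort_results(hits):
    """Order by weight, then incoming links, then views, all descending."""
    return sorted(hits, key=lambda r: (-r[1], -r[2], -r[3]))


ranked_ids = [r[0] for r in sort_results(rows)]
print(ranked_ids)  # [104, 103, 102, 101]
```

Pages 101, 102 and 103 tie on weight, so the link count and then the view count decide their relative order. Swapping the positions of the keys in the tuple changes which attribute breaks ties first.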

In the sphinx.conf file, make the following changes:

In the LocalSettings.php file, make the following changes after the SphinxSearch.php inclusion:

And finally:
 * Reindex the wiki content
 * Restart the Sphinx search service
 * Restart the Apache service

Basic results
Verify the query results using Extension:SphinxSearch/SphinxQL. Notice that the results have changed slightly and have aligned to the ORDER BY parameters in LocalSettings.php.

Advanced example
The basic example works fairly well and provides some additional sorting options but it still relies heavily on the Sphinx @weight. For some wikis, @weight might not be representative of the true value of the page. For instance, the examples on this page are from a wiki that had a large number of technical documents imported from an external system. The results that were listed at the top of the default and basic searches are actually of very little use to the end user. A better formula for calculating a page rank might be:

In the sphinx.conf file, make the following changes:

In the LocalSettings.php file, make the following changes after the SphinxSearch.php inclusion:

And finally:
 * Reindex the wiki content
 * Restart the Sphinx search service
 * Restart the Apache service

Advanced results
Verify the query results using Extension:SphinxSearch/SphinxQL. While it is hard to tell just by looking at the page_ids in the results, the new search results filter out high-scoring but older, irrelevant pages and prioritize slightly lower-scoring pages that are actively updated and maintained by users.
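As a purely hypothetical illustration of this re-ranking effect, the Python sketch below combines a text weight with a link bonus and discounts it by the time since the last edit. The function name, the link multiplier, and the half-life are all assumptions made for this example; this is not the formula used on the wiki described here.

```python
def page_rank(weight, incoming_links, days_since_edit, half_life_days=180):
    """Hypothetical score: text weight plus a link bonus, halved for every
    half_life_days that have passed since the page was last edited."""
    freshness = 0.5 ** (days_since_edit / half_life_days)
    return (weight + 10 * incoming_links) * freshness


# Invented values: an old imported document versus an actively edited page.
stale = page_rank(weight=2000, incoming_links=1, days_since_edit=1440)
active = page_rank(weight=1500, incoming_links=5, days_since_edit=0)
print(active > stale)  # True
```

With these made-up numbers, the stale page starts with the higher text weight but ends up far below the active page once its age is taken into account.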

Column weights
The weights assigned to individual columns in SphinxSearch.php may need to be tweaked to account for the content in the wiki. The defaults are:

To change the weights, add the variables to LocalSettings.php and modify their values:

Use the following query to replicate the column weights from the settings file:

Indexing performance
Adding new columns and JOINs will definitely affect indexing performance, and care must be taken to ensure that the new parameters do not slow indexing down more than is acceptable. Listed below are the indexing times for a wiki with ~57,000 articles:
 * Default: ~2 minutes
 * Basic example (1 join): ~14 minutes
 * Advanced example (3 joins): ~43 minutes
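To put the times above in proportion, the short Python sketch below computes each configuration's slowdown relative to the default. The minute values are the approximate figures from the list; the dictionary keys are just labels chosen for this example.

```python
# Approximate indexing times in minutes, taken from the list above.
index_minutes = {
    "default": 2,
    "basic (1 join)": 14,
    "advanced (3 joins)": 43,
}

baseline = index_minutes["default"]

# Slowdown factor of each configuration relative to the default indexing run.
slowdown = {name: minutes / baseline for name, minutes in index_minutes.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x")
# default: 1.0x
# basic (1 join): 7.0x
# advanced (3 joins): 21.5x
```

Since the source times are rounded, the factors are rough, but they show that the single join costs about seven times the default indexing time and the three joins about twenty times.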