Extension:SphinxSearch/SQLite configuration
First off, many thanks for the extension. For me it is a real problem-solver.
With a few changes to the sphinx config file and a couple of helper scripts (php), SphinxSearch works with SQLite based MediaWikis, too. Thought this might be of interest, seeing as vanilla search does not work with SQLite in the current stable MediaWiki (1.15.1).
My setup:
- MediaWiki 1.15.1
- PHP 5.1.6
- SphinxSearch 0.6.1
- Sphinx 0.9.8.1
First, the helper scripts. These are php scripts that are run by the sphinx indexer and do the following:
- Connect to the MediaWiki SQLite database file using php_pdo (which must be available to run MW on SQLite anyhow)
- Run the indexer queries
- Translate the results into XML for sphinx to process as an xmlpipe2 type source
These can go anywhere you want - I put mine in the ./data/ directory alongside the wikidb.sqlite file. Probably they should go in ./maintenance where all the other command-line PHP scripts are.
This is the main update script:
/path/to/wiki/data/sphinx_sqlite_main.php:
<?php
// This is the path to your SQLite MediaWiki database file
$wikidb='/path/to/wiki/data/wikidb.sqlite';
// Bail if $wikidb is not a file
if (!is_file($wikidb)) {
exit;
}
// Bail if PDO constuctor fails
if (!$db = new PDO("sqlite:".$wikidb)) {
exit;
}
// Build the query
$qry = "SELECT page_id,page_title,page_namespace,old_id,old_text ";
$qry .= "FROM page,revision,text WHERE rev_id=page_latest AND old_id=rev_text_id";
// Run the query
$res=$db->query($qry);
// Parse the results into XML
$xmlout = '<?xml version="1.0" encoding="utf-8"?>'."\n"; // Should be true as SQLite downconverts UTF-16
$xmlout .= '<sphinx:docset>'."\n";
while($arr=$res->fetch(PDO::FETCH_ASSOC)) {
$xmlout .= '<sphinx:document id="'.$arr['page_id'].'">'."\n";
foreach ($arr as $strField => $strVal) {
$xmlout .= '<'.$strField.'>';
$xmlout .= $strVal;
$xmlout .= '</'.$strField.'>'."\n";
}
$xmlout .= '</sphinx:document>'."\n";
}
$xmlout .= '</sphinx:docset>'."\n";
// Deconstruct
$db = null;
// Return the XML
if ($xmlout != '') {
print $xmlout;
}
?>
This is the incremental update script. The only difference is the extra AND clause in the query. Probably these should be combined, but I didn't want to bother with command line options. I'm lazy.
/path/to/wiki/data/sphinx_sqlite_incremental.php:
<?php
// This is the path to your SQLite MediaWiki database file
$wikidb='/path/to/wiki/data/wikidb.sqlite';
// Bail if $wikidb is not a file
if (!is_file($wikidb)) {
exit;
}
// Bail if PDO constuctor fails
if (!$db = new PDO("sqlite:".$wikidb)) {
exit;
}
// Build the query
$qry = "SELECT page_id,page_title,page_namespace,old_id,old_text ";
$qry .= "FROM page,revision,text WHERE rev_id=page_latest AND old_id=rev_text_id ";
$qry .= "AND page_touched >= ".strftime("%Y%m%d")."070000 "; // adjust to the time of the main indexing job in UTC
// Run the query
$res=$db->query($qry);
// Parse the results into XML
$xmlout = '<?xml version="1.0" encoding="utf-8"?>'."\n"; // Should be true as SQLite downconverts UTF-16
$xmlout .= '<sphinx:docset>'."\n";
while($arr=$res->fetch(PDO::FETCH_ASSOC)) {
$xmlout .= '<sphinx:document id="'.$arr['page_id'].'">'."\n";
foreach ($arr as $strField => $strVal) {
$xmlout .= '<'.$strField.'>';
$xmlout .= $strVal;
$xmlout .= '</'.$strField.'>'."\n";
}
$xmlout .= '</sphinx:document>'."\n";
}
$xmlout .= '</sphinx:docset>'."\n";
// Deconstruct
$db = null;
// Return the XML
if ($xmlout != '') {
print $xmlout;
}
?>
You can test these with php -e script.php and they should spit out XML.
We will use these to feed the xmlpipe2 sources in sphinx.conf. I chose to define the fields and attributes in the config file, though you can also do this in the XML itself. Here is what your source containers look like. These replace the MySQL ones of the same names.
sphinx.conf:
source src_wiki_main
{
type = xmlpipe2
xmlpipe_command = php -e /path/to/wiki/data/sphinx_sqlite_main.php
xmlpipe_field = page_id
xmlpipe_field = page_title
xmlpipe_field = old_text
xmlpipe_attr_uint = page_namespace
xmlpipe_attr_uint = old_id
}
# data source definition for the incremental index
source src_wiki_incremental : src_wiki_main
{
xmlpipe_command = php -e /path/to/wiki/data/sphinx_sqlite_incremental.php
# all other parameters are copied from the parent source,
}
You will likely need to specify the full path to the PHP command line executable in the xmlpipe_command directives if this isn't in cron's exection path. (i.e., xmlpipe_command = /usr/bin/php -e [...]).
That's it. All else is as it appears on the Extension Page.
NB: This is functional on a dev box with a ten page wiki. It has not been production tested. That is your job <wink>.
-Jef (jef at lfaccess dot net - checked infrequently)
UPDATES -- Eshe 1/05/2011:
The following were modifications that I made in order to get the SQLite installation working correctly:
- In sphinx_sqlite_incremental.php and sphinx_sqlite_main.php, you will need to update 2 areas of the code, see below with comments marked enpicket
....
// fix to resolve Warning: strftime() -- enpicket 1/04/2011
date_default_timezone_set('America/Phoenix'); //change to your timezone
...
// Parse the results into XML
$xmlout = '<?xml version="1.0" encoding="utf-8"?>'."\n"; // Should be true as SQLite downconverts UTF-16
// Fix for error: XML Parsing Error: prefix not bound to a namespace -- enpicket 1/05/2011
// Define name spaces
$xmlout .= '<sphinx:docset xmlns:sphinx="http:<your site>/sphinx">'."\n"; //the url does not need to be valid, the parser doesn't check
while($arr=$res->fetch(PDO::FETCH_ASSOC)) {
$xmlout .= '<sphinx:document id="'.$arr['page_id'].'">'."\n";
foreach ($arr as $strField => $strVal) {
$xmlout .= '<'.$strField.'>';
// added CDATA because html data in the pages breaks the xml parser. -- enpicket 1/05/2011
$xmlout .= "<![CDATA[\n" . $strVal . "\n]]>";
$xmlout .= '</'.$strField.'>'."\n";
}
$xmlout .= '</sphinx:document>'."\n";
}
$xmlout .= '</sphinx:docset>'."\n";
Hope this helps anyone else who is using a SQLite MediaWiki Installation!
- Eshe (eshe dot n dot pickett at intel dot com)
