Extension talk:SphinxSearch/LQT Archive 1

Showing X of Y documents in search results, but document links not showing
I've got everything working and it is apparently using sphinxsearch as my default search engine, but the results simply say "myword found 8 times in 8 documents", and then nothing is listed underneath it for those documents where the word is in the title only. However, if the word is contained in the body of the entry, those entries show up.

I also had to work around a PHP error in sphinxsearch_body.php. It complained about an undeclared variable called 'time', so I commented that argument out. It was in this call:

$preamble = sprintf(wfMsg('sphinxSearchPreamble'),
    (($page-1)*$wgSphinxSearch_matches+1 > $res['total']) ? $res['total'] : ($page-1)*$wgSphinxSearch_matches+1,
    ($page*$wgSphinxSearch_matches > $res['total']) ? $res['total'] : $page*$wgSphinxSearch_matches,
    $res['total'],
    $term,'' // $res[time]


 * The proper fix is to change it to $res['time'], which I just did in the CVS version of the code. Please do the same locally and let me know if there are other problems. Svemir Brkic 19:05, 2 February 2008 (UTC)
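The range arithmetic in the sprintf call above can be sketched as follows. This is an illustration in Python rather than the extension's PHP, and the function name and the page size of 10 are assumptions, not values taken from the extension. It shows why a result set with a total of 0 is reported as "0-0 of 0": both ends of the range are clamped to the total.

```python
from typing import Tuple


def display_range(page: int, per_page: int, total: int) -> Tuple[int, int]:
    """Return the (first, last) result numbers shown on a 1-based page.

    Hypothetical helper: mirrors the two ternaries in the PHP call,
    where each end of the range is replaced by the total if it exceeds it.
    """
    first = (page - 1) * per_page + 1
    last = page * per_page
    # (x > total) ? total : x  is the same as  min(x, total)
    return min(first, total), min(last, total)


if __name__ == "__main__":
    print(display_range(1, 10, 8))   # first page of 8 matches
    print(display_range(1, 10, 0))   # no matches at all
```

So when the header line reads "0-0 of 0", the total that reached this call was 0, whatever the per-word statistics on the next line say.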

You can see where I commented $res[time] out and replaced it with empty single quotes. This didn't seem to break anything, so I'm not sure what it was supposed to do. I'm on a Windows machine, by the way. All I really care about is that the document listing appears when the word appears in the document title.
 * "Displaying 0-0 of 0 matches for query specifi* retrieved in 0.000 sec with following stats: specifi* found 8 times in 8 documents".  This is all I'm getting when searching for a particular wildcard.  It appears the main 0-0 of 0 matches is wrong, as the next line shows that it is indeed finding the word in 8 of 8 documents.  When I use the console, here are the results:
 * C:\Sphinx>search --config sphinx.conf speci*
 * Sphinx 0.9.8-dev (r985)
 * Copyright (c) 2001-2007, Andrew Aksyonoff
 * using config file 'sphinx.conf'...
 * index 'wiki_main': query 'speci* ': returned 5 matches of 5 total in 0.000 sec
 * displaying matches:
 * 1. document=23, weight=1, page_namespace=8, :::old_id=47 page_title=Sidebar page_namespace=8
 * 5. document=58, weight=1, page_namespace=6, :::old_id=116 page_title=SIF_Implementation_Specification_1.5r1.pdf page_namespace=6
 * words: 1. 'speci*': 5 documents, 5 hits


 * Your query indicates that you are using a "star" search, which is not enabled by default in sphinx (it is a new, almost undocumented feature.) Our default sphinx.conf does not enable it either. Maybe you are using a different config file for command line search? To enable it in the config file used by the extension, you need to add this to the main index section:

min_infix_len = 1
enable_star = 1


 * After this, you need to do a full reindex and restart the searchd (--rotate alone is not enough when you change the config file.) Svemir Brkic 20:57, 2 February 2008 (UTC)

Everything is OK it seems, but no search-results
I've installed the Sphinx extension on MediaWiki 1.11 with the most recent version of Sphinx. Everything seems to work OK: I installed the service (Windows server) and tested the search at the command prompt, which gives results. When I go to Special:SearchSphinx, it displays OK.

The only thing is that nothing happens when I try to search something, it reloads and displays nothing. Do you have any idea what might be causing the problem? It also seems it cannot create a searchd.pid file & logfiles, although the search-indexes are created without a problem.


 * Please clarify "reloads and displays nothing". Do you get a blank screen? In that case, you would need to check your PHP error log for clues. Or do you get the same thing as Mark describes below? Svemir Brkic 15:44, 5 January 2008 (UTC)


 * Hi Svemir, thanks for your response. I have the same problem as Mark. I discovered that if I disable the internal search, the Sphinx special page suddenly disappears (it also isn't visible in Special:Specialpages). If the internal search isn't disabled, the SphinxSearch special page is visible, but it doesn't return results. (I've got the same config as Mark, only the MediaWiki version is 1.11.) 213.132.179.227 10:03, 7 January 2008 (UTC)


 * We have the same problem. We're on Sphinx 0.9.7 (Win32) and MediaWiki 1.9.7 installed on Win2003 server.  After starting search daemon, I can run test.php from command line and get back results there, but searches entered from Sphinx Special Page in the wiki just bounce you back to the main page.  The URL looks like the search took place, however.  For example, a search on the term "SAM" bounces you back to the wiki's main page, but now this query string appears on the URL: "sphinxsearch=SAM&fulltext=Search&match_all=0&ns0=1" --Mark price 01:26, 14 December 2007 (UTC)


 * Did you make any changes to your Sphinx search configuration file, SphinxSearch.php? You could also try making Sphinx the default search, just to verify whether the problem is in the search itself or in the way your wiki handles the paths in the "special page only" case. Svemir Brkic 15:44, 5 January 2008 (UTC)


 * We had the same problem when we tried using Sphinx-0.9.8-svn-r1112 (Jan 28, 2008 snapshot). Going back to the previous version (Sphinx 0.9.7) solved the problem for us. 130.234.189.190 11:52, 30 January 2008 (UTC)


 * I just tested it with 0.9.8-svn-r1112 on MW 1.11 on Linux and it works correctly. I will try it on Windows at some point as well. Would you please make sure you were using the correct version of sphinxapi.php? It needs to be copied from your sphinx download/api folder into the SphinxSearch extension folder each time you change the version of sphinx on your system. If you still have issues with 0.9.8, please post your sphinx.conf somewhere so we can take a look. Svemir Brkic 19:55, 2 February 2008 (UTC)

Major failures -- help!
Lines 304 and 305 of SphinxSearch_body.php error out on MediaWiki 1.8. Either I need to comment them out or upgrade to v1.11.

Even after upgrading, line 171 fails with Fatal error: Call to undefined method SphinxClient::SetFilter in C:\wamp\www\wiki\extensions\SphinxSearch_body.php on line 171

Commenting it out doesn't help, since it then gives me a "Fatal error on the DB" or something like that. I'm thinking I need to install some package, but I don't know which one (something in PEAR?)

Help :(


 * It looks like you are running this extension on Windows. To the best of my knowledge, this extension has not been tried as such yet. But, at least in theory, there is nothing that should prevent it from working. Of course, the big differences are all path related. So, let's first start by making sure your setup is correct. Were you successfully able to perform step 3, step 4, and step 5? Can you also please verify that step 7 was done correctly and you have the sphinxapi.php file in your C:\wamp\www\wiki\extensions\ directory? --Gri6507 12:20, 17 October 2007 (UTC)


 * I kinda solved this; upgrading to v1.11 of course takes care of lines 304 and 305. As for the SetFilter method, apparently the rc1 of sphinxapi doesn't have it. I'm trying to copy-paste the method into my API as a first option, and then I will try to build the current non-production API. Will keep you informed.


 * Thanks for looking into this. I am running my installation with MW 1.9.3 and I don't know which version Svemir (the other developer) is running. I will start a new section on the main page with information about known supported MW versions. As for sphinxapi.php being incorrect, I am assuming you are using v0.9.8rc1? Both Svemir and I based this extension on 0.9.7 (the latest stable release). We'll keep a keen eye on the Sphinx project to make sure that our extension stays compatible with future versions of Sphinx.


 * On a side note, I was wondering if you have implemented the windows equivalent of setting up the cron jobs to keep the indexes up to date. If you have, can you please add that information to the documentation? We would much appreciate it! --Gri6507 12:36, 18 October 2007 (UTC)


 * I'm still stuck and haven't made much progress. Apparently the line
 * $sql = "SELECT old_text FROM ".$db->tableName('text')." WHERE old_id=".$docinfo['attrs']['old_id'];
 * ends up with the value of $sql being Select old_text from 'text' where old_id=


 * I'm not sure why text is in single quotes in the first place (looks like a bug to me) or why the old_id is not getting picked up. Searchd does show the hit coming to it, but it could be failing because I'm using a hacked version of the API.
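The symptom described above (a query that ends in "old_id=" with nothing after it) means the old_id attribute was missing from the match when the string was built. A minimal sketch of guarding against that, written in Python for illustration rather than in the extension's PHP: the function name and the dictionary layout are assumptions modelled on the $docinfo['attrs']['old_id'] lookup quoted above, not the extension's actual code.

```python
from typing import Any, Dict, Optional


def build_text_query(table_name: str, docinfo: Dict[str, Any]) -> Optional[str]:
    """Build the old_text lookup only when the match carries an old_id.

    Hypothetical helper: returns None instead of emitting a query that
    ends in "old_id=" when the attribute is missing or empty.
    """
    old_id = docinfo.get("attrs", {}).get("old_id")
    if old_id is None or old_id == "":
        return None
    # int() rejects anything that is not a plain number before it reaches SQL.
    return f"SELECT old_text FROM {table_name} WHERE old_id={int(old_id)}"


if __name__ == "__main__":
    print(build_text_query("text", {"attrs": {"old_id": 47}}))
    print(build_text_query("text", {"attrs": {}}))
```

A None result here would point at the search results (index attributes or API version) rather than at the database layer.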


 * Also, to answer Gri6507's question, I'm using 0.9.6 rc1 because that's the one that has the Windows binaries. I don't have Visual Studio or VC++ to compile from the source code, so even my step #2 (using the latest version and compiling) is on hold.


 * Adding the Windows cron job shouldn't be too tough (my guess), but I'll try it and let you know -- All the above posts brought to you by the guy who had the so useful signature Help :


 * According to Sphinx's website, http://www.sphinxsearch.com/downloads/sphinx-0.9.7-win32-release.zip is a windows release of 0.9.7. Is there any reason you are not using it? --Gri6507 11:42, 19 October 2007 (UTC)


 * It doesn't seem to contain sphinxapi.php -- that's the reason why I had to choose an older version. This should probably be posted on that developer's website, saying the API is missing from the 0.9.7 Windows release, but I'm too lazy... any helpers? :)


 * I have updated Step #1 and Step #7 with details of how to obtain the sphinxapi.php for Windows. It seems that the intent of the Win32 release binaries package is to only contain the binary EXEs. The PHP files are in either the source code or the API packages. --Gri6507 11:34, 25 October 2007 (UTC)


 * Does it even work on Windows? Installation instruction (step 1) of the Sphinx site has this to say "At the moment, Windows version of Sphinx's searchd daemon is not intended to be used in production because it can only handle one client at a time."

Directory Structure
How about putting SphinxSearch in its own subdirectory of the extensions directory so that things are cleaner? (I have a lot of extensions and each has its own directory.) Also, could sphinxapi.php be included in the SphinxSearch tar so that it's one-stop shopping? Although this could just be handled in a script if you guys go that way. --SellFone 21:22, 30 October 2007 (UTC)


 * This is an idea I have toyed around with for some time now. As SphinxSearch has grown to include more and more files, the appeal of having it reside in its own directory has increased. I am currently working on the automatic installation & configuration script for the extension. Perhaps I will make it install the extension in its own directory. Thanks for the suggestion! --Gri6507 23:15, 30 October 2007 (UTC)

Windows --rotate workaround?
I want Sphinx to update our index very often. If I could, I would love for the index to be incrementally updated every time the db changes. Barring that, I'd install a task to run every 15 minutes or so. However, no matter how often I update the index, I need to take down the Sphinx daemon to do it (limitation on Windows). Can anyone suggest a workaround or modification to the code such that a search request, when the daemon isn't running, waits for it to respond and re-searches? I don't so much care about restarting the daemon. I do care about search appearing broken while the daemon is down.
 * I am not sure if it is going to work, but here's what I'd try. Open the sphinxapi.php file. In function _Connect , around line 136, there is a call to


 * change that to


 * where the 30 is the timeout in seconds for establishing the connection. The basic idea is that if searchd is not running, no one will be listening on the other end of the socket until searchd comes back to life. This change should keep MW from dying during that brief period of time. Of course, it would be up to you to make sure that
 * before running the indexer, you must stop searchd
 * after running the indexer, you must restart searchd
 * Let me know if that works :-) --Gri6507 22:22, 3 November 2007 (UTC)


 * That looks promising. I'll give it a try.  Since my post, I installed the search on a separate machine (Linux) and that works pretty good.  But, this procedure may be what I need to reduce the number of servers in the equation.  I'll come back with results. --Cedarrapidsboy 14:25, 5 November 2007 (UTC)
 * UPDATE - the above code change didn't appear to have an effect. The search still timed out to a blank page.
 * Stop searchd
 * Issue search request
 * Start searchd (within 30 sec)
 * --205.175.225.24 16:30, 5 November 2007 (UTC)


 * Ok. I think I found the issue. According to PHP documentation, the fsockopen function may not honor the timeout ("Note: Depending on the environment, the Unix domain or the optional connect timeout may not be available."). So, to work around that, change the following code in sphinxapi.php


 * to


 * This way, you can set the waiting period yourself via the use of $connect_timeout variable. I tested this on my machine and it seems to work as expected. Please post your results when you try it out. --Gri6507 23:05, 5 November 2007 (UTC)
 * Unfortunately, same result. Blank page.  I did the following:
 * Kill searchd
 * Issue search request
 * Start searchd
 * In this case, searchd was still running on a separate machine.
 * --Cedarrapidsboy 20:20, 6 November 2007 (UTC)
 * UPDATE!
 * Here's a change to the code that works:


 * I added an additional timeout. Without it, a single connection was waiting for 30 seconds, just as long as the entire loop.  The previous code never tried the connection again.  This code *did* work for me using the testing steps above.  --Cedarrapidsboy 20:30, 6 November 2007 (UTC)


 * Glad to see that it's working for you! I will submit this as an improvement suggestion to the developers of Sphinx. --Gri6507 20:41, 6 November 2007 (UTC)
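The code blocks from this thread did not survive in the archive, so the following is only a sketch of the idea described above, not the change that was actually made to sphinxapi.php. It is written in Python for illustration; the function and parameter names are assumptions. The point it shows is the one from the final post: each connection attempt gets its own short timeout, so the loop can retry several times within the overall waiting period instead of spending the whole period on a single attempt.

```python
import socket
import time
from typing import Optional


def connect_with_retry(host: str, port: int,
                       total_timeout: float = 30.0,
                       attempt_timeout: float = 1.0,
                       pause: float = 0.5) -> Optional[socket.socket]:
    """Keep trying to reach the search daemon until total_timeout expires.

    Hypothetical helper: returns a connected socket, or None if the
    daemon never came back within the overall waiting period.
    """
    deadline = time.monotonic() + total_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            # A short per-attempt timeout leaves room for further retries.
            return socket.create_connection(
                (host, port), timeout=min(attempt_timeout, remaining))
        except OSError:
            # Refused or timed out: the daemon is probably being restarted.
            time.sleep(min(pause, max(0.0, deadline - time.monotonic())))
```

A caller would use something like connect_with_retry("localhost", 3312), where the port is an assumption and should match whatever your sphinx.conf specifies, and would show a "search temporarily unavailable" message if it returns None.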

Running on separate machines
The installation directions seem to assume that MediaWiki and sphinxd are on the same machine. How should I configure Sphinx and the extension if MediaWiki (and its database) are hosted separately from where Sphinx is installed? --Emufarmers 04:43, 5 November 2007 (UTC)
 * You are correct. I should update the main page with these instructions. To make this extension work in your case you will need to configure sphinx.conf. In that file, modify the src_wiki_main section to specify the correct sql_host = hostname, where hostname is the name of the machine running the MySQL database for your wiki (default is localhost). This way, you can install Sphinx on the same machine as the web server (as opposed to the MySQL server), and all instructions as listed on the main page are still valid.
 * Please let me know if you have any more questions, or, for that matter, if this worked for you. --Gri6507 13:05, 5 November 2007 (UTC)
 * Hi, sorry for the delay. I should have been more clear: My wiki is on shared hosting, so I can't install Sphinx on its server.  I run the search backend (presently Lucene, but Sphinx sounds promising) on a machine in my home.  With Lucene, my backend machine SSHs into the webserver, grabs a dump of the wiki, indexes it, and then runs search queries from the webserver through the index it generates and sends the results back to the webserver.  It's a rather convoluted setup (and it's even messier when it comes to updating the index), but I'm wondering if I can do anything along the same lines here. --Emufarmers 20:08, 11 November 2007 (UTC)
 * Ok, I understand your setup. The sphinx.conf file can be configured to make sphinxd run on a different machine (let's call it Machine S, for sphinx) from the machine running MySQL (let's call it Machine M, for mysql). However, in that case Machine S has to have network access to Machine M. My guess is that something similar to your present setup with SSH tunnels could be done here as well. If you are interested in trying this out, please let me know via email (see extension credits) and we could work through these questions then. --Gri6507 22:17, 11 November 2007 (UTC)

Sphinx Search Terms Limit
It seems that the Sphinx search only accepts 10 search terms. Perhaps it is the same story for the built-in MW search? Any way to change that? Perhaps make it unlimited? Cedarrapidsboy 13:50, 5 November 2007 (UTC)


 * I did not look at the code yet, but the limit seems to happen only in the sense of number of separate words and counts displayed on top of the search results. That lists only up to 10 words, but the eleventh word I used was also used to filter (and rank) the results. Svemir Brkic 14:31, 5 November 2007 (UTC)


 * Ah... I can confirm that.  I tested it, but the 11th and above search terms were not highlighted red, so didn't think they were included in the results.  I'd still be interested in getting all search terms highlighted.  Thanks for the reply!  --Cedarrapidsboy 14:56, 5 November 2007 (UTC)

Wildcard Search
Is there a way to enable wildcard search when using Sphinx?
 * According to http://www.sphinxsearch.com/forum/view.html?id=843, the latest 0.9.8 SVN snapshots available at http://www.sphinxsearch.com/downloads.html should have wildcard support. This feature has not yet been tested with this extension, so I can't guarantee that it works. Svemir and I will be taking a look at it as time permits (probably after the 1st of the year). You are very welcome to try this out on your own and post your results here. --Gri6507 13:25, 10 December 2007 (UTC)


 * Please see a note above about min_infix_len and enable_star config options. That makes * searches work for me. Svemir Brkic

Warnings on MW 1.11
Warning: Call-time pass-by-reference has been deprecated - argument passed by value; If you would like to pass it by reference, modify the declaration of [runtime function name]. If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. However, future versions may not support this any longer. in extensions/SphinxSearch_PersonalDict.php on line 75 (also lines 142 and 190)

-71.217.0.96 08:18, 5 January 2008 (UTC)


 * Thanks for the notice. I have fixed this in the CVS already. If you are using a public release, all you need to do is remove the ampersands from all calls to readPersonalDictionary. The method is already declared correctly (the ampersands should stay there.) Svemir Brkic 13:15, 2 February 2008 (UTC)

Problems with search within page title / SOLVED
Hi, I installed SphinxSearch on Ubuntu and it is working very well. Thanks for this extension! I had some problems with title search. Originally, the suggested query (the default value in the installation package) was:

#sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
#                 FROM mw_page, mw_revision, mw_text \
#                 WHERE rev_id=page_latest AND old_id=rev_text_id

The page_title content in the database looks like 'HOW_TO_edit_homepage'. There are underscores between the words (MediaWiki replaces spaces with underscores when a new page is created). I changed the query (for both the initial and the incremental index):

initial index:

sql_query = SELECT page_id, replace(page_title,'_',' ') as page_title, page_namespace, old_id, old_text \
    FROM mw_page, mw_revision, mw_text \
    WHERE rev_id=page_latest AND old_id=rev_text_id

incremental updates:

SELECT page_id, replace(page_title, '_',' ') as page_title, page_namespace, old_id, old_text
    FROM mw_page, mw_revision, mw_text
    WHERE rev_id=page_latest AND old_id=rev_text_id AND page_touched>=DATE_FORMAT(CURDATE(), '%Y%m%d070000')

Now it works. --Wikigeil 17:15, 21 January 2008 (UTC)
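For readers unfamiliar with the two SQL expressions used above, here is a small sketch of what they compute, in Python for illustration only. The function names are hypothetical. It assumes page_touched holds MediaWiki's 14-digit YYYYMMDDHHMMSS timestamp, which is what makes the string comparison in the incremental query meaningful.

```python
from datetime import date


def normalize_title(page_title: str) -> str:
    """Same effect as replace(page_title, '_', ' ') in the query."""
    return page_title.replace("_", " ")


def touched_cutoff(today: date, hhmmss: str = "070000") -> str:
    """Same effect as DATE_FORMAT(CURDATE(), '%Y%m%d070000').

    Builds a YYYYMMDDHHMMSS string for today at 07:00:00 that can be
    compared against page_touched.
    """
    return today.strftime("%Y%m%d") + hhmmss


if __name__ == "__main__":
    print(normalize_title("HOW_TO_edit_homepage"))
    print(touched_cutoff(date(2008, 1, 21)))
```

The incremental query therefore picks up only pages touched at or after 07:00 on the current day.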


 * I do not think this is necessary. Perhaps there was another issue with your setup and it got resolved while you were changing the queries. If you look at this line in the suggested sphinx.conf file:

charset_table  = 0..9, A..Z->a..z, _->, a..z, \


 * Among other things, this instructs Sphinx to consider an underscore the same as a space. Perhaps you should try again with the original queries, as they would probably run faster. On the other hand, I might be misunderstanding what you were trying to fix, so please let me know if that is the case. Svemir Brkic 13:05, 2 February 2008 (UTC)


 * Hi Svemir, thx for response. The charset_table property for main index is:

# charset definition and case folding rules "table"
charset_table = 0..9, A..Z->a..z, _->, a..z, \
    U+C0->a, U+C1->a, U+C2->a, U+C3->a, U+C4->a, U+C5->a, U+C6->a, \
    U+C7->c,U+E7->c, U+C8->e, U+C9->e, U+CA->e, U+CB->e, U+CC->i, \
    U+CD->i, U+CE->i, U+CF->i, U+D0->d, U+D1->n, U+D2->o, U+D3->o, \
    U+D4->o, U+D5->o, U+D6->o, U+D8->o, U+D9->u, U+DA->u, U+DB->u, \
    U+DC->u, U+DD->y, U+DE->t, U+DF->s, \
    U+E0->a, U+E1->a, U+E2->a, U+E3->a, U+E4->a, U+E5->a, U+E6->a, \
    U+E7->c,U+E7->c, U+E8->e, U+E9->e, U+EA->e, U+EB->e, U+EC->i, \
    U+ED->i, U+EE->i, U+EF->i, U+F0->d, U+F1->n, U+F2->o, U+F3->o, \
    U+F4->o, U+F5->o, U+F6->o, U+F8->o, U+F9->u, U+FA->u, U+FB->u, \
    U+FC->u, U+FD->y, U+FE->t, U+FF->s,
 * It contains underscore too. --Wikigeil 17:34, 4 February 2008 (UTC)