Extension talk:SphinxSearch/LQT Archive 1

Showing X of Y documents in search results, but document links not showing
I've got everything working and it is apparently using sphinxsearch as my default search engine, but the results simply say "myword found 8 times in 8 documents", and then nothing is listed underneath it for those documents where the word is in the title only. However, if the word is contained in the body of the entry, those entries show up.

I also hit a PHP error in sphinxsearch_body.php about an undeclared variable called 'time', and I had to comment out the offending code. It was in this statement:

 $preamble = sprintf(wfMsg('sphinxSearchPreamble'),
     (($page-1)*$wgSphinxSearch_matches+1 > $res['total']) ? $res['total'] : ($page-1)*$wgSphinxSearch_matches+1,
     ($page*$wgSphinxSearch_matches > $res['total']) ? $res['total'] : $page*$wgSphinxSearch_matches,
     $res['total'],
     $term,
     '' // $res[time]

You can see where I commented $res[time] out and replaced it with empty single quotes. This didn't seem to break anything, so I'm not sure what it was supposed to do. I'm on a Windows machine, by the way. All I really care about is that the document listing appears when the word appears in the document title.
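A likely cause of the 'time' error is the unquoted array key: `$res[time]` makes PHP look for a constant named `time` instead of the string key `'time'`. Below is a minimal sketch of the same range calculation with the key quoted. The function name `preamble_values` and the sample `$res` array are hypothetical stand-ins for illustration, not code from the extension.

```php
<?php
// Hypothetical helper: computes the values the preamble message needs.
function preamble_values(array $res, $page, $perPage, $term)
{
    $first = ($page - 1) * $perPage + 1;
    $last  = $page * $perPage;
    return array(
        min($first, $res['total']),   // first match shown on this page
        min($last, $res['total']),    // last match shown on this page
        $res['total'],
        $term,
        // Quoted key; fall back to an empty string if the timing is missing.
        isset($res['time']) ? $res['time'] : '',
    );
}

// Sample data standing in for a real Sphinx result array.
$res = array('total' => 8, 'time' => '0.000');
list($from, $to, $total, $term, $time) = preamble_values($res, 1, 10, 'specifi*');
echo "Displaying $from-$to of $total matches for query $term retrieved in $time sec\n";
```

With the key quoted, the elapsed time can stay in the message instead of being replaced by an empty string.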
 * "Displaying 0-0 of 0 matches for query specifi* retrieved in 0.000 sec with following stats: specifi* found 8 times in 8 documents".  This is all I'm getting when searching for a particular wildcard.  It appears the main 0-0 of 0 matches is wrong, as the next line shows that it is indeed finding the word in 8 of 8 documents.  When I use the console, here are the results:
 * C:\Sphinx>search --config sphinx.conf speci*
 * Sphinx 0.9.8-dev (r985)
 * Copyright (c) 2001-2007, Andrew Aksyonoff
 * using config file 'sphinx.conf'...
 * index 'wiki_main': query 'speci* ': returned 5 matches of 5 total in 0.000 sec
 * displaying matches:
 * 1. document=23, weight=1, page_namespace=8, :::old_id=47 page_title=Sidebar page_namespace=8
 * 2. document=47, weight=1, page_namespace=6, :::old_id=92 page_title=ImplementationSpecification_2_1.pdf page_namespace=6
 * 3. document=49, weight=1, page_namespace=6, :::old_id=98 page_title=ImplementationSpecification_2_0r1.pdf page_namespace=6
 * 4. document=52, weight=1, page_namespace=6, :::old_id=102 page_title=ImplementationSpecification_2_0.pdf page_namespace=6
 * 5. document=58, weight=1, page_namespace=6, :::old_id=116 page_title=SIF_Implementation_Specification_1.5r1.pdf page_namespace=6
 * words: 1. 'speci*': 5 documents, 5 hits

Everything is OK it seems, but no search-results
I've installed the Sphinx extension on MediaWiki 1.11 with the most recent version of Sphinx. Everything seems to work: I installed the service (Windows server) and tested the search at the command prompt, which gives results. When I go to special:searchSphinx, it displays OK.

The only problem is that nothing happens when I try to search for something: the page reloads and displays nothing. Do you have any idea what might be causing this? It also seems it cannot create a searchd.pid file or log files, although the search indexes are created without a problem.


 * Please clarify "reloads and displays nothing". Do you get a blank screen? In that case, you would need to check your PHP error log for clues. Or do you get the same thing as Mark describes below? Svemir Brkic 15:44, 5 January 2008 (UTC)


 * Hi Svemir, thanks for your response. I have the same problem as Mark. I discovered that if I disable the internal search, the Sphinx special page is suddenly gone (it also isn't visible in Special:Specialpages). If the internal search isn't disabled, the SphinxSearch special page is visible, but it doesn't return results. (I've got the same config as Mark, only the MediaWiki version is 1.11.) 213.132.179.227 10:03, 7 January 2008 (UTC)


 * We have the same problem. We're on Sphinx 0.9.7 (Win32) and MediaWiki 1.9.7 installed on Win2003 server.  After starting search daemon, I can run test.php from command line and get back results there, but searches entered from Sphinx Special Page in the wiki just bounce you back to the main page.  The URL looks like the search took place, however.  For example, a search on the term "SAM" bounces you back to the wiki's main page, but now this query string appears on the URL: "sphinxsearch=SAM&fulltext=Search&match_all=0&ns0=1" --Mark price 01:26, 14 December 2007 (UTC)


 * Did you make any changes to your Sphinx search configuration file, SphinxSearch.php? You could also try making Sphinx the default search, just to verify whether the problem is in the search itself or in the way your wiki handles the paths in the "special page only" case. Svemir Brkic 15:44, 5 January 2008 (UTC)

Major failures -- help!
Lines 304 and 305 of SphinxSearch_body.php error out in version 1.8 of MediaWiki. Either I need to comment them out or upgrade to v1.11.

Even after upgrading, line 171 fails with: Fatal error: Call to undefined method SphinxClient::SetFilter in C:\wamp\www\wiki\extensions\SphinxSearch_body.php on line 171

Commenting it out doesn't help, since it gives me a "Fatal error on the DB" or something like that. I'm thinking I need to install some package, but I don't know which one (something in PEAR?).

Help :(


 * It looks like you are running this extension on Windows. To the best of my knowledge, this extension has not been tried as such yet. But, at least in theory, there is nothing that should prevent it from working. Of course, the big differences are all path related. So, let's first start by making sure your setup is correct. Were you successfully able to perform step 3, step 4, and step 5? Can you also please verify that step 7 was done correctly and you have the sphinxapi.php file in your C:\wamp\www\wiki\extensions\ directory? --Gri6507 12:20, 17 October 2007 (UTC)


 * I kinda solved this; upgrading to v1.11 of course takes care of lines 304 and 305. As for the SetFilter method, apparently the rc1 of sphinxapi doesn't have this method. I'm trying to copy-paste the method into my API as a first option and will then try to build the current non-production API. Will keep you informed.


 * Thanks for looking into this. I am running my installation with MW 1.9.3 and I don't know which version Svemir (the other developer) is running. I will start a new section on the main page with information about known supported MW versions. As for sphinxapi.php being incorrect, I am assuming you are using v0.9.8rc1? Both Svemir and I based this extension on 0.9.7 (the latest stable release). We'll keep a keen eye on the Sphinx project to make sure that our extension will be completely compatible with future versions of Sphinx.


 * On a side note, I was wondering if you have implemented the windows equivalent of setting up the cron jobs to keep the indexes up to date. If you have, can you please add that information to the documentation? We would much appreciate it! --Gri6507 12:36, 18 October 2007 (UTC)


 * I'm still stuck and couldn't get much progress. Apparently the line
 * $sql = "SELECT old_text FROM ".$db->tableName('text')." WHERE old_id=".$docinfo['attrs']['old_id']; ends up with the value of $sql being Select old_text from 'text' where old_id=


 * I'm not sure why text is in single quotes in the first place (looks like a bug to me) or why the old_id is not getting picked up. Searchd does show the hit coming to it, but it could be failing because I'm using a hacked version of the API.


 * Also, to answer Gri6507's question, I'm using 0.9.6 rc1 because that's the one that has the Windows binaries. I don't have Visual Studio or VC++ to compile from the source code, so even my step #2 (using the latest version and compiling) is on hold.


 * Adding the Windows cron job shouldn't be too tough (my guess), but I'll try it and let you know -- All the above posts brought to you by the guy who had the so useful signature Help :(


 * According to Sphinx's website, http://www.sphinxsearch.com/downloads/sphinx-0.9.7-win32-release.zip is a windows release of 0.9.7. Is there any reason you are not using it? --Gri6507 11:42, 19 October 2007 (UTC)


 * It doesn't seem to contain sphinxapi.php -- that's the reason why I had to choose an older version. This should probably be posted on that developer's website, saying the API is missing from the 0.9.7 Windows release, but I'm too lazy...any helpers? :)


 * I have updated Step #1 and Step #7 with details of how to obtain the sphinxapi.php for Windows. It seems that the intent of the Win32 release binaries package is to only contain the binary EXEs. The PHP files are in either the source code or the API packages. --Gri6507 11:34, 25 October 2007 (UTC)


 * Does it even work on Windows? Installation instruction (step 1) of the Sphinx site has this to say "At the moment, Windows version of Sphinx's searchd daemon is not intended to be used in production because it can only handle one client at a time."

fails because trying to search non-existent table Zebee Johnstone 01:48, 26 September 2007 (UTC)
Set up as described, but the table names in the version I have, MediaWiki 1.11, are:
 +------------------+
 | Tables_in_wikidb |
 +------------------+
 | mw_archive       |
 | mw_blobs         |
 | mw_brokenlinks   |
 | mw_categorylinks |
 | mw_cur           |
 | mw_hitcounter    |
 | mw_image         |
 | mw_imagelinks    |
 | mw_interwiki     |
 | mw_ipblocks      |
 | mw_links         |
 | mw_linkscc       |
 | mw_logging       |
 | mw_math          |
 | mw_objectcache   |
 | mw_old           |
 | mw_oldimage      |
 | mw_querycache    |
 | mw_recentchanges |
 | mw_searchindex   |
 | mw_site_stats    |
 | mw_user          |
 | mw_user_newtalk  |
 | mw_user_rights   |
 | mw_validate      |
 | mw_watchlist     |
 +------------------+

so it isn't finding "wiki_page": ERROR: sql_query: Table 'wikidb.wiki_page' doesn't exist (DSN=mysql://wikiuser:***@localhost:3306/wikidb).

So which of my tables are the equivalents you are searching?


 * I believe you are getting this error when trying to run the Sphinx indexer or search tools? If that's the case, then you missed a step in the instructions. In step 2, when you create the sphinx.conf file, you need to make sure to replace all instances of wiki_ with whatever your table prefix is (in your case, it looks like mw_). That's what I was trying to explain in the text after the sphinx.conf content listing. Please let me know if I should rephrase that paragraph. --Gri6507 12:02, 26 September 2007 (UTC)

Cleanup required
Step 2 is confusing because Sphinx already comes with a config file -- it needs to be made clearer.

Steps 1 to 9 are also not well organized. Decent headings would be nice.


 * I made the steps 2 and 9 a little more clear. Let us know if you have some additional suggestions. Svemir Brkic 12:39, 19 October 2007 (UTC)

Failed at step 7
Everything worked fine up to step 6

Having require_once( "$IP/extensions/SphinxSearch.php" ); in LocalSettings.php brings up the following messages (PHP errLogFile):

PHP Warning: Call-time pass-by-reference has been deprecated - argument passed by value; If you would like to pass it by reference, modify the declaration of [runtime function name]. If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. However, future versions may not support this any longer. in E:\- Daten -\- MyWebSite -\HydroWiki\extensions\SphinxSearch.php on line 88

and

PHP Fatal error: Cannot redeclare class UnlistedSpecialPage in  wikiroot\includes\SpecialPage.php on line 703

Can you also comment on $wgSphinxSearch_index = "wiki";? Does it have to be replaced by the name of the sphinx.conf file or by one of the generated indexes?


 * The original code used a pass-by-reference in one place. This has been fixed in recent versions. $wgSphinxSearch_index needs to have the name of one of the generated indexes, as specified by a line such as "index wiki {" in the sphinx.conf file. Svemir Brkic 12:32, 5 October 2007 (UTC)
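For readers hitting the same warning: call-time pass-by-reference means writing `&` at the call site. The usual fix is to declare the reference in the function signature instead. Here is a small sketch using a hypothetical function `add_term`, which is not taken from the extension:

```php
<?php
// The reference is declared once, in the signature.
function add_term(array &$terms, $term)
{
    $terms[] = $term;
}

$terms = array('sphinx');

// Deprecated form that triggers the warning (a fatal error since PHP 5.4):
//     add_term(&$terms, 'search');

// Correct form: no ampersand at the call site.
add_term($terms, 'search');
```

The caller's array is still modified in place, so behaviour is unchanged; only the call syntax differs.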

Alternative version
A modified version of this extension is currently in use at the New World Encyclopedia. Changes are explained at the above link and all the source files are linked from there too. Feel free to comment on or use any of the code, and let me know if I am not attributing somebody properly. Thanks. Svemir Brkic 03:35, 5 October 2007 (UTC)

Version 0.3 of our modified extension is available at the above link. We have further changed the main sphinx.conf query to use page_id as the primary document key. This avoids duplicate results when incremental indexing is used. Svemir Brkic 01:22, 8 October 2007 (UTC)
 * This version has merged with this extension starting with v0.3. --Gri6507 21:58, 12 October 2007 (UTC)


 * Honestly, I think this is not an alternate version, but rather an example of the search in use. I went to the website some time ago and it didn't give me details of what was modified, etc. So I don't really know whether it is modified in the first place.


 * It was an alternate version on October 5, when the comment was written. At that time the page also described what changes were made, etc. Since then, those changes were merged into the official version, as pointed out on the line just above your comment. Svemir Brkic 13:11, 25 October 2007 (UTC)

Apache
Any expectation that Sphinx will come out with a version to work under Apache? Svanslyck 22:42, 29 October 2007 (UTC)


 * I am not aware of any problems with Apache. It certainly works for me (on Apache 2.) There is nothing in the extension code that would make it Apache-specific, and Sphinx search engine itself has no dependency on the web server you use. They do provide an add-on for MySQL that lets you use Sphinx via database queries, but that does not have much to do with Apache either. Perhaps you meant something else? 75.75.36.158 23:35, 29 October 2007 (UTC)


 * I too am running under Apache 2.x without any problems. What kind of issues are you having? --Gri6507 23:55, 29 October 2007 (UTC)

Directory Structure
How about putting Sphinx in its own subdirectory in the extension directory so that things will be cleaner? (I have a lot of extensions and each has its own directory.) Also, could sphinxapi.php be included in the SphinxSearch tar so that it's one-stop shopping? Although this could just be implemented in a script if you guys go that way. --SellFone 21:22, 30 October 2007 (UTC)


 * This is an idea I have toyed around with for some time now. As SphinxSearch has grown to include more and more files, the appeal of having it reside in its own directory has increased. I am currently working on the automatic installation & configuration script for the extension. Perhaps I will make it install the extension in its own directory. Thanks for the suggestion! --Gri6507 23:15, 30 October 2007 (UTC)

Windows --rotate workaround?
I want Sphinx to update our index very often. If I could, I would love for the index to be incrementally updated every time the db changes. Barring that, I'd install a task to run every 15 minutes or so. However, no matter how often I update the index, I need to take down the Sphinx daemon to do it (limitation on Windows). Can anyone suggest a workaround or modification to the code such that a search request, when the daemon isn't running, waits for it to respond and re-searches? I don't so much care about restarting the daemon. I do care about search appearing broken while the daemon is down.
 * I am not sure if it is going to work, but here's what I'd try. Open the sphinxapi.php file. In function _Connect , around line 136, there is a call to


 * change that to


 * where the 30 is the timeout in seconds for establishing the connection. The basic idea is that if searchd is not running, no one will be listening on the other end of the socket until searchd comes back to life. This change should keep MW from dying during that brief period of time. Of course, it would be up to you to make sure that
 * before running the indexer, you must stop searchd
 * after running the indexer, you must restart searchd
 * Let me know if that works :-) --Gri6507 22:22, 3 November 2007 (UTC)


 * That looks promising. I'll give it a try.  Since my post, I installed the search on a separate machine (Linux) and that works pretty well.  But this procedure may be what I need to reduce the number of servers in the equation.  I'll come back with results. --Cedarrapidsboy 14:25, 5 November 2007 (UTC)
 * UPDATE - the above code change didn't appear to have an effect. The search still timed out to a blank page.
 * Stop searchd
 * Issue search request
 * Start searchd (within 30 sec)
 * --205.175.225.24 16:30, 5 November 2007 (UTC)


 * Ok. I think I found the issue. According to PHP documentation, the fsockopen function may not honor the timeout ("Note: Depending on the environment, the Unix domain or the optional connect timeout may not be available."). So, to work around that, change the following code in sphinxapi.php


 * to


 * This way, you can set the waiting period yourself via the use of $connect_timeout variable. I tested this on my machine and it seems to work as expected. Please post your results when you try it out. --Gri6507 23:05, 5 November 2007 (UTC)
 * Unfortunately, same result. Blank page.  I did the following:
 * Kill searchd
 * Issue search request
 * Start searchd
 * In this case, searchd was still running on a separate machine.
 * --Cedarrapidsboy 20:20, 6 November 2007 (UTC)
 * UPDATE!
 * Here's a change to the code that works:


 * I added an additional timeout. Without it, a single connection was waiting for 30 seconds, just as long as the entire loop.  The previous code never tried the connection again.  This code *did* work for me using the testing steps above.  --Cedarrapidsboy 20:30, 6 November 2007 (UTC)


 * Glad to see that it's working for you! I will submit this as an improvement suggestion to the developers of Sphinx. --Gri6507 20:41, 6 November 2007 (UTC)
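The code snippets posted in this thread did not survive in the archive. As a rough illustration of the idea described above (a short timeout per attempt, repeated inside a longer overall waiting period), here is a minimal sketch. The function name `connect_with_retry`, its parameters, and the port number 3312 are assumptions for illustration; this is not the code from sphinxapi.php or from the posts above.

```php
<?php
/**
 * Keep trying to open a connection until one succeeds or the overall
 * deadline passes. $connector is any callable that returns a connection
 * on success or false on failure.
 */
function connect_with_retry($connector, $totalTimeout = 30, $pauseSeconds = 1)
{
    $deadline = microtime(true) + $totalTimeout;
    do {
        $conn = call_user_func($connector);
        if ($conn !== false) {
            return $conn;
        }
        // Wait briefly before the next attempt so we do not spin.
        usleep((int) ($pauseSeconds * 1000000));
    } while (microtime(true) < $deadline);

    return false; // searchd never came back within the waiting period
}

// Example connector: each attempt gives up after 2 seconds, so the loop
// gets many chances within the 30-second window.
// Port 3312 is an assumption; use whatever port your sphinx.conf sets.
$sphinxConnector = function () {
    return @fsockopen('localhost', 3312, $errno, $errstr, 2);
};
```

Calling `connect_with_retry($sphinxConnector)` would then block for up to roughly 30 seconds while searchd is restarted after indexing. The key design point from the thread is that the per-attempt timeout must be much shorter than the overall one, otherwise a single attempt consumes the whole window.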

Running on separate machines
The installation directions seem to assume that MediaWiki and sphinxd are on the same machine. How should I configure Sphinx and the extension if MediaWiki (and its database) are hosted separately from where Sphinx is installed? --Emufarmers 04:43, 5 November 2007 (UTC)
 * You are correct. I should update the main page with these instructions. To make this extension work in your case you will need to configure sphinx.conf. In that file, modify the src_wiki_main section to specify the correct sql_host = hostname, where hostname is the name of the machine running the MySQL database for your wiki (default is localhost). This way, you can install Sphinx on the same machine as the web server (as opposed to the MySQL server), and all instructions as listed on the main page are still valid.
 * Please let me know if you have any more questions, or, for that matter, if this worked for you. --Gri6507 13:05, 5 November 2007 (UTC)
 * Hi, sorry for the delay. I should have been more clear: My wiki is on shared hosting, so I can't install Sphinx on its server.  I run the search backend (presently Lucene, but Sphinx sounds promising) on a machine in my home.  With Lucene, my backend machine SSHs into the webserver, grabs a dump of the wiki, indexes it, and then runs search queries from the webserver through the index it generates and sends the results back to the webserver.  It's a rather convoluted setup (and it's even messier when it comes to updating the index), but I'm wondering if I can do anything along the same lines here. --Emufarmers 20:08, 11 November 2007 (UTC)
 * Ok, I understand your setup. The sphinx.conf file can be configured to make sphinxd run on a different machine (let's call it Machine S, for sphinx) from the machine running MySQL (let's call it Machine M, for mysql). However, in that case Machine S has to have network access to Machine M. My guess is that something similar to your present setup with SSH tunnels could be done here as well. If you are interested in trying this out, please let me know via email (see extension credits) and we could work through these questions then. --Gri6507 22:17, 11 November 2007 (UTC)

Sphinx Search Terms Limit
It seems that the Sphinx search only accepts 10 search terms. Perhaps it is the same story for the built-in MW search? Any way to change that? Perhaps make it unlimited? Cedarrapidsboy 13:50, 5 November 2007 (UTC)


 * I did not look at the code yet, but the limit seems to happen only in the sense of number of separate words and counts displayed on top of the search results. That lists only up to 10 words, but the eleventh word I used was also used to filter (and rank) the results. Svemir Brkic 14:31, 5 November 2007 (UTC)


 * Ah... I can confirm that.  I tested it, but the 11th and above search terms were not highlighted red, so didn't think they were included in the results.  I'd still be interested in getting all search terms highlighted.  Thanks for the reply!  --Cedarrapidsboy 14:56, 5 November 2007 (UTC)

Indexer just does not do anything.
When I run sphinx/src/indexer --config sphinx_conf/sphinx.conf, it just does not do anything. It just brings me back to prompt after saying using config file 'sphinx_conf/sphinx.conf'...

I checked and rechecked everything in the conf file. If I change anything in the SQL connect area, I get an SQL error, so that all seems to work fine. I made sure to add mw_ to all the tables (not columns).

I installed sphinx in /home/rpedia/sphinx. In there is an src folder that has all of the commands. Is this wrong maybe?

The paths are like so path		= /home/rpedia/sphinx/wiki_main

Any help as to what is going wrong?

Thanks, --72.195.138.162 14:46, 12 November 2007 (UTC)


 * You probably want to create a separate folder for data files created by the indexer. Post your sphinx.conf file somewhere (without the db password, of course) and also please mention the version of sphinx you are using, etc. Svemir Brkic 16:50, 12 November 2007 (UTC)


 * Thanks. Here is the conf file: http://risdpedia.net/sphinx.conf ... Version 0.9.7. Thanks again, --72.195.138.162 21:45, 12 November 2007 (UTC)


 * It looks like you inadvertently took Svemir's suggestion to place all of the indexer files in /home/rpedia/sphinx/wiki_main. However, my guess is that you have not created that directory. Can you please try to create that directory first and then rerun the indexer? --Gri6507 03:00, 13 November 2007 (UTC)


 * Yeah, I just checked, and the folders are all there. I think I'm going to try to reinstall. --72.195.138.162 01:01, 14 November 2007 (UTC)
 * So far... it is fixed. It may have been a really stupid mistake on my part. But after reinstalling it seems to be going along. Thanks! --72.195.138.162 01:30, 14 November 2007 (UTC)
 * Glad to hear it's working for you. --Gri6507 12:47, 14 November 2007 (UTC)

Wildcard Search
Is there a way to enable wildcard search when using Sphinx?
 * According to http://www.sphinxsearch.com/forum/view.html?id=843, the latest 0.9.8 SVN snapshots available at http://www.sphinxsearch.com/downloads.html should have wildcard support. This feature has not yet been tested with this extension, so I can't guarantee that it works. Svemir and I will be taking a look at it as time permits (probably after the 1st of the year). You are very welcome to try this out on your own and post your results here. --Gri6507 13:25, 10 December 2007 (UTC)

Warnings on MW 1.11
Warning: Call-time pass-by-reference has been deprecated - argument passed by value; If you would like to pass it by reference, modify the declaration of [runtime function name]. If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. However, future versions may not support this any longer. in extensions/SphinxSearch_PersonalDict.php on line 75 (also lines 142 and 190)

-71.217.0.96 08:18, 5 January 2008 (UTC)

Problems with search within page title / SOLVED
Hi, I installed SphinxSearch on Ubuntu and it is working very well. Thanks for this extension! I had some problems with title search. Originally, the suggested query (default value in the installation package) was:

 #sql_query	= SELECT page_id, page_title, page_namespace, old_id, old_text \
 #                 FROM mw_page, mw_revision, mw_text \
 #                  WHERE rev_id=page_latest AND old_id=rev_text_id

The page_title content in the database looks like 'HOW_TO_edit_homepage'. There are underscores between the words (MediaWiki replaces spaces with underscores when a new page is created), so the individual words of a title were not being matched. I changed the query (for both the initial and the incremental index):

initial index:
 sql_query	= SELECT page_id, replace(page_title,'_',' ') as page_title, page_namespace, old_id, old_text \
 		FROM mw_page, mw_revision, mw_text \
 		WHERE rev_id=page_latest AND old_id=rev_text_id

incremental updates:
 SELECT page_id, replace(page_title, '_',' ') as page_title, page_namespace, old_id, old_text
 FROM mw_page, mw_revision, mw_text
 WHERE rev_id=page_latest AND old_id=rev_text_id AND page_touched>=DATE_FORMAT(CURDATE(), '%Y%m%d070000')

Now it works. --Wikigeil 17:15, 21 January 2008 (UTC)
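To illustrate what the replace() call in the query does to a title, here is the equivalent transformation in PHP. This is an illustration only: the function name `title_for_index` is hypothetical, and the actual indexing still happens in the SQL query above.

```php
<?php
// Mirrors SQL replace(page_title, '_', ' '): MediaWiki stores page titles
// with underscores, which keeps the words from being indexed separately.
function title_for_index($pageTitle)
{
    return str_replace('_', ' ', $pageTitle);
}

echo title_for_index('HOW_TO_edit_homepage'), "\n";
```

Once the underscores are turned into spaces, each word of the title becomes a separate searchable term.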