Extension talk:SphinxSearch/archive

From MediaWiki.org

This page contains old discussion points relevant only to older versions of the extension, Sphinx, or MediaWiki.

2007

Major failures -- help!

Line #s 304 and 305 error out on SphinxSearch_body.php in version 1.8 of MediaWiki. Either I need to comment them out or upgrade to v1.11.

Even after upgrading, line # 171 errors with Fatal error: Call to undefined method SphinxClient::SetFilter() in C:\wamp\www\wiki\extensions\SphinxSearch_body.php on line 171

Commenting it out doesn't help since it gives me a "Fatal error on the DB" or something like that. I'm thinking I need to install some package, but don't know which one (something in PEAR?)

Help :(

--17 October 2007

It looks like you are running this extension on Windows. To the best of my knowledge, this extension has not been tried as such yet. But, at least in theory, there is nothing that should prevent it from working. Of course, the big differences are all path related. So, let's first start by making sure your setup is correct. Were you successfully able to perform step 3, step 4, and step 5? Can you also please verify that step 7 was done correctly and you have the sphinxapi.php file in your C:\wamp\www\wiki\extensions\ directory? --Gri6507 12:20, 17 October 2007 (UTC)
I kinda solved this; upgrading to v1.11 of course takes care of lines 304 & 305. As for the SetFilter method, apparently the rc1 of sphinxapi doesn't have this method. I'm trying to copy-paste the method into my API as a first option and then will try to build the current non-production API. Will keep you informed --18 October 2007
Thanks for looking into this. I am running my installation with MW 1.9.3 and I don't know which version Svemir (the other developer) is running. I will start a new section on the main page with information about known supported MW versions. As for sphinxapi.php being incorrect, I am assuming you are using v0.9.8rc1? Both Svemir and I based this extension on 0.9.7 (the latest stable release). We'll keep a keen eye on the Sphinx project to make sure that our extension will be completely compatible with future versions of Sphinx.
On a side note, I was wondering if you have implemented the windows equivalent of setting up the cron jobs to keep the indexes up to date. If you have, can you please add that information to the documentation? We would much appreciate it! --Gri6507 12:36, 18 October 2007 (UTC)
I'm still stuck and couldn't get much progress. Apparently the line
$sql = "SELECT old_text FROM ".$db->tableName('text')." WHERE old_id=".$docinfo['attrs']['old_id']; ends up with the value of $sql being Select old_text from 'text' where old_id=
I'm not sure why text is in single quotes in the first place (looks like some bug to me) and why the old_id is not getting picked up. Searchd does show the hit coming to it, but it could be failing because I'm using a hacked version of the API.
Also, to answer Gri6507's question, I'm using 0.9.6 rc1 because that's the one that has the Windows binaries. I don't have Visual Studio or VC++ to compile from the source code, so even my step #2 (using the latest version and compiling) is on hold.
Adding the Windows cron job shouldn't be too tough (my guess), but I'll try it and let you know -- All the above posts brought to you by the guy who had the so useful signature Help :( :)
--19 October 2007
According to Sphinx's website, http://www.sphinxsearch.com/downloads/sphinx-0.9.7-win32-release.zip is a windows release of 0.9.7. Is there any reason you are not using it? --Gri6507 11:42, 19 October 2007 (UTC)
It doesn't seem to contain sphinxapi.php -- that's the reason why I had to choose an older version. This should probably be posted on that developer's website, saying the API is missing from the 0.9.7 Windows release, but I'm too lazy...any helpers? :) --25 October 2007
I have updated Step #1 and Step #7 with details of how to obtain the sphinxapi.php for Windows. It seems that the intent of the Win32 release binaries package is to only contain the binary EXEs. The PHP files are in either the source code or the API packages. --Gri6507 11:34, 25 October 2007 (UTC)
Does it even work on Windows? Installation instruction (step 1) of the Sphinx site has this to say "At the moment, Windows version of Sphinx's searchd daemon is not intended to be used in production because it can only handle one client at a time." --22 November 2007

Windows --rotate workaround?

I want Sphinx to update our index very often. If I could, I would love for the index to be incrementally updated every time the db changes. Barring that, I'd install a task to run every 15 minutes or so. However, no matter how often I update the index, I need to take down the Sphinx daemon to do it (limitation on Windows). Can anyone suggest a workaround or modification to the code such that a search request, when the daemon isn't running, waits for it to respond and re-searches? I don't so much care about restarting the daemon. I do care about search appearing broken while the daemon is down. --Cedarrapidsboy 2 November 2007

I am not sure if it is going to work, but here's what I'd try. Open the sphinxapi.php file. In function _Connect (), around line 136, there is a call to
if (!( $fp = @fsockopen ( $this->_host, $this->_port ) ) )
change that to
if (!( $fp = @fsockopen ( $this->_host, $this->_port, $errno, $errstr, 30 ) ) )
where the 30 is the timeout in seconds for establishing the connection. The basic idea is that if searchd is not running, no one will be listening on the other end of the socket until searchd comes back to life. This change should keep MW from dying during that brief period of time. Of course, it would be up to you to make sure that
  1. before running the indexer, you must stop searchd
  2. after running the indexer, you must restart searchd
Let me know if that works :-) --Gri6507 22:22, 3 November 2007 (UTC)
That looks promising. I'll give it a try. Since my post, I installed the search on a separate machine (Linux) and that works pretty good. But, this procedure may be what I need to reduce the number of servers in the equation. I'll come back with results. --Cedarrapidsboy 14:25, 5 November 2007 (UTC)
UPDATE - the above code change didn't appear to have an effect. The search still timed-out to a blank page.
  1. Stop searchd
  2. Issue search request
  3. Start searchd (within 30 sec)
--205.175.225.24 16:30, 5 November 2007 (UTC)
Ok. I think I found the issue. According to PHP documentation, the fsockopen() function may not honor the timeout ("Note: Depending on the environment, the Unix domain or the optional connect timeout may not be available."). So, to work around that, change the following code in sphinxapi.php
if (!( $fp = @fsockopen ( $this->_host, $this->_port ) ) )
{
    $this->_error = "connection to {$this->_host}:{$this->_port} failed";
    return false;
}
to
$connect_timeout = 30; # wait this long in seconds
$start_connect = time();
do {
    $fp = @fsockopen ($this->_host, $this->_port, $errno, $errstr, $connect_timeout);
} while (!$fp && !sleep(1) && ((time()-$start_connect) < $connect_timeout));
if (!$fp)
{
    $this->_error = "connection to {$this->_host}:{$this->_port} failed";
    return false;
}
This way, you can set the waiting period yourself via the use of $connect_timeout variable. I tested this on my machine and it seems to work as expected. Please post your results when you try it out. --Gri6507 23:05, 5 November 2007 (UTC)
Unfortunately, same result. Blank page. I did the following:
  • Kill searchd
  • Issue search request
  • Start searchd
In this case, searchd was still running on a separate machine.
--Cedarrapidsboy 20:20, 6 November 2007 (UTC)
UPDATE!
Here's a change to the code that works:
$connect_timeout = 5; # wait this long in seconds each loop iteration
$loop_timeout = 30; # wait this long in seconds for the entire loop
$start_connect = time();
do {
    $fp = @fsockopen ($this->_host, $this->_port, $errno, $errstr, $connect_timeout);
} while (!$fp && !sleep(1) && ((time()-$start_connect) < $loop_timeout));
if (!$fp)
{
    $this->_error = "connection to {$this->_host}:{$this->_port} failed";
    return false;
}
I added an additional timeout. Without it, a single connection was waiting for 30 seconds, just as long as the entire loop. The previous code never tried the connection again. This code *did* work for me using the testing steps above. --Cedarrapidsboy 20:30, 6 November 2007 (UTC)
Glad to see that it's working for you! I will submit this as an improvement suggestion to the developers of Sphinx. --Gri6507 20:41, 6 November 2007 (UTC)

The solution developed on this talk page seems to be a part of the official sphinxapi.php file in 0.9.8 release :-) Svemir Brkic 14:37, 25 August 2008 (UTC)
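
For readers who want to reuse the idea outside sphinxapi.php, here is a minimal sketch of the same retry pattern as a standalone function. The name connectWithRetry and its parameters are hypothetical and are not part of the Sphinx API; the connection attempt is passed in as a callable so the loop can be tried without a running searchd.

```php
<?php
// Hypothetical helper, not part of sphinxapi.php: keep retrying a
// connection attempt until it succeeds or an overall deadline passes.
function connectWithRetry(callable $attempt, $loopTimeout = 30, $pause = 1)
{
    $deadline = time() + $loopTimeout; // overall budget, like $loop_timeout above

    while (true) {
        $fp = $attempt();              // one attempt, e.g. a short fsockopen() call
        if ($fp) {
            return $fp;                // connected
        }
        if (time() >= $deadline) {
            return false;              // give up once the overall budget is spent
        }
        sleep($pause);                 // wait before the next attempt
    }
}
```

A caller could pass something like function () { return @fsockopen('localhost', 3312, $errno, $errstr, 5); } as $attempt, where the host, the port, and the 5 second per-attempt timeout are assumptions to adapt to your own setup. Keeping the per-attempt timeout shorter than the overall one is the point made above: otherwise a single attempt can use up the whole waiting period.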

Sphinx Search Terms Limit

It seems that the Sphinx search only accepts 10 search terms. Perhaps it is the same story for the built-in MW search? Any way to change that? Perhaps make it unlimited? Cedarrapidsboy 13:50, 5 November 2007 (UTC)

I did not look at the code yet, but the limit seems to happen only in the sense of number of separate words and counts displayed on top of the search results. That lists only up to 10 words, but the eleventh word I used was also used to filter (and rank) the results. Svemir Brkic 14:31, 5 November 2007 (UTC)
Ah... I can confirm that. I tested it, but the 11th and above search terms were not highlighted red, so didn't think they were included in the results. I'd still be interested in getting all search terms highlighted. Thanks for the reply! --Cedarrapidsboy 14:56, 5 November 2007 (UTC)
This seems to be fixed in the latest version of sphinx (0.9.8-rc2) - according to change log: fixed highlighting (uses 256 words by default now instead of former 10) Svemir Brkic 16:05, 28 March 2008 (UTC)

Everything is OK it seems, but no search-results

I've installed the Sphinx-extension on a 1.11 mediawiki with the most recent version of Sphinx. Everything seems to work OK, I installed the service (windows-server), tested the search on the command prompt, which gives results. When I go to special:searchSphinx, it displays OK.

The only thing is that nothing happens when I try to search something, it reloads and displays nothing. Do you have any idea what might be causing the problem? It also seems it cannot create a searchd.pid file & logfiles, although the search-indexes are created without a problem. --11 December 2007

Please clarify "reloads and displays nothing". Do you get a blank screen? In that case, you would need to check your PHP error log for clues. Or do you get the same thing as Mark describes below? Svemir Brkic 15:44, 5 January 2008 (UTC)
Hi Svemir, thanks for your response. I have the same problem as Mark. I discovered that if I disable the internal search, the Sphinx special page is suddenly gone (it also isn't visible in Special:Specialpages). If the internal search isn't disabled, the SphinxSearch special page is visible, but it doesn't return results. (I've got the same config as Mark, only the MediaWiki version is 1.11.) 213.132.179.227 10:03, 7 January 2008 (UTC)

We have the same problem. We're on Sphinx 0.9.7 (Win32) and MediaWiki 1.9.7 installed on Win2003 server. After starting search daemon, I can run test.php from command line and get back results there, but searches entered from Sphinx Special Page in the wiki just bounce you back to the main page. The URL looks like the search took place, however. For example, a search on the term "SAM" bounces you back to the wiki's main page, but now this query string appears on the URL: "sphinxsearch=SAM&fulltext=Search&match_all=0&ns0=1" --Mark price 01:26, 14 December 2007 (UTC)

Did you make any changes to your Sphinx search configuration file - SphinxSearch.php? You could also try making Sphinx the default search, just to verify whether the problem is in the search itself or in the way your wiki handles the paths in the "special page only" case. Svemir Brkic 15:44, 5 January 2008 (UTC)

We had the same problem when we tried using Sphinx-0.9.8-svn-r1112 (Jan 28, 2008 snapshot). Getting the previous version (Sphinx 0.9.7) solved the problem for us. 130.234.189.190 11:52, 30 January 2008 (UTC)

I just tested it with 0.9.8-svn-r1112 on MW 1.11 on Linux and it works correctly. I will try it on Windows at some point as well. Would you please make sure you were using the correct version of sphinxapi.php? It needs to be copied from your sphinx download/api folder into the SphinxSearch extension folder each time you change the version of sphinx on your system. If you still have issues with 0.9.8, please post your sphinx.conf somewhere so we can take a look. Svemir Brkic 19:55, 2 February 2008 (UTC)
I also tested it with 0.9.8-svn-r1112 on MW 1.11 on Linux, but mine is giving me the same error as the above users are experiencing. I am doing this on a shared hosting plan. I installed Sphinx to /home/myusername/local/sphinx. I can get everything to work on the command line via SSH, but not on the actual site over http -- it just returns me to the main page, despite the URL looking as though it should have performed the query. I have double-checked the version of sphinxapi.php and gone through all the installation instructions twice but I am dumbfounded. Also, the search daemon seems to stop running after about 5 or 10 minutes. The process will not stay alive any longer than that. I am not that great with Linux yet, but I am slowly becoming more familiar. I'd rather not bother installing an earlier version, so I would appreciate any suggestions. My sphinx.conf file appears below [UPDATE: removed to save space]. --Wikitonic 18:40, 13 February 2008 (UTC)
There seem to be two separate issues here. One is related to how your wiki is set up in general regarding the URLs etc. - please send me a link if you do not mind. Another issue is with the search daemon being stopped. Perhaps your shared host does not allow user processes to run longer than a certain amount of time, or use more than a certain amount of memory, etc. You should probably ask - maybe they are willing to make an exception if they know exactly what you are doing. Svemir Brkic 00:59, 14 February 2008 (UTC)
Ok, I will check with my webhost about user processes. In the meantime, you can find my wiki here. Unfortunately, it is locked to the general public right now since we haven't officially launched. Even viewing is disabled for anonymous users. I'll get in touch with you about letting you in, after I take care of the search daemon disruption issue. --Wikitonic 16:03, 14 February 2008 (UTC)
Well, I meant to give an update sooner but I guess now is better than never. My shared hosting plan doesn't allow user processes, so until it does, I guess everything is a moot point. I have removed my sphinx conf file so as not to take up unnecessary space. Thanks for your help anyway, Svemir Brkic. -Wikitonic 21:18, 3 March 2008 (UTC)

We have the same problem: SphinxSearch 0.5.3 with Sphinx 0.9.8-svn-r1112 on MediaWiki 1.11 on Windows 2003 Server. The problem is that MediaWiki 1.11 needs a "title" argument in the URL, like "/mediawiki/index.php?title=XXXX:SphinxSearch&search=foo&....". I changed SphinxSearch_body.php line 351 like this, and it works well.

          $wgOut->addHTML("<form action='$kiAction' method='GET'>
                 <input type='hidden' name='title' value='$titleObj'>
                 <input type='text' name='$searchField' maxlength='100' value='$SearchWord'>

--219.121.144.178 15:26, 13 March 2008 (UTC)

Interesting. This made me think I did not notice the problem because I am using "short" links. However, even after changing to the default index.php?title=... configuration, the search still worked. Can anyone else confirm that this fixes the search in their configuration? Svemir Brkic 11:26, 15 March 2008 (UTC)
This did not fix it for me, and I started getting parse errors. I changed the end like this:
          $wgOut->addHTML("<form action='$kiAction' method='GET'>
                 <input type='hidden' name='title' value='$titleObj'>
                 <input type='text' name='$searchField' maxlength='100' value='$SearchWord'>");
which cleared the parse errors but still just a blank page. 18:56, 12 May 2008 (UTC)

Issues discussed here are probably fixed in the version 0.6 (released today.) There was an issue with a ? vs. & in the URL, as well as using the correct special page title when internal search is disabled. Svemir Brkic 14:34, 25 August 2008 (UTC)
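
To illustrate the ? vs. & point for anyone still on an older release, here is a minimal sketch of appending search parameters to either URL style. The function name appendQuerySketch, the example paths, and the use of the Special namespace are assumptions for illustration; this is not the code that went into 0.6.

```php
<?php
// Hypothetical helper, not taken from the extension's source.
function appendQuerySketch($baseUrl, array $params)
{
    // A default-style URL (index.php?title=...) already has a query string,
    // so further parameters must be joined with '&'. A "short" URL has none,
    // so the query string has to start with '?'.
    $separator = (strpos($baseUrl, '?') === false) ? '?' : '&';

    return $baseUrl . $separator . http_build_query($params, '', '&');
}

// Example paths are assumptions; adjust them to your wiki's configuration.
$long  = appendQuerySketch('/mediawiki/index.php?title=Special:SphinxSearch',
                           array('sphinxsearch' => 'SAM'));
$short = appendQuerySketch('/wiki/Special:SphinxSearch',
                           array('sphinxsearch' => 'SAM'));
```

With the default URL style the title parameter stays in the base URL, which matches the observation above that MediaWiki needs "title" to be present for the special page to be reached.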

2008

Warnings on MW 1.11

Warning: Call-time pass-by-reference has been deprecated - argument passed by value; If you would like to pass it by reference, modify the declaration of [runtime function name](). If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. However, future versions may not support this any longer. in extensions/SphinxSearch_PersonalDict.php on line 75 (also lines 142 and 190) -71.217.0.96 08:18, 5 January 2008 (UTC)

Thanks for the notice. I have fixed this in the CVS already. If you are using a public release, all you need to do is remove the ampersands from all calls to readPersonalDictionary. The method is already declared correctly (the ampersands should stay there.) Svemir Brkic 13:15, 2 February 2008 (UTC)
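
A minimal sketch of the difference, using a hypothetical stand-in function because the real signature of readPersonalDictionary is not shown in this discussion: the ampersand belongs in the declaration, not at the call site.

```php
<?php
// Hypothetical stand-in for readPersonalDictionary(); the real method's
// parameters and body are assumptions here.
function readDictionarySketch(array &$words)
{
    // The & in the declaration is what makes $words a reference,
    // so changes made here are visible to the caller.
    $words[] = 'sphinx';
}

$dictionary = array('wiki');

// Deprecated call-time form that triggers the warning quoted above
// (and is a fatal error from PHP 5.4 on):
//     readDictionarySketch(&$dictionary);

// Correct form: no ampersand at the call site.
readDictionarySketch($dictionary);
```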

Installation issues

WARNING: key 'sql_group_column' is deprecated in /var/sphinx/sphinx.conf line 35; use 'sql_attr_uint' instead.
WARNING: key 'sql_group_column' is deprecated in /var/sphinx/sphinx.conf line 36; use 'sql_attr_uint' instead.

solution: replace sql_group_column by sql_attr_uint --24 March 2008

This depends on the version of Sphinx you are using. CVS version of the extension is already using sql_attr_uint, as it requires the latest version of Sphinx. Svemir Brkic 02:05, 6 April 2008 (UTC)
ERROR: unknown key name 'FROM' in /var/sphinx/sphinx.conf line 48 col 10.
FATAL: failed to parse config file '/var/sphinx/sphinx.conf'.

solution: merge line 48 with 47 making sure to remove the change of line / (not sure why this error) --24 March 2008

Weird. Maybe some invisible extra character sneaked in somehow. Anyway, I merged the lines in CVS, just in case. Svemir Brkic 02:05, 6 April 2008 (UTC)

Making title match appear first

Is there a way to do this? --25 March 2008

Greater weight is already given to title matches. Do you need any title match (even a partial one) to appear before any content match (even an exact one?) Svemir Brkic 02:19, 26 March 2008 (UTC)
YES. All direct title matches should appear first. This is not the case. --207.96.208.130 22:12, 1 April 2008 (UTC)

The Sphinx API has changed recently, and that probably caused title matches to show up so far down in the search results. The SetWeights method has been deprecated and the method to use now is SetFieldWeights. The new method expects an associative array of field names and weights. Until we release the next version, here is what you should try:

  • In SphinxSearch.php, change the line that sets $wgSphinxSearch_weights to:
$wgSphinxSearch_weights = array('old_text'=>1, 'page_title'=>100);
  • In SphinxSearch_body.php, change the $cl->SetWeights call in wfSphinxSearch method to:
$cl->SetFieldWeights($wgSphinxSearch_weights);

This will not make all the title matches appear before all the text matches, but it will probably give much better results. Svemir Brkic 21:13, 5 April 2008 (UTC)

Note: Above has been committed to CVS and released in 0.6beta3 package. Svemir Brkic 03:13, 10 May 2008 (UTC)

Did you mean?

I have SphinxSearch working on my site and it is fast! Now I am attempting to implement the "Did you mean?" function and get the following error on the search: " Uninitialized string offset: 0 in C:\path\to\SphinxSearch_spell.php on line 96 " line 96 is this:

           if ($value[0] == "&") {
               $correction = explode(" ",$value);
               $word = $correction[1];
               $suggstart = strpos($value, ":") + 2;
               $suggestions = substr($value, $suggstart);
               $suggestionarray = explode(", ", $suggestions);
               $guess = $this->bestguess($word, $suggestionarray);
               
               if (strtolower($word) != strtolower($guess)) {
                   $word_suggestions[$word] = $guess;
                   $this->suggestion_needed = true;
               }
           }
       }

Can anyone help with this? Steve Goble 14:06, 30 May 2008 (UTC)

You seem to have your PHP error reporting level set too high. If you have to have it that way, change the line to:
           if (substr($value, 0, 1) == "&") {
P. S. There will probably be other places where you get a similar warning... Svemir Brkic 14:51, 30 May 2008 (UTC)
P. P. S. Above tweak is now in the CVS and the latest 0.6 release. Svemir Brkic 15:10, 25 August 2008 (UTC)
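
A minimal sketch of why the substr() form is safer, assuming $value is one line of spell-checker output that may be empty. The function name is hypothetical and is not part of SphinxSearch_spell.php.

```php
<?php
// Hypothetical helper for illustration only.
function startsWithAmpersand($value)
{
    // $value[0] on an empty string raises "Uninitialized string offset: 0".
    // substr() on an empty string returns '' (or false on older PHP versions),
    // which simply does not equal "&", so no notice is raised.
    return substr($value, 0, 1) === "&";
}

$emptyLine      = startsWithAmpersand('');
$suggestionLine = startsWithAmpersand('& teh 2 0: the, ten');
$otherLine      = startsWithAmpersand('*');
```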

You are the man! That fixed the error but "Did you mean" does not work. The search is conducted but returns 0 matches for a misspelling. Crap! One thing after another. Steve Goble 15:16, 30 May 2008 (UTC)

I got it working. I have ASPELL on my FCKEditor and was trying to use that with the CLI. I just enabled php-pspell and that works well enough for me!! Thanks again, Svemir! BTW I am on a Windows 2008 box. Works great! Steve Goble 15:48, 30 May 2008 (UTC)

Problem with SphinxSearch_body?

I have installed Sphinx Search on a wiki (MW1.10), running on Ubuntu Hardy. When I go to the Special:SphinxSearch page, I get the following error...

Catchable fatal error: Object of class Title could not be converted to string in /var/www/wiki/extensions/SphinxSearch_body.php on line 456

Please advise. Eric --87.86.35.34 15:10, 11 July 2008 (UTC)

What version of PHP are you using? Title class has a __toString method, which in PHP 5 makes conversion to string possible. If you are using PHP 4, try changing any place that does echo $title; (or something like that) to echo $title->getPrefixedText(); Svemir Brkic 15:29, 11 July 2008 (UTC)
root@wiki:/var/www/wiki/extensions# php --version
PHP 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2 (cli) (built: May  9 2008 16:34:16) 
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

Eric --87.86.35.34 15:37, 11 July 2008 (UTC)

In that case maybe your version of MW does not have the __toString method in Title.class? If so, above fix would still work. I think I will update extension code to use that anyway. Svemir Brkic 16:50, 11 July 2008 (UTC)

Thanks for your continued help Svemir. I have tried to implement your fix: as far as I can tell, I changed any line that dereferences $title to a call to getPrefixedText(). I still get the error above though. The offending line 456 has no reference to class Title, and so far as I can tell neither do its backlinks.

454              $wgOut->addHTML("<form action='$kiAction' method='GET'>
455              <input type='hidden' name='title' value='$titleObj'>
456              <input type='text' name='$searchField' maxlength='100' value='$SearchWord'>
457              <input type='submit' name='fulltext' value='" . wfMsg('sphinxSearchButton') ."'>");

Do you have any suggestions on what I can change further? Thanks again, Eric --87.86.35.34 16:04, 16 July 2008 (UTC)

It seems to be the value='$titleObj' part. Try value='".$titleObj->getPrefixedText()."' 216.52.121.66 16:58, 16 July 2008 (UTC)
Problem solved! Good job mystery user! Thanks again Svemir - great job. Eric --87.86.35.34 08:55, 17 July 2008 (UTC)
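
A minimal sketch of the fix that worked, using a stand-in class because MediaWiki's real Title class is not available outside the wiki. TitleSketch is hypothetical, and the htmlspecialchars() call is an extra precaution that was not part of the suggestion above.

```php
<?php
// Minimal stand-in for MediaWiki's Title class, for illustration only.
class TitleSketch
{
    private $text;

    public function __construct($text)
    {
        $this->text = $text;
    }

    public function getPrefixedText()
    {
        return $this->text;
    }
}

$titleObj = new TitleSketch('Special:SphinxSearch');

// Writing value='$titleObj' inside a double-quoted string needs the object
// to define __toString(). Calling getPrefixedText() explicitly avoids that.
$hidden = "<input type='hidden' name='title' value='"
    . htmlspecialchars($titleObj->getPrefixedText(), ENT_QUOTES)
    . "'>";
```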