Extension talk:SphinxSearch/LQT Archive 1

Major failures -- help!
Lines 304 and 305 error out in SphinxSearch_body.php on version 1.8 of MediaWiki. Either I need to comment them out or upgrade to v1.11.

Even after upgrading, line 171 errors with Fatal error: Call to undefined method SphinxClient::SetFilter in C:\wamp\www\wiki\extensions\SphinxSearch_body.php on line 171

Commenting it out doesn't help, since it then gives me a "Fatal error on the DB" or something like that. I'm thinking I need to install some package, but I don't know which one (something in PEAR?)

Help :(


 * It looks like you are running this extension on Windows. To the best of my knowledge, this extension has not been tried on Windows yet. But, at least in theory, there is nothing that should prevent it from working. Of course, the big differences are all path related. So, let's first start by making sure your setup is correct. Were you successfully able to perform steps 3, 4, and 5? Can you also please verify that step 7 was done correctly and that you have the sphinxapi.php file in your C:\wamp\www\wiki\extensions\ directory? --Gri6507 12:20, 17 October 2007 (UTC)


 * I kinda solved this; upgrading to v1.11 of course takes care of lines 304 & 305. As for the SetFilter method, apparently the rc1 of sphinxapi doesn't have it. I'm trying to copy-paste the method into my API as a first option, and then I will try to build the current non-production API. Will keep you informed.


 * Thanks for looking into this. I am running my installation with MW 1.9.3, and I don't know which version Svemir (the other developer) is running. I will start a new section on the main page with information about known supported MW versions. As for sphinxapi.php being incorrect, I am assuming you are using v0.9.8rc1? Both Svemir and I based this extension on 0.9.7 (the latest stable release). We'll keep a keen eye on the Sphinx project to make sure that our extension stays completely compatible with future versions of Sphinx.


 * On a side note, I was wondering if you have implemented the windows equivalent of setting up the cron jobs to keep the indexes up to date. If you have, can you please add that information to the documentation? We would much appreciate it! --Gri6507 12:36, 18 October 2007 (UTC)
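 * In case it helps while that documentation is missing, the Windows equivalent could be a small batch file registered with the Task Scheduler. The following is only a sketch under assumed paths (C:\sphinx is illustrative, and the exact way to stop searchd depends on how it was started):

```bat
rem reindex.bat -- illustrative sketch: stop searchd, rebuild the index, restart
rem (on Windows, searchd must be stopped before the index files are replaced)
taskkill /IM searchd.exe
C:\sphinx\bin\indexer.exe --config C:\sphinx\sphinx.conf --all
start "" C:\sphinx\bin\searchd.exe --config C:\sphinx\sphinx.conf
```

 * It could then be scheduled every 15 minutes with something like schtasks /create /sc minute /mo 15 /tn SphinxReindex /tr C:\sphinx\reindex.bat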


 * I'm still stuck and couldn't make much progress. Apparently the line
 * $sql = "SELECT old_text FROM ".$db->tableName('text')." WHERE old_id=".$docinfo['attrs']['old_id']; ends up with the value of $sql being SELECT old_text FROM 'text' WHERE old_id= (with nothing after the equals sign)


 * I'm not sure why text is in single quotes in the first place (looks like some bug to me), or why the old_id is not getting picked up. Searchd does show the hit coming to it, but it could be failing because I'm using a hacked version of the API.
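 * For reference, the failing query construction can be sketched like this (a minimal reconstruction, not the actual extension code — build_text_query and its guard are made up for illustration; SphinxSearch_body.php inlines this logic). The backquoted table name from $db->tableName() is normal; the empty old_id suggests the searchd result carried no old_id attribute at all:

```php
<?php
// Minimal sketch of the query building described above. The names here
// are illustrative, not from SphinxSearch_body.php.
function build_text_query(string $tableName, array $docinfo): string {
    if (!isset($docinfo['attrs']['old_id']) || $docinfo['attrs']['old_id'] === '') {
        // This is the reported situation: old_id is missing, so the
        // WHERE clause would end with "old_id=" and the query is invalid.
        throw new RuntimeException('searchd result has no old_id attribute');
    }
    return 'SELECT old_text FROM ' . $tableName
         . ' WHERE old_id=' . intval($docinfo['attrs']['old_id']);
}
```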


 * Also, to answer Gri6507's question, I'm using 0.9.6 rc1 because that's the one that has the Windows binaries. I don't have Visual Studio or VC++ to compile from the source code, so even my step #2 (using the latest version and compiling) is on hold.


 * Adding the Windows cron job shouldn't be too tough (my guess); I'll try it and let you know -- all the above posts brought to you by the guy with the oh-so-useful signature Help :(


 * According to Sphinx's website, http://www.sphinxsearch.com/downloads/sphinx-0.9.7-win32-release.zip is a windows release of 0.9.7. Is there any reason you are not using it? --Gri6507 11:42, 19 October 2007 (UTC)


 * It doesn't seem to contain sphinxapi.php -- that's the reason I had to choose an older version. This should probably be reported on the developer's website, saying the API is missing from the 0.9.7 Windows release, but I'm too lazy... any helpers? :)


 * I have updated Step #1 and Step #7 with details of how to obtain the sphinxapi.php for Windows. It seems that the intent of the Win32 release binaries package is to only contain the binary EXEs. The PHP files are in either the source code or the API packages. --Gri6507 11:34, 25 October 2007 (UTC)

Fails because it is trying to search a non-existent table Zebee Johnstone 01:48, 26 September 2007 (UTC)
Set up as described, but the table names in the version I have, MediaWiki 1.11, are (Tables_in_wikidb):
 * mw_archive
 * mw_blobs
 * mw_brokenlinks
 * mw_categorylinks
 * mw_cur
 * mw_hitcounter
 * mw_image
 * mw_imagelinks
 * mw_interwiki
 * mw_ipblocks
 * mw_links
 * mw_linkscc
 * mw_logging
 * mw_math
 * mw_objectcache
 * mw_old
 * mw_oldimage
 * mw_querycache
 * mw_recentchanges
 * mw_searchindex
 * mw_site_stats
 * mw_user
 * mw_user_newtalk
 * mw_user_rights
 * mw_validate
 * mw_watchlist

so it isn't finding "wiki_page": ERROR: sql_query: Table 'wikidb.wiki_page' doesn't exist (DSN=mysql://wikiuser:***@localhost:3306/wikidb).

So which of my tables are the equivalents you are searching?


 * I believe you are getting this error when trying to run the Sphinx indexer or search tools? If that's the case, then you missed a step in the instructions. In step 2, when you create the sphinx.conf file, you need to make sure to replace all instances of wiki_ with whatever your table prefix is (in your case, it looks like mw_). That's what I was trying to explain in the text after the sphinx.conf content listing. Please let me know if I should rephrase that paragraph. --Gri6507 12:02, 26 September 2007 (UTC)
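 * As an illustration only (the values are placeholders, not a working config), the relevant part of sphinx.conf with an mw_ prefix would look something like:

```
source src_wiki_main
{
    type     = mysql
    sql_host = localhost
    sql_user = wikiuser
    sql_pass = password
    sql_db   = wikidb

    # every wiki_ table name from the stock config becomes mw_ here
    sql_query = SELECT ... FROM mw_page, mw_revision, mw_text WHERE ...
}
```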

Cleanup required
Step 2 is confusing because Sphinx already comes with a config file -- this needs to be made clearer.

Steps 1 through 9 are also not laid out very well. Decent headings would be nice.


 * I made steps 2 and 9 a little clearer. Let us know if you have some additional suggestions. Svemir Brkic 12:39, 19 October 2007 (UTC)

Failed at step 7
Everything worked fine up to step 6.

Having require_once( "$IP/extensions/SphinxSearch.php" ); in LocalSettings.php brings up the following messages (from the PHP error log file):

PHP Warning: Call-time pass-by-reference has been deprecated - argument passed by value; If you would like to pass it by reference, modify the declaration of [runtime function name]. If you would like to enable call-time pass-by-reference, you can set allow_call_time_pass_reference to true in your INI file. However, future versions may not support this any longer. in E:\- Daten -\- MyWebSite -\HydroWiki\extensions\SphinxSearch.php on line 88

and

PHP Fatal error: Cannot redeclare class UnlistedSpecialPage in  wikiroot\includes\SpecialPage.php on line 703

Can you also comment on $wgSphinxSearch_index = "wiki";? Does it have to be replaced by the name of the sphinx.conf file or by one of the generated indexes?


 * The original code used a call-time pass-by-reference in one place. This has been fixed in recent versions. $wgSphinxSearch_index needs to contain the name of one of the generated indexes, as specified by a line such as "index wiki {" in the sphinx.conf file. Svemir Brkic 12:32, 5 October 2007 (UTC)
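 * For anyone hitting the same warning, the difference between the deprecated call-time style and the fixed declaration-time style can be shown with a tiny illustrative example (addToList is made up for this sketch, not from the extension):

```php
<?php
// The reference belongs in the function DECLARATION, not at the call site.
function addToList(array &$list, string $item): void {
    $list[] = $item;  // modifies the caller's array through the reference
}

$pages = [];
addToList($pages, 'Main_Page');     // correct: no & at the call site
// addToList(&$pages, 'Main_Page'); // call-time pass-by-reference: the
//                                  // deprecated form PHP warns about
```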

Alternative version
Modified version of this extension is currently in use at the New World Encyclopedia. Changes are explained at the above link and all the source files are linked from there too. Feel free to comment/use any of the code and let me know if I am not attributing somebody properly. Thanks. Svemir Brkic 03:35, 5 October 2007 (UTC)

Version 0.3 of our modified extension is available at the above link. We have further changed the main sphinx.conf query to use page_id as the primary document key. This avoids duplicate results when incremental indexing is used. Svemir Brkic 01:22, 8 October 2007 (UTC)
 * This version has merged with this extension starting with v0.3. --Gri6507 21:58, 12 October 2007 (UTC)


 * Honestly, I think this is not an alternate version, but rather an example of the search in use. I went to the website some time ago and it didn't give me details of what was modified, etc. So I don't really know whether it is modified in the first place.


 * It was an alternate version on October 5, when the comment was written. At that time the page also described what changes were made, etc. Since then, those changes were merged into the official version, as pointed out on the line just above your comment. Svemir Brkic 13:11, 25 October 2007 (UTC)

Apache
Any expectation that Sphinx will come out with a version to work under Apache? Svanslyck 22:42, 29 October 2007 (UTC)


 * I am not aware of any problems with Apache. It certainly works for me (on Apache 2). There is nothing in the extension code that would make it Apache-specific, and the Sphinx search engine itself has no dependency on the web server you use. They do provide an add-on for MySQL that lets you use Sphinx via database queries, but that does not have much to do with Apache either. Perhaps you meant something else? 75.75.36.158 23:35, 29 October 2007 (UTC)


 * I too am running under Apache 2.x without any problems. What kind of issues are you having? --Gri6507 23:55, 29 October 2007 (UTC)

Directory Structure
How about making it so that Sphinx is in its own subdirectory in the extensions directory so that things will be cleaner? (I have a lot of extensions and each has its own directory.) Also, could sphinxapi.php be included in the SphinxSearch tar so that it's one-stop shopping? Although this could just be implemented in a script if you guys go that way. --SellFone 21:22, 30 October 2007 (UTC)


 * This is an idea I have toyed around with for some time now. As SphinxSearch has grown to include more and more files, the appeal of having it reside in its own directory has increased. I am currently working on the automatic installation & configuration script for the extension. Perhaps I will make it install the extension in its own directory. Thanks for the suggestion! --Gri6507 23:15, 30 October 2007 (UTC)

Windows --rotate workaround?
I want Sphinx to update our index very often. If I could, I would love for the index to be incrementally updated every time the db changes. Barring that, I'd install a task to run every 15 minutes or so. However, no matter how often I update the index, I need to take down the Sphinx daemon to do it (limitation on Windows). Can anyone suggest a workaround or modification to the code such that a search request, when the daemon isn't running, waits for it to respond and re-searches? I don't so much care about restarting the daemon. I do care about search appearing broken while the daemon is down.
 * I am not sure if it is going to work, but here's what I'd try. Open the sphinxapi.php file. In function _Connect, around line 136, there is a call to


 * change that to


 * where the 30 is the timeout in seconds for establishing the connection. The basic idea is that if searchd is not running, no one will be listening on the other end of the socket until searchd comes back to life. This change should keep MW from dying during that brief period of time. Of course, it would be up to you to make sure that
 * before running the indexer, you must stop searchd
 * after running the indexer, you must restart searchd
 * Let me know if that works :-) --Gri6507 22:22, 3 November 2007 (UTC)


 * That looks promising. I'll give it a try. Since my post, I installed the search on a separate machine (Linux) and that works pretty well. But this procedure may be what I need to reduce the number of servers in the equation. I'll come back with results. --Cedarrapidsboy 14:25, 5 November 2007 (UTC)
 * UPDATE - the above code change didn't appear to have an effect. The search still timed out to a blank page.
 * Stop searchd
 * Issue search request
 * Start searchd (within 30 sec)
 * --205.175.225.24 16:30, 5 November 2007 (UTC)


 * Ok. I think I found the issue. According to PHP documentation, the fsockopen function may not honor the timeout ("Note: Depending on the environment, the Unix domain or the optional connect timeout may not be available."). So, to work around that, change the following code in sphinxapi.php


 * to


 * This way, you can set the waiting period yourself via the use of $connect_timeout variable. I tested this on my machine and it seems to work as expected. Please post your results when you try it out. --Gri6507 23:05, 5 November 2007 (UTC)
 * Unfortunately, same result. Blank page.  I did the following:
 * Kill searchd
 * Issue search request
 * Start searchd
 * In this case, searchd was still running on a separate machine.
 * --Cedarrapidsboy 20:20, 6 November 2007 (UTC)
 * UPDATE!
 * Here's a change to the code that works:


 * I added an additional timeout. Without it, a single connection was waiting for 30 seconds, just as long as the entire loop.  The previous code never tried the connection again.  This code *did* work for me using the testing steps above.  --Cedarrapidsboy 20:30, 6 November 2007 (UTC)
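 * Since the code blocks in this thread were lost from the archive, here is a hedged reconstruction of the kind of retry loop being described: a short per-attempt timeout inside a longer overall window, so the connection is retried until searchd comes back. Names and structure are illustrative, not the actual sphinxapi.php patch:

```php
<?php
// Illustrative retry loop: keep attempting fsockopen() with a short
// per-attempt timeout until either a connection succeeds or the overall
// window expires. Returns the socket handle, or false on failure.
function connect_with_retry(string $host, int $port,
                            int $overall_timeout = 30,
                            int $attempt_timeout = 1) {
    $deadline = time() + $overall_timeout;
    do {
        $fp = @fsockopen($host, $port, $errno, $errstr, $attempt_timeout);
        if ($fp !== false) {
            return $fp;      // searchd answered
        }
        sleep(1);            // brief pause before the next attempt
    } while (time() < $deadline);
    return false;            // searchd never came back within the window
}
```

 * The key point, as noted above, is that the per-attempt timeout must be shorter than the overall window, otherwise a single blocked connection attempt consumes the whole window and no retry ever happens.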


 * Glad to see that it's working for you! I will submit this as an improvement suggestion to the developers of Sphinx. --Gri6507 20:41, 6 November 2007 (UTC)

Running on separate machines
The installation directions seem to assume that MediaWiki and searchd are on the same machine. How should I configure Sphinx and the extension if MediaWiki (and its database) are hosted separately from where Sphinx is installed? --Emufarmers 04:43, 5 November 2007 (UTC)
 * You are correct. I should update the main page with these instructions. To make this extension work in your case you will need to configure sphinx.conf. In that file, modify the src_wiki_main section to specify the correct sql_host = hostname, where hostname is the name of the machine running the MySQL database for your wiki (default is localhost). This way, you can install Sphinx on the same machine as the web server (as opposed to the MySQL server), and all instructions as listed on the main page are still valid.
 * Please let me know if you have any more questions, or, for that matter, if this worked for you. --Gri6507 13:05, 5 November 2007 (UTC)
 * Hi, sorry for the delay. I should have been more clear: My wiki is on shared hosting, so I can't install Sphinx on its server.  I run the search backend (presently Lucene, but Sphinx sounds promising) on a machine in my home.  With Lucene, my backend machine SSHs into the webserver, grabs a dump of the wiki, indexes it, and then runs search queries from the webserver through the index it generates and sends the results back to the webserver.  It's a rather convoluted setup (and it's even messier when it comes to updating the index), but I'm wondering if I can do anything along the same lines here. --Emufarmers 20:08, 11 November 2007 (UTC)
 * Ok, I understand your setup. The sphinx.conf file can be configured to make searchd run on a different machine (let's call it Machine S, for Sphinx) from the machine running MySQL (let's call it Machine M, for MySQL). However, in that case Machine S has to have network access to Machine M. My guess is that something similar to your present setup with SSH tunnels could be done here as well. If you are interested in trying this out, please let me know via email (see extension credits) and we can work through these questions then. --Gri6507 22:17, 11 November 2007 (UTC)
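 * One hypothetical arrangement for the shared-hosting case (untested here; hostnames and ports are placeholders) would be an SSH tunnel from Machine S to the web host, with sphinx.conf pointed at the local end of the tunnel:

```
# on Machine S: forward local port 3307 to MySQL on the web host
ssh -N -L 3307:localhost:3306 user@webhost.example.com

# then in sphinx.conf on Machine S:
#   sql_host = 127.0.0.1
#   sql_port = 3307
```

 * This only works if the host allows SSH access and MySQL is reachable as localhost from within the SSH session.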

Sphinx Search Terms Limit
It seems that the Sphinx search only accepts 10 search terms. Perhaps it is the same story for the built-in MW search? Any way to change that? Perhaps make it unlimited? Cedarrapidsboy 13:50, 5 November 2007 (UTC)


 * I have not looked at the code yet, but the limit seems to apply only to the number of separate words and counts displayed at the top of the search results. That list shows only up to 10 words, but the eleventh word I used was also used to filter (and rank) the results. Svemir Brkic 14:31, 5 November 2007 (UTC)


 * Ah... I can confirm that. I tested it, but the 11th and higher search terms were not highlighted red, so I didn't think they were included in the results. I'd still be interested in getting all search terms highlighted. Thanks for the reply! --Cedarrapidsboy 14:56, 5 November 2007 (UTC)