Extension talk:SphinxSearch
From MediaWiki.org
Old discussion points relevant only to older versions of the extension, Sphinx, or MediaWiki have been moved to the archive page.
Running on separate machines
The installation directions seem to assume that MediaWiki and searchd are on the same machine. How should I configure Sphinx and the extension if MediaWiki (and its database) are hosted separately from where Sphinx is installed? --Emufarmers 04:43, 5 November 2007 (UTC)
- You are correct. I should update the main page with these instructions. To make this extension work in your case, you will need to configure sphinx.conf. In that file, modify the src_wiki_main section to set sql_host = hostname, where hostname is the name of the machine running the MySQL database for your wiki (the default is localhost). This way, you can install Sphinx on the same machine as the web server (as opposed to the MySQL server), and all instructions on the main page remain valid.
- Please let me know if you have any more questions, or, for that matter, if this worked for you. --Gri6507 13:05, 5 November 2007 (UTC)
- Hi, sorry for the delay. I should have been more clear: My wiki is on shared hosting, so I can't install Sphinx on its server. I run the search backend (presently Lucene, but Sphinx sounds promising) on a machine in my home. With Lucene, my backend machine SSHs into the webserver, grabs a dump of the wiki, indexes it, and then runs search queries from the webserver through the index it generates and sends the results back to the webserver. It's a rather convoluted setup (and it's even messier when it comes to updating the index), but I'm wondering if I can do anything along the same lines here. --Emufarmers 20:08, 11 November 2007 (UTC)
- OK, I understand your setup. The sphinx.conf file can be configured so that searchd runs on a different machine (let's call it Machine S, for Sphinx) from the machine running MySQL (let's call it Machine M, for MySQL). However, in that case Machine S needs network access to Machine M. My guess is that something similar to your present setup with SSH tunnels could be done here as well. If you are interested in trying this out, please let me know via email (see extension credits) and we can work through these questions then. --Gri6507 22:17, 11 November 2007 (UTC)
Showing X of Y documents in search results, but document links not showing
- "Displaying 0-0 of 0 matches for query specifi* retrieved in 0.000 sec with following stats: specifi* found 8 times in 8 documents".
- Your query indicates that you are using a "star" search, which is not enabled by default in sphinx (it is a new, almost undocumented feature.) Our default sphinx.conf does not enable it either. Maybe you are using a different config file for command line search? To enable it in the config file used by the extension, you need to add this to the main index section:
min_infix_len = 1
enable_star = 1
- After this, you need to do a full reindex and restart the searchd (--rotate alone is not enough when you change the config file.) Svemir Brkic 20:57, 2 February 2008 (UTC)
Problems with search within page title / SOLVED
Hi, I installed SphinxSearch on Ubuntu and it is working very well. Thanks for this extension! I had some problems with title search. Originally, the suggested query (the default value in the installation package) was:
#sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
# FROM mw_page, mw_revision, mw_text \
# WHERE rev_id=page_latest AND old_id=rev_text_id
The page_title content in the database looks like 'HOW_TO_edit_homepage'. There are underscores between the words (MediaWiki replaces spaces with underscores when a new page is created). I changed the query (for both the initial and the incremental index):
initial index:
sql_query = SELECT page_id, replace(page_title,'_',' ') as page_title, page_namespace, old_id, old_text \
FROM mw_page, mw_revision, mw_text \
WHERE rev_id=page_latest AND old_id=rev_text_id
incremental updates:
SELECT page_id, replace(page_title, '_',' ') as page_title, page_namespace, old_id, old_text FROM mw_page, mw_revision, mw_text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_touched>=DATE_FORMAT(CURDATE(), '%Y%m%d070000')
Now it works. --Wikigeil 17:15, 21 January 2008 (UTC)
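A minimal PHP sketch of what the modified queries change: replace(page_title,'_',' ') hands Sphinx the title with spaces instead of underscores. The helper name titleForIndex is hypothetical and not part of the extension; in the setup above the conversion happens inside MySQL, in the query itself.

```php
<?php
// Hypothetical helper mirroring the SQL expression replace(page_title, '_', ' ').
// MediaWiki stores page titles with underscores in place of spaces.
function titleForIndex(string $pageTitle): string
{
    return str_replace('_', ' ', $pageTitle);
}

echo titleForIndex('HOW_TO_edit_homepage'), "\n"; // HOW TO edit homepage
```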
- I do not think this is necessary. Perhaps there was another issue with your setup and it got resolved while you were changing the queries. Look at this line in the suggested sphinx.conf file:
charset_table = 0..9, A..Z->a..z, _-> , a..z, \
- Among other things, this instructs Sphinx to consider an underscore the same as a space. Perhaps you should try again with the original queries, as they would probably run faster. On the other hand, I might be misunderstanding what you were trying to fix, so please let me know if that is the case. Svemir Brkic 13:05, 2 February 2008 (UTC)
- Hi Svemir, thanks for the response. The charset_table property for the main index is:
# charset definition and case folding rules "table"
charset_table = 0..9, A..Z->a..z, _-> , a..z, \
U+C0->a, U+C1->a, U+C2->a, U+C3->a, U+C4->a, U+C5->a, U+C6->a, \
U+C7->c,U+E7->c, U+C8->e, U+C9->e, U+CA->e, U+CB->e, U+CC->i, \
U+CD->i, U+CE->i, U+CF->i, U+D0->d, U+D1->n, U+D2->o, U+D3->o, \
U+D4->o, U+D5->o, U+D6->o, U+D8->o, U+D9->u, U+DA->u, U+DB->u, \
U+DC->u, U+DD->y, U+DE->t, U+DF->s, \
U+E0->a, U+E1->a, U+E2->a, U+E3->a, U+E4->a, U+E5->a, U+E6->a, \
U+E7->c,U+E7->c, U+E8->e, U+E9->e, U+EA->e, U+EB->e, U+EC->i, \
U+ED->i, U+EE->i, U+EF->i, U+F0->d, U+F1->n, U+F2->o, U+F3->o, \
U+F4->o, U+F5->o, U+F6->o, U+F8->o, U+F9->u, U+FA->u, U+FB->u, \
U+FC->u, U+FD->y, U+FE->t, U+FF->s,
- It contains underscore too. --Wikigeil 17:34, 4 February 2008 (UTC)
- Yes, that is why I think the replace function in the query is not necessary. Everything works fine without it, with the original suggested queries. Svemir Brkic 04:21, 6 February 2008 (UTC)
- Ugh, I guess I was too convinced that searches worked correctly in our case - or that they could not be much better. Having the underscore in the charset table makes it a regular character, so titles do not get indexed properly. If "_-> ," is removed, titles get indexed correctly and replace function is not needed anymore. Of course, if you do want to index words with underscores (maybe if your wiki contains lots of code examples with underscores in function names?) you should replace "_-> ," with just "_," and still use the replace function in the query. Svemir Brkic 15:19, 8 May 2008 (UTC)
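The effect described above can be sketched with a rough PHP model of tokenization: characters listed in charset_table are word characters, and anything else separates words. This is a simplified, ASCII-only illustration under that assumption, not Sphinx's actual tokenizer; the $underscoreIsWordChar flag stands for having "_" in the table (either as "_-> ," or as "_,").

```php
<?php
// Rough model of how charset_table drives word splitting: characters in the
// table are word characters, everything else is a separator.
// Simplified ASCII-only illustration, not Sphinx's real tokenizer.
function tokenize(string $text, bool $underscoreIsWordChar): array
{
    $wordChars = $underscoreIsWordChar ? 'a-z0-9_' : 'a-z0-9';
    // Case folding, like A..Z->a..z in charset_table.
    $text = strtolower($text);
    return preg_split('/[^' . $wordChars . ']+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// "_" in the table: the whole title is indexed as one long word.
echo implode(' | ', tokenize('HOW_TO_edit_homepage', true)), "\n";  // how_to_edit_homepage
// "_" not in the table: the title becomes separate searchable words.
echo implode(' | ', tokenize('HOW_TO_edit_homepage', false)), "\n"; // how | to | edit | homepage
```

This is why dropping "_-> ," makes the replace function unnecessary, while keeping "_," (to index identifiers such as function_names) requires doing the replacement in the query instead.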
Searching multiple wikis
I currently have 3 wikis indexed with Sphinx. The search works well, but it is returning results for all of them. I've set the "$wgSphinxSearch_index = X"; line in the SphinxSearch.php. Am I missing something? --N0ctrnl 20:28, 24 March 2008 (UTC)
- The default setup assumes there will be one main index and one incremental index. Searches are performed with "*" to indicate all available indexes ($wgSphinxSearch_index indicates which one is the main index). With the latest versions of the Sphinx API it is possible to do this in a better way, but for now perhaps you can work around it by running a separate searchd instance for each wiki, each on its own port. Svemir Brkic 02:14, 26 March 2008 (UTC)
- I kinda suspected that'd be the answer. Not what I'd hoped, but I suppose it's not all that bad. Thanks very much for the reply. --N0ctrnl 13:02, 26 March 2008 (UTC)
Installation issues
Problem 1
Query failed: connection to localhost:3312 failed
I can search from the command line, but it does not seem to work from the site. Any idea what I am missing?
Solution:
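I used port 3306, which I thought was Ubuntu's default (3306 is actually MySQL's default port; see Problem 2).
Make sure to run /usr/local/bin/searchd, and it works! Also make sure 0.0.0.0 is used as the address; it should be the default, but it is not. --24 March 2008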
Problem 2
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 775173422 bytes) in .../extensions/SphinxSearch/sphinxapi.php on line 311
Solution: This problem was caused by the solution to Problem 1, that is, using port 3306, which is already used by MySQL.
see http://sphinxsearch.com/forum/view.html?id=1178 --24 March 2008
Namespaces for weight
Is it possible to assign a weight to each namespace, so that the checkboxes can be removed altogether? --25 March 2008
- This is possible and I will look into it for the next release. It will not replace the need for checkboxes, as in some cases you want the users to specify which namespaces they are interested in. Svemir Brkic 02:17, 26 March 2008 (UTC)
- Amazing! --207.96.208.130 21:24, 26 March 2008 (UTC)
match all words as default
Not sure if this is something that is controlled by Sphinx or not, but it would be better for me if "match all words" was the default. Thanks! --27 March 2008
I set the matching mode to:
SPH_MATCH_ALL
By default it is SPH_MATCH_EXTENDED, but it does not work the way http://www.sphinxsearch.com/doc.html#extended-syntax describes. --1 April 2008
- The point of having the SphinxSearch.php file separately is so that you can set things such as $wgSphinxSearch_mode for yourself. The reason we use SPH_MATCH_EXTENDED is to be able to modify the query internally and do "match all" vs. "match any" with a radio box instead of teaching users the Sphinx query syntax. Svemir Brkic 03:08, 6 April 2008 (UTC)
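A minimal sketch of the idea of rewriting the query internally, assuming Sphinx's extended syntax, where space-separated words must all match and "|" means OR. The function buildExtendedQuery is hypothetical, not the extension's actual code, and a real version would also need to escape Sphinx operator characters (such as @, -, ! and quotes) in the user's input.

```php
<?php
// Hypothetical sketch: turn plain user input into an extended-syntax query,
// so a radio box can choose "match all" or "match any".
// In extended mode, space-separated words are ANDed and "|" means OR.
function buildExtendedQuery(string $input, bool $matchAll): string
{
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    return implode($matchAll ? ' ' : ' | ', $words);
}

echo buildExtendedQuery('sphinx  search extension', true), "\n";  // sphinx search extension
echo buildExtendedQuery('sphinx  search extension', false), "\n"; // sphinx | search | extension
```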
@page_title DOES NOT WORK
For some reason @old_text works, but @page_title never returns any results... maybe it has something to do with the new releases of Sphinx. --207.96.208.130 22:22, 1 April 2008 (UTC)
- This syntax is only mentioned in a TODO list on the main page of this extension. We do not pass on everything you enter to sphinx. We generally try to create web interface to specify advanced options. Svemir Brkic 02:07, 6 April 2008 (UTC)
- Until we provide a user-friendly interface for @page_title searches, make sure to select "match all words" when submitting your search. You can make that the default if you download version 0.6beta3 and uncomment the line in SphinxSearch.php that sets $wgSphinxMatchAll to '1'. Svemir Brkic 03:10, 10 May 2008 (UTC)
Can someone show how to edit sphinx.conf correctly?
I am having issues with my sphinx.conf. I have it mostly working, however when I attempt to index my wiki I get an error regarding my sql_query_pre = wiki_ entry in the sphinx.conf: "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'wiki_' at line 1". wiki_ is a standard MediaWiki table prefix so I am unsure why it throws an error. I found the other presets here. --137.71.23.54 5 August 2008
# pre-query, executed before the main fetch query
sql_query_pre = SET NAMES utf8
- Now, I don't know what's in your sphinx.conf, but based on mine, sql_query_pre is not the place to put your database prefix. Take another look at step 2; you have to go through sphinx.conf and add your prefix to the table names (but not the column names!). It's annoying, but I eventually figured out that there are annoyances all over the place with MediaWiki if you use a prefix. My advice is to bite the bullet and get rid of the prefix now if that's an option. —Emufarmers(T|C) 03:51, 6 August 2008 (UTC)
Is there any way to prevent Sphinx from indexing particular pages?
I realize this runs counter to what most people would want, but some pages don't need to be indexed. I've made some reasonable searches here and on the Sphinx site, and believe this is more relevant to a MediaWiki discussion than Sphinx in general. Jon Doran, 9 May 2008
- You could modify the query in sphinx.conf to filter out any pages you do not want. It could be done based on namespace, a join with some other table (e.g. categorylinks,) or some new field or table you would create yourself. Svemir Brkic 01:23, 10 May 2008 (UTC)
- Thanks for the suggestions. I did not consider the query, but now that you mention it, there is a lot I can do with it. Jon Doran, 10 May 2008
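As an example of the namespace-based approach, here is a hypothetical PHP helper that generates a sql_query line excluding some namespaces. It reuses the mw_ table prefix from the query quoted earlier on this page (adjust it to your own prefix) and assumes the standard MediaWiki namespace numbers (2 = User, 8 = MediaWiki). After changing the query, a full reindex would be needed.

```php
<?php
// Hypothetical generator for the sql_query line in sphinx.conf that leaves
// out pages in unwanted namespaces. Table names use the mw_ prefix from the
// example query on this page; namespace IDs are MediaWiki's numeric ones.
function buildSqlQuery(array $excludedNamespaces): string
{
    $sql = 'SELECT page_id, page_title, page_namespace, old_id, old_text'
         . ' FROM mw_page, mw_revision, mw_text'
         . ' WHERE rev_id=page_latest AND old_id=rev_text_id';
    if ($excludedNamespaces) {
        // Cast to int so only numeric namespace IDs end up in the SQL.
        $ids = implode(',', array_map('intval', $excludedNamespaces));
        $sql .= " AND page_namespace NOT IN ($ids)";
    }
    return $sql;
}

// Paste the printed line into the source section of sphinx.conf.
echo 'sql_query = ', buildSqlQuery([2, 8]), "\n";
```

A join with categorylinks, as suggested above, would follow the same pattern with an extra NOT EXISTS or LEFT JOIN condition.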
How to search Chinese
I'm the admin of a Chinese wiki, and I have configured sphinx.conf following all the steps. Now I can search English words correctly, but if I search only Chinese words like "注册", I get nothing. I have to search mixed terms like "sbc 注册" to get the right results. Can anyone help me? --Fzy 163 01:21, 15 May 2008 (UTC)
- Initial term check was too strict and it would not let Unicode-only strings through. I have changed it in the CVS, but you can see below how to fix it in the version you are currently using. Svemir Brkic 13:14, 16 June 2008 (UTC)
- Thanks a lot, it works now ^_^ --23 June 2008
More Windows Install Issues
Please excuse my ignorance: how do we poor Windows users perform steps 5 and 6?
Steve Goble 18:34, 14 May 2008 (UTC)
Anyone?????
- Answer is:
- Step 5 =
C:\path\to\searchd.exe --install --config C:\path\to\sphinx.conf
- --30 May 2008
- I tried this option, adding it to the startup group and making a service out of it, but it didn't work for me.
- I recommend using the Task Scheduler instead: create a batch file and set it up to run each time Windows boots. Instead of --install (given above), use:
path/to/sphinx/installation/searchd --config /path/to/sphinx.conf
- --16 July 2008
- Step 6 = Windows Task Scheduler --30 May 2008
Having trouble with the Sort By command
I'm trying to use the Sort By command so that I can see the latest postings on our wiki in the search, but I'm not sure of the syntax. What should it be? $wgSphinxSearch_sortby = "SPH_SORT_TIME_SEGMENTS, 'rev_timestamp'"; doesn't seem to work. I added rev_timestamp to the SQL query for the indexing, but no luck. Can someone help me? --6 June 2008
- The way the code currently works, it always uses SPH_SORT_EXTENDED as the sort mode, and only uses $wgSphinxSearch_sortby as a second argument in the SetSortMode call. I will make this more flexible, but until then you can edit SphinxSearch_body.php and find this line:
$cl->SetSortMode(SPH_SORT_EXTENDED, $wgSphinxSearch_sortby);
- Set your $wgSphinxSearch_sortby to 'rev_timestamp' and change above line to:
$cl->SetSortMode(SPH_SORT_TIME_SEGMENTS, $wgSphinxSearch_sortby);
- Svemir Brkic 02:52, 13 August 2008 (UTC)
Blank search check
In function «wfSphinxSearch», please replace the line
if (!preg_match('/[\w\d]/', $term)) {
with the line
if (!preg_match('/[\w\pL\d]/u', $term)) {
because the first variant ignores non-Latin Unicode queries (for example, Russian search terms). --StasFomin 16:25, 11 June 2008 (UTC)
- I could not get this to work on my installation for some reason - \pL would not match any random Russian word I copy-pasted. Internally, PHP saw those words as \x... sequences, but they just would not match - maybe they were not fully valid UTF-8. However, I am not sure why we would go to such great lengths here anyway. I have changed it to:
if (trim($term) === '') {
- and it works fine now. I committed it to CVS and will make it a part of the next release. Thanks for pointing this out. Svemir Brkic 13:04, 16 June 2008 (UTC)
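The difference between the three checks can be seen with a short PHP script. It assumes the file is saved as UTF-8 and that PHP runs with the default C ctype locale (the default since PHP 8): without the /u modifier, \w only matches ASCII word characters, so a Cyrillic-only term looks blank to the original check.

```php
<?php
// Compare the three "blank search" checks on a Russian term.
// Assumes UTF-8 source encoding and the default C ctype locale.
$term = 'поиск';

// Original check: no /u modifier, so \w and \d only match ASCII characters.
var_dump(preg_match('/[\w\d]/', $term));     // int(0): the term looks "blank"

// Proposed check: \pL matches any Unicode letter in UTF-8 mode (/u).
var_dump(preg_match('/[\w\pL\d]/u', $term)); // int(1)

// Committed check: only reject terms that are empty after trimming.
var_dump(trim($term) === '');                // bool(false)
var_dump(trim("  \t") === '');               // bool(true)
```

Note that with /u, preg_match() returns false (not 0) when the subject is not valid UTF-8, which fits the copy-paste problem mentioned above; the trim() check avoids that failure mode entirely.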
Special:Search not recognized
I'm running SphinxSearch 0.6 with MediaWiki 1.12, and Special:SphinxSearch works great. However, when I set $wgDisableInternalSearch = true; Special:Search yields a No such special page error. —Emufarmers(T|C) 11:22, 12 June 2008 (UTC)
$wgDisableSearchUpdate = true;
$wgSearchType = 'SphinxSearch';
- Are your above lines followed by this:
if ( !function_exists( 'extAddSpecialPage' ) ) {
# Download from http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/ExtensionFunctions.php
require_once( dirname(dirname(__FILE__)) . '/ExtensionFunctions.php' );
}
extAddSpecialPage( dirname(__FILE__) . '/SphinxSearch_body.php', ($wgDisableInternalSearch ? 'Search' : 'SphinxSearch'), 'SphinxSearch' );
- Svemir Brkic 12:24, 16 June 2008 (UTC)
- Er, no, I hadn't seen that code before; I had ExtensionFunctions installed, which I assumed was sufficient. Adding those lines (and adjusting the paths) does seem to make things work, but I'm a bit confused about why they're necessary. —Emufarmers(T|C) 22:29, 16 June 2008 (UTC)
- The require line is necessary to make sure the extAddSpecialPage function is available; it is defined in ExtensionFunctions.php. Basically, it provides a backwards-compatible way of adding a special page. The conditional in the extAddSpecialPage call specifies whether Sphinx replaces the default Search special page or is used as a stand-alone search page. Svemir Brkic 01:05, 17 June 2008 (UTC)
Question when searching for IPs
We use the wiki here in an IT setting, so many of our articles refer to IP addresses. The default search does not find any variation of IPs when searched (for example 102., 102.160.2.2, 106., etc.). Can anyone tell me if this search does a better job with this? Thanks. --Comalia 19:37, 15 July 2008 (UTC)
- It would certainly do a better job than the MySQL full-text index, even in the default configuration. You could also tweak it further, but I am not sure I fully understand what exactly you need. If you provide some specific examples of data and search strings that should match it, I can test it. Svemir Brkic 22:45, 15 July 2008 (UTC)
Sure. Say that I have a few articles that have the line of text 192.165.1.0 in them. So, if searching for 192.165.1.0, would it return any results? Or variations of it, such as "192.165"? --Comalia 13:41, 18 July 2008 (UTC)
- Yes, both searches will match that article. It will consider 192, 165, 1, and 0 as separate "words". You can tell it whether to search for all those words or any of them (it is an option on the search page, but you can also change the default.) Since proximity of the matched words is an important factor, you will get the articles that have entire IP in them first. Svemir Brkic 16:46, 18 July 2008 (UTC)
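The behaviour described above can be modeled in a few lines of PHP, under the simplifying assumption that any character other than a letter or digit separates words. The functions words and matchesAll are hypothetical and only approximate "match all words"; the real matching, and the proximity-based ranking that puts whole-IP matches first, are done by Sphinx.

```php
<?php
// Simplified model: "." is not a word character, so an IP address is
// indexed as several numeric "words". Illustration only.
function words(string $text): array
{
    return preg_split('/[^a-z0-9]+/i', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// "Match all words": every query word must occur somewhere in the document.
function matchesAll(string $document, string $query): bool
{
    return array_diff(words($query), words($document)) === [];
}

$doc = 'The gateway is at 192.165.1.0 on the lab network.';
echo implode(' ', words('192.165.1.0')), "\n"; // 192 165 1 0
var_dump(matchesAll($doc, '192.165'));         // bool(true)
var_dump(matchesAll($doc, '192.168'));         // bool(false)
```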
Installing issues
I am trying to install Sphinx 0.9.8 on RHEL Linux with MySQL. I ran ./configure and everything seemed fine. Then, when I build the binaries with make, I get the following:
sphinx.h:54:19: error: mysql.h: No such file or directory
--Comalia 19:50, 22 July 2008 (UTC)
- SOLUTION
- I had a similar issue on FC9. I did "yum install mysql-devel" and that fixed it. Try installing the mysql-devel version for your mysql install and then building sphinx. --5 August 2008
Database error when running on MediaWiki 1.13.0
I get the following error:
A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
(SQL query hidden)
from within function "SphinxSearch::wfSphinxSearch". MySQL returned error "1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near at line 1 (localhost)".
Searching from the command line DOES work, however. It only fails when I use the extension from within my Wiki.
Any ideas on how to fix this appreciated! --Zerbey 19:37, 11 September 2008 (UTC)
To display the hidden SQL statement place $wgShowSQLErrors = true; in your LocalSettings.php file. When you try the search again it will display something like:
SELECT old_text FROM `wikitext` WHERE old_id=
This is the result of the $sql statement identified by Svemir Brkic below, found in the SphinxSearch_body.php file. warens 23:00, 18 November 2008 (UTC)
- The only query in that method is generated with this line:
$sql = "SELECT old_text FROM ".$db->tableName('text')." WHERE old_id=".$docinfo['attrs']['old_id'];
- Find the line and error_log or echo the query to see what might be wrong with it. Svemir Brkic 17:07, 12 September 2008 (UTC)
I have the same error in MW 1.11.0 if I enter a search string that appears in the wiki (like foo). But when I enter something like dhfbvjhsfdbvjhsb, Sphinx says that 0 results are found. On the command line the search works very well and fast. --212.87.151.131 15:29, 16 September 2008 (UTC)
- Well here's the error:
[Wed Oct 08 10:16:19 2008] [error] [client 10.20.60.26] PHP Notice: Undefined variable: wgSphinxSuggestMode in /var/www/htdocs/wiki/extensions/SphinxSearch/SphinxSearch.php on line 85, referer: http://wiki.galaxy.invalid/wiki/index.php/Special:SphinxSearch
[Wed Oct 08 10:16:19 2008] [error] [client 10.20.60.26] PHP Notice: Undefined variable: wgSphinxSuggestMode in /var/www/htdocs/wiki/extensions/SphinxSearch/SphinxSearch.php on line 89, referer: http://wiki.galaxy.invalid/wiki/index.php/Special:SphinxSearch
[Wed Oct 08 10:16:20 2008] [error] [client 10.20.60.26] PHP Notice: Undefined index: old_id in /var/www/htdocs/wiki/extensions/SphinxSearch/SphinxSearch_body.php on line 347, referer: http://wiki.galaxy.invalid/wiki/index.php/Special:SphinxSearch
- Not sure where to go from here. Zerbey 14:20, 8 October 2008 (UTC)
Answer
The first two notices can be ignored, but I have fixed them in the CVS anyway. To fix them in your code, find this line in SphinxSearch.php:
#$wgSphinxSuggestMode = true;
Change it to:
$wgSphinxSuggestMode = false;
Unless, of course, you want to turn the suggestions on. The third notice indicates that something is wrong with your sphinx.conf file, or with the version of sphinx you are using. Recent versions of sphinx require this to be set in sphinx.conf:
sql_attr_uint = old_id
Older versions used sql_group_column instead. If your sphinx.conf has one of these, try the other one (or try upgrading sphinx itself.) Svemir Brkic 14:54, 8 October 2008 (UTC)
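The "WHERE old_id=" query seen earlier in this section is what you get when a match arrives without the old_id attribute. A hypothetical guard, assuming you would rather skip and log such matches than send broken SQL to MySQL, could look like this. In the extension the table name comes from $db->tableName('text'); here it is passed in as a plain string to keep the sketch self-contained, and the function name is made up.

```php
<?php
// Hypothetical guard around the old_text lookup. Without the old_id
// attribute (e.g. sql_attr_uint = old_id missing from sphinx.conf), the
// SQL would end in "WHERE old_id=" and MySQL would report error 1064.
function buildTextQuery(string $textTable, array $docinfo): ?string
{
    if (!isset($docinfo['attrs']['old_id'])) {
        error_log('SphinxSearch: match has no old_id attribute; check sphinx.conf');
        return null;
    }
    // Cast to int so the value is always a valid number in the SQL.
    $oldId = (int) $docinfo['attrs']['old_id'];
    return "SELECT old_text FROM $textTable WHERE old_id=$oldId";
}

var_dump(buildTextQuery('`text`', ['attrs' => ['old_id' => 42]]));
var_dump(buildTextQuery('`text`', ['attrs' => []])); // NULL, plus a log entry
```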
- I have the exact same problem. I have checked my sphinx.conf file and it already has the sql_attr_uint = old_id line; replacing it doesn't seem to help either. I'm using Sphinx 0.9.8.1. Are there any other ways to correct this problem? --213.122.168.100 15:08, 27 November 2008 (UTC)
init.d script for FC users
Here is a chkconfig-compatible script I created for FC users. It is a modification of a script by Vladimir Fedorkov. This script assumes you've put the pid file (configured in sphinx.conf) in /var/run for SELinux purposes. Speaking of SELinux, you'll need to add port 3312 to the http port context.
Keyword Priority in Query String
It seems that the order of keywords actually changes the results. In my case, if I send a space-delimited list of keywords, I get different search results depending on the position of my most important keywords. Am I missing a setting that prioritizes keywords based on their position in the query string? Cedarrapidsboy 13:45, 26 September 2008 (UTC)
- Yes, the order of keywords matters, as do the order and proximity of the matches in the searched text. That is a function of Sphinx itself, but you can affect it by changing the matching mode in SphinxSearch.php. Svemir Brkic 17:01, 26 September 2008 (UTC)
- Thanks. Is the order of keywords documented? I mean to say, where can I find information on where to put my most important words, and then my least important words? I understand the matching modes, but haven't read any mention of keyword position (with no operators joining them) affecting priority. I'm sure I'm likely blind. 12.207.221.230 00:23, 27 September 2008 (UTC)
getActionURL did not preserve query terms
getActionURL does not preserve query terms when clicking on «page URL», so a query like "Foo+Bar" turns into "Foo Bar" on page 2. In function getActionURL, replace the line
$qry = $kiaction . "?$searchField={$term}&fulltext=".wfMsg('sphinxSearchButton')."&";
with something like this:
$sterm=urlencode($term);
$qry = $kiaction . "?$searchField={$sterm}&fulltext=".wfMsg('sphinxSearchButton')."&";
--StasFomin 12:50, 10 November 2008 (UTC)
- Thanks! I guess nobody reported this until now because in some browsers it works anyway. Fixed and committed to CVS. Svemir Brkic 00:25, 11 November 2008 (UTC)
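A short PHP illustration of why the fix works: in a query string, a literal "+" is decoded as a space, so the term has to go through urlencode() before it is put into the paging links.

```php
<?php
// A literal "+" in a query string decodes to a space; encoding it as %2B
// preserves it across the round trip.
$term = 'Foo+Bar';

echo urlencode($term), "\n";           // Foo%2BBar

// Decoding the unencoded term loses the "+":
var_dump(urldecode($term));            // string(7) "Foo Bar"
// Decoding the encoded term restores it:
var_dump(urldecode(urlencode($term))); // string(7) "Foo+Bar"
```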
Feature Request: Excluding Selected Categories from search
It would be useful to filter results not only by selecting desired categories, but also by excluding undesired ones:
$cl->SetFilter('category', $categories_to_exclude, true);
The Search Form will be like this:
Include Exclude
Category1 [x] [ ]
Category2 [ ] [ ]
...
Category7 [ ] [x]
Category8 [ ] [ ]
--StasFomin 14:59, 10 November 2008 (UTC)
- Thanks for the suggestion. I will try adding this in the next release. Svemir Brkic 00:17, 11 November 2008 (UTC)
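A hypothetical sketch of how the proposed form could be processed. Each category maps to 'include', 'exclude' or '' (neither box checked), and the two resulting lists could then be passed to SetFilter() from the Sphinx PHP API, whose third argument set to true makes it an exclusion filter. The form layout and the category attribute come from the request above; the function name is made up for illustration.

```php
<?php
// Hypothetical handling of the Include/Exclude form: $form maps a category
// ID to 'include', 'exclude' or '' (neither box checked).
function splitCategoryFilters(array $form): array
{
    $include = [];
    $exclude = [];
    foreach ($form as $categoryId => $choice) {
        if ($choice === 'include') {
            $include[] = (int) $categoryId;
        } elseif ($choice === 'exclude') {
            $exclude[] = (int) $categoryId;
        }
    }
    return [$include, $exclude];
}

[$include, $exclude] = splitCategoryFilters([1 => 'include', 2 => '', 7 => 'exclude', 8 => '']);
// With the Sphinx PHP API this could become, for example:
//   if ($include) { $cl->SetFilter('category', $include); }
//   if ($exclude) { $cl->SetFilter('category', $exclude, true); }
echo implode(',', $include), ' / ', implode(',', $exclude), "\n"; // 1 / 7
```

The int casts matter because SetFilter() expects integer values for the attribute.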
Version 0.6.1 released
Version 0.6.1 can now be downloaded from SourceForge. See the main page for details. Svemir Brkic 15:34, 11 November 2008 (UTC)

