Extension talk:SphinxSearch
From MediaWiki.org
Old discussion points relevant only to older versions of the extension, Sphinx, or MediaWiki have been moved to the archive page.
Working well in MW1.5!
Everyone I have spoken to that uses our internal Wiki has nothing but positives to say about this. I really think MediaWiki should adopt Sphinx as the DEFAULT search, as the bundled one is so bad. --195.75.83.25 08:45, 6 July 2009 (UTC)
Abysmal setup/configuration instructions
One moment it's talking about win32 builds, the next about rc.local and crontabs. This REALLY needs separating out into Unix and Windows setup and configuration.
Grouping results by Namespace
I have the extension running but since it doesn't support the weighting of results, is there a way to group the results by the namespace they belong to? -- 20:48, 14 April 2009 (UTC)
Not without changing the code yourself. By weighting you mean to give each namespace a different weight? Svemir Brkic 17:03, 12 May 2009 (UTC)
- That's right, weighting each namespace differently. 19:13, 20 May 2009 (UTC)
Running on separate machines
The installation directions seem to assume that MediaWiki and sphinxd are on the same machine. How should I configure Sphinx and the extension if MediaWiki (and its database) are hosted separately from where Sphinx is installed? --Emufarmers 04:43, 5 November 2007 (UTC)
- You are correct. I should update the main page with these instructions. To make this extension work in your case you will need to configure sphinx.conf. In that file, modify the src_wiki_main section to specify the correct sql_host = hostname, where hostname is the name of the machine running the MySQL database for your wiki (default is localhost). This way, you can install Sphinx on the same machine as the web server (as opposed to the MySQL server), and all instructions as listed on the main page are still valid.
- Please let me know if you have any more questions, or, for that matter, if this worked for you. --Gri6507 13:05, 5 November 2007 (UTC)
- Hi, sorry for the delay. I should have been more clear: My wiki is on shared hosting, so I can't install Sphinx on its server. I run the search backend (presently Lucene, but Sphinx sounds promising) on a machine in my home. With Lucene, my backend machine SSHs into the webserver, grabs a dump of the wiki, indexes it, and then runs search queries from the webserver through the index it generates and sends the results back to the webserver. It's a rather convoluted setup (and it's even messier when it comes to updating the index), but I'm wondering if I can do anything along the same lines here. --Emufarmers 20:08, 11 November 2007 (UTC)
- Ok, I understand your setup. The sphinx.conf file can be configured to make sphinxd run on a different machine (let's call it Machine S, for sphinx) from the machine running MySQL (let's call it Machine M, for mysql). However, in that case Machine S has to have network access to Machine M. My guess is that something similar to your present setup with SSH tunnels could be done here as well. If you are interested in trying this out, please let me know via email (see extension credits) and we could work through these questions then. --Gri6507 22:17, 11 November 2007 (UTC)
I am very interested to know how this configuration has worked for users. I have read many articles that state the problem of "I run a wiki, but it's through a shared hosting and installing the Sphinx daemon is out of the question". This is also my case - so in an effort to get better search capabilities than the standard search, this appears to be one of my only options. A few questions I have: how much traffic is involved when the indexing is performed? Is it a problem to have the daemon going across the wire to access the SQL database for indexing? I have concerns that it will drastically increase my bandwidth usage. Second, I think it would be a great addition to the extension to redirect/use the standard search if the sphinxd cannot be found (if the daemon machine goes offline). Just some ramblings but I am interested in anyone's thoughts - Blac0177 06:04, 12 December 2008 (UTC)
- Daemon does not do the indexing. That is done with a separate process which you run on a schedule. Daemon simply searches the index and returns the results. Search requests are small, and search results depend on the actual data being searched. You could have a replica of your database on the machine that runs the indexer and the daemon - just as it is described above in the Lucene example. You just need to make sure that your web server can communicate on the specified host and port to your sphinx daemon (and that nobody else can, as a security precaution.) Your bandwidth usage will depend on two things - the way you use to replicate the database, and the amount of search queries you get.
- For replication, if it is MySQL and you can turn on binary logs, you can just replay those logs on your local copy. This is in case the database is too big to copy entire thing over for every indexer run. You could also dump just the records modified since the last run, since indexer does not really need your entire database - it only needs those tables that it actually indexes. Svemir Brkic 18:57, 17 January 2009 (UTC)
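A minimal sketch of the separate-machine setup discussed in this thread, for illustration only. It assumes Sphinx runs on its own machine and reaches the wiki's MySQL server over the network. The hostname db.example.org, the credentials, and the choice to listen on all interfaces are placeholders, not values from the extension's shipped sphinx.conf. Only the directives relevant to the host split are shown.

```
# sphinx.conf fragment on the machine that runs indexer and searchd

source src_wiki_main
{
    type     = mysql
    # Placeholder: the machine running the wiki's MySQL database
    sql_host = db.example.org
    sql_port = 3306
    sql_user = wikiuser
    sql_pass = password
    sql_db   = wikidb
    # sql_query and the attribute settings stay as in the regular config
}

searchd
{
    # Assumption: listen on an address the web server can reach
    address = 0.0.0.0
    port    = 3312
    # log, query_log and pid_file stay as in the regular config
}
```

With a layout like this, the web server only needs to reach the searchd port, and a firewall rule should keep everyone else off it, as noted in the reply above.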
Showing X of Y documents in search results, but document links not showing
- "Displaying 0-0 of 0 matches for query specifi* retrieved in 0.000 sec with following stats: specifi* found 8 times in 8 documents".
- Your query indicates that you are using a "star" search, which is not enabled by default in sphinx (it is a new, almost undocumented feature.) Our default sphinx.conf does not enable it either. Maybe you are using a different config file for command line search? To enable it in the config file used by the extension, you need to add this to the main index section:
min_infix_len = 1
enable_star = 1
- After this, you need to do a full reindex and restart the searchd (--rotate alone is not enough when you change the config file.) Svemir Brkic 20:57, 2 February 2008 (UTC)
Problems with search within page title / SOLVED
Hi, I installed SphinxSearch on Ubuntu and it is working very well. Thanks for this extension! I had some problems with title search. Originally, the suggested query (default value in the installation package) was:
#sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
# FROM mw_page, mw_revision, mw_text \
# WHERE rev_id=page_latest AND old_id=rev_text_id
The page_title content in the database looks like 'HOW_TO_edit_homepage'. There are underscores between the words (MediaWiki replaces spaces with underscores when a new page is created). I changed the query (for the initial and incremental index):
initial index:
sql_query = SELECT page_id, replace(page_title,'_',' ') as page_title, page_namespace, old_id, old_text \
FROM mw_page, mw_revision, mw_text \
WHERE rev_id=page_latest AND old_id=rev_text_id
incremental updates:
SELECT page_id, replace(page_title, '_',' ') as page_title, page_namespace, old_id, old_text FROM mw_page, mw_revision, mw_text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_touched>=DATE_FORMAT(CURDATE(), '%Y%m%d070000')
Now it works. --Wikigeil 17:15, 21 January 2008 (UTC)
- I do not think this is necessary. Perhaps there was another issue with your setup and it got resolved while you were changing the queries. If you look at this line in the suggested sphinx.conf file:
charset_table = 0..9, A..Z->a..z, _-> , a..z, \
- Among other things, this instructs Sphinx to consider an underscore the same as a space. Perhaps you should try again with the original queries, as they would probably work faster. On the other hand, I might be misunderstanding what you were trying to fix, so please let me know if that is the case. Svemir Brkic 13:05, 2 February 2008 (UTC)
- Hi Svemir, thx for response. The charset_table property for main index is:
# charset definition and case folding rules "table"
charset_table = 0..9, A..Z->a..z, _-> , a..z, \
    U+C0->a, U+C1->a, U+C2->a, U+C3->a, U+C4->a, U+C5->a, U+C6->a, \
    U+C7->c,U+E7->c, U+C8->e, U+C9->e, U+CA->e, U+CB->e, U+CC->i, \
    U+CD->i, U+CE->i, U+CF->i, U+D0->d, U+D1->n, U+D2->o, U+D3->o, \
    U+D4->o, U+D5->o, U+D6->o, U+D8->o, U+D9->u, U+DA->u, U+DB->u, \
    U+DC->u, U+DD->y, U+DE->t, U+DF->s, \
    U+E0->a, U+E1->a, U+E2->a, U+E3->a, U+E4->a, U+E5->a, U+E6->a, \
    U+E7->c,U+E7->c, U+E8->e, U+E9->e, U+EA->e, U+EB->e, U+EC->i, \
    U+ED->i, U+EE->i, U+EF->i, U+F0->d, U+F1->n, U+F2->o, U+F3->o, \
    U+F4->o, U+F5->o, U+F6->o, U+F8->o, U+F9->u, U+FA->u, U+FB->u, \
    U+FC->u, U+FD->y, U+FE->t, U+FF->s,
- It contains underscore too. --Wikigeil 17:34, 4 February 2008 (UTC)
- Yes, that is why I think the replace function in the query is not necessary. Everything works fine without it, with the original suggested queries. Svemir Brkic 04:21, 6 February 2008 (UTC)
- Ugh, I guess I was too convinced that searches worked correctly in our case - or that they could not be much better. Having the underscore in the charset table makes it a regular character, so titles do not get indexed properly. If "_-> ," is removed, titles get indexed correctly and replace function is not needed anymore. Of course, if you do want to index words with underscores (maybe if your wiki contains lots of code examples with underscores in function names?) you should replace "_-> ," with just "_," and still use the replace function in the query. Svemir Brkic 15:19, 8 May 2008 (UTC)
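The two options described in the reply above can be sketched as follows. This is an illustrative fragment, not the shipped configuration: the accent-folding entries of charset_table and the other index and source settings are left out for brevity, and the mw_ table prefix is taken from the queries earlier in this thread. Use one option or the other, not both.

```
# Option 1: underscore is not listed in charset_table, so Sphinx treats
# it as a separator and the original sql_query can be used unchanged.
index wiki_main
{
    charset_table = 0..9, A..Z->a..z, a..z
}

# Option 2: underscore is a regular character ("_," instead of "_-> ,"),
# so words containing underscores stay searchable. The title then needs
# REPLACE() in the query to turn its underscores into spaces.
index wiki_main
{
    charset_table = 0..9, A..Z->a..z, _, a..z
}

source src_wiki_main
{
    sql_query = SELECT page_id, REPLACE(page_title, '_', ' ') AS page_title, \
            page_namespace, old_id, old_text \
        FROM mw_page, mw_revision, mw_text \
        WHERE rev_id=page_latest AND old_id=rev_text_id
}
```

Either change only takes effect after a full reindex.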
Searching multiple wikis
I currently have 3 wikis indexed with Sphinx. The search works well, but it is returning results for all of them. I've set the "$wgSphinxSearch_index = X"; line in the SphinxSearch.php. Am I missing something? --N0ctrnl 20:28, 24 March 2008 (UTC)
- The default setup assumes there will be one main index and one incremental index. Searches are performed with an "*" to indicate all available indexes ($wgSphinxSearch_index indicates which is the main one.) With the latest versions of the Sphinx API it is possible to do this in a better way, but for now perhaps you can work around it by running a separate sphinx client for each wiki, each on its own port. Svemir Brkic 02:14, 26 March 2008 (UTC)
- I kinda suspected that'd be the answer. Not what I'd hoped, but I suppose it's not all that bad. Thanks very much for the reply. --N0ctrnl 13:02, 26 March 2008 (UTC)
I think this can be achieved. You need to modify SphinxSearch_body.php to search only the specified index. Line 294 should change to:
$res = $cl->Query($search_term, $wgSphinxSearch_index);
Make sure the $wgSphinxSearch_index variable is set to the main index in the localsettings or somewhere. Also, assuming you are using the main+delta scheme outlined on main page, you need to roll the two indexes into one. Something like (stripped paths for brevity-haha):
0 9,15,21 * * * indexer --quiet --config sphinx.conf wiki1_incremental wiki2_incremental --rotate && indexer --quiet --config sphinx.conf --merge wiki1_main wiki1_incremental && indexer --quiet --config sphinx.conf --merge wiki2_main wiki2_incremental
The Sphinx docs say that merging the indexes does take time, but is normally faster than reindexing. --UnwashedMeme 22:41, 22 October 2009 (UTC)
- Playing around with it a bit more: it works better if you sleep for a few seconds(I'm using 5) between running the indexer on the 2 incremental indexes and then running the merge command. Otherwise the .new version of the index isn't there yet when the indexer sends searchd the sighup that causes it to rotate and you don't actually get the updated index until the next time cron rotates the indexes. I don't think that sleep is related to index size; i.e. a large db that takes longer to index would still only need to sleep long enough to give the kernel a chance to get things synced again? UnwashedMeme 19:28, 23 October 2009 (UTC)
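For comparison, the one-port-per-wiki workaround suggested earlier in this thread could look roughly like the sketch below. It assumes that "separate sphinx client" there means a separate searchd instance with its own config file. The file names, log paths, and the second port number are hypothetical.

```
# sphinx_wiki1.conf (hypothetical file name): searchd for the first wiki
searchd
{
    address  = 127.0.0.1
    port     = 3312
    log      = /var/log/sphinx/wiki1_searchd.log
    pid_file = /var/log/sphinx/wiki1_searchd.pid
}

# sphinx_wiki2.conf (hypothetical file name): searchd for the second wiki
searchd
{
    address  = 127.0.0.1
    port     = 3313
    log      = /var/log/sphinx/wiki2_searchd.log
    pid_file = /var/log/sphinx/wiki2_searchd.pid
}
```

Each config file would then define only that wiki's sources and indexes, and each wiki's copy of the extension would be pointed at its own port.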
Installation issues
Problem 1
Query failed: connection to localhost:3312 failed
I can search from the command line but it does not seem to work from the site. Any idea what i am missing??
Solution:
used 3306, i think it's ubuntu's default
make sure to run /usr/local/bin/searchd and it works! also 0.0.0.0 is used as should be by default (but is not) --24 March 2008
--213.220.226.220 15:25, 28 April 2009 (UTC)milan.m.masek@gmail.com
If SELinux is enabled on your server, check its configuration or disable it.
[root@levi ~]# getenforce
[root@levi ~]# setenforce 0
Problem 2
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 775173422 bytes) in .../extensions/SphinxSearch/sphinxapi.php on line 311
Solution: This problem was caused by the solution to problem 1, that is, using port 3306, which is already used by MySQL...
see http://sphinxsearch.com/forum/view.html?id=1178 --24 March 2008
Namespaces for weight
Is it possible to assign a weight to each namespace, so that the checkboxes can be removed altogether? --25 March 2008
- This is possible and I will look into it for the next release. It will not replace the need for checkboxes, as in some cases you want the users to specify which namespaces they are interested in. Svemir Brkic 02:17, 26 March 2008 (UTC)
- Amazing! --207.96.208.130 21:24, 26 March 2008 (UTC)
match all words as default
not sure if this is something that is controlled by Sphinx or not, but it would be better for me if "match all words" was the default. thx! --27 March 2008
matching mode set to:
SPH_MATCH_ALL
by default it is SPH_MATCH_EXTENDED, but does not work properly as http://www.sphinxsearch.com/doc.html#extended-syntax explains --1 April 2008
- The point of having the SphinxSearch.php file separately is so that you can set things such as $wgSphinxSearch_mode for yourself. The reason we use SPH_MATCH_EXTENDED is to be able to modify the query internally and do "match all" vs. "match any" with a radio box instead of teaching users the Sphinx query syntax. Svemir Brkic 03:08, 6 April 2008 (UTC)
@page_title DOES NOT WORK
for some reason @old_text works but @page_title never returns any results... maybe it has something to do with the new releases of sphinx. --207.96.208.130 22:22, 1 April 2008 (UTC)
- This syntax is only mentioned in a TODO list on the main page of this extension. We do not pass on everything you enter to sphinx. We generally try to create web interface to specify advanced options. Svemir Brkic 02:07, 6 April 2008 (UTC)
- Until we provide a user-friendly interface for @page_title searches, make sure to select "match all words" when submitting your search. You can make that the default if you download version 0.6beta3 and uncomment the line in SphinxSearch.php that sets $wgSphinxMatchAll to '1'. Svemir Brkic 03:10, 10 May 2008 (UTC)
Can someone show how to edit sphinx.conf correctly
I am having issues with my sphinx.conf. I have it mostly working, however when I attempt to index my wiki I get an error regarding my sql_query_pre = wiki_ entry in the sphinx.conf: "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'wiki_' at line 1". wiki_ is a standard MediaWiki table prefix so I am unsure why it throws an error. I found the other presets here. --137.71.23.54 5 August 2008
# pre-query, executed before the main fetch query
sql_query_pre = SET NAMES utf8
- Now, I don't know what's in your sphinx.conf, but based on mine, sql_query_pre is not the place to put your database prefix. Take another look at step 2; you have to go through sphinx.conf and tack your prefix onto the table names (but not the column names!). It's annoying, but I eventually figured out that there are annoyances all over the place with MediaWiki if you use a prefix. My advice is to bite the bullet and get rid of the prefix now if it's an option. --Emufarmers(T|C) 03:51, 6 August 2008 (UTC)
- Add the prefix to the names listed after FROM in all cases, typically three or four. Say your prefix is cm_; you would use:
FROM cm_page, cm_revision, cm_text
- Works fine, no big deal. 76.243.138.215 20:39, 8 July 2009 (UTC)
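Put together, a prefixed source section might look like the sketch below. It reuses the cm_ prefix from the example above and the default queries quoted elsewhere on this page. Treat it as an illustration and compare it with your own sphinx.conf before copying anything.

```
source src_wiki_main
{
    # Unchanged: this is a SQL statement, not a place for the table prefix
    sql_query_pre  = SET NAMES utf8

    # Prefix added to the table names only; column names stay as they are
    sql_query      = SELECT page_id, page_title, page_namespace, old_id, old_text \
        FROM cm_page, cm_revision, cm_text \
        WHERE rev_id=page_latest AND old_id=rev_text_id

    sql_query_info = SELECT page_title, page_namespace FROM cm_page WHERE page_id=$id
}
```

The incremental source needs the same change in its own sql_query.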
Is there any way to prevent Sphinx from indexing particular pages?
I realize this runs counter to what most people would want, but some pages don't need to be indexed. I've made some reasonable searches here and on the Sphinx site, and believe this is more relevant to a MediaWiki discussion than Sphinx in general. Jon Doran, 9 May 2008
- You could modify the query in sphinx.conf to filter out any pages you do not want. It could be done based on namespace, a join with some other table (e.g. categorylinks,) or some new field or table you would create yourself. Svemir Brkic 01:23, 10 May 2008 (UTC)
- Thanks for the suggestions. I did not consider the query, but now that you mention it, there is a lot I can do with it. Jon Doran, 10 May 2008
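Two ways of filtering in the query, as suggested above, could be sketched like this. The source names and the category name Hidden_from_search are made up for the example. The namespace numbers 2 and 3 are MediaWiki's User and User talk namespaces, and the table names assume no table prefix.

```
# Variant A: skip whole namespaces (2 and 3 are User and User talk)
source src_wiki_no_user : src_wiki_main
{
    sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
        FROM page, revision, text \
        WHERE rev_id=page_latest AND old_id=rev_text_id \
        AND page_namespace NOT IN (2, 3)
}

# Variant B: skip pages that belong to one category (hypothetical name)
source src_wiki_no_hidden : src_wiki_main
{
    sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
        FROM page, revision, text \
        WHERE rev_id=page_latest AND old_id=rev_text_id \
        AND page_id NOT IN (SELECT cl_from FROM categorylinks \
            WHERE cl_to = 'Hidden_from_search')
}
```

Whichever variant is used, the index section has to name it in its source = line, and the incremental source needs the same condition.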
How to search Chinese
I'm the admin of a Chinese wiki, and I have configured sphinx.conf following all the steps. I can search for English words correctly, but if I search for only Chinese words like "注册", I get nothing. I have to search for mixed words like "sbc 注册" to get the right results. Can anyone help me? --Fzy 163 01:21, 15 May 2008 (UTC)
- Initial term check was too strict and it would not let Unicode-only strings through. I have changed it in the CVS, but you can see below how to fix it in the version you are currently using. Svemir Brkic 13:14, 16 June 2008 (UTC)
- Thanks a lot, it works now ^_^ --23 June 2008
More Windows Install Issues
Please excuse my ignorance, how do we poor windows users perform steps 5 and 6?
Steve Goble 18:34, 14 May 2008 (UTC)
Anyone?????
- Answer is:
- Step 5 =
C:\path\to\searchd.exe --install --config C:\path\to\sphinx.conf
- --30 May 2008
- I tried this option (adding it to the startup group, making a service out of it), but it didn't work for me.
- I recommend using the scheduler: create a batch file and set it up to run each time Windows boots. Instead of the --install command given above, use:
path/to/sphinx/installation/searchd --config /path/to/sphinx.conf
- --16 July 2008
- Step 6 = Windows Task Scheduler --30 May 2008
If you get "ERROR: index 'wiki_main': column number 1 has no name." when trying to index, copy libmysql.dll from MySQL 5.0.37 into Sphinx bin directory. For some reason 5.1 version does not work with Sphinx on Windows.
Having trouble with the Sort By command
I'm trying to use the Sort By command so that I can see the latest postings on our wiki in the search , but I'm not sure of the syntax - what should it be? $wgSphinxSearch_sortby = "SPH_SORT_TIME_SEGMENTS, 'rev_timestamp'"; doesn't seem to work - I added rev_timestamp to the SQL query for the indexing, but no luck - can someone help me? --6 June 2008
- The way the code currently works, it always uses SPH_SORT_EXTENDED as the sort mode, and only uses $wgSphinxSearch_sortby as a second argument in the SetSortMode call. I will make this more flexible, but until then you can edit SphinxSearch_body.php and find this line:
$cl->SetSortMode(SPH_SORT_EXTENDED, $wgSphinxSearch_sortby);
- Set your $wgSphinxSearch_sortby to 'rev_timestamp' and change above line to:
$cl->SetSortMode(SPH_SORT_TIME_SEGMENTS, $wgSphinxSearch_sortby);
- Svemir Brkic 02:52, 13 August 2008 (UTC)
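On the sphinx.conf side, adding rev_timestamp to the query is probably not enough by itself, because Sphinx can only sort on columns that are declared as attributes. The untested sketch below assumes an unprefixed database and converts MediaWiki's YYYYMMDDHHMMSS value to a UNIX timestamp with MySQL's UNIX_TIMESTAMP(). Both the conversion and the extra attribute line are assumptions, not part of the shipped config.

```
source src_wiki_main
{
    # Assumption: expose the revision time as a UNIX timestamp attribute
    sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text, \
            UNIX_TIMESTAMP(rev_timestamp) AS rev_timestamp \
        FROM page, revision, text \
        WHERE rev_id=page_latest AND old_id=rev_text_id

    sql_attr_uint      = page_namespace
    sql_attr_uint      = old_id
    sql_attr_timestamp = rev_timestamp
}
```

A full reindex is needed before the new attribute can be used for sorting.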
Blank search check
In the function «wfSphinxSearch», please replace the line
if (!preg_match('/[\w\d]/', $term)) {
with the line
if (!preg_match('/[\w\pL\d]/u', $term)) {
because the first variant ignores non-Latin Unicode queries (for example, Russian search terms). --StasFomin 16:25, 11 June 2008 (UTC)
- I could not get this to work on my installation for some reason - \pL would not match any random Russian word I copy-pasted. Internally, PHP saw those words as \x... sequences, but they just would not match - maybe they were not fully valid UTF-8. However, I am not sure why we would go to such great lengths here anyway. I have changed it to:
if (trim($term) === '') {
- and it works fine now. I committed it to CVS and will make it a part of the next release. Thanks for pointing this out. Svemir Brkic 13:04, 16 June 2008 (UTC)
Special:Search not recognized
I'm running SphinxSearch 0.6 with MediaWiki 1.12, and Special:SphinxSearch works great. However, when I set $wgDisableInternalSearch = true; Special:Search yields a No such special page error. —Emufarmers(T|C) 11:22, 12 June 2008 (UTC)
$wgDisableSearchUpdate = true;
$wgSearchType = 'SphinxSearch';
- Are your above lines followed by this:
if ( !function_exists( 'extAddSpecialPage' ) ) {
# Download from http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/ExtensionFunctions.php
require_once( dirname(dirname(__FILE__)) . '/ExtensionFunctions.php' );
}
extAddSpecialPage( dirname(__FILE__) . '/SphinxSearch_body.php', ($wgDisableInternalSearch ? 'Search' : 'SphinxSearch'), 'SphinxSearch' );
- Svemir Brkic 12:24, 16 June 2008 (UTC)
- Er, no, I hadn't seen that code before; I had ExtensionFunctions installed, which I assumed was sufficient. Adding those lines (and adjusting the paths) does seem to make things work, but I'm a bit confused about why they're necessary. --Emufarmers(T|C) 22:29, 16 June 2008 (UTC)
- The require line is necessary to make sure the extAddSpecialPage function is available. It is a function inside the ExtensionFunctions.php file. Basically, it provides a backwards-compatible way of adding a special page. The conditional in the extAddSpecialPage call specifies whether Sphinx replaces the default Search special page or is used as a stand-alone search page. Svemir Brkic 01:05, 17 June 2008 (UTC)
- Shouldn't this be added to Extension:SphinxSearch#Mode_Of_Operation then? --Patrick Nagel 09:40, 10 April 2009 (UTC)
- I think I know now what happened to Emufarmers and to me as well: we overlooked that the settings must be adjusted in SphinxSearch.php, and not (as usual with most extensions) added to LocalSettings.php. I put a note into Extension:SphinxSearch#Mode_Of_Operation - but maybe the author(s) of the SphinxSearch extension should consider changing to the usual way of letting the user change an extension's behaviour? It's easier to update the extensions that way, since the whole directory can just be replaced with a new version, all settings are in LocalSettings.php and don't get overwritten. --Patrick Nagel 10:13, 10 April 2009 (UTC)
Question when searching for IPs
We use the Wiki here in an IT setting, so many of our articles refer to IP addresses. The default search does not find any variation of IPs when searched (for example 102., 102.160.2.2, 106..etc.) Can anyone tell me if this search does a better job with this? Thanks. --Comalia 19:37, 15 July 2008 (UTC)
- It would certainly do a better job than the MySQL full-text index, even in the default configuration. You could also tweak it further, but I am not sure I fully understand what exactly you need. If you provide some specific examples of data and search strings that should match it, I can test it. Svemir Brkic 22:45, 15 July 2008 (UTC)
Sure. Say that I have a few articles that have the line of text 192.165.1.0 in them. So, if searching for 192.165.1.0, would it return any results? Or variations of it, such as "192.165"? --Comalia 13:41, 18 July 2008 (UTC)
- Yes, both searches will match that article. It will consider 192, 165, 1, and 0 as separate "words". You can tell it whether to search for all those words or any of them (it is an option on the search page, but you can also change the default.) Since proximity of the matched words is an important factor, you will get the articles that have entire IP in them first. Svemir Brkic 16:46, 18 July 2008 (UTC)
Installing issues
I am trying to install SphinxSearch 0.9.8 on Linux RHEL with MySQL. I did the ./configure and everything seemed fine. Then, when I build the binaries with make, I get the following:
sphinx.h:54:19: error: mysql.h: No such file or directory
--Comalia 19:50, 22 July 2008 (UTC)
- SOLUTION
- I had a similar issue on FC9. I did "yum install mysql-devel" and that fixed it. Try installing the mysql-devel version for your mysql install and then building sphinx. --5 August 2008
Database error when running on MediaWiki 1.13.0
I get the following error:
A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
(SQL query hidden)
from within function "SphinxSearch::wfSphinxSearch". MySQL returned error "1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near at line 1 (localhost)".
Searching from the command line DOES work, however. It only fails when I use the extension from within my Wiki.
Any ideas on how to fix this appreciated! --Zerbey 19:37, 11 September 2008 (UTC)
To display the hidden SQL statement place $wgShowSQLErrors = true; in your LocalSettings.php file. When you try the search again it will display something like:
SELECT old_text FROM `wikitext` WHERE old_id=
And this is a result of what is identified by Svemir Brkic in the $sql statement below found in the SphinxSearch_body.php file. warens 23:00, 18 November 2008 (UTC)
- The only query in that method is generated with this line:
$sql = "SELECT old_text FROM ".$db->tableName('text')." WHERE old_id=".$docinfo['attrs']['old_id'];
- Find the line and error_log or echo the query to see what might be wrong with it. Svemir Brkic 17:07, 12 September 2008 (UTC)
I have the same error in MW 1.11.0 if I enter a search string that appears in the wiki (like foo). But when I enter something like dhfbvjhsfdbvjhsb, Sphinx says that 0 results are found. On the command line the search works very well and fast. --212.87.151.131 15:29, 16 September 2008 (UTC)
- Well here's the error:
[Wed Oct 08 10:16:19 2008] [error] [client 10.20.60.26] PHP Notice: Undefined variable: wgSphinxSuggestMode in /var/www/htdocs/wiki/extensions/SphinxSearch/SphinxSearch.php on line 85, referer: http://wiki.galaxy.invalid/wiki/index.php/Special:SphinxSearch
[Wed Oct 08 10:16:19 2008] [error] [client 10.20.60.26] PHP Notice: Undefined variable: wgSphinxSuggestMode in /var/www/htdocs/wiki/extensions/SphinxSearch/SphinxSearch.php on line 89, referer: http://wiki.galaxy.invalid/wiki/index.php/Special:SphinxSearch
[Wed Oct 08 10:16:20 2008] [error] [client 10.20.60.26] PHP Notice: Undefined index: old_id in /var/www/htdocs/wiki/extensions/SphinxSearch/SphinxSearch_body.php on line 347, referer: http://wiki.galaxy.invalid/wiki/index.php/Special:SphinxSearch
- Not sure where to go from here. Zerbey 14:20, 8 October 2008 (UTC)
Answer
The first two notices can be ignored, but I have fixed them in the CVS anyway. To fix them in your code, find this line in SphinxSearch.php:
#$wgSphinxSuggestMode = true;
Change it to:
$wgSphinxSuggestMode = false;
Unless, of course, you want to turn the suggestions on. The third notice indicates that something is wrong with your sphinx.conf file, or with the version of sphinx you are using. Recent versions of sphinx require this to be set in sphinx.conf:
sql_attr_uint = old_id
Older versions used sql_group_column instead. If your sphinx.conf has one of these, try the other one (or try upgrading sphinx itself.) Svemir Brkic 14:54, 8 October 2008 (UTC)
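As a sketch, the two spellings look like this inside the source section. The page_namespace attribute is included because it appears next to old_id in the config posted on this page. Which form your Sphinx build expects is something to check against its own documentation.

```
source src_wiki_main
{
    # Recent Sphinx versions
    sql_attr_uint = page_namespace
    sql_attr_uint = old_id

    # Older Sphinx versions: use these two lines instead
    # sql_group_column = page_namespace
    # sql_group_column = old_id
}
```

After switching from one form to the other, rebuild the index and restart searchd so that the attribute is actually present in the index being searched.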
- I have the exact same problem. I have checked my sphinx.conf file and it already has the sql_attr_uint = old_id line; replacing it doesn't seem to help either. I'm using Sphinx 0.9.8.1. Are there any other ways to correct this problem? --213.122.168.100 15:08, 27 November 2008 (UTC)
- If you are having the exact same problem as above, with all of those warnings and error messages, you first need to download the latest version of the SphinxSearch extension. Also, make sure your index has been rebuilt and the search daemon restarted. If none of that helps, perhaps post your sphinx.conf somewhere (without the db password...) and someone may be able to tell you where the problem is. Svemir 13:45, 4 December 2008 (UTC)
Sphinx.conf file below, without passwords and comments. I apologise if this is the wrong place to post code. For the searchd settings near the bottom of the file, should the address be 127.0.0.1 if the search daemon is on a remote Ubuntu server? Regardless of the entry there, it still has the same error. Any help people can give would be gratefully received!!
source src_wiki_main
{
    type = mysql
    sql_host = 10.150.2.71
    sql_user = wikiuser
    sql_pass = password
    sql_db = wikidb
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id
    sql_attr_uint = page_namespace
    sql_attr_uint = old_id
    sql_query_info = SELECT page_title, page_namespace FROM page WHERE page_id=$id
}
source src_wiki_incremental : src_wiki_main
{
    sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_touched>=DATE_FORMAT(CURDATE(), '%Y%m%d070000')
}
index wiki_main
{
    source = src_wiki_main
    path = /var/data/sphinx/wiki_main
    docinfo = extern
    morphology = stem_en
    min_word_len = 1
    charset_type = utf-8
    charset_table = 0..9, A..Z->a..z, a..z, \
        U+C0->a, U+C1->a, U+C2->a, U+C3->a, U+C4->a, U+C5->a, U+C6->a, \
        U+C7->c,U+E7->c, U+C8->e, U+C9->e, U+CA->e, U+CB->e, U+CC->i, \
        U+CD->i, U+CE->i, U+CF->i, U+D0->d, U+D1->n, U+D2->o, U+D3->o, \
        U+D4->o, U+D5->o, U+D6->o, U+D8->o, U+D9->u, U+DA->u, U+DB->u, \
        U+DC->u, U+DD->y, U+DE->t, U+DF->s, \
        U+E0->a, U+E1->a, U+E2->a, U+E3->a, U+E4->a, U+E5->a, U+E6->a, \
        U+E7->c,U+E7->c, U+E8->e, U+E9->e, U+EA->e, U+EB->e, U+EC->i, \
        U+ED->i, U+EE->i, U+EF->i, U+F0->d, U+F1->n, U+F2->o, U+F3->o, \
        U+F4->o, U+F5->o, U+F6->o, U+F8->o, U+F9->u, U+FA->u, U+FB->u, \
        U+FC->u, U+FD->y, U+FE->t, U+FF->s,
}
index wiki_incremental : wiki_main
{
    path = /var/data/sphinx/wiki_incremental
    source = src_wiki_incremental
}
indexer
{
    mem_limit = 64M
}
searchd
{
    address = 127.0.0.1
    port = 3312
    log = /var/log/sphinx/searchd.log
    query_log = /var/log/sphinx/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /var/log/sphinx/searchd.pid
    max_matches = 1000
}
--eof--
init.d script for FC users
Here is a chkconfig-compatible script I created for FC users. It is a modification of a script by Vladimir Fedorkov. This script assumes you've put the pid file (configured in sphinx.conf) in /var/run for SELinux purposes. Speaking of SELinux, you'll need to add port 3312 to the http port context.
[edit] Keyword Priority in Query String
It seems that the order of keywords actually changes the results. In my case, if I send a space delimited list of keywords, I get different search results depending on the position of my most important keywords. Am I missing a setting that prioritizes keywords based on there position in the query string? Cedarrapidsboy 13:45, 26 September 2008 (UTC)
- Yes, order of keywords matters, as well as the order and proximity of the matches in searched text. That is the function of the Sphinx itself, but you can affect it by changing the matching mode in SphinxSearch.php. Svemir Brkic 17:01, 26 September 2008 (UTC)
- Thanks. Is the order of keywords documented? I mean to say, where can I find information on where to put my most important words, and then my least important words? I understand the matching modes, but haven't read any mention of keyword position (with no operators joining them) affecting priority. I'm sure I'm likely blind. 12.207.221.230 00:23, 27 September 2008 (UTC)
[edit] getActionURL did not preserve query terms
getActionURL did not preserve query terms when clicking on a «page URL» link, so a query like "Foo+Bar" turns into the query "Foo Bar" on page 2. To fix this, replace the following line in the getActionURL function:
$qry = $kiaction . "?$searchField={$term}&fulltext=".wfMsg('sphinxSearchButton')."&";
with something like this:
$sterm=urlencode($term);
$qry = $kiaction . "?$searchField={$sterm}&fulltext=".wfMsg('sphinxSearchButton')."&";
--StasFomin 12:50, 10 November 2008 (UTC)
- Thanks! I guess nobody reported this until now because in some browsers it works anyway. Fixed and committed to CVS. Svemir Brkic 00:25, 11 November 2008 (UTC)
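The effect of the fix can be seen in isolation. This is a small sketch, not the extension's code: `buildSearchQuery` and its parameters are hypothetical stand-ins for the relevant part of getActionURL, and only `urlencode()` is taken from the patch above.

```php
<?php
// Hypothetical stand-in for the part of getActionURL that builds the query string.
function buildSearchQuery($action, $searchField, $term, $buttonLabel) {
    // urlencode() turns "+" into "%2B", so it is not read back as a space on the next page.
    return $action . '?' . $searchField . '=' . urlencode($term)
        . '&fulltext=' . urlencode($buttonLabel) . '&';
}

$url = buildSearchQuery('/index.php', 'search', 'Foo+Bar', 'Search');
echo $url, "\n"; // /index.php?search=Foo%2BBar&fulltext=Search&
```

Without the encoding, the literal "+" in the URL is decoded as a space when the next page reads the parameter, which matches the behaviour reported above.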
[edit] Feature Request: Excluding Selected Categories from search
It would be useful to filter results not only by selecting the desired categories, but also by excluding undesired categories:
$cl->SetFilter('category', $categories_to_exclude, true);
The search form would look like this:
           Include  Exclude
Category1    [x]      [ ]
Category2    [ ]      [ ]
...
Category7    [ ]      [x]
Category8    [ ]      [ ]
--StasFomin 14:59, 10 November 2008 (UTC)
- Thanks for the suggestion. I will try adding this in the next release. Svemir Brkic 00:17, 11 November 2008 (UTC)
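A minimal sketch of how such a form might be translated into filters. The shape of the form data (`$choices`, with the values 'include' and 'exclude') and the function name are assumptions made for illustration; only the `SetFilter()` call with its third, excluding argument comes from the suggestion above.

```php
<?php
// Hypothetical helper: split per-category form choices into two ID lists.
// $choices maps a category ID to 'include', 'exclude', or '' (no preference).
function splitCategoryChoices(array $choices) {
    $include = array();
    $exclude = array();
    foreach ($choices as $categoryId => $choice) {
        if ($choice === 'include') {
            $include[] = (int)$categoryId;
        } elseif ($choice === 'exclude') {
            $exclude[] = (int)$categoryId;
        }
    }
    return array($include, $exclude);
}

list($include, $exclude) = splitCategoryChoices(
    array(1 => 'include', 2 => '', 7 => 'exclude', 8 => '')
);

// With a SphinxClient instance in $cl, the two lists would then be applied as:
// if ($include) { $cl->SetFilter('category', $include); }
// if ($exclude) { $cl->SetFilter('category', $exclude, true); }
```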
[edit] Version 0.6.1 released
Version 0.6.1 can now be downloaded from SourceForge. See the main page for details. Svemir Brkic 15:34, 11 November 2008 (UTC)
[edit] how to change the search page language
I want to translate the search page into Chinese, so I added some code to sphinxsearch_body.php:
$allMessages = array(
'en' => array(
'sphinxsearch' => 'Search Wiki Using Sphinx',
'sphinxSearchInNamespaces' => 'Search in namespaces:',
'sphinxSearchInCategories' => 'Search in categories:',
'sphinxResultPage' => 'Result Page: ',
'sphinxPreviousPage' => 'Previous',
'sphinxNextPage' => 'Next',
'sphinxSearchPreamble' => "Displaying %d-%d of %d matches for query %s retrieved in %0.3f sec with following stats:",
'sphinxSearchStats' => "* %s found %d times in %d documents",
'sphinxSearchButton' => 'Search',
'sphinxSearchEpilogue' => 'Additional database time was %0.3f sec.',
'sphinxSearchDidYouMean' => 'Did you mean',
'sphinxMatchAny' => 'match any word',
'sphinxMatchAll' => 'match all words',
'sphinxLoading' => 'Loading...'
),
'zh-cn' => array(
'sphinxsearch' => '使用斯芬克斯搜索工具',
'sphinxSearchInNamespaces' => '在名字空间内搜索:',
'sphinxSearchInCategories' => '在目录内搜索:',
'sphinxResultPage' => '结果页数: ',
'sphinxPreviousPage' => '前一页',
'sphinxNextPage' => '下一页',
'sphinxSearchPreamble' => "Displaying %d-%d of %d matches for query %s retrieved in %0.3f sec with following stats:",
'sphinxSearchStats' => "* %s 被发现 %d 次在 %d 个文件里",
'sphinxSearchButton' => '搜索',
'sphinxSearchEpilogue' => 'Additional database time was %0.3f sec.',
'sphinxSearchDidYouMean' => '你的意思是',
'sphinxMatchAny' => '匹配任何词',
'sphinxMatchAll' => '匹配全部词',
'sphinxLoading' => '导入...'
)
);
But it doesn't work. The page still displays English text, not Chinese. Can somebody help me? kgbkgbkgb 15:34, 5 December 2008 (UTC)
- Perhaps you already used the extension before adding those translations? I think in that case you probably need to translate the strings in your Special page -> System messages. Svemir Brkic 18:59, 17 January 2009 (UTC)
[edit] Wikipedia
Can anyone tell me why Wikipedia has not installed this extension? According to the main article, it works with Wikipedia. --Robinson Weijman 09:59, 21 January 2009 (UTC)
- Wikipedia already uses a Lucene search engine. —Emufarmers(T|C) 11:53, 21 January 2009 (UTC)
- OK, thanks. So when and why would SphinxSearch be better than Lucene-Search - and vice versa? --Robinson Weijman 07:34, 22 January 2009 (UTC)
- Lucene has more features and is a more stable and mature product. It also needs more resources and is harder to install and maintain. Sphinx is still evolving - both the search engine itself and the MediaWiki extension. It may not have all the features of Lucene yet, but it is much easier to set up and try out. If it does not do something you need, by all means go for Lucene. Svemir Brkic 14:00, 22 January 2009 (UTC)
- Thanks both of you for your feedback.--Robinson Weijman 08:57, 23 January 2009 (UTC)
[edit] Searching multiple namespaces as a default
I've had problems getting this working. The following has been specified in the LocalSettings.php with no luck:
$wgNamespacesToBeSearchedDefault = array(
NS_MAIN => true,
NS_HowTo => true,
);
Will SphinxSearch pick this up? I've tried applying this before and after the SphinxSearch extension loading lines in LocalSettings.php.
CaliVW78 06:07, 8 February 2009 (UTC)
- SphinxSearch will pick it up the same way the default MediaWiki search picks it up - for new and for anonymous users. Existing users already have their own default set of namespaces saved in their user preferences and have to change it manually. See Manual:$wgNamespacesToBeSearchedDefault for details (and how to update existing user preferences with a maintenance script.) Svemir Brkic 12:58, 8 February 2009 (UTC)
- Thanks for the reply. I had clearly missed some items when looking at the $wgNamespacesToBeSearchedDefault page earlier. However, I am still running into an issue where users who are not logged in do not get search results from the additional namespace that I want. Below is the code I have in my LocalSettings.php; the new namespace I want to be searched is "HowTo".
$wgNamespacesToBeSearchedDefault = array(
NS_MAIN => true,
NS_HowTo => true,
);
In addition to that, this is what was placed in LocalSettings.php to create that namespace:
$wgExtraNamespaces =
array(
100 => "HowTo"
);
define('NS_HowTo', 100);
Thanks again for the help. CaliVW78 13:12, 17 February 2009 (UTC)
- Ok, here's how I got this to work. Adding a line to $wgNamespacesToBeSearchedDefault was clearly NOT working. Instead, I added the following to LocalSettings.php to define the default user option. "100" is the ID that was given to my newly created namespace that I wanted to have included in the search results. CaliVW78 09:16, 18 March 2009 (UTC)
$wgDefaultUserOptions = array(
'searchNs100' => 1,
);
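One possible explanation for the earlier failure, offered as an assumption rather than a diagnosis: if define('NS_HowTo', 100); runs after $wgNamespacesToBeSearchedDefault is built, PHP versions before 8 treat the undefined constant as the string 'NS_HowTo', so the array key is not 100. Also note that assigning a whole new array to $wgDefaultUserOptions discards the other defaults, so setting a single key is safer. The sketch below runs on its own; the NS_MAIN definition and the 'searchlimit' entry are stand-ins for what MediaWiki normally provides.

```php
<?php
// Stand-ins for values MediaWiki normally provides; only needed to run this sketch alone.
if (!defined('NS_MAIN')) {
    define('NS_MAIN', 0);
}
$wgDefaultUserOptions = array('searchlimit' => 20); // placeholder for the existing defaults

// Define the custom namespace constant before anything refers to it.
define('NS_HowTo', 100);
$wgExtraNamespaces = array(NS_HowTo => 'HowTo');

$wgNamespacesToBeSearchedDefault = array(
    NS_MAIN  => true,
    NS_HowTo => true,
);

// Add one key instead of replacing the whole defaults array.
$wgDefaultUserOptions['searchNs' . NS_HowTo] = 1;
```

As noted earlier in this thread, existing users keep their saved search preferences either way, so only new and anonymous users pick up the changed defaults.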
[edit] Orphaned pages not indexed?
It seems SphinxSearch is unable to find orphaned pages in my wiki. Only things linked somehow (also indirectly) from the Main Page can be found. Is that something that can be configured away? I am using sphinx-0.9.9-rc1, SphinxSearch extension 0.6.1, MediaWiki 1.13.3. Thanks! — User:Trohlfing Feb 8, 2009
- SphinxSearch extension is not using wiki links in any way. It is running direct database queries, as specified in sphinx.conf. You need to check the namespace of those orphaned pages. Svemir Brkic 21:44, 8 February 2009 (UTC)
- There is a case in which it only finds linked-to pages. If a page title has spaces in it, these are stored as underscores in the MediaWiki database (in the "page" table). Sphinx doesn't know about this, so if you search for a page called "Sphinx search engine", by typing "Sphinx search engine" instead of "Sphinx_search_engine", you'll only see the links to that page -- so, if your page is orphaned (no pages reference your page title using spaces), you won't get the result. User:mphasak 23:30, 9 February 2009 (UTC)
- Check your sphinx.conf file. If it has "_-> ," in the charset_table, you should remove it and reindex. Svemir Brkic 02:34, 10 February 2009 (UTC)
- Perfect!! Thanks, Svemir. User:mphasak 19:25, 10 February 2009 (UTC)
[edit] How to Get the Did you mean feature working
I'm trying to get the Did you mean feature of Sphinx working. I have aspell installed on Ubuntu, but when I add the lines that tell Sphinx to use it, I see no difference. I have re-indexed, still to no avail. Does anybody have any troubleshooting advice? I can't seem to find much online about how to get it working. Is there anything I can do to find where the problem lies? --Trickedicky 11:51, 9 February 2009 (UTC)
[edit] Confirm that Aspell command access is working in Windows?
I've got a similar issue. I'm trying to use the Aspell command line with SphinxSearch (which I love, btw). Aspell tests fine on its own, but it doesn't offer any spelling suggestions on my wiki. Is there a special format for specifying the path to Aspell when hosting on Windows? 130.234.189.190 16:49, 23 February 2009 (UTC)
[edit] Solution
To enable Aspell to work within Windows, open the SphinxSearch.php file. Change the $wgSphinxSuggestMode to true. Then change the $wgSphinxSearchPersonalDictionary variable to C:\Program Files\Aspell\dict (adjust this path according to where you installed Aspell), and save your changes.
Here is an example of my changes:
# Should the suggestion mode be enabled?
$wgSphinxSuggestMode = true;
# Path to where aspell has location and language data files. Leave commented out if unsure
#$wgSphinxSearchPspellDictionaryDir = "/usr/lib/aspell";
# Path to personal dictionary (for example personal.en.pws.) Needed only if using a personal dictionary
$wgSphinxSearchPersonalDictionary = "C:\Program Files\Aspell\dict";
[edit] Alternative solution
The above solution did not work for me until I added php_pspell.dll to php.ini and installed aspell-15.dll in the WINDOWS/system32 directory. See this guide. Then I used the config from this guide, and Did You Mean finally worked.
[edit] Case Sensitivity
Is SphinxSearch case sensitive? --Robinson Weijman 11:42, 10 February 2009 (UTC)
- Yes, if you use default sphinx.conf that comes with it. If you want to change that, remove the "A..Z->a..z, " part from the charset_table setting. Svemir Brkic 13:53, 10 February 2009 (UTC)
- Thanks for the prompt response! --Robinson Weijman 15:16, 10 February 2009 (UTC)
[edit] Search Results without Wikicode/Wiki markup
Does anyone have experience in excluding wikicode / wiki markup from being displayed in the search results? I couldn't find anything on the Sphinx site. Thanks in advance, labalena 149.211.153.96
- There is no easy way. Sphinx does not know anything about wiki markup - it can only be told to strip HTML when indexing. You could in theory keep a separate copy of active revisions, with wiki markup removed with some script, and index that instead of the real content. Svemir Brkic 02:32, 18 March 2009 (UTC)
[edit] REDIRECT
Q: How do I configure Sphinx so that redirect pages are not searched? In my search results I have:
- Page title
REDIRECT Page title to redirect
A: You can modify the query in sphinx.conf to filter out any pages you do not want. Svemir Brkic 02:30, 18 March 2009 (UTC)
Q2: How can I filter out all pages that contain "#REDIRECT [["?
A2: Change this:
sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id
To something like this:
sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_is_redirect=0
Svemir Brkic 17:39, 19 March 2009 (UTC)
[edit] No Result Page
I'm running on Windows and MediaWiki 1.14. The daemon is started, and I can use the command line search, but when I try to search using the special page, nothing happens. The page simply appears to reload. There are no messages in Event Viewer, the query.log file shows my query terms, and there are no messages in searchd.log. I have no idea why I'm not getting any result pages. Is anyone able to give me a clue?
Also, the service appears to be listening since I was able to connect via telnet to localhost at 3312. I haven't changed any of the properties in the SphinxSearch.php page.
Running through the options in SphinxSearch.php I noticed that the $wgSphinxSearch_index value didn't match my Sphinx.conf. I've corrected that but still no search results. The URL in the address bar doesn't even change after hitting the search button.
- Is Sphinx set to be the default search? Whatever it is now, try the other option. Also make sure php.ini is set to display or log errors and watch for PHP errors or warnings. Svemir Brkic 02:36, 18 March 2009 (UTC)
[edit] Handling of HTML tags
There are problems with the current handling of HTML tags: The <span> tags that highlight the match are inserted into the result before the result is run through strip_tags(), requiring strip_tags() to exclude the <span> tag. This has potential to cause problems, when <span> tags are used in Wiki pages.
Furthermore, strip_tags() gets confused (and removes a lot of wanted content) by input like
- 3<4
- Run <code>mail who@ev.er <text</code> on the shell to send an e-mail containing the contents of file text to who@ev.er.
which is likely to appear on Wiki pages. --Patrick Nagel 09:48, 8 April 2009 (UTC)
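One way around both problems, sketched under assumptions: have the matches marked with placeholder strings that cannot occur in HTML (for example via the before_match and after_match excerpt options), escape the whole excerpt, and only then turn the placeholders into span tags. The marker values, the function name and the CSS class are hypothetical; this is not the extension's actual code.

```php
<?php
// Hypothetical markers that would be used instead of real <span> tags when building excerpts.
define('MATCH_OPEN', "\x01");
define('MATCH_CLOSE', "\x02");

function renderExcerpt($excerpt) {
    // Escape everything first, so "3<4" and literal <span> text from the page survive as text.
    $safe = htmlspecialchars($excerpt, ENT_QUOTES, 'UTF-8');
    // Only the markers become real markup.
    return str_replace(
        array(MATCH_OPEN, MATCH_CLOSE),
        array('<span class="match">', '</span>'),
        $safe
    );
}

$html = renderExcerpt("3<4 and " . MATCH_OPEN . "mail" . MATCH_CLOSE . " who@ev.er <text");
echo $html, "\n";
```

Because nothing is stripped, text such as "3<4" is displayed literally instead of being cut off, and strip_tags() no longer needs an exception for the span tag.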
In LocalSettings.php try adding $wgSphinxSearch_host = "127.0.0.1";
[edit] newworldencyclopedia not such a good example
Went to http://www.newworldencyclopedia.org as this was given as a great example of Sphinx in action.
Did a search for pople to see if it would correct this to people.
The result was somewhat uninspiring.
OK to ask did I mean pope. Not OK to tell me it was showing zero matches and then said it found pople 15 times in 9 documents without showing any of them?
My users would not be impressed I'm afraid.
There is no page titled "pople". You can create this page. Did you mean pope?
Displaying 0-0 of 0 matches for query pople retrieved in 0.031 sec with following stats:
* pople found 15 times in 9 documents
- So far nobody suggested another example. The problems at NWE will be corrected soon. Currently there is only one index, used by two different interfaces. One of the interfaces exposes additional namespaces. Sphinx always tells you the full number of matches in the index, even though some of them may be filtered out by namespace and other filters. Svemir Brkic 19:44, 15 April 2009 (UTC)
[edit] sphinxapi.php - Permission Denied
I've installed the extension 0.6.1 and SphinxSearch 0.9.8.1 using MediaWiki 1.14 on CentOS 5.2. When I uncomment the line require_once( "$IP/extensions/SphinxSearch/SphinxSearch.php" ); in LocalSettings.php I get the following error:
Warning: require_once(/var/www/mediawiki/extensions/SphinxSearch/sphinxapi.php) [function.require-once]: failed to open stream: Permission denied in /var/www/mediawiki/extensions/SphinxSearch/SphinxSearch.php on line 20
Fatal error: require_once() [function.require]: Failed opening required '/var/www/mediawiki/extensions/SphinxSearch/sphinxapi.php' (include_path='.:/var/www/mediawiki:/var/www/mediawiki/includes:/var/www/mediawiki/languages') in /var/www/mediawiki/extensions/SphinxSearch/SphinxSearch.php on line 20
Permissions and owner are the same as for all the other extensions, which work fine. When I comment out require_once ( dirname( __FILE__ ) . "/sphinxapi.php" ); in SphinxSearch.php, the extension is loaded and listed in the Special pages, although it can't be used to search. Help is much appreciated.
- Are you sure permissions are exactly the same on sphinxapi.php file as on the other files in the same folder? There has to be some difference. 17:01, 12 May 2009 (UTC)
They are the same as noted below
-rwxr--r-- 1 root root 2217 May 4 08:09 ExtensionFunctions.php
-rwxr--r-- 1 root root 34191 May 11 13:54 sphinxapi.php
-rw-r--r-- 1 root root 14081 May 7 08:50 sphinx.conf.bak
-rwxr--r-- 1 root root 22472 Nov 10 18:59 SphinxSearch_body.php
-rwxr--r-- 1 root root 1758 Nov 10 19:00 SphinxSearch.js
-rwxr--r-- 1 root root 8616 Nov 10 2008 SphinxSearch_PersonalDict.php
-rwxr--r-- 1 root root 4222 May 11 13:59 SphinxSearch.php
-rw-r--r-- 1 root root 4287 May 11 13:11 SphinxSearch.php.bak
-rwxr--r-- 1 root root 5534 Nov 10 2008 SphinxSearch_spell.php
-rw-r--r-- 1 root root 1456 Apr 6 2008 spinner.gif
Thanks J
[edit] Sorting by namespace
Is this possible? I'm not able to find a clear way to do it in the documentation, but am looking for some way to put one of our existing namespaces at the top of all the other hits. Any ideas? As always, thanks in advance. CaliVW78 13:39, 18 May 2009 (UTC)
[edit] Tweaking the Search Excerpts?
Does anyone know how I might adjust the excerpts displayed with the search returns? I find 5 lines to be a bit too long, and would like to experiment with fewer. Thanks!
--John Thomson 03:13, 11 June 2009 (UTC)
- In SphinxSearch_body.php, line 343 or so, change "limit" => 400 to a smaller number. I guess we should make it configurable, but there it is for now. Svemir Brkic 13:22, 11 June 2009 (UTC)
- Ah! Many thanks, Svemir, that did the job nicely! (I went to 250, and am very pleased!)
--John Thomson 22:14, 11 June 2009 (UTC)
[edit] Pages made from templates/transcluded pages do not rank well
So far Sphinx search produces the best results of all the search engines I have tried. Recently I have noticed that some of my templates and subpages appear higher in the search results than the main page that includes them. I understand that the indexer does not parse any of the wiki text and only looks at a single page entity. Is it possible to specify groups of related pages so that my main page will contain all of the text of its subpages?
For Example, if Page-main =>(includes) Page-info & Page-index then I want all the text on Page-info and Page-index to be included in the results for Page-main. I can even go as far as saying that it is a rule that pages have a strict naming convention -main -info -index
Or is there some way for the indexer to know that -index is 'linked to' by -main and include the results for -index in main.
Any suggestions are welcome.
[edit] Bug: MediaWiki variables (like {{CURRENTTIMESTAMP}}) get expanded on results page but not in search
see http://www.newworldencyclopedia.org/entry/Special:Search?search={{CURRENTTIMESTAMP}}&fulltext=Search
--Patrick Nagel 04:25, 27 July 2009 (UTC)
- Good catch :-) I fixed it in the CVS. If you need to fix this in your wiki before the next release, change the line 49 in SphinxSearch_body.php and add "nowiki" tags around %s. Svemir Brkic 11:49, 27 July 2009 (UTC)
[edit] MW API Bug
Causes a 500 when calling search from MW API (ie: api.php?action=query&list=search&srsearch=samplequery)
[edit] Result weighting
Hello,
I have a little problem with the sorting on the result page.
For example, if I search for mysql I get every entry, but the sorting is horrible.
I have several pages with mysql in the text and as part of the page_title.
I would like the page_title matches to appear in front of the body matches. I set
$wgSphinxSearch_weights = array('old_text'=>1, 'page_title'=>1000);
in SphinxSearch.php.
I also tried sql_attr_uint = page_title in my sphinx.conf, but it didn't help at all to get a better result.
Settings in the sphinxapi.php:
Matching mode is set to extended. Sort mode is set to relevance. Tried every type but this is the best so far. Group mode is set to SPH_GROUPBY_ATTR and ranker is the default one.
I would be very pleased if someone could help me.
Greetings,
Tom
- sql_attr_uint is only for numeric values. It is used for filtering. After you adjust this and the weights, make sure to rebuild the index and restart the daemon, just in case. In our case, we always get the title matches first. Our weights are 'old_text'=>1, 'page_title'=>100, extended match mode, and we leave sorting and grouping at default (we do not set them at all.) Svemir Brkic 13:16, 8 September 2009 (UTC)
- Hi, thanks for the quick answer. I turned it back to the defaults and rebuilt the index after the changes, but I still get, for example, a page titled Statistics in front of a page called MySQL5. I guess it has something to do with the weight of the words occurring in the body, but since I defined the weight of page_title higher than that of old_text, it should be the other way around. Greetings, Tom.
- What versions of sphinx, the extension, etc. do you use? Svemir Brkic 03:12, 9 September 2009 (UTC)
Hi,
basic sphinx installer is sphinx-0.9.8.1. Extension Sphinx is SphinxSearch-0.6.1. PS: Mediawiki is version 1.15.0 Tom
[edit] Multiple Indexes on same searchd returning strange results
When using multiple indexes (for multiple wikis) running against the same searchd, we end up with a search matching against all indexes, but only displaying results for the configured index. This can cause several dozen pages of blank entries interspersed with a few links.
Results are even worse when sphinx is used for things other than mediawiki - there will be SQL syntax errors and search feature will be broken completely - when there are matches from other indices which do not have all the expected fields in the query result.
In SphinxSearch_body.php change
$res = $cl->Query($search_term, "*");
to
$res = $cl->Query($search_term, "index1 index2");
where index1 and index2 are the names of the indices you want to search.
There really should be a config variable specifying which indices need to be searched.
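A minimal sketch of what such a config variable could look like. The name $wgSphinxSearch_indexes and the helper function are hypothetical and not part of the extension; the only thing taken from above is that Query() accepts a space-separated list of index names as its second argument.

```php
<?php
// Hypothetical setting, e.g. in LocalSettings.php: the indexes this wiki may search.
$wgSphinxSearch_indexes = array('wiki_main', 'wiki_incremental');

// Turn the list into the string form expected by Query(); fall back to "*" (all indexes).
function sphinxIndexList(array $indexes) {
    $names = array_filter(array_map('trim', $indexes), 'strlen');
    return $names ? implode(' ', $names) : '*';
}

$indexList = sphinxIndexList($wgSphinxSearch_indexes);
// In SphinxSearch_body.php the call would then become:
// $res = $cl->Query($search_term, $indexList);
```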
[edit] SQLite Configuration
First off, many thanks for the extension. For me it is a real problem-solver.
With a few changes to the sphinx config file and a couple of helper scripts (php), SphinxSearch works with SQLite based MediaWikis, too. Thought this might be of interest, seeing as vanilla search does not work with SQLite in the current stable MediaWiki (1.15.1).
My setup:
- MediaWiki 1.15.1
- PHP 5.1.6
- SphinxSearch 0.6.1
- Sphinx 0.9.8.1
First, the helper scripts. These are php scripts that are run by the sphinx indexer and do the following:
- Connect to the MediaWiki SQLite database file using php_pdo (which must be available to run MW on SQLite anyhow)
- Run the indexer queries
- Translate the results into XML for sphinx to process as an xmlpipe2 type source
These can go anywhere you want - I put mine in the ./data/ directory alongside the wikidb.sqlite file. Probably they should go in ./maintenance where all the other command-line PHP scripts are.
This is the main update script:
/path/to/wiki/data/sphinx_sqlite_main.php:
<?php
// This is the path to your SQLite MediaWiki database file
$wikidb='/path/to/wiki/data/wikidb.sqlite';
// Bail if $wikidb is not a file
if (!is_file($wikidb)) {
exit;
}
// Bail if the PDO constructor fails (it throws PDOException rather than returning false)
try {
$db = new PDO("sqlite:".$wikidb);
} catch (PDOException $e) {
exit;
}
// Build the query
$qry = "SELECT page_id,page_title,page_namespace,old_id,old_text ";
$qry .= "FROM page,revision,text WHERE rev_id=page_latest AND old_id=rev_text_id";
// Run the query
$res=$db->query($qry);
// Parse the results into XML
$xmlout = '<?xml version="1.0" encoding="utf-8"?>'."\n"; // Should be true as SQLite downconverts UTF-16
$xmlout .= '<sphinx:docset>'."\n";
while($arr=$res->fetch(PDO::FETCH_ASSOC)) {
$xmlout .= '<sphinx:document id="'.$arr['page_id'].'">'."\n";
foreach ($arr as $strField => $strVal) {
$xmlout .= '<'.$strField.'>';
$xmlout .= htmlspecialchars($strVal, ENT_QUOTES, 'UTF-8'); // escape <, > and & so the XML stays well-formed
$xmlout .= '</'.$strField.'>'."\n";
}
$xmlout .= '</sphinx:document>'."\n";
}
$xmlout .= '</sphinx:docset>'."\n";
// Deconstruct
$db = null;
// Return the XML
if ($xmlout != '') {
print $xmlout;
}
?>
This is the incremental update script. The only difference is the extra AND clause in the query. Probably these should be combined, but I didn't want to bother with command line options. I'm lazy.
/path/to/wiki/data/sphinx_sqlite_incremental.php:
<?php
// This is the path to your SQLite MediaWiki database file
$wikidb='/path/to/wiki/data/wikidb.sqlite';
// Bail if $wikidb is not a file
if (!is_file($wikidb)) {
exit;
}
// Bail if the PDO constructor fails (it throws PDOException rather than returning false)
try {
$db = new PDO("sqlite:".$wikidb);
} catch (PDOException $e) {
exit;
}
// Build the query
$qry = "SELECT page_id,page_title,page_namespace,old_id,old_text ";
$qry .= "FROM page,revision,text WHERE rev_id=page_latest AND old_id=rev_text_id ";
$qry .= "AND page_touched >= ".gmdate("Ymd")."070000 "; // page_touched is stored in UTC, so use gmdate(); adjust to the time of the main indexing job in UTC
// Run the query
$res=$db->query($qry);
// Parse the results into XML
$xmlout = '<?xml version="1.0" encoding="utf-8"?>'."\n"; // Should be true as SQLite downconverts UTF-16
$xmlout .= '<sphinx:docset>'."\n";
while($arr=$res->fetch(PDO::FETCH_ASSOC)) {
$xmlout .= '<sphinx:document id="'.$arr['page_id'].'">'."\n";
foreach ($arr as $strField => $strVal) {
$xmlout .= '<'.$strField.'>';
$xmlout .= htmlspecialchars($strVal, ENT_QUOTES, 'UTF-8'); // escape <, > and & so the XML stays well-formed
$xmlout .= '</'.$strField.'>'."\n";
}
$xmlout .= '</sphinx:document>'."\n";
}
$xmlout .= '</sphinx:docset>'."\n";
// Deconstruct
$db = null;
// Return the XML
if ($xmlout != '') {
print $xmlout;
}
?>
You can test these with php -e script.php and they should spit out XML.
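Since the two scripts differ only in one clause, they could be combined. The following is a sketch under assumptions, not a tested replacement: the function names and the --incremental switch are made up for illustration, and field values are escaped so that characters such as < and & in wiki text do not break the XML.

```php
<?php
// Build the indexer query; the incremental variant adds the page_touched clause.
function buildWikiQuery($incremental) {
    $qry = "SELECT page_id,page_title,page_namespace,old_id,old_text "
         . "FROM page,revision,text WHERE rev_id=page_latest AND old_id=rev_text_id";
    if ($incremental) {
        // page_touched is stored in UTC, hence gmdate(); adjust the hour to the main indexing job.
        $qry .= " AND page_touched >= " . gmdate('Ymd') . "070000";
    }
    return $qry;
}

// Turn result rows into an xmlpipe2 docset, escaping every field value.
function rowsToDocset(array $rows) {
    $xml = '<?xml version="1.0" encoding="utf-8"?>' . "\n" . '<sphinx:docset>' . "\n";
    foreach ($rows as $row) {
        $xml .= '<sphinx:document id="' . (int)$row['page_id'] . '">' . "\n";
        foreach ($row as $field => $value) {
            $xml .= "<$field>" . htmlspecialchars((string)$value, ENT_QUOTES, 'UTF-8') . "</$field>\n";
        }
        $xml .= "</sphinx:document>\n";
    }
    return $xml . "</sphinx:docset>\n";
}

// Command-line entry point: php script.php [--incremental]
$incremental = isset($argv) && in_array('--incremental', $argv);
$wikidb = '/path/to/wiki/data/wikidb.sqlite';
if (is_file($wikidb)) {
    $db = new PDO('sqlite:' . $wikidb);
    $rows = $db->query(buildWikiQuery($incremental))->fetchAll(PDO::FETCH_ASSOC);
    print rowsToDocset($rows);
}
```

With this layout, the incremental source would run the same script with the extra switch in its xmlpipe_command instead of pointing to a second file.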
We will use these to feed the xmlpipe2 sources in sphinx.conf. I chose to define the fields and attributes in the config file, though you can also do this in the XML itself. Here is what your source containers look like. These replace the MySQL ones of the same names.
sphinx.conf:
source src_wiki_main
{
type = xmlpipe2
xmlpipe_command = php -e /path/to/wiki/data/sphinx_sqlite_main.php
xmlpipe_field = page_id
xmlpipe_field = page_title
xmlpipe_field = old_text
xmlpipe_attr_uint = page_namespace
xmlpipe_attr_uint = old_id
}
# data source definition for the incremental index
source src_wiki_incremental : src_wiki_main
{
xmlpipe_command = php -e /path/to/wiki/data/sphinx_sqlite_incremental.php
# all other parameters are copied from the parent source,
}
You will likely need to specify the full path to the PHP command line executable in the xmlpipe_command directives if it isn't in cron's execution path (i.e., xmlpipe_command = /usr/bin/php -e [...]).
That's it. All else is as it appears on the Extension Page.
NB: This is functional on a dev box with a ten page wiki. It has not been production tested. That is your job <wink>.
-Jef (jef at lfaccess dot net - checked infrequently)
[edit] Strange output
I'm running the SphinxSearch on a standard LAMP setup (CentOS 5.3, Apache 2.2.14, MySQL 5.0.77, PHP 5.2.11) with MediaWiki 1.15.1. Searches keep returning well formatted and readable page titles, but garbage for the excerpts, like:
Title 1
��ߎ�&����)����^w+E����f�lқh�bl�k�:`�N�����l
Title 2
U�A��0�E�=E/`�9@W�������
Title 3
�XYs"��~�W�����1B�`Z��1kbu�j4�1C�Ew���k��ul�������cy�� ��
Title 4
�Z�n��� �W4�� 9����vl�2����� YH���0���g� �ͰI�����
Has anyone else seen this or have an idea of where to look?
- Do you have $wgCompressRevisions enabled? It looks like by default this extension sets Sphinx up to read text.old_text directly out of the database, which will not give useful results if you have any of our fancier storage features enabled. (Compressed revisions, batch compression, external storage, 'cur' table back-compat entries, legacy encoding back-compat entries, etc.) In this case you'd need to feed updates into Sphinx over an xmlpipe source or something... --brion 08:55, 6 November 2009 (UTC)
- Excellent catch. I do have $wgCompressRevisions enabled. As a result, I had tried altering the sql_query statements in sphinx.php to account for that with
sql_query = SELECT page_id, page_title, page_namespace, old_id, UNCOMPRESS(old_text) AS old_text FROM mw_page, mw_revision, mw_text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_touched>=DATE_FORMAT(CURDATE(), '%Y%m%d030000')
to no avail. What puzzles me is that the page titles come out just fine, but the full text excerpts don't. How would I go about feeding updates into Sphinx over an xmlpipe since it's all in a MySQL DB?
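A possible reason the UNCOMPRESS() attempt fails, stated as an assumption: MediaWiki compresses revision text in PHP with gzdeflate(), which is not the format MySQL's UNCOMPRESS() expects. Page titles live in the page table and are not compressed, which would explain why they look fine. A script feeding an xmlpipe source could decode each row in PHP instead. The sketch below handles only the gzip flag in old_flags; external storage and serialized-object rows are deliberately not covered, and the function name is hypothetical.

```php
<?php
// Hypothetical helper: decode text.old_text according to text.old_flags.
function decodeOldText($text, $flags) {
    $flagList = explode(',', $flags);
    if (in_array('external', $flagList) || in_array('object', $flagList)) {
        return false; // not handled in this sketch
    }
    if (in_array('gzip', $flagList)) {
        $text = gzinflate($text); // counterpart of gzdeflate()
    }
    return $text;
}

// Round trip with locally compressed data standing in for a database row.
$stored = gzdeflate('Some wiki text about MySQL');
echo decodeOldText($stored, 'utf-8,gzip'), "\n";
```

For this to work, the query run by the script would also need to select old_flags, and the decoded text would then be written into the XML document for each page.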