Extension talk:SphinxSearch/LQT Archive 1

Old discussion points relevant only to older versions of the extension, Sphinx, or MediaWiki have been moved to the archive page.

feature request: show categories in result list
It would be great to be able to list the categories of articles in search results. --Nilsja 01:29, 1 June 2010 (UTC)
 * This would be a neat feature, and one my users would like. I looked in SphinxSearch_Body.php, as this is where the excerpt is generated. I am no PHP expert, but I couldn't easily see where the categories of the result articles are stored, or how to add them to the generated excerpt, perhaps as a list just before the text excerpt is shown. This would be really useful. Happy to add this as a hack in my current version if someone would suggest what needs to be added. -- Brett.tyson 13:20, 18 April 2011 (UTC)

Working well in MW1.5!
Everyone I have spoken to that uses our internal Wiki has nothing but positives to say about this. I really think MediaWiki should adopt Sphinx as the DEFAULT search, as the bundled one is so bad. --195.75.83.25 08:45, 6 July 2009 (UTC)


 * Thanks! Note that the next release will not use ExtensionFunctions, and it will require at least MW 1.7 (and PHP5 in general.) Also note that it was never intended for MW <1.9 - no idea how you got it to work with 1.5 :-) Svemir Brkic 19:45, 12 December 2009 (UTC)

Grouping results by Namespace
I have the extension running but since it doesn't support the weighting of results, is there a way to group the results by the namespace they belong to? -- 20:48, 14 April 2009 (UTC)


 * Version 0.7 supports the $wgSphinxSearch_index_weights array, which lets you specify a weight for each index you have. You still need to set up those indexes manually, as it does not make much sense to have that as the default setup. I will try to prepare a sample sphinx.conf for such a setup, unless somebody else beats me to it. (HINT: you can have the main query include only namespace 0, and a supplemental index for other namespaces, or even each namespace separately. You will also need to have more than one incremental index, but maybe you will decide other namespaces do not need to get updated as frequently...) Svemir Brkic 04:08, 18 February 2010 (UTC)

Running on separate machines
My wiki is on shared hosting, so I can't install Sphinx on its server. I run the search backend (presently Lucene, but Sphinx sounds promising) on a machine in my home. With Lucene, my backend machine SSHs into the webserver, grabs a dump of the wiki, indexes it, and then runs search queries from the webserver through the index it generates and sends the results back to the webserver. It's a rather convoluted setup (and it's even messier when it comes to updating the index), but I'm wondering if I can do anything along the same lines here. --Emufarmers 20:08, 11 November 2007 (UTC)


 * The sphinx.conf file can be configured to make sphinxd run on a different machine (let's call it Machine S, for sphinx) from the machine running MySQL (let's call it Machine M, for mysql). However, in that case Machine S has to have network access to Machine M. My guess is that something similar to your present setup with SSH tunnels could be done here as well. If you are interested in trying this out, please let me know via email (see extension credits) and we could work through these questions then. --Gri6507 22:17, 11 November 2007 (UTC)

I am very interested to know how this configuration has worked for users. I have read many articles that state the problem of "I run a wiki, but it's through a shared hosting and installing the Sphinx daemon is out of the question". This is also my case - so in an effort to get better search capabilities than the standard search, this appears to be one of my only options. A few questions I have: how much traffic is involved when the indexing is performed? Is it a problem to have the daemon going across the wire to access the SQL database for indexing? I have concerns that it will drastically increase my bandwidth usage. Second, I think it would be a great addition to the extension to redirect/use the standard search if the sphinxd cannot be found (if the daemon machine goes offline). Just some ramblings but I am interested in anyone's thoughts - Blac0177 06:04, 12 December 2008 (UTC)


 * Daemon does not do the indexing. That is done with a separate process which you run on a schedule. Daemon simply searches the index and returns the results. Search requests are small, and search results depend on the actual data being searched. You could have a replica of your database on the machine that runs the indexer and the daemon - just as it is described above in the Lucene example. You just need to make sure that your web server can communicate on the specified host and port to your sphinx daemon (and that nobody else can, as a security precaution.) Your bandwidth usage will depend on two things - the way you use to replicate the database, and the amount of search queries you get.


 * For replication, if it is MySQL and you can turn on binary logs, you can just replay those logs on your local copy. This is in case the database is too big to copy the entire thing over for every indexer run. You could also dump just the records modified since the last run, since the indexer does not really need your entire database; it only needs those tables that it actually indexes. Svemir Brkic 18:57, 17 January 2009 (UTC)

Searching multiple wikis
I currently have 3 wikis indexed with Sphinx. The search works well, but it is returning results for all of them. I've set the "$wgSphinxSearch_index = X"; line in the SphinxSearch.php. Am I missing something? --N0ctrnl 20:28, 24 March 2008 (UTC)


 * New in 0.7: $wgSphinxSearch_index_list lets you specify the list of indexes to search. You can have multiple main and delta indexes on the machine, and each wiki can define its own pair to search. You can also combine several index files and assign different weight to each of them, using $wgSphinxSearch_index_weights array. Svemir Brkic 01:06, 18 February 2010 (UTC)

Search through various indexes are failing to display combined results
On our 1.15.1 wiki system we are using SphinxSearch version 0.7. The query log shows that both sets of indexes are searched:
 0.023 sec [ext/1/rel 7 (0,15)] [wiki_main,wiki_incremental,research_main,research_incremental] Cotterrell
However, the search on one wiki system only shows results from [wiki_main,wiki_incremental]; results from [research_main,research_incremental] are not shown, even though
 * [research_main,research_incremental] are indexed, and
 * SphinxSearch.php has been maintained with $wgSphinxSearch_index_list = "wiki_main,wiki_incremental,research_main,research_incremental"; and $wgSphinxSearch_index_weights has been set.

Changing to the other wiki system (1.16beta), a search for the term 'kotler' results in "Displaying 61-75 of 87 matches for query kotler retrieved in 0.008 sec with these stats: kotler found 200 times in 107 documents. Above numbers may include documents not listed due to search options." but shows no results in the list (as those terms only exist on the 1.15.1 system). We conclude that the search term is found in the index files, but something prevents results from one wiki from being displayed on the other wiki's result page.

Is there an option to set a split or combined display of search results and, in case of a combined display, also render the right URL to an article depending on the server?

Any suggestions how to solve this would be appreciated --MWJames 01:35, 27 May 2010 (UTC)

I can confirm this problem with my setup as well. --01:35, 13 June 2010 (UTC)

Facing the same issue: I have about 5 wikis and only the results of the main wiki are being displayed as links. --AutoStatic 15:10, 23 September 2010 (UTC)


 * Sphinx is not storing all the information needed to display the results. It only stores IDs and some other specific attributes such as the namespace. String content is only indexed - stored in some special manner that allows fast searching. When displaying search results, sphinx must have access to the original database to be able to lookup the titles and show the links. Svemir Brkic 17:25, 23 September 2010 (UTC)


 * What do you mean exactly by 'have access to the original database'? Where can I set these access rights? So it has probably nothing to do with the way I set up my Wiki's (http://www.steverumberg.com/wiki/index.php?title=WikiHelp_-_Method_Two)? --AutoStatic 12:26, 21 October 2010 (UTC)


 * The SphinxSearch extension asks for search results from the index. It does not know how those results got into the index. It assumes they all came from the wiki database the extension currently runs on. In order to display search results, the extension needs to make additional queries to the database the search results originally came from. This is not a matter of access rights. The extension was not designed to support multiple databases in this way. Such support would have to be added by someone, perhaps by adding an array that maps each index to a specific database. Svemir Brkic 15:44, 13 November 2010 (UTC)


 * Hello Svemir, all wikis share the same database; they only have a different prefix. --AutoStatic 14:54, 17 January 2011 (UTC)
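
The mapping idea from this thread can be sketched as follows. This is only an illustration in Python, not code from the extension: the index names are the ones quoted above, while the table prefixes and both helper functions are hypothetical. It also assumes that each hit can be tagged with the index it came from (for example through an extra attribute), which the extension as discussed here does not do.

```python
# Hypothetical mapping from Sphinx index name to the table prefix of the wiki
# that produced it. Index names are from the thread; prefixes are invented.
INDEX_TO_PREFIX = {
    "wiki_main": "wiki_",
    "wiki_incremental": "wiki_",
    "research_main": "research_",
    "research_incremental": "research_",
}


def page_table_for(index_name):
    """Return the page table to query for a hit that came from index_name."""
    try:
        return INDEX_TO_PREFIX[index_name] + "page"
    except KeyError:
        raise ValueError(f"no wiki configured for index {index_name!r}")


def group_hits_by_table(hits):
    """Group (index_name, page_id) hits by the table that holds their titles."""
    grouped = {}
    for index_name, page_id in hits:
        grouped.setdefault(page_table_for(index_name), []).append(page_id)
    return grouped


hits = [("wiki_main", 1), ("research_main", 7), ("wiki_incremental", 2)]
print(group_hits_by_table(hits))
```

With such a grouping, one title lookup per table would be enough to build links for every wiki that shares the database.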

Is there any way to prevent Sphinx from indexing particular pages?
I realize this runs counter to what most people would want, but some pages don't need to be indexed. I've made some reasonable searches here and on the Sphinx site, and believe this is more relevant to a MediaWiki discussion than Sphinx in general. Jon Doran, 9 May 2008


 * You could modify the query in sphinx.conf to filter out any pages you do not want. It could be done based on namespace, a join with some other table (e.g. categorylinks,) or some new field or table you would create yourself. Svemir Brkic 01:23, 10 May 2008 (UTC)


 * Thanks for the suggestions. I did not consider the query, but now that you mention it, there is a lot I can do with it.  Jon Doran, 10 May 2008

More Windows Install Issues
Help for windows users.

Step 5 - Start Sphinx Daemon
To create the windows service ... C:\sphinx-0.9.9-win32\bin> searchd.exe --install  --config C:\Sphinx\sphinx.conf --servicename SphinxSearch

Start the service c:\> sc start SphinxSearch

If it fails, double check your sphinx.conf file. Make sure the paths are set properly in the searchd section.

Step 6 - Configure Incremental Updates
Use the Windows Task Scheduler, found under Accessories | System Tools | Scheduled Tasks

Create two jobs. One to run once a day and one to run as needed for incremental updates. If you use the scheduler there will not be any command pop ups.

If you get "ERROR: index 'wiki_main': column number 1 has no name." when trying to index, copy libmysql.dll from MySQL 5.0.37 into Sphinx bin directory. For some reason 5.1 version does not work with Sphinx on Windows.

Incremental Update and Windows Task Scheduler
For the index updates, the Windows Task Scheduler has to run indexer.exe. One way to avoid the unnecessary command pop-ups in Windows is:
 * 1) Create a batch file wiki_main.bat that contains  \indexer.exe --quiet --config \sphinx.conf wiki_main --rotate
 * 2) Create a batch file wiki_incremental.bat that contains  \indexer.exe --quiet --config \sphinx.conf wiki_incremental --rotate
 * 3) Following the Invisible Batch File approach, create a " \bin\invisible.vbs" file that contains CreateObject("Wscript.Shell").Run """" & WScript.Arguments(0) & """", 0; this makes the batch file run invisibly
 * 4) After this the task scheduler can be run with a command like  \system32\wscript.exe " \bin\invisible.vbs" " \Sphinx\bin\wiki_incremental.bat"

Problems on Windows Vista / Windows 7
We are testing searchd on Vista and Windows 7, where the searchd daemon does not return any search results. Since the test PHP scripts that come with the standard Sphinx interface are not working either, we are guessing it is a problem inherent in Sphinx itself and not related to the MediaWiki/SphinxSearch interface. We created a forum post (Sphinx forum post/searchd and Windows) for any follow-ups. --MWJames 18:00, 4 May 2010 (UTC)
 * On Windows Vista we had to change $wgSphinxSearch_host from localhost to 127.0.0.1 in the SphinxSearch.php file.

# Host and port on which the searchd daemon is running
$wgSphinxSearch_host = '127.0.0.1';
$wgSphinxSearch_port = 9312;
 * I've had the same issue and after some searching found that I had started the search daemon with the config file C:\Sphinx\sphinx.conf instead of the sphinx.conf file from the extension. I started over completely, as my config files were messed up by continuous tinkering anyway, and this time made sure to always use the sphinx.conf file from the extension whenever a config was needed (indexer, search daemon registration as a service, ...). Now it works like a charm.

Question when searching for IP's
We use the Wiki here in an IT setting, so many of our articles refer to IP addresses. The default search does not find any variation of IPs when searched (for example 102., 102.160.2.2, 106..etc.) Can anyone tell me if this search does a better job with this? Thanks. --Comalia 19:37, 15 July 2008 (UTC)
 * It would certainly do a better job than the MySQL full-text index, even in the default configuration. You could also tweak it further, but I am not sure I fully understand what exactly you need. If you provide some specific examples of data and search strings that should match it, I can test it. Svemir Brkic 22:45, 15 July 2008 (UTC)

Sure. Say that I have a few articles that have the line of text 192.165.1.0 in them. So, if searching for 192.165.1.0, would it return any results? Or variations of it, such as "192.165"? --Comalia 13:41, 18 July 2008 (UTC)
 * Yes, both searches will match that article. It will consider 192, 165, 1, and 0 as separate "words". You can tell it whether to search for all those words or any of them (it is an option on the search page, but you can also change the default.) Since proximity of the matched words is an important factor, you will get the articles that have entire IP in them first. Svemir Brkic 16:46, 18 July 2008 (UTC)
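
A rough way to picture the behaviour described above, as a small Python sketch. It is not Sphinx's tokenizer: it simply assumes that dots act as separators, so each octet becomes its own "word", and the `tokenize` and `matches` helpers and the sample article text are made up for illustration. Proximity ranking is not modelled.

```python
import re


def tokenize(text):
    """Split text into lowercase 'words', treating dots as separators."""
    return re.findall(r"[0-9a-z]+", text.lower())


def matches(query, document, mode="all"):
    """Return True if the document contains all (or any) of the query words."""
    doc_words = set(tokenize(document))
    query_words = tokenize(query)
    if mode == "all":
        return all(word in doc_words for word in query_words)
    return any(word in doc_words for word in query_words)


article = "The gateway for this subnet is 192.165.1.0 on the core switch."

print(tokenize("192.165.1.0"))
print(matches("192.165.1.0", article))
print(matches("192.165", article))
# In "any" mode a different address can still match on a shared octet.
print(matches("10.0.0.7", article, mode="any"))
```

The last call shows why the "all words" mode is usually the better choice for IP addresses: an unrelated address still matches in "any" mode because it shares the word 0.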

Installing issues
I am trying to install SphinxSearch 0.9.8 on Linux RHEL with MySQL. I did the ./configure and everything seemed fine. Then, when building the binaries with make, I get the following:

sphinx.h:54:19: error: mysql.h: No such file or directory --Comalia 19:50, 22 July 2008 (UTC)


 * SOLUTION
 * I had a similar issue on FC9. I did "yum install mysql-devel" and that fixed it. Try installing the mysql-devel version for your mysql install and then building sphinx. --5 August 2008


 * SOLUTION
 * Same here. I'm a debian user. aptitude install libmysql++-dev libmysqlclient15-dev checkinstall corrected the problem. Much Love

init.d script for FC users
Here is a chkconfig-compatible script I created for FC users. It is a modification of a script by Vladimir Fedorkov. This script assumes you've put the pid file (configured in sphinx.conf) in /var/run for SELinux purposes. Speaking of SELinux, you'll need to add port 3312 to the http port context.

Keyword Priority in Query String
It seems that the order of keywords actually changes the results. In my case, if I send a space-delimited list of keywords, I get different search results depending on the position of my most important keywords. Am I missing a setting that prioritizes keywords based on their position in the query string? Cedarrapidsboy 13:45, 26 September 2008 (UTC)


 * Yes, order of keywords matters, as well as the order and proximity of the matches in searched text. That is the function of the Sphinx itself, but you can affect it by changing the matching mode in SphinxSearch.php. Svemir Brkic 17:01, 26 September 2008 (UTC)


 * Thanks. Is the order of keywords documented?  I mean to say, where can I find information on where to put my most important words, and then my least important words?  I understand the matching modes, but haven't read any mention of keyword position (with no operators joining them) affecting priority.  I'm sure I'm likely blind. 12.207.221.230 00:23, 27 September 2008 (UTC)

Feature Request: Excluding Selected Categories from search
It would be useful to filter results not only by selecting desired categories, but also by excluding undesired ones: $cl->SetFilter('category', $categories_to_exclude, true);

The search form would look like this:

            Include  Exclude
 Category1    [x]      [ ]
 Category2    [ ]      [ ]
 ...
 Category7    [ ]      [x]
 Category8    [ ]      [ ]

--StasFomin 14:59, 10 November 2008 (UTC)


 * New in 0.7: $wgUseExcludes Svemir Brkic 01:59, 18 February 2010 (UTC)

Wikipedia
Can anyone tell me why Wikipedia has not installed this extension? According to the main article, it works with Wikipedia. --Robinson Weijman 09:59, 21 January 2009 (UTC)


 * Wikipedia already uses a Lucene search engine. --Emufarmers 11:53, 21 January 2009 (UTC)


 * OK, thanks. So when and why would SphinxSearch be better than Lucene-Search - and vice versa?  --Robinson Weijman 07:34, 22 January 2009 (UTC)


 * Lucene has more features and is a more stable and mature product. It also needs more resources and is harder to install and maintain. Sphinx is still evolving, both the search engine itself and the MediaWiki extension. It may not have all the features of Lucene yet, but it is much easier to set up and try out. If it does not do something you need, by all means go for Lucene. Svemir Brkic 14:00, 22 January 2009 (UTC)


 * Thanks both of you for your feedback.--Robinson Weijman 08:57, 23 January 2009 (UTC)

Orphaned pages not indexed?
It seems SphinxSearch is unable to find orphaned pages in my wiki. Only things linked somehow (also indirectly) from the Main Page can be found. Is that something that can be configured away? I am using sphinx-0.9.9-rc1, SphinxSearch extension 0.6.1, MediaWiki 1.13.3. Thanks! — User:Trohlfing Feb 8, 2009


 * SphinxSearch extension is not using wiki links in any way. It is running direct database queries, as specified in sphinx.conf. You need to check the namespace of those orphaned pages. Svemir Brkic 21:44, 8 February 2009 (UTC)


 * There is a case in which it only finds linked-to pages. If a page title has spaces in it, these are stored as underscores in the MediaWiki database (in the "page" table).  Sphinx doesn't know about this, so if you search for a page called "Sphinx search engine", by typing "Sphinx search engine" instead of "Sphinx_search_engine", you'll only see the links to that page -- so, if your page is orphaned (no pages reference your page title using spaces), you won't get the result.  User:mphasak 23:30, 9 February 2009 (UTC)


 * Check your sphinx.conf file. If it has "_-> ," in the charset_table, you should remove it and reindex. Svemir Brkic 02:34, 10 February 2009 (UTC)


 * Perfect!! Thanks, Svemir.  User:mphasak 19:25, 10 February 2009 (UTC)
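
The effect discussed in this thread can be illustrated with a toy tokenizer in Python. This is only a sketch under an assumption: that the charset_table entry in question causes the underscore to be kept inside words, so a stored title becomes one long token. The exact semantics of that entry are not spelled out above, and the `tokenize` helper is hypothetical.

```python
import re


def tokenize(text, underscore_is_letter):
    """Split text into lowercase words.

    When underscore_is_letter is True, '_' is kept inside words, so a stored
    title such as 'Sphinx_search_engine' becomes a single token.
    """
    pattern = r"[0-9a-z_]+" if underscore_is_letter else r"[0-9a-z]+"
    return re.findall(pattern, text.lower())


stored_title = "Sphinx_search_engine"   # as stored in the page table
query = "Sphinx search engine"          # as typed by the user

for flag in (True, False):
    title_words = set(tokenize(stored_title, flag))
    hit = all(word in title_words for word in tokenize(query, flag))
    print(flag, sorted(title_words), hit)
```

Once the underscore acts as a separator, the stored title and the typed query produce the same three words, which is why a reindex after the change makes orphaned pages findable by title.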

How to Get the Did you mean feature working
I'm trying to get the Did you mean feature of Sphinx working. I have aspell installed on Ubuntu, but when I add the lines that tell Sphinx to use it, I see no difference. I have re-indexed, still to no avail. Does anybody have any troubleshooting advice? I can't seem to find much online on how to get it working. Is there anything I can do to find where the problem lies? --Trickedicky 11:51, 9 February 2009 (UTC)

Confirm that Aspell command access is working in Windows?
I've got a similar issue. I'm trying to use the Aspell command line with SphinxSearch (which I love, btw). Aspell tests fine on its own, but it doesn't offer any suggestions for spelling on my wiki. Is there a special format for specifying the path to Aspell when hosting on Windows? 130.234.189.190 16:49, 23 February 2009 (UTC)

Solution
To enable Aspell to work within Windows, open the LocalSettings.php file and do this before you include SphinxSearch.php:

P. S. Above solution did not work, until I added the php_pspell.dll to the php.ini and installed the aspell-15.dll in the WINDOWS/system32 dir. See this guide. Then I used the config from this guide, and then Did You Mean finally worked.

More info
I got this working on Windows 2008 on 8/27/2010. I wanted to mention a few things I uncovered.

The php_pspell.dll appears to no longer be supported, and they are moving towards the Enchant library instead. When I downloaded PHP 5.3.3 NTS package the php_pspell.dll file was not included. So you can't use that dll to call ASpell from PHP, and must use the command line calls to ASpell instead.

When using the personal dictionary, it is not just a list of words. It needs to have a header line. http://aspell.net/man-html/Format-of-the-Personal-and-Replacement-Dictionaries.html So this is my test personal dictionary:

For whatever reason, I had to specify the $wgSphinxSuggestMode and $wgSphinxSearchPersonalDictionary settings both before and after the require_once. So here is my complete config:

If you use stopwords, they will not fall out of the index, until you re-index. Might be obvious to some, but doesn't hurt to say it.

Also the extension page mentions this: "SphinxSearch will create a new restricted access special page called Wiki-specific Sphinx search spellcheck dictionary." ... but I have no idea what this is talking about. The restricted special page never showed up, and I didn't see any code that looked like it extended SpecialPage besides the special Sphinx search page.
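
For the personal dictionary header mentioned above, here is a minimal Python sketch of generating such a file. It is not the poster's dictionary: the word list and the file name wiki.pws are made up, and the header layout (`personal_ws-1.1`, language, word count) follows the Aspell manual page linked above. Non-ASCII words would additionally need the optional encoding field described there.

```python
import os
import tempfile


def write_personal_dictionary(path, words, lang="en"):
    """Write an Aspell personal word list: a header line, then one word per line."""
    unique = sorted(set(words))
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        # Header: format tag, language, and the number of words that follow.
        handle.write(f"personal_ws-1.1 {lang} {len(unique)}\n")
        handle.writelines(word + "\n" for word in unique)
    return unique


# Hypothetical file name and word list, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "wiki.pws")
write_personal_dictionary(demo_path, ["searchd", "MediaWiki", "Sphinx"])

with open(demo_path, encoding="utf-8") as handle:
    print(handle.read())
```

A file that lacks the header line is the likely cause when Aspell silently ignores a personal dictionary.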

Case Sensitivity
Is SphinxSearch case sensitive? --Robinson Weijman 11:42, 10 February 2009 (UTC)


 * Yes, if you use default sphinx.conf that comes with it. If you want to change that, remove the "A..Z->a..z, " part from the charset_table setting. Svemir Brkic 13:53, 10 February 2009 (UTC)


 * Thanks for the prompt response! --Robinson Weijman 15:16, 10 February 2009 (UTC)

Search Results without Wikicode/Wiki markup
Does anyone have experience in excluding wikicode / wiki markup from being displayed in the search results? I couldn't find anything on the Sphinx site. Thanks in advance, labalena 149.211.153.96


 * There is no easy way. Sphinx does not know anything about wiki markup - it can only be told to strip HTML when indexing. You could in theory keep a separate copy of active revisions, with wiki markup removed with some script, and index that instead of the real content. Svemir Brkic 02:32, 18 March 2009 (UTC)
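
As a rough idea of what "some script" could look like, here is a Python sketch that strips a small subset of wiki markup with regular expressions. It is an illustration only: the `strip_wiki_markup` helper is hypothetical, it does not handle nested templates, tables, or HTML, and it is far from a real wikitext parser.

```python
import re


def strip_wiki_markup(text):
    """Remove a small subset of wiki markup; nested templates are not handled."""
    # {{template}} calls without nesting are dropped entirely.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] becomes label, [[target]] becomes target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # ''italic'' and '''bold''' quote runs are removed.
    text = re.sub(r"'{2,}", "", text)
    # == Heading == lines keep only the heading text.
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.M)
    # Collapse runs of spaces left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


sample = "'''Sphinx''' is a [[search engine|full-text engine]] {{stub}}"
print(strip_wiki_markup(sample))
```

The cleaned text could then be kept in a separate table and indexed in place of the raw revision text, as suggested above.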

REDIRECT
Q: How do I configure Sphinx so that redirect pages do not appear in my search results? Currently I get entries such as "REDIRECT Page title".

A: You can modify the query in sphinx.conf to filter out any pages you do not want. Svemir Brkic 02:30, 18 March 2009 (UTC)

Q2: How can I filter out all pages that contain this: "#REDIRECT [["

A2: Change this:

sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
    FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id

To something like this:

sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
    FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_is_redirect=0

Svemir Brkic 17:39, 19 March 2009 (UTC)
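
To see the effect of the extra condition without touching a live wiki, the query can be tried against a throwaway database. The Python sketch below uses an in-memory SQLite database with a drastically simplified, assumed schema (only the columns the query touches, no table prefix) and two made-up pages; `make_demo_db` and `indexable_pages` are hypothetical helpers.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for the page, revision and text tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER, page_title TEXT, page_namespace INTEGER,
                           page_latest INTEGER, page_is_redirect INTEGER);
        CREATE TABLE revision (rev_id INTEGER, rev_text_id INTEGER);
        CREATE TABLE text (old_id INTEGER, old_text TEXT);
        INSERT INTO page VALUES (1, 'Sphinx', 0, 10, 0);
        INSERT INTO page VALUES (2, 'Sphynx', 0, 11, 1);
        INSERT INTO revision VALUES (10, 100);
        INSERT INTO revision VALUES (11, 101);
        INSERT INTO text VALUES (100, 'Sphinx is a search engine.');
        INSERT INTO text VALUES (101, '#REDIRECT [[Sphinx]]');
    """)
    return conn


# The query from the answer above, with the redirect filter appended.
QUERY = """
SELECT page_id, page_title, page_namespace, old_id, old_text
FROM page, revision, text
WHERE rev_id = page_latest AND old_id = rev_text_id AND page_is_redirect = 0
"""


def indexable_pages(conn):
    """Return the rows the indexer would see once redirects are filtered out."""
    return conn.execute(QUERY).fetchall()


print(indexable_pages(make_demo_db()))
```

Only the non-redirect page comes back, so the redirect's "#REDIRECT [[...]]" text never reaches the index.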

Handling of HTML tags
There are problems with the current handling of HTML tags: The  tags that highlight the match are inserted into the result before the result is run through , requiring   to exclude the   tag. This has potential to cause problems, when  tags are used in Wiki pages.

Furthermore, strip_tags gets confused (and removes a lot of wanted content) by input like the following, which is likely to appear on Wiki pages:
 * 3<4
 * Run  on the shell to send an e-mail containing the contents of file text to who@ev.er.
--Patrick Nagel 09:48, 8 April 2009 (UTC)

In LocalSettings.php try adding $wgSphinxSearch_host = "127.0.0.1";

Sorting by namespace
Is this possible? I'm not able to find a clear way to do it in the documentation, but am looking for some way to put one of our existing namespaces at the top of all the other hits. Any ideas? As always, thanks in advance. CaliVW78 13:39, 18 May 2009 (UTC)
 * See above --Svemir Brkic 19:58, 12 December 2009 (UTC)

Pages made from templates/transcluded pages do not rank well
So far Sphinx search produces the best results of all the search engines I have tried. Recently I have noticed that some of my templates and sub pages are appearing higher in the search results than the main page that includes them. I understand that the indexer does not parse any of the wiki text and only looks at a single page entity. Is it possible to specify groups of related pages so that my main page will contain all of the text of its sub pages?

For Example, if Page-main =>(includes) Page-info & Page-index then I want all the text on Page-info and Page-index to be included in the results for Page-main. I can even go as far as saying that it is a rule that pages have a strict naming convention -main -info -index

Or is there some way for the indexer to know that -index is 'linked to' by -main and include the results for -index in -main?

Any suggestions are welcome.

Result weighting
Hello,

I have a little problem with the sorting on the result page.

For example, if I search for mysql I get every entry, but the sorting is horrible. I have several pages with mysql in the text and as part of the page_title.

I would like matches in page_title to appear ahead of matches in the body. I set the

in SphinxSearch.php.

I also tried sql_attr_uint  = page_title in my sphinx.conf, but it didn't help at all to get a better result.

Settings in the sphinxapi.php: Matching mode is set to extended. Sort mode is set to relevance. Tried every type but this is the best so far. Group mode is set to SPH_GROUPBY_ATTR and ranker is the default one.

I would be very pleased if someone could help me.

Greetings,

Tom


 * sql_attr_uint is only for numeric values. It is used for filtering. After you adjust this and the weights, make sure to rebuild the index and restart the daemon, just in case. In our case, we always get the title matches first. Our weights are 'old_text'=>1, 'page_title'=>100, extended match mode, and we leave sorting and grouping at default (we do not set them at all.) Svemir Brkic 13:16, 8 September 2009 (UTC)


 * Hi, thanks for the quick answer. I turned it back to the defaults and rebuilt the index after the changes, but I still get, for example, a page titled Statistics in front of a page called MySQL5. It probably has something to do with the weight of the words occurring in the body, but as I set the weight of page_title higher than that of old_text, I would expect it to be the other way round. Greetings, Tom.


 * What versions of sphinx, the extension, etc. do you use? Svemir Brkic 03:12, 9 September 2009 (UTC)

Hi,

basic sphinx installer is sphinx-0.9.8.1. Extension Sphinx is SphinxSearch-0.6.1. PS: Mediawiki is version 1.15.0 Tom

SQLite Configuration
First off, many thanks for the extension. For me it is a real problem-solver.

With a few changes to the sphinx config file and a couple of helper scripts (php), SphinxSearch works with SQLite based MediaWikis, too. Thought this might be of interest, seeing as vanilla search does not work with SQLite in the current stable MediaWiki (1.15.1).

My setup:


 * MediaWiki 1.15.1
 * PHP 5.1.6
 * SphinxSearch 0.6.1
 * Sphinx 0.9.8.1

First, the helper scripts. These are php scripts that are run by the sphinx indexer and do the following:


 * Connect to the MediaWiki SQLite database file using php_pdo (which must be available to run MW on SQLite anyhow)
 * Run the indexer queries
 * Translate the results into XML for sphinx to process as an xmlpipe2 type source
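
The three steps above can be sketched as follows. This is a Python illustration, not the PHP helper scripts described in this post: it assumes an unprefixed schema with revision text stored as plain TEXT, it reuses the column names from the stock sql_query as element names (which only works if the fields and attributes are declared under the same names in sphinx.conf, as described further down), and `build_docset` is a hypothetical helper.

```python
import sqlite3
from xml.sax.saxutils import escape

# Same shape as the MySQL sql_query in the stock sphinx.conf (assumed schema).
QUERY = """
SELECT page_id, page_title, page_namespace, old_id, old_text
FROM page, revision, text
WHERE rev_id = page_latest AND old_id = rev_text_id
"""


def build_docset(conn):
    """Return an xmlpipe2 document stream for every current page revision."""
    parts = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>"]
    for page_id, title, namespace, old_id, old_text in conn.execute(QUERY):
        parts.append(f'<sphinx:document id="{int(page_id)}">')
        # Full-text fields must be XML-escaped.
        parts.append(f"<page_title>{escape(title)}</page_title>")
        parts.append(f"<old_text>{escape(old_text)}</old_text>")
        # Numeric attributes.
        parts.append(f"<page_namespace>{int(namespace)}</page_namespace>")
        parts.append(f"<old_id>{int(old_id)}</old_id>")
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    return "\n".join(parts)


# Usage (path is a placeholder):
#   print(build_docset(sqlite3.connect("/path/to/wiki/data/wikidb.sqlite")))
```

An incremental variant would only differ by an extra AND clause in QUERY that restricts the rows to recently changed pages.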

These can go anywhere you want - I put mine in the ./data/ directory alongside the wikidb.sqlite file. Probably they should go in ./maintenance where all the other command-line PHP scripts are.

This is the main update script:

/path/to/wiki/data/sphinx_sqlite_main.php:

This is the incremental update script. The only difference is the extra AND clause in the query. Probably these should be combined, but I didn't want to bother with command line options. I'm lazy.

/path/to/wiki/data/sphinx_sqlite_incremental.php:

You can test these with php -e script.php and they should spit out XML.

We will use these to feed the xmlpipe2 sources in sphinx.conf. I chose to define the fields and attributes in the config file, though you can also do this in the XML itself. Here is what your source containers look like. These replace the MySQL ones of the same names.

sphinx.conf:

You will likely need to specify the full path to the PHP command line executable in the xmlpipe_command directives if this isn't in cron's execution path. (i.e., xmlpipe_command = /usr/bin/php -e [...]).

That's it. All else is as it appears on the Extension Page.

'''NB: This is functional on a dev box with a ten page wiki. It has not been production tested. That is your job.'''

-Jef (jef at lfaccess dot net - checked infrequently)

UPDATES -- Eshe 1/05/2011:

The following were modifications that I made in order to get the SQLite installation working correctly:


 * In sphinx_sqlite_incremental.php and sphinx_sqlite_main.php, you will need to update 2 areas of the code, see below with comments marked enpicket

Hope this helps anyone else who is using a SQLite MediaWiki Installation!

- Eshe (eshe dot n dot pickett at intel dot com)

Strange output
I'm running the SphinxSearch on a standard LAMP setup (CentOS 5.3, Apache 2.2.14, MySQL 5.0.77, PHP 5.2.11) with MediaWiki 1.15.1. Searches keep returning well formatted and readable page titles, but garbage for the excerpts, like: Has anyone else seen this or have an idea of where to look?


 * Do you have $wgCompressRevisions enabled? It looks like by default this extension sets Sphinx up to read text.old_text directly out of the database, which will not give useful results if you have any of our fancier storage features enabled. (Compressed revisions, batch compression, external storage, 'cur' table back-compat entries, legacy encoding back-compat entries, etc.) In this case you'd need to feed updates into Sphinx over an xmlpipe source or something... --brion 08:55, 6 November 2009 (UTC)


 * Excellent catch. I do have $wgCompressRevisions enabled.  As a result, I had tried altering the sql_query statements in sphinx.php to account for that with " " to no avail.  What puzzles me is that the page titles come out just fine, but the full text excerpts don't.  How would I go about feeding updates into Sphinx over an xmlpipe since it's all in a MySQL DB?


 * Unfortunately, it is not that kind of compression that Brion is talking about. There are some comments above about using xmlpipe, but I did not play with those myself. There are also other ways you could try working around this. I would consider using the Manual:CompressOld.php script periodically, instead of enabling $wgCompressRevisions. That way current revision text will not be compressed and sphinx will work just fine, but you will still have old revisions compressed. Svemir Brkic 18:43, 5 December 2009 (UTC)
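
Why the excerpts turn into garbage while the titles stay readable can be shown with a short Python sketch. It assumes that compressed revisions are stored as raw DEFLATE data (what PHP's gzdeflate() produces); the `gzdeflate` and `gzinflate` helpers below are Python stand-ins named after the PHP functions, and the sample text is made up.

```python
import zlib


def gzdeflate(text):
    """Compress to raw DEFLATE (no zlib header), like PHP's gzdeflate()."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def gzinflate(blob):
    """Reverse of gzdeflate(): decode raw DEFLATE data back to text."""
    return zlib.decompress(blob, -15).decode("utf-8")


wikitext = "Sphinx is a full-text search engine. " * 3
stored = gzdeflate(wikitext)

# What an indexer reading the column directly would see: binary data.
print(stored[:24])
# What the wiki itself sees after decompressing the revision.
print(gzinflate(stored)[:36])
```

Page titles live in the page table and are never compressed, which matches the symptom of good titles next to unreadable excerpts.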

Display of invalid byte sequences
We are seeing some quite strange behaviour: old content (generated some time ago) is indexed correctly and also shown as searchable text, but newly created content only shows up as invalid byte sequences, and we can't figure out what the problem is.

And yes, the demon is running (searches for old content are found and displayed correctly). sphinx.conf is maintained with charset_type = utf-8 and charset_table = for various languages such as CJK. We tried both SphinxSearch 0.9.9 and 1.10-beta with the same result, as shown below (what we see on the Special:SphinxSearch). U��n�0���~ >@j$M�n�����zن�EϲM�Bd���4^�w�)� ��` �4������ �ww����mtX s�6�n �,��Ѥ8�P��3 )�~�qX�������

We are running out of ideas in connection with what causes this behaviour. Help would be much appreciated. (System: MediaWiki	1.16.1 (r80998), PHP 5.2.13 (apache2handler), MySQL 5.1.44-community). We also created a SphinxSearch forum followup. --MWJames 23:30, 25 February 2011 (UTC)


 * We found out that as soon as $wgCompressRevisions = true; was set in LocalSettings.php, it broke text generation and indexing in SphinxSearch. This might relate to the fact that $wgCompressRevisions needs the PHP zlib/output compression. For now we have turned this option off. --MWJames 12:39, 26 February 2011 (UTC)

Search Results within Interlinks counted as double
A rather philosophical question, but if one uses interlinks (normal interlink, semantic interlink) such as Cateora, Philip R., then a search for the term Cateora is counted twice on the search result screen.

System configuration: Sphinx 0.9.9, SphinxSearch 0.6.1, LightTPD 1.4.22, MediaWiki 1.15.1, PHP 5.2.9-1 (cgi-fcgi), MySQL 5.0.77-community-nt.


 * Default setup indexes the wiki source. You could use xml pipe or some other approach to index parsed articles. That would solve this and some other issues. I may look into that in the future. Svemir Brkic 14:52, 5 January 2010 (UTC)
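To illustrate why indexing the wiki source counts such a term twice, and what indexing something closer to the rendered text would change, here is a small sketch. It assumes the link was written in the piped form [[Target|label]] (the original example markup did not survive on this page), and the regular expression only covers plain, non-nested internal links. The helper names are invented for the example.

```python
import re

# [[Target|label]] -> label, [[Target]] -> Target (plain internal links only)
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def visible_link_text(wikitext: str) -> str:
    """Replace each internal link with the text a reader would see."""
    return LINK_RE.sub(lambda m: m.group(2) if m.group(2) else m.group(1), wikitext)


def count_term(text: str, term: str) -> int:
    """Count case-insensitive whole-word occurrences of term."""
    return len(re.findall(r"\b%s\b" % re.escape(term), text, flags=re.IGNORECASE))


source = "See [[Cateora, Philip R.|Cateora]] for details."
print(count_term(source, "Cateora"))                     # counted in the raw wiki source
print(count_term(visible_link_text(source), "Cateora"))  # counted in the visible text
```

In the raw source both the link target and the label contain the term, while the visible text contains it once.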

Installing a different language (morphology) for sphinx
I'm trying to install a french stemmer: morphology=libstemmer_french as suggested here: http://www.sphinxsearch.com/forum/view.html?id=11#9507

Of course I downloaded libstemmer.c.tgz and extracted it to libstemmer.c.

I added --with-libstemmer to ./configure. All seems to have gone well (I got lots of verbose output when make entered this directory):

Making all in libstemmer_c
make[1]: Entering directory `/home/inmdev/Downloads/sphinx-0.9.9/libstemmer_c'
gcc -DHAVE_CONFIG_H -I. -I../config -I/usr/local/include -I/usr/include/mysql -Wall -g -D_FILE_OFFSET_BITS=64 -O3 -DNDEBUG -MT stem_ISO_8859_1_danish.o -MD -MP -MF .deps/stem_ISO_8859_1_danish.Tpo -c -o stem_ISO_8859_1_danish.o `test -f 'src_c/stem_ISO_8859_1_danish.c' || echo './'`src_c/stem_ISO_8859_1_danish.c
mv -f .deps/stem_ISO_8859_1_danish.Tpo .deps/stem_ISO_8859_1_danish.Po
gcc -DHAVE_CONFIG_H -I. -I../config -I/usr/local/include -I/usr/include/mysql -Wall -g -D_FILE_OFFSET_BITS=64 -O3 -DNDEBUG -MT stem_UTF_8_danish.o -MD -MP -MF .deps/stem_UTF_8_danish.Tpo -c -o stem_UTF_8_danish.o `test -f 'src_c/stem_UTF_8_danish.c' || echo './'`src_c/stem_UTF_8_danish.c
mv -f .deps/stem_UTF_8_danish.Tpo .deps/stem_UTF_8_danish.Po
[...]

After all this I try ./indexer --all --config ../../SphinxSearch-0.6.1/sphinx.conf and I get the following warning: WARNING: index 'wiki_main': invalid morphology option 'libstemmer_french' - IGNORED

According to libstemmer.c/libstemmer/modules.txt, the French module can be referred to as french, fr, fre, or fra:

french UTF_8,ISO_8859_1 french,fr,fre,fra

Anyways, don't know where to go from here. Fabricebaro 19:01, 6 January 2010 (UTC)

PS: I posted this matter here too: http://www.sphinxsearch.com/forum/view.html?id=19#22615


 * What did you put in your sphinx.conf instead of "morphology = stem_en"? I would try "morphology = stem_fr" --Svemir Brkic 04:46, 7 January 2010 (UTC)


 * Sorry for the omission (I removed this info by accident). I tried:
 * * morphology = libstemmer_french
 * * morphology = libstemmer_fr
 * * morphology = stem_fr
 * * morphology = french
 * It should be libstemmer_french according to http://www.sphinxsearch.com/forum/view.html?id=11

Activating "Did you mean" with a French wiki
I installed SphinxSearch on a French wiki. When I activate "Did you mean" I get the following error (displayed at the top of the search result page): Also I do have some English in the wiki. Is it possible to use both languages for spelling suggestions ? Fabricebaro 18:42, 13 January 2010 (UTC)


 * You need to find out how to install a French dictionary for pspell. If you actually have it installed, but the language code is not "fr" for some reason, you can edit that line in SphinxSearch_spell.php and change  to whatever it should be - even if you have to hard-code the string. Spelling suggestions in both languages would require some additional work in the same file - if you can find someone who knows some PHP... Svemir Brkic 01:07, 17 February 2010 (UTC)

Always add wildcard better automatically add wildcard?
The SphinxSearch-Extension is running nicely here on openSUSE with MW 1.15.1. I would love to be able to do the following:

Define a number x as a variable. If a search for "string" returns x or fewer results, the search is automatically changed to "string*" and those results are shown.

Alternatively: being able to switch on always adding a * to every search term. If I enter "string", "string*" will be searched.

Is it feasible?


 * One way to do this yourself is to find  in SphinxSearch_body.php and change it to   (of course, it would be better to first check if * is already in there, or if there are multiple words, or to use a preg_replace to do it conditionally, but I am kind of pressed for time right now...) Svemir Brkic 01:14, 17 February 2010 (UTC)
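The conditional version hinted at above (check whether * is already there, handle multiple words, only widen when there are few results) could look roughly like the sketch below. It is written in Python purely to show the logic, not as a patch for SphinxSearch_body.php; the function names and the fake_search stand-in are hypothetical, and it assumes the index is configured so that trailing-star prefix queries are actually supported.

```python
import re
from typing import Callable, List


def add_wildcards(term: str) -> str:
    """Append * to every plain word; leave operators, quotes and existing * alone."""
    out = []
    for word in term.split():
        out.append(word + "*" if re.fullmatch(r"\w+", word) else word)
    return " ".join(out)


def search_with_fallback(term: str, search: Callable[[str], List[str]], x: int) -> List[str]:
    """Run search(term); if it yields x or fewer hits, retry with wildcards added."""
    results = search(term)
    widened = add_wildcards(term)
    if len(results) <= x and widened != term:
        results = search(widened)
    return results


def fake_search(query: str) -> List[str]:
    """Stand-in for the real search call, for demonstration only."""
    titles = ["string", "strings", "stringent"]
    if query.endswith("*"):
        return [t for t in titles if t.startswith(query[:-1])]
    return [t for t in titles if t == query]


print(search_with_fallback("string", fake_search, 1))
```

Setting x to 0 gives the "only widen when nothing was found" behaviour, while calling add_wildcards unconditionally gives the "always add *" variant.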

Cannot Find Wiki Main
I installed SphinxSearch but when I do:

$> /path/to/sphinx/search/ --config /path/to/sphinx.conf "search string"

I get an error saying:

index 'wiki_main' : search error: failed to open /var/data/sphinx/wiki_main.sph: No such file or directory

Did I miss something?


 * Perhaps some of the installation steps had errors you have not noticed. You need to check wiki_main actually exists. Maybe something is wrong in the sphinx.conf file. Hard to say without more information. Svemir Brkic 01:16, 17 February 2010 (UTC)


 * I am getting a very similar error to the one above. I am using exactly what other successful users have used in the sphinx.conf file, but I believe I must be missing whatever this wiki_main.sph is. I know I don't have a file with this name (maybe I am too new and wrong to believe I should), so I figured it must come from somewhere. So either my setting of /var/data/sphinx/wiki_main is wrong, or it isn't and I'm at a loss for how to fix it. Can anyone help?


 * The error is misleading, all you need to do is to create the mentioned directory, and the script will take care of the rest.
 * Jahângir

Is there a publicly available wiki with SphinxSearch installed?
I am looking into both SphinxSearch and LuceneSearch. I would like to try out both before installing one or the other. Is there a publicly available wiki with SphinxSearch installed on which I can try out some queries? Also, I need the following capabilities and wonder if SphinxSearch supports them: 1) ability to search pages for specific tags (e.g.,, ) ; 2) ability to index a site with on the order of 150,000 articles, and 3) exporting a list of pages (identified either by page name or some internal identifier like page_id) to an external file. Dnessett 18:07, 25 February 2010 (UTC)


 * NWE uses Sphinx. It has 107,378 total pages in the database, but fewer than 20,000 are actual articles. I see no reason it could not handle 150,000 or more, given enough disk space and memory. The Sphinx engine itself is used on some very big sites. You can search for tags, but it ignores special characters by default, so it will find the alphanumeric part only. Dealing properly with tags would require some customization, but should not be too hard if you know PHP. As for exporting, the extension now supports the MediaWiki Search API, so you could provide a link that would open the API request in XML or text format, for example (or point it to a script that would format it further.) Svemir Brkic 18:30, 25 February 2010 (UTC)
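As a rough idea of what such an export link could look like, the sketch below builds a search request against MediaWiki's api.php using the standard list=search query module. The host name is a placeholder and the helper name search_api_url is made up; nothing here is specific to SphinxSearch, it only assumes the wiki's search backend answers the regular search API.

```python
from urllib.parse import urlencode


def search_api_url(api_base: str, term: str, limit: int = 50, fmt: str = "xml") -> str:
    """Build a MediaWiki search API URL for the given term (no request is sent)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": fmt,
    }
    return api_base + "?" + urlencode(params)


# Placeholder wiki address, for illustration only.
print(search_api_url("http://wiki.example.org/w/api.php", "floor plan"))
```

The resulting URL can be opened in a browser or fetched by a script that writes the returned page titles to a file.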


 * Thanks. Dnessett 18:37, 25 February 2010 (UTC)


 * A follow-up question. I have looked through the sphinx and sphinx extension documentation, but could not find the answer to the following question. Suppose I want to index the same db twice, once using the standard character set and a second time using an enhanced character set including characters that are used in wikitext markup (e.g., "{", "}", "<", ">"). Is this possible? Would I need to run two instances of the sphinx daemon, each with a different configuration file and different directory targets to separate their activity? Or is there some way to utilize the same daemon to run two indexes? Dnessett 20:59, 25 February 2010 (UTC)

Create the page "I want to create a new page" on this wiki!
Can anyone tell me if it's possible to have SphinxSearch provide a link to create a new page if it doesn't find an existing page in the search results? I liked this feature in the default search engine. Thanks! --WilkBoy 14:43, 31 March 2010 (UTC)


 * If you press enter or click Go, a red link "create this page" will appear. If you click search, it will not. Svemir Brkic 18:55, 31 March 2010 (UTC)

Hi - this doesn't work for me, I get at the top of the page, with no red link. I would like to have the create new page link on the results page. Thanks! Selspiero

in MW 1.16
Upgraded my wiki to 1.16 (in DEV), and now when I don't get an exact match, I see at the top of the search results, instead of seeing "There is no page titled "Foo Bar". You can create this page."

Any ideas? Thanks! --John Thomson 21:17, 14 April 2010 (UTC)


 * Thanks for the report. That message was removed from MW in 1.16. The fix for this will be implemented in the next release of the extension. Svemir Brkic 00:23, 15 April 2010 (UTC)


 * Hi Svemir. Version 0.7.1, released in September of 2010, is still afflicted by this issue. --John Thomson 04:33, 6 March 2011 (UTC)

Temp Fix Between Releases
In the meantime I fixed this in my installation by modifying SphinxSearch_body.php. The line

$wgOut->addWikiText( wfMsg( 'noexactmatch', wfEscapeWikiText( $term ) ) );

needs to be changed to

$wgOut->addWikiText( wfMsg( 'searchmenu-new', wfEscapeWikiText( $term ) ) );

Gomeztogo 00:00, 22 June 2010 (UTC)


 * Thank you, Gomeztogo! This did the trick for me!  Much appreciated! --John Thomson 04:33, 6 March 2011 (UTC)


 * This still isn't fixed, and the temp fix didn't work for me until I added 'searchmenu-new' as an additional entry in the $messages array in SphinxSearch.i18n.php:

'sphinxPspellError' => 'Could not invoke pspell extension.',
'searchmenu-new' => 'There is no article titled $1. You can create this page.'

Semantic Wiki and Sphinx
This topic is surely not on your urgent list, but semantic abilities for a wiki are becoming more and more important, and we use them extensively throughout our wiki to build ontologies and give pages characteristics beyond the standard ones. Do you have any plans to look at how to integrate select statements for the Semantic Wiki extension? We assume that pages with specific properties and keywords should be ranked higher in the search hierarchy than standard pages without those special classifications. It could be a nice feature in comparison with other MW search engines.

Thanks, James


 * I am in the process of re-engineering some parts of the extension. I am making it use more of the standard MW search code (which improved significantly since this extension was first developed,) but I am also looking for more ways to make it better than the default. I will look into Semantic Wiki and how to integrate with it when available. Svemir Brkic 20:32, 17 April 2010 (UTC)

Compatible with $wgEnableMWSuggest?
Is this extension compatible with the $wgEnableMWSuggest = true; setting? I can't get that to work. If not, is it a planned enhancement? Thanks! Gomeztogo 23:06, 30 April 2010 (UTC)


 * It is not compatible right now, but it will be soon - probably by the end of May. Svemir Brkic 01:35, 1 May 2010 (UTC)

Search results and vector skin
Still not as important as the functional interface redesign, but do you plan to adapt the display of search results to incorporate the new layout from the Vector skin?

We are using 1.16 beta on our test server, and the Sphinx search display is a bit out of sync with the Vector skin (option boxes at the bottom, un-collapse option box, navigation to other pages at the bottom, search bar at the bottom, options such as content pages and image pages).

Sphinx serves our research wiki, so not only the results themselves are important but also how they are displayed, to ease navigation. It would be great if you could have a look at it. Thanks. --MWJames 15:34, 3 May 2010 (UTC)


 * The next release of SphinxSearch will use much more of the existing MW classes, including display and pagination. Instead of implementing it as a special page and rewriting everything, most of it is now implemented as an extension of the existing search classes. The new search interface and image thumbnails are there by default now. I am just missing a few pieces before being able to release a candidate. The biggest missing piece is the category filter. I will try to commit a testable version to svn by the end of the week, with or without categories. Svemir Brkic 17:06, 3 May 2010 (UTC)
 * Do you have any news on this request ? --Toma007 14:09, 6 September 2010 (UTC)


 * Do you have any news on the integration/development in connection with 1.16 aka 1.17?--MWJames 15:41, 24 February 2011 (UTC)


 * For me it does not work with Vector but does work in the default skin...

Remote host doesn't work
You have a line in SphinxSearch.php: $wgSphinxSearch_host = 'localhost'; with no checking to see if this is overriding a user-configured variable, i.e. it needs - if (!isset($wgSphinxSearch_host)) - around it. I had to remove this line to get it to work. DimeCadmium 22:04, 19 June 2010 (UTC)


 * Please use the latest version of the extension and set these variables after you include SphinxSearch.php Svemir Brkic 17:18, 13 November 2010 (UTC)

Problems with message handling in MediaWiki:Noexactmatch and MediaWiki:Searchmenu-new
Do you plan to use MediaWiki:Searchmenu-new to show search messages once the search has failed? We would actually need this, because we create new pages not just through a red link but through so-called Semantic Forms, which allow higher data integrity and a predefined set of categorized data. MediaWiki:Searchmenu-new would make it possible to customize the message and link to those defined forms. --MWJames 00:26, 26 June 2010 (UTC)

After further testing, it seems that the message is still coming from MediaWiki:Noexactmatch, but the problem is that when we render a link through a form statement, the standard search interface allows a URL to be rendered into the message.

But when we switch back to SphinxSearch, we don't get a rendered URL, only a message such as  Chapter Form (in the standard search there would be a link instead of this message). Even though we are using something like the standard interface reacts as it should, but the SphinxSearch interface interprets something else, so that we cannot create a dynamic URL. It seems that there is a problem with handling some HTML input, as indicated by <a href="/a/index.php... --MWJames 22:14, 26 June 2010 (UTC)

sphinxsearch in Debian and Ubuntu -- api missing
Debian and Ubuntu now package sphinxsearch; 0.9.9 comes with Sid and Maverick. This of course makes things simpler: no need to download source and build, and the daemon is already configured in sysv.

We still need to copy the sphinx.conf file from this project -- it should be placed in /etc/mediawiki -- along with the rest of the extension. I had to comment out the ports (and add prefixes); then it worked fine.

Upgrades of sphinxsearch within Debian/Ubuntu should work without changing the configuration.

The only problem is the API file, sphinxapi.php. It is not in the binary package. If you download the latest from upstream, you get the dreaded version mismatch error. I haven't tested this in detail, but found that I could use a sphinxapi.php from a sphinxsearch installation from last fall for the 0.9.9.6 version in Ubuntu Maverick. I don't really see an elegant solution -- maybe we should request Debian to package it in the binary deb.

Liontooth 20:14, 5 July 2010 (UTC)

Category filter
I just installed sphinx search on MW 1.17a and it works fine.. but there is no category filter. Is that an option one could enable somewhere? Or is it currently blank? Or am I just experiencing a bug?

Feature request / Using the search function in templates
Maybe a bit far-fetched, but maybe you could provide a search function such as {#ssearch } that can be called in a template and returns the results as a ul or ol list, where the number of search results is limited by a variable passed to the function. --MWJames 05:33, 24 July 2010 (UTC)

Where are these hooks defined?
In SphinxSearch_body.php I found this code:

wfRunHooks( 'SphinxSearchFilterSearchableNamespaces', array( &$namespaces ) );
wfRunHooks( 'SphinxSearchGetSearchableCategories', array( &$categories ) );
wfRunHooks( 'SphinxSearchGetNearMatch', array( &$term, &$t ) );
wfRunHooks( 'SphinxSearchBeforeResults', array(
wfRunHooks( 'SphinxSearchBeforeQuery', array( &$this->search_term, &$cl ) );
wfRunHooks( 'SphinxSearchAfterResults', array( $term, $this->page ) );

I looked through every file of SphinxSearch but still cannot find functions such as "SphinxSearchFilterSearchableNamespaces" or "SphinxSearchBeforeQuery". I want to know how they work, but I can't find the function definitions.

Can anyone help me? Many thanks! 121.8.153.6 09:39, 30 July 2010 (UTC)


 * These are all optional hooks, so they are not defined by default. Some of them will be deprecated soon, as standard MW search-related hooks will be available in this extension as well. Once that is cleaned up, I will document any sphinx-specific hooks that remain. For now, just see where they are called and what arguments they receive. That should give you an idea about what you may be able to do with them. Svemir Brkic 15:38, 19 September 2010 (UTC)
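For readers wondering how a hook can be called without being defined anywhere, the general pattern is a registry that maps hook names to lists of handlers; running a hook with no handlers simply does nothing. In MediaWiki of this era handlers are attached through the $wgHooks array in LocalSettings.php. The sketch below mirrors that pattern in Python as an illustration only; it is not MediaWiki's implementation, and register_hook, run_hooks and drop_talk are invented names.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

_hooks: Dict[str, List[Callable[..., Any]]] = defaultdict(list)


def register_hook(name: str, handler: Callable[..., Any]) -> None:
    """Attach a handler to a named hook (the analogue of adding to $wgHooks)."""
    _hooks[name].append(handler)


def run_hooks(name: str, args: list) -> bool:
    """Call every handler registered under name; with none registered, do nothing."""
    for handler in _hooks.get(name, []):
        if handler(*args) is False:
            return False  # a handler asked to abort
    return True


namespaces = [0, 1, 2, 3]

# No handler registered yet: the call is a no-op and the list is unchanged.
run_hooks("SphinxSearchFilterSearchableNamespaces", [namespaces])


def drop_talk(ns: list) -> None:
    """Example handler: keep only even-numbered (non-talk) namespaces, in place."""
    ns[:] = [n for n in ns if n % 2 == 0]


register_hook("SphinxSearchFilterSearchableNamespaces", drop_talk)
run_hooks("SphinxSearchFilterSearchableNamespaces", [namespaces])
print(namespaces)
```

The in-place modification of the list plays the role of the &$namespaces reference in the PHP calls quoted above.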

Can't get negation to work, always finds all words
I'm struggling to get the negate search option to work, i.e. searching for "word1 -word2" still shows hits for both words, instead of those with only word1. The search CLI tool works fine, though.

I have already tried the different search modes ($wgSphinxSearch_mode: SPH_MATCH_EXTENDED, SPH_MATCH_EXTENDED2, SPH_MATCH_BOOLEAN). SphinxSearch extension 0.7.0 and Sphinx 1.10-beta (r2420) (but same behavior with 0.9.9-release (r2117)), MediaWiki 1.12.0

Any help would be greatly appreciated! --Mmaddin 08:18, 5 August 2010 (UTC)


 * There is definitely something broken now, I also can't use quotes for a phrase match ("word1 word2") - it always finds OR match, or, if I choose "match all words" via the radio button, it finds AND match. I need to make AND match the default, but there is no option, except setting $wgSphinxSearch_mode to SPH_MATCH_ALL - but then there is no radio button anymore (for the rare cases where an OR match is useful). There is not much missing for this extension to be a really great alternative to the built-in search - but right now there is a number of small but annoying (maybe cosmetic, but still important) bugs, that really need to be fixed. --Patrick Nagel 02:03, 31 August 2010 (UTC)
 * Shortly after posting this, I found $wgSphinxMatchAll in the code (can't find documentation anywhere), which you can set to true in LocalSettings.php. Then the radio buttons are there, but "match all words" is selected by default. --Patrick Nagel 02:06, 31 August 2010 (UTC)

Access denied
I'm getting this error message when trying to search in MW: Query failed: connection to 127.0.0.1:9312 failed (errno=13, msg=Permission denied). The daemon is running and responds to searches from the terminal.

System: MediaWiki 1.16.0, PHP 5.3.3 (apache2handler), MySQL 5.1.49, SphinxSearch (version 0.7.0), Sphinx Engine: Sphinx 0.9.9 (r2117; Dec 02, 2009), CentOS 5

From the startup file:

[Thu Aug 5 14:47:37.533 2010] [ 8444] using config file '/etc/sphinx/sphinx.conf'...
[Thu Aug  5 14:47:37.533 2010] [ 8444] listening on 127.0.0.1:9312

Solved: Turn off SELinux. I'm leaving this post in case anyone else runs into this problem. Alternate solution: leave SELinux running and run this command:

setsebool -P httpd_can_network_connect 1

which allows httpd to make network connections.
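A quick way to tell "daemon not listening" apart from "connection blocked by policy" is to probe the port and look at the error number. The sketch below does that with a plain TCP connect; the function names are made up, and the host and port are the ones from the error message above. Note that a probe started from a shell runs outside the web server's SELinux domain, so a successful probe from the terminal combined with errno 13 inside MediaWiki points at policy rather than at searchd.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 2.0):
    """Try a TCP connect; return (ok, errno or None)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return True, None
    except OSError as exc:
        return False, exc.errno
    finally:
        sock.close()


def explain(ok: bool, code) -> str:
    """Turn a probe result into a short hint."""
    if ok:
        return "connected: the daemon is reachable from this process"
    if code == errno.EACCES:
        return "permission denied: likely blocked by policy (e.g. SELinux)"
    if code == errno.ECONNREFUSED:
        return "connection refused: nothing is listening on that port"
    return "failed: errno %s" % code


if __name__ == "__main__":
    print(explain(*probe("127.0.0.1", 9312)))
```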

Project still alive?
I hope this project is still active. Many of the threads above mention a new build possibility in May of 2010, but the only build still available is from Feb.

This extension is great, but some missing features are holding me back from deployment. -  Gomeztogo 01:06, 18 August 2010 (UTC)


 * Apologies to everybody who has been waiting for the new release. I am about to start working on the extension again. --Svemir Brkic 02:06, 11 September 2010 (UTC)

Successfully controlling the search results according to user rights
I have an extension named rarc, which builds a category with the same name as a user group. By adding only users who belong to the xxx user group can see this page. At first, SphinxSearch did not respect these privilege rules and still displayed the search results. On the recommendation of the SphinxSearch author, I built 5 indexes to implement this:
 * wiki_main: includes everything except today's new articles
 * wiki_incremental: indexed automatically every 2 minutes, covers just today's new articles, and is merged into wiki_main at night
 * wiki_small_main: like wiki_main, but excludes the private articles
 * wiki_small_incremental: like wiki_incremental, but excludes the private articles
 * wiki_private: all private articles of the wiki site (due to the small number of articles, there is no need for the incremental + merge mechanism)

I modified the code: if a user belongs to the xxx user group, the search runs against (wiki_small_main wiki_small_incremental wiki_private); for a normal user, the search list is (wiki_small_main wiki_small_incremental).

wiki_main is used to display the brief content of the article according to the page_id.
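The index selection described above boils down to a small lookup. Here is a sketch of that logic in Python, using the index names from this post; the function name indexes_for and the default group name are placeholders, and how the user's groups are obtained is left out.

```python
PUBLIC_INDEXES = ["wiki_small_main", "wiki_small_incremental"]
PRIVATE_INDEXES = ["wiki_private"]


def indexes_for(user_groups, privileged_group="xxx"):
    """Return the space-separated index list a user is allowed to search."""
    indexes = list(PUBLIC_INDEXES)
    if privileged_group in user_groups:
        indexes += PRIVATE_INDEXES
    return " ".join(indexes)


print(indexes_for(["user"]))
print(indexes_for(["user", "xxx"]))
```

Keeping the private articles in their own index means the permission check happens once, when the index list is chosen, rather than per result.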

Can anyone tell me where these hooks are defined, such as:

wfRunHooks( 'SphinxSearchFilterSearchableNamespaces', array( &$namespaces ) );
wfRunHooks( 'SphinxSearchGetSearchableCategories', array( &$categories ) );
wfRunHooks( 'SphinxSearchGetNearMatch', array( &$term, &$t ) );
wfRunHooks( 'SphinxSearchBeforeResults', array(
wfRunHooks( 'SphinxSearchBeforeQuery', array( &$this->search_term, &$cl ) );
wfRunHooks( 'SphinxSearchAfterResults', array( $term, $this->page ) );

I used PowerGREP to search the directories /extension/sphinxsearch and /sphinx, and neither of them contains any of these names, such as "SphinxSearchFilterSearchableNamespaces" or 'SphinxSearchGetSearchableCategories'. 121.8.153.6 08:23, 26 August 2010 (UTC)

undefined function wfLoadExtensionMessages with install on MediaWiki 1.10.1
I am attempting to install sphinx search extension 0.7 on MediaWiki 1.10.1, but I get the error: undefined function wfLoadExtensionMessages.

How can I resolve this without upgrading mediawiki?


 * For older versions of MW you need to use older versions of the extension. They are available at SourceForge --Svemir Brkic 02:03, 11 September 2010 (UTC)


 * One thing you could try, if you downloaded 0.7 from SourceForge, is to download the latest build instead. Svemir Brkic 00:23, 13 September 2010 (UTC)

runaway strikethrough in search results
Sometimes in the middle of my search results, in the middle of an excerpt, the text changes to strikethrough and this continued to the end of the page. Somehow this was caused by embedded hyphens in the excerpt. I fixed it by stripping hyphens out of the excerpt before it gets formatted, in SphinxSearch_body.php line 424, with this change (added "\-" to the characters):

$entry = preg_replace('/([\[\]\{\}\*\#\|\!\-]+|==+)/',       ' ',        strip_tags($entry, '  ')

I don't understand exactly what happened here, but most likely this is why. My page names are not standard mediawiki usage, but rather lower case words separated by hyphens, like .../wiki/my-page and .../wiki/another-page. I'm also using the semantic mediawiki extensions so there is additional metadata embedded in the wikitext.

The generated html where the strikethrough began looked something like this - the "&lt;span&gt;" tag gets split somehow and we unfortunately end up with an "&lt;s&gt;" tag (after ...issues-in-)

modernities-issues-in- pan style='color:red'&gt;culture-and 10   The

which if I put here as wikitext yields

modernities-issues-in- pan style='color:red'&gt;culture-and 10   The

and oh no!! that continues until I close it with an "&lt;/s&gt;" tag.
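For reference, the effect of the modified pattern can be tried outside PHP. The sketch below applies the same character class (with the hyphen added) using Python's re module; strip_markup is a made-up name and the sample excerpt is only loosely based on the one above.

```python
import re

# Same character class as the modified preg_replace, including the added hyphen.
MARKUP_RE = re.compile(r"([\[\]\{\}\*\#\|\!\-]+|==+)")


def strip_markup(excerpt: str) -> str:
    """Replace runs of wiki markup characters (and hyphens) with a space."""
    return MARKUP_RE.sub(" ", excerpt)


print(strip_markup("modernities-issues-in-culture-and [[Some link]] == Heading =="))
```

The trade-off is that legitimate hyphens also disappear from the displayed excerpt, which may or may not matter for a given wiki.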

Thanks for a great extension! The results are SO much better than the default search! Initialcapsarestupid 18:18, 13 September 2010 (UTC)

Version 0.7.1 Released
This minor update contains various fixes and tweaks that were made in recent months. It has been uploaded to the SourceForge repository - to avoid complaints about bugs that have already been fixed in the MW repository. This is not the big update I was talking about, but I am working on that too and it is looking good... --Svemir Brkic 02:12, 15 September 2010 (UTC)

Curious about Searching Syntax
Do boolean operators, ordering, phrase proximity work with this extension? We've installed the extension, but it does not appear to work.--Alanna.macnevin 14:30, 20 September 2010 (UTC)


 * Please download the latest snapshot (trunk, version 0.7.2) from MW (not from SourceForge) and set $wgSearchType to 'SphinxMWSearch' in your LocalSettings file. Also, make sure to update your sphinx.conf (page_is_redirect is being indexed now) and reindex the wiki. This search type uses the new approach that lets MW handle the interface. It still misses some features from the old version, but they will be added soon, and maybe you do not care about those anyway. Svemir Brkic 19:11, 20 September 2010 (UTC)

Did you mean? pigtailed pigtailed pigtailed Aggregate ....No.
For some reason, when I try to search for part of a title, the search will not locate the article. Instead it performs a full-text search omitting article titles and asks if I meant "pigtailed Keyword Searched". The same thing happens no matter which parameters I set (i.e. match titles only etc).--Siadsuit 18:37, 28 September 2010 (UTC)

possible to install without compiler?
Hi, in my hosting plan, the compiler is disabled. I'm wondering if i can install sphinx without using a compiler? Is it possible for me to compile it locally, and just upload and run? --Wmama 05:06, 29 September 2010 (UTC)

Won't return some obvious results
Hello. I've got Sphinx 0.9.9 working in MW 1.16.0 with the latest extension (i think). I just noticed today that some obvious should-be-hits aren't making the results (ie. search for 'floor' gets many instances but misses it in a title and an internal link that I know are there!). The closest similar errors I see look like one that won't return link hits and another where sphinx.conf was interpreting underscores wrong. Any other ideas as to what might cause this? I rebuilt the index twice! - Lucas Billett 21:27, 6 October 2010 (UTC)


 * Ok. Weird. I tried setting $wgSearchType = 'SphinxMWSearch' as mentioned above and the problem went away. Not sure why that would happen? I like the idea of the search output being packaged in MW. Nice work. However, I get an 'undefined index:matches' error on line 189 of SphinxMWSearch.php when it doesn't find an exact match anywhere. I tried surrounding the culprit statement with an if(isset) and it went away... Not sure what I might have broken though. - Lucas Billett 12:33, 7 October 2010 (UTC)


 * Thanks for isset catch Lucas - it has been committed in revision 76659, along with some other tweaks - will post about that in a minute. Svemir Brkic 15:24, 14 November 2010 (UTC)

"Table 'mywiki.documents' doesn't exist" issue
Hi,

I've installed Sphinx in Windows 2003, to index a Mediawiki page, under XAMPP server.

When I configure the sphinx.conf.in file and run indexer.exe, I get this:

C:\sphinx\bin>indexer.exe --config c:\sphinx\sphinx.conf.in --all
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'c:\sphinx\sphinx.conf.in'...
indexing index 'test1'...
ERROR: index 'test1': sql_query: Table 'mywiki.documents' doesn't exist (DSN=mysql://root:***@127.0.0.1:3306/mywiki).
total 0 docs, 0 bytes
total 0.008 sec, 0 bytes/sec, 0.00 docs/sec
indexing index 'test1stemmed'...
ERROR: index 'test1stemmed': sql_query: Table 'mywiki.documents' doesn't exist (DSN=mysql://root:***@127.0.0.1:3306/mywiki).
total 0 docs, 0 bytes
total 0.001 sec, 0 bytes/sec, 0.00 docs/sec
distributed index 'dist1' can not be directly indexed; skipping.
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg

In phpMyAdmin I can see that the "documents" table doesn't exist.

Which table do I have to configure?

Thanks in advance.


 * Isn't sphinx.conf.in a sample configuration file that comes with sphinx? You sure you updated and are using the sphinx.conf file that came with the extension? - Lucas Billett 11:53, 12 October 2010 (UTC)

Templates / Transclusion - returns only template not users
If I have a template called Cereals with the contents "Rice, Wheat, Rye", and then I use that template in an article, and then search for "Wheat" I only get the template, not the page using the template. I guess this makes sense since MW doesn't store the rendered content, it expands the templates on the fly when the page is rendered.

I was thinking to have sphinx index all the rendered pages instead, any ideas as to how? --72.148.136.13 23:01, 12 October 2010 (UTC)


 * ARGH! Yes, this is going to be the only way. I'm going to have to use xmlpipe and write a shell script (or php?) to get every page, wrap it inside of xmlpipe tags and dump it into the indexer. Argh, argh, argh. The good news is I get to spend the next few days at work hacking php :) Or switch to Lucene, but everyone says Lucene doesn't work on Windows, and my box is a WAMP.
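A rough sketch of the wrapping step is below, in Python rather than shell or PHP. It emits an xmlpipe2 stream (docset, schema, one document element per page) to standard output, which a source with type = xmlpipe2 and an xmlpipe_command pointing at the script could consume. Assumptions: the field names page_title and old_text are chosen only to mirror the column names, the rendered text is already available (fetching and expanding each page is left out entirely), and the function name xmlpipe2 is invented.

```python
import sys
from xml.sax.saxutils import escape


def xmlpipe2(docs):
    """Yield the lines of an xmlpipe2 stream for (page_id, title, rendered_text) tuples."""
    yield '<?xml version="1.0" encoding="utf-8"?>'
    yield "<sphinx:docset>"
    yield "<sphinx:schema>"
    yield '<sphinx:field name="page_title"/>'
    yield '<sphinx:field name="old_text"/>'
    yield "</sphinx:schema>"
    for page_id, title, text in docs:
        yield '<sphinx:document id="%d">' % page_id
        yield "<page_title>%s</page_title>" % escape(title)
        yield "<old_text>%s</old_text>" % escape(text)
        yield "</sphinx:document>"
    yield "</sphinx:docset>"


if __name__ == "__main__":
    # Hypothetical pages; the second one has its template already expanded.
    pages = [
        (1, "Template:Cereals", "Rice, Wheat, Rye"),
        (2, "Breakfast", "Typical cereals: Rice, Wheat, Rye"),
    ]
    for line in xmlpipe2(pages):
        sys.stdout.write(line + "\n")
```

With expanded text in the stream, a search for "Wheat" would match the page that uses the template as well as the template itself.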

Query failed: no enabled local indexes to search
Hi,

Please help! Sphinx search problem: "Query failed: no enabled local indexes to search". I am able to run the indexer in the console and do a search in the console, and I can see the results.

I have started the daemon and it is running. When I use Special:SphinxSearch, it says "Could not instantiate Sphinx client." And when I search for a word, it says "Query failed: no enabled local indexes to search".

If anyone has an idea what mistake I am making, please let me know.

Some details :

Mediawiki 1.15.3

XAMPP 1.7.2

Windows Xp SP2

Sphinx 0.9.9 (win32)

SphinxSearch 0.7.1

Thanks in Advance.

--Ramesh

I have the same issue, any help out there? Using sphinx-0.9.9, Mediawiki 1.16

I appear to have resolved this by moving XAMPP into a folder different from Program Files. --Nate

--193.16.163.244 14:14, 26 October 2010 (UTC)

Proximity search not working
I have set $wgSphinxSearch_mode = SPH_MATCH_EXTENDED2;

in LocalSettings.php, but proximity searches do not seem to be working:

Query failed: index wiki_incremental,wiki_main: syntax error, unexpected '~', expecting $end near '~2'

I tried putting a backslash before the ~, e.g.,

"search terms"\~2

instead, I just get results similar to an "or" query, which is not what I want. Any suggestions?

This has been an issue since SphinxSearch 0.7 - now using 0.7.1 with Sphinx 1.10-beta (same issue with 0.7 and Sphinx 0.9.9).

Update: SphinxSearch 0.6.1 works perfectly with sphinx 1.10-beta, MW 1.15.5 - all SPH_MATCH_EXTENDED2 syntax works.

--Fungiblename 03:50, 12 November 2010 (UTC)


 * Would you please try the latest build instead of 0.7.1? Set $wgSearchType to 'SphinxMWSearch' in your LocalSettings file. Also, make sure to update your sphinx.conf (page_is_redirect is being indexed now) and reindex the wiki. Svemir Brkic 15:48, 13 November 2010 (UTC)
 * User error: Thanks Svemir; I tried all of that, but unfortunately, this returns no results at all for any query and takes me to the default MediaWiki search page ("Search results" at top of screen, rather than "Search wiki using Sphinx" - although it looks like you're trying to do away with a dedicated special page in SphinxMWSearch.php).
 * I tried changing $wgSearchType back to SphinxSearch, but SPH_MATCH_EXTENDED2 still throws the same error. There is a small change from 0.7.1: If I escape the "~", the number at the end gets treated as an "AND" part of the query, and the terms between quotes get treated as an AND, so the query only returns pages containing "search" AND "terms" AND "2", which is not expected/desired behavior. 0.6.1 still works quite well, but I will keep testing new SphinxSearch versions as you release them. I hope that some of this helps you pinpoint the regression between 0.6.1 and 0.7+ for handling SPH_MATCH_EXTENDED2 syntax. Unfortunately, I really do not know my way around PHP that well, so I am sorry that I cannot be too helpful in debugging. I am sorry that I forgot to mention this, but I am running SMWHalo 1.5.1, if this is relevant at all. --Fungiblename 01:55, 14 November 2010 (UTC)
 * Proximity search works fine for me with the latest build and SphinxMWSearch. It is supposed to say "Search results" at top of screen in the new mode. Please post or send me all the lines from your LocalSettings file that deal with SphinxSearch.
 * As for the old version working, that is only if your quotes are matching. If your search term contains only one double quote, you will get an error. Fixing that for 0.7 broke proximity searching. In 0.7.2 (latest build) this is addressed in a better way, but only for SphinxMWSearch. 173.72.163.240 03:25, 14 November 2010 (UTC)
 * User error: I made a small mistake in my configuration and forgot to copy over the "sql_attr_uint = page_is_redirect" statement from the attribute columns portion of the sphinx.conf template into my conf file (I did pull in the new wiki_main and wiki_incremental queries). The proximity search now returns valid results, thanks! The results still highlight the proximity number as a search term, e.g. for a search

"search terms"~2
 * The results look like: search something terms some other text ... 2
 * Now, the number is treated like an OR, rather than an AND, so valid pages get returned, but the proximity parameter highlighting might be distracting for some users. One other note: If there is a wildcard inside a proximity search query, the wildcarded term will not get highlighted in the search results. Example:

"sear* term"~2
 * returns: text of some length search term some more text 2.
 * Using wildcards on both terms completely disables term highlighting in search excerpts (except for the proximity parameter), but the search results return valid pages. FYI, here are my LocalSettings parameters for Ubuntu 10.04 (I have some custom file locations and port settings):

 $wgSearchType = 'SphinxMWSearch';
 require_once( "$IP/extensions/SphinxSearch-r75726/SphinxSearch.php" );
 $wgSphinxSearchPersonalDictionary = dirname( __FILE__ ) . "/path/to/wordlist/wordlist.en.pws";
 $wgSphinxSearchExtPath = '/web/accessible/path/to/SphinxSearch-r75726';
 $wgSphinxSearch_port = 10101;
 $wgSphinxSearch_mode = SPH_MATCH_EXTENDED2;
 $wgSphinxSuggestMode = true;
 $wgSphinxSearchPspellDictionaryDir = "/usr/lib/aspell";
 $wgSphinxSearchAspellPath = "/usr/bin/aspell";


 * Thanks for building this incredible tool - it has made my installation so much more useful! --Fungiblename 14:44, 14 November 2010 (UTC)

I'm seeing this error with MediaWiki 1.16.2, Sphinx 0.9.9, and SphinxSearch 0.7.1. (Yes, I'll try the latest build next, as recommended above.) For instance, a search for

"web address"~2

presents

Query failed: index wiki_incremental,wiki_main: syntax error, unexpected '~', expecting $end near '~2'

This is with Sphinx set to be the default search method. --Whit 20:30, 1 April 2011 (UTC)

Nope, the latest build doesn't fix it. At least it doesn't with "$wgSearchType = 'SphinxSearch';", and with "$wgSearchType = 'SphinxMWSearch';" it fails with

PHP Fatal error: Cannot call constructor in /var/www/wiki/extensions/SphinxSearch/SphinxMWSearch.php on line 23, referer: http://192.168.1.229/wiki/index.php/Main_Page

--Whit 20:43, 1 April 2011 (UTC)

Search suggestions
Latest SVN build has improved search suggestions for SphinxMWSearch - using enchant library and a script that creates a "dictionary" based on the frequency of words in your wiki (so it will never suggest a word that is not actually in the wiki.) I decided to use enchant because direct support for aspell/pspell is being dropped from PHP (at least on Windows) and I could not even install them on my box with php 5.3.

To try it, install the enchant extension, run SphinxSearch_setup.php to create a dictionary, and set $wgSphinxSuggestMode to 'enchant'. Without enchant, you can set $wgSphinxSuggestMode to 'soundex' and it will use MySQL's SOUNDEX to find wiki articles that sound similar to the search query. Svemir Brkic 15:35, 14 November 2010 (UTC)

Old method is currently still supported, but only via aspell binary, not via pspell extension. Set $wgSphinxSuggestMode to 'aspell' for that. Svemir Brkic 18:49, 7 September 2011 (UTC)

Enchant
Enchant is a library that provides a generic interface to third-party spell-checking APIs.

 $wgSphinxSuggestMode = 'enchant';

Create a dictionary
Execute SphinxSearch_setup.php to create necessary dictionary files sphinx.dic and sphinx.aff.

Using Enchant with PHP on Windows
See information about Using Enchant with PHP on Windows

Soundex
According to Wikipedia, Soundex is a phonetic algorithm for indexing names by sound. MySQL provides a SOUNDEX function for comparing strings: two strings that sound almost the same should return identical Soundex strings.

 $wgSphinxSuggestMode = 'soundex';

Aspell
$wgSphinxSuggestMode = 'aspell';

What is Sphinx, What Does it Do, and Why Would You Use It?
Could someone write a quick introduction section to this article?

What is SphinxSearch, what does it do, and why would you use it?

Would be nice. Right at the top. All obvious and helpful-like.

Oh
UPDATE: Holy cow! It exists! But it's four sections deep and it's titled "Description"!

Okay.


 * Good point. I removed old news items that got accumulated above description over time. Svemir Brkic 18:19, 24 November 2010 (UTC)

Bug fix
If you get this error when you search only on a stop word: Undefined index: words near line 446

Here is what I did to fix it:

Problem with SphinxMWSearch
Hi, I am trying to use SphinxSearch (snapshot 80923) with MW 1.16 on a SUSE Linux Enterprise Server 10. I included SphinxSearch the following way: But I get the following error:

PHP Fatal error: Can not call constructor in /extensions/SphinxSearch/SphinxMWSearch.php on line 22

Can anyone help me? Am I missing another extension?

Solution
There is no __construct function in the SearchEngine class. Change the following code in SphinxMWSearch.php:

function __construct( $db ) { parent::__construct( $db ); }

To this:

function __construct( $db ) { $this->db = $db; }

This seems to allow $wgSearchType = 'SphinxMWSearch'; to work properly. Hope this helps!

--Jlemley 23:27, 3 June 2011 (UTC)

Search weights Problem
I'm running SphinxSearch 0.9.9 on MediaWiki 1.16.

It works fine (at least it looks like it). But when I change my LocalSettings to "$wgSphinxSearch_weights = SPH_MATCH_PHRASE;", for example, the search stops working (in fact, if I assign any value to $wgSphinxSearch_weights, the search stops working).

Does anyone have any idea what it could be?

Thanks in advance,

- Miguel (20-02-2011)
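
 * A possible cause, offered as a guess rather than a confirmed diagnosis: SPH_MATCH_PHRASE is a match-mode constant (the kind of value used with $wgSphinxSearch_mode elsewhere on this page), whereas $wgSphinxSearch_weights is assumed here to expect an array of per-field weights. The field names old_text and page_title below are assumptions based on a typical sphinx.conf for this extension; check them against your own index definition.

```php
// LocalSettings.php fragment (sketch only). Keep it after the require_once
// line for SphinxSearch so the SPH_MATCH_* constants are already defined.

// The match mode takes a single SPH_MATCH_* constant.
$wgSphinxSearch_mode = SPH_MATCH_PHRASE;

// Assumption: weights are an array of field name => integer weight.
// 'old_text' and 'page_title' are assumed field names, not confirmed here.
$wgSphinxSearch_weights = array( 'old_text' => 1, 'page_title' => 100 );
```

If the search still fails with an array value, the problem lies elsewhere and this sketch does not apply.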

Categories under "Search in categories" appear in all caps with recent MediaWiki
I've updated MediaWiki to 1.18alpha (current SVN trunk), and since then, SphinxSearch's category list under "Search in categories" looks ugly, since all categories are in all capital letters. There is no problem with functionality, it's just a cosmetic issue.

The following patch (changing nothing more than getting the category names from the page_title column instead of the cl_sortkey column) corrects the issue for me, but I don't know if it breaks SphinxSearch on older MediaWiki versions, please test:

 Index: SphinxSearch_body.php
 ===================================================================
 --- SphinxSearch_body.php      (revision 83693)
 +++ SphinxSearch_body.php      (working copy)
 @@ -101,7 +101,7 @@
                                array( 'ORDER BY' => 'cl_sortkey' ) );
                        while ( $x = $dbr->fetchObject ( $res ) ) {
 -                               $categories[$x->cl_from] = $x->cl_sortkey;
 +                               $categories[$x->cl_from] = $x->page_title;
                         }
                         if ( $cache_key ) {
                                 # cache query results for a day

--Patrick Nagel 06:02, 11 March 2011 (UTC)

Search string with an apostrophe
When using a search string that contains an apostrophe, like someone's search, SphinxSearch cuts off everything after the apostrophe in the search input field. We are using SphinxSearch 0.7.2, MW 1.16.1. --MWJames 01:29, 13 March 2011 (UTC)
 * I can confirm that on SphinxSearch 0.7.2 and MW 1.18alpha. The search works correctly, but the text that appears in the search input field below the search results (if any), contains only the text up to the first apostrophe. --Patrick Nagel 05:12, 13 March 2011 (UTC)

Call to undefined method SphinxSearch::transformSearchTerm
MediaWiki 1.16.4, SphinxSearch extension HEAD as of 11/05/2011. The Sphinx search special page works, but using the search box on the left-hand side produces the error: Fatal error: Call to undefined method SphinxSearch::transformSearchTerm in C:\xampp\htdocs\includes\specials\SpecialSearch.php on line 127. Fixed: this error occurs if you declare $wgSearchType after the require_once line.
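
 * For anyone hitting the same error, the ordering described above can be sketched as follows. The extension path is an assumption (a default extensions/SphinxSearch directory); the only point being illustrated is that $wgSearchType comes before the require_once line.

```php
// LocalSettings.php fragment (sketch): set the search type first ...
$wgSearchType = 'SphinxSearch';

// ... and only then load the extension. The path below is an assumed
// default location; adjust it to wherever SphinxSearch is installed.
require_once( "$IP/extensions/SphinxSearch/SphinxSearch.php" );
```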

Sphinx 2.0.1 and compat_sphinxql_magics
While testing 2.0.1, an issue appeared when compat_sphinxql_magics is set to 0, while compat_sphinxql_magics = 1 still works. Changes in Sphinx 2.0.1 seem to create problems with the current SphinxSearch (version 0.7.2).

Testing environment: MediaWiki 1.16.1 (r80998), PHP 5.2.13 (apache2handler), MySQL 5.1.44-community, SphinxSearch (version 0.7.2). --MWJames 09:45, 18 May 2011 (UTC)

Sphinx 2.0.1 and Real-time indexes
Since 1.10-beta/2.0.1, Sphinx can create real-time indexes that allow instant search results. Does anyone have experience getting this to work with SphinxSearch? --MWJames 09:43, 18 May 2011 (UTC)


 * RT indexes still lack infix/prefix support and multi-value attributes. Once RT indexes in sphinx engine mature a bit more, I can look into it again. Svemir Brkic 12:19, 4 September 2011 (UTC)

Query failed: failed to read searchd response
When I do a search, I get

Query failed: failed to read searchd response (status=2613, ver=11825, len=775238701, read=73)

No idea what's going wrong.

-- Jeroen De Dauw 17:18, 25 June 2011 (UTC)

Default Namespaces
How can you change which namespaces are searched by default when Sphinx Search is run? I'd like it if the search would automatically include category pages as well as standard articles.

194.98.70.14 08:37, 19 August 2011 (UTC)


 * You should just need to adjust $wgNamespacesToBeSearchedDefault. —Emufarmers 02:27, 21 August 2011 (UTC)
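
 * A minimal sketch of that setting for the case asked about (articles plus category pages), assuming no other namespaces should be searched by default. NS_MAIN and NS_CATEGORY are the standard MediaWiki namespace constants.

```php
// LocalSettings.php fragment (sketch): namespaces searched by default.
// Keys are namespace constants; true means "included in default searches".
$wgNamespacesToBeSearchedDefault = array(
	NS_MAIN     => true, // standard articles
	NS_CATEGORY => true, // category pages
);
```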

Only strange characters with new pages
For a few weeks now we have been experiencing strange behaviour. When using SphinxSearch, we get strange characters on the results page. Before that, we never had a problem with SphinxSearch. No changes were made to the MediaWiki installation or to any of the extensions we use. Example: +I-.I��K�(���J�L�PH�


 * M.W.: 1.15.1
 * PHP : 5.3.5
 * MySQL : 5.5.8
 * SphinxSearch: Version 0.7.0

Have you experienced this kind of behaviour? --217.149.129.145 07:32, 25 August 2011 (UTC)
 * We had similar issues when our cache settings included $wgCompressRevisions = true; (LocalSettings.php); see also --MWJames 15:31, 25 August 2011 (UTC)
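
 * As a rough sketch of what the reply above points at: if compressed revision text is the cause, turning the setting off would look like the fragment below. This is an assumption about the cause, not a verified fix. As far as I know the setting only controls how new revisions are stored, so text that is already compressed stays compressed and may still show up garbled in the index.

```php
// LocalSettings.php fragment (sketch): store new revisions uncompressed so
// the indexer, which reads the text from the database, sees plain text.
// Assumption: revisions saved while this was true remain compressed.
$wgCompressRevisions = false;
```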

SphinxMWSearch 0.8; context per line parameter doesn't work
We already got the new SphinxSearch 0.8 from SVN, and started testing the interface on a MW 1.17 system. We found that the character limit set in a user's personal preferences has no effect. We changed it from 200 to 400 to 800, but the number of characters displayed on the results page does not change. --MWJames 06:42, 7 September 2011 (UTC)


 * This was caused by a change in MW itself - at some point someone decided to hard-code those numbers in SearchEngine class itself. We added our own method to override that in SphinxSearch in r96435 (as long as you have $wgSphinxSearchMWHighlighter set to false.) Svemir Brkic 14:11, 7 September 2011 (UTC)
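
 * The condition mentioned in the reply above corresponds to a one-line setting. This is a sketch that assumes r96435 or later is installed.

```php
// LocalSettings.php fragment (sketch): use SphinxSearch's own excerpt code
// rather than the MediaWiki highlighter, so that the override added in
// r96435 for the context settings applies.
$wgSphinxSearchMWHighlighter = false;
```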

SphinxMWSearch 0.8; Doesn't show total of result hits
In earlier versions of SphinxSearch, the total number of results was shown, so a user had an idea of the number of hits. Now the standard MW search screen, depending on the selected view (20, 50, etc.), shows only the notice "Showing below up to 20 results starting with #1" and gives no information about the total number of hits available. --MWJames 06:48, 7 September 2011 (UTC)


 * Thanks. This has been corrected in r96431 Svemir Brkic 13:38, 7 September 2011 (UTC)

SphinxMWSearch 0.8; Highlighting of non-Latin characters doesn't work
While testing the new interface we encountered a problem with non-Latin characters (we tested Japanese and Chinese) and the highlighting of the search term on the output page. In earlier versions the search term was highlighted in the result output (Latin as well as non-Latin characters), now in the standard MW search results are displayed but without highlighting the particular search term (it does work with Latin characters). --MWJames 07:03, 7 September 2011 (UTC)


 * What is the value of $wgSphinxSearchMWHighlighter in your case? Svemir Brkic 13:43, 7 September 2011 (UTC)


 * We did a combined search (a Latin character word together with a Japanese character) with alternating settings $wgSphinxSearchMWHighlighter = true; and $wgSphinxSearchMWHighlighter = false; (also with $wgAdvancedSearchHighlighting = true set), the result page would show results with both terms, but only the Latin character word would get highlighted.

SphinxMWSearch 0.8; Search display behaviour different in 1.18alpha
We tested SphinxMWSearch 0.8 in 1.18alpha (r96396) and noticed that the display behaviour has changed. This might be related to the fact that the search string now works with the &profile=all&redirs=1 parameters instead of listing every single namespace as in 1.17. The 1.18 search URL looks like title=Special:Search&search=<...>&fulltext=Search&profile=all&redirs=1. Also see Manual:Hooks/SpecialSearchSetupEngine and Manual:Hooks/SpecialSearchProfileForm --MWJames 08:26, 7 September 2011 (UTC)

SphinxMWSearch 0.8; searchInput box does not initiate the search
Testing 1.17, Vector skin (Extension:Vector and $wgVectorUseSimpleSearch = true; ) together with SphinxMWSearch 0.8.

Using the search input box (right corner) redirects to the MW Special:Search page and generates a search URL similar to Special:Search&search=<...>&fulltext=1. This particular string does not initiate a search in Sphinx. Only after pressing the search button on the Special:Search page does the search URL change to something like Special:Search&redirs=0&search=<...>&fulltext=Search, which initiates the search in Sphinx and returns results. --MWJames 08:54, 7 September 2011 (UTC)


 * Thanks for the testing - even though I am not done with 0.8 yet. I am committing changes to trunk in stages to make it easier to keep track of what changed and why. I will try to verify or fix these issues as I do so. Svemir Brkic 11:02, 7 September 2011 (UTC)

SphinxMWSearch 0.8; $wgSphinxSuggestMode = 'enchant'; forces Fatal error
Using the setting $wgSphinxSuggestMode = 'enchant'; triggers the following: Fatal error: Call to undefined function enchant_broker_init in ...\extensions\SphinxSearch\SphinxMWSearch.php on line 233 --MWJames 18:22, 7 September 2011 (UTC)


 * Fixed in 96467. Thanks! Svemir Brkic 18:54, 7 September 2011 (UTC)