Extension talk:SphinxSearch/Archive/2012

From mediawiki.org

Tips

Adding a search input box to a page

This is not specific to this extension, but since version 0.8 SphinxSearch supports the standard MediaWiki search interface, so the InputBox extension can be used to set up a search input box on any page. The input box redirects to the standard search page, which displays any results found. For details on using the InputBox extension, see the examples on that extension's page.

How to avoid redirects in the result display

Modify your sphinx.conf configuration file and add the condition page_is_redirect=0 to the WHERE clause of your sql_query statement. The exact statement depends on your sql_query configuration.

Before

sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
  FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id

After

sql_query = SELECT page_id, page_title, page_namespace, old_id, old_text \
  FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_is_redirect=0
More recent versions of the extension have a default sphinx.conf that collects page_is_redirect as an attribute used for filtering, the same way MediaWiki search works in general. Use the approach above only if you never want to see redirects in search results. Svemir Brkic 17:54, 16 September 2011 (UTC)

Working well in MW 1.5!

Everyone I have spoken to who uses our internal wiki has nothing but positive things to say about this. I really think MediaWiki should adopt Sphinx as the DEFAULT search, as the bundled one is so bad. --195.75.83.25 08:45, 6 July 2009 (UTC)

Thanks! Note that the next release will not use ExtensionFunctions, and it will require at least MW 1.7 (and PHP 5 in general). Also note that it was never intended for MW < 1.9; no idea how you got it to work with 1.5 :-) Svemir Brkic 19:45, 12 December 2009 (UTC)

Running on separate machines

My wiki is on shared hosting, so I can't install Sphinx on its server. I run the search backend (presently Lucene, but Sphinx sounds promising) on a machine in my home. With Lucene, my backend machine SSHs into the web server, grabs a dump of the wiki and indexes it; it then runs search queries from the web server against the index it generated and sends the results back to the web server. It's a rather convoluted setup (and even messier when it comes to updating the index), but I'm wondering if I can do anything along the same lines here. --Emufarmers 20:08, 11 November 2007 (UTC)

The sphinx.conf file can be configured so that the Sphinx search daemon (searchd) runs on a different machine (call it Machine S, for Sphinx) from the machine running MySQL (call it Machine M, for MySQL). In that case, however, Machine S needs network access to Machine M. My guess is that something similar to your present setup with SSH tunnels could be done here as well. If you are interested in trying this out, please email me (see the extension credits) and we can work through these questions. --Gri6507 22:17, 11 November 2007 (UTC)

I am very interested to know how this configuration has worked for users. I have read many posts describing the problem "I run a wiki on shared hosting, so installing the Sphinx daemon is out of the question". That is my case too, so if I want better search than the standard search, this appears to be one of my only options. A few questions: how much traffic is involved when indexing is performed? Is it a problem to have the daemon go across the wire to the SQL database for indexing? I am concerned that it will drastically increase my bandwidth usage. Second, I think it would be a great addition to the extension to fall back to the standard search if the Sphinx daemon cannot be reached (for example, if the daemon machine goes offline). Just some ramblings, but I am interested in anyone's thoughts. - Blac0177 06:04, 12 December 2008 (UTC)

The daemon does not do the indexing. That is done by a separate process (the indexer), which you run on a schedule. The daemon simply searches the index and returns the results. Search requests are small, and the size of search results depends on the data being searched. You could keep a replica of your database on the machine that runs the indexer and the daemon, just as described above in the Lucene example. You just need to make sure that your web server can reach your Sphinx daemon on the specified host and port (and, as a security precaution, that nobody else can). Your bandwidth usage will depend on two things: the method you use to replicate the database, and the number of search queries you get.
For replication, if you use MySQL and can turn on binary logs, you can simply replay those logs on your local copy. This helps when the database is too big to copy in its entirety for every indexer run. You could also dump just the records modified since the last run, since the indexer does not need your entire database, only the tables it actually indexes. Svemir Brkic 18:57, 17 January 2009 (UTC)
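The "dump only what changed" idea can be sketched as follows, assuming that selecting pages by page_touched is good enough for your setup; the helper name is hypothetical. It formats the time of the previous indexer run as MediaWiki's 14-digit UTC timestamp and turns it into a WHERE condition.

```php
<?php
// Hypothetical helper: select only pages changed since the previous indexer
// run. MediaWiki stores page_touched as a 14-digit UTC timestamp
// (YYYYMMDDHHMMSS), so the cutoff is formatted the same way.
function touchedSinceClause(int $lastRunUnix): string
{
    $cutoff = gmdate('YmdHis', $lastRunUnix);
    return "page_touched >= '" . $cutoff . "'";
}

// Example: the previous run finished at 18:57:00 UTC on 17 January 2009.
$lastRun = gmmktime(18, 57, 0, 1, 17, 2009);
echo touchedSinceClause($lastRun), PHP_EOL; // page_touched >= '20090117185700'
```

The resulting condition could, for instance, be passed to mysqldump's --where option when dumping the page table; selecting the matching revision and text rows is not covered by this sketch.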

Searching multiple wikis

I currently have 3 wikis indexed with Sphinx. The search works well, but it returns results from all of them. I've set the $wgSphinxSearch_index = "X"; line in SphinxSearch.php. Am I missing something? --N0ctrnl 20:28, 24 March 2008 (UTC)

New in 0.7: $wgSphinxSearch_index_list lets you specify the list of indexes to search. You can have multiple main and delta indexes on the machine, and each wiki can define its own pair to search. You can also combine several index files and assign a different weight to each of them using the $wgSphinxSearch_index_weights array. Svemir Brkic 01:06, 18 February 2010 (UTC)
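A small sanity check for these two settings, sketched under the assumption that $wgSphinxSearch_index_weights is keyed by index name (the form Sphinx's own SetIndexWeights() API call expects). The function name and the index names are placeholders; the check simply reports weighted indexes that the wiki does not actually search.

```php
<?php
// Hypothetical sanity check: every index that has a weight should also be in
// the comma-separated list of indexes this wiki searches.
function unlistedWeightedIndexes(string $indexList, array $weights): array
{
    $listed = array_map('trim', explode(',', $indexList));
    // array_diff() keeps the original keys, so re-index the result.
    return array_values(array_diff(array_keys($weights), $listed));
}

// Example settings for one wiki (index names are placeholders).
$wgSphinxSearch_index_list = "wiki_main,wiki_incremental";
$wgSphinxSearch_index_weights = array(
    'wiki_main'        => 100,
    'wiki_incremental' => 100,
    'research_main'    => 10, // weighted, but not searched by this wiki
);

print_r(unlistedWeightedIndexes($wgSphinxSearch_index_list, $wgSphinxSearch_index_weights));
```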

Searches across several indexes fail to display combined results

On our 1.15.1 wiki we use SphinxSearch 0.7, and the query log shows that a search for "Cotterrell" runs against all four indexes:

  0.023 sec [ext/1/rel 7 (0,15)] [wiki_main,wiki_incremental,research_main,research_incremental] Cotterrell

However, the search results page on that wiki only shows results from [wiki_main,wiki_incremental]; results from [research_main,research_incremental] are not shown, even though:

  • [research_main,research_incremental] are indexed, and
  • SphinxSearch.php sets $wgSphinxSearch_index_list = "wiki_main,wiki_incremental,research_main,research_incremental"; and $wgSphinxSearch_index_weights.

On the other wiki (1.16beta), searching for 'kotler' reports "Displaying 61-75 of 87 matches for query kotler retrieved in 0.008 sec with these stats: kotler found 200 times in 107 documents. Above numbers may include documents not listed due to search options." but shows no results in the list (those terms only exist on the 1.15.1 wiki). We conclude that the search term is found in the index files, but something prevents results from one wiki being displayed on the other wiki's results page.

Is there an option to display search results split or combined and, for a combined display, to render the correct URL to an article depending on the server?

Any suggestions on how to solve this would be appreciated. --MWJames 01:35, 27 May 2010 (UTC)


I can confirm this problem with my setup as well. --01:35, 13 June 2010 (UTC)

I am facing the same issue: I have about five wikis, and only the results of the main wiki are displayed as links. --AutoStatic 15:10, 23 September 2010 (UTC)

Sphinx does not store all the information needed to display the results. It stores only IDs and some other specific attributes, such as the namespace. String content is only indexed, that is, stored in a special form that allows fast searching. To display search results, the extension must have access to the original database so that it can look up the titles and show the links. Svemir Brkic 17:25, 23 September 2010 (UTC)
What do you mean exactly by 'have access to the original database'? Where can I set these access rights? So it probably has nothing to do with the way I set up my wikis (http://www.steverumberg.com/wiki/index.php?title=WikiHelp_-_Method_Two)? --AutoStatic 12:26, 21 October 2010 (UTC)
The SphinxSearch extension asks the index for search results. It does not know how those results got into the index; it assumes they all came from the database of the wiki it is currently running on. To display search results, the extension needs to make additional queries to the database the results originally came from. This is not a matter of access rights: the extension was not designed to support multiple databases in this way. Such support would have to be added by someone, perhaps through an array that maps each index to a specific database. Svemir Brkic 15:44, 13 November 2010 (UTC)
Hello Svemir, all wikis share the same database; they only have different table prefixes. --AutoStatic 14:54, 17 January 2011 (UTC)
I have the same problem. I use SphinxSearch 0.8.5 with MediaWiki 1.16.2. I have 6 different wiki databases on one machine, and I installed Sphinx on each wiki. When I search on one wiki, it only returns results from the current wiki. Is it possible to solve the problem by using multi-valued attributes? BMxWiki 08:39, 25 January 2012 (UTC)
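One possible shape for the index-to-database mapping suggested above, sketched under assumptions: each wiki's sql_query emits a document ID that encodes both the page_id and a small wiki number, so the PHP side can tell which wiki (here, which table prefix, as in a shared-database setup) a match belongs to. The prefixes, the factor of 10 and the function name are all hypothetical, and this is not code from the extension; the factor must be larger than the highest wiki number.

```php
<?php
// Hypothetical scheme: make every Sphinx document ID unique across wikis.
// In each wiki's sphinx.conf source, the first column would become e.g.
//   sql_query = SELECT page_id * 10 + 2 AS doc_id, ... FROM research_page ...
// and the PHP side recovers the page ID and the wiki's table prefix.
const WIKI_SLOTS = 10;

$wikiPrefixes = array(
    1 => 'wiki_',
    2 => 'research_',
);

function decodeDocId(int $docId, array $prefixes): array
{
    $wikiNo = $docId % WIKI_SLOTS;
    if (!isset($prefixes[$wikiNo])) {
        throw new InvalidArgumentException("Unknown wiki number $wikiNo");
    }
    return array(intdiv($docId, WIKI_SLOTS), $prefixes[$wikiNo]);
}

list($pageId, $prefix) = decodeDocId(1232, $wikiPrefixes);
echo "page_id $pageId in {$prefix}page", PHP_EOL; // page_id 123 in research_page
```

The title lookup for each match would then query the table with that prefix instead of the current wiki's tables.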

Is there any way to prevent Sphinx from indexing particular pages?

I realize this runs counter to what most people would want, but some pages don't need to be indexed. I've made some reasonable searches here and on the Sphinx site, and believe this is more relevant to a MediaWiki discussion than to Sphinx in general. Jon Doran, 9 May 2008

You could modify the query in sphinx.conf to filter out any pages you do not want. This could be done based on namespace, a join with some other table (e.g. categorylinks), or a new field or table you create yourself. Svemir Brkic 01:23, 10 May 2008 (UTC)
Thanks for the suggestions. I did not consider the query, but now that you mention it, there is a lot I can do with it. Jon Doran, 10 May 2008


Question when searching for IPs

We use the wiki in an IT setting, so many of our articles refer to IP addresses. The default search does not find any variation of IPs when searched (for example 102., 102.160.2.2, 106., etc.). Can anyone tell me whether this search does a better job with this? Thanks. --Comalia 19:37, 15 July 2008 (UTC)

It would certainly do a better job than the MySQL full-text index, even in the default configuration. You could also tweak it further, but I am not sure I fully understand what exactly you need. If you provide some specific examples of data and search strings that should match it, I can test them. Svemir Brkic 22:45, 15 July 2008 (UTC)

Sure. Say I have a few articles that contain the text 192.165.1.0. If I search for 192.165.1.0, would it return any results? What about variations of it, such as "192.165"? --Comalia 13:41, 18 July 2008 (UTC)

Yes, both searches will match that article. Sphinx treats 192, 165, 1, and 0 as separate "words". You can tell it whether to search for all of those words or any of them (it is an option on the search page, and you can also change the default). Since proximity of the matched words is an important factor, the articles that contain the entire IP will be listed first. Svemir Brkic 16:46, 18 July 2008 (UTC)
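To see why both searches match, here is a rough approximation of how the default character set splits text into words: letters, digits and underscore form words, and everything else (including the dots in an IP address) separates them. This is a simplified, ASCII-only sketch with a hypothetical function name; the real behaviour is controlled by charset_table in sphinx.conf.

```php
<?php
// Rough, ASCII-only approximation of Sphinx's default tokenizing.
function approxTokens(string $text): array
{
    return preg_split('/[^a-z0-9_]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

$doc   = approxTokens('Gateway is 192.165.1.0 on VLAN 5');
$query = approxTokens('192.165');

// Every query word occurs in the document, so an "all words" search matches.
var_dump(array_diff($query, $doc) === array()); // bool(true)
```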

Feature Request: Excluding Selected Categories from search

It would be useful to filter results not only by selecting the desired categories, but also by excluding undesired ones:

 $cl->SetFilter('category', $categories_to_exclude, true);

The Search Form will be like this:

               Include    Exclude
 Category1        [x]        [ ]
 Category2        [ ]        [ ]
  ...
 Category7        [ ]        [x]
 Category8        [ ]        [ ]

--StasFomin 14:59, 10 November 2008 (UTC)

New in 0.7: $wgUseExcludes Svemir Brkic 01:59, 18 February 2010 (UTC)
The latest SVN version (0.8.4, r96768) lets you exclude categories by adding "-incategory:Foo" to the search query. Svemir Brkic 02:33, 11 September 2011 (UTC)
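The "-incategory:" syntax can be illustrated with a small parser that separates the free text from the categories to include or exclude. This is a hypothetical sketch, not the extension's code. With sphinxapi, the two lists would end up in $cl->SetFilter('category', $ids) and $cl->SetFilter('category', $ids, true) once category names are mapped to the numeric values stored in the category attribute; that mapping is not shown. For a multi-valued attribute, an include filter matches when any of the document's values is in the list.

```php
<?php
// Hypothetical helper: split "foo incategory:Bar -incategory:Baz" into the
// free-text part, the categories to include and the categories to exclude.
function splitCategoryTerms(string $query): array
{
    $include = $exclude = $words = array();
    foreach (preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY) as $token) {
        if (preg_match('/^(-?)incategory:(.+)$/', $token, $m)) {
            if ($m[1] === '-') {
                $exclude[] = $m[2];
            } else {
                $include[] = $m[2];
            }
        } else {
            $words[] = $token;
        }
    }
    return array(implode(' ', $words), $include, $exclude);
}

list($text, $in, $out) = splitCategoryTerms('router incategory:Network -incategory:Obsolete');
// $text = 'router', $in = array('Network'), $out = array('Obsolete')
```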

Wikipedia

Can anyone tell me why Wikipedia has not installed this extension? According to the main article, it works with Wikipedia. --Robinson Weijman 09:59, 21 January 2009 (UTC)

Wikipedia already uses a Lucene search engine. —Emufarmers(T|C) 11:53, 21 January 2009 (UTC)
OK, thanks. So when and why would SphinxSearch be better than Lucene-search, and vice versa? --Robinson Weijman 07:34, 22 January 2009 (UTC)
Lucene has more features and is a more stable and mature product. It also needs more resources and is harder to install and maintain. Sphinx is still evolving, both the search engine itself and the MediaWiki extension. It may not have all the features of Lucene yet, but it is much easier to set up and try out. If it does not do something you need, by all means go for Lucene. Svemir Brkic 14:00, 22 January 2009 (UTC)
Thanks to both of you for your feedback. --Robinson Weijman 08:57, 23 January 2009 (UTC)

Offline Wikipedia

I'm using an offline Wikipedia and the search isn't working. I'm having trouble installing Sphinx, and I see that Wikipedia uses Lucene. Is that installed automatically, so that I would get the MWSearch extension? Or is there something else I need to do to get search to work correctly? Thanks! User: Kschindler 07:00, 4 Oct 2011

By offline, do you mean a set of static HTML pages, or a local web server using MySQL and PHP? 75.75.36.106 12:41, 4 October 2011 (UTC)

Search Results without Wikicode/Wiki markup

Does anyone have experience excluding wikicode/wiki markup from the search results? I couldn't find anything on the Sphinx site. Thanks in advance, labalena 149.211.153.96

There is no easy way. Sphinx does not know anything about wiki markup; it can only be told to strip HTML when indexing. In theory you could keep a separate copy of active revisions, with the wiki markup removed by some script, and index that instead of the real content. Svemir Brkic 02:32, 18 March 2009 (UTC)
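The "index a markup-free copy" idea could start from something like the crude stripper below. It is a hypothetical sketch, not a wikitext parser: nested templates, tables and parser functions are not handled. Links of the form [[Target|Label]] (including semantic [[Property::Value|Label]] links) are reduced to their label, so a term is not indexed twice when it appears in both the target and the label. Only well-formed tags are removed, so text such as 3<4 survives.

```php
<?php
// Crude, hypothetical wikitext stripper for preparing text to index.
function roughStripWikitext(string $text): string
{
    // Drop simple (non-nested) templates such as {{cite ...}}.
    $text = preg_replace('/\{\{[^{}]*\}\}/', '', $text);
    // [[Target|Label]] and [[Prop::Target|Label]] -> Label
    $text = preg_replace('/\[\[[^\[\]|]*\|([^\[\]]*)\]\]/', '$1', $text);
    // [[Target]] -> Target
    $text = preg_replace('/\[\[([^\[\]]*)\]\]/', '$1', $text);
    // Bold/italic quote runs.
    $text = preg_replace("/'{2,}/", '', $text);
    // Well-formed HTML-like tags only (unlike strip_tags(), leaves "3<4" alone).
    $text = preg_replace('/<\/?[A-Za-z][^<>]*>/', '', $text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo roughStripWikitext("'''Bold''' text {{cite}} [[Foo|bar]] and [[Baz]]"), PHP_EOL;
```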

Handling of HTML tags

There are problems with the current handling of HTML tags: the <span> tags that highlight the match are inserted into the result before it is run through strip_tags(), so strip_tags() has to be told to keep <span> tags. This can cause problems when <span> tags are used in wiki pages.

Furthermore, strip_tags() gets confused (and removes a lot of wanted content) by input like

  • 3<4
  • Run <code>mail who@ev.er <text</code> on the shell to send an e-mail containing the contents of file text to who@ev.er.

which are likely to appear on wiki pages. --Patrick Nagel 09:48, 8 April 2009 (UTC)

Somehow I forgot to take care of this. I am dealing with another highlighting issue now, so I will try to fix this problem as well. As a workaround, you can try setting $wgSphinxSearchMWHighlighter to true after you include Sphinx in your LocalSettings.php. That will use MediaWiki's own highlighter, which may be better in some cases (and worse in others...) Svemir Brkic 00:28, 10 September 2011 (UTC)
Fixed in 0.8.3 (r96735) Svemir Brkic 20:07, 10 September 2011 (UTC)
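The underlying issue is the order of operations. If the excerpt is HTML-escaped first and the highlight markup is added afterwards, no strip_tags() pass is needed and inputs such as 3<4 survive intact. Below is a sketch of that approach with a hypothetical function name; it is not necessarily how r96735 fixed it. All terms are matched in a single pass so that later terms cannot match inside markup inserted for earlier ones.

```php
<?php
// Sketch: split the raw excerpt on the search terms, escape every piece,
// then wrap only the matched pieces in highlight markup.
function highlightExcerpt(string $excerpt, array $words): string
{
    $words = array_filter($words, 'strlen');
    if (!$words) {
        return htmlspecialchars($excerpt, ENT_QUOTES, 'UTF-8');
    }
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    $pattern = '/(' . implode('|', $quoted) . ')/iu';
    // With one capture group, odd-indexed pieces are the matched terms.
    $pieces = preg_split($pattern, $excerpt, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($pieces as $i => $piece) {
        $escaped = htmlspecialchars($piece, ENT_QUOTES, 'UTF-8');
        $out .= ($i % 2 === 1)
            ? '<span class="searchmatch">' . $escaped . '</span>'
            : $escaped;
    }
    return $out;
}

echo highlightExcerpt('Run mail who@ev.er <text on the shell', array('mail')), PHP_EOL;
```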

Namespace

Sorting by namespace

Is this possible? I can't find a clear way to do it in the documentation, but I am looking for a way to put one of our existing namespaces above all the other hits. Any ideas? As always, thanks in advance. CaliVW78 13:39, 18 May 2009 (UTC)

See above --Svemir Brkic 19:58, 12 December 2009 (UTC)

Grouping results by Namespace

I have the extension running, but since it doesn't support weighting of results, is there a way to group the results by the namespace they belong to? -- 20:48, 14 April 2009 (UTC)

Version 0.7 supports the $wgSphinxSearch_index_weights array, which lets you specify a weight for each index you have. You still need to set up those indexes manually, as it does not make much sense to have that as the default setup. I will try to prepare a sample sphinx.conf for such a setup, unless somebody else beats me to it. (HINT: you can have the main query include only namespace 0, plus a supplemental index for other namespaces, or even one for each namespace. You will also need more than one incremental index, but maybe you will decide that other namespaces do not need to be updated as frequently...) Svemir Brkic 04:08, 18 February 2010 (UTC)

Unable to get namespaces to work

It seems I've only managed to get the main namespace to show up in search results, not additional namespaces... Do I need an index for each namespace, with sphinx.conf adjusted accordingly? If so, what do I need to tweak? --Skunark 18:01, 18 September 2011 (UTC)

In SphinxSearch 0.8+ this should work out of the box (as long as you don't have a modified sphinx.conf and you use the standard MW search with the Everything or Advanced option). Otherwise, please describe your system environment (Sphinx engine, extension version, MW version, etc.); without that, people cannot help. --MWJames 18:41, 18 September 2011 (UTC)
Sphinx 0.9.9, extension 1.17 and development head, MW 1.17. I added "AND page_namespace=100" to the end of the sql_query, and the expected pages did show up for that namespace, but of course that disabled all other namespaces... The only other tweak I've made in sphinx.conf is to account for $wgDBprefix on the page, revision, text and categorylinks tables.
My goal is to search three namespaces, NS_MAIN, UserNS1 and UserNS2, so that I can find pages UserNS1:PageX and UserNS2:PágeX by typing any of the following: a) PageX, b) PágeX, c) UserNS1:PageX, d) UserNS2:PageX, with the results listing both UserNS1:PageX and UserNS2:PágeX. --Skunark 19:32, 18 September 2011 (UTC)
I've tweaked the charset_table, and that seems to have cleared up some of the issues. I think the bigger issue for me now is trying to weight titles higher than the page text.

Pages made from templates/transcluded pages do not rank well

So far Sphinx search produces the best results of all the search engines I have tried. Recently I have noticed that some of my templates and subpages appear higher in the search results than the main page that includes them. I understand that the indexer does not parse any of the wikitext and only looks at a single page at a time. Is it possible to specify groups of related pages so that my main page contains all of the text of its subpages?

For example, if Page-main includes Page-info and Page-index, then I want all the text on Page-info and Page-index to be included in the results for Page-main. I can even make it a rule that pages follow a strict naming convention: -main, -info, -index.

Or is there some way for the indexer to know that -index is linked to by -main and include the text of -index in -main?

Any suggestions are welcome.
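With the default SQL source this cannot be expressed, but if the text is fed to Sphinx by a script (for example over xmlpipe), the naming convention above could be applied before indexing. A hypothetical sketch, assuming page texts have already been loaded into an array keyed by title:

```php
<?php
// Hypothetical helper: append the text of "X-info" and "X-index" to the
// text indexed for "X-main". $pages maps page titles to their text.
function mergeRelatedPages(array $pages, array $suffixes = array('-info', '-index')): array
{
    $merged = $pages;
    foreach ($pages as $title => $text) {
        if (substr($title, -5) !== '-main') {
            continue;
        }
        $base = substr($title, 0, -5);
        foreach ($suffixes as $suffix) {
            if (isset($pages[$base . $suffix])) {
                $merged[$title] .= "\n" . $pages[$base . $suffix];
            }
        }
    }
    return $merged;
}

print_r(mergeRelatedPages(array(
    'Page-main'  => 'Overview',
    'Page-info'  => 'Details',
    'Page-index' => 'Index terms',
)));
```

The subpages keep their own entries, so they can still be found directly.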

Result weighting

Hello,

I have a little problem with the sorting on the results page.

For example, if I search for mysql I get every entry, but the sorting is horrible.
I have several pages with mysql in the text and as part of the page_title.

I would like the page_title matches to appear ahead of the matches in the body text. I set

$wgSphinxSearch_weights = array('old_text'=>1, 'page_title'=>1000);

in SphinxSearch.php.

I also tried sql_attr_uint = page_title in my sphinx.conf, but it didn't help at all to get better results.

Settings in the sphinxapi.php:

 Matching mode is set to extended.
 Sort mode is set to relevance. Tried every type but this is the best so far.
 Group mode is set to SPH_GROUPBY_ATTR and ranker is the default one.

I would be very pleased if someone could help me.

Greetings,

Tom

sql_attr_uint is only for numeric values; it is used for filtering. After you adjust this and the weights, make sure to rebuild the index and restart the daemon, just in case. In our case, we always get the title matches first. Our weights are 'old_text'=>1, 'page_title'=>100, with extended match mode, and we leave sorting and grouping at their defaults (we do not set them at all). Svemir Brkic 13:16, 8 September 2009 (UTC)
Hi, thanks for the quick answer. I turned everything back to the defaults and rebuilt the index after the changes, but I still get, for example, a page titled Statistics in front of a page called MySQL5. I guess it has something to do with how often the word occurs in the body, but since I gave page_title a higher weight than old_text, I would expect it to be the other way around. Greetings, Tom.
What versions of Sphinx, the extension, etc. do you use? Svemir Brkic 03:12, 9 September 2009 (UTC)

Hi,

The Sphinx install is sphinx-0.9.8.1. The extension is SphinxSearch-0.6.1. PS: MediaWiki is version 1.15.0. Tom

Search result output

Strange output

I'm running the SphinxSearch on a standard LAMP setup (CentOS 5.3, Apache 2.2.14, MySQL 5.0.77, PHP 5.2.11) with MediaWiki 1.15.1. Searches keep returning well formatted and readable page titles, but garbage for the excerpts, like:

*  Title 1
��ߎ�&����)����^w+E����f�lқh�bl�k�:`�N�����l

* Title 2
U�A��0�E�=E/`�9@W�������

* Title 3
�XYs"��~�W�����1B�`Z��1kbu�j4�1C�Ew���k��ul�������cy�� ��

* Title 4
�Z�n��� �W4�� 9����vl�2����� YH���0���g� �ͰI�����



Has anyone else seen this or have an idea of where to look?

Do you have $wgCompressRevisions enabled? It looks like by default this extension sets Sphinx up to read text.old_text directly out of the database, which will not give useful results if you have any of our fancier storage features enabled (compressed revisions, batch compression, external storage, 'cur' table back-compat entries, legacy encoding back-compat entries, etc.). In that case you'd need to feed updates into Sphinx over an xmlpipe source or something similar... --brion 08:55, 6 November 2009 (UTC)
Excellent catch. I do have $wgCompressRevisions enabled. Because of that, I had tried altering the sql_query statements in sphinx.conf to account for it with "sql_query = SELECT page_id, page_title, page_namespace, old_id, UNCOMPRESS(old_text) AS old_text FROM mw_page, mw_revision, mw_text WHERE rev_id=page_latest AND old_id=rev_text_id AND page_touched>=DATE_FORMAT(CURDATE(), '%Y%m%d030000')", to no avail. What puzzles me is that the page titles come out fine, but the full-text excerpts don't. How would I go about feeding updates into Sphinx over an xmlpipe, since it's all in a MySQL DB?
Unfortunately, that is not the kind of compression Brion is talking about. There are some comments above about using xmlpipe, but I have not played with that myself. There are also other ways you could try to work around this. I would consider running the Manual:CompressOld.php script periodically instead of enabling $wgCompressRevisions. That way the current revision text will not be compressed and Sphinx will work just fine, but old revisions will still be compressed. Svemir Brkic 18:43, 5 December 2009 (UTC)
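The reason UNCOMPRESS() does not help: with $wgCompressRevisions, MediaWiki marks the row with "gzip" in text.old_flags and stores raw DEFLATE data (written with PHP's gzdeflate()), while MySQL's UNCOMPRESS() only reads the format produced by its own COMPRESS(). A feeder script (for example one producing xmlpipe output) could decode such rows as sketched below. The function name is hypothetical, and external storage, "object" rows and legacy encodings are deliberately not handled.

```php
<?php
// Sketch: decode text.old_text for rows written with $wgCompressRevisions.
function decodeOldText(string $oldText, string $oldFlags): string
{
    $flags = array_map('trim', explode(',', $oldFlags));
    if (in_array('gzip', $flags, true)) {
        $decoded = gzinflate($oldText);
        if ($decoded === false) {
            throw new RuntimeException('Could not inflate revision text');
        }
        return $decoded;
    }
    // Uncompressed rows are returned unchanged.
    return $oldText;
}

echo decodeOldText(gzdeflate("''Hello'' wiki"), 'utf-8,gzip'), PHP_EOL;
```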

Search Results within Interlinks counted twice

A rather philosophical question, but if one uses interlinks (normal or semantic), such as [[Reference Author::Philip R Cateora|Cateora, Philip R.]], then the term Cateora is counted twice on the search results screen.

System configuration: Sphinx 0.9.9, SphinxSearch 0.6.1, LightTPD 1.4.22, MediaWiki 1.15.1, PHP 5.2.9-1 (cgi-fcgi), MySQL 5.0.77-community-nt.

The default setup indexes the wiki source. You could use xmlpipe or some other approach to index parsed articles. That would solve this and some other issues. I may look into that in the future. Svemir Brkic 14:52, 5 January 2010 (UTC)
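As an illustration of the xmlpipe approach, here is a minimal xmlpipe2 writer using PHP's XMLWriter. It assumes an xmlpipe2 source in sphinx.conf (type = xmlpipe2, with xmlpipe_command running a script like this); the function name, the field names and the attribute name are placeholders, and a real feeder would fill page_text with parsed or markup-stripped text read from the wiki instead of the hard-coded sample.

```php
<?php
// Minimal sketch of an xmlpipe2 feed: a schema with two full-text fields and
// one integer attribute, followed by one <sphinx:document> per page.
function writeXmlpipe2(array $pages): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('sphinx:docset');

    $w->startElement('sphinx:schema');
    foreach (array('page_title', 'page_text') as $field) {
        $w->startElement('sphinx:field');
        $w->writeAttribute('name', $field);
        $w->endElement();
    }
    $w->startElement('sphinx:attr');
    $w->writeAttribute('name', 'page_namespace');
    $w->writeAttribute('type', 'int');
    $w->endElement();
    $w->endElement(); // sphinx:schema

    foreach ($pages as $p) {
        $w->startElement('sphinx:document');
        $w->writeAttribute('id', (string) $p['id']);
        $w->writeElement('page_title', $p['title']);
        $w->writeElement('page_text', $p['text']); // XMLWriter escapes < and &
        $w->writeElement('page_namespace', (string) $p['ns']);
        $w->endElement();
    }

    $w->endElement(); // sphinx:docset
    $w->endDocument();
    return $w->outputMemory();
}

echo writeXmlpipe2(array(
    array('id' => 1, 'title' => 'Main_Page', 'text' => 'Cateora, Philip R.', 'ns' => 0),
));
```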

Installing a different language (morphology) for sphinx

I'm trying to install a french stemmer: morphology=libstemmer_french as suggested here: http://www.sphinxsearch.com/forum/view.html?id=11#9507

Of course, I downloaded libstemmer_c.tgz and extracted it to libstemmer_c.

I added --with-libstemmer to ./configure. All seems to have gone well (got lots of verbose when make entered this directory):

Making all in libstemmer_c
make[1]: Entering directory `/home/inmdev/Downloads/sphinx-0.9.9/libstemmer_c'
gcc -DHAVE_CONFIG_H -I. -I../config -I/usr/local/include -I/usr/include/mysql -Wall -g
-D_FILE_OFFSET_BITS=64 -O3 -DNDEBUG -MT stem_ISO_8859_1_danish.o -MD -MP -MF
.deps/stem_ISO_8859_1_danish.Tpo -c -o stem_ISO_8859_1_danish.o `test -f
'src_c/stem_ISO_8859_1_danish.c' || echo './'`src_c/stem_ISO_8859_1_danish.c
mv -f .deps/stem_ISO_8859_1_danish.Tpo .deps/stem_ISO_8859_1_danish.Po
gcc -DHAVE_CONFIG_H -I. -I../config -I/usr/local/include -I/usr/include/mysql -Wall -g
-D_FILE_OFFSET_BITS=64 -O3 -DNDEBUG -MT stem_UTF_8_danish.o -MD -MP -MF
.deps/stem_UTF_8_danish.Tpo -c -o stem_UTF_8_danish.o `test -f
'src_c/stem_UTF_8_danish.c' || echo './'`src_c/stem_UTF_8_danish.c
mv -f .deps/stem_UTF_8_danish.Tpo .deps/stem_UTF_8_danish.Po
[...]

After all this I run ./indexer --all --config ../../SphinxSearch-0.6.1/sphinx.conf and get the following warning: WARNING: index 'wiki_main': invalid morphology option 'libstemmer_french' - IGNORED

According to libstemmer_c/libstemmer/modules.txt, the French module can be referred to as french, fr, fre, or fra: french UTF_8,ISO_8859_1 french,fr,fre,fra

Anyway, I don't know where to go from here. Fabricebaro 19:01, 6 January 2010 (UTC)

PS: I posted this matter here too: http://www.sphinxsearch.com/forum/view.html?id=19#22615

What did you put in your sphinx.conf instead of "morphology = stem_en"? I would try "morphology = stem_fr" --Svemir Brkic 04:46, 7 January 2010 (UTC)
Sorry for the omission (I removed this info by accident). I tried:
* morphology = libstemmer_french
* morphology = libstemmer_fr
* morphology = stem_fr
* morphology = french
It should be libstemmer_french according to http://www.sphinxsearch.com/forum/view.html?id=11

Activating "Did you mean" with a French wiki

I installed SphinxSearch on a French wiki. When I activate "Did you mean", I get the following error at the top of the search results page: Warning: pspell_new_config() [function.pspell-new-config]: PSPELL couldn't open the dictionary. reason: No word lists can be found for the language "fr". in /var/www/jungle.inm.com/w/extensions/SphinxSearch/SphinxSearch_spell.php on line 40. I also have some English in the wiki. Is it possible to use both languages for spelling suggestions? Fabricebaro 18:42, 13 January 2010 (UTC)

You need to find out how to install a French dictionary for pspell. If you actually have it installed, but the language code is not "fr" for some reason, you can edit that line in SphinxSearch_spell.php and change $wgUser->getDefaultOption('language') to whatever it should be, even if you have to hard-code the string. Spelling suggestions in both languages would require some additional work in the same file, if you can find someone who knows some PHP... Svemir Brkic 01:07, 17 February 2010 (UTC)

Automatically add a wildcard to searches?

The SphinxSearch extension is running nicely here on openSUSE with MW 1.15.1. I would love to be able to do the following:

Define a number x as a variable. If a search for "string" returns x results or fewer, the search is automatically changed to "string*" and those results are shown.

Alternatively: be able to switch on always adding a * to every search term, so that entering "string" searches for "string*".

Is this feasible?

One way to do this yourself is to find $cl->Query($search_term, "*"); in SphinxSearch_body.php and change it to $cl->Query($search_term . "*", "*"); (of course, it would be better to first check whether * is already there or whether there are multiple words, or to use preg_replace to do it conditionally, but I am kind of pressed for time right now...) Svemir Brkic 01:14, 17 February 2010 (UTC)
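A sketch of the conditional version, assuming prefix (or infix) indexing is enabled in sphinx.conf (e.g. enable_star = 1 together with min_prefix_len), without which "string*" will not match anything. The function name and $threshold are hypothetical. Only bare words get a "*"; words that already end in "*", negated words and prefixed terms such as incategory:Bar are left alone.

```php
<?php
// Hypothetical helper: append "*" to every bare word in a search string.
// Note: whitespace between terms is normalised to single spaces.
function addWildcards(string $searchTerm): string
{
    $tokens = preg_split('/\s+/', trim($searchTerm), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as &$token) {
        if (preg_match('/^\w+$/u', $token)) {
            $token .= '*';
        }
    }
    unset($token); // break the reference left by the foreach
    return implode(' ', $tokens);
}

// Possible use in SphinxSearch_body.php (untested), retrying only when the
// plain search found at most $threshold results:
//   $res = $cl->Query($search_term, "*");
//   if ($res && $res['total_found'] <= $threshold) {
//       $res = $cl->Query(addWildcards($search_term), "*");
//   }
echo addWildcards('router -old incategory:Network'), PHP_EOL;
```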

Cannot Find Wiki Main

I installed SphinxSearch, but when I run:

$> /path/to/sphinx/search --config /path/to/sphinx.conf "search string"

I get an error saying:

index 'wiki_main' : search error: failed to open /var/data/sphinx/wiki_main.sph: No such file or directory

The error is misleading; all you need to do is create the mentioned directory, and the script will take care of the rest.
Jahângir

Is there a publicly available wiki with SphinxSearch installed?

I am looking into both SphinxSearch and LuceneSearch, and I would like to try both before installing one. Is there a publicly available wiki with SphinxSearch installed on which I can try out some queries? I also need the following capabilities and wonder whether SphinxSearch supports them: 1) searching pages for specific tags (e.g., <ref>, <section begin=chapter1 />); 2) indexing a site with on the order of 150,000 articles; and 3) exporting a list of pages (identified by page name or by an internal identifier such as page_id) to an external file. Dnessett 18:07, 25 February 2010 (UTC)

NWE uses Sphinx. It has 107,378 total pages in the database, but fewer than 20,000 are actual articles. I see no reason it could not handle 150,000 or more, given enough disk space and memory; the Sphinx engine itself is used on some very big sites. You can search for tags, but special characters are ignored by default, so it will find only the alphanumeric part. Dealing properly with tags would require some customization, but should not be too hard if you know PHP. As for exporting, the extension now supports the MediaWiki Search API, so you could provide a link that opens the API request in XML or text format, for example (or point it to a script that formats the results further). Svemir Brkic 18:30, 25 February 2010 (UTC)
Thanks. Dnessett 18:37, 25 February 2010 (UTC)
A follow-up question. I have looked through the Sphinx and SphinxSearch extension documentation but could not find the answer to this. Suppose I want to index the same DB twice, once with the standard character set and again with an enhanced character set that includes characters used in wikitext markup (e.g., "{", "}", "<", ">"). Is this possible? Would I need to run two instances of the Sphinx daemon, each with its own configuration file and target directories to keep their activity separate? Or can the same daemon serve two indexes? Dnessett 20:59, 25 February 2010 (UTC)
A single daemon can have multiple data sources specified in the sphinx.conf file. We actually do that by default, so that we can have a main + delta index. You may add more, and the extension will search all of them by default. More advanced configurations are also possible. Svemir Brkic 00:33, 10 September 2011 (UTC)

Create the page "I want to create a new page" on this wiki!

Can anyone tell me whether SphinxSearch can provide a link to create a new page if it doesn't find an existing page in the search results? I liked this feature in the default search engine. Thanks! --WilkBoy 14:43, 31 March 2010 (UTC)

If you press Enter or click Go, a red "create this page" link will appear. If you click Search, it will not. Svemir Brkic 18:55, 31 March 2010 (UTC)

Hi, this doesn't work for me: I get <noexactmatch> at the top of the page, with no red link. I would like to have the create-new-page link on the results page. Thanks! Selspiero

This has been fixed in versions 0.8 and above. Svemir Brkic 00:34, 10 September 2011 (UTC)

Semantic Wiki and Sphinx[edit]

This topic is surely not on your urgent list, but semantic abilities for a Wiki become more and more import and we are using it extensively throughout our Wiki to build ontologies and give pages characteristics other than the standard [[Category:...]]. Do you have any plans to give some thoughts on how to integrate select statements for the Semantic Wiki Extension. We assume that pages with specific properties and keywords should be ranked higher in the search hierarchy than standard pages without those special classification. It could be a nice feature in comparison with other MW search engines.

Thanks, James

I am in the process of re-engineering some parts of the extension. I am making it use more of the standard MW search code (which has improved significantly since this extension was first developed), but I am also looking for more ways to make it better than the default. I will look into Semantic Wiki and how to integrate with it when available. Svemir Brkic 20:32, 17 April 2010 (UTC)

Compatible with $wgEnableMWSuggest?

Is this extension compatible with the $wgEnableMWSuggest = true; setting? I can't get that to work. If not, is it a planned enhancement? Thanks! Gomeztogo 23:06, 30 April 2010 (UTC)

The latest version (0.8.2, svn r9671) will use Sphinx for as-you-type suggestions if you have both $wgEnableMWSuggest and $wgEnableSphinxPrefixSearch set to true. Svemir Brkic 02:57, 10 September 2011 (UTC)

Category filter

I just installed SphinxSearch on MW 1.17a and it works fine, but there is no category filter. Is that an option one could enable somewhere? Or is it currently blank? Or am I just experiencing a bug?

The latest version (0.8.1+) replaces the old "hierarchical categories" approach with a simpler, yet more flexible "incategory" prefix, similar to English Wikipedia search. This feature is still under development, but you should already be able to search like this:
foo incategory:Bar
Make sure the "sql_attr_multi = uint category..." line is not commented out in your sphinx.conf file. Category search is case-sensitive, and you have to use underscores instead of spaces, at least for now. Support for excluding categories and including category children is coming soon. Svemir Brkic 03:04, 10 September 2011 (UTC)
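The query shape described above can also be produced programmatically, for example when building search links from a skin or script. Here is a minimal Python sketch. The helper name `incategory_query` is hypothetical, and it only mirrors the rules stated in this reply: capitalization is kept as-is, and spaces in the category name become underscores.

```python
def incategory_query(terms: str, category: str) -> str:
    """Build a search string such as 'foo incategory:Bar'.

    Hypothetical helper: incategory matching is case-sensitive and expects
    underscores instead of spaces, so only spaces are replaced and the
    capitalization of the category name is left untouched.
    """
    category_token = category.strip().replace(" ", "_")
    return f"{terms.strip()} incategory:{category_token}"


if __name__ == "__main__":
    print(incategory_query("foo", "Bar"))
    print(incategory_query("Niels", "Danish physicists"))
```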
I'm using MW 1.17 and extension 0.8.2, which works great. But what's the best way to search within one category using a search box? I want to add a search field on one page that searches only content in that category. Is that possible? Thanks. 62.231.49.30 15:31, 16 September 2011 (UTC)
That is something you would have to do with hooks, in the skin, and/or with custom JavaScript. Svemir Brkic 16:34, 16 September 2011 (UTC)
You might have a look at Adding search inputbox to a page --MWJames 16:47, 16 September 2011 (UTC)
Thanks to you both. OK, I guess I was hoping that there was a quicker and more reliable way than custom JavaScript, but it's good to know that it is an option. Yes, I've been using InputBox already, but the trick is defining the categories that are searched. Using namespaces seemed an option, but of course pages in other namespaces are excluded from the main wiki search. 62.231.49.30 10:53, 19 September 2011 (UTC)
Found a solution using the namespace approach above. It means manually creating namespaces and adding the namespace prefix to each article, but at least it gives some control. You also need to make sure that the articles are included in the main wiki search, which Manual:$wgNamespacesToBeSearchedDefault allows. 62.231.49.30 09:57, 20 September 2011 (UTC)

I have installed the latest version (0.8.5) on MW 1.18 and uncommented the sql_attr_multi line in the configuration file, but the incategory parameter didn't seem to be working. The only result I got was:

There were no results matching the query.
Create the page "Foo incategory:Bar" on this wiki!

Did I miss something here? --GnuDoyng 05:39, 26 December 2011 (UTC)

The one thing that comes to mind is that after you activate the incategory statement in sphinx.conf, you have to rebuild the index completely (not an incremental update, but a full index rebuild). --MWJames 10:59, 26 December 2011 (UTC)
Thank you. I'm still hoping to create a category filter like the one used in New World Encyclopedia. Can someone shed some light on this? Thank you so much! --GnuDoyng 00:49, 27 December 2011 (UTC)
When you look at New World Encyclopedia, you see a Special:Search page that was developed for SphinxSearch 0.7, but since 0.8+ SphinxSearch uses MW's internal Special:Search page to display the search results. If you know PHP, then in MW 1.18 you can use some of the new SpecialSearch.php hooks to create a new search profile that includes those category filters as HTML elements. For further details, see the SpecialSearchProfileForm hook and Extension:Translate, which uses those hooks to enhance the search profile.

Feature request / Using the search function in templates

Maybe a bit far-fetched, but maybe you could provide a search function such as {#ssearch } that can be called in a template and returns the results as a ul or ol list, where the number of search results is limited by a parameter passed to the function. --MWJames 05:33, 24 July 2010 (UTC)

Where are these hooks defined?

In sphinxsearch_body.php I found this code:

wfRunHooks( 'SphinxSearchFilterSearchableNamespaces', array( &$namespaces ) );
wfRunHooks( 'SphinxSearchGetSearchableCategories', array( &$categories ) );
wfRunHooks( 'SphinxSearchGetNearMatch', array( &$term, &$t ) );
wfRunHooks( 'SphinxSearchBeforeResults', array(
wfRunHooks( 'SphinxSearchBeforeQuery', array( &$this->search_term, &$cl ) );
wfRunHooks( 'SphinxSearchAfterResults', array( $term, $this->page ) );

I looked through every SphinxSearch file but still cannot find functions such as "SphinxSearchFilterSearchableNamespaces" or "SphinxSearchBeforeQuery". I want to know how they work, but I can't find the function definitions.

Can anyone help me? Many thanks! 121.8.153.6 09:39, 30 July 2010 (UTC)

These are all optional hooks, so they are not defined by default. Some of them will be deprecated soon, as standard MW search-related hooks will be available in this extension as well. Once that is cleaned up, I will document any sphinx-specific hooks that remain. For now, just see where they are called and what arguments they receive. That should give you an idea about what you may be able to do with them. Svemir Brkic 15:38, 19 September 2010 (UTC)

Can't get negation to work, always finds all words

I'm struggling to get the negate search option to work, i.e. searching for "word1 -word2" still shows hits for both words, instead of those with only word1. The search CLI tool works fine, though.

I have already tried the different search modes ($wgSphinxSearch_mode: SPH_MATCH_EXTENDED, SPH_MATCH_EXTENDED2, SPH_MATCH_BOOLEAN). SphinxSearch extension 0.7.0 and Sphinx 1.10-beta (r2420) (but same behavior with 0.9.9-release (r2117)), MediaWiki 1.12.0

Any help would be greatly appreciated! --Mmaddin 08:18, 5 August 2010 (UTC)

There is definitely something broken now; I also can't use quotes for a phrase match ("word1 word2"). It always finds an OR match, or, if I choose "match all words" via the radio button, an AND match. I need to make AND match the default, but there is no option for that, except setting $wgSphinxSearch_mode to SPH_MATCH_ALL, and then there is no radio button anymore (for the rare cases where an OR match is useful).
This extension is not far from being a really great alternative to the built-in search, but right now there are a number of small but annoying (maybe cosmetic, but still important) bugs that really need to be fixed. --Patrick Nagel 02:03, 31 August 2010 (UTC)
Shortly after posting this, I found $wgSphinxMatchAll in the code (I can't find documentation for it anywhere), which you can set to true in LocalSettings.php. Then the radio buttons are there, but "match all words" is selected by default. --Patrick Nagel 02:06, 31 August 2010 (UTC)
All these issues are fixed in the current SVN version (0.8+). Svemir Brkic 03:07, 10 September 2011 (UTC)

Successfully controlling search results according to user rights

I have an extension named rarc, which creates a category with the same name as a user group. By adding [[category:xxx]] to a page, only users who belong to the xxx user group can see that page. At first, SphinxSearch did not respect these privilege rules and still displayed such pages in the search results. On the recommendation of the SphinxSearch author, I built 5 indexes to implement this:

wiki_main -- includes everything except today's new articles
wiki_incremental -- reindexed automatically every 2 minutes; contains only today's new articles and is merged into wiki_main at night
wiki_small_main -- like wiki_main, but excludes the private articles
wiki_small_incremental -- like wiki_incremental, but excludes the private articles
wiki_private -- all private articles of the wiki (because there are only a few, no incremental+merge mechanism is needed)

I modified the code: if a user belongs to the xxx user group, the search uses (wiki_small_main wiki_small_incremental wiki_private); for a normal user, the search list is (wiki_small_main wiki_small_incremental).

wiki_main is used to display the brief content of each article, looked up by page_id.
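The index selection described above comes down to choosing a different index list for each request. Here is a minimal Python sketch of that decision. It assumes the index names from this post and uses "xxx" as a placeholder for the protected user group. The function name is hypothetical, and the actual change was made in the extension's PHP code.

```python
# Index names taken from the post; "xxx" stands in for the protected group.
PUBLIC_INDEXES = ("wiki_small_main", "wiki_small_incremental")
PRIVATE_INDEXES = ("wiki_private",)


def indexes_for_user(user_groups: set[str], private_group: str = "xxx") -> str:
    """Return the space-separated index list to search for this user.

    Members of the protected group also search wiki_private; everyone
    else only sees the "small" indexes that exclude private articles.
    wiki_main is never searched here; it only supplies excerpts by page_id.
    """
    indexes = list(PUBLIC_INDEXES)
    if private_group in user_groups:
        indexes.extend(PRIVATE_INDEXES)
    return " ".join(indexes)


if __name__ == "__main__":
    print(indexes_for_user({"user"}))
    print(indexes_for_user({"user", "xxx"}))
```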

Can anyone tell me where these hooks are defined, such as:

wfRunHooks( 'SphinxSearchFilterSearchableNamespaces', array( &$namespaces ) );
wfRunHooks( 'SphinxSearchGetSearchableCategories', array( &$categories ) );
wfRunHooks( 'SphinxSearchGetNearMatch', array( &$term, &$t ) );
wfRunHooks( 'SphinxSearchBeforeResults', array(
wfRunHooks( 'SphinxSearchBeforeQuery', array( &$this->search_term, &$cl ) );
wfRunHooks( 'SphinxSearchAfterResults', array( $term, $this->page ) );

I used PowerGREP to search the /extensions/sphinxsearch and /sphinx directories, and neither contains any of these, such as "SphinxSearchFilterSearchableNamespaces" or "SphinxSearchGetSearchableCategories". 121.8.153.6 08:23, 26 August 2010 (UTC)

Did you mean? pigtailed pigtailed pigtailed Aggregate ....No.

For some reason, when I search for part of a title, the search does not find the article. Instead, it performs a full-text search that omits article titles and asks if I meant "pigtailed Keyword Searched". The same thing happens no matter what parameters I set (e.g. match titles only). --Siadsuit 18:37, 28 September 2010 (UTC)

Possible to install without a compiler?

Hi, my hosting plan has the compiler disabled. Can I install Sphinx without using a compiler? Is it possible for me to compile it locally, and just upload and run it? --Wmama 05:06, 29 September 2010 (UTC)

Templates / Transclusion: returns only the template, not the pages using it

If I have a template called Cereals with the contents "Rice, Wheat, Rye", use that template in an article, and then search for "Wheat", I only get the template, not the page using the template. I guess this makes sense, since MW doesn't store the rendered content; it expands the templates on the fly when the page is rendered.

I was thinking of having Sphinx index all the rendered pages instead. Any ideas as to how? --72.148.136.13 23:01, 12 October 2010 (UTC)

ARGH! Yes, this is going to be the only way. I'm going to have to use xmlpipe and write a shell script (or php?) to get every page, wrap it inside of xmlpipe tags and dump it into the indexer. Argh, argh, argh. The good news is I get to spend the next few days at work hacking php :) Or switch to Lucene, but everyone says Lucene doesn't work on Windows, and my box is a WAMP.
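The xmlpipe route described above needs a script that emits pages in Sphinx's xmlpipe2 format. Here is a minimal Python sketch. It assumes you already have each page's rendered HTML (for example from MediaWiki's action=render output), and fetching the pages is left out. The field and attribute names (page_title, content, page_namespace) are assumptions chosen to resemble the default SQL source.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class _TextExtractor(HTMLParser):
    """Collect the text content of rendered HTML, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Join chunks with spaces so block elements don't glue words together,
    # then collapse runs of whitespace.
    return " ".join(" ".join(extractor.parts).split())


def xmlpipe2_docset(pages) -> str:
    """pages: iterable of (page_id, page_title, namespace, rendered_html)."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="page_title"/>',
        '<sphinx:field name="content"/>',
        '<sphinx:attr name="page_namespace" type="int"/>',
        "</sphinx:schema>",
    ]
    for page_id, title, namespace, html in pages:
        lines.append(f'<sphinx:document id="{int(page_id)}">')
        lines.append(f"<page_title>{escape(title.replace('_', ' '))}</page_title>")
        lines.append(f"<content>{escape(html_to_text(html))}</content>")
        lines.append(f"<page_namespace>{int(namespace)}</page_namespace>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [(42, "Breakfast_menu", 0, "<p>Rice, <b>Wheat</b> &amp; Rye</p>")]
    print(xmlpipe2_docset(sample))
```

A sphinx.conf source of type xmlpipe2 could then run this script through xmlpipe_command. Because the rendered HTML already contains the expanded template text, a search for "Wheat" would hit the article that transcludes Cereals.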

Query failed: no enabled local indexes to search

Hi,

Please help! SphinxSearch problem: "Query failed: no enabled local indexes to search". I am able to run the indexer in the console and do a search in the console, and I can see the results.

I have started the daemon and it is running. When I use Special:SphinxSearch, it says "Could not instantiate Sphinx client." And then, when I search for a word, it says "Query failed: no enabled local indexes to search".

If anyone has an idea what mistake I am making, please let me know.

Some details:

  • MediaWiki 1.15.3
  • XAMPP 1.7.2
  • Windows XP SP2
  • Sphinx 0.9.9 (win32)
  • SphinxSearch 0.7.1

Thanks in advance.

--Ramesh

I have the same issue; any help out there? Using sphinx-0.9.9, MediaWiki 1.16.

I appear to have resolved this by moving XAMPP into a folder other than Program Files. --Nate

--193.16.163.244 14:14, 26 October 2010 (UTC)

Problem with SphinxMWSearch

Hi, I am trying to use SphinxSearch (Snapshot 80923) with MW 1.16 on a SUSE Linux Enterprise Server 10. I included SphinxSearch the following way:

$wgSearchType = 'SphinxMWSearch';
require_once( "$IP/extensions/SphinxSearch/SphinxSearch.php" );

But I get the following error:

PHP Fatal error:  Can not call constructor in /extensions/SphinxSearch/SphinxMWSearch.php on line 22

Can anyone help me? Am I missing another extension?

--Jlemley 23:27, 3 June 2011 (UTC)

Fixed in 0.8. Svemir Brkic 03:23, 10 September 2011 (UTC)
This looks like it was fixed after the 1.16 changes, so the fix is not usable with older MediaWiki installs.
commit 7f4aaf11ba845131837307811f5ecbc38dd236e6
Author: Svemir Brkic <svemir@users.mediawiki.org>
Date:   Wed Sep 7 03:06:44 2011 +0000

    emulate Wikipedia search where ~ prefix prevents automatic redirect
    use strict comparison for suggest mode (some folks may still have it set to true)
    remove methods that need not be overwritten
    use $this->db consistently (with backward-compatibility check for MW prior to r77109)

Fatal Error with SphinxMWSearch

Hey, I'm also having a fatal error. I'm using MW 1.16 with the latest Sphinx (downloaded on 10/12/11).

 Fatal error: Can not call constructor in /var/www/mediawiki-1.16.2/extensions/SphinxSearch/SphinxMWSearch.php on line 23

Line 23 is (if I counted right):

 parent::__construct( $db );

I don't know if it matters, but the download link said:

 A snapshot of version r77506 of the SphinxSearch extension for MediaWiki 1.17.x has been created. Your download should start automatically in 5 seconds. 

I'm using MediaWiki 1.16, but the wiki page says it's compatible. Thanks.

Search weights problem

I'm running SphinxSearch 0.9.9 on MediaWiki 1.16. It works fine (at least it looks like it does). But when I change my LocalSettings to "$wgSphinxSearch_weights = SPH_MATCH_PHRASE;", for example, the search stops working (in fact, if I define any value for $wgSphinxSearch_weights, the search stops working). - Miguel (20-02-2011)

Proper values for that variable are something like:
$wgSphinxSearch_weights = array(
	'old_text' => 1,
	'page_title' => 100
);
Perhaps you wanted to change $wgSphinxSearch_mode instead? Note that in the latest version you can do a phrase search by enclosing the words in double quotes. Svemir Brkic 03:26, 10 September 2011 (UTC)

Call to undefined method SphinxSearch::transformSearchTerm()

MediaWiki 1.16.4, SphinxSearch extension HEAD as of 11/05/2011. The SphinxSearch special page works, but trying to use the search box on the left-hand side produces this error:

 Fatal error: Call to undefined method SphinxSearch::transformSearchTerm() in 
 C:\xampp\htdocs\includes\specials\SpecialSearch.php on line 127

Fixed: this error occurs if you declare $wgSearchType after the require_once line.

Sphinx 2.0.1 and compat_sphinxql_magics

While testing 2.0.1, an issue appeared when compat_sphinxql_magics [1] is set to 0, while compat_sphinxql_magics = 1 still works. Changes in Sphinx 2.0.1 seem to create problems with the current SphinxSearch (Version 0.7.2).

Testing environment: MediaWiki 1.16.1 (r80998), PHP 5.2.13 (apache2handler), MySQL 5.1.44-community, SphinxSearch (Version 0.7.2). --MWJames 09:45, 18 May 2011 (UTC)

That would be an issue in the sphinx.conf file itself. Once version 2 becomes more stable, we will update the documentation and the default sphinx.conf. Until then, you will need to set compat_sphinxql_magics the way you want it. Svemir Brkic 03:29, 10 September 2011 (UTC)

SphinxMWSearch 0.8+

Search display behaviour different in MW 1.18

We tested SphinxMWSearch 0.8 in 1.18alpha (r96396) and noticed that the display behaviour has changed. This might be related to the fact that the search string now uses the &profile=all&redirs=1 parameters instead of listing every single namespace as in 1.17. The 1.18 search URL looks like title=Special:Search&search=<...>&fulltext=Search&profile=all&redirs=1. Also see Manual:Hooks/SpecialSearchSetupEngine and Manual:Hooks/SpecialSearchProfileForm --MWJames 08:26, 7 September 2011 (UTC)

I am not sure what to make of this comment. Is the change in display behavior a problem that needs to be fixed? What should I be looking for? Svemir Brkic 03:02, 9 September 2011 (UTC)

I am facing the same issue with URL redirection. Searching from the home page redirects to index.php?title=Special%3ASearch&search=test but shows no results; if I search again from the redirected page, it goes to ?title=Special%3ASearch&redirs=0&search=test&fulltext=Search&ns0=1 and gives results. What could be the reason? I have checked Sphinx and it is configured correctly. User:Jyotir Bhandari

With the official release of 1.18, a Bugzilla report has been created: bugzilla 32970

Sorting prefix

Is there a way to sort by time created where, instead of changing an option in LocalSettings (see Extension talk:SphinxSearch/archive#Having trouble with the Sort By command), a predefined prefix like @tcd (time created descending) or @tca (time created ascending) can be used to change the result display dynamically? Options that might be worth considering are:

  • @tcd (time created descending)
  • @tca (time created ascending)
  • @user (last revision user)
  • @lang (language)
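Any such prefix would have to be stripped from the query before it reaches Sphinx, because Sphinx's extended query syntax already uses @ for field limits (@fieldname). Here is a minimal Python sketch of that parsing step. The prefixes are only the proposal above, not an existing SphinxSearch option, and the function name is hypothetical. Only the two sort prefixes are handled. @user and @lang would also need a value, so they are left out.

```python
from typing import Optional, Tuple

# Proposed prefixes from the feature request (not an existing option).
SORT_PREFIXES = {
    "@tcd": "time created descending",
    "@tca": "time created ascending",
}


def split_sort_prefix(query: str) -> Tuple[str, Optional[str]]:
    """Remove the first recognized sort prefix from the query.

    Returns the remaining search terms and the prefix that was found
    (or None). Other @words, such as Sphinx field operators, are kept.
    """
    sort_prefix = None
    terms = []
    for word in query.split():
        if sort_prefix is None and word.lower() in SORT_PREFIXES:
            sort_prefix = word.lower()
        else:
            terms.append(word)
    return " ".join(terms), sort_prefix


if __name__ == "__main__":
    print(split_sort_prefix("@tcd sphinx config"))
    print(split_sort_prefix("@page_title sphinx"))
```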

Add support for custom search targets

Well, this sounds more complicated now than it did in my head...

Problem: We are using Sphinx to search through our wiki. This works perfectly.

But we are also using MetaDok, a document management system, which can be searched too. And now we have the following request:

After typing something into the search box and pressing Search, not only the results from Sphinx should be displayed, but also the results of the same query in MetaDok, a different system.

I have been told that we cannot build an index out of MetaDok, because we would have to use Sphinx's XML source, and there are about 30 GB of data to index...

So, is there a way to catch the search query and send it not only to SphinxSearch but also to a custom search engine?

Many thanks in advance!

MW provides plenty of hooks that would let you insert your own content etc. You probably do not want to do this in a way that relies on SphinxSearch. If you use an existing general hook, you can later use another search engine on the wiki side and things will still work. Svemir Brkic 13:03, 27 September 2011 (UTC)
So you are suggesting writing a hook that fires when the search has been started, so that the normal Sphinx search will be performed but my custom search will also send its results to the results page? Sounds like a pretty good idea, many thanks :-) --Dominik Sigmund 08:16, 28 September 2011 (UTC)
You could also do it after the search has been completed, or you could use a more general hook that will make sure you are on a search page and only do the extra search in that case. Svemir Brkic 12:58, 28 September 2011 (UTC)
Thank you for the information. Which hook would you suggest? --Dominik Sigmund 12:36, 30 September 2011 (UTC)

Sphinx 2.0.1: how do I know this is working?

I have installed:

  • MediaWiki 1.17.0
  • Sphinx 2.0.1-beta (Apr 2011)
  • SphinxSearch extension (Version 0.8.5)

How do I know if SphinxSearch is working? With the previous version I would see the following on the results page, but I don't see it any more.

AT THE TOP OF PAGE Search wiki using Sphinx From mediawiki Jump to: navigation, search <noexactmatch> Displaying 0—0 of 0 matches for query "tt" retrieved in 0.064 sec with these stats:

  * "tt" found 13 times in 6 documents 

Above numbers may include documents not listed due to search options.

AT THE BOTTOM OF PAGE 'Powered by Sphinx'

If you set the search type to SphinxMWSearch and you got no error, it is probably fine. However, you can also look at the page source. There should be an HTML comment saying "Powered by http://www.sphinxsearch.com" Svemir Brkic 15:33, 28 September 2011 (UTC)
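That page-source check can be scripted. Here is a minimal Python sketch. It assumes the comment contains "sphinxsearch.com" as quoted above, although the exact wording may differ between versions. The index.php URL you pass in is a placeholder for your own wiki.

```python
import re
import urllib.parse
import urllib.request


def sphinx_marker_present(html: str) -> bool:
    """True if any HTML comment in the page mentions sphinxsearch.com."""
    comments = re.findall(r"<!--(.*?)-->", html, flags=re.DOTALL)
    return any("sphinxsearch.com" in comment.lower() for comment in comments)


def fetch_search_page(index_php_url: str, term: str) -> str:
    """Download the Special:Search results page for a full-text search."""
    query = urllib.parse.urlencode(
        {"title": "Special:Search", "search": term, "fulltext": "Search"}
    )
    with urllib.request.urlopen(f"{index_php_url}?{query}") as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, sphinx_marker_present(fetch_search_page("http://localhost/wiki/index.php", "tt")) should return True on a wiki where SphinxMWSearch is active. Only HTML comments are checked, so a visible "Powered by" text in the page body does not count.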

Enchant Issues

I have not been able to successfully use Enchant. I'm using the trunk version of SphinxSearch (97051), Sphinx 2.0.1, and PHP 5.3.

On Windows, PHP crashes with a nasty C++ error, which causes the entire site to freeze until the error is cleared. I could not find a solution to this (appears to be a bug in Enchant), so I moved over to Linux. There, searches produce "page cannot be displayed" in IE when enchant is enabled in LocalSettings.php ($wgSphinxSuggestMode = 'enchant';). Through Chrome, the error appears as: Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.

If I delete the sphinx.dic file, no error occurs. If I comment out this line in SphinxMWSearch.php, no error occurs:

enchant_broker_set_dict_path($broker, ENCHANT_MYSPELL, dirname( __FILE__ ));

But, of course, nothing works, either.

I switched to aspell, but the dictionary that is created by SphinxSearch_setup.php won't work, even after adding the header. It seems that aspell can't handle numbers. So I removed about 8000 rows from the file that had numbers mixed in, and aspell works! However, it throws this error:

Notice: Uninitialized string offset: 0 in /htdocs/wiki/extensions/SphinxSearch/SphinxMWSearch.php on line 477

It appears that after finding the &, the loop was running two more times with a blank $value. So I added

if ($value) {

before line 477. Aspell now works.

Am I the only one unlucky enough to have these issues with enchant?

SphinxSearch 0.8 upgraded but old version still showing in MediaWiki version page

I followed the installation procedure to update from SphinxSearch 0.7.2 to 0.8. Sphinx was updated to 2.0.2beta and MW to 1.18.0. Special:Version still shows SphinxSearch (Version 0.7.2).

How can I check if the new version of SphinxSearch works?

The first thing to do is to look at Special:Version and search for SphinxSearch. With SphinxSearch 0.8+ you should only be able to use MediaWiki's own Special:Search page. The SphinxSearch special page has been removed, so if it is still accessible, you are not using SphinxSearch 0.8+. --MWJames 21:14, 7 December 2011 (UTC)

Special:Version shows "SphinxSearch 0.7.2". MediaWiki's own Special:Search page reports using Sphinx. I tried Special:SphinxSearch, but this special page is not known to MW.

Just want to chime in: I have a brand new sphinx install on our MW 1.18 and it also reports 0.7.2 for the version. But more curious than that is that I'm not sure it's actually being used, like this guy mentioned: Extension talk:SphinxSearch#Sphinx 2.0.1 How do i know this is working.3F... I can't find "sphinx" in the html source of the search page at all. Any thoughts? --Chris_Sanny

In both cases, the cause is that the installation instructions advise downloading from the SVN trunk, but if you use the standard download distribution instead, you will get the 0.7.2 legacy version, which will not work with MW 1.17/1.18. At the time of the SphinxSearch 0.8 release, MW 1.17 had not been officially released, which is why the 0.7.2 legacy version is still available. --MWJames 06:43, 19 December 2011 (UTC)

Suggested change to sql_query for better page title indexing

I found that leaving the underscores in the wiki page titles hindered accurate searching of them.

Here's a suggested change to the sql_query to remove the underscores.

Before

sql_query = SELECT page_id, page_title, page_namespace, page_is_redirect, old_id, old_text FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id

After

sql_query = SELECT page_id, REPLACE( page_title, '_', ' ' ) as page_title, page_namespace, page_is_redirect, old_id, old_text FROM page, revision, text WHERE rev_id=page_latest AND old_id=rev_text_id
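A simplified tokenizer shows why the underscores hurt. This minimal Python sketch assumes the index's charset_table treats _ as a word character, as Sphinx's default charset_table does. Under that assumption, "Niels_Bohr" is indexed as a single keyword and a search for "Bohr" will not match it. The tokenizer is an ASCII-only approximation, not Sphinx's actual implementation.

```python
import re


def tokenize(text: str, underscore_is_word_char: bool = True) -> list[str]:
    """Rough approximation of how a charset_table splits text into keywords."""
    word = r"[0-9a-z_]+" if underscore_is_word_char else r"[0-9a-z]+"
    return re.findall(word, text.lower())


if __name__ == "__main__":
    raw_title = "Niels_Bohr"
    print(tokenize(raw_title))                    # ['niels_bohr'] (one keyword)
    print(tokenize(raw_title.replace("_", " ")))  # ['niels', 'bohr'] (after REPLACE)
```

Replacing underscores in the SQL keeps the rest of the charset_table untouched. Removing _ from the charset_table instead would also split underscores in the page text.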

incategory vs. incategorytree

Due to the way we use categories in hewiki, the "incategory" capability is much less useful than it could/should have been. The problem is that we put pages in subcategories, and when we do, we remove them from the top-level category. This goes pretty deep, and as a result it is not very useful to search in one particular category, because the "leaves" of this category tree contain a relatively small number of articles.

To demonstrate: "Niels Bohr" is in the category "Danish Physicists", but not in its supercategory "Physicists", so the search

Niels incategory:Physicists


does not show any article about a physicist that contains the word "Niels". (The same applies in enwiki, except that there one should replace "Physicists" (no such category) with the supercategory "Physicists by nationality".)

So what I'm asking is either to change the logic, so that "incategory" really means "in category tree", or alternatively to add one more magic word meaning "in category tree", e.g. "incategorytree:".

peace, קיפודנחש 22:46, 27 December 2011 (UTC)

First, you might want to specify which Sphinx version you are referring to; otherwise, giving any advice would be difficult. Second, this interface is maintained on a voluntary basis, so any request might take a while to be implemented, but you are always free to provide patches that improve the extension.
As for SphinxSearch 0.8+, if you want a particular functionality implemented for your specific search scenario, you might consider using the new SpecialSearch.php hooks in MW 1.18. They let you create a new search profile where you can add additional search options and logic to be passed to the Sphinx search engine. Extension:Translate shows how such a search profile can be developed (it does not use SphinxSearch, but it shows how to implement such a search profile). --MWJames 00:08, 28 December 2011 (UTC)
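The "in category tree" semantics requested above amount to a transitive walk over subcategory links. Here is a minimal Python sketch. It assumes you have already loaded a mapping from each category to its direct subcategories, for example from MediaWiki's categorylinks table, where subcategories are pages in the Category namespace. The function names are hypothetical, and passing the expanded set to Sphinx as a category filter is not shown.

```python
from collections import deque


def categories_in_tree(root: str, subcategories: dict[str, list[str]]) -> set[str]:
    """Return root plus every category reachable through subcategory links."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcategories.get(current, []):
            if child not in seen:  # category graphs may contain cycles
                seen.add(child)
                queue.append(child)
    return seen


def page_in_category_tree(page_categories: set[str], tree: set[str]) -> bool:
    """A page matches if any of its categories lies inside the tree."""
    return not tree.isdisjoint(page_categories)


if __name__ == "__main__":
    subcats = {
        "Physicists": ["Physicists_by_nationality"],
        "Physicists_by_nationality": ["Danish_physicists"],
    }
    tree = categories_in_tree("Physicists", subcats)
    print(page_in_category_tree({"Danish_physicists"}, tree))  # True
```

For deep trees the expanded set can get large, so precomputing it per category or capping the depth may be preferable.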

Fatal error: Class 'SphinxSearch' not found error after upgrade

I upgraded to SphinxSearch Version 0.8.5 along with Sphinx 2.0.3. Now I receive an error: "Fatal error: Class 'SphinxSearch' not found in /usr/share/mediawiki/includes/search/SearchEngine.php on line 439" while trying to search.

Without knowing your setup, I don't think you have upgraded to 0.8.5, since the only class registered in 0.8.5 is SphinxMWSearch, and it seems you are still using the 0.7 version, where a class SphinxSearch is present. Please make sure you are using [2] for the download of 0.8.5 and follow the steps described on the extension page. --MWJames 10:50, 10 January 2012 (UTC)
Check that $wgSearchType = 'SphinxMWSearch'; is set. The error occurs if you forget to change $wgSearchType during the update.

Search wiki using Sphinx

We installed the new Sphinx extension, but how do I know if I am using Sphinx and not the MediaWiki internal search?

In the previous version, the search page had the title "Search wiki using Sphinx".

In the new version (0.8.5) I do not see anything special.

Cheers, solab

Please see Step 10 Show Sphinx Search Support --MWJames 23:57, 31 January 2012 (UTC)

Whats up with this search?

Hi there, for some reason I updated all components of our intranet. Additionally I wanted to improve the search of MW so I forced the users to use Vector skin, enabled its simple Search ($wgVectorUseSimpleSearch) and the $wgEnableMWSuggest. Now we're using the following parts:

  • MW 1.18.1
  • PHP 5.3.9 (will be patched next week)
  • MySQL 5.5.20
  • IIS 6.0
  • Windows Server 2003 R2, Standard, DE, x86

It was still a bad search, so I researched a little and settled on Sphinx. After installing and configuring all the components and settings, I wonder whether it really makes a difference.

The version of the search extension is 0.7.2 and Sphinx itself is 2.0.3, but where the hell is the difference?

  • I already checked the console: it displays what I searched for, starting with the timestamp and including query time and other information. This seems to confirm that the extension is working.
  • I checked the 'implement the logo' variant: the 'powered by sphinx' logo is displayed on every page. So it seems to be running.

But what the heck? When I search for DataLi nothing is found but there is a page called DataLink. Do I need a different search engine? Whats the common definition of search? -.- -- Norman S 13:11, 8 February 2012 (UTC)

First, language like "Whats up with this search..." or "what the heck?" will not make support more likely; as you might have noticed, all the work done here is on a voluntary basis. Second, it has been prominently noted that you should use SphinxSearch 0.8+, but you reported using 0.7.2, so you might want to update your version in order to work with MW 1.18 and Sphinx 2.0.3. Third, this extension only handles the communication between MediaWiki and Sphinx. Any specifics related to search features (character sets, the ability to search with *, category search, minimum length of search terms, etc.) are handled in Sphinx itself (see the sphinx.conf file). Fourth, Sphinx requires you to run the indexing process on a regular basis to ensure that the MediaWiki data and the keywords stored in Sphinx are in sync (see Step 6 Incremental Updates). A final note: if you don't think SphinxSearch works for your environment, feel free to refer to the Comparison matrix, where you can find other search engines. --MWJames 08:37, 10 February 2012 (UTC)
Okay, you're right. I was wrong to use these terms, and I apologize! But spending hours and hours on this extension, with lots of hope of getting a good search, made me flustered. Anyway, I had downloaded the extension from the Download site instead of the SVN trunk. --Norman S 07:49, 13 February 2012 (UTC)
Well, back again. How can I verify that SphinxSearch is working properly? On one hand, the minus works: searching for "profile -norman" doesn't bring up my own profile. But when I set $wgSphinxSearch_matches = 5; it still displays a lot of results. --Norman S 09:37, 13 February 2012 (UTC)

Fixed two issues

I thought I'd just report fixing a couple of issues. I just upgraded to sphinx-2.0.3-release.tar.gz and to SphinxSearch-trunk-r107287.tar.gz. Please consider this feedback on this version, and I hope this helps someone else with the same issues (or gets me feedback on how to fix better!):

Did You Mean

For some reason, Did You Mean stopped working. It worked before the upgrade. Hmmmm... Nothing I set in the LocalSettings.php worked.

# "Did You Mean" support
#$wgSphinxSuggestMode = 'soundex';
$wgSphinxSuggestMode = 'aspell';
$wgSphinxSearchAspellPath = '/usr/bin/aspell';
$wgSphinxSearchPersonalDictionary = '/srv/www/htdocs/mediawiki/extensions/SphinxSearch/aspell.en.pws';
$wgSphinxSearchPspellDictionaryDir = "/usr/lib/aspell-0.60";

After about a week of trying various things, I decided to go hunting in the ./mediawiki/extensions/SphinxSearch/SphinxSearch.php file and found these lines:

$wgSphinxSuggestMode = '';
$wgSphinxSearchAspellPath = 'aspell';
$wgSphinxSearchPersonalDictionary = '';

Dopey me. I initialized my variables before the "include" statement. Moving the variables to after the "include" fixes the issue.

Uninitialized string offset

In /var/log/apache2/error_log I was seeing this:

PHP Notice:  Uninitialized string offset:  0 in /srv/www/htdocs/mediawiki/extensions/SphinxSearch/SphinxMWSearch.php on line 477

I'm not versed in PHP, but from Googling the error, it seems that $value is empty. In SphinxMWSearch.php I inserted the following:

(line 477):  if ( isset($value[0]) ) {
(line 491):  }

This just adds a check that $value[0] is set before it is used. I'm not sure this is the correct way to fix this issue, but at least it keeps the error from popping up in the error_log file.

Searching through the wiki when other (non-wiki) sites are already active

Hello,

I have some trouble with a mixed installation of Sphinx search serving a wiki and non-wiki sites. The indexer worked fine, and I can find results using the search command from my shell. I have set $wgSphinxSearch_index = "wiki_main"; and $wgSphinxSearch_index_list = "wiki_main,wiki_incremental";, and the port is set to 9312. The trouble is that search within the wiki does not work: I see no results, and nothing shows up in the query log. But when I start the Sphinx daemon for the wiki alone, I suddenly get results. Any clue where I should look? I am using Sphinx 2.0.3 and Extension:SphinxSearch 0.8.5 --Hjmaier (talk) 08:39, 14 March 2012 (UTC)

Problem solved - max_matches was the reason

I did it :)

Just a hint for others: starting the search daemon with the --console option might help.

I got the following error when I performed a search on the wiki:

query error: per-query max_matches=1000 out of bounds (per-server max_matches=250)

The solution was easy: I just had to raise max_matches in the searchd section of my sphinx.conf to at least 1000, the per-query value the extension requests. --Hjmaier (talk) 00:38, 16 March 2012 (UTC)

Link to page containing document

I am maintaining a wiki on a local intranet. Could you tell me whether Zend search will show a link to the wiki page that contains an indexed document, or only a link to the indexed document itself?

Our current search finds a phrase in 'aaa.pdf', but when you click the link you can only open 'aaa.pdf'; you cannot see where 'aaa.pdf' is linked on the wiki.

Thanks.

Faulty Search Results

First off, thank you for developing this extension, it works great when configured properly.

I host three MediaWiki installations. Everything is installed correctly as far as I can tell, but only one of the wikis actually gets relevant results. Each wiki has a separate main index and incremental source in sphinx.conf, and each pulls from its own wiki database. I noticed that the one index that works is the last one configured in the .conf file. The other two show really wonky results that don't match the query, and some queries using custom nomenclature, which should return plenty of results, return none.

Any direction would help. Searching for answers led me into a maze of conflicting suggestions. --Teststudent (talk) 19:50, 1 May 2012 (UTC)