Extension talk:CirrusSearch/2016

From mediawiki.org
Latest comment: 4 years ago by Revansx in topic Failed/Stuck jobs

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on phabricator.

Cirrus Search installation & pdf search?

I have installed the Elastica and CirrusSearch extensions, and both show up in my Special:Version.

This is a private wiki that I run in a virtual machine on my network.

In my settings, I included this: $wgCirrusSearchServers = array( 'localhost' );

  1. With CirrusSearch, can I search the text of a PDF that has been uploaded?
  2. How do I know whether CirrusSearch is actually handling searches from the search bar?

Thanks! AmazingTrans (talk) 21:18, 6 January 2016 (UTC)Reply

  1. Yes, according to [1] 37.4.38.243 (talk) 19:11, 12 January 2016 (UTC)Reply
Will I also need to install Elasticsearch for my private wiki to work? AmazingTrans (talk) 02:14, 13 January 2016 (UTC)
Yes, this (and other) dependencies are listed there: Extension:CirrusSearch
It would be great if you could tell whether configuring Elasticsearch with CirrusSearch fails on your system too (see "Installing CirrusSearch produces errors" below). 146.52.52.152 (talk) 16:13, 14 January 2016 (UTC)Reply
edit: No, of course not. I misinterpreted your question.
MediaWiki works well without elasticsearch, but if you want to search with CirrusSearch, it needs elasticsearch. 146.52.52.152 (talk) 16:18, 14 January 2016 (UTC)Reply
I have elasticsearch working.
I tried running the script updateSearchIndexConfig.php and got the following fatal error:
root@linux:/opt/apps/mediawiki/htdocs/extensions/CirrusSearch/maintenance# php updateSearchIndexConfig.php
content index...
        Fetching Elasticsearch version...2.1.1...ok
        Scanning available plugins...
                head
        Infering index identifier...PHP Fatal error:  Wrong parameters for Exception([string $exception [, long $code [, Exception $previous = NULL]]]) in /opt/apps/mediawiki/htdocs/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Exception/ResponseException.php on line 34
AmazingTrans (talk) 15:33, 18 January 2016 (UTC)Reply
Sorry for my limited English...
I think you should use an older version of Elasticsearch. I installed 1.7.5 and your command runs without error.
Now search on my private wiki runs on CirrusSearch, but it doesn't index PDF files... I must investigate :-) AndreaBaitelli (talk) 15:23, 25 February 2016 (UTC)
AndreaBaitelli is correct, currently CirrusSearch requires elasticsearch 1.7.x to run. Support for 2.x (we will probably start with 2.2.x, but will try and test with 2.1.x as well) will be coming in perhaps the next three to four months. EBernhardson (WMF) (talk) 22:06, 3 March 2016 (UTC)Reply
To follow up here: the current master branch of CirrusSearch now supports 2.x, and that is what runs in production at WMF. The release branches are still on 1.7, though. Unfortunately, we were not able to make it compatible with 1.x and 2.x in the same codebase. EBernhardson (WMF) (talk) 21:29, 15 June 2016 (UTC)
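A quick way to check which Elasticsearch version a node reports before running updateSearchIndexConfig.php is sketched below. The helper names es_version_from_json and es_version_is_1_7 are made up for illustration, and the sketch assumes the node's root URL returns JSON containing a "number" field.

```shell
# Hypothetical helper: read the JSON that Elasticsearch serves at its root
# URL on stdin and print the value of the first "number" field.
es_version_from_json() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only for a 1.7.x version string, the series
# the release branches required at the time of this thread.
es_version_is_1_7() {
    case "$1" in
        1.7.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

Possible usage, assuming Elasticsearch listens on localhost:9200: version=$(curl -s http://localhost:9200/ | es_version_from_json), followed by es_version_is_1_7 "$version" || echo "unsupported version: $version".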

Installing CirrusSearch produces errors

I have installed elasticsearch, Elastica and CirrusSearch (both showing up in my Special:Version / answering to 'curl http://localhost:9200/_nodes/process?pretty' ).

But executing 'php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php' just produces:'

content index...

Fetching Elasticsearch version...1.7.3...ok

Scanning available plugins...none

Infering index identifier...wikidb_content_first

Picking analyzer...german

Index exists so validating...

Validating number of shards...ok

Validating replica range...ok

Validating shard allocation settings...done

Validating max shards per node...ok

Validating analyzers...ok

Validating mappings...

Validating mapping...ok

Validating cache warmers...

Updating <my Main Page>...PHP Warning:  Unexpected connection error communicating with Elasticsearch.  Curl code:  52 [Called from ElasticaConnection::{closure} in /var/www/html/wiki/extensions/Elastica/ElasticaConnection.php at line 104] in /var/www/html/wiki/includes/debug/MWDebug.php on line 300

Unexpected Elasticsearch failure.

Http error communicating with Elasticsearch:  Unknown error:52.'

According to curl, error code CURLE_GOT_NOTHING (52) means: 'Nothing was returned from the server, and under the circumstances, getting nothing is considered an error.'

In addition, the elasticsearch-server crashes without any hint in the log.

Is this a CirrusSearch- /Elastica- /elasticsearch-Bug? 37.4.38.243 (talk) 19:24, 12 January 2016 (UTC)Reply

Anything that manages to crash Elasticsearch would have to be an Elasticsearch bug. It may *also* be a CirrusSearch bug that triggers the Elasticsearch bug, but that is hard to say without logs. Could you verify that Elasticsearch doesn't log anything? I would expect to find data in /var/log/elasticsearch/ (depending on the package used to install it and on the Elasticsearch configuration). EBernhardson (WMF) (talk) 17:29, 21 January 2016 (UTC)
If Elasticsearch crashed without producing any logs, it's possible that the JVM crashed. Could you check whether there is a file named hs_err_pidXXX.log on the Elasticsearch node (most likely in /tmp) and see whether it contains any hints?
You don't seem to have the wikimedia-extra plugin installed. Could you make sure that $wgCirrusSearchUseExperimentalHighlighter is set to false and $wgCirrusSearchWikimediaExtraPlugin is set to array()? DCausse (WMF) (talk) 17:42, 21 January 2016 (UTC)
elasticsearch didn't log anything because the JVM crashed (I found a file named hs_err_pidXXX.log).
I did some research and found out that Debian's openjdk-7-jre-headless, 7u91-2.6.3-1, armhf was the reason (see https://discuss.elastic.co/t/openjdk-on-arm-crashes-elastic/27283).
Replacing it with openjdk-8-jre-headless, 8u72-b05-6, armhf resolved the issue, it's working fine now!
Thank you, I completely forgot to check the status of the JVM, I only focused on trying different versions of elasticsearch/Mediawiki/Cirrussearch... 146.52.52.152 (talk) 20:28, 23 January 2016 (UTC)Reply
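For anyone hitting the same silent crash, looking for JVM crash reports can be sketched as follows. The function name find_jvm_crash_logs is hypothetical, and /tmp is only the default suggested in this thread; the JVM may write the file elsewhere depending on its working directory and options.

```shell
# Hypothetical helper: list JVM crash reports (hs_err_pid<PID>.log) found
# directly inside the given directory (defaults to /tmp).
find_jvm_crash_logs() {
    dir="${1:-/tmp}"
    find "$dir" -maxdepth 1 -type f -name 'hs_err_pid*.log' 2>/dev/null
}
```

Running find_jvm_crash_logs with no argument checks /tmp; passing the Elasticsearch working directory as the argument checks that location instead.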

Cirrus Search updateSearchIndexConfig.php PHP Fatal Error

I tried to install Cirrus Search, when I'm executing the updateSearchIndexConfig.php I get the following error:

root@wiki:/var/www/html# php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php

content index...

Fetching Elasticsearch version...2.1.1...ok

Scanning available plugins...none

Infering index identifier...PHP Fatal error:  Wrong parameters for Exception([string $exception [, long $code [, Exception $previous = NULL]]]) in /var/www/html/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Exception/ResponseException.php on line 34
Fschloegl (talk) 17:11, 16 January 2016 (UTC)

Same here. 206.51.148.2 (talk) 23:01, 18 January 2016 (UTC)Reply
I think this has something to do with the elasticsearch version. AmazingTrans (talk) 20:01, 19 January 2016 (UTC)Reply
CirrusSearch does not yet support Elasticsearch >= 2.0. For now you will need to use Elasticsearch 1.7.x. We expect to add support for >= 2.0 in the April-June timeframe. EBernhardson (WMF) (talk) 17:06, 21 January 2016 (UTC)
I've also pulled a patch forward into the REL1_26 branch, which should be merged in the next day or two. With it, the script will fail at the correct place, when checking the Elasticsearch version, and give a better error message. EBernhardson (WMF) (talk) 17:13, 21 January 2016 (UTC)
Hello CirrusSearch-Team,
First of all: Thanks for providing a way to connect Elasticsearch to MediaWiki!
Second of all: I would like to provide you with my error messages to help with the development and to get help for myself! :-)
I am running the following software [Special:Version]
MediaWiki:1.26.2
PHP: 5.6.17-0+deb8u1 (apache2handler)
MySQL: 5.5.47-0+deb8u1
Elasticsearch: 1.7.5
CirrusSearch: 0.2 (Hash: c80d8ec6816d01b52033015d4c710035c4f2db4a)
Elastica: 1.3.0.0 (Hash: 2703907a9688c07b45f8d4a7e98a2c1d0047c857)
LocalSettings.php is configured with $wgDisableSearchUpdate = true;
According to the README instructions (https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCirrusSearch.git/HEAD/README), I am stuck at the point where I build/update the search index.
__________
$ php updateSearchIndexConfig.php
content index...
Fetching Elasticsearch version...1.7.5...ok
Scanning available plugins...none
Infering index identifier...mwiki_CP_1262-mwiki_CP__content_first
Picking analyzer...german
Creating index...
Unexpected Elasticsearch failure.
Elasticsearch failed in an unexpected way.  This is always a bug in CirrusSearch.
Error type: Elastica\Exception\ResponseException
Message: InvalidIndexNameException[[mwiki_CP_1262-mwiki_CP__content_first] Invalid index name [mwiki_CP_1262-mwiki_CP__content_first], must be lowercase]
Trace:
#0 /var/www/html/wiki_1262/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Request.php(171): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /var/www/html/wiki_1262/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Client.php(621): Elastica\Request->send()
#2 /var/www/html/wiki_1262/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Index.php(494): Elastica\Client->request('mwiki_CP_1262-m...', 'PUT', Array, Array)
#3 /var/www/html/wiki_1262/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Index.php(284): Elastica\Index->request('', 'PUT', Array, Array)
#4 /var/www/html/wiki_1262/extensions/CirrusSearch/includes/Maintenance/Validators/IndexValidator.php(124): Elastica\Index->create(Array, false)
#5 /var/www/html/wiki_1262/extensions/CirrusSearch/includes/Maintenance/Validators/IndexValidator.php(94): CirrusSearch\Maintenance\Validators\IndexValidator->createIndex(false)
#6 /var/www/html/wiki_1262/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(285): CirrusSearch\Maintenance\Validators\IndexValidator->validate()
#7 /var/www/html/wiki_1262/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(226): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->validateIndex()
#8 /var/www/html/wiki_1262/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(49): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->execute()
#9 /var/www/html/wiki_1262/maintenance/doMaintenance.php(103): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#10 /var/www/html/wiki_1262/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(56): require_once('/var/www/html/w...')
#11 {main}
__________
Is there a way to help me with the next steps? What can I do?
Timaios. T-maios (talk) 21:52, 23 March 2016 (UTC)Reply
Which versions of the following packages do you have installed?
I had no problems with these versions:
elasticsearch        1.7.1
libjna-java        4.1.0-1
libjna-jni        4.1.0-1
libelasticsearch1.7-java    1.7.3+dfsg-3
MediaWiki         1.26.2
Elastica        1.3.0.0
CirrusSearch        0.2 Ezivert (talk) 14:30, 29 March 2016 (UTC)Reply
I do have the same versions installed as you mentioned above, except for:
The problem with indices needing to be lowercase looks to be a bug in Cirrus: it looks like your wiki name has capitals and we don't deal with that. I've filed https://phabricator.wikimedia.org/T135021 to track fixing this problem. It should be a relatively easy fix. EBernhardson (WMF) (talk) 17:26, 11 May 2016 (UTC)
Hey, thanks for attending to that issue.
Those capitals are part of the database name - in case that makes a difference:
"mwiki_CP_1262-mwiki_CP__content_first"
<InvalidIndexNameException[[MediaWiki_content_first] Invalid index name [MediaWiki_content_first], must be lowercase]>
To solve:
  1. Take a dump of your MediaWiki SQL database, e.g. "MediaWiki", and rename the database to all lowercase, i.e. "mediawiki" (without the quotation marks). I did this everywhere in the database dump (~448 instances) using Notepad++.
  2. Import the new lowercased database into MySQL.
  3. Open your LocalSettings.php file in the wiki folder and change the database name from the original uppercase version to the lowercase version. Your wiki will now point to the new lowercase database.
  4. Run the script to create the index: php /var/www/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
  5. If all is OK, you'll see the script fetch the Elasticsearch version...1.7.2...ok ...blah blah blah ...Indexing namespaces...done. 64.166.54.199 (talk) 20:42, 30 August 2016 (UTC)
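Before going through a database rename, it may help to check whether the name that ends up in the index identifier contains capitals at all. The sketch below is only an illustration; is_lowercase_name is a made-up helper, and it assumes, as this thread suggests, that the index name is derived from the wiki's database name.

```shell
# Hypothetical helper: succeed when the given name contains no uppercase
# letters, i.e. when lowercasing it leaves it unchanged.
is_lowercase_name() {
    lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    [ "$1" = "$lowered" ]
}
```

For example, is_lowercase_name "mwiki_CP_1262" fails, which matches the InvalidIndexNameException reported above, while is_lowercase_name "mediawiki" succeeds.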

Not finding "Dates" when searching for "Date"

Sorry if this has been asked before.

I recently replaced the regular MediaWiki Search with Cirrus/Elastic, and was hoping it would make searching easier/better for the users.

One particular thing we were hoping for was better searches without having to use wildcards.

Example: When searching for "export" we get results with both "export" and "exported", but when searching for "date" we don't get results with "dates". How do I change the settings to find "dates" when searching for "date"? Jehrbo (talk) 10:05, 12 February 2016 (UTC)

Hello @Jehrbo, I believe CirrusSearch does this by default. It is referred to as 'stemming', at least according to the help documentation! Just to clarify, are you referring to the list of suggestions shown when typing queries into the search box, or to the actual list of search results? CKoerner (WMF) (talk) 15:51, 12 February 2016 (UTC)

Preventing sharing some indexes with other nodes/clusters

Hello,

Using MW 1.23, I was trying to share the ES content index with another node, but not the general index. Is there a way to do this? I tried setting $wgCirrusSearchShardCount for the general index to 0 for that node, but it didn't work. Is it possible? Thanks! Toniher (talk) 21:26, 20 February 2016 (UTC)

I tried to handle this via ES: http://stackoverflow.com/questions/35552337/prevent-some-indexes-to-be-shared-with-other-nodes-in-elasticsearch-1-7-x Toniher (talk) 10:40, 29 February 2016 (UTC)Reply
I think your Stack Overflow question hit on the right answers. You would want to set general to 1 shard and content to 2 shards (if you have 2 servers, for example), then use Elasticsearch's functionality for preventing that general shard from being assigned to a particular node. EBernhardson (WMF) (talk) 22:04, 3 March 2016 (UTC)
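One way to keep an index off a particular node is Elasticsearch's index-level allocation filtering. The sketch below only builds the request body; exclude_node_body is a hypothetical helper, and the node name "node2" and index name "wikiname_general_first" in the usage note are placeholders, not values from this thread.

```shell
# Hypothetical helper: print a settings body that asks Elasticsearch not to
# allocate shards of an index to the node with the given name.
exclude_node_body() {
    printf '{"index.routing.allocation.exclude._name":"%s"}\n' "$1"
}
```

The body could then be sent with something like curl -XPUT 'localhost:9200/wikiname_general_first/_settings' -d "$(exclude_node_body node2)". Note that index settings applied this way may need to be reapplied if the index is recreated during a reindex.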

Avoid some parts of the content page to be included in the text indexed fields

Hello, is there any way to keep part of a page's content from being included in the resulting text field in Elasticsearch?

Thanks! Toniher (talk) 11:32, 29 February 2016 (UTC)Reply

Hi @Toniher, there is a task for just this functionality! The example on that task is for excluding navigation templates. What are you trying to exclude? More examples might be helpful in understanding the use of this functionality.
(P.S. Thanks again for a great SMWCon in Barcelona!) CKoerner (WMF) (talk) 16:19, 29 February 2016 (UTC)
Hi Chris, thanks for pointing out this task. I'm trying to handle this in a 3rd party wiki. I would also be interested in excluding navigation templates and other kinds of templates that may contain information / metadata about the page and so are not strictly pieces of page content... Toniher (talk) 17:26, 29 February 2016 (UTC)

Customizing the Special:Search form?

I want to customize the Search form displayed on the Special:Search page. I'm not sure if CirrusSearch changes it in any way, but I figured I'd ask here anyway since I am using CirrusSearch.

Basically, I want to add a placeholder field and an autofocus field to the search box displayed on Special:Search. Is this possible? How do I do this?

Thank you. Swennet (talk) 15:52, 31 March 2016 (UTC)Reply

My command lines for install CirrusSearch on Debian

As I struggled to install CirrusSearch, I wanted to share the command lines that allowed me to install the engine on my debian. It was difficult because there are many steps to accomplish rigorously and to adapt to the system.

# wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.3.deb

# dpkg -i elasticsearch-1.7.3.deb

# service elasticsearch status

[…]

Active: inactive (dead)

[…]

# service elasticsearch start

# service elasticsearch status

[…]

Active: active (running)

[…]

# systemctl enable elasticsearch.service

# wget https://extdist.wmflabs.org/dist/extensions/Elastica-REL1_26-2703907.tar.gz

# tar -xzf Elastica-REL1_26-2703907.tar.gz -C /srv/wiki/extensions

# echo "wfLoadExtension( 'Elastica' );" >> /srv/wiki/LocalSettings.php

# wget https://extdist.wmflabs.org/dist/extensions/CirrusSearch-REL1_26-c80d8ec.tar.gz

# tar -xzf CirrusSearch-REL1_26-c80d8ec.tar.gz -C /srv/wiki/extensions

# echo 'require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";' >> /srv/wiki/LocalSettings.php

# echo "Navigate to Special:Version on my wiki to verify that the extensions are successfully installed: OK"

# echo "read the instructions in https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCirrusSearch.git/HEAD/README"

# apt install php5-curl

# echo '$wgDisableSearchUpdate = true;' >> /srv/wiki/LocalSettings.php

# echo "\"Configure your search servers in LocalSettings.php if you aren't running Elasticsearch on localhost:\" Elasticsearch IS on localhost"

# php /srv/wiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php

(all ok…)

# sed -i "/wgDisableSearchUpdate/d" /srv/wiki/LocalSettings.php

# php /srv/wiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip

[…]

Indexed a total of 3 pages at 2/second

# php /srv/wiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse

[…]

Indexed a total of 3 pages at 11/second

# echo "\$wgSearchType = 'CirrusSearch';" >> /srv/wiki/LocalSettings.php

And it's done!

If there are errors or omissions, you can tell me. Ascax (talk) 11:46, 1 May 2016 (UTC)Reply

Installing on MW 1.23 with Centos 7/Cpanel

Is this configuration possible? I have a small wiki farm currently running on Lucene/CentOS 5/cPanel and plan to migrate to CentOS 7 (which does not provide options to install Tomcat, on which Lucene depends). Would anybody be able to take this on as a small job? Spiros71 (talk) 12:16, 23 May 2016 (UTC)

I'm not sure about your specific issue, but all the dependencies for Elasticsearch are shipped within the packages provided by http://elastic.co. Once it is up and running, CirrusSearch talks to Elasticsearch over HTTP using a PHP library. As for the specifics of CentOS/cPanel, I really don't know much about them. Good luck! EBernhardson (WMF) (talk) 23:30, 23 May 2016 (UTC)

CirrusSearch for MW1.23 with ICU plugin support?

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I had Lucene installed and I upgraded to Elastica for an Ancient Greek dictionary. However, although Lucene by default would provide diacritics-insensitive autocomplete and search results, it appears that for Elastica 1.7 one needs to use the ICU plugin. But no such option appears in CirrusSearch for MW1.23, nor in CirrusSearch.php for MW1.26 (I could only see it here as $wgCirrusSearchUseIcuFolding). Is there any way to provide diacritics-insensitive autocomplete for MW1.23? Spiros71 (talk) 17:46, 11 June 2016 (UTC)

Cirrus will use the ICU plugin as long as it is installed in elasticsearch. Note that cirrus uses the ICU analyzer only for unicode normalization but still uses the asciifolding filter for diacritics-insensitive search.
Icu folding support is very new in cirrus and is only supported by the completion suggester. It's still experimental because it lacks the option preserve_original.
Concerning MW1.23 I don't see an option that would allow you to activate it by a simple config flag. If your wiki is configured as greek then some diacritics will be handled in fulltext search (not autocomplete) by the greek analyzer (but as far as I know not all ancient greek stress marks are supported by the lucene greek analyzer).
Unfortunately the only option that does not involve a modification in the CirrusSearch source code would be to use a more recent version of cirrus and activate the completion suggester with the $wgCirrusSearchUseIcuFolding option enabled. DCausse (WMF) (talk) 07:51, 13 June 2016 (UTC)Reply
Thank you very much for the reply, David. This is quite strange, as Lucene with MWSearch (which was installed before) performed the diacritics-insensitive autocomplete by default; sounds like a standard functionality was lost with Elastica?
I just checked searching for αιων in https://en.wiktionary.org and neither the autocomplete nor the search results display the ancient Greek equivalent (only difference being the diacritics). However, the word https://en.wiktionary.org/wiki/αἰών exists, and it can only be found if one enters the exact diacritics, which is quite cumbersome and not very user-friendly.
Checking the Cirrus extension code in different versions for MW, I could not find any differences indicating IcuFolding support. Which Cirrus extension versions for MW support enabling IcuFolding via a config flag? Would they work with MW 1.23? Spiros71 (talk) 10:31, 13 June 2016 (UTC)Reply
IcuFolding is currently enabled only for autocomplete queries on the greek wikipedia (searching for ανθρακας in the autocomplete box will suggest Άνθρακας, or your example with αιων you'll see Αιώνας suggested).
I agree with you this is a major regression compared to MWSearch for non-latin wikis.
I doubt that the Cirrus version supporting IcuFolding will work with MW 1.23, but I think the code change would be minimal to enable it for your wiki.
If you feel comfortable hacking some PHP code then you can probably add the code to support it?
I can assist you if you send me a link to the version of the file extensions/CirrusSearch/includes/Maintenance/AnalysisConfigBuilder.php you are using? DCausse (WMF) (talk) 16:07, 13 June 2016 (UTC)Reply
Thank you very much, David. If you could help it would be much appreciated. Currently the index for that wiki is being built. http://www.translatum.gr/downloads/AnalysisConfigBuilder.zip Spiros71 (talk) 16:36, 13 June 2016 (UTC)Reply
This version of AnalysisConfigBuilder.php should allow you to force ICU folding for fulltext and autocomplete searches.
You will have to:
  1. set $wgCirrusSearchForceIcuFolding = true for this wiki
  2. make sure the ICU analyzer plugin is properly installed on your Elasticsearch cluster
  3. reindex your wiki with this new config
Note that you will lose the behavior behind ''preserve_original''. Basically, this feature allows search queries that include words with diacritics to rank pages with words that match the diacritics higher. Example: searching for thé would display pages with the word thé first, then pages with the word the; without preserve_original, Elasticsearch will make no distinction between thé and the.
Note that I haven't tested this code. DCausse (WMF) (talk) 18:13, 13 June 2016 (UTC)Reply
Thank you so much David. So my LocalSettings.php will be like this before indexing?
require_once "$IP/extensions/Elastica/Elastica.php";
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
$wgCirrusSearchServers = array( '127.0.0.1' );
#$wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgCirrusSearchForceIcuFolding = true; Spiros71 (talk) 07:45, 14 June 2016 (UTC)Reply
Yes, make sure:
- you use the modified version of AnalysisConfigBuilder.php I provided
- analysis-icu plugin is properly installed on elasticsearch
Good luck. DCausse (WMF) (talk) 13:47, 14 June 2016 (UTC)Reply
Thank you. I just tried, when I enter "ειμι" in search it will not display "εἰμί". Similarly, when I enter "ει" it will not display "εἴλω" or other words with diacritics on "ι". Spiros71 (talk) 06:59, 15 June 2016 (UTC)Reply
Can you verify that "asciifolding"/"asciifolding_preserve" under the "filter" section uses "type": "icu_folding" when looking at api.php?action=cirrus-settings-dump on your wiki?
If your wiki looks like the example I provided ("type" : "asciifolding") then icu_folding is probably not activated. It's either :
- a bug in the modified php file I provided
- analysis-icu plugin is not installed
- you did not reindex your wiki properly DCausse (WMF) (talk) 07:20, 15 June 2016 (UTC)Reply
http://lsj.translatum.gr/w/api.php?action=cirrus-settings-dump Spiros71 (talk) 07:28, 15 June 2016 (UTC)Reply
I don't know why this api is not working properly on your wiki...
But you can still display the index settings by asking elasticsearch directly:
- identify the index: curl -s localhost:9200/_cat/indices : you should see a list with wikiname _ (content or general) _ (first or a timestamp)
- identify the one that matches your wiki name then : curl -s 'localhost:9200/wikiname_content_XXXXX?pretty=true' DCausse (WMF) (talk) 07:42, 15 June 2016 (UTC)Reply
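Picking the right index out of the _cat/indices listing can be sketched as below. The function name indices_for_wiki is hypothetical, and the sketch assumes the index name is the third whitespace-separated column of each line, which is how the default listing is laid out in the versions discussed here.

```shell
# Hypothetical helper: read `_cat/indices` output on stdin and print the
# names of the content and general indices that belong to the given wiki.
indices_for_wiki() {
    awk -v wiki="$1" '$3 ~ ("^" wiki "_(content|general)_") { print $3 }'
}
```

Possible usage: curl -s localhost:9200/_cat/indices | indices_for_wiki wikiname. Each name printed can then be passed to the second curl command above to display that index's settings.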
https://drive.google.com/file/d/0BzPJhpxReFs4ODd2dkN0VUJWaHM/view?usp=sharing
I can confirm that analysis-icu plugin is installed OK. Spiros71 (talk) 08:09, 15 June 2016 (UTC)Reply
OK, the settings are not correct and still use the wrong asciifolding filter:
- Did you restart elasticsearch after installing the plugin? Can you see the plugin when running: curl -s 'localhost:9200/_nodes/plugins?pretty=true'
- How did you reindex the wiki? The timestamp in the settings indicates that this index was created on monday 2pm UTC.
Maybe you used forceSearchIndex? To reindex and update index settings you need to run extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php  --reindexAndRemoveOk --indexIdentifier now on this wiki. DCausse (WMF) (talk) 08:25, 15 June 2016 (UTC)Reply
Thanks! Looks fine now: https://drive.google.com/file/d/0BzPJhpxReFs4ajFIVVN3WlhFR00/view?usp=sharing
Much indebted. Spiros71 (talk) 06:54, 16 June 2016 (UTC)
I'm glad it worked in the end.
I hope it'll work as you expect, it's really something we'd like to fix properly in Cirrus so please comment on T129545 if you encounter any strange/undesirable behaviors related to folding. DCausse (WMF) (talk) 08:04, 16 June 2016 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Elasticsearch 2.x

Looks like Elasticsearch 2.3.3 is supported now (Special:Version on WPEN says so). Documentation still says only 1.x is supported. Any news on this? Osnard (talk) 09:54, 23 June 2016 (UTC)Reply

The released version of CirrusSearch, in the REL1_27 branch, supports elasticsearch 1.x. The master branch, which will eventually be REL1_28, supports 2.x. EBernhardson (WMF) (talk) 16:31, 29 June 2016 (UTC)Reply
Will the CirrusSearch REL1_28 branch also support the MediaWiki LTS version 1.27? 194.53.161.129 (talk) 08:58, 1 August 2016 (UTC)
Unfortunately, I doubt we will be updating 1.27 to support Elasticsearch 2.x. The difficulty is that the query API used by Elasticsearch 2.x is different enough from 1.x that supporting both at the same time proved to be difficult. To backport the 2.x changes we would have to remove support for 1.x, which doesn't seem like the right thing to do in an LTS release. EBernhardson (WMF) (talk) 20:43, 15 August 2016 (UTC)
On the other hand, the latest Elasticsearch 1.x version (1.7) reaches EOL in January 2017, which would leave the MediaWiki LTS without a supported ES version. Josmeer (talk) 08:32, 22 August 2016 (UTC)

Implementation through OpenShift

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


  • Was using shared hosting so no Java support.
  • Used an already available OpenShift Elasticsearch v1.7.1 cluster.
  • On browsing to the OpenShift app, Elasticsearch service was working.
  • Using curl -X GET through shared hosting shell also worked.
  • Also, used ping through SSH, to make sure the app was accessible.
  • Had the following LocalSettings.php configuration:
wfLoadExtension( 'Elastica' );
require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );
$wgDisableSearchUpdate = true;
$wgCirrusSearchServers =  '<app-name>.rhcloud.com' ;
  • Extensions were showing to be implemented on Special:Version.
  • Running updateSearchIndexConfig.php, gave the following error:
Fetching Elasticsearch
Unexpected Elasticsearch failure.
Http error communicating with Elasticsearch:  Operation timed out.

Has anyone else come across the "Operation timed out" error? AhmadF.Cheema (talk) 04:58, 10 July 2016 (UTC)Reply

For the operation to time out, it really sounds like it's being blocked. Could you try running this simple script from your website:
echo file_get_contents( 'http://<app-name>.rhcloud.com:9200/' ); EBernhardson (WMF) (talk) 01:57, 16 July 2016 (UTC)Reply
Tried in SSH:
php -r 'echo file_get_contents ("http://<app-name>.rhcloud.com:9200/");'
which didn't return anything. Then tried, without the port:
php -r 'echo file_get_contents ("http://<app-name>.rhcloud.com/");'
which returned the curl -X GET  values.
I forgot to check OpenShift public ports and apparently 9200 is not one of them. (9200 is open from my Wiki host side)
----
Shouldn't it just work without the port specifications, as I have not even specified it in the "$wgCirrusSearchServers" value?
So, is there some way I can remove port from the equation?
Otherwise I suppose I will have to do port binding, which is kind of complicated to do on OpenShift. AhmadF.Cheema (talk) 10:07, 16 July 2016 (UTC)Reply
try using:
$wgCirrusSearchServers = array(
    array( 'host' => '<app-name>.rhcloud.com', 'port' => 80 ),
);
EBernhardson (WMF) (talk) 15:25, 18 July 2016 (UTC)Reply
OK, thanks.
That worked perfectly. AhmadF.Cheema (talk) 17:16, 18 July 2016 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

CirrusSearch: Index not updated automatically

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Hello

I have a problem: the CirrusSearch index is not updated even when I run maintenance/runJobs.php. I see the right lines in the jobs that ran, but when I request /wiki/NewPageTitle?action=cirrusdump it returns an empty dataset ([]). Is there something to read about how the CirrusSearch index is updated?

Thank you,

Serge Sergezolotukhin (talk) 09:33, 18 July 2016 (UTC)Reply

I solved this by myself: the reason is $wgDisableSearchUpdate = true; :-) Sergezolotukhin (talk) 09:51, 18 July 2016 (UTC)Reply
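Since this setting is easy to leave behind after the initial indexing, a check along the following lines may save some time. It is only a sketch: search_updates_disabled is a made-up helper, and it looks only for an uncommented assignment at the start of a line in the given file.

```shell
# Hypothetical helper: succeed if the given settings file still contains an
# active (uncommented) "$wgDisableSearchUpdate = true" line.
search_updates_disabled() {
    grep -Eq '^[[:space:]]*\$wgDisableSearchUpdate[[:space:]]*=[[:space:]]*true' "$1"
}
```

For example: search_updates_disabled /path/to/LocalSettings.php && echo "index updates are still disabled".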
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Cannot Install CirrusSearch Extension

I am not able to validate install for download

CirrusSearch-REL1_27-dcb0cf9.tar.gz

on Mediawiki 1.27.0. Mediawiki's Special:Version would crash once I place

require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";

in LocalSettings.php. Is anyone else experiencing this? Thanks. 142.1.184.10 (talk) 19:43, 21 July 2016 (UTC)Reply

Turned out php5-curl is a dependency. This error disappears after it is installed. Could this dependency be pointed out somewhere on the installation page? Thanks. Andychienac (talk) 20:15, 21 July 2016 (UTC)Reply
It looks like the "Installation" section mentions curl as a dependency. "CirrusSearch requires PHP to be compiled with cURL support." Do you think this should be highlighted elsewhere (or with different language if not specific enough?). CKoerner (WMF) (talk) 14:44, 22 July 2016 (UTC)Reply
Thank you for your reply. In hindsight, I realized that the instruction is there after going back and re-reading the page. Perhaps I should have gone over the instructions more carefully.
At this point, I am just concerned about how this mistake shows up if one does not realize php5-curl is needed: the page simply comes up blank after the CirrusSearch extension is enabled, and MediaWiki does not provide any meaningful message that points to the lack of php5-curl. I was fortunate enough to have had some experience troubleshooting another extension-related issue, and used the following flags to get a message that pointed me to the cURL PHP module.
error_reporting( -1 );
ini_set( 'display_errors', 1 );
So I would recommend either of the following:
  • Make the php5-curl requirement stand out more.
  • Add a blurb that leads people back to the php5-curl requirement, to catch those who are surprised by a blank page after enabling CirrusSearch (either by pointing out the missing requirement or by pointing out the flags above).
Thank you for your reply. I am a few days in with the extensions fully functional and I am loving it. Thank you and your team for your work. Andychienac (talk) 20:04, 25 July 2016 (UTC)Reply
I updated the page to be a little clearer on the requirement. I'll also reach out to the developers and make them aware of the concern regarding notifying admins when curl is not detected. Hope that helps! CKoerner (WMF) (talk) 16:56, 26 July 2016 (UTC)Reply
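
A blank page can also be avoided by checking for the cURL module before enabling the extension. The sketch below assumes the command-line PHP uses the same configuration as the web server, which is not always true; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads a PHP module list (the output of `php -m`)
# on standard input and succeeds if a module named "curl" is listed.
has_curl_module() {
    grep -qix 'curl'
}

# Usage:
#   php -m | has_curl_module || echo "PHP cURL extension is missing"
```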

Error on "Inferring index identifier"

Hi,

I've been trying to make CirrusSearch and Elasticsearch work for 4 days, and I kind of want to give up now.

Here is my Special:Version page : http://en.pleskdb.ovh/wiki/Special:Version

I have MediaWiki 1.27.0, Elasticsearch 2.3.3, CirrusSearch 0.2 (614b772) and Elastica 1.3.0.0(4607acf).

I'm stuck at this step :

Now run this script to generate your elasticsearch index:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php

I have the error :

content index...

Fetching Elasticsearch version...2.3.3...ok
Scanning available plugins...none
Inferring index identifier...PHP Fatal error:  Wrong parameters for Exception([string $exception [, long $code [, Exception $previous = NULL]]]) in /var/www/pool.pleskdb.ovh/vendor/ruflin/elastica/lib/Elastica/Exception/ResponseException.php on line 34

What should I do ? I can't find anything on the web about this error.

Thanks. William Gérald Blondel (talk) 20:15, 29 July 2016 (UTC)Reply

I think you are using the wrong Elastica version. Unfortunately, Elasticsearch 2.x support is not yet available in an official MediaWiki release.
You'll have to wait for the 1.28 release or use the master branch/wmf releases. DCausse (WMF) (talk) 14:34, 3 August 2016 (UTC)Reply

Failed/Stuck jobs

We are slowly accruing rows in the job table (for several wikis in a farm) where job_attempts=1, and they all seem to be related to CirrusSearch. Their job_cmd values are "cirrusSearchLinksUpdate", "cirrusSearchLinksUpdatePrioritized", and "cirrusSearchIncomingLinkCount".

I searched Phabricator and found T121560 but I don't think this is the same issue. If I run showJobs.php, the output indicates no jobs exist. Darenwelsh (talk) 16:31, 10 August 2016 (UTC)Reply

I took a look around and couldn't figure out what's going on. I'm sure you stumbled on this, which again sounds similar, but not exactly the same.
https://wikitech.wikimedia.org/wiki/Search/Trouble#Updates_don.27t_go_anywhere_and_Cirrus.27s_jobs_look_stuck
Do you have any logs to share? Please do file a phab task if you have a moment. It can't hurt! :) CKoerner (WMF) (talk) 21:03, 10 August 2016 (UTC)Reply
Can you guide me on which logs would be helpful? Darenwelsh (talk) 15:02, 11 August 2016 (UTC)Reply
Whatever became of this? .. We're seeing the same thing. Thanks!
cirrusSearchLinksUpdate jobs are getting stuck, but only for pages with subpages! Revansx (talk) 18:24, 27 July 2021 (UTC)Reply

No results when searching

Hi,

MediaWiki 1.26.3
PHP 5.6.24 (apache2handler)
MySQL 5.5.46
Elasticsearch 1.7.5

CirrusSearch 0.2

Elastica 1.3.0.0

Installation was without problems according to https://phab.wmfusercontent.org/file/data/ouaq2ogud2xcltawkhvx/PHID-FILE-pyat6n73gno5m22bmo2r/README

LocalSettings

wfLoadExtension( 'Elastica' );

require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";

#$wgDisableSearchUpdate = true;

$wgCirrusSearchServers = array( 'localhost' );

$wgDebugLogGroups['CirrusSearch'] = "$IP/extensions/CirrusSearch/error.log";

$wgSearchType = 'CirrusSearch';

But when I enter something into the search field, I get a message telling me there were no results matching the query.

Any ideas? Donxello (talk) 04:02, 13 August 2016 (UTC)Reply

Hello. You need to run the scripts in the CirrusSearch maintenance directory:
Now run this script to generate your elasticsearch index:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
Now remove $wgDisableSearchUpdate = true from LocalSettings.php.  Updates should start heading to Elasticsearch.
Next bootstrap the search index by running:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/README Sergezolotukhin (talk) 15:09, 15 August 2016 (UTC)Reply
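
The steps quoted above can be collected into a small script. This is only a sketch under assumptions: MW_INSTALL_PATH must point at the MediaWiki root, the function names and the DRY_RUN switch are made up for illustration, and the edit to LocalSettings.php between the two stages still has to be done by hand.

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cirrus_maint_dir() {
    echo "$MW_INSTALL_PATH/extensions/CirrusSearch/maintenance"
}

# Stage 1: create the index configuration.
cirrus_create_index() {
    run php "$(cirrus_maint_dir)/updateSearchIndexConfig.php"
}

# Stage 2: run only after removing $wgDisableSearchUpdate = true from
# LocalSettings.php. Indexes page content first, then the link counts.
cirrus_populate_index() {
    run php "$(cirrus_maint_dir)/forceSearchIndex.php" --skipLinks --indexOnSkip || return 1
    run php "$(cirrus_maint_dir)/forceSearchIndex.php" --skipParse
}
```

Running with DRY_RUN=1 first shows the exact commands before anything touches the index.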
This is not mentioned anywhere! I installed MediaWiki and then the Elastica and CirrusSearch extensions. I kept getting the error: Fatal exception of type "RuntimeException".
After adding these lines in LocalSettings.php:
$wgCirrusSearchServers = array( 'localhost' );
$wgDebugLogGroups['CirrusSearch'] = "$IP/extensions/CirrusSearch/error.log";
$wgSearchType = 'CirrusSearch';
and running the commands suggested by Sergezolotukhin, the search started working. None of this is mentioned anywhere in the documentation. 103.70.129.74 (talk) 04:10, 28 February 2018 (UTC)Reply
I'm having the exact same issue. I currently have SphinxSearch installed, but it does not seem to find all the articles I believe it should find. So, I'm attempting to install CirrusSearch to see how it performs.
{| class="wikitable"
|+My Setup
|-
|Operating System
|CentOS 7
|-
|MediaWiki
|1.27.0
|-
|PHP
|7.0.10
|-
|Apache
|2.4.23
|-
|MySQL
|Amazon AWS Aurora
|-
|ElasticSearch
|1.7.5
|-
|Elastica
|REL1_27-4607acf
|-
|CirrusSearch
|REL1_27-dcb0cf9 (0.2)
|}
Followed instructions here: Extension:CirrusSearch
Installed ElasticSearch via RPM; enabled and started service. Verified working via curl.
Tried enabling Elastica extension using both wfLoadExtension and require_once. Both seem to have the same bearing.
I ran all three maintenance scripts outlined in the README. It indexed nearly 10000 pages.
I've verified that elastic search is actually running:
[centos@ip-10-90-1-9 html]$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2016-09-01 15:12:48 UTC; 52min ago
     Docs: http://www.elastic.co
 Main PID: 8839 (java)
   CGroup: /system.slice/elasticsearch.service
           └─8839 /bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccu...
My LocalSettings.php has the following:
wfLoadExtension('Elastica');
require_once("$IP/extensions/CirrusSearch/CirrusSearch.php");
$wgCirrusSearchServers = array('localhost');
$wgSearchType = 'CirrusSearch';
When I uncomment the above settings in my LocalSettings, then attempt to search on my wiki, there are no results. So, for now, I'll have to deal with a half-working Sphinx setup, unless anyone here can suggest what I'm doing wrong? GFXDude2010 (talk) 16:02, 1 September 2016 (UTC)Reply
root@openlist-www ~ # curl '********:9200/_cat/indices?v'
health status index                                 pri rep docs.count docs.deleted store.size pri.store.size
green  open   openlist_ua-wiki__content_first         4   0     193874        45941      2.6gb          2.6gb
green  open   openlist-wiki__content_first            4   0    2530251       723985     28.9gb         28.9gb
green  open   openlist-wiki__general_first            4   0       9792            7     35.8mb         35.8mb
green  open   mediawiki_cirrussearch_frozen_indexes   1   0          0            0       144b           144b
green  open   test                                    1   0          1            0      2.6kb          2.6kb
green  open   openlist_ru-formulars                   1   0    1648710            0    986.6mb        986.6mb
green  open   mw_cirrus_versions                      1   0          6            4     10.2kb         10.2kb
green  open   openlist_ua-wiki__general_first         4   0         63            7    727.4kb        727.4kb
green  open   openlist_ge-wiki__content_first         4   0       3514          953     68.5mb         68.5mb
green  open   openlist_ge-wiki__general_first         4   0         21            0     37.8kb         37.8kb Sergezolotukhin (talk) 14:46, 21 September 2016 (UTC)Reply
Okay. Are there documents in the Elasticsearch index? ( curl 'localhost:9200/_cat/indices?v' )
You can explore your Elasticsearch data with the query DSL:
{
  "query" : {
    "match_all" : {}
  }
}
The goal is to find out whether the data is indexed or not. Sergezolotukhin (talk) 14:43, 21 September 2016 (UTC)Reply
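
To send such a match_all query from the command line, something like the following could be used. It is a sketch under assumptions: the function names are made up, and the host and index name in the usage line are placeholders that should be replaced with a name from the _cat/indices listing.

```shell
#!/bin/sh
# Hypothetical helper: prints a minimal query DSL body that matches every document.
match_all_body() {
    printf '%s' '{"query":{"match_all":{}}}'
}

# Hypothetical helper: returns the first hit of one index; the "total" value
# under "hits" in the response shows how many documents are indexed.
# $1 = Elasticsearch base URL, $2 = index name.
search_all() {
    curl -s -H 'Content-Type: application/json' \
        -XPOST "$1/$2/_search?size=1&pretty" -d "$(match_all_body)"
}

# Usage (placeholders):
#   search_all localhost:9200 mywiki_content_first
```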
Hi,
I have installed CirrusSearch in MediaWiki, but it does not search for words inside uploaded documents.
Could someone review my steps and advise if anything was missed?
Please note that the uploaded documents include MS Word, PowerPoint, PDF, Excel, MSG (Outlook email) and TXT files.
I followed the steps below:
Installed MediaWiki successfully.
Installed Elastica inside the extensions folder.
Installed CirrusSearch inside the extensions folder.
After that I performed the steps mentioned in the README.txt file:
--------------------------------------------------- Instructions in README.TXT file ----------------------------
All elastic versions prior to 5.3.1 have bugs that affect CirrusSearch:
- elastic versions before 5.3.x requires the following config in your LocalSettings.php:
  $wgCirrusSearchElasticQuirks = [ 'query_string_max_determinized_states' => true ];
- elastic versions before 5.3.1 suffer from a bug that prevent an index to be reindexed
  properly without missing docs when using multiple elasticsearch machines
- when using elastic prior to 5.5.2 with the extra plugin and the super_detect_noop script
  you must activate the "super_detect_noop_enable_native" option (see docs/settings.txt)
Place the CirrusSearch extension in your extensions directory.
Make sure you have the curl php library installed (sudo apt-get install php5-curl in Debian.)
You also need to install the Elastica MediaWiki extension.
Add this to LocalSettings.php:
wfLoadExtension( 'Elastica' );
require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );
$wgDisableSearchUpdate = true;
Configure your search servers in LocalSettings.php if you aren't running Elasticsearch on localhost:
$wgCirrusSearchServers = [ 'elasticsearch0', 'elasticsearch1', 'elasticsearch2', 'elasticsearch3' ];
There are other $wgCirrusSearch variables that you might want to change from their defaults.
Now run this script to generate your elasticsearch index:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
Now remove $wgDisableSearchUpdate = true from LocalSettings.php.  Updates should start heading to Elasticsearch.
Next bootstrap the search index by running:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
Note that this can take some time.  For large wikis read "Bootstrapping large wikis" below.
Once that is complete add this to LocalSettings.php to funnel queries to ElasticSearch:
$wgSearchType = 'CirrusSearch';
--------------------------------------------------------------------------- Amitumar (talk) 08:51, 31 January 2018 (UTC)Reply

not all elasticsearch servers are working

Have to make elastic cluster.

Hello.

I have two servers, configured with $wgCirrusSearchServers = array( 'elastic1', 'elastic2' ); and after building the index I noticed that only elastic1 has indexes and elastic2 is empty. Search works fine. How do I get elastic2 used as well? Sergezolotukhin (talk) 15:13, 15 August 2016 (UTC)Reply

Are the servers in the same cluster? As in if you do something like
curl -s http://elastic1:9200/_cat/nodes
Does the cluster report that elastic1 and elastic2 are in the same cluster? A two node cluster would report something like this:
10.64.4.13 10.64.4.13 61 99 0.00 d * relforge1001
10.64.37.21 10.64.37.21 53 96 0.08 d - relforge1002 EBernhardson (WMF) (talk) 20:47, 15 August 2016 (UTC)Reply
One more thing: in the Cirrus documentation I see $wgCirrusSearchServers = array( 'search01', 'search02' );, but Elastic says at least three servers are needed to make a cluster. So what's the point of TWO servers in $wgCirrusSearchServers? Or is this a mistake? Sergezolotukhin (talk) 09:51, 16 August 2016 (UTC)Reply
Three servers are needed for a resilient cluster. With two servers you can only have one master-capable server, so the loss of that server would bring down the search cluster. With a 3 node cluster all three can be master capable, and two have to agree on who the master is. If you are more worried about scaling than availability, a two node cluster will work ok.
As for why two in the cirrus documentation? It's an example of how to specify more than one server, not a particular suggestion for how many servers you should have. EBernhardson (WMF) (talk) 18:54, 17 August 2016 (UTC)Reply
Thank you for the reply. No, I didn't join the servers into a cluster, because the manual says nothing about that (yes, I am so lame ;-) Sergezolotukhin (talk) 09:07, 16 August 2016 (UTC)Reply
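
The "two have to agree" rule above is a simple majority of the master-capable nodes. A short sketch of the arithmetic follows; the function names are made up for illustration. In Elasticsearch 1.x and 2.x this majority is what discovery.zen.minimum_master_nodes is usually set to.

```shell
#!/bin/sh
# Majority needed among n master-capable nodes: floor(n / 2) + 1.
master_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of master-capable nodes that can be lost while a majority remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Usage:
#   master_quorum 3        # majority for three nodes
#   tolerated_failures 2   # how many of two nodes may fail
```

With two master-capable nodes the majority is two, so no node can be lost; with three nodes the majority is still two, so one node can fail.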

CirrusSearch "An error has occurred while searching" error while ElasticSearch does not appear on Special:Version even though it's installed

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I'm going to overshare in the hopes that it helps track down the issue. I've read pretty extensively through the help topics here, exhausted my google foo trying to find similar situations like what I'm seeing, and had a couple late nights trying to debug. Here's the issue: a supported version of elasticsearch (1.7.5) is installed but not showing up on Special:Version of my 1.27.1 install of MediaWiki. CirrusSearch 0.2 (dcb0cf9) and Elastica 1.3.0.0 (4607acf) are installed and showing on Special:Version.

I have elasticsearch installed on CentOS 7 using the following commands:

<pre>wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.5.noarch.rpm
sudo rpm -ivh elasticsearch-1.7.5.noarch.rpm</pre>
Even though it's installed (see below), ElasticSearch Does NOT show in "Installed software" on Special:Version.

<pre>curl "localhost:9200/_nodes/settings?pretty=true"
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "1sk_FxWxT3Sw79gcrruaDQ" : {
      "name" : "Halloween Jack",
      "transport_address" : "inet[/172.10.1.15:9300]",
      "host" : "localhost.localdomain",
      "ip" : "127.0.0.1",
      "version" : "1.7.5",
      "build" : "00f95f4",
      "http_address" : "inet[/172.10.1.15:9200]",
      "settings" : {
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "path" : {
          "conf" : "/etc/elasticsearch",
          "data" : "/var/lib/elasticsearch",
          "logs" : "/var/log/elasticsearch",
          "home" : "/usr/share/elasticsearch"
        },
        "cluster" : {
          "name" : "elasticsearch"
        },
        "name" : "Halloween Jack",
        "client" : {
          "type" : "node"
        },
        "foreground" : "yes",
        "config.ignore_system_properties" : "true",
        "config" : "/etc/elasticsearch/elasticsearch.yml",
        "script" : {
          "inline" : "on",
          "indexed" : "on"
        }
      }
    }
  }
}</pre>
I've followed the instructions in CirrusSearch readme to build the indexes and they appear correctly:

<pre>curl 'localhost:9200/_cat/indices?v'
health status index                                 pri rep docs.count docs.deleted store.size pri.store.size
green  open   mediawiki_cirrussearch_frozen_indexes   1   0          0            0       144b           144b
green  open   mw_cirrus_versions                      1   0          2            2      3.5kb          3.5kb
green  open   wiki_content_first                      4   0        352           12      6.9mb          6.9mb
green  open   wiki_general_first                      4   0        437          110      7.4mb          7.4mb</pre>
Running a search on elasticsearch as follow does return results:

<pre>curl -XGET 'http://127.0.0.1:9200/wiki/page/_search?q=wiki'</pre>
ElasticSearch is obviously installed on a supported version (1.7.5) and seems to be configured correctly, yet ElasticSearch still does not appear in "Installed software" on Special:Version. CirrusSearch and Elastica ''do'' show in Special:Version as follows:

CirrusSearch 0.2 (dcb0cf9) Elastica 1.3.0.0 (4607acf)

Moving on to trying to execute a search, I get this error:

"An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later."

By using <code>$wgDebugToolbar = true;</code> I get the following line in the in-browser debug log.

<pre>[CirrusSearchRequests] {queryType} search for '{query}' against {index} took {tookMs} millis. Requested via {source} for {identity} by executor {executor}
[CirrusSearch] Search backend error during {queryType} search for '{query}' after {took}: {message}</pre>
I have <code>$wgDebugLogGroups['CirrusSearch'] = "$IP/extensions/CirrusSearch/error.log";</code> in <code>LocalSettings.php</code>, but error.log remains empty, even though the extensions dir is drwxr-xr-x apache apache, CirrusSearch is drwxr-xr-x apache apache, and error.log is -rwxrwxrwx apache apache.

Any idea why ElasticSearch isn't showing in Special:Version? To me this is an indicator that something isn't configured right. Maybe a permissions thing where MediaWiki and CirrusSearch aren't able to communicate with elasticsearch?

Thanks for your help. [[User:Nwynja|Nwynja]] ([[User talk:Nwynja|talk]]) 20:13, 31 August 2016 (UTC)
:Without the full error message it's hard to guess what could be wrong.
:The maintenance scripts seemed to have worked properly, would it be possible that you installed the php curl extension and forgot to restart apache?
:I'd suggest you to modify the file CirrusSearch/includes/ElasticsearchIntermediary.php by finding the function
:public function failure( \Elastica\Exception\ExceptionInterface $exception = null )
:and add some very simple debug statements there like
:print ( $exception->getMessage() );
:at the very beginning of this function. [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 12:41, 1 September 2016 (UTC)
:Thanks for the quick response. Adding that debug line prints "Couldn't connect to host, Elasticsearch down?" I'm still able to verify elasticsearch is running and also rebuilt the indexes successfully. I've reloaded apache and double checked that curl is installed. Does this error message help at all? [[User:Nwynja|Nwynja]] ([[User talk:Nwynja|talk]]) 15:59, 1 September 2016 (UTC)
:After a little more searching on that error, I was able to find this comment: https://github.com/ruflin/Elastica/issues/665#issuecomment-225487994
:Changing <code>SELINUX</code> to <code>permissive</code> in <code>/etc/sysconfig/selinux</code> and rebooting the machine fixed this error. I don't really understand the security implications of this so I'll do some more research.
:Thanks for pointing me in the right direction with printing out that failure function. [[User:Nwynja|Nwynja]] ([[User talk:Nwynja|talk]]) 16:38, 1 September 2016 (UTC)
{{Archive bottom}}
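
A narrower alternative to permissive mode may be to check the SELinux boolean that controls whether the web server may open network connections. This is an assumption, not something confirmed in the thread above: it presumes the denial came from the httpd network policy. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
# `getsebool httpd_can_network_connect` on standard input and succeeds
# if the boolean is on.
httpd_network_allowed() {
    grep -q -- '--> on$'
}

# Usage on the web server (requires the SELinux tools):
#   getenforce
#   getsebool httpd_can_network_connect | httpd_network_allowed \
#       || echo "httpd may not open network connections"
# If the boolean is off, this is the change to evaluate for your setup:
#   sudo setsebool -P httpd_can_network_connect 1
```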

== Search pdf files ==

What do I have to do to setup CirrusSearch with the ability to index the content of pdf files?

Using MW 1.27, Elasticsearch 1.7.5, CirrusSearch 1.27 and the current version of Elastica. Search within wiki pages seems to work, but I'm not able to search for content in PDF files.

Thank you! [[User:RacingRalf|RacingRalf]] ([[User talk:RacingRalf|talk]]) 13:41, 6 September 2016 (UTC)
:CirrusSearch will index PDF content if the PdfHandler extension is installed: [[Extension:PdfHandler]].
:You may have to run some maintenance scripts to refresh the data of existing PDF in your wiki (please check the PdfHandler documentation) [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 14:00, 21 September 2016 (UTC)
:Is there any chance to include docx, doc, odt etc. for indexing as well? [[User:Jonnnius|Jonnnius]] ([[User talk:Jonnnius|talk]]) 08:07, 30 September 2016 (UTC)
:I am also interested in indexing MS office files. Is this something that is doable? [[User:Dgennaro|Dgennaro]] 19:50, 25 January 2017 (UTC)
:I don't think this is Cirrus specific, cirrus will index any content that has support from a Media Handler. The question would be more "Is there a MediaHandler extension that supports microsoft documents like PdfHandler?".
:I'd suggest asking this question on the mediawiki-l mailing list. This question has already been asked in the past but with no clear answers (see https://lists.wikimedia.org/pipermail/mediawiki-l/2016-September/thread.html#45836)
:I did a quick search but was not able to find an extension like that... so unless some code is hidden in another extension, I'm afraid that someone would have to develop this extension. [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 12:46, 26 January 2017 (UTC)
:There was [[Extension:FileIndexer]], which worked very well for indexing pdf, doc, docx, xls, ppt etc. Unfortunately it was removed and abandoned due to security issues. I think it was very useful, and I'm really astonished that nothing has been developed to replace it. [[User:BluAlien|BluAlien]] ([[User talk:BluAlien|talk]]) 19:47, 21 November 2017 (UTC)

== Installing CirrusSearch issue ==
{{Archive top|result=Elasticsearch requires a certain amount of physical RAM, and the JVM can crash unexpectedly.
Check the system logs, then tune ES_HEAP_SIZE in /etc/default/elasticsearch to 1/2 of the available physical RAM.
8 GB of physical RAM is recommended, but it worked with a 512 MB VPS in this case.|status=resolved}}
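
The "half of physical RAM" rule from the summary above can be computed on Linux as follows. This is a minimal sketch: the function name is made up, and it assumes a /proc/meminfo-style file with a MemTotal line given in kB.

```shell
#!/bin/sh
# Hypothetical helper: prints half of MemTotal in megabytes, read from a
# meminfo-style file (defaults to /proc/meminfo).
half_ram_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 2) }' "${1:-/proc/meminfo}"
}

# Usage: print a line that could be placed in /etc/default/elasticsearch
#   echo "ES_HEAP_SIZE=$(half_ram_mb)m"
```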

Hi, All

I am following the README in CirrusSearch to install CirrusSearch, but I got an exception at this step.

'''php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip'''

It reports: 

: PHP Warning:  Unexpected connection error communicating with Elasticsearch.  Curl code:  52 [Called from {closure} in /var/www/extensions/Elastica/ElasticaConnection.php at line 101] in /var/www/includes/debug/MWDebug.php on line 300
: [             wiki.db] Indexed 10 pages ending at 10 at 2/second
: Exception encountered, of type "Elastica\Exception\Connection\HttpException"

I tried

'''sudo /etc/init.d/elasticsearch status'''

and it reports:
: elasticsearch is not running

I think my Elasticsearch has crashed, but I am new to Elasticsearch. Could anyone help me with this issue?
BTW, my version numbers:
: MediaWiki 1.26.2
: PHP 5.3.10-1ubuntu3.24
: SQLite 3.7.9
: CirrusSearch 0.2
: Elastica 1.3.0.0
: Elasticsearch 1.7.5

Thanks [[User:Jimmysitu|Jimmysitu]] ([[User talk:Jimmysitu|talk]]) 08:04, 19 September 2016 (UTC)
:you will need to look at elasticsearch logs to see what is happening. Often these are in /var/log/elasticsearch [[User:EBernhardson (WMF)|EBernhardson (WMF)]] ([[User talk:EBernhardson (WMF)|talk]]) 15:45, 19 September 2016 (UTC)
:My /var/log/elasticsearch/elasticsearch.log is here, and I do not see any error there. 
:Could you help?
:Thanks
   [2016-09-19 03:24:08,199][INFO ][node                     ] [Douglas Birely] version[1.7.5], pid[17174], build[00f95f4/2016-02-02T09:55:30Z]
   [2016-09-19 03:24:08,200][INFO ][node                     ] [Douglas Birely] initializing ...
   [2016-09-19 03:24:08,311][INFO ][plugins                  ] [Douglas Birely] loaded [], sites []
   [2016-09-19 03:24:08,345][INFO ][env                      ] [Douglas Birely] using [1] data paths, mounts [[/ (/dev/simfs)]], net usable_space [2.1gb], net total_space [5gb], types [simfs]
   [2016-09-19 03:24:12,167][INFO ][node                     ] [Douglas Birely] initialized
   [2016-09-19 03:24:12,168][INFO ][node                     ] [Douglas Birely] starting ...
   [2016-09-19 03:24:12,664][INFO ][transport                ] [Douglas Birely] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.3.172.21:9300]}
   [2016-09-19 03:24:12,692][INFO ][discovery                ] [Douglas Birely] elasticsearch/IM8OF1suShC4LvrDGMrMvA
   [2016-09-19 03:24:16,494][INFO ][cluster.service          ] [Douglas Birely] new_master [Douglas Birely][IM8OF1suShC4LvrDGMrMvA][srv][inet[/192.3.172.21:9300]], reason: zen-disco-join (elected_as_master)
   [2016-09-19 03:24:16,617][INFO ][http                     ] [Douglas Birely] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.3.172.21:9200]}
   [2016-09-19 03:24:16,617][INFO ][node                     ] [Douglas Birely] started
   [2016-09-19 03:24:16,632][INFO ][gateway                  ] [Douglas Birely] recovered [4] indices into cluster_state
   [2016-09-19 03:24:19,347][INFO ][monitor.jvm              ] [Douglas Birely] [gc][young][6][3] duration [744ms], collections [1]/[1.5s], total [744ms]/[1.3s], memory [74.8mb]->[29.5mb]/[1015.6mb], all_pools {[young] [52.4mb]->[1.6mb]/[66.5mb]}{[survivor] [8.3mb]->[8.3mb]/[8.3mb]}{[old] [14mb]->[19.6mb]/[940.8mb]}
   [2016-09-19 03:25:31,761][WARN ][monitor.jvm              ] [Douglas Birely] [gc][young][78][4] duration [1.1s], collections [1]/[1.3s], total [1.1s]/[2.5s], memory [93.7mb]->[35.4mb]/[1015.6mb], all_pools {[young] [65.8mb]->[678.7kb]/[66.5mb]}{[survivor] [8.3mb]->[8.3mb]/[8.3mb]}{[old] [19.6mb]->[26.4mb]/[940.8mb]}
   [2016-09-19 03:38:52,848][INFO ][node                     ] [Isaac Christians] version[1.7.5], pid[637], build[00f95f4/2016-02-02T09:55:30Z]
   [2016-09-19 03:38:52,848][INFO ][node                     ] [Isaac Christians] initializing ...
   [2016-09-19 03:38:52,934][INFO ][plugins                  ] [Isaac Christians] loaded [], sites []
   [2016-09-19 03:38:52,966][INFO ][env                      ] [Isaac Christians] using [1] data paths, mounts [[/ (/dev/simfs)]], net usable_space [2.1gb], net total_space [5gb], types [simfs]
   [2016-09-19 03:38:59,295][INFO ][node                     ] [Isaac Christians] initialized
   [2016-09-19 03:38:59,301][INFO ][node                     ] [Isaac Christians] starting ...
   [2016-09-19 03:39:00,759][INFO ][transport                ] [Isaac Christians] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.3.172.21:9300]}
   [2016-09-19 03:39:00,863][INFO ][discovery                ] [Isaac Christians] elasticsearch/G79LNz6tQbmkVrcWjLyT1g
   [2016-09-19 03:45:12,917][INFO ][node                     ] [Cancer] version[1.7.5], pid[849], build[00f95f4/2016-02-02T09:55:30Z]
   [2016-09-19 03:45:12,918][INFO ][node                     ] [Cancer] initializing ...
   [2016-09-19 03:45:13,063][INFO ][plugins                  ] [Cancer] loaded [], sites []
   [2016-09-19 03:45:13,117][INFO ][env                      ] [Cancer] using [1] data paths, mounts [[/ (/dev/simfs)]], net usable_space [2.1gb], net total_space [5gb], types [simfs]
   [2016-09-19 03:45:19,441][INFO ][node                     ] [Cancer] initialized
   [2016-09-19 03:45:19,446][INFO ][node                     ] [Cancer] starting ...
   [2016-09-19 03:45:20,262][INFO ][transport                ] [Cancer] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.3.172.21:9300]}
   [2016-09-19 03:45:20,299][INFO ][discovery                ] [Cancer] elasticsearch/cqV_-oHcTo6g9Ou2T7YYGg
   [2016-09-19 03:45:24,123][INFO ][cluster.service          ] [Cancer] new_master [Cancer][cqV_-oHcTo6g9Ou2T7YYGg][srv][inet[/192.3.172.21:9300]], reason: zen-disco-join (elected_as_master)
   [2016-09-19 03:45:24,234][INFO ][http                     ] [Cancer] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.3.172.21:9200]}
   [2016-09-19 03:45:24,235][INFO ][node                     ] [Cancer] started
   [2016-09-19 03:45:24,318][INFO ][gateway                  ] [Cancer] recovered [4] indices into cluster_state
   [2016-09-19 03:45:27,110][INFO ][monitor.jvm              ] [Cancer] [gc][young][6][3] duration [740ms], collections [1]/[1.7s], total [740ms]/[1.6s], memory [62.4mb]->[28.9mb]/[1015.6mb], all_pools {[young] [40.2mb]->[1.3mb]/[66.5mb]}{[survivor] [8.3mb]->[8.3mb]/[8.3mb]}{[old] [13.9mb]->[19.5mb]/[940.8mb]}
   [2016-09-19 03:45:32,459][INFO ][monitor.jvm              ] [Cancer] [gc][young][10][4] duration [846ms], collections [1]/[1.7s], total [846ms]/[2.4s], memory [78.8mb]->[35.7mb]/[1015.6mb], all_pools {[young] [50.9mb]->[595.5kb]/[66.5mb]}{[survivor] [8.3mb]->[8.3mb]/[8.3mb]}{[old] [19.5mb]->[26.8mb]/[940.8mb]}
   [2016-09-19 03:54:35,429][INFO ][node                     ] [Mindworm] version[1.7.5], pid[1041], build[00f95f4/2016-02-02T09:55:30Z]
   [2016-09-19 03:54:35,430][INFO ][node                     ] [Mindworm] initializing ...
   [2016-09-19 03:54:35,513][INFO ][plugins                  ] [Mindworm] loaded [], sites []
   [2016-09-19 03:54:35,544][INFO ][env                      ] [Mindworm] using [1] data paths, mounts [[/ (/dev/simfs)]], net usable_space [2.1gb], net total_space [5gb], types [simfs]
   [2016-09-19 03:54:40,696][INFO ][node                     ] [Mindworm] initialized
   [2016-09-19 03:54:40,697][INFO ][node                     ] [Mindworm] starting ...
   [2016-09-19 03:54:41,461][INFO ][transport                ] [Mindworm] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.3.172.21:9300]}
   [2016-09-19 03:54:41,508][INFO ][discovery                ] [Mindworm] elasticsearch/bVkRAHrFTMiib9cA7eNatQ
   [2016-09-19 03:54:45,315][INFO ][cluster.service          ] [Mindworm] new_master [Mindworm][bVkRAHrFTMiib9cA7eNatQ][srv][inet[/192.3.172.21:9300]], reason: zen-disco-join (elected_as_master)
   [2016-09-19 03:54:45,453][INFO ][http                     ] [Mindworm] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.3.172.21:9200]}
   [2016-09-19 03:54:45,456][INFO ][node                     ] [Mindworm] started
   [2016-09-19 03:54:45,576][INFO ][gateway                  ] [Mindworm] recovered [4] indices into cluster_state
   [2016-09-19 03:54:48,450][INFO ][monitor.jvm              ] [Mindworm] [gc][young][6][3] duration [739ms], collections [1]/[1.6s], total [739ms]/[1.3s], memory [68.5mb]->[29.9mb]/[1015.6mb], all_pools {[young] [46.1mb]->[1.7mb]/[66.5mb]}{[survivor] [8.3mb]->[8.3mb]/[8.3mb]}{[old] [14mb]->[19.8mb]/[940.8mb]}
   [2016-09-19 09:56:53,812][INFO ][node                     ] [Warpath] version[1.7.5], pid[644], build[00f95f4/2016-02-02T09:55:30Z]
   [2016-09-19 09:56:53,812][INFO ][node                     ] [Warpath] initializing ...
   [2016-09-19 09:56:53,922][INFO ][plugins                  ] [Warpath] loaded [], sites []
   [2016-09-19 09:56:53,954][INFO ][env                      ] [Warpath] using [1] data paths, mounts [[/ (/dev/simfs)]], net usable_space [2.1gb], net total_space [5gb], types [simfs]
   [2016-09-19 09:56:58,237][INFO ][node                     ] [Warpath] initialized
   [2016-09-19 09:56:58,241][INFO ][node                     ] [Warpath] starting ...
   [2016-09-19 09:56:58,646][INFO ][transport                ] [Warpath] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.3.172.21:9300]}
   [2016-09-19 09:56:58,678][INFO ][discovery                ] [Warpath] elasticsearch/2wuBzO2NTtmcxhaxO_vvqA
   [2016-09-19 09:57:02,495][INFO ][cluster.service          ] [Warpath] new_master [Warpath][2wuBzO2NTtmcxhaxO_vvqA][srv][inet[/192.3.172.21:9300]], reason: zen-disco-join (elected_as_master)
   [2016-09-19 09:57:02,591][INFO ][http                     ] [Warpath] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.3.172.21:9200]}
   [2016-09-19 09:57:02,596][INFO ][node                     ] [Warpath] started
   [2016-09-19 09:57:02,775][INFO ][gateway                  ] [Warpath] recovered [4] indices into cluster_state
   [2016-09-19 09:57:06,488][INFO ][monitor.jvm              ] [Warpath] [gc][young][7][3] duration [811ms], collections [1]/[1.7s], total [811ms]/[1s], memory [67.2mb]->[28.7mb]/[1015.6mb], all_pools {[young] [45mb]->[693.4kb]/[66.5mb]}{[survivor] [8.3mb]->[8.3mb]/[8.3mb]}{[old] [13.9mb]->[19.7mb]/[940.8mb]}
   
     [[User:Jimmysitu|Jimmysitu]] ([[User talk:Jimmysitu|talk]]) 02:15, 20 September 2016 (UTC)
:Elasticsearch seems to be publishing your public IP; be careful if this host is exposed to the internet.
:Could you double-check your Elasticsearch settings in /etc/elasticsearch/elasticsearch.yml and make sure you only bind and publish a localhost address? If MediaWiki and Elasticsearch are running on the same machine, set only:
:network.host: 127.0.0.1
:and comment out everything related to http.host, transport.host and *.publish_host.
:Then, after a restart, check that Elasticsearch is running with curl http://localhost:9200/.
:Your logs also show that Elasticsearch started without a clean shutdown: after a clean shutdown we would see multiple log entries with stopping/stopped/closing/closed before the first [Name] version[1.7.5], pid[XXX].... log entry.
:If you did not forcibly kill the java process, the JVM may have crashed.
:When the JVM crashes it should generate an hs_err_pidXXX.log file somewhere (frequently in /tmp).
:It could also be due to the OS OOM killer; check for "oom killer" log entries in syslog or dmesg.
:If that is the case, tune your Java Xmx settings to the physical capacity of your host. They are usually configured in /etc/default/elasticsearch with ES_HEAP_SIZE (do not use more than 1/2 of the available physical RAM on this host). [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 12:29, 21 September 2016 (UTC)
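:The "publishes your public IP" observation can be checked mechanically against a log file. The following is a minimal Python sketch, assuming the Elasticsearch 1.x log line format quoted in this thread; the function name is hypothetical and not part of any Elasticsearch tooling.

```python
import ipaddress
import re

# Matches the address in e.g. "publish_address {inet[/192.3.172.21:9300]}",
# the format used by the Elasticsearch 1.x log lines quoted in this thread.
PUBLISH_RE = re.compile(r"publish_address \{inet\[/([0-9a-fA-F.:]+):(\d+)\]\}")


def non_loopback_publish_addresses(log_text):
    """Return (address, port) pairs published on a non-loopback address."""
    exposed = []
    for match in PUBLISH_RE.finditer(log_text):
        address, port = match.group(1), int(match.group(2))
        if not ipaddress.ip_address(address).is_loopback:
            exposed.append((address, port))
    return exposed
```

:An empty result after setting network.host: 127.0.0.1 and restarting would suggest that only a loopback address is being published.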
:I think I found the root cause: ES_HEAP_SIZE was set to 2g. I changed it to 64m, since my server has only 128m of RAM. Elasticsearch now passes all the install steps, but it still sometimes runs out of memory, with dmesg entries like this:
:[13876259.746727] Out of memory in UB 12125: OOM killed process 621 (java) score 0 vm:877704kB, rss:24112kB, swap:103572kB
:Do you know the minimum amount of memory Elasticsearch needs? [[User:Jimmysitu|Jimmysitu]] ([[User talk:Jimmysitu|talk]]) 16:18, 11 October 2016 (UTC)
:128m of RAM seems pretty low, and I'm afraid it will be extremely hard to get a working setup with apache/php/mysql/elastic on the same machine. [https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html This doc] states that going below 8GB of physical RAM is counterproductive.
:I can't say that you won't be able to make it work with 128MB of RAM, but it's a tough challenge. Looking at your dmesg entry, java was OOM-killed with only 24MB allocated, and the swap usage is concerning.
:Note that we have working setups (all-in-one virtual machines for developers) with 2GB of RAM and ES_HEAP_SIZE set to 256m. This profile is designed for development purposes and relatively small wikis, not production usage.
:For production usage I'd suggest at least 8GB of physical RAM and ES_HEAP_SIZE set to 4g. [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 09:31, 12 October 2016 (UTC)
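:To read the numbers in a dmesg entry like the one quoted above, here is a small Python sketch. It assumes exactly the line format shown in this thread (other kernels word their OOM messages differently), and the function name is hypothetical.

```python
import re

# Pattern for the OOM line format quoted in this thread; other kernels differ.
OOM_RE = re.compile(
    r"OOM killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"vm:(?P<vm>\d+)kB, rss:(?P<rss>\d+)kB, swap:(?P<swap>\d+)kB"
)


def parse_oom_line(line):
    """Return pid, process name and memory figures (in MB), or None if no match."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "vm_mb": int(match.group("vm")) / 1024,
        "rss_mb": int(match.group("rss")) / 1024,
        "swap_mb": int(match.group("swap")) / 1024,
    }
```

:For the quoted entry this gives roughly 24 MB resident and about 101 MB in swap for the java process.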
:Hi, DCausse
:Thanks. I have moved my wiki to a 512MB VPS, and everything works fine now. Thanks for your help. [[User:Jimmysitu|Jimmysitu]] ([[User talk:Jimmysitu|talk]]) 06:22, 14 October 2016 (UTC)
{{Archive bottom}}

== Display search suggestions for other searchable namespaces ==

Is there a configuration variable or an available patch for enabling search suggestions from other namespaces without typing the namespace prefix? For example, a search for "'''navigation'''" would also show "help:'''Navigation'''" if help is in [[Manual:$wgNamespacesToBeSearchedDefault|NamespacesToBeSearchedDefault]]. [[User:Wess|Wess]] ([[User talk:Wess|talk]]) 21:02, 12 October 2016 (UTC)
:No, unfortunately CirrusSearch is not able to do that yet.
:There is a [https://gerrit.wikimedia.org/r/#/c/141660/ patch] in mediawiki core that seems to be slightly related but I'm not sure it does exactly what you want. [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 09:09, 13 October 2016 (UTC)
:Hello, I've found a workaround for this. Edit your resources\src\mediawiki\mediawiki.searchSuggest.js
:Change from:
:<syntaxhighlight lang="javascript">
( function ( mw, $ ) {
    mw.searchSuggest = {
        // queries the wiki and calls response with the result
        request: function ( api, query, response, maxRows, namespace ) {
            return api.get( {
                formatversion: 2,
                action: 'opensearch',
                search: query,
                namespace: namespace || '0', // line to change
                limit: maxRows,
                suggest: true
            } ).done( function ( data, jqXHR ) {
                response( data[ 1 ], {
                    type: jqXHR.getResponseHeader( 'X-OpenSearch-Type' ),
                    query: query
                } );
            } );
        }
    };
</syntaxhighlight>
:To:
:<syntaxhighlight lang="javascript">
( function ( mw, $ ) {
    mw.searchSuggest = {
        // queries the wiki and calls response with the result
        request: function ( api, query, response, maxRows, namespace ) {
            return api.get( {
                formatversion: 2,
                action: 'opensearch',
                search: query,
                namespace: namespace || '0|3010', // changed line
                limit: maxRows,
                suggest: true
            } ).done( function ( data, jqXHR ) {
                response( data[ 1 ], {
                    type: jqXHR.getResponseHeader( 'X-OpenSearch-Type' ),
                    query: query
                } );
            } );
        }
    };
</syntaxhighlight>
:Where 3010 is your namespace ID. Use "|" to separate multiple IDs. [[Special:Contributions/2001:1AB8:2:0:47F:24FA:8FAE:8103|2001:1AB8:2:0:47F:24FA:8FAE:8103]] ([[User talk:2001:1AB8:2:0:47F:24FA:8FAE:8103|talk]]) 08:36, 1 March 2019 (UTC)
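:To try a namespace list before patching the JavaScript, the same opensearch request can be built by hand. This is a minimal Python sketch; the API URL and the function name are placeholders, and 3010 is simply the example ID from the comment above.

```python
from urllib.parse import urlencode


def opensearch_url(api_url, query, namespace_ids, limit=10):
    """Build an action=opensearch URL that searches several namespaces at once."""
    params = {
        "action": "opensearch",
        "format": "json",
        "search": query,
        # Multiple namespace IDs are joined with "|", as in the workaround above.
        "namespace": "|".join(str(ns) for ns in namespace_ids),
        "limit": limit,
    }
    return api_url + "?" + urlencode(params)


# Hypothetical wiki URL, for illustration only.
print(opensearch_url("https://wiki.example.org/w/api.php", "navigation", [0, 3010]))
```

:Opening the printed URL in a browser should show whether titles from the extra namespace come back as suggestions.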

== Possible to exclude "to", "and", "or" etc. from search weights? ==

A sentence-length search on my wiki often returns results where the only highlighted words are matches to the short/frequent words like "and", "or", "to", etc. which are not useful as highlighted words. See this picture for example: http://i.imgur.com/f9IWEc0.png

Can short words be excluded? If I exclude them manually I get a much better result: http://i.imgur.com/bLm6nhQ.png [[User:T0lk|T0lk]] ([[User talk:T0lk|talk]]) 23:58, 12 October 2016 (UTC)
:Do you use the Wikimedia experimental highlighter?
:If not, I'd suggest giving it a try; you'll need to install it as a plugin on every node of your Elasticsearch cluster.
:See [https://github.com/wikimedia/search-highlighter this page] on GitHub for more information on how to install it.
:Once it is installed (this requires a cluster restart), you can activate it on the MediaWiki side by setting the following config options:
:<code>
:$wgCirrusSearchUseExperimentalHighlighter = true;
:$wgCirrusSearchOptimizeIndexForExperimentalHighlighter = true;
:</code> [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 09:01, 13 October 2016 (UTC)
:I followed the steps and installed the Experimental Highlighter. Nothing seems to have changed. Aside from not getting any errors, is there any way to make sure I installed it correctly/it's actually working? [[User:T0lk|T0lk]] ([[User talk:T0lk|talk]]) 18:12, 14 October 2016 (UTC)
:After some testing I confirmed I always had experimental highlighter plugin installed, and can retrieve results using the highlight syntax when running searches from the command line. So, it just seems like setting those two options to true did not make a change in the search results for that specific query perhaps. It would be useful to know what type of search term would yield different results when those two options are set to true for further testing. [[User:T0lk|T0lk]] ([[User talk:T0lk|talk]]) 05:29, 15 October 2016 (UTC)
:You can double check that the highlighter is in use by dumping the CirrusSearch query. You can ask Mediawiki to do so by adding the '''&cirrusDumpQuery''' URI param to the search URL. It will return a JSON page where you can have a look at the elasticsearch query sent by Cirrus.
:Under the section "highlight" you'll find the list of fields and '''type''' should be set to '''experimental'''. If it's not the case then it's probable that <code>$wgCirrusSearchUseExperimentalHighlighter</code> is not evaluated properly.
:If you see '''type: experimental''' then you are using the highlighter and unfortunately it's not smart enough to handle your example and the ''official'' answer to your question would be ''no''.
:You can read the following if you are comfortable with PHP and willing to hack your mediawiki installation.
:This highlighter supports a bunch of config options but unfortunately these options are not configurable via MediaWiki config vars.
:But if you'd like to hack something everything is in the php file '''includes/Search/ResultsType.php''' and more precisely the class '''FullTextResultsType'''. You can either try to tweak some scoring values such as '''boost_before''' or implement a very ugly hack to only highlight on the field which excludes stop words:
:Simply add
:<code>if( $name === 'text' ) { continue; }</code>
:Inside the loop of the method <code>private function addMatchedFields( $fields ) {</code>
:It should look like:
:<syntaxhighlight lang="php">        /**
         * @param array[] $fields
         * @return array[]
         */
        private function addMatchedFields( $fields ) {
                foreach ( array_keys( $fields ) as $name ) {
                        if( $name === 'text' ) { continue; } // ugly hack: force highlighting on field with stopwords excluded
                        $fields[$name]['matched_fields'] =  array( $name, "$name.plain" );
                }
                return $fields;
        }
</syntaxhighlight> [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 09:04, 17 October 2016 (UTC)
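:The '''&cirrusDumpQuery''' check described above can also be scripted once the JSON page has been saved. The following is a minimal Python sketch: it walks the parsed JSON and collects every "type" found under a "highlight" section. The exact layout of the dump is an assumption here, so the search is deliberately generic, and the function name is hypothetical.

```python
def highlight_types(dump):
    """Collect every highlighter "type" found under any "highlight" section."""
    found = set()

    def walk(node, inside_highlight):
        if isinstance(node, dict):
            for key, value in node.items():
                if inside_highlight and key == "type" and isinstance(value, str):
                    found.add(value)
                walk(value, inside_highlight or key == "highlight")
        elif isinstance(node, list):
            for item in node:
                walk(item, inside_highlight)

    walk(dump, False)
    return found
```

:Pass it the result of json.loads() on the dumped page. A result containing "experimental" would mean the highlighter is in use; any other value would suggest that <code>$wgCirrusSearchUseExperimentalHighlighter</code> is not taking effect.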
:That's awesome help, thank you. Unfortunately I see "type": "fvh". I'm not quite sure why my config is ignoring ExperimentalHighlighter settings. I will spend some time trying to figure that out. Thanks for your help! [[User:T0lk|T0lk]] ([[User talk:T0lk|talk]]) 03:57, 19 October 2016 (UTC)

== Does Manual:searchindex_table used in MW 1.27 when elasticsearch enabled? ==

Is the [[Manual:Searchindex table]] used in MW 1.27 when Elasticsearch is enabled? Can I safely truncate the searchindex table in the database?

[[Extension:CirrusSearch]] [[User:Deletedaccount4567435|Deletedaccount4567435]] ([[User talk:Deletedaccount4567435|talk]]) 10:53, 21 October 2016 (UTC)
:With elasticsearch enabled this table is never queried, it is safe to remove. [[User:EBernhardson (WMF)|EBernhardson (WMF)]] ([[User talk:EBernhardson (WMF)|talk]]) 15:12, 24 October 2016 (UTC)

== Which version fo CirrusSearch &Elastica match with MW 1.27.1? ==
{{Archive top|status=resolved}}

I got these two extensions from https://gerrit.wikimedia.org/r/p/mediawiki/extensions.git with git submodule foreach 'git checkout -b REL1_27 origin/REL1_27 || :'

'''After running composer install in both the wiki folder and the Elastica folder, I got the following errors:'''

PHP message: PHP Deprecated:  Deprecated: Elastica\Query\Filtered passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/wiki/vendor/ruflin/elastica/lib/Elastica/Query/Filtered.php on line 32

PHP message: PHP Deprecated:  Deprecated: Elastica\Query\Filtered::setFilter passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/wiki/vendor/ruflin/elastica/lib/Elastica/Query/Filtered.php on line 64

'''After I switched to the 1.27 tarball versions of these two extensions from ExtensionDistributor and ran composer install in both the wiki folder and the Elastica folder, I got the following error message:'''

<b>Warning</b>:  Declaration of CirrusSearch\Search\FunctionScoreDecorator::addFunction($functionType, $functionParams, Elastica\Filter\AbstractFilter $filter = NULL, $weight = NULL) should be compatible with Elastica\Query\FunctionScore::addFunction($functionType, $functionParams, $filter = NULL, $weight = NULL) in <b>/var/www/wiki/extensions/CirrusSearch/includes/Search/RescoreBuilders.php</b> on line <b>299</b>

It seems that neither the git nor the tarball version was properly matched to MW 1.27. What else can I do to fix these errors? [[User:Deletedaccount4567435|Deletedaccount4567435]] ([[User talk:Deletedaccount4567435|talk]]) 08:08, 25 October 2016 (UTC)
:It looks like the version of the elastica library does not match what is expected. The REL1_27 branch of the Elastica extension has a dependency on version 2.3.1 of the ruflin/elastica library. The next version of the library, 3.0.0, deprecated the AbstractFilter class (and made other breaking changes), which is why you are getting these errors.
:Somehow composer is installing the wrong version, but I'm not sure how yet. I'll try this out when I get a chance and report back, but perhaps that gives you somewhere to start looking as well. [[User:EBernhardson (WMF)|EBernhardson (WMF)]] ([[User talk:EBernhardson (WMF)|talk]]) 21:27, 25 October 2016 (UTC)
:What finally worked was getting MW, CirrusSearch & Elastica all from git at REL1_27,
:deleting the vendor folder, and then running composer install in both the wiki folder and the Elastica folder. [[User:Deletedaccount4567435|Deletedaccount4567435]] ([[User talk:Deletedaccount4567435|talk]]) 08:14, 26 October 2016 (UTC)
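:A quick way to see which ruflin/elastica version composer actually locked is to read composer.lock. This is a minimal Python sketch with a hypothetical function name; which composer.lock to inspect (the wiki root or the Elastica folder) depends on where composer install was run, so the path is left to the caller.

```python
import json


def locked_version(composer_lock_text, package="ruflin/elastica"):
    """Return the version recorded in composer.lock for a package, or None."""
    lock = json.loads(composer_lock_text)
    entries = (lock.get("packages") or []) + (lock.get("packages-dev") or [])
    for entry in entries:
        if entry.get("name") == package:
            return entry.get("version")
    return None
```

:According to the reply above, REL1_27 expects 2.3.1, so a 3.x version here would be consistent with the deprecation warnings.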
{{Archive bottom}}

== Error during forceSearchIndex.php --skipLinks --indexOnSkip ==

Exception encountered, of type "Error"

[032dd5c194199f2d8b6b7860] [no req]   Error from line 360 of /wiki/extensions/Loops/Loops.php: Call to undefined method Message::escape()

Backtrace:

#0 /wiki/extensions/Loops/Loops.php(181): ExtLoops::msgLoopsLimit(string)

#1 /wiki/includes/parser/Parser.php(3817): ExtLoops::pfObj_loop(Parser, PPFrame_DOM, array)

#2 /wiki/includes/parser/Parser.php(3552): Parser->callParserFunction(PPFrame_DOM, string, array)

#3 /wiki/includes/parser/Preprocessor_DOM.php(1175): Parser->braceSubstitution(array, PPFrame_DOM)

#4 /wiki/includes/parser/Preprocessor_DOM.php(1697): PPFrame_DOM->expand(DOMElement, integer)

#5 /wiki/includes/parser/Preprocessor_DOM.php(1709): PPTemplateFrame_DOM->getNamedArgument(string)

#6 /wiki/includes/parser/Parser.php(4189): PPTemplateFrame_DOM->getArgument(string)

#7 /wiki/includes/parser/Preprocessor_DOM.php(1194): Parser->argSubstitution(array, PPTemplateFrame_DOM)

#8 /wiki/includes/parser/Preprocessor_DOM.php(1697): PPFrame_DOM->expand(DOMElement, integer)

#9 /wiki/includes/parser/Preprocessor_DOM.php(1709): PPTemplateFrame_DOM->getNamedArgument(string)

#10 /wiki/includes/parser/Parser.php(4189): PPTemplateFrame_DOM->getArgument(string)

#11 /wiki/includes/parser/Preprocessor_DOM.php(1194): Parser->argSubstitution(array, PPTemplateFrame_DOM)

#12 /wiki/includes/parser/Parser.php(3470): PPFrame_DOM->expand(DOMElement)

#13 /wiki/includes/parser/Preprocessor_DOM.php(1175): Parser->braceSubstitution(array, PPTemplateFrame_DOM)

#14 /wiki/includes/parser/Parser.php(3694): PPFrame_DOM->expand(DOMElement)

#15 /wiki/includes/parser/Preprocessor_DOM.php(1175): Parser->braceSubstitution(array, PPTemplateFrame_DOM)

#16 /wiki/includes/parser/Parser.php(3694): PPFrame_DOM->expand(DOMElement)

#17 /wiki/includes/parser/Preprocessor_DOM.php(1175): Parser->braceSubstitution(array, PPFrame_DOM)

#18 /wiki/includes/parser/Parser.php(3366): PPFrame_DOM->expand(DOMElement, integer)

#19 /wiki/includes/parser/Parser.php(1248): Parser->replaceVariables(string)

#20 /wiki/includes/parser/Parser.php(446): Parser->internalParse(string)

#21 /wiki/includes/content/WikitextContent.php(331): Parser->parse(string, Title, ParserOptions, boolean, boolean, integer)

#22 /wiki/includes/content/AbstractContent.php(497): WikitextContent->fillParserOutput(Title, integer, ParserOptions, boolean, ParserOutput)

#23 /wiki/extensions/CirrusSearch/includes/Updater.php(388): AbstractContent->getParserOutput(Title, integer)

#24 /wiki/extensions/CirrusSearch/includes/Updater.php(312): CirrusSearch\Updater->getContentAndParserOutput(WikiPage, integer)

#25 /wiki/extensions/CirrusSearch/includes/Updater.php(205): CirrusSearch\Updater->buildDocumentsForPages(array, integer)

#26 /wiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php(227): CirrusSearch\Updater->updatePages(array, NULL, NULL, integer)

#27 /wiki/maintenance/doMaintenance.php(103): CirrusSearch\ForceSearchIndex->execute()

#28 /wiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php(547): require_once(string)

#29 {main} [[User:Deletedaccount4567435|Deletedaccount4567435]] ([[User talk:Deletedaccount4567435|talk]]) 08:40, 26 October 2016 (UTC)

== Search settings in the RuWP ==

On ru-wiki, strict search for "ё" has stopped working. For example, for the query "ёё" the search always corrects it to "её": it looks for "её" but not for the requested "ёё".
https://ru.wikipedia.org/w/index.php?title=Служебная:Поиск&profile=default&fulltext=Search&search=%22%D1%91%D1%91%22
Changing the search "mode" in the settings does not change anything. [[User:Sunpriat|Sunpriat]] 11:28, 1 November 2016 (UTC)
:Note that the search settings in the user preferences only affect autocomplete, i.e. the suggestions displayed while you type your search query. They do not change how results are found in normal fulltext search.
:Folding of ё was recently implemented and activated on Russian Wikipedia, and it causes the behavior you describe here.
:If you need to distinguish between е and ё you now have to use advanced search keywords:
:# simple source search: [https://ru.wikipedia.org/w/index.php?title=Служебная:Поиск&profile=default&fulltext=Search&search=insource%3Aёё insource:ёё]
:# advanced regular expression search : [https://ru.wikipedia.org/w/index.php?title=Служебная:Поиск&profile=default&fulltext=Search&search=insource%3A/трёхмерном/ insource:/трёхмерном/] [[User:DCausse (WMF)|DCausse (WMF)]] ([[User talk:DCausse (WMF)|talk]]) 13:10, 1 November 2016 (UTC)
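:As a rough illustration of why folding makes "ёё" and "её" indistinguishable in fulltext search, here is a tiny Python sketch. It is a simplified stand-in: the real folding happens in the Elasticsearch analysis chain, not in code like this.

```python
def fold_yo(text):
    """Fold ё/Ё to е/Е (simplified stand-in for the analyzer's folding)."""
    return text.replace("ё", "е").replace("Ё", "Е")


# Both spellings collapse to the same term once folded.
print(fold_yo("ёё") == fold_yo("её"))
```

:Once both spellings fold to the same term, only a search on the raw source text, such as insource:, can tell them apart.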

== Extra words lead to no results ==

I could use some help configuring CirrusSearch...

I have a page titled '''Red Fox Runs'''.

When I use CirrusSearch, I notice if I enter even one additional word not present in the title, no results are returned. For example, the search string: '''Red Fox Runs Fast''' will fail to return any results. However, a very simple search for the word '''red''' *will* return that article in the list of results. I am not using any quotes around these search phrases.

Is this expected behavior? And if so, what configuration settings should I consider tweaking to produce results in a scenario like this? [[Special:Contributions/71.43.251.174|71.43.251.174]] ([[User talk:71.43.251.174|talk]]) 22:35, 3 November 2016 (UTC)
:It is correct that this is the expected current behaviour. CirrusSearch adds an explicit 'AND' between all words in the query, so 'Red Fox Runs Fast' is run as 'Red AND Fox AND Runs AND Fast'. We have done some internal testing for relaxing this so, for example, it would also search for documents that have 3 of the 4 words and score them lower than matches with all 4. We have unfortunately found that while this does improve recall for a wide variety of queries, it also introduces a significant decline in precision for a small percentage of requests.
:Unfortunately we don't currently have any configurable options that can change how this works.
:From an internals perspective, the default_operator on query_string queries and the operator on match queries would both need to be changed from AND to OR. In addition, both would need the minimum_should_match field added and set to a reasonable percentage (perhaps 66% or 75%). [[User:EBernhardson (WMF)|EBernhardson (WMF)]] ([[User talk:EBernhardson (WMF)|talk]]) 19:28, 7 November 2016 (UTC)
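:To make the change described above concrete, here is a minimal Python sketch of the two Elasticsearch query bodies with OR semantics and minimum_should_match. It only builds the JSON structures; it is not how CirrusSearch constructs its queries, and the function names and the "title" field are hypothetical.

```python
import json


def relaxed_query_string(user_query, minimum_should_match="75%"):
    """query_string query using OR plus a minimum_should_match threshold."""
    return {
        "query_string": {
            "query": user_query,
            "default_operator": "OR",
            "minimum_should_match": minimum_should_match,
        }
    }


def relaxed_match(field, user_query, minimum_should_match="75%"):
    """match query on one field using OR plus a minimum_should_match threshold."""
    return {
        "match": {
            field: {
                "query": user_query,
                "operator": "or",
                "minimum_should_match": minimum_should_match,
            }
        }
    }


print(json.dumps(relaxed_match("title", "Red Fox Runs Fast"), indent=2))
```

:With four terms and 75%, three of the four words would have to match, which corresponds to the "3 of the 4 words" case mentioned above.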
:Thank you so much! This helped guide me in the right direction for configuring the search engine for our application. Essentially what I'm working on is a product knowledgebase wiki, so it's important that when a user tries to search for a specific product that similar models also show up in the search results.
:What I ended up modifying was the buildQueryString function in SearchTextQueryBuilders.php.
:I changed ''$defaultOperator'' to 'OR' and added the following query parameter:
:''$query->setParam( 'minimum_should_match', '1<-1 2<-35%' );''
:Perhaps there is a more elegant way to do this? Thanks again. [[Special:Contributions/71.43.251.174|71.43.251.174]] ([[User talk:71.43.251.174|talk]]) 00:14, 8 November 2016 (UTC)
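:For readers wondering what '1<-1 2<-35%' requires in practice, here is a Python sketch that evaluates such a specification. It follows the rules in the Elasticsearch documentation for minimum_should_match as I understand them (conditional parts apply in order, and negative percentages are rounded toward zero before being subtracted). It is an illustration with a hypothetical function name, not Elasticsearch's own code.

```python
def min_should_match(clause_count, spec):
    """Number of optional clauses that must match for a given spec string."""
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional parts such as "2<-35%" apply only above their bound.
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        raw = clause_count * int(spec[:-1]) / 100
        # int() truncates toward zero; a negative value means "may be missing".
        result = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        value = int(spec)
        result = clause_count + value if value < 0 else value
    return max(result, 0)


for terms in range(1, 7):
    print(terms, min_should_match(terms, "1<-1 2<-35%"))
```

:Under these rules, a four-word query such as "Red Fox Runs Fast" would need three of its words to match.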

== $wgCirrusSearchPrefixSearchStartsWithAnyWord not working? (MW 1.27.1) ==

I have installed CirrusSearch according to all the instructions here, and everything seems to work - but the above mentioned option. My example is this:
* I have an article named "שלום בנייך" (that is, 'שלום' is the first word in the title)
* I type 'בנייך' into the search box
* I expect to get a suggestion for "שלום בנייך"
* I get zilch

I have run all the required scripts - updateSearchIndexConfig, forceSearchIndex, updateSuggesterIndex.

==== This is my setup: ====
* Ubuntu 12.04 LTS
* PHP 5.6 (php-fpm)
* MediaWiki 1.27.1
* ElasticSearch 1.7.5
* ElasticSearch plugins: ICU, ExperimentalHighlighter, WikimediaExtra
* Extensions Elastica & CirrusSearch from GitHub's REL1_27

==== These are my settings: ====
<syntaxhighlight lang="php">
$wgCirrusSearchServers = [ 'localhost' ];
$wgCirrusSearchPrefixSearchStartsWithAnyWord = true;
$wgCirrusSearchUseCompletionSuggester = 'yes';
# Turn off leading wildcard matches, they are a very slow and inefficient query
$wgCirrusSearchAllowLeadingWildcard = false;
$wgCirrusSearchUseIcuFolding = true;
$wgCirrusSearchUseExperimentalHighlighter = true;
$wgCirrusSearchOptimizeIndexForExperimentalHighlighter = true;
$wgCirrusSearchWikimediaExtraPlugin[ 'id_hash_mod_filter' ] = true;
</syntaxhighlight>

I seem to have all the relevant indices (<syntaxhighlight lang="bash" inline="">curl 'localhost:9200/_cat/indices?v'</syntaxhighlight>):
<pre>green  open   kz_dev_he_titlesuggest_1479232212       4   0       5167            0        3mb            3mb 
green  open   kz_dev_he_general_first                 4   0       2975          343     30.7mb         30.7mb 
green  open   mediawiki_cirrussearch_frozen_indexes   1   0          0            0       144b           144b 
green  open   mw_cirrus_versions                      1   0          9            4     19.8kb         19.8kb 
green  open   kz_dev_he_content_first                 4   0       4522          398    102.2mb        102.2mb 
</pre>


What am I doing wrong? How can I debug this? FFS Talk 08:18, 16 November 2016 (UTC)

It seems that you would like subphrase matching in search suggestions.
$wgCirrusSearchPrefixSearchStartsWithAnyWord is an option that could help with that, but it has some limitations (explained at the end of this message).
In your config you enabled both $wgCirrusSearchUseCompletionSuggester and $wgCirrusSearchPrefixSearchStartsWithAnyWord:
- $wgCirrusSearchUseCompletionSuggester enables the completion suggester.
- $wgCirrusSearchPrefixSearchStartsWithAnyWord enables a special option for prefix search.
The completion suggester and prefix search are two distinct algorithms for displaying search suggestions. If you enable the completion suggester, $wgCirrusSearchPrefixSearchStartsWithAnyWord has no effect, since it only affects prefix search.
So if you want suggestions when the search term is not the first word of the title, I'd suggest disabling the completion suggester with
$wgCirrusSearchUseCompletionSuggester = 'no';
and making sure that your main index has been updated after changing $wgCirrusSearchPrefixSearchStartsWithAnyWord. Simply run updateSearchIndexConfig again to make sure that the index is up to date.
Note that the behavior behind $wgCirrusSearchPrefixSearchStartsWithAnyWord is as follows:
For a page named Hello World, a search for hel or wor should match Hello World. The drawback is that searching for "he wo" or "wo he" will match "Hello World" as well.
I think this option may do what you want if most of your user queries contain only one word. If the search queries often contain multiple words (especially short words), the behavior can be confusing. If this option were enabled on Wikipedia, a search for "to be or not to be" would suggest pages related to Tony Bennett as soon as you type to be.
Note that we've added subphrase matching to the completion suggester (phab:T123015); it should be available in MW 1.28. Subphrase matching in the completion suggester behaves a bit more consistently, e.g. searching for "he wo" won't match Hello World. DCausse (WMF) (talk) 10:13, 16 November 2016 (UTC)
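The difference between the two behaviours described above can be illustrated with a short Python sketch. Both functions are simplified models written for this explanation (hypothetical names, whitespace tokenisation, lowercase comparison); they are not how CirrusSearch or Elasticsearch implement suggestions.

```python
def any_word_prefix_match(query, title):
    """Every query token is a prefix of some word in the title, in any order."""
    title_words = title.lower().split()
    return all(
        any(word.startswith(token) for word in title_words)
        for token in query.lower().split()
    )


def subphrase_match(query, title):
    """The whole query is a prefix of the title starting at some word boundary."""
    words = title.lower().split()
    normalized = " ".join(query.lower().split())
    return any(
        " ".join(words[i:]).startswith(normalized) for i in range(len(words))
    )


print(any_word_prefix_match("he wo", "Hello World"))
print(subphrase_match("he wo", "Hello World"))
```

Under this model, "to be" matches "Tony Bennett" with the any-word rule, while the subphrase rule only accepts queries such as "wor" or "hello wo".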
Thank you, that's a great and complete answer! I had no idea the CompletionSuggester and the PrefixSearch were competing options. Disabling the suggester worked immediately. I will try to make it clearer on the extension page, soon, as I did not get it at all.
It's now obvious to me that subphrase matching is the way to go, but upgrading to MW 1.28 might not be feasible for me, so I'll probably go with $wgCirrusSearchPrefixSearchStartsWithAnyWord for now.
Thank you again! FFS Talk 20:25, 16 November 2016 (UTC)

Internal error

After I enter a search query and press Enter, I see:

Internal error [94f022deb1bc675e00d549e7] 2016-12-21 08:55:30: Fatal exception of type "RuntimeException" VictorPorton (talk) 08:56, 21 December 2016 (UTC)

Hm, maybe it was because Java JRE was not installed?
I installed it, but now it says:
/usr/share/elasticsearch/bin/elasticsearch
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000780060000, 1973026816, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1973026816 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /root/hs_err_pid13908.log
So this is probably not an option for me right now, because I can't pay for a server with more memory. Or can you help? VictorPorton (talk) 09:06, 21 December 2016 (UTC)
How much memory do you have on this machine?
It was reported on this talk page that a machine with 512MB of physical RAM was able to run MediaWiki and Elasticsearch (see Extension talk:CirrusSearch/2016#h-Installing_CirrusSearch_issue-2016-09-19T08:04:00.000Z; it may contain some important information for tuning your system).
For production and relatively large wikis we generally recommend 8GB of RAM for the machine hosting Elasticsearch. DCausse (WMF) (talk) 10:20, 21 December 2016 (UTC)
I have 2GB on my production VPS.
Maybe I will move it to a 32GB machine next year. VictorPorton (talk) 13:50, 24 December 2016 (UTC)
2GB should be enough to run a light wiki. If you encounter memory issues with Java, you could try decreasing the memory allocated to the JVM by editing /etc/default/elasticsearch and setting ES_HEAP_SIZE to 512m (not more than 1g). DCausse (WMF) (talk) 10:05, 26 December 2016 (UTC)
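The rule of thumb given earlier on this page (ES_HEAP_SIZE should not exceed half of the physical RAM) can be checked with a small Python sketch. The function names are hypothetical, and only the plain "m" and "g" suffixes used in this discussion are accepted.

```python
import re

_UNIT_TO_MB = {"m": 1, "g": 1024}


def heap_size_mb(es_heap_size):
    """Convert an ES_HEAP_SIZE value such as '512m' or '4g' to megabytes."""
    match = re.fullmatch(r"(\d+)([mg])", es_heap_size.strip().lower())
    if match is None:
        raise ValueError("unsupported heap size: %r" % es_heap_size)
    return int(match.group(1)) * _UNIT_TO_MB[match.group(2)]


def within_half_of_ram(es_heap_size, physical_ram_mb):
    """True if the heap is at most half of the machine's physical RAM."""
    return heap_size_mb(es_heap_size) * 2 <= physical_ram_mb


print(within_half_of_ram("512m", 2048))
```

This covers only the upper bound; the other processes on the machine (web server, PHP, database) still need room in the remaining half.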

Suggestion: Searching pages by metadata / properties (e.g. "dogs hasmedia:images")

Problem: Attempting to search without knowing very well the details of a concept or object

Full Description:

When using any search engine, it is sometimes very hard to put into words exactly what one is searching for. For example:

  • Searching for an actor in a movie without knowing their name (but remembering their appearance)
  • Searching for a tree name without knowing what it is called, by typing some descriptive terms
  • Searching for a public domain song by a famous composer - few people know Mozart's 5th symphony by name, but they will likely recognise the song
  • Searching for a concept - "Deja vu" is quite hard to describe with static images, but trivial with videos

See also: http://www.dailyedge.ie/8-things-you-cant-describe-no-matter-how-hard-you-try-906176-May2013/

Suggestion:

Add a new search option / predicate or api that allows one to surface these options, e.g. search "cats hasmedia:images", "deja vu hasmedia:video", "mozarts songs hasmedia:audio", etc.

This would allow the end-user to filter the options and try to see if the content matches what they are looking for.

Workarounds: "insource:/\[\[File\:.*\.(jpg|gif|png|tiff)/" "insource:/\[\[File\:.*\.(ogv|ogg|webm)/"

Unfortunately the workaround probably doesn't work for transcluded images, requires one to know all media types used in the projects, requires one to know the localised aliases for the file namespace, fails for galleries, and isn't performant because of regex use. 197.218.90.97 (talk) 18:17, 25 December 2016 (UTC)
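The limitations listed above are easy to reproduce offline. The Python sketch below applies the first workaround pattern to a few wikitext snippets; Python's re module is only a stand-in for the regex engine behind insource:, whose syntax differs in places, and the snippets are made up for illustration.

```python
import re

# The image workaround pattern from above, used here with Python's re module.
IMAGE_LINK = re.compile(r"\[\[File\:.*\.(jpg|gif|png|tiff)")


def has_image_link(wikitext):
    """True if the raw wikitext contains a [[File:...]] link to a listed type."""
    return IMAGE_LINK.search(wikitext) is not None


samples = [
    "[[File:Cat.jpg|thumb]]",            # plain file link
    "<gallery>\nCat.jpg\n</gallery>",    # gallery entries have no [[File: prefix
    "[[Файл:Cat.jpg]]",                  # localised namespace alias
    "[[File:Cat.webp]]",                 # file type missing from the pattern
    "{{Infobox|image=Cat.jpg}}",         # image passed through a template
]
for text in samples:
    print(repr(text), has_image_link(text))
```

Only the first snippet matches, which is in line with the gallery, alias, file type and template caveats described above.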

Other useful page metadata may include:
  • withprop, e.g. (withprop:references)
      • references - include only articles with references
      • math - math tags in the article
      • links - has links
      • geodata
      • date - specifying a range; an article about AIDS from 1985 is a lot less likely to be reliable than one published recently. It might also expose a hoax, e.g. an article from 1990 talking about Michael Jackson being dead is certainly a hoax or unreliable
      • graphs, maps, etc.
Also, with the proliferation of predicates, it might be better to expose these through the API, set via the UI, rather than adding more inline predicates that may needlessly complicate the search "markup". 197.218.90.97 (talk) 19:09, 25 December 2016 (UTC)
Related?:
https://phabricator.wikimedia.org/T69914 197.218.90.97 (talk) 19:14, 25 December 2016 (UTC)
We've recently deployed a few related options:
This isn't everything mentioned, but it covers a few of the suggested use cases. Some, such as the published date of a reference, are a bit harder to work out because they need to work on a wide variety of wikis that aren't entirely consistent in how this metadata is stored. Needs some thought.
UI work is certainly something else that will be required, and it is being explored on a few different fronts. EBernhardson (WMF) (talk) 19:34, 17 January 2017 (UTC)
Yes, these deployments are helpful, except that the File "properties" only work in the File namespace, compare:
It would be more intuitive if they worked in any namespace, especially the main namespace. In fact it theoretically already works by simply using insource as described above, although one has to jump through considerable hoops to get it to work. The recent addition of a new acceptable file type (webp) just proves my point: the regex above would need to be changed to account for it.
Given that insource doesn't expand templates, it wouldn't detect those embedded images either.
Anyway, thanks for clarifying this. 197.218.83.152 (talk) 21:35, 17 January 2017 (UTC)