Extension talk:CirrusSearch


About this board

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on phabricator.

"Index is unknown retrying..." error on index generation script

MyWikis-JeffreyWang (talkcontribs)

When attempting to run php /var/www/mediawiki-1.35.2/w/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php, I'm getting the following output:

indexing namespaces...
mw_cirrus_metastore missing, creating new metastore index.
Creating metastore index... mw_cirrus_metastore_first Scanning available plugins...
none
ok
Index is unknown retrying...

I'm not sure how the index could be unknown when this is the script that's supposed to generate it. Here are my settings:

$wgCirrusSearchIndexBaseName = $wgDBname;
$wgDisableSearchUpdate = true;
$wgCirrusSearchClusters = [
    'default' => [
        [
            'host' => 'search-cirrus-randomcharsid.us-east-1.es.amazonaws.com',
            'port' => 443,
            'scheme' => 'https'
        ]
    ]
];

Any ideas on why this might be the case? Thanks in advance!

EBernhardson (WMF) (talkcontribs)

The message `Index is unknown retrying...` comes from the code that waits for the new metastore index to report that it is fully created and healthy. While the message is not particularly clear, it seems to mean that the request to create the index was submitted, but later requests asking about the status of this new index report that no such index exists. Critically, the response to the index creation request does not appear to be checked, so it seems likely that the creation request itself is failing and Cirrus only bails out later, when it checks that index.

Cirrus will need to be adjusted to do a better job of error reporting here, but that would only improve the reporting; it wouldn't fix your actual problem. My first question: which version of Elasticsearch are you using? I'm pretty sure the existing metastore index would be rejected by Elasticsearch >= 7.0.
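
To confirm whether the metastore index was ever created, the cluster can be queried directly with the standard `_cat/indices` and `_cluster/health` APIs. This is a minimal sketch, assuming `curl` is available, that `ES_URL` (a placeholder, not a CirrusSearch setting) points at the endpoint from `$wgCirrusSearchClusters`, and, on AWS, that the domain's access policy accepts unsigned requests from this host. The function names are illustrative.

```shell
#!/bin/sh
# Placeholder endpoint; override with e.g. ES_URL=https://host:443
ES_URL="${ES_URL:-http://localhost:9200}"

# List every index with its health and status columns.
list_indices() {
  curl -s "$ES_URL/_cat/indices?v"
}

# Ask for the health of one index, waiting up to 30s for at least "yellow".
# If the index does not exist, no green/yellow status will be reported.
index_health() {
  curl -s "$ES_URL/_cluster/health/$1?wait_for_status=yellow&timeout=30s"
}

# Usage:
#   list_indices
#   index_health mw_cirrus_metastore_first
```

If `mw_cirrus_metastore_first` never shows up in the listing, that supports the theory that the creation request itself was rejected.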

MyWikis-JeffreyWang (talkcontribs)

Hi @EBernhardson (WMF), thanks for your reply! I am using Elasticsearch 6.5.4 on AWS's managed version of Elasticsearch, and using MediaWiki 1.35 (and CirrusSearch and Elastica are both on the REL1_35 branch of Git).

If it helps, I have "elasticsearch/elasticsearch": "6.7.2" in my composer.local.json because of T276854. So I guess I'm already laughing a bit at myself for picking two different versions. I'm using 6.5.4 because of the recommendation on the CirrusSearch extension page, but I'm wondering if it's safe to go up to 6.7 or 6.8.

Ciencia Al Poder (talkcontribs)

MediaWiki 1.33.x to 1.36.x require Elasticsearch 6.5.x (6.5.4 recommended).

Any other version (even one newer than the requirement) will be reported as incompatible and fail immediately (or at least that was the case a few versions ago).
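
A quick way to see which version the cluster actually reports before running the maintenance scripts is to read `version.number` from the cluster's root endpoint. This is a sketch assuming `curl` and a POSIX shell; `ES_URL` and both function names are placeholders, and the `6.5.*` pattern simply encodes the 6.5.x requirement stated above.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"   # placeholder endpoint

# Extract version.number from the JSON returned by GET / on the cluster.
# Works for both compact and pretty-printed responses.
es_version_from_json() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Exit status 0 only for 6.5.x versions.
es_version_supported() {
  case "$1" in
    6.5.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs network access to the cluster):
#   v=$(curl -s "$ES_URL/" | es_version_from_json)
#   es_version_supported "$v" && echo "ok: $v" || echo "unsupported: $v"
```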

MyWikis-JeffreyWang (talkcontribs)

@Ciencia Al Poder The problem is that when I don't add that line into my composer.local.json I get the issue reported at T276854.

MyWikis-JeffreyWang (talkcontribs)

CirrusSearch breaks with PHP 8.0

Jcwren (talkcontribs)

I've updated my MediaWiki install to 1.36.0 with PHP 8.0.4, updated all the extensions to the 1.36 versions, and I'm getting the error below. Any thoughts as to what's going on?

I apologize for replying to my own post, but when I tried to post the version information I was getting Abusefilter-warning-linkspam errors. Breaking it up was the only way I found to get around it.

[05fe5f82e1a0b673c67b307a] /mediawiki/index.php?title=Special%3ASearch&search=test&fulltext=Search ParseError: syntax error, unexpected token "match", expecting ":"
       Backtrace:
       from /var/www/localhost/htdocs/mediawiki/extensions/CirrusSearch/includes/Query/SubPageOfFeature.php(124)
       #0 /var/www/localhost/htdocs/mediawiki/extensions/CirrusSearch/includes/Parser/FullTextKeywordRegistry.php(89): AutoLoader::autoload()
       #1 /var/www/localhost/htdocs/mediawiki/extensions/CirrusSearch/includes/Parser/QueryParserFactory.php(34): CirrusSearch\Parser\FullTextKeywordRegistry->__construct()
       #2 /var/www/localhost/htdocs/mediawiki/extensions/CirrusSearch/includes/Search/SearchQueryBuilder.php(129): CirrusSearch\Parser\QueryParserFactory::newFullTextQueryParser()
       #3 /var/www/localhost/htdocs/mediawiki/extensions/CirrusSearch/includes/CirrusSearch.php(240): CirrusSearch\Search\SearchQueryBuilder::newFTSearchQueryBuilder()
       #4 /var/www/localhost/htdocs/mediawiki/includes/search/SearchEngine.php(95): CirrusSearch\CirrusSearch->doSearchText()
       #5 /var/www/localhost/htdocs/mediawiki/includes/search/SearchEngine.php(187): SearchEngine->{closure}()
       #6 /var/www/localhost/htdocs/mediawiki/includes/search/SearchEngine.php(96): SearchEngine->maybePaginate()
       #7 /var/www/localhost/htdocs/mediawiki/includes/specials/SpecialSearch.php(446): SearchEngine->searchText()
       #8 /var/www/localhost/htdocs/mediawiki/includes/specials/SpecialSearch.php(228): SpecialSearch->showResults()
       #9 /var/www/localhost/htdocs/mediawiki/includes/specialpage/SpecialPage.php(646): SpecialSearch->execute()
       #10 /var/www/localhost/htdocs/mediawiki/includes/specialpage/SpecialPageFactory.php(1386): SpecialPage->run()
       #11 /var/www/localhost/htdocs/mediawiki/includes/MediaWiki.php(309): MediaWiki\SpecialPage\SpecialPageFactory->executePath()
       #12 /var/www/localhost/htdocs/mediawiki/includes/MediaWiki.php(913): MediaWiki->performRequest()
       #13 /var/www/localhost/htdocs/mediawiki/includes/MediaWiki.php(546): MediaWiki->main()
       #14 /var/www/localhost/htdocs/mediawiki/index.php(53): MediaWiki->run()
       #15 /var/www/localhost/htdocs/mediawiki/index.php(46): wfIndexMain()
       #16 {main}
Jcwren (talkcontribs)
Jcwren (talkcontribs)
Other
Extension Version License Description Authors
CirrusSearch 6.5.4 (264629b) 20:53, 26 May 2021 GPL-2.0-or-later Elasticsearch-powered search for MediaWiki Nik Everett, Chad Horohoe, Erik Bernhardson and others
EditSubpages 3.5.0 (3fbabba) 23:34, 26 May 2021 GPL-2.0-only Allows sysops to unlock a page and all subpages of that page for anonymous editing via MediaWiki:Unlockedpages Ryan Schmidt and Prod
Elastica 6.1.3 (9f6e66a) 23:34, 26 May 2021 GPL-2.0-or-later Base Elasticsearch functionality for other extensions by providing Elastica library Nik Everett and Chad Horohoe
Lockdown – (2409546) 03:05, 27 May 2021 GPL-2.0-or-later Per namespace group permissions Daniel Kinzler, Platonides, Mark A. Hershberger and others
DCausse (WMF) (talkcontribs)
Jdforrester (WMF) (talkcontribs)

David is right. Note that almost all work on wider PHP 8.0 compatibility testing is blocked until CirrusSearch is updated to Elasticsearch 6.7+.

Jcwren (talkcontribs)

Thank you. The system got updated to PHP 8.0 (although I'm not sure at what point), and MW was running except for the search box. I finally figured out how to get it running under PHP 7.4 after enabling some missing modules (mbstring and a few others). I have to mess with MW/PHP/Apache so infrequently that it's basically a relearning experience every time I get into it. And, of course, some of the distro tools change, I have to figure out if any of the PHP config options are relevant, blah blah blah. I generally try to leave it alone, but recent security updates forced me to lay hands on it.

2806:106E:1E:E35:DEC1:FC4F:3B8A:5D6D (talkcontribs)

My wiki is hosted on GoDaddy. How can I use Elasticsearch on GoDaddy? Do I have to subscribe to an Elasticsearch service? When executing UpdateSearchIndexConfig.php it tells me "Couldn't resolve host". I have already uploaded and enabled Elastica. Thanks.

Ciencia Al Poder (talkcontribs)

You need to install Elasticsearch. This requires full server access, which means you need a VPS or a bare-metal/physical server.

Reply to "elasticsearch"

The problem of breaking index of elasticsearch

Naramoksu (talkcontribs)

I am running a personal MediaWiki (version 1.35.1) with CirrusSearch and Elasticsearch (6.5.4). Two weeks ago, the Elasticsearch index suddenly disappeared. Running UpdateSearchIndexConfig.php from CirrusSearch recreates the index.

What could cause Elasticsearch indices to suddenly disappear?

Lucasjkr (talkcontribs)

See my post above yours on here. I was having the same problem, my indexes disappeared every two or three weeks. I think my server was being hit with a “meow” attack because port 9200 was accidentally open to the outside world.

It could be that you have port 9200 open. Or like me, you could have ES running in docker and think you’ve got it locked down since you’re using a firewall to block port 9200. The thing is, docker apparently bypasses the firewall rules by default. But there’s a solution for that in the link I provided.
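
For the Docker case, one common mitigation is to publish the port on the loopback interface only, so Docker's own iptables rules never expose it to other hosts. This is a sketch assuming the official Elasticsearch 6.5.4 image from docker.elastic.co and a single-node setup on the same host as MediaWiki; the container name and the function name are illustrative.

```shell
#!/bin/sh
# Start a single-node Elasticsearch 6.5.4 container whose HTTP port is
# published only on 127.0.0.1, so it is unreachable from other hosts even
# though Docker's iptables rules bypass host firewalls such as ufw.
start_es_loopback_only() {
  docker run -d --name elasticsearch \
    -p 127.0.0.1:9200:9200 \
    -e "discovery.type=single-node" \
    docker.elastic.co/elasticsearch/elasticsearch:6.5.4
}

# Usage:
#   start_es_loopback_only
#   curl -s http://127.0.0.1:9200/   # should answer locally only
```

Binding to an explicit address in `-p` is preferable to relying on firewall rules, because the restriction is then part of the container definition itself.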

I just implemented the fix today, so I don’t know if this is actually the fix, but reading about the meow attack, it seems to check out.

Ciencia Al Poder (talkcontribs)

When an index is recreated (the suggestions index, at least), a new index is created, the alias is updated to point to the new one, and the old index is removed. I haven't seen CirrusSearch remove indices by itself without creating a replacement.
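
The `_cat` APIs show what the alias swap has left behind: which concrete index each alias currently points to, and which indices exist. Deleting an index also deletes its aliases, so if indices were wiped from outside (as in the "meow" attack mentioned above), the wiki's aliases will be gone too, and unfamiliar index names may appear. A sketch with `ES_URL` as a placeholder and illustrative function names:

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"   # placeholder endpoint

# alias -> index pairs (which concrete index each alias currently uses)
list_aliases() {
  curl -s "$ES_URL/_cat/aliases?v"
}

# concrete indices, sorted by name
list_indices_sorted() {
  curl -s "$ES_URL/_cat/indices?v&s=index"
}

# Usage:
#   list_aliases
#   list_indices_sorted
```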

125.186.13.171 (talkcontribs)

This problem seems to have been on my wiki's side. At that time, my wiki was sometimes overloaded while running runJobs.php and pages would not open, and in the process the Elasticsearch index was broken. It has not recurred since I adjusted how often runJobs.php runs.

Blinkingline (talkcontribs)

I posted this to the AdvancedSearch extension as well, but I thought I'd try here, too:

Product Version
MediaWiki 1.33.0
PHP 7.2.24-0ubuntu0.18.04.1 (fpm-fcgi)
MySQL 5.7.28-0ubuntu0.18.04.4
ICU 60.2
Elasticsearch 6.5.4
AdvancedSearch 0.1.0 (76eea63) 16:40, 25 March 2019 GPL-2.0-or-later Easy access to advanced search capabilities on Special:Search Thiemo Kreuz, Gabriel Birke, Tonina Zhelyazkova, Christoph Jauera, Kai Nissen and Tim Eulitz
CirrusSearch 0.2 (2daa9b8) 09:07, 25 March 2019 GPL-2.0-or-later Elasticsearch-powered search for MediaWiki Nik Everett, Chad Horohoe, Erik Bernhardson and others
Elastica 6.0.2 (eee38b0) 07:37, 10 September 2019 GPL-2.0-or-later Base Elasticsearch functionality for other extensions by providing Elastica library Nik Everett and Chad Horohoe

I'm sure this is something I just haven't configured correctly, but can someone point me to how this should be configured?

I'm trying to search multiple categories with Advanced Search. Using the default setup in CirrusSearch, which appears to be

$wgCirrusSearchCategoryEndpoint = '';

and getting the following message:

$wgCirrusSearchCategoryEndpoint should be set to a valid SPARQL endpoint to use deep category search.

So it makes sense that I'm getting the error, I just don't know what this endpoint should look like.

Any assistance?

Ciencia Al Poder (talkcontribs)

Looks like this requires Wikibase according to phab:T192942.

Why a local-wiki category search would require an entire Wikibase instance, I don't know, but that doesn't look reasonable.

EBernhardson (WMF) (talkcontribs)

Deepcat requires both a blazegraph instance and regular dumps exported from your wiki to that blazegraph instance. Overall I do not think running deepcat outside WMF infrastructure is going to be satisfying, and will instead be a giant headache of trying to run additional complex services that generally only make sense if you already have those services for other purposes.

In theory the same thing could be implemented with recursive SQL queries without using Blazegraph, but we would not deploy recursive SQL queries due to performance considerations. If a patch implementing this slow path were provided, we could review it for inclusion and use outside WMF.

Schlagmichdoch (talkcontribs)

If this error only occurs when you try to specify categories that the pages should be in, as was the case for me, the following should help:

Disable the "deepcat:" functionality in LocalSettings.php, as it requires a separate SPARQL service.

As specified here, to do that simply add the following line to your LocalSettings.php:

$wgAdvancedSearchDeepcatEnabled = false; // disable deepcat: in favor of incategory:

Reply to "Error using deepcat"

Error valid SPARQL endpoint to use deep category search

Mikael44115 (talkcontribs)

Hello,

I can’t get CirrusSearch to work. When I run a search, I get this message:

A warning has occurred while searching: $wgCirrusSearchCategoryEndpoint should be set to a valid SPARQL endpoint to use deep category search.


My configuration is :

Product Version
MediaWiki 1.32.0
PHP 7.2.14 (apache2handler)
MySQL 5.7.24
ICU 63.1
Elasticsearch 5.6.16
Extension Version License Description Authors
AdvancedSearch 0.1.0 (9bbb17d) 22:15, 15 October 2018 GPL-2.0-or-later Easy access to advanced search capabilities on Special:Search Thiemo Kreuz, Gabriel Birke, Tonina Zhelyazkova and Christoph Jauera
CirrusSearch 0.2 (b1fa4bd) 13:47, 20 February 2019 GPL-2.0-or-later Elasticsearch-powered search for MediaWiki Nik Everett, Chad Horohoe, Erik Bernhardson and others
Elastica 1.3.0.0 (9fcf88c) 09:09, 11 October 2018 GPL-2.0-or-later Base Elasticsearch functionality for other extensions by providing Elastica library Nik Everett and Chad Horohoe

Thank you for helping me.


Mikael

MadX (talkcontribs)

Was anyone able to get this working? I am encountering the same issue.

Revansx (talkcontribs)

I'm getting it too

Schlagmichdoch (talkcontribs)

If this error only occurs when you try to specify categories that the pages should be in, as was the case for me, the following should help:

Disable the "deepcat:" functionality in LocalSettings.php, as it requires a separate SPARQL service.

As specified here, to do that simply add the following line to your LocalSettings.php:

$wgAdvancedSearchDeepcatEnabled = false; // disable deepcat: in favor of incategory:

2806:106E:1E:17A7:EDE:1AC1:6547:8B0B (talkcontribs)

When I do a search, the labels (wiki markup tags) appear in the results and it doesn't look good at all. How do I remove them?

For example, searching for "stones", the results are:

Stones stones the stones 699 bytes (94 words) - 14:00 23 Apr 2021

and it would have to appear like this

Stones stones the stones 699 bytes (94 words) - 14:00 23 Apr 2021

Ciencia Al Poder (talkcontribs)

I don't see any difference in the examples you posted.

2806:106E:1E:17A7:EDE:1AC1:6547:8B0B (talkcontribs)

Excuse me. An example result looks like this:

<nowiki>

Stones

[[Stones]], ''the stones''...

</nowiki>


I want them to appear without the markup:

[[Stones]], ''the stones''...

Ciencia Al Poder (talkcontribs)

Are you sure you have Extension:CirrusSearch installed? I don't get markup in search results, and I haven't set any special setting that would control that behavior...

2806:106E:1E:17A7:EDE:1AC1:6547:8B0B (talkcontribs)

Yes, and elastic too. Although, thinking about it, I think I only enabled them and did not configure anything...

Reply to "labels"

Parse Error from MetaNameSpace store when running on PHP 8

Summary by Jdforrester (WMF)

PHP 8 isn't supported yet.

Blinkingline (talkcontribs)

We're working on building out a wiki family. We're bringing our first wiki online, with wikiid "fr". When I try to update the Search Index config, I get the following error, and I'm not sure where it's coming from:

extensions/CirrusSearch/maintenance# php updateSearchIndexConfig.php --wiki=fr
indexing namespaces...
[1412852ef0291ba91e117343] [no req]   ParseError from line 79 of /var/www/html/aaen/extensions/CirrusSearch/includes/MetaStore/MetaNamespaceStore.php: syntax error, unexpected token "match"

Backtrace:
#0 /var/www/html/aa-en/extensions/CirrusSearch/includes/MetaStore/MetaStoreIndex.php(119): AutoLoader::autoload()
#1 /var/www/html/aa-en/extensions/CirrusSearch/maintenance/IndexNamespaces.php(35): CirrusSearch\MetaStore\MetaStoreIndex->namespaceStore()
#2 /var/www/html/aa-en/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(54): CirrusSearch\Maintenance\IndexNamespaces->execute()
#3 /var/www/html/aa-en/maintenance/doMaintenance.php(107): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#4 /var/www/html/aa-en/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(70): require_once(string)
#5 {main}

Any help would be appreciated!

DCausse (WMF) (talkcontribs)

Hi,

You are most probably using PHP 8, which is not supported yet.

Blinkingline (talkcontribs)

Hrm...we're running 7.4 from nginx. Let me double check from the CLI. Thanks!

Edit: Yup! PHP8 is installed and is what the CLI is using. Thanks again.
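
Because the web server and the command line can use different PHP versions, it may be worth checking the CLI binary before running maintenance scripts. This is a sketch assuming a POSIX shell; `PHP_BIN` and the function names are illustrative, and versioned binaries such as `php7.4` exist only where several PHP packages are installed side by side (common on Debian/Ubuntu).

```shell
#!/bin/sh
# Assumption: override with a versioned binary, e.g. PHP_BIN=php7.4
PHP_BIN="${PHP_BIN:-php}"

# Print the major version of the given PHP binary.
php_major() {
  "$1" -r 'echo PHP_MAJOR_VERSION;'
}

# Fail if the binary is PHP 8 or newer (or its version cannot be read).
check_php_for_cirrus() {
  major=$(php_major "$1")
  [ -n "$major" ] || return 1
  if [ "$major" -ge 8 ]; then
    echo "PHP $major detected; use a PHP 7 binary instead" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_php_for_cirrus "$PHP_BIN" &&
#     "$PHP_BIN" extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki=fr
```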

Inlimity (talkcontribs)

If you (like me) really have to use PHP 8, you can apply a quick-and-dirty fix. The issue here is that "match" is a reserved keyword since PHP 8.0, so it can no longer be used as a class name. To fix it, you can simply search-and-replace it with something else.

cd wherever-your-mediawiki-installation-lives

grep -rlw extensions/CirrusSearch -e 'Match' | xargs -i@ sed -i 's/\([^a-zA-Z]\)Match\([^a-zA-Z]\)/\1ElasticMatch\2/g' @

grep -rlw extensions/Elastica -e 'Match' | xargs -i@ sed -i 's/\([^a-zA-Z]\)Match\([^a-zA-Z]\)/\1ElasticMatch\2/g' @

mv extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/Match.php extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/ElasticMatch.php

Again, I recommend using PHP 7 while PHP 8 is not officially supported. If you need to run different services on the same web server, look into, for example, how to set up php-fpm for each PHP version.

Emmanuel Touvier (talkcontribs)

I am working with AWS, and AWS Elasticsearch servers are allowed to communicate on port 80 only.

I have setup the server in LocalSettings.php this way :

$wgCirrusSearchServers = [ [ 'host' => "elasticsearchserver.amazonaws.com", 'port' => 80 ] ];

It works well with MediaWiki 1.31.

After upgrading to 1.35, the Elasticsearch server is shown in Special:Version:

MediaWiki 1.35.1
PHP 7.3.23 (apache2handler)
MySQL 5.7.26-log
Elasticsearch 6.5.4

I use

CirrusSearch 6.5.4 (f835850) 13 juillet 2020 à 11:28
Elastica 6.1.3 (3e3b76f) 13 juillet 2020 à 11:44

But when creating the index with UpdateSearchIndexConfig.php, I get the following error message:

PHP Fatal error:  Declaration of Elasticsearch\Endpoints\Indices\Exists::getParamWhitelist() must be compatible with Elasticsearch\Endpoints\AbstractEndpoint::getParamWhitelist(): array in ....../extensions/Elastica/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Endpoints/Indices/Exists.php on line 62


Do you think it is related to the unusual port? Any clue?

EBernhardson (WMF) (talkcontribs)

This doesn't look related to the port. This looks like a problem with mismatched versions between the Elastica library (ruflin/elastica) and the low level Elasticsearch client library (elasticsearch/elasticsearch). Perhaps composer is giving some errors when it is run?
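
To see which versions composer actually installed for the two libraries involved, `composer show` can be pointed at the directory that owns them. A sketch assuming composer was run inside the Elastica extension directory; `EXT_DIR` is an assumption (use the MediaWiki root instead if the libraries come in via composer.local.json), and the function name is illustrative.

```shell
#!/bin/sh
# Assumption: the directory whose vendor/ holds the Elasticsearch libraries.
EXT_DIR="${EXT_DIR:-extensions/Elastica}"

# Show installed details (including version) of both libraries.
show_es_libs() {
  composer --working-dir="$EXT_DIR" show ruflin/elastica
  composer --working-dir="$EXT_DIR" show elasticsearch/elasticsearch
}

# Usage, from the MediaWiki root:
#   show_es_libs
```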

Xdaveyx (talkcontribs)

I am running into the same problem and I am not sure I understand exactly what is happening.

When composer is run I see this:

- Installing elasticsearch/elasticsearch (v6.7.2): Loading from cache

I suspect this may have something to do with the version of Elasticsearch. I will try updating that.

... Okay, Elasticsearch 6.7.2 doesn't help me either. Same issue.

AnSiOLAtra (talkcontribs)

I had this exact same issue and error.


Like @EBernhardson (WMF) mentioned, it appears to be an issue with the version of ruflin/elastica being used. A recent update to the Elastica extension set composer to pull "ruflin/elastica": "6.1.3" instead of the 6.1.1 that was previously being pulled.


For me, I reset the commit locally to the 3e3b76f3b7208167342fee843c401f2587dacde3 commit of REL1_35, then ran composer, which pulled the working version of ruflin/elastica.

Xdaveyx (talkcontribs)
Reedy (talkcontribs)

If you look at https://github.com/ruflin/Elastica/compare/6.1.1...6.1.3...

There's no change in the constraint on elasticsearch/elasticsearch. There are also very few changes to the actual code.


https://github.com/ruflin/Elastica/blob/6.1.1/composer.json#L17 - "^6.0"

https://github.com/ruflin/Elastica/blob/6.1.3/composer.json#L17 - "^6.0"


I'm guessing that what caused the issue is maybe https://github.com/wikimedia/mediawiki-extensions-Elastica/commit/c101a4c17fff7e8711b0199cc9f7c342699e1221, which allowed 6.7.2 as well.


If we look at PHP Fatal error:  Declaration of Elasticsearch\Endpoints\Indices\Exists::getParamWhitelist() must be compatible with Elasticsearch\Endpoints\AbstractEndpoint::getParamWhitelist(): array in ....../extensions/Elastica/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Endpoints/Indices/Exists.php on line 62


https://github.com/elastic/elasticsearch-php/blob/v6.7.2/src/Elasticsearch/Endpoints/Indices/Exists.php#L45

https://github.com/elastic/elasticsearch-php/blob/v6.7.2/src/Elasticsearch/Endpoints/AbstractEndpoint.php#L51


The return type declaration isn't there, so the error doesn't seem to match the version that is apparently being installed.
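
Since the fatal error names a declaration that v6.7.2 does not contain, it may help to check which copies of the client PHP is able to load. One possible explanation, offered only as a hypothesis to verify rather than a confirmed cause, is more than one copy of elasticsearch/elasticsearch in the tree (for example one under the MediaWiki root's vendor/ via composer.local.json and one under extensions/Elastica/vendor/). A sketch with illustrative function names:

```shell
#!/bin/sh
# Print every AbstractEndpoint.php from elasticsearch/elasticsearch found
# under the given directory (default: current directory).
find_es_client_copies() {
  find "${1:-.}" -path '*/elasticsearch/elasticsearch/src/Elasticsearch/Endpoints/AbstractEndpoint.php' -print
}

# For each copy, show how getParamWhitelist() is declared, so the
# declarations can be compared with the one named in the fatal error.
show_whitelist_declarations() {
  find_es_client_copies "$1" | while IFS= read -r f; do
    grep -Hn 'function getParamWhitelist' "$f"
  done
}

# Usage, from the MediaWiki root:
#   show_whitelist_declarations .
```

A copy whose declaration ends in `: array` would be a different release from the v6.7.2 files linked above.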

Reedy (talkcontribs)
Emmanuel Touvier (talkcontribs)

give more weight to precise search

קיפודנחש (talkcontribs)

so, a user on hewiki searched for a phrase, without adding quotes.

they expected the articles containing the phrase to appear before other articles (with both individual words in title and/or content), but this is not the current order, and one of these articles (the one they were actually looking for) landed in place 83 in the search results.

SUGGESTION:

try a first search pass for the precise phrase (pretending it's quoted), and if this round finds anything, give those articles significant weight, so they will appear in the first 20 at least - methinks at the very top.

we should serve better readers who do not want to deal with "syntax", like quotes, and still expect articles with precise phrase to be on top.

peace
