Extension talk:CirrusSearch

About this board

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on phabricator.

An error has occurred while searching: We could not complete your search due to a temporary problem.

Clarasiir (talkcontribs)

For the past month or so, CirrusSearch has suddenly and randomly stopped working and given the message "An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later."

Our wiki has used CirrusSearch for a good while now with no issues, but recently our traffic has slowly been improving, and that is when trouble with search began. Restarting our VPS would solve the issue, but over time search would eventually go down again. As wiki traffic gradually increased, so did the frequency of the error, up to the point where search would go down daily.

Thinking it might be a memory use issue, I created a custom.options file in elasticsearch/jvm.options.d with the settings

    -Xms3g
    -Xmx3g

Nothing changed at first, as I didn't restart Elasticsearch, but the next morning search was down as usual, so I rebooted the VPS to get it working again. This time, that didn't solve the problem: the message "An error has occurred while searching" was still appearing.

I deleted the custom.options file I had created, and rebooted the VPS again. Still this didn't solve the problem.

To avoid having no search function at all, we're now using the default MediaWiki search. But I would much rather have CirrusSearch back, so does anyone know what I should do to solve this issue and stop search from giving nothing but error messages?

DCausse (WMF) (talkcontribs)

Hi,

I would suggest analyzing the Elasticsearch logs to understand whether it is having issues and why. The error you describe could have a wide variety of causes:

  • network issue between mediawiki and elastic
  • health status of your search indexes
  • elasticsearch crashing
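One quick way to check the second item (and, indirectly, the first and third) is the Elasticsearch `_cluster/health` endpoint, whose JSON response has a `status` field of `green`, `yellow` or `red`. A minimal PHP sketch, assuming Elasticsearch listens on `http://localhost:9200` without authentication; the function names are hypothetical:

```php
<?php
// Hypothetical helper: extract the "status" field from a _cluster/health response body.
function healthStatusFromJson( string $json ): string {
    $data = json_decode( $json, true, 512, JSON_THROW_ON_ERROR );
    if ( !isset( $data['status'] ) || !is_string( $data['status'] ) ) {
        throw new UnexpectedValueException( 'No status field in response' );
    }
    return $data['status'];
}

// Hypothetical helper: query the cluster health endpoint of one Elasticsearch server.
function clusterHealth( string $baseUrl = 'http://localhost:9200' ): string {
    $url = rtrim( $baseUrl, '/' ) . '/_cluster/health';
    // Suppress the PHP warning; a failed request is reported via the return value.
    $body = @file_get_contents( $url );
    if ( $body === false ) {
        // No usable response usually means Elasticsearch is down or unreachable.
        return 'unreachable';
    }
    return healthStatusFromJson( $body );
}
```

For example, `echo clusterHealth(), "\n";`. An `unreachable` result points at a crash or a network problem; `red` means at least one primary shard is unassigned, and `yellow` means some replica shards are unassigned.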

https://www.elastic.co/guide/en/elasticsearch/reference/8.13/fix-common-cluster-issues.html might be useful. Note that this doc is for 8.13 and you might be running an older version, but I suspect that most of the information there still applies to 7.x.

Please let us know if you have more precise information about the issue you are facing.

Good luck!

Clarasiir (talkcontribs)

Okay, well I checked the elasticsearch.log but it didn't seem to have anything useful. It did have the message "Native controller process has stopped - no new native processes can be started," but there were no other error messages or an explanation as to why search stopped. I'm not even sure if that's an error or that's just when I disabled CirrusSearch because it had already stopped working anyway and was showing the "error has occurred while searching" message.

I have noticed that with CirrusSearch disabled, our server's total memory use is very low. Just enabling CirrusSearch makes it jump to over 65%, and as time passes that number will slowly creep higher to around 80-83% before search then goes down.

To me that sounds similar to the "Circuit breaker errors" in the guide you linked, but there's no error like that in the logs. Our wiki does use an older 7.x version of elasticsearch, so I don't know if error logs work differently for this older version?

DCausse (WMF) (talkcontribs)

Unfortunately, without more details I can only give you very broad guidance. First, I would try to understand whether Elasticsearch dies or not. It could be killed by the JVM itself because of high GC overhead, in which case you would see an error in the logs, or it could be killed by the system OOM killer (which can be inspected in the system logs or with dmesg). Circuit breaker errors should also be logged in the Elasticsearch logs, so you would have seen those, but if you believe this is affecting your setup please see: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/fix-common-cluster-issues.html#circuit-breaker-errors

Have you followed https://www.elastic.co/guide/en/elasticsearch/reference/7.10/system-config.html when setting up Elasticsearch? If not, I would encourage you to follow this documentation and make sure that your system is properly set up for Elasticsearch to run smoothly.

Hope it helps, good luck!

Clarasiir (talkcontribs)

I'm not sure what more details you want me to provide, but our elasticsearch certainly does not have a server to itself, and I wouldn't want to change the settings as if it did only to have that crash our server or something.

We use a shared VPS with 4 cores, and our container has 8 GB of guaranteed RAM. That has been enough for Elasticsearch to function with the default settings up until recently.

DCausse (WMF) (talkcontribs)

As I said earlier, the CirrusSearch error message alone is not precise enough to identify the cause of the issue, and without knowing the cause I can't guide you toward a solution. I can only advise you to continue troubleshooting until you understand the cause. Have you tried seeking help on forums more dedicated to Elasticsearch? You would likely get more precise guidance there on how to troubleshoot an Elasticsearch instance.

Clarasiir (talkcontribs)

I see, then I will try an elasticsearch forum and see if I can get more help figuring things out there, thank you.

Search Suggestion using DisplayTitle

Oetterer (talkcontribs)

I just installed CirrusSearch on my knowledge base wiki and it's working mostly as intended. I have no problems with the pages in the main namespace (where I put mostly infrastructure pages like teams, services, etc.). However, all my actual knowledge base articles reside in a second namespace (named KB) and have page names consisting of randomly generated numbers. The displayed title is set via {{DISPLAYTITLE:Nice Title}}.

The KB namespace is searched correctly and the results are displayed as intended. What is rather annoying is that the search suggestions work only on the actual page title (something like KB:1234567). Is there a way I can have the suggester use and display the display title for a) that namespace or b) all namespaces?

DCausse (WMF) (talkcontribs)

Sadly not at the moment. Some work has been done to start collecting this data in the search index, but nothing more. I would suggest filing a task in https://phabricator.wikimedia.org with the tag CirrusSearch describing your use case.

Thanks!

DCausse (WMF) (talkcontribs)

Username and password authentication for Elastic server?

Brooke Vibber (WMF) (talkcontribs)

I tried setting up a local development instance using a default Docker installation of Elastic. This creates a username "elastic" and a generated password for authentication, however I can't find anything about specifying authentication in search host configuration for CirrusSearch, and the updater script won't connect with anything I've devised yet.

It doesn't seem to work to specify "elastic:<password>@localhost:9200" as the host:

Elastica\Exception\Connection\HttpException from line 186 of /var/www/html/w/extensions/Elastica/vendor/ruflin/elastica/src/Transport/Http.php: Malformed URL


nor "elastic:<password>@localhost" with default port assumed:

Elastica\Exception\Connection\HttpException from line 186 of /var/www/html/w/extensions/Elastica/vendor/ruflin/elastica/src/Transport/Http.php: Couldn't connect to host, Elasticsearch down?

Prefacing with 'http:' or 'https:' makes no difference.

Any ideas? I'm hoping to get this running so I can do some fixes on Extension:MediaSearch on my local development site. Thanks!

EBernhardson (WMF) (talkcontribs)

Generally I would suggest using docker-registry.wikimedia.org/repos/search-platform/cirrussearch-elasticsearch-image:v7.10.2-5, as it contains the extra plugins we use. It is based on the elasticsearch-oss image. Alternatively, you can directly use the https://www.docker.elastic.co/r/elasticsearch/elasticsearch-oss image. This image doesn't include authentication, because auth isn't part of the OSS offering.

I haven't tested it, but you should be able to provide authentication as part of the connection configuration. For example (untested):

    $wgCirrusSearchServers = [
        [
            'host' => 'localhost',
            'port' => '9200',
            'username' => '...',
            'password' => '...',
        ]
    ];
GregRundlett (talkcontribs)

The above dictionary worked for me. (configuring $wgCirrusSearchClusters['default'])
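For reference, the `$wgCirrusSearchClusters['default']` form mentioned above might look roughly like this. It is a sketch: the `username`/`password` keys are the same ones as in the example above, and reading the password from an `ELASTIC_PASSWORD` environment variable is an assumption for illustration, not something CirrusSearch requires.

```php
<?php
// LocalSettings.php sketch: one cluster named 'default' with a single server.
$wgCirrusSearchClusters = [
    'default' => [
        [
            'host' => 'localhost',
            'port' => 9200,
            'username' => 'elastic',
            // Assumption: the password comes from an environment variable
            // instead of being hard-coded; empty string if it is unset.
            'password' => getenv( 'ELASTIC_PASSWORD' ) ?: '',
        ],
    ],
];
```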

Storage option for CirrusSearch in MW 1.39 Docker

Testergt1302 (talkcontribs)

Hi,

We are trying to move our wiki application into a Docker container (Azure Kubernetes). We got MediaWiki working, but Elasticsearch is not working as expected. It generates the indexes the first time, but after one or two days the index data is deleted. We're not sure why this is happening. We used an Azure Blob container as the data volume for Elasticsearch, but I think it is not supported.

Is there a compatibility list for storage supported by Elasticsearch? I tried to find one on the Elastic website and in the docs, but could not. Does anybody have an idea about this?

Thanks in Advance.

Completion suggestions for other namespaces

Alex44019 (talkcontribs)

Hi,

as in the title. Are these supported? If not, could you point me to a Phabricator task? On my wiki we've got two content namespaces (one for official, the other for unofficial content), but unfortunately pages from the second namespace never appear in suggestions. Are there any workarounds? (without disabling Cirrus suggestions)

Alternatively, if someone has some pointers how to potentially implement it in the extension, I'd gladly appreciate them - though I've never done any search work in the past.

DCausse (WMF) (talkcontribs)

Getting suggestions (title completion) should be supported.

For 2 pages:

  • My_Page
  • Unofficial:My_Page

I suspect that what you want when typing "Ma Pag" in the search box is getting at least these two pages suggested?

If yes I think that the way to get this working is:

  1. Configure wgNamespacesToBeSearchedDefault with [ 0 => 1, 100 => 1 ] (assuming that 100 is the Unofficial namespace)

Note that changing wgNamespacesToBeSearchedDefault will require reindexing your wiki.
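A LocalSettings.php sketch of step 1, assuming the namespace uses ID 100 and the name `Unofficial` (both from the example above; the `NS_UNOFFICIAL` constant names are hypothetical). Skip the namespace definitions if the namespace already exists on the wiki.

```php
<?php
// Hypothetical constants for the "Unofficial" namespace and its talk namespace.
define( 'NS_UNOFFICIAL', 100 );
define( 'NS_UNOFFICIAL_TALK', 101 );
$wgExtraNamespaces[NS_UNOFFICIAL] = 'Unofficial';
$wgExtraNamespaces[NS_UNOFFICIAL_TALK] = 'Unofficial_talk';

// Search the main namespace and the Unofficial namespace by default.
// Equivalent to [ 0 => 1, 100 => 1 ] from the reply above.
$wgNamespacesToBeSearchedDefault = [
    NS_MAIN => true,
    NS_UNOFFICIAL => true,
];
```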

You can see it in action on https://es.wikipedia.org, for example, where the Author and Portal namespaces are searched by default: if you search for `Lenguas portuguesa` you should obtain results from both the main content namespace and the Portal namespace.

Note that it is likely that the suggestions from these extra namespaces are ranked very low compared to the ones from the main namespace.

Alex44019 (talkcontribs)

I'll link the wiki as it may be helpful for the thread: https://ark.wiki.gg/. We use the main namespace for official game content, and a "Mod" namespace for unofficial modifications, all following a format of a mod's main page at "Mod:modname", and mod's content as sub-pages to that main page. For example "Mod:ARK Additions/Acrocanthosaurus".

In my expectations, typing "Acro", "Acrocantho", or the full title "Acrocanthosaurus", in the mw-head search bar would suggest the article that's in the Mod namespace. We have no other page titled Acrocanthosaurus in any namespace (ignoring files of course). However, there are simply no results returned at all.

To get the suggestions, the reader has to type the mod namespace prefix and the mod's name. "Mod:ARK Additions/Acr" returns valid suggestions. There's no "partial" completion, the prefix must be complete and without typos. And that's not very intuitive or useful.

Regular Special:Search already handles this well [enough], and our mod namespace is weighted below main.

(I've put "enough" in brackets, as searching for "acrocanth" in Special:Search yields no results until a wildcard is added to the end. I'm not familiar with Cirrus's configuration though, so not sure if there's a setting to alter the behaviour so search acts as if there was a wildcard at all times. However, this is not related to this thread.)

DCausse (WMF) (talkcontribs)

You seem to use the fuzzy-subphrases profile of the completion suggester, which allows it to complete in the middle of titles. When running a completion search across multiple namespaces, the CompletionSuggester (if enabled) is only used for the main namespace; the other namespaces are searched using the classic prefix search algorithm. This is why searching for Acro does not yield Mod:ARK Additions/Acrocanthosaurus; you have to search for ARK Additions Acro for it to work.
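For comparison, a setup like the one described here is typically enabled with the two settings below. Both variable names exist in CirrusSearch, but whether the hosting platform's configuration looks exactly like this is an assumption:

```php
<?php
// Assumed settings, not confirmed for this wiki: use the completion
// suggester for title completion...
$wgCirrusSearchUseCompletionSuggester = 'yes';
// ...with the profile that can also match subphrases inside titles.
// As explained above, this only applies to NS_MAIN; other namespaces
// fall back to the classic prefix search.
$wgCirrusSearchCompletionSettings = 'fuzzy-subphrases';
```

The suggester index itself is built by the UpdateSuggesterIndex.php maintenance script.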

So indeed, in order to support subphrase matching in your context, the CompletionSuggester would have to be adapted to support multiple namespaces; sadly it was not designed with this use case in mind. I'm unclear on what the main difficulty of adapting the codebase would be, but at a glance I think the context suggester would have to be used, and I fear that the assumption that only NS_MAIN is indexed is hard-coded in many places.

An alternative might be to change how the classic prefix search works by enabling wgCirrusSearchPrefixSearchStartsWithAnyWord. We never enabled this on WMF wikis, so I don't have much experience with how it behaves, but it might greatly help to increase recall on non-main namespaces in your case.

Note that enabling wgCirrusSearchPrefixSearchStartsWithAnyWord requires re-indexing your wiki with UpdateSearchIndexConfig.php.
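A minimal sketch of that change (untested here, just as it is on WMF wikis):

```php
<?php
// Let the classic prefix search match the start of any word in a title,
// not only the start of the title itself (off by default).
$wgCirrusSearchPrefixSearchStartsWithAnyWord = true;
```

The reindex step from the CirrusSearch README is `php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now`; the path assumes the standard extensions/ layout.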

Alex44019 (talkcontribs)

Interesting, thank you. I'll get in contact with our hosting platform provider about current Cirrus settings, and I'll set up a sandbox to test out the variable you mentioned. I might have a try at getting more familiar with the extension's internals for the CompletionSuggester (mainly for fun), but currently need to burn through my existing to-do lists...

Also... it seems the slash is required in "ARK Additions/Acro" to get article results. Dropping the slash only returns our legacy redirects. Still useful to know!

contradiction about version of Elasticsearch for 1.39

Aloist (talkcontribs)

The page Extension:CirrusSearch states:

MediaWiki 1.39+ require Elasticsearch 7.10.2

When I download the extension for Mediawiki 1.39 and look in README, it says:

Installation

------------

Get Elasticsearch up and running somewhere. Only Elasticsearch v6.8 is supported.

I would like that to be true, because I have 6.8.23. But is it true?

EBernhardson (WMF) (talkcontribs)

Unfortunately the README is wrong and the wiki page is correct. As linked in the wiki page, there is a compatibility layer that can be activated for 1.39 to talk to 6.8.23, but it is focused on ensuring write compatibility and it's possible you would run into query issues.

Aloist (talkcontribs)

Thank you.

I face the problem of upgrading from 1.35 to 1.39 on RHEL9.

I already established that 1.35 works with 10.5.22-MariaDB, so the database version can remain the same when I switch MediaWiki versions. I expect update.php to do the job for the database wikidb.

But having to upgrade Elasticsearch at the same time as MediaWiki is a problem.

Elasticsearch > 6.8 is not in the Red Hat repositories. I can get 8.x but not 7.10.2.

Can I have two Elasticsearch versions installed at the same time? Like one on port 9200 and another on 9250?

Instructions somewhere?

EBernhardson (WMF) (talkcontribs)

It's technically possible to run multiple versions of Elasticsearch on the same host, but I'm not aware of any documentation to that end. Much would depend on your available infrastructure, and in my experience it generally leads to ongoing complexity. In WMF infra we run multiple instances (of the same version) of Elasticsearch on a single host, and it's led to a number of minor problems and headaches over the years. If you have the ability to spin up virtual machines, one plausible way forward is to spin up a new instance running the newer version. Another option might be to use the Docker container Elastic makes available; those are isolated enough that it should reduce the complexity of running two instances on one host.
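If the newer version ends up as a separate instance on the same host (for example a Docker container published on a different port), CirrusSearch only needs to be pointed at it. A sketch assuming the new 7.10.2 instance is reachable at `localhost:9250`, the port from the question:

```php
<?php
// Point CirrusSearch at the new Elasticsearch 7.10.2 instance.
// 9250 is the port from the question above, not a default.
$wgCirrusSearchServers = [
    [ 'host' => 'localhost', 'port' => 9250 ],
];
```

The new instance starts empty, so the search index has to be built from scratch following the indexing steps in the CirrusSearch README before search works again.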

Aloist (talkcontribs)

Can the person who created the compatibility layer (in the 1.39 version) be reached?

./CirrusSearch/includes/Elastica/ES6CompatTransportWrapper.php

This person might be able to answer about problems it creates.

In my wikis, I have little demands on search. All we do is the very common search for articles or for text inside articles.

May I suggest that extensions/CirrusSearch/README is updated?

Would anyone be able to tell whether 1.39 works with Elasticsearch 8.10.4-1 ?

Ciencia Al Poder (talkcontribs)
EBernhardson (WMF) (talkcontribs)

My teammate DCausse wrote the layer, but if you look inside you can see it is very simple. The problem this compatibility layer solves is a breaking change in the bulk write API of Elasticsearch. It doesn't do anything with search requests. In WMF production we ran the upgrade with one cluster running 7.10 and another running 6.8. As the code that knew how to talk to 7.10 was deployed, it also switched its query endpoint between clusters. Only the write layer required compatibility, because it had to write to both clusters at the same time.

There is a reasonable chance it would work for most simple queries. The general problem is that when Elastic releases a major version update they make a wide variety of breaking changes (see breaking changes list for 7.0). You could test and see what happens to work, but if problems do arise I don't know if there will be much we can do to help you.

CirrusSearch does not update automatically

Davidgbc (talkcontribs)

Hi,

we are currently using Mediawiki 1.39.4, PHP 7.4.33, MariaDB 10.4.12 and Elasticsearch 7.10.2.

I was given the task of updating my company's wiki from version 1.35 to 1.39. Now, however, I am confused about CirrusSearch and Elasticsearch (which we run locally on the server).

On the extension page of CirrusSearch it says you need Elasticsearch version 7.10.2, but in the CirrusSearch README it says that only version 6.8 is supported. Which of these is true?

I followed the steps in the README normally and the search works fine.

But when I create a new article it is not found in Special:Search, the content is not found either.


Please help me... :(

Ciencia Al Poder (talkcontribs)

The README is outdated. The correct version is on the wiki page.

Updates to the search index are triggered by jobs. See Manual:Job queue. Check if jobs are running, or if they're failing, or there's a large backlog of jobs that may delay the indexation of new content.

Davidgbc (talkcontribs)

Thanks for the answer, but shouldn't new pages still be indexed automatically?

We have a separate department in the company that only edits wiki pages and they say it worked with the old version...

If I create a job, wouldn't I have to reindex the database very often? Or what would such a job have to look like so that a new page is found immediately? I'm completely stuck.

Ciencia Al Poder (talkcontribs)

You don't have to create jobs; they're automatically created by MediaWiki (usually after saving an edit or performing any other modification) and placed on the job queue. Then jobs are picked from the queue on subsequent page loads, or by a job runner, depending on how you configured things. See Manual:Job queue for more information.
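Which of the two applies is controlled by `$wgJobRunRate`, the number of jobs run per web request. A sketch showing MediaWiki's default value, under which jobs run during page loads:

```php
<?php
// Number of jobs executed per web request (MediaWiki's default is 1).
// Setting this to 0 stops jobs from running on page loads; only do that if
// maintenance/runJobs.php is run separately (e.g. from cron or systemd),
// otherwise search index updates stay in the queue.
$wgJobRunRate = 1;
```

With the default, jobs only run when someone loads a page, so on a wiki with little traffic the queue drains slowly.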

Checking if there are stuck jobs with Manual:showJobs.php and setting custom log groups for exceptions may give you more information.
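For example, the following LocalSettings.php lines route exceptions, errors and CirrusSearch's own log channel to separate files; the paths are placeholders and must be writable by the web server. Running `php maintenance/showJobs.php --group` then lists queued jobs per type, which shows whether search index jobs are piling up.

```php
<?php
// Write exception and error logs, plus the 'CirrusSearch' log channel,
// to their own files. Paths are placeholders.
$wgDebugLogGroups['exception'] = '/var/log/mediawiki/exception.log';
$wgDebugLogGroups['error'] = '/var/log/mediawiki/error.log';
$wgDebugLogGroups['CirrusSearch'] = '/var/log/mediawiki/cirrussearch.log';
```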

insource search by default

Sphynkx (talkcontribs)

Maybe useful for somebody.

Modification in extensions/CirrusSearch/includes/Searcher.php (function buildFullTextSearch( $term ) ) for search in insource mode as default:

@@ -294,11 +294,14 @@
                // whitespace. Cirrussearch treats them both as normal whitespace, but
                // the preceding isn't appropriately trimmed.
                // No searching for nothing! That takes forever!
+               global $wgInSourceSearchDefault;
                $term = trim( str_replace( "\xE3\x80\x80", " ", $term ) );
                if ( $term === '' ) {
                        $this->searchContext->setResultsPossible( false );
                }
-
+               if ( isset( $wgInSourceSearchDefault ) && $wgInSourceSearchDefault === true ) {
+                       $term = "insource:" . $term;
+               }
                $builderSettings = $this->config->getProfileService()
                        ->loadProfileByName( SearchProfileService::FT_QUERY_BUILDER,
                                $this->searchContext->getFulltextQueryBuilderProfile() );

Also set in LocalSettings.php:

 $wgInSourceSearchDefault = true;

Feature request: it would be nice to have a checkbox on the search page for insource searching.

Ciencia Al Poder (talkcontribs)

It would probably be easier, and less prone to breaking on upgrade, to add a JavaScript gadget that automatically prepends insource: to the search term when the form is submitted.

Indexing WIKI after a database restore

Raoufgui (talkcontribs)

Hello

I have two MW servers that work fine:

1 - production server

2- backup server

I will restore the database backed up from the production server onto the second server in order to put the second server into service.

The database on the production server is more recent and contains more data.


After each restore of the DB and running the update script :

- Should I build the index from scratch on the second server?

- If not, will the new data (the difference between the two databases) be indexed automatically, or should I run a specific script to index it?

- How can I confirm that all data is indexed on the second server and that I will get the same search results as on the first server?

NB: I'm using CirrusSearch plugin and elasticsearch

Thanks

DCausse (WMF) (talkcontribs)

I'm assuming here that all the Mediawiki dependencies are running on the same server: PHP, your database and elasticsearch, if not please be careful, especially if your elasticsearch cluster is shared between your production and backup installation.

If this is the case, when restoring a database backup you should also reindex everything from scratch. The same way that your relational database will get erased by restoring the backup, elasticsearch also needs to be reset based on the new content of the restored database. This is the easiest and safest solution.

There is no way to ensure that the same query will return identical results on two different Elasticsearch servers. The reason is that ranking uses some statistics that will certainly differ even if the documents are the same. What you could do is run some sanity checks, e.g. counting the number of indexed documents on both Elasticsearch servers to make sure the numbers are close.
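A minimal PHP sketch of that sanity check, using the Elasticsearch `_count` API. It assumes both servers are reachable over plain HTTP without authentication and that the indices follow CirrusSearch's `<wiki id>_content` / `<wiki id>_general` naming; the wiki ID `wikidb`, the host names, the 1% tolerance and all function names are hypothetical:

```php
<?php
// Hypothetical helper: number of documents in one index, via GET /<index>/_count.
function countDocs( string $baseUrl, string $index ): int {
    $url = rtrim( $baseUrl, '/' ) . '/' . rawurlencode( $index ) . '/_count';
    $body = file_get_contents( $url );
    if ( $body === false ) {
        throw new RuntimeException( "Request to $url failed" );
    }
    $data = json_decode( $body, true, 512, JSON_THROW_ON_ERROR );
    return (int)$data['count'];
}

// True when the two counts differ by at most $tolerance (a fraction of the larger one).
function countsAreClose( int $a, int $b, float $tolerance = 0.01 ): bool {
    $larger = max( $a, $b );
    if ( $larger === 0 ) {
        return true;
    }
    return abs( $a - $b ) / $larger <= $tolerance;
}

// Print the per-index counts of both servers and whether they are close.
function compareClusters( string $prodUrl, string $backupUrl, array $indices ): void {
    foreach ( $indices as $index ) {
        $prod = countDocs( $prodUrl, $index );
        $backup = countDocs( $backupUrl, $index );
        $verdict = countsAreClose( $prod, $backup ) ? 'OK' : 'MISMATCH';
        printf( "%s: %d vs %d %s\n", $index, $prod, $backup, $verdict );
    }
}
```

For example: `compareClusters( 'http://prod-host:9200', 'http://backup-host:9200', [ 'wikidb_content', 'wikidb_general' ] );`. The tolerance is deliberately loose, since edits made on production after the backup will not be in the restored copy.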

Raoufgui (talkcontribs)
Novem Linguae (talkcontribs)

I was googling to see what the CompletionSuggester algorithm is. I found the page Extension:CirrusSearch/CompletionSuggester and it says The algorithm used to rank suggestions is still under development. Could someone knowledgeable consider updating that to describe the current algorithm? Thank you.

Novem Linguae (talkcontribs)
DCausse (WMF) (talkcontribs)

Thanks for the edit! I removed this bullet from the Limitations and added a more detailed section Ranking criteria.