Extension talk:CirrusSearch

About this board

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on phabricator.

Elasticsearch version for upcoming MediaWiki 1.35

2
Cboltz (talkcontribs)

Which version of elasticsearch will be required with the upcoming MediaWiki 1.35?


Latest Elasticsearch release is 7.8.1, but the CirrusSearch README still says "6.5.4 or higher". Does the "higher" also include 7.8.x, or should I stay with 6.5.x?

DCausse (WMF) (talkcontribs)

Elasticsearch 7.x is not yet supported. WMF is running 6.5.4, but I believe newer versions in the 6.x branch might work as well. If you plan to use the WMF Elasticsearch plugins, then I'd suggest using 6.5.4.
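The rule in the reply above can be sketched as a small check. This is only an illustration: it reads "6.5.4 or higher" as "same major branch, not older than 6.5.4", and the function names `parse_version` and `is_supported` are hypothetical, not part of CirrusSearch. Newer 6.x releases are reported above as something that "might work", so a True result here is not a guarantee.

```python
import re


def parse_version(version):
    """Parse a 'major.minor.patch' string into a tuple of three ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version, minimum="6.5.4"):
    """Return True if `version` is in the same major branch as `minimum`
    and is not older than it (assumed reading of '6.5.4 or higher')."""
    found = parse_version(version)
    floor = parse_version(minimum)
    return found[0] == floor[0] and found >= floor


for candidate in ("6.5.4", "6.8.0", "7.8.1", "5.6.16"):
    print(candidate, is_supported(candidate))
```

The version string to test would be the "number" field returned by the Elasticsearch root endpoint (for example via curl on port 9200).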

Reply to "Elasticsearch version for upcoming MediaWiki 1.35"

Search result: totalhits count is miss matched from total result row

4
147.1.18.25 (talkcontribs)

Result mismatch.

I need some help here: the totalhits count does not match the number of result rows. Why are they different? Is there some configuration I need to run or set?

{
    "batchcomplete": "",
    "warnings": {
        "search": {
            "*": "Unrecognized value for parameter \"srnamespace\": LL:."
        }
    },
    "query": {
        "searchinfo": {
            "totalhits": 6
        },
        "search": [
            {
                "ns": 3024,
                "title": "Debris Gas System",
                "pageid": 6048,
                "size": 2243,
                "wordcount": 291,
                "snippet": "Debris was found in the fuel gas system",
                "timestamp": "2018-10-24T17:46:48Z"
            }
        ]
    }
}

DCausse (WMF) (talkcontribs)

totalhits is the total number of hits found in Elasticsearch. It can differ from the number of rows you see for several reasons:

  1. The number of pages returned is controlled by a limit parameter (e.g. srlimit). In your example, if you've set srlimit=1, this result looks perfectly normal.
  2. The index is not up to date. If for some reason the process responsible for keeping the index up to date did not function properly, then some pages that have been deleted might still be in the Elasticsearch index. They are filtered out when the results are displayed to the user, which leads to such inconsistencies between totalhits and what you see in the results.

If you believe you are affected by the second point, try to re-sync your index using maintenance/saneitize.php (or Saneitize.php if using a recent version of CirrusSearch). If this fixes the issue, you should try to understand what happened so that it does not happen again.
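The two explanations above can be told apart with a rough check on an API response. This is a sketch only: the function name `explain_search_counts` and its return labels are made up for illustration, and it ignores other causes of a gap (such as paging with an offset).

```python
def explain_search_counts(response, srlimit):
    """Classify the gap between totalhits and the rows actually returned.

    `response` is the decoded JSON of an action=query&list=search call;
    `srlimit` is the limit that was sent with the request.
    """
    query = response.get("query", {})
    total = query.get("searchinfo", {}).get("totalhits", 0)
    returned = len(query.get("search", []))

    if returned == total:
        return "consistent"
    if returned < total and returned >= srlimit:
        # The page of results is full, so the limit explains the gap.
        return "limited-by-srlimit"
    # Fewer rows than both the limit and totalhits: the index may be stale.
    return "possible-stale-index"


sample = {
    "query": {
        "searchinfo": {"totalhits": 6},
        "search": [{"title": "Debris Gas System", "pageid": 6048}],
    }
}
print(explain_search_counts(sample, srlimit=1))   # limited-by-srlimit
print(explain_search_counts(sample, srlimit=10))  # possible-stale-index
```

Only the second outcome would suggest re-syncing the index as described above.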

Rajeshrajesh.35 (talkcontribs)

@DCausse (WMF) - Thanks for your response.


I couldn't find saneitize.php. Note that I am using MW 1.31.

DCausse (WMF) (talkcontribs)

Reply to "Search result: totalhits count is miss matched from total result row"

Expose top searches

PhotographerTom (talkcontribs)

Is there a way to see the top searches that have been performed over a period of time?

EBernhardson (WMF) (talkcontribs)

CirrusSearch doesn't have any similar functionality. It does have low-level logging which could be batch processed to aggregate the top searches, but from CirrusSearch's perspective it logs the request and never thinks about it again.
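As a rough idea of what such batch processing might look like, here is a sketch that counts queries from log lines. The log format is an assumption made purely for illustration (one JSON object per line with hypothetical "timestamp" and "query" fields); the actual CirrusSearch log format is not described in this thread and would need to be checked before adapting this.

```python
import json
from collections import Counter
from datetime import datetime


def top_searches(log_lines, start, end, n=10):
    """Return the n most frequent queries logged in [start, end).

    Assumes each line is a JSON object with an ISO 8601 'timestamp'
    (without a timezone suffix) and a 'query' string. Lines that do
    not fit this assumed shape are skipped.
    """
    counts = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
            when = datetime.fromisoformat(entry["timestamp"])
            query = entry["query"].strip().lower()
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # malformed or unexpected line
        if start <= when < end and query:
            counts[query] += 1
    return counts.most_common(n)


sample_log = [
    '{"timestamp": "2020-04-05T18:03:43", "query": "Windows sleep"}',
    '{"timestamp": "2020-04-06T09:00:00", "query": "windows sleep"}',
    '{"timestamp": "2020-04-06T10:00:00", "query": "kanban"}',
    "not json",
    '{"timestamp": "2020-05-01T00:00:00", "query": "kanban"}',
]
print(top_searches(sample_log, datetime(2020, 4, 1), datetime(2020, 5, 1)))
```

Queries are lower-cased before counting so that "Windows sleep" and "windows sleep" are treated as the same search; whether that is desirable depends on the wiki.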

Kghbln (talkcontribs)

I guess this is a feature request that could be added as a task to Phabricator?

EBernhardson (WMF) (talkcontribs)

In my opinion it wouldn't really be a feature request for CirrusSearch. This seems more like asking for an analytics platform to be built into MediaWiki, which CirrusSearch could then piggy-back off of to provide analytics over search requests such as top queries.

Reply to "Expose top searches"

How to list more than 1 result from a wiki page

2
Chachacha2020 (talkcontribs)

Hi, I'm using


MediaWiki     1.27.1

PHP     5.5.9-1ubuntu4.22 (apache2handler)

MySQL     5.5.53-0ubuntu0.14.04.1

ICU     52.1

Elasticsearch     1.7.5

and I'm fairly pleased with the search results. However, I have a problem. My wiki has a page "Windows tip" with two headings, "windows can't sleep" and "Windows wake from sleep". A search for "windows sleep" only returns "Windows wake from sleep"; the next result comes from another page. How can I list more than one result from a single wiki page?

PS: I can code a bit, so if this feature is not available I can contribute.

Kghbln (talkcontribs)

I believe this is a feature request that could be added as a task to Phabricator. The developers are not so active on the wiki so a task at Phabricator will eventually draw more attention to it.

Reply to "How to list more than 1 result from a wiki page"

[8a4e47bbf50dc37d2271edc5] 2020-04-05 18:03:43: Fatal exception of type "Error" 

3
2003:EE:AF2B:6326:D826:7BD0:EDD2:3D0C (talkcontribs)

Hello everyone! 

I just installed MediaWiki with CirrusSearch and Elasticsearch, following the README file. I uploaded some pages from the enwikipedia XML dump and wanted to search for a page that I knew was certainly among them. But then I got this error:

[8a4e47bbf50dc37d2271edc5] 2020-04-05 18:03:43: Fatal exception of type "Error" 

Does anybody have a clue how to fix this?

Thank you in advance

Ciencia Al Poder (talkcontribs)

Temporarily set $wgShowExceptionDetails = true; in LocalSettings.php to view a more detailed error message.

Kghbln (talkcontribs)

Obviously no longer an issue.

Summary by Kghbln

Let's face reality: One needs to set up jobs with Redis.

Farvardyn (talkcontribs)

Hi,

  1. If no cache system like Redis is available, what should I do in order to use CirrusSearch without job problems and avoid the 'Notice'?
  2. As for the JDK installation, where on the server should I install its Linux binaries?


EBernhardson (WMF) (talkcontribs)

CirrusSearch and Elasticsearch are generally complex software to install and maintain, so I would only suggest using them in a fairly advanced scenario. CirrusSearch makes heavy use of the job queue and will likely only work with a full job queue implementation (like the Redis one) installed. Essentially, add a job queue to the list of requirements; it's just as essential as Elasticsearch.

Kghbln (talkcontribs)

Thanks for your assessment!

Java version compatibility

Spas.Z.Spasov (talkcontribs)

I've moved our Elasticsearch server to another machine running Ubuntu 20.04, on which I installed the package default-jre (which contains Java 11) and the package elasticsearch-6.5.4.deb. The MediaWiki version is 1.34, and port 9200 is forwarded via SSH.

Then I rebuilt the search index according to the instructions provided in the README file. Everything went well.

Unfortunately, when I tried to use the wiki's search feature via the web interface, I received the message: We could not complete your search due to a temporary problem. Please try again later.

After a while, I found that the Elasticsearch service was dead, for the following reason:

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely...

So, in order to get Elasticsearch operational, I switched to Java 8 by using the following commands:

sudo apt install openjdk-8-jre-headless 
sudo apt install openjdk-8-jdk-headless
sudo update-alternatives --config java
sudo update-alternatives --config javac
sudo systemctl restart elasticsearch.service 
curl 'http://127.0.0.1:9200' # do a test

Now everything works great!

I do not know which is the troublemaker, Extension:CirrusSearch or the Elasticsearch service, but I think it would be worthwhile to include some additional information about compatibility with the different Java versions.

Regards! Spas Spasov

Kghbln (talkcontribs)

Thanks a lot for sharing this information. Indeed, having a Java compatibility overview would be great. Perhaps it is already there in the abyss of the extension's (code) documentation.

Anyhow the bottom line appears to be that Java 8 is required for recent versions of Cirrus.

Spas.Z.Spasov (talkcontribs)

@Kghbln, I'm happy to do that :) Here is small update:

With my setup, elasticsearch-6.5.4.deb constantly crashes after a few hours of work. So I switched back to elasticsearch-5.6.16.deb, which has now been running for about a week without problems or any need for a restart, even though the extension's documentation states that MediaWiki 1.33.x and 1.34.x require Elasticsearch 6.5.x.

Another thing I remembered: when I started to use Extension:CirrusSearch, I wasn't able to build the initial search index until I changed the MySQL database name from myWiki to my_wiki (without capital letters).

Kghbln (talkcontribs)

Thanks again for keeping us updated about your experience. I find it strange that ES 6.5.4 works with JDK 8 on Ubuntu 18.04 without issues, whereas it appears that you need to use ES 5.6.16 with JDK 8 on Ubuntu 20.04. However, you cannot beat reality. Did you track down why ES was failing? That would probably be good to know and report.

The compatibility table in the documentation is based on what the developers of CirrusSearch think it should work with. If there is unofficial compatibility, this is even better.

About the database name: I also ran into this earlier and found a then-undocumented configuration parameter. Thus you could have set $wgCirrusSearchIndexBaseName = 'mywiki'; to avoid renaming the database. I added a note about it directly to the extension's page rather than linking to many places to explore the whole lot. :)
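The database-name problem is consistent with Elasticsearch's index naming restrictions, which among other things require lowercase names. The sketch below checks a candidate base name against a subset of those restrictions as I understand them. It assumes the index name is derived from the database name unless $wgCirrusSearchIndexBaseName is set, as the exchange above suggests; the function name `is_valid_index_name` is hypothetical.

```python
# Characters Elasticsearch rejects in index names (subset of the rules).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name):
    """Check a candidate index base name against common Elasticsearch rules."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():
        return False  # index names must be lowercase
    if name[0] in "-_+":
        return False  # must not start with -, _ or +
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255


for candidate in ("myWiki", "my_wiki", "mywiki"):
    print(candidate, is_valid_index_name(candidate))
```

Under these rules "myWiki" is rejected because of the capital letter, while both "my_wiki" and "mywiki" pass, which matches the two workarounds reported in this thread.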

Reply to "Java version compatibility"

[166e71ff89c6a092549ca318] [no req] MWException from line 310 of mediawiki\includes\parser\ParserOutput.php: Bad parser output text. 5.

2
Summary by Legaulph

Converted database collation to latin1_bin

Legaulph (talkcontribs)
MediaWiki   1.31.7
PHP 7.3.17 (apache2handler)
MySQL   8.0.18
Elasticsearch   5.6.16

I updated from MediaWiki 1.31.1 to MediaWiki 1.31.7 and now I'm seeing this error with Elasticsearch. I added code to ParserOutput.php to display the page name; however, I can't find a page matching "GTS" or "Standard Kanban Board in Leankit" that makes sense. I also queried https://server/api.php?action=query&prop=info&pageids=5716 and did not see anything like what is displayed.


[ mykaidevdb] Indexed 10 pages ending at 5716 at 13/second
[166e71ff89c6a092549ca318] [no req] MWException from line 310 of D:\Bitnami\wampstack\apps\mediawiki\includes\parser\ParserOutput.php: Bad parser output text. 5. GTS’ Standard Kanban Board in Leankit
Backtrace:
#0 [internal function]: ParserOutput->{closure}(array)
#1 D:\Bitnami\wampstack\apps\mediawiki\includes\parser\ParserOutput.php(320): preg_replace_callback(string, Closure, string)
#2 D:\Bitnami\wampstack\apps\mediawiki\includes\content\WikiTextStructure.php(152): ParserOutput->getText(array)
#3 D:\Bitnami\wampstack\apps\mediawiki\includes\content\WikiTextStructure.php(225): WikiTextStructure->extractWikitextParts()
#4 D:\Bitnami\wampstack\apps\mediawiki\includes\content\WikitextContentHandler.php(150): WikiTextStructure->getOpeningText()
#5 D:\Bitnami\wampstack\apps\mediawiki\extensions\CirrusSearch\includes\Updater.php(366): WikitextContentHandler->getDataForSearchIndex(WikiPage, ParserOutput, CirrusSearch)
#6 D:\Bitnami\wampstack\apps\mediawiki\extensions\CirrusSearch\includes\Updater.php(204): CirrusSearch\Updater->buildDocumentsForPages(array, integer)
#7 D:\Bitnami\wampstack\apps\mediawiki\extensions\CirrusSearch\maintenance\forceSearchIndex.php(218): CirrusSearch\Updater->updatePages(array, integer)
#8 D:\Bitnami\wampstack\apps\mediawiki\maintenance\doMaintenance.php(94): CirrusSearch\ForceSearchIndex->execute()
#9 D:\Bitnami\wampstack\apps\mediawiki\extensions\CirrusSearch\maintenance\forceSearchIndex.php(679): require_once(string)
#10 {main}

Legaulph (talkcontribs)

Figured out the issue! I was using utf8 for the database; I converted it to latin1_bin and everything started working.

Error valid SPARQL endpoint to use deep category search

3
Mikael44115 (talkcontribs)

Hello,

I can’t get CirrusSearch to work. When I run a search, I get this message:

A warning has occurred while searching: $wgCirrusSearchCategoryEndpoint should be set to a valid SPARQL endpoint to use deep category search.


My configuration is:

Product | Version
MediaWiki | 1.32.0
PHP | 7.2.14 (apache2handler)
MySQL | 5.7.24
ICU | 63.1
Elasticsearch | 5.6.16

Extension | Version | License | Description | Authors
AdvancedSearch | 0.1.0 (9bbb17d) 22:15, 15 October 2018 | GPL-2.0-or-later | Easy access to advanced search capabilities on Special:Search | Thiemo Kreuz, Gabriel Birke, Tonina Zhelyazkova and Christoph Jauera
CirrusSearch | 0.2 (b1fa4bd) 13:47, 20 February 2019 | GPL-2.0-or-later | Elasticsearch-powered search for MediaWiki | Nik Everett, Chad Horohoe, Erik Bernhardson and others
Elastica | 1.3.0.0 (9fcf88c) 09:09, 11 October 2018 | GPL-2.0-or-later | Base Elasticsearch functionality for other extensions by providing Elastica library | Nik Everett and Chad Horohoe

Thank you for helping me.


Mikael

MadX (talkcontribs)

Was anyone able to get this working? I am encountering the same issue.

Revansx (talkcontribs)

I'm getting it too

Reply to "Error valid SPARQL endpoint to use deep category search"

Http error communicating with Elasticsearch

2
Summary by Legaulph

Had the wrong endpoint

Legaulph (talkcontribs)
MediaWiki 1.31.6
PHP 7.3.15 (cgi-fcgi)
MySQL 5.6.41-log
LinkedWiki 3.3.7
CirrusSearch 0.2 (ad9a0d9) 16:24, 17 April 2018
Elastica 1.3.0.0 (7019d96) 20:49, 13 April 2018

CirrusSearch was working, and I can connect to Elasticsearch:

D:\xampp\htdocs\mediawiki>curl server.com:9200 --verbose
*Trying fe80::a8b7:e5c8:323:6b1d:9200...
*TCP_NODELAY set
*Connected to server.com (fe80::a8b7:e5c8:323:6b1d) port 9200 (#0)
> GET / HTTP/1.1
> Host: server.com:9200
> User-Agent: curl/7.68.0
> Accept: */*
>
*Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 328
<
{
 "name" : "nqKWfGO",
 "cluster_name" : "elasticsearch",
 "cluster_uuid" : "nfxRkxclRwat03NRtCZuhA",
 "version" : {
   "number" : "5.6.16",
   "build_hash" : "3a740d1",
   "build_date" : "2019-03-13T15:33:36.565Z",
   "build_snapshot" : false,
   "lucene_version" : "6.6.1"
 },
 "tagline" : "You Know, for Search"
}
*Connection #0 to host server.com left intact

When trying to update, I get:

Fetching Elasticsearch version...
Unexpected Elasticsearch failure.
Http error communicating with Elasticsearch:  Operation timed out.
Legaulph (talkcontribs)

Sorry, this was my error.