Extension talk:CirrusSearch


About this board

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on Phabricator.

Newer elasticsearch with mw 1.34?

TazzyTazzy (talkcontribs)

Will MediaWiki 1.34 use an updated version of Elasticsearch? Currently, 6.5.x is supported, which came out in late 2018. It would be great to support the 7.x series, but even a bump to the 6.8.x series would be amazing.

Is there a roadmap somewhere?

EBernhardson (WMF) (talkcontribs)

There is no particular roadmap for Elasticsearch versions. Unfortunately, Elasticsearch is an ever-moving target, and upgrades take a significant (multiple-month) time investment. Currently I'm not aware of anything in the later Elasticsearch versions that makes that investment worthwhile.

CirrusSearch not working, nor meeting bugs

Neil9830409 (talkcontribs)

On my machine, ElasticSearch is installed.

In Special:Version, it shows that both Elastica and CirrusSearch have been installed already.

However, clicking the "(?)Help" link beside the search bar takes me to "Help:Searching" instead of "Help:CirrusSearch", and none of CirrusSearch's features seem to work; the only sign of the extension is its entry on Special:Version.

What might be the hidden problem? Did I fail to install the prerequisites or the CirrusSearch extension correctly?

TeeDizzle (talkcontribs)

You might want to check your CirrusSearch extension version. I ran into the same problem after upgrading from Ubuntu 18.04 LTS to 19.04, and from MediaWiki 1.27 to 1.31, while setting up PDF indexing and search. For unsupported MediaWiki versions (1.27 on Ubuntu 18.04 LTS), the CirrusSearch installation instructions recommended trying the extension release for the latest MediaWiki version available (1.33). Ubuntu 19.04, however, ships with MediaWiki 1.31. After switching from the 1.33 release of CirrusSearch to the one matching MediaWiki 1.31, it started working like a charm.

Ciencia Al Poder (talkcontribs)

On my wiki, the help link still points to Help:Searching, yet CirrusSearch is fully functional with all its features.

Does a simple search work? Please describe the results you get.


Class 'Elastica\Client' not found

Summary by Lemonlvor

Had to run Composer in the Elastica directory, even though I didn't install from git.

Lemonlvor (talkcontribs)

Here's what I'm running:

MediaWiki 1.33.0
PHP 7.2.19-0ubuntu0.18.04.2 (apache2handler)
MySQL 5.7.27-0ubuntu0.18.04.1


I have followed the steps on the extension page:

- Installed Elasticsearch 6.5.4 per the instructions for MediaWiki 1.33.0.
- Installed Elastica; it shows up on Special:Version.
- Installed CirrusSearch; after adding require_once "$IP/extensions/CirrusSearch/CirrusSearch.php"; to LocalSettings.php, Special:Version no longer loads.


The following error is displayed on Special:Version:

[6615ca91ab54427a72fcf3c5] 2019-08-20 14:52:28: Fatal exception of type "Error"


When I attempt to run the first step of the CirrusSearch instructions, this happens:

hincb@cbwiki2:/var/www/html/mediawiki$ php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
indexing namespaces...
[c8fe2a117d6f3ed5dab3e395] [no req]   Error from line 90 of /var/www/html/mediawiki/extensions/Elastica/includes/ElasticaConnection.php: Class 'Elastica\Client' not found
Backtrace:
#0 /var/www/html/mediawiki/extensions/Elastica/includes/ElasticaConnection.php(62): ElasticaConnection->getClient()
#1 /var/www/html/mediawiki/extensions/CirrusSearch/includes/Connection.php(133): ElasticaConnection->setConnectTimeout(integer)
#2 /var/www/html/mediawiki/extensions/CirrusSearch/includes/Connection.php(113): CirrusSearch\Connection->__construct(CirrusSearch\SearchConfig, string)
#3 /var/www/html/mediawiki/extensions/CirrusSearch/includes/Maintenance/Maintenance.php(115): CirrusSearch\Connection::getPool(CirrusSearch\SearchConfig, string)
#4 /var/www/html/mediawiki/extensions/CirrusSearch/includes/Maintenance/Maintenance.php(224): CirrusSearch\Maintenance\Maintenance->getConnection()
#5 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/indexNamespaces.php(35): CirrusSearch\Maintenance\Maintenance->maybeCreateMetastore()
#6 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(54): CirrusSearch\Maintenance\IndexNamespaces->execute()
#7 /var/www/html/mediawiki/maintenance/doMaintenance.php(96): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#8 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(70): require_once(string)
#9 {main}


Here's verification of my ElasticSearch version:

hincb@cbwiki2:/var/www/html/mediawiki$ curl -X GET "localhost:9200/?pretty"
{
  "name" : "Og7QKNt",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "geDUNZn5Tk63F6wjWs8Smg",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}


Any ideas why this isn't working? I appear to have the correct versions of everything installed. I've previously configured ElasticSearch 5.4.3 on MediaWiki 1.30.0 and didn't have any issues at all so I generally know what I'm doing until I start getting errors like this. Thanks.
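The version check above can also be scripted. The following is a minimal Python sketch, not part of CirrusSearch or Elastica: it reads the JSON returned by the Elasticsearch root endpoint (the same response as the `curl -X GET "localhost:9200/?pretty"` output shown above) and compares the major.minor part of `version.number` with the series a CirrusSearch release expects. The helper names are hypothetical, and treating "same major.minor" as the compatibility rule is an assumption; check the extension page for the exact requirement of your branch.

```python
import json
import urllib.request


def fetch_root_info(url: str = "http://localhost:9200/") -> str:
    """Return the raw JSON body of the Elasticsearch root endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def es_series(root_json: str) -> str:
    """Extract the "major.minor" series, e.g. "6.5" from a version.number of "6.5.4"."""
    number = json.loads(root_json)["version"]["number"]
    return ".".join(number.split(".")[:2])


def es_version_matches(root_json: str, expected_series: str) -> bool:
    """Assumed compatibility rule: same major.minor series as the one required."""
    return es_series(root_json) == expected_series
```

With Elasticsearch running locally, `es_version_matches(fetch_root_info(), "6.5")` would return True for the 6.5.4 response above.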

Ciencia Al Poder (talkcontribs)

Maybe you need to run Composer in the Elastica directory so its dependencies are installed, even though the installation instructions say that's only necessary when installing from git.
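Whether Composer has actually been run for Elastica can be checked from the filesystem. The Python sketch below is a hypothetical diagnostic, not a MediaWiki tool. It assumes the standard Composer layout, in which running `composer install` inside extensions/Elastica creates vendor/autoload.php and places the Elastica PHP library (Composer package ruflin/elastica) under vendor/ruflin/elastica. If either path is missing, a "Class 'Elastica\Client' not found" error is consistent with that.

```python
from pathlib import Path
from typing import List


def missing_elastica_deps(mw_root: str) -> List[str]:
    """Return the expected Composer artifacts that are missing under extensions/Elastica.

    Assumption: `composer install` was (or should have been) run inside
    extensions/Elastica, producing vendor/autoload.php and vendor/ruflin/elastica.
    """
    ext_dir = Path(mw_root) / "extensions" / "Elastica"
    expected = [
        ext_dir / "vendor" / "autoload.php",      # Composer's generated autoloader
        ext_dir / "vendor" / "ruflin" / "elastica",  # the Elastica PHP library itself
    ]
    return [str(p) for p in expected if not p.exists()]
```

For the setup in this thread, `missing_elastica_deps("/var/www/html/mediawiki")` returning a non-empty list would point to Composer not having been run in extensions/Elastica.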

Lemonlvor (talkcontribs)

I'll try it.

Running sendData on cluster default 0s after insertion

S0ring (talkcontribs)

CirrusSearch is installed and running. The following line in LocalSettings.php writes CirrusSearch's log messages to a file:

$wgDebugLogGroups['CirrusSearch'] = "$IP/extensions/CirrusSearch/error.log";


and the following message is frequently recorded:

root@da0488489991:/var/www/html# more extensions/CirrusSearch/error.log
...
2019-08-15 10:46:29 da0488489991 kimmw: Running sendData on cluster default 0s after insertion
2019-08-15 10:46:29 da0488489991 kimmw: Running sendData on cluster default 0s after insertion
2019-08-15 10:46:31 da0488489991 kimmw: Running sendData on cluster default 0s after insertion
2019-08-15 10:46:31 da0488489991 kimmw: Running sendData on cluster default 0s after insertion
2019-08-15 10:46:32 da0488489991 kimmw: Running sendData on cluster default 0s after insertion
...


What does it mean?

DCausse (WMF) (talkcontribs)

These are debug messages that you can safely ignore.


Search backend error during full_text search: number_format_exception

S0ring (talkcontribs)

Hi,


After installing CirrusSearch, I get the following error (similar to Topic:U9x3b2ub76pdtfrs):

2019-08-02 13:04:55 0292f8cece5d wiki: Search backend error during full_text search for 'science' after 419: number_format_exception: For input string: "0,5"


Here is my configuration:

MediaWiki 1.31.3
PHP 7.2.19
MySQL 5.7.26
Elasticsearch 5.6.0
CirrusSearch 0.2
Elastica 1.3.0.0


Is there a fix for this issue?


S0ring (talkcontribs)

The problem disappeared after changing $wgShellLocale from de_DE.utf8 to en_EN.utf8.
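The "0,5" in the error message is the number 0.5 written with a decimal comma, the convention of German locales such as de_DE. The locale change above suggests that, under that $wgShellLocale, a number ended up serialized with a comma, and Elasticsearch (a Java application) rejected it with number_format_exception because its number parsing expects a dot. The Python sketch below only illustrates that parsing failure; `render_with_decimal_comma` is a hypothetical stand-in for locale-aware formatting, not CirrusSearch code.

```python
def render_with_decimal_comma(value: float) -> str:
    """Hypothetical: mimic how a de_DE-style locale writes a decimal number."""
    return repr(value).replace(".", ",")


def parses_as_number(text: str) -> bool:
    """True if `text` parses under dot-decimal rules, as a Java number parser expects."""
    try:
        float(text)
        return True
    except ValueError:
        return False


weight = 0.5
print(render_with_decimal_comma(weight))                    # 0,5
print(parses_as_number(str(weight)))                        # True
print(parses_as_number(render_with_decimal_comma(weight)))  # False
```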

DCausse (WMF) (talkcontribs)

This issue should have been fixed some time ago, see phab:T189877.


How do I enable Completion Suggester?

Afiqyazid (talkcontribs)

Hi, can anybody help me enable the completion suggester feature on my wiki? The completion suggester options also do not appear in Special:Preferences#mw-prefsection-searchoptions. Is there a step I missed? Thanks for any help.

Nirobbins (talkcontribs)

I also struggled to get the suggester working on the latest version. Without knowing your exact setup, this is what worked for me:


LocalSettings.php:

$wgCirrusSearchUseCompletionSuggester = 'yes';
$wgCirrusSearchCompletionSettings = 'fuzzy-subphrases';
$wgCirrusSearchPhraseSuggestProfiles = 'default';
$wgCirrusSearchCompletionSuggesterSubphrases = [
   'build' => true,
   'use' => true,
   'type' => 'anywords',
   'limit' => 10,
];
$wgCirrusSearchCompletionSuggesterUseDefaultSort = true;

You will also need to run <wiki directory>/extensions/CirrusSearch/maintenance/updateSuggesterIndex.php at least once. I ended up setting a cron job to run the updater script nightly.

S0ring (talkcontribs)

Using your configuration I got the following error:

Exception caught: Config entry CirrusSearchPhraseSuggestProfiles must be an array or unset

so I simply unset $wgCirrusSearchPhraseSuggestProfiles.


Issue: Can't sort by creation date (create_timestamp_asc )

197.235.55.190 (talkcontribs)

Steps to reproduce:

1. Go to www.mediawiki.org/w/index.php?sort=create_timestamp_asc&search=monkey

Expected: pages sorted by ascending creation date.
Actual: the error "An error has occurred while searching: Query was not understood. Please make it simpler. The query was logged to improve the search system."

Note: This happens on this very wiki, it doesn't seem to affect others.

EBernhardson (WMF) (talkcontribs)

It looks like some fields were missing in the search index for mw.org. I've started up a process to fix them, and will need to come up with some process to check the rest of the wikis. It will probably take a few hours before the new index is promoted.

197.235.66.41 (talkcontribs)

It seems to be working again for this wiki. Thanks.


Search inside uploaded documents

Xaris~mediawikiwiki (talkcontribs)

Question: Can this extension search inside documents that have been uploaded to the wiki, such as PDFs?

Ricordisamoa (talkcontribs)
NEverett (WMF) (talkcontribs)

I've just backported most of the features in Cirrus' master branch to the REL1_22 branch, including this. If you want to try it make sure to get the new version of the Elastica plugin on its REL1_22 branch as well and rebuild your index.

2.82.64.19 (talkcontribs)

Do we need to force index the pdf files? I'm seeing no results from pdfs.

Nemo bis (talkcontribs)

First of all, try a null edit and wait some time (at most a few hours) for the job queue to run; report back if that isn't enough.

Chris d edge (talkcontribs)

I've been working on a method that parses document files (PDFs, Word, PPT, etc.) using Tika to extract the document text, and then re-insert the extracted text into the file_text field of the WIKI_general_first index inside Elasticsearch. On this point, I have a couple of questions: 1) Does this sound like the proper method to provide searchable text from documents in CirrusSearch? 2) Has anyone else done anything similar?

On point 2, the reason I ask is that for some documents the extracted text can be huge (hundreds of MBs), which can grind search to a halt for some queries (mostly for terms that are rare in the index).

Any pointers would be greatly appreciated.

SmartK (talkcontribs)
173.164.76.121 (talkcontribs)

Hello Everyone,

I have just added the CirrusSearch extension with all its dependencies, but I am not able to search through PDF, txt, and docx files. Please help!

Regards,

Dgennaro (talkcontribs)

I am also not able to index documents. I have Image Authorization configured.

Andreas Plank (talkcontribs)

I got PDF search working but not for *.doc files (PDF search works on MW 1.26.2 and MW 1.28.2). You need at least to

  1. Does anybody have a solution for searching inside *.doc files yet?
  2. Did I miss some configuration?
  3. Would it need a FileHandler for .doc files to get it working?
SmartK (talkcontribs)
S0ring (talkcontribs)

Error valid SPARQL endpoint to use deep category search

Mikael44115 (talkcontribs)

Hello,

I can’t get CirrusSearch to work. When I run a search, I get this message:

A warning has occurred while searching: $wgCirrusSearchCategoryEndpoint should be set to a valid SPARQL endpoint to use deep category search.


My configuration is :

Product | Version
MediaWiki | 1.32.0
PHP | 7.2.14 (apache2handler)
MySQL | 5.7.24
ICU | 63.1
Elasticsearch | 5.6.16

Extension | Version | License | Description | Authors
AdvancedSearch | 0.1.0 (9bbb17d) 22:15, 15 October 2018 | GPL-2.0-or-later | Easy access to advanced search capabilities on Special:Search | Thiemo Kreuz, Gabriel Birke, Tonina Zhelyazkova and Christoph Jauera
CirrusSearch | 0.2 (b1fa4bd) 13:47, 20 February 2019 | GPL-2.0-or-later | Elasticsearch-powered search for MediaWiki | Nik Everett, Chad Horohoe, Erik Bernhardson and others
Elastica | 1.3.0.0 (9fcf88c) 09:09, 11 October 2018 | GPL-2.0-or-later | Base Elasticsearch functionality for other extensions by providing Elastica library | Nik Everett and Chad Horohoe

Thank you for helping me.


Mikael

MadX (talkcontribs)

Was anyone able to get this working? I am encountering the same issue.

Searching on DISPLAYTITLE

82.3.50.193 (talkcontribs)

Hi,

On our private wiki, we removed the restriction on DISPLAYTITLE because of the types of pages we are producing. Over the last few days I have been trying to modify CirrusSearch so that it searches on the display title. I'm not entirely sure this is possible, but I noticed references to extracting displaytitle, and the schema mentions it as well. Has anyone attempted this before?

DCausse (WMF) (talkcontribs)