Extension talk:CirrusSearch/2022
This page used the Structured Discussions extension to give structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.
Discussion related to the CirrusSearch MediaWiki extension.
See also the open tasks for CirrusSearch on phabricator.
Dependency questions
I'm trying to install "Elastic Search" on a new company wiki.
The description of the installation is confusing and ambiguous.
Questions:
1. Why are two extensions installed instead of one: "Elastica" and "CirrusSearch"?
2. In the description of the installation it is recommended to pay attention to the version of "Elastic Search". Was this not fixed when downloading the extension version?
3. There is a recommendation to install "Elastic Search" as a service in the Docker image container, but also as an extension. Are both steps required or only one of them? JacekGdanski (talk) 10:21, 8 February 2022 (UTC)
- I don't know, Elastica looks like a library to facilitate integration with ElasticSearch
- Elastic Search is external, standalone software that you must install yourself. It's a database system that provides search and indexing functionality, and it is where the current text of all pages of your wiki is indexed for faster search results. MediaWiki and ElasticSearch communicate through web services. Each version of ElasticSearch changes how those web services work, which causes compatibility problems. You must install a version of Elastic Search compatible with the MediaWiki version you're currently using, not the other way round.
- See previous point. Cirrus Search enhances the search functionality by using Elastic Search, while the native MediaWiki search uses a table in the same database as the wiki with very simple search functionality. Ciencia Al Poder (talk) 11:03, 8 February 2022 (UTC)
- Thank you for your answer. That clears up many things.
- 1. Could someone add exactly these words [2] to the description of the extension? Can I do this?
- 2. Where is it described how the external "Elastic Search" service is fed with wiki data? In this description it is simply "magic"; there is no word about the basic mechanism. JacekGdanski (talk) 13:57, 8 February 2022 (UTC)
- What I wrote was a TL;DR, but everything is explained if you follow the links on the page: the ElasticSearch link points to a page describing what ElasticSearch is, and since it's a dependency, when you go to its installation page you'll see it's a separate program. If you feel that this TL;DR is needed, feel free to add it to the page.
- About how it's fed: this is part of the setup instructions (the "Now follow the setup instructions in the CirrusSearch README" part). Ciencia Al Poder (talk) 15:57, 8 February 2022 (UTC)
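Since Elasticsearch is a separate service that MediaWiki reaches over HTTP, it can be asked directly which version it is running before choosing a matching CirrusSearch release. The following is only a minimal Python sketch and not part of CirrusSearch: it assumes Elasticsearch listens on http://localhost:9200 and reads the `version.number` field returned by the root endpoint. The function names and the sample response are made up for illustration.

```python
import json
from urllib.request import urlopen


def version_from_root_response(body: dict) -> str:
    """Extract the version string from the JSON that `GET /` returns."""
    return body["version"]["number"]


def fetch_elasticsearch_version(base_url: str = "http://localhost:9200") -> str:
    """Ask a running Elasticsearch node for its version (needs network access)."""
    with urlopen(base_url, timeout=5) as response:
        return version_from_root_response(json.load(response))


# Trimmed, illustrative response body; a real one has more fields.
sample = {"cluster_name": "elasticsearch", "version": {"number": "6.8.23"}}
print(version_from_root_response(sample))  # prints 6.8.23
```

Only `version_from_root_response` runs here; `fetch_elasticsearch_version` is meant to be called on a machine where the service is actually reachable.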
MW 1.39+ Elasticsearch version?
I can see on the page "MediaWiki 1.33.x - 1.38.x require Elasticsearch 6.5.x - 6.8.x (6.8.23+ recommended)". Are there any plans to bump the Elasticsearch version in MW 1.39 or a future version (current Elasticsearch version is 8.13)? Also, is it likely that with MW 1.39 there will be PHP 8 support (for a setup with CirrusSearch)? Spiros71 (talk) 16:46, 20 April 2022 (UTC)
- > Are there any plans to bump Elasticsearch version in MW 1.39 or future version
- Doubtful that 1.39 will be updated. 1.40 will likely support 7.10.2. Due to licensing changes we will not be continuing with elasticsearch beyond 7.10.2. High probability the opensearch project will be replacing elasticsearch, but not currently decided.
- > Also, is it likely that with MW 1.39 there will be php 8 support
- iirc php 8 support is limited by the library used to talk to elasticsearch 6. Similar to the above, it's most likely going to be in 1.40. EBernhardson (WMF) (talk) 20:53, 26 April 2022 (UTC)
- Thanks for the replies! Apparently, opensearch being a fork, there is not much trouble in the transition. I have read some good things about Vespa too: https://vespa.ai/vespa-elastic-solr Spiros71 (talk) 22:17, 29 April 2022 (UTC)
How to know that elasticSearch and MW communicate ?
Hello,
I'm trying to install CirrusSearch, and I have Elasticsearch running as a service on a Windows server. I think it's running; here is its health.json:
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 13,
"active_shards" : 13,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
But when I search on my wiki, I get a search error that says it's a technical error. So I checked whether CirrusSearch is being used by adding &cirrusDumpQuery to the URL, and I got this JSON:
{
"__main__": {
"description": "full_text search for 'sql'",
"path": "wikig4_content\/page\/_search",
"params": {
"timeout": "20s",
"search_type": "dfs_query_then_fetch"
},
"query": {
"_source": [
"namespace",
"title",
"namespace_text",
"wiki",
"redirect.*",
"timestamp",
"text_bytes"
],
"stored_fields": [
"text.word_count"
],
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"query_string": {
"query": "sql",
"fields": [
"all.plain^1",
"all^0.5"
],
"phrase_slop": 0,
"default_operator": "AND",
"allow_leading_wildcard": true,
"fuzzy_prefix_length": 2,
"rewrite": "top_terms_boost_1024"
}
},
{
"multi_match": {
"fields": [
"all_near_match^2",
"all_near_match.asciifolding^1.5"
],
"query": "sql"
}
}
],
"filter": [
{
"terms": {
"namespace": [
0
]
}
}
]
}
},
"highlight": {
"pre_tags": [
"\ue000"
],
"post_tags": [
"\ue001"
],
"fields": {
"title": {
"type": "fvh",
"number_of_fragments": 0,
"order": "score",
"matched_fields": [
"title",
"title.plain"
]
},
"redirect.title": {
"type": "fvh",
"number_of_fragments": 1,
"order": "score",
"fragment_size": 10000,
"matched_fields": [
"redirect.title",
"redirect.title.plain"
]
},
"category": {
"type": "fvh",
"number_of_fragments": 1,
"order": "score",
"fragment_size": 10000,
"matched_fields": [
"category",
"category.plain"
]
},
"heading": {
"type": "fvh",
"number_of_fragments": 1,
"order": "score",
"fragment_size": 10000,
"matched_fields": [
"heading",
"heading.plain"
]
},
"text": {
"type": "fvh",
"number_of_fragments": 1,
"order": "score",
"fragment_size": 150,
"no_match_size": 150,
"matched_fields": [
"text",
"text.plain"
]
},
"auxiliary_text": {
"type": "fvh",
"number_of_fragments": 1,
"order": "score",
"fragment_size": 150,
"matched_fields": [
"auxiliary_text",
"auxiliary_text.plain"
]
},
"file_text": {
"type": "fvh",
"number_of_fragments": 1,
"order": "score",
"fragment_size": 150,
"matched_fields": [
"file_text",
"file_text.plain"
]
}
},
"highlight_query": {
"query_string": {
"query": "sql",
"fields": [
"title.plain^20",
"redirect.title.plain^15",
"category.plain^8",
"heading.plain^5",
"opening_text.plain^3",
"text.plain^1",
"auxiliary_text.plain^0.5",
"title^10",
"redirect.title^7.5",
"category^4",
"heading^2.5",
"opening_text^1.5",
"text^0.5",
"auxiliary_text^0.25"
],
"phrase_slop": 1,
"default_operator": "AND",
"allow_leading_wildcard": true,
"fuzzy_prefix_length": 2,
"rewrite": "top_terms_boost_1024"
}
}
},
"suggest": {
"text": "sql",
"suggest": {
"phrase": {
"field": "suggest",
"size": 1,
"max_errors": 2,
"confidence": 2,
"real_word_error_likelihood": 0.95,
"direct_generator": [
{
"field": "suggest",
"suggest_mode": "always",
"max_term_freq": 0.5,
"min_doc_freq": 0,
"prefix_length": 2
}
],
"highlight": {
"pre_tag": "\ue000",
"post_tag": "\ue001"
},
"smoothing": {
"stupid_backoff": {
"discount": 0.4
}
}
}
}
},
"stats": [
"suggest",
"full_text",
"full_text_querystring",
"simple_bag_of_words"
],
"rescore": [
{
"window_size": 8192,
"query": {
"query_weight": 1,
"rescore_query_weight": 1,
"score_mode": "multiply",
"rescore_query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "incoming_links",
"modifier": "log2p",
"missing": 0
}
}
]
}
}
}
}
],
"size": 21
},
"options": {
"timeout": "20s",
"search_type": "dfs_query_then_fetch"
}
}
}
Here is a copy of Special:Version:
| Product | Version |
|---|---|
| MediaWiki | 1.37.1 |
| PHP | 8.1.2 (apache2handler) |
| MariaDB | 10.4.22-MariaDB |
| ICU | 70.1 |
| Elasticsearch | 6.8.23 |
It's strange that the wiki shows the version of Elasticsearch...
So how can I know whether Elasticsearch and MW communicate properly?
Any other ideas are appreciated.
Thank you, Nicolas senechal (talk) 11:25, 3 May 2022 (UTC)
- Hello,
- when the search engine is set to CirrusSearch, you will get a red box with a warning message on the search results page if there is a problem with Elasticsearch. Spas.Z.Spasov (talk) 11:37, 3 May 2022 (UTC)
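For anyone who wants to read a cluster health response programmatically rather than by eye, here is a small Python sketch. It is an illustration only: the function name is invented, the field names come from the health output quoted in this thread, and treating every status other than green as a problem is a deliberately strict assumption (a single-node cluster with replicas configured commonly reports yellow). A clean result only says that Elasticsearch itself is healthy, not that the wiki's indices exist or are populated.

```python
import json


def health_problems(health: dict) -> list[str]:
    """List reasons to distrust a cluster health response; empty means none found."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"status is {status!r}")
    if health.get("timed_out"):
        problems.append("health request timed out")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems


# Values copied from the health output quoted in this thread (trimmed).
reported = json.loads("""
{
  "cluster_name": "elasticsearch",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 1,
  "active_shards": 13,
  "unassigned_shards": 0
}
""")

print(health_problems(reported))  # prints []
```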
- Thank you for your quick response. That is what I get, so what do I have to do with Elasticsearch? How can I check whether it works properly?
- There are no errors in my logs. I compared them with the logs of another wiki that I use (and which works), and the Elasticsearch logs are the same except for this line:
[2022-05-02T14:23:04,386][INFO ][o.e.c.r.a.AllocationService] [Y4F2XBY] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[test_content_first][2], [test_content_first][0], [mw_cirrus_metastore_first][0]] ...]).
Nicolas senechal (talk) 12:39, 3 May 2022 (UTC)
- So, I went to http://localhost:9200/_cat/indices?format=json&pretty and my server is OK. The JSON has 4 entries, and my test wiki has the same (and it works), so I don't know what else I can do or where to look to find the cause of this...
- Here is the result of http://localhost:9200/_cat/indices?format=json&pretty:
- Nicolas senechal (talk) 14:50, 4 May 2022 (UTC)
[
  { "health" : "green", "status" : "open", "index" : "test_archive_first", "uuid" : "jQZYnyGUStWWqDVjfLxpHg", "pri" : "4", "rep" : "0", "docs.count" : "0", "docs.deleted" : "0", "store.size" : "1kb", "pri.store.size" : "1kb" },
  { "health" : "green", "status" : "open", "index" : "test_content_first", "uuid" : "x9Y9ACxWSg-oBLxvKbzpjw", "pri" : "4", "rep" : "0", "docs.count" : "5", "docs.deleted" : "1", "store.size" : "44.2kb", "pri.store.size" : "44.2kb" },
  { "health" : "green", "status" : "open", "index" : "mw_cirrus_metastore_first", "uuid" : "rIRWtNZ_T6GxuLrKH6lstw", "pri" : "1", "rep" : "0", "docs.count" : "25", "docs.deleted" : "6", "store.size" : "15.4kb", "pri.store.size" : "15.4kb" },
  { "health" : "green", "status" : "open", "index" : "test_general_first", "uuid" : "39kWAi7cSnyME6R0BWlhyQ", "pri" : "4", "rep" : "0", "docs.count" : "21", "docs.deleted" : "4", "store.size" : "192kb", "pri.store.size" : "192kb" }
]
I tested with my production MediaWiki settings; on my test wiki everything is OK. So if it's not the server, not the wiki, and not the communication between the server and the wiki, the only things I can think of are a server response problem or the server not indexing the pages from the database. How can I test that? How can I check the connection between the database and Elasticsearch? After searching on Google, I couldn't find any such test for MW.
- So I followed UPGRADE and now I don't get any error (yeah), but I get no results. I think I should index, but doesn't the first part of the upgrade already do that?
- I got a warning with the second part. I don't know whether it's important or not, so I skipped it.
- Here is the warning, which points to my LocalSettings.php:
# php metastore.php --upgrade
PHP Warning: Undefined array key "REMOTE_ADDR" in D:\WikiG4\xampp\htdocs\WikiG4\LocalSettings.php on line 138
Warning: Undefined array key "REMOTE_ADDR" in D:\WikiG4\xampp\htdocs\WikiG4\LocalSettings.php on line 138
mw_cirrus_metastore is up and running with version 2.0
Sorry, I forgot why I added this, but I think it was to work around an error: my wiki is private, so some extensions could have bugs, and this was the solution... Nicolas senechal (talk) 13:33, 5 May 2022 (UTC)
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['interface-admin']['gadgets-edit'] = true;//config gadget
$wgGroupPermissions['interface-admin']['gadgets-definition-edit'] = true;//config gadget
if ( $_SERVER['REMOTE_ADDR'] == $serverAdress ) {
    $wgGroupPermissions['*']['read'] = true;
    $wgGroupPermissions['*']['edit'] = true;
    $wgGroupPermissions['*']['writeapi'] = true;
}
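One check that ties together the two dumps quoted in this thread is to compare the index named in the `path` of the cirrusDumpQuery output with the indices that Elasticsearch actually lists. The Python sketch below is an illustration under two assumptions: that the index listing came from the same Elasticsearch server the failing wiki queries, and that an alias such as `wikig4_content` points at an index whose name starts with the alias followed by a suffix (as `test_content_first` suggests). The function names are hypothetical and not part of CirrusSearch.

```python
def doc_counts(cat_indices: list[dict]) -> dict[str, int]:
    """Map each index name to its document count (the API returns counts as strings)."""
    return {row["index"]: int(row["docs.count"]) for row in cat_indices}


def indices_for_search_path(search_path: str, cat_indices: list[dict]) -> list[str]:
    """Return listed indices that could back the alias at the start of a search path."""
    alias = search_path.split("/", 1)[0]
    return [
        row["index"]
        for row in cat_indices
        if row["index"] == alias or row["index"].startswith(alias + "_")
    ]


# Index names and document counts copied from the listing quoted in this thread.
listing = [
    {"index": "test_archive_first", "docs.count": "0"},
    {"index": "test_content_first", "docs.count": "5"},
    {"index": "mw_cirrus_metastore_first", "docs.count": "25"},
    {"index": "test_general_first", "docs.count": "21"},
]

print(doc_counts(listing))
print(indices_for_search_path("wikig4_content/page/_search", listing))  # prints []
print(indices_for_search_path("test_content/page/_search", listing))    # prints ['test_content_first']
```

Under those assumptions, an empty result for the `wikig4_content` path would mean the production wiki has no content index on that server yet, which would be consistent with a healthy cluster that still returns a search error or no results.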
Make installation simpler
It all sounds kinda straightforward until you open that README file and realize that it requires a special kind of MW nerd to get this thing up and running. Can't this all be a little easier? I know MW hates GUI type installation and configuration but jeez.... a graphical installer with several steps to go through would be great 2001:9E8:957:7200:C2A0:1CA5:258E:5914 (talk) 10:45, 23 July 2022 (UTC)
- It looks complicated only the first few times :) Here is a script that I have been using for the past few years to create the index: Spas.Z.Spasov (talk) 05:55, 25 July 2022 (UTC)
#!/bin/bash
# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name      /usr/local/bin/mlw-maintenance-cirrusSearch-elasticsearch-create-index.sh
# @desc      Create elastic search index for an MediaWiki instance
#
# @source    https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README

IP="/var/www/wiki.example.com"

# STEP 1
sed -i 's#^$wgSearchType#// $wgSearchType#' $IP/LocalSettings.php
sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' $IP/LocalSettings.php
echo -e '\n\n\n*\n* $IP/LocalSettings.php\n*\n'
grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 3
printf -- '\n\n*\n* Generate ElasticSearch Index for %s -----\n*\n\n' "$IP"
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver --conf $IP/LocalSettings.php

# STEP 2
sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' $IP/LocalSettings.php
grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 3
printf -- '\n\n*\n* Bootstrap the Search Index for %s -----\n*\n\n' "$IP"
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip --conf $IP/LocalSettings.php
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse --conf $IP/LocalSettings.php

# STEP 3
sleep 3
printf -- '\n\n*\n* Enable Cirrus Search for %s -----\n*\n\n' "$IP"
sed -i 's#^// $wgSearchType#$wgSearchType#' $IP/LocalSettings.php
echo -e '\n\n\n*\n* $IP/LocalSettings.php\n*\n'
grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo

# Step 4
sleep 3
printf -- '\n\n*\n* Update Cirrus Search Suggestions for %s -----\n*\n\n' "$IP"
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php --conf $IP/LocalSettings.php
- I bet practice helps, but it's a bummer how much manual fiddling is involved in getting an extension to work, the number of wikis not making the switch to CS because of that must be enormous 2001:9E8:958:7000:FC1D:AA47:52C6:681F (talk) 07:41, 28 July 2022 (UTC)
match_phrase_prefix
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
MediaWiki 1.38.2 and CirrusSearch generate Elastic queries using query_string. How do I make Cirrus use match_phrase_prefix instead?
This would allow me to find pages using partial keywords. Example: "Cirr" would return pages with "Cirrus" inside.
Any ideas ? Thanks. 195.65.152.115 (talk) 16:31, 18 August 2022 (UTC)
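For readers unfamiliar with the query type being asked about, this Python sketch builds the request body of a plain Elasticsearch `match_phrase_prefix` query. It shows only the shape of the raw query and is not a CirrusSearch setting; whether CirrusSearch can be configured to emit such a query is not answered in this thread. The helper name is hypothetical, and the `title` field is used only because it appears in the query dump quoted earlier on this page.

```python
import json


def phrase_prefix_query(field: str, text: str, max_expansions: int = 50) -> dict:
    """Build a search body that matches `text` as a phrase whose last term is a prefix."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    # Upper bound on how many terms the final prefix may expand to.
                    "max_expansions": max_expansions,
                }
            }
        }
    }


body = phrase_prefix_query("title", "Cirr")
print(json.dumps(body, indent=2))
```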
MediaWiki 1.39 + Elasticsearch 7.10.2 + php 8
- Has the above been tested to work without issues?
- I read in the page: Note that Elasticsearch versions prior to 6.8 are not compatible with PHP 8. Spiros71 (talk) 12:39, 25 September 2022 (UTC)
- MediaWiki 1.40 + Elasticsearch 7.17 + php 8.1 has been working fine since September. Utpark (talk) 00:24, 1 November 2022 (UTC)
- 7.10 is not a version prior to 6.8 so I would expect it to work from reading this. You never know with Elasticsearch in combination with MediaWiki.
- Anyone with actual experience? [[kgh]] (talk) 11:31, 27 September 2022 (UTC)
- CirrusSearch on 1.39 just had a couple patches back-ported from master to work flawlessly on elasticsearch 7.10.2.
- Regarding PHP8: overall compatibility has increased a lot and the test suites are passing, but since WMF is not running PHP8 it is hard to answer yes to your question. Most of the blockers on the CirrusSearch side have been resolved, so please let us know if you encounter issues running this setup. DCausse (WMF) (talk) 13:01, 27 September 2022 (UTC)
- The point is I have a rather large wiki with MW1.31, Elasticsearch 5.6.13, and php 7.3 so I try to be extra careful.
- The idea is to upgrade all software in one go (will MW 1.39 support upgrading in one move from 1.31?) rather than going through all the hassle just to bump to php 7.4. It was quite a ride getting Cirrus to work properly with ICU folding last time: Extension talk:CirrusSearch/2018#h-CirrusSearch_for_MW1.31_with_ICU_plugin_support?-2018-12-06T12:20:00.000Z
- I see many Wikimedia sites face similar issues, still running php 7.2 and php 7.3. Is there any special way you deal with EOL php versions security-wise? Spiros71 (talk) 14:07, 27 September 2022 (UTC)
- I understand why you want to be careful here, and sadly there might still be a few issues regarding PHP8 in MW 1.39.
- Regarding your upgrade plan: beware that you won't be able to jump from elasticsearch 5.6 to 7.10 without re-creating all your indices from scratch. DCausse (WMF) (talk) 14:29, 27 September 2022 (UTC)
- Thanks, yes, I was aware that I need to recreate indices; hope that analysis-icu plugin and the Extra Queries and Filters will be available for that version -:) Spiros71 (talk) 15:11, 27 September 2022 (UTC)
- Yes all these plugins should be available for elasticsearch 7.10.2. The extra plugin might not get ported to more recent versions as the WMF is unlikely to deploy more recent versions of elasticsearch due to licensing issues. DCausse (WMF) (talk) 15:15, 27 September 2022 (UTC)
- Yes, I remember you telling me about the possibility of switching to opensearch. Spiros71 (talk) 15:49, 27 September 2022 (UTC)
Does ElasticSearch 7.x work with MediaWiki 1.38?
The dependencies section of the instructions says:
- MediaWiki 1.33.x - 1.38.x require Elasticsearch 6.5.x - 6.8.x. (6.8.23+ recommended)
Does that mean that anything higher than 6.8.23 _including_ 7.x is recommended, or only 6.8.23 but less than version 7? Coyotefather (talk) 15:44, 25 November 2022 (UTC)
- Anything between the 6.5.x - 6.8.x versions, not higher than that.
- 6.9 (if that version even exists) and 7.0 would not be compatible Ciencia Al Poder (talk) 14:17, 29 November 2022 (UTC)
- Ok, thank you! Coyotefather (talk) 14:24, 29 November 2022 (UTC)
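The answer above reads the requirement as a closed range on the major.minor version. As a rough illustration, the Python sketch below encodes the two pairings stated on this page (MediaWiki 1.33.x to 1.38.x with Elasticsearch 6.5.x to 6.8.x, and MediaWiki 1.39 with 7.10.x) and checks a given combination against them. The table and function names are made up for this example and are not read from CirrusSearch; combinations that CirrusSearch itself reports as "partially supported" count as unsupported here.

```python
# (first MW, last MW, first ES, last ES), each as (major, minor).
SUPPORTED_RANGES = [
    ((1, 33), (1, 38), (6, 5), (6, 8)),    # MediaWiki 1.33.x - 1.38.x: Elasticsearch 6.5.x - 6.8.x
    ((1, 39), (1, 39), (7, 10), (7, 10)),  # MediaWiki 1.39: Elasticsearch 7.10.x
]


def major_minor(version: str) -> tuple[int, int]:
    """Turn '6.8.23' into (6, 8); the patch level is ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(mediawiki: str, elasticsearch: str) -> bool:
    """True when the pair falls inside one of the ranges listed above."""
    mw = major_minor(mediawiki)
    es = major_minor(elasticsearch)
    return any(
        mw_low <= mw <= mw_high and es_low <= es <= es_high
        for mw_low, mw_high, es_low, es_high in SUPPORTED_RANGES
    )


print(is_supported("1.38.2", "6.8.23"))  # prints True
print(is_supported("1.38.2", "7.0.0"))   # prints False
```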
MediaWiki 1.39 + Elasticsearch 7.17.7 + php 7.4.3 +SMW 4.0.2
My hosting only supports Elasticsearch 7.17 & 7.9.
It seems that MW 1.39 only supports Elasticsearch 7.10.
When I run this command
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php
I got this error message
PHP Deprecated: Use of PersonalUrls hook was deprecated in MediaWiki 1.39. [Called from SMW\MediaWiki\Hooks::register in /home/XXX/public_html/extensions/SemanticMediaWiki/src/MediaWiki/Hooks.php at line 151] in /home/XXX/public_html/includes/debug/MWDebug.php on line 381
Updating cluster ...
indexing namespaces...
mw_cirrus_metastore missing, creating new metastore index.
Creating metastore index... mw_cirrus_metastore_first
Scanning available plugins...
analysis-icu, analysis-phonetic
ok
Green!
Creating mw_cirrus_metastore alias to mw_cirrus_metastore_first.
Indexing namespaces...done
content index...
Fetching Elasticsearch version...7.17.7...partially supported
You use a version of elasticsearch that is partially supported, you should upgrade to 7.10.x
It seems that I need to use Elasticsearch 7.10.x only. Lotusccong (talk) 14:43, 8 December 2022 (UTC)
- Yes, it only supports 7.10.x Ciencia Al Poder (talk) 17:45, 8 December 2022 (UTC)
- Is there any roadmap for supporting Elasticsearch 7.17?
- Can we use Elasticsearch 7.9 with MW 1.39? Lotusccong (talk) 05:57, 9 December 2022 (UTC)
- I don't know what's wrong with ElasticSearch that breaks compatibility so much between versions, but the required versions on this page are accurate and anything outside of what's expected there would indeed break Ciencia Al Poder (talk) 19:03, 9 December 2022 (UTC)
- Sadly, versions above 7.10.2 are distributed under the SSPL license, which makes such versions unlikely to be installed on the WMF production servers; please see T272111 for more details. While nothing is set in stone, I believe that the WMF is unlikely to invest time in making CirrusSearch compatible with elasticsearch 7.17 and will rather try to find an alternative to elasticsearch. DCausse (WMF) (talk) 08:13, 12 December 2022 (UTC)