Extension talk:CirrusSearch
- Discussion related to the CirrusSearch MediaWiki extension.
- See also the open tasks for CirrusSearch on phabricator.
How to search for ASCII translated Umlaut handling in URLs in source code using quotes?
I have template source code using external URLs. Some contain ASCII-translated (percent-encoded) umlauts like fl%C3%BCgel for flügel.
When searching via the API using insource and quotes, it doesn't find them:
insource:"/path/fl%C3%BCgel"
It only finds them without quotes:
insource:/path/fl%C3%BCgel
92.50.65.235 16:59, 28 January 2025 (UTC)
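For reference, fl%C3%BCgel is the percent-encoded UTF-8 form of flügel. The following minimal Python sketch, using only the standard library, shows the round trip and builds the two insource: query strings from the post above. The helper name is hypothetical and /path/ is the poster's placeholder; the sketch only illustrates the encoding and does not explain why the quoted form fails to match.

```python
from urllib.parse import quote, unquote


def insource_queries(path: str) -> tuple[str, str]:
    """Return the quoted and unquoted insource: forms for a URL path (hypothetical helper)."""
    # quote() percent-encodes non-ASCII characters as UTF-8 bytes; "/" is kept by default.
    encoded = quote(path)
    return f'insource:"{encoded}"', f"insource:{encoded}"


quoted, unquoted = insource_queries("/path/flügel")
print(quoted)
print(unquoted)
print(unquote("fl%C3%BCgel"))  # decodes back to the original word
```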
Elasticsearch
I have been looking at the Elasticsearch website and am having trouble. Elastic Cloud costs a fortune, and the self-managed Elasticsearch options are "not suitable for production use". What gives? 81.151.8.175 15:48, 6 April 2025 (UTC)
Get CirrusSearch to work for Chinese
Hi all, I have a site that runs on 1.43. Most of the content is in Simplified and Traditional Chinese, and I am trying to get CirrusSearch + Elasticsearch to work, but I am still struggling.
Basically I want to make it work just like https://zh.wikipedia.org/ (I am not sure whether it uses the same CirrusSearch + Elasticsearch approach). The primary issue I currently have is this: if I search for "方济各", I want the results to show only pages that contain the phrase "方济各" (as when you search for the same on Wikipedia: https://zh.wikipedia.org/w/index.php?search=%E6%96%B9%E6%B5%8E%E5%90%84&title=Special%3A%E6%90%9C%E7%B4%A2&profile=advanced&fulltext=1&ns0=1&searchToken=68qlm8r96e8225klwzw6fkjyd), not pages that contain "方" or "济" or "各" (which is what it currently does). Here is my current test instance if you want to try it: http://44.199.64.14/w/index.php?search=%E6%96%B9%E6%B5%8E%E5%90%84&title=Special%3A%E6%90%9C%E7%B4%A2&wprov=acrw1_-1&ns0=1&ns1=1&ns14=1&ns4100=1&ns4200=1
Here is what I have done so far:
- Installed Elasticsearch 7.10.2
- Installed the extensions CirrusSearch and Elastica
- Installed the Elasticsearch plugins analysis-ik (IK Analyzer) and analysis-stconvert
- Modified the /extensions/CirrusSearch/includes/Maintenance/AnalysisConfigBuilder.php file to include a reference to the IK analyzer
I also just read the page User:TJones (WMF)/Notes/Chinese Analyzer Analysis, which says "The short version is that SmartCN+STConvert did the best on all the corpora". Does this mean that the out-of-the-box setup (SmartCN+STConvert) should work fine for Chinese, and that I don't need the IK analyzer plugin? If SmartCN+STConvert is enough, what am I missing to make it work like https://zh.wikipedia.org/w/index.php?search=%E6%96%B9%E6%B5%8E%E5%90%84&title=Special%3A%E6%90%9C%E7%B4%A2&profile=advanced&fulltext=1&ns0=1&searchToken=68qlm8r96e8225klwzw6fkjyd?
Thank you everyone! Paulxu20 (talk) 03:12, 16 May 2025 (UTC)
- Just some updates after discussion with @DCausse (WMF) in IRC:
- It looks like I don't need the IK analyzer just to get it working like zh.wikipedia.org, so I am now reverting my code changes to AnalysisConfigBuilder.php back to the out-of-the-box version.
- I do have a question now: do I need to install the "analysis-icu" plugin to make Chinese search work well? It looks like it, but I want to double-check. Paulxu20 (talk) 20:21, 16 May 2025 (UTC)
- Hi, analysis-icu is a useful plugin and it's generally a good idea to install it, but whether or not it is useful for Chinese I would defer to @TJones (WMF) to answer. DCausse (WMF) (talk) 09:28, 19 May 2025 (UTC)
- Thank you @DCausse (WMF) @TJones (WMF), I do have analysis-icu installed along with a few other plugins. Here is the list of all plugins currently installed:
name           component          version
ip-172-26-3-55 analysis-icu       7.10.2
ip-172-26-3-55 analysis-ik        7.10.2
ip-172-26-3-55 analysis-smartcn   7.10.2
ip-172-26-3-55 analysis-stconvert 7.10.2
ip-172-26-3-55 extra              7.10.2-wmf12
- I have put everything I did on this page: http://44.199.64.14/wiki/CirrusSearch_Test, including the plugins I installed, the LocalSettings.php, the commands I ran, etc. After a week of trying and digging I think I am making some progress. For example, a search for "方济各" now finds the page (http://44.199.64.14/wiki/TestPage2) that contains the whole phrase and shows it at the top of the search results. However, the search results still include other pages that contain only "方", "济", or "各". I am not sure what I am missing.
- Really appreciate all your help! Paulxu20 (talk) 13:03, 19 May 2025 (UTC)
- Hello @DCausse (WMF) @TJones (WMF)
- Just to follow up after more digging today: I realized that zh.wikipedia.org's search is not working well either. It is just that many pages contain "方济各" and rank high, so they are listed at the top of the search results. When I check further down, e.g. 1500 results later (https://zh.wikipedia.org/w/index.php?limit=500&offset=1500&profile=default&search=%E6%96%B9%E6%B5%8E%E5%90%84&title=Special:%E6%90%9C%E7%B4%A2&ns0=1), it also shows pages that have nothing to do with the phrase "方济各". For example, the page "https://zh.wikipedia.org/zh-cn/%E7%BE%8E%E6%B5%8E%E7%A4%81" is in the search results; it is the name of an island and has absolutely nothing to do with "方济各" (the name of the Pope who just passed away).
- What should happen is that a search for "方济各" returns the combined results of the two queries below (with duplicates removed):
- 1- https://zh.wikipedia.org/w/index.php?search=%22%E6%96%B9%E6%B5%8E%E5%90%84%22&title=Special%3A%E6%90%9C%E7%B4%A2&profile=advanced&fulltext=1&advancedSearch-current=%7B%22fields%22%3A%7B%22phrase%22%3A%22%5C%22%E6%96%B9%E6%B5%8E%E5%90%84%5C%22%22%7D%7D&ns0=1
- 2- https://zh.wikipedia.org/w/index.php?search=%22%E6%96%B9%E6%BF%9F%E5%90%84%22&title=Special%3A%E6%90%9C%E7%B4%A2&profile=advanced&fulltext=1&advancedSearch-current=%7B%22fields%22%3A%7B%22phrase%22%3A%22%5C%22%E6%96%B9%E6%BF%9F%E5%90%84%5C%22%22%7D%7D&ns0=1
- The first is the exact-match query for the Simplified Chinese "方济各", and the second is the exact-match query for the Traditional Chinese "方濟各", which is the same phrase written in Traditional Chinese.
- So there are two problems here:
- 1) the search should show only pages that contain the exact phrase "方济各"
- 2) the search should show results for both Simplified and Traditional Chinese, irrespective of which form the original keyword is in (Simplified or Traditional)
- Let me know if this makes sense. Paulxu20 (talk) 20:28, 19 May 2025 (UTC)
- Just to add: behavior-wise, I think it should work just like it does out of the box without CirrusSearch (is that a native SQL query?), but the problem with that approach is that it is much slower than Elasticsearch. Paulxu20 (talk) 20:34, 19 May 2025 (UTC)
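The result set requested above (exact-phrase matches for both the Simplified and the Traditional form, merged with duplicates removed) can be sketched in Python. This only illustrates the desired outcome, not how CirrusSearch works internally. The function names are hypothetical, and the two input lists are assumed to be page titles already returned by two separate phrase searches.

```python
def phrase_query(term: str) -> str:
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return f'"{term}"'


def merge_results(*result_lists: list[str]) -> list[str]:
    """Concatenate result lists, dropping duplicate titles but keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for results in result_lists:
        for title in results:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


# Placeholder titles, not real search results.
simplified_hits = ["Page A", "Page B"]
traditional_hits = ["Page B", "Page C"]

print(phrase_query("方济各"), phrase_query("方濟各"))
print(merge_results(simplified_hits, traditional_hits))
```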
- Hi,
Autocomplete not updating with new page titles
When I create a new page, I do not see it when I start typing in the search bar (even after hours). The job queue is empty (a cron job is set up to run every 3 minutes).
Using
wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
$wgCirrusSearchUseIcuFolding = 'yes';
$wgSearchType = 'CirrusSearch';
MediaWiki | 1.39.12 |
PHP | 8.3.20 (fpm-fcgi) |
MariaDB | 10.6.21-MariaDB |
ICU | 67.1 |
Lua | 5.1.5 |
Elasticsearch | 7.10.2 |
Spiros71 (talk) 12:17, 18 May 2025 (UTC)
- @Spiros71 A few causes to explore:
  - You are using the completion suggester ($wgCirrusSearchUseCompletionSuggester). This index, which is specialized for autocompletion, is not updated in real time but via the maintenance script UpdateSuggesterIndex.php.
  - You may have issues writing to Elasticsearch. Is this page findable using Special:Search? If yes, I'm not sure what could have happened, and it might require more investigation. If no, you might get some insight from the MediaWiki logs indicating why the page failed to get indexed.
- A few techniques to help with debugging:
  - See how the page is indexed: append ?action=cirrusDump to the page URL.
  - See the query sent to Elasticsearch: append &cirrusDumpQuery=yes
- DCausse (WMF) (talk) 09:41, 19 May 2025 (UTC)
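The two debugging parameters mentioned above can also be appended programmatically. Here is a minimal Python sketch, assuming a hypothetical wiki at https://wiki.example.org with the common /w/index.php entry point; the helper names are illustrative only and not part of CirrusSearch.

```python
from urllib.parse import urlencode

BASE = "https://wiki.example.org/w/index.php"  # hypothetical wiki URL, adjust to your site


def cirrus_dump_url(title: str) -> str:
    """URL that shows how a page is indexed (action=cirrusDump)."""
    return f"{BASE}?{urlencode({'title': title, 'action': 'cirrusDump'})}"


def cirrus_query_dump_url(search: str) -> str:
    """Search URL that dumps the query sent to Elasticsearch (cirrusDumpQuery=yes)."""
    params = {"title": "Special:Search", "search": search, "cirrusDumpQuery": "yes"}
    return f"{BASE}?{urlencode(params)}"


print(cirrus_dump_url("Main Page"))
print(cirrus_query_dump_url("new page title"))
```

Opening the first URL for a newly created page shows whether the page reached the index at all; the second shows the query built for a given search string.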
- Thank you David,
- There was no $wgCirrusSearchUseCompletionSuggester = 'yes'; in LocalSettings.php (despite that, it did work without issues for quite some time). I could not find any instruction in the extension readme about the necessity of the above (nor that it would be a good idea to run the script from cron as well). I ended up doing a full reindex:
php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse
php extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php --recreate
php maintenance/runJobs.php --memory-limit=max
- I also ran the below (as I had yellow status and 1 unassigned shard):
curl -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'
# …and make it the default for any new index CirrusSearch creates
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{ "persistent": { "index.number_of_replicas": 0 } }'
- Spiros71 (talk) 10:45, 19 May 2025 (UTC)
- The completion suggester is optional, and completion should have worked even without it enabled. Sorry if my comment made it sound like it was required.
- I agree that documentation about completion is a bit sparse. You might find some in docs/settings.txt and Extension:CirrusSearch/CompletionSuggester. DCausse (WMF) (talk) 21:10, 19 May 2025 (UTC)