Help talk:CirrusSearch

Synonym Token Filter and Stop Token Filter

Gpontespc (talkcontribs)

How do I configure a synonym token filter and a stop token filter? Please help.


    curl -X PUT "localhost:9200/kb_mediawiki_general_first/_settings?pretty" -H 'Content-Type: application/json' -d'
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_stop": {
              "type": "stop",
              "stopwords_path": "analysis/stop.txt"
            }
          }
        }
      }
    }'


    curl -X PUT "localhost:9200/kb_mediawiki_content_first/_settings?pretty" -H 'Content-Type: application/json' -d'
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_stop": {
              "type": "stop",
              "stopwords_path": "analysis/stop.txt"
            }
          }
        }
      }
    }'


This way doesn't seem to work :/

Gpontespc (talkcontribs)

I'm using MediaWiki 1.30.0, PHP 7.0.27, Elasticsearch 5.4.3, CirrusSearch 0.2 and Elastica 1.3.0.0.

DCausse (WMF) (talkcontribs)

Sadly, there is no way to configure this without modifying the source code of CirrusSearch, and changing such settings directly inside Elasticsearch is not possible on an existing index. Please don't hesitate to file a feature request using Phabricator.
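
For readers wondering what such a configuration looks like outside CirrusSearch: a minimal Python sketch, assuming a plain Elasticsearch index that you create yourself, where the filters are declared in the body sent at index creation. The function name, the `my_synonyms` filter, the `my_text` analyzer and the `analysis/synonyms.txt` path are hypothetical names chosen for illustration; this is not how CirrusSearch builds its own analysis configuration.

```python
import json


def build_index_settings(stop_path, synonyms_path):
    """Return an index-creation body declaring a stop and a synonym filter.

    All filter and analyzer names here are illustrative assumptions.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Stop filter reading its word list from a file
                    "my_stop": {"type": "stop", "stopwords_path": stop_path},
                    # Synonym filter reading its rules from a file
                    "my_synonyms": {"type": "synonym", "synonyms_path": synonyms_path},
                },
                "analyzer": {
                    # Custom analyzer that chains both filters after lowercasing
                    "my_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_stop", "my_synonyms"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = build_index_settings("analysis/stop.txt", "analysis/synonyms.txt")
    print(json.dumps(body, indent=2))
```

The printed JSON could be used as the request body when creating a new index. CirrusSearch generates its own index settings, so a body like this would not survive its maintenance scripts.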

Gpontespc (talkcontribs)

Which file needs to change?

DCausse (WMF) (talkcontribs)
Gpontespc (talkcontribs)

Thank You!!!!

Reply to "Synonym Token Filter and Stop Token Filter"
TheDJ (talkcontribs)

I'm pretty sure from memory that results from wgContentNamespaces get boosted in the search results, but this doesn't seem to be mentioned anywhere, including in the page weighting section. Something we might want to add?

DCausse (WMF) (talkcontribs)

This is configurable, but in general (at least for WMF wikis) the wgNamespacesToBeSearchedDefault namespaces are slightly overboosted. Because this is all configurable, there is no single, simple answer to this question; it is wiki-dependent. I'm not against adding something about that, but I'm not sure where.

Cpiral (talkcontribs)
Reply to "content namespace weighting"

php updateSearchIndexConfig.php is giving error

PushpendraJadaun12 (talkcontribs)

After upgrading MediaWiki from 1.28 to 1.32, I am installing the CirrusSearch extension. When I run this command (following the README):

php updateSearchIndexConfig.php

I get this error: Elastica\Exception\Connection\HttpException from line 187 of $MW_INSTALL_PATH\extensions\Elastica\vendor\ruflin\elastica\lib\Elastica\Transport\Http.php: Couldn't resolve host

I tried searching but had no luck. Please let me know what I need to do to resolve it.


Note: CirrusSearch, Elastica and MediaWiki are all on the same version, 1.32.


EBernhardson (WMF) (talkcontribs)

Couldn't resolve host suggests that whatever hostname it's finding for the elasticsearch cluster isn't able to be resolved. Are you using $wgCirrusSearchServers? Or one of the more complex configuration options? What format are you using to specify $wgCirrusSearchServers?
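
To check name resolution independently of MediaWiki, here is a minimal Python sketch, assuming you know the hostname configured in $wgCirrusSearchServers. The function name `can_resolve` and the default port 9200 (taken from the curl examples earlier on this board) are illustrative assumptions.

```python
import socket


def can_resolve(host, port=9200, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least a lookup result.

    `resolver` is injectable so the check can be exercised without DNS.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved
        return False


if __name__ == "__main__":
    # Replace "localhost" with the host from your own configuration
    print(can_resolve("localhost"))
```

If this prints False for your configured host, the problem is in DNS or the hosts file rather than in CirrusSearch. If it prints True, the next thing to verify is that the Elasticsearch process is actually running and listening.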

PushpendraJadaun12 (talkcontribs)

The problem is solved. The Elasticsearch server was not running properly: it kept shutting down after some time, which caused this error.

Bruceillest (talkcontribs)

Can you please elaborate on how you fixed it? I'm having the same issue and I'm not familiar with Elasticsearch or CirrusSearch. File locations would also be helpful.


Thanks

118.148.199.239 (talkcontribs)

me too Bruceillest


Reply to "php updateSearchIndexConfig.php is giving error"
Ahsan9991 (talkcontribs)

I have installed Elasticsearch and started it from the console.

After that I installed Elastica and ran Composer.

After that I installed CirrusSearch.

Running php CirrusSearch/maintenance/updateSearchIndexConfig.php generates the indexes.

But search still behaves the same as before: no full-text search, and no matching on the second word of a title.

Does anybody know why?

Cpiral (talkcontribs)
Reply to "Results not showing!"
Errezeta89 (talkcontribs)

When I search with a wildcard, matches inside the page text are not highlighted; only the words in the title are. Is there an option that needs to be enabled for this?

DCausse (WMF) (talkcontribs)

It depends on the highlighter you use. On WMF installations we use https://github.com/wikimedia/search-highlighter which tries to highlight all the words it can.

The default highlighter provided by Elasticsearch will do it on a best-effort basis. Basically, if the wildcard expression expands to more than 1024 terms, it's likely that some words won't be highlighted. This is due to the nature of an inverted index. You most probably see highlights in the title because the dictionary of words in your titles is less diverse than the one in the body.
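
The effect can be illustrated with a toy model in Python. This is only a sketch of the idea, not how Elasticsearch implements wildcard expansion: it treats the term dictionary as a plain set of strings, and the function names are hypothetical. The 1024 limit is the figure quoted above.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, dictionary):
    """Return every term in the dictionary that matches the wildcard pattern."""
    return sorted(term for term in dictionary if fnmatchcase(term, pattern))


def fully_highlightable(pattern, dictionary, max_terms=1024):
    """True if the expansion stays within the term limit."""
    return len(expand_wildcard(pattern, dictionary)) <= max_terms


if __name__ == "__main__":
    # A small title vocabulary versus a much larger body vocabulary
    title_terms = {"search", "searching", "index"}
    body_terms = {f"search{i}" for i in range(2000)}
    print(fully_highlightable("search*", title_terms))
    print(fully_highlightable("search*", body_terms))
```

A small title vocabulary keeps the expansion short, while a large body vocabulary can push the same pattern past the limit.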

In short, I suggest installing https://github.com/wikimedia/search-highlighter and enabling the following options:

$wgCirrusSearchUseExperimentalHighlighter = true;
$wgCirrusSearchOptimizeIndexForExperimentalHighlighter = true;
Errezeta89 (talkcontribs)

It works! Thank you.

Reply to "search results not highlighted"
Saharma (talkcontribs)

I have followed the installation procedure. Searching returns the files I have uploaded; however, I am still unable to search inside the uploaded documents themselves.

TheDJ (talkcontribs)

What kind of documents? I think we only support searching inside DjVu and PDF files. As far as I'm aware, other document types don't have media handlers that support extracting textual information from the documents for search indexing purposes.

Saharma (talkcontribs)

A PDF file, generated from a Word document. This is the output from ?action=cirrusDump

Note that "file_text": false


[{"_index": "my_wiki_general_first","_type": "page","_id": "5","_version": [],"_source": {"version": 9,"wiki": "my_wiki","namespace": 6,"namespace_text": "File","title": "Pdf doc.pdf","timestamp": "2019-08-15T12:16:24Z","create_timestamp": "2019-08-14T07:25:58Z","category": [],"external_link": [],"outgoing_link": [],"template": [],"text": "An invalid user was specified to permission testing to embed this PDF. This is a PDF Document.","source_text": "<pdf>File: Pdf doc.pdf</pdf>\n\nThis is a PDF Document.","text_bytes": 53,"content_model": "wikitext","language": "en","heading": [],"opening_text": null,"auxiliary_text": [],"defaultsort": false,"file_text": false,"file_media_type": "OFFICE","file_mime": "application/pdf","file_size": 6499,"file_width": 0,"file_height": 0,"file_bits": 0,"file_resolution": 0,"display_title": null,"redirect": [],"incoming_links": 0}}]
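
To check several files at once, the dump can be inspected programmatically. Here is a minimal Python sketch, assuming the dump has the same shape as the output above (a JSON array of documents with a `_source` object); the function name is hypothetical.

```python
import json


def titles_without_file_text(dump_json):
    """Return titles of dumped documents whose file_text is empty or false."""
    docs = json.loads(dump_json)
    return [
        doc["_source"]["title"]
        for doc in docs
        if not doc["_source"].get("file_text")
    ]


if __name__ == "__main__":
    # Trimmed-down sample with only the fields used here
    sample = '[{"_source": {"title": "Pdf doc.pdf", "file_text": false}}]'
    print(titles_without_file_text(sample))
```

A title showing up in this list means no text was extracted from the file at index time, so its contents cannot be searched.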

TheDJ (talkcontribs)

When I get to a desktop I'll see if I can find out whether this requires setting a specific config variable or something.

Reply to "Search in a document"

autocomplete of title page on search bar

Errezeta89 (talkcontribs)

When I start typing a letter in the search bar, it doesn't show the pages that start with that letter. Is there an option for this?

Errezeta89 (talkcontribs)

I resolved it by selecting 'classic search' in the preferences.

Reply to "autocomplete of title page on search bar"
85.76.139.129 (talkcontribs)

I originally wrote the following here at the English Wikipedia, but thought I should report this here, too:

The deepcat example (deepcat:"musicals") seems to be off. For some reason, if the category's first letter is not capitalized, the search doesn't seem to search in the subcategories. First see en:Category:Musicals and then compare these searches: deepcat:"musicals" vs. deepcat:"Musicals". The latter search also gives the following warning: "A warning has occurred while searching: Deep category query returned too many categories". Maybe we should change the example search and category to a smaller category with correct capitalization, for example to deepcat:"Musicals by topic"?

PerfektesChaos (talkcontribs)

I would regard different behaviour on musical and Musical as a bug.

The category name should be normalized, meaning it is brought into the same format that a database query expects:

  • First letter capitalized.
  • All _ turned into spaces.
  • Some additional types of spaces (and tabs) turned into standard spaces, and some invisible characters deleted.
  • Multiple spaces collapsed into single spaces, with both ends trimmed.
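
The steps above can be sketched in Python. This is only an illustration of the listed rules, not MediaWiki's actual title normalization: the function name is hypothetical, and the set of invisible characters removed (the left-to-right and right-to-left marks) is an assumption.

```python
import re

# Assumed set of invisible characters to drop: LRM and RLM
INVISIBLE = "\u200e\u200f"


def normalize_category(name):
    """Apply the normalization steps listed above to a category name."""
    # Delete invisible characters
    name = name.translate({ord(ch): None for ch in INVISIBLE})
    # Turn underscores into spaces
    name = name.replace("_", " ")
    # Turn tabs and other Unicode whitespace into single spaces, then trim
    name = re.sub(r"\s+", " ", name).strip()
    # Capitalize the first letter only
    return name[:1].upper() + name[1:]


if __name__ == "__main__":
    print(normalize_category("  musicals_by__topic "))
```

With this normalization, deepcat:"musicals" and deepcat:"Musicals" would refer to the same category.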
TheDJ (talkcontribs)
Reply to "Deepcat(egory)"
Brickscrap (talkcontribs)

I've installed CirrusSearch and Elasticsearch and everything is running fine; however, new pages don't appear to get added to the search index. Anything that existed before the install is fine, but nothing added since then shows up in search. Am I missing something obvious? There are no jobs in the queue for it as far as I can see.

4omni (talkcontribs)

The table in the section "Explicit sort orders" (more precisely, the descriptions of the possible sort orders) should be translatable, but at the moment it isn't.

Shirayuki (talkcontribs)

Done

4omni (talkcontribs)

Thanks. Translated to German.

Reply to "Not translatable"