Help talk:CirrusSearch



Mediawiki fulltext search for czech language

21
Svrl (talkcontribs)

Dear Mediawiki CirrusSearch community,

I would like to ask how to set up a MediaWiki Elasticsearch (ES) index that allows Czech full-text search with the following: icu_folding, a Czech stemmer, and lowercasing. I have installed CirrusSearch, Elastica, and Elasticsearch alongside my MediaWiki (MW) installation. I am currently on MW 1.31 with ES 5.6.16. I am on these versions for internal reasons, but an upgrade to MW 1.35 and ES 6.5.4 is possible.


I think I have installed and configured everything properly (referenced the extensions in LocalSettings.php, ran php maintenance/update.php, and so on), following these steps:

1) Add to LocalSettings.php: $wgDisableSearchUpdate = true;

2) Generate the ES index: php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php

* Because an index had already been created with my settings, the script asked for one of these options:

--startOver or --reindexAndRemoveOk

I went with --startOver, because --reindexAndRemoveOk did nothing and the same message appeared again.

3) Remove from LocalSettings.php: $wgDisableSearchUpdate = true;

4) Bootstrap index: php extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip

5) Bootstrap index: php extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse

6) Add $wgSearchType = 'CirrusSearch';



Intended index settings (one of my examples, without a dictionary), applied with the curl CLI (I know the MW index is more detailed):

curl -X PUT localhost:9200/omkmediawikitest_general_first/ -d '
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0",
      "analysis": {
        "analyzer": {
          "czech": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase","czech_stemmer","icu_folding"]
          }
        },
        "filter": {
          "czech_stemmer": {
            "type": "stemmer",
            "name": "czech"
          }
        }
      }
    }
  }
}'

curl -X PUT localhost:9200/omkmediawikitest_content_first/ -d '
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "czech": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase","czech_stemmer","icu_folding"]
          }
        },
        "filter": {
          "czech_stemmer": {
            "type": "stemmer",
            "name": "czech"
          }
        }
      }
    }
  }
}'
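The analyzer settings in the two curl calls above can also be assembled programmatically. The following is a minimal sketch and not part of CirrusSearch: the helper names `czech_analysis_settings` and `build_put_request` are made up for illustration, and the host and index name are simply the ones from the example above. It only builds the request and never sends it. One thing worth knowing before an upgrade: Elasticsearch 6.0 and later reject request bodies that lack an explicit Content-Type header, so the curl commands above would then also need `-H 'Content-Type: application/json'`.

```python
import json
import urllib.request


def czech_analysis_settings():
    """Return index settings with the custom Czech analyzer from the post above."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "czech": {
                            "type": "custom",
                            "tokenizer": "standard",
                            # Token filters are applied in the order listed.
                            "filter": ["lowercase", "czech_stemmer", "icu_folding"],
                        }
                    },
                    "filter": {
                        "czech_stemmer": {"type": "stemmer", "name": "czech"}
                    },
                }
            }
        }
    }


def build_put_request(index_name, host="http://localhost:9200"):
    """Build (but do not send) the PUT request that would create the index."""
    body = json.dumps(czech_analysis_settings()).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_put_request("omkmediawikitest_content_first")
print(request.get_method(), request.full_url)
```

As the rest of this thread shows, an index created this way is overwritten when CirrusSearch builds its own index, so the sketch is only useful for experimenting with analyzers outside MediaWiki.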




Questions:

  1. Is it even possible to set up the MediaWiki index according to my needs? I think Czech Wikipedia has already solved this, so there seems to be a solution: https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:Verze. Can this be solved by upgrading to MW 1.35, where the matching ES version allows these settings automatically?
  2. Should I change how the MW index is created so that it includes my own settings? Or should I set up the index so that MW only adds its settings on top of mine instead of overwriting them? Or should I add my own settings to the index MW has created and then reindex with the MW settings plus mine?
  3. How can I solve this issue?
DCausse (WMF) (talkcontribs)

You can definitely run what is running on cs.wikipedia.org. You can even have a look at the analysis config that is there.

For this first you need to install these elasticsearch plugins:

Then you will have to set your wiki configuration as follows:

$wgLanguageCode = 'cs';
$wgCirrusSearchUseIcuFolding = 'yes';

Note that cs.wikipedia.org does not use ICU folding by default yet, since we haven't investigated which set of characters should not be folded. If you want some characters to be skipped by ICU folding, you can use $wgCirrusSearchICUFoldingUnicodeSetFilter:

// e.g. do not fold åäöÅÄÖ into a or o
$wgCirrusSearchICUFoldingUnicodeSetFilter = "[^åäöÅÄÖ]";
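To illustrate what excluding characters from folding means in practice, here is a rough sketch. It is only an approximation: it strips combining marks with Python's `unicodedata` module, whereas real ICU folding covers far more cases, and the function name `fold_diacritics` and its `keep` parameter are hypothetical. Characters listed in `keep` play the role of the characters left out of the set filter above.

```python
import unicodedata


def fold_diacritics(text, keep=""):
    """Strip combining marks from every character except those listed in `keep`."""
    text = unicodedata.normalize("NFC", text)
    keep = unicodedata.normalize("NFC", keep)
    out = []
    for ch in text:
        if ch in keep:
            # Excluded from folding, like characters outside the set filter.
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


# Czech sample: every diacritic is removed.
print(fold_diacritics("žluťoučký kůň"))
# Mixed sample: å and ä are protected, the Czech letters are still folded.
print(fold_diacritics("Åre är kůň", keep="åäöÅÄÖ"))
```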

And re-create your index using: php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now

Svrl (talkcontribs)

Hello, thank you for the answer! :)


I would like to mention that the README says nothing about the Wikimedia extra plugin, and I may have missed it in the official web documentation. Only after your reply did I find a mention of it in /docs/settings.txt.

When I open the analysis config link, I get an error:

{ "error": { "code": "badvalue", "info": "Unrecognized value for parameter \"action\": cirrus-settings-dump|running.", "*": "See https://cs.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes." }, "servedby": "mw1388" }


Is the Wikimedia extra plugin available for ES 5.6.16, or should I upgrade to ES 6.5.4 (with MW 1.35)?

For these LocalSettings.php settings, does the order in which I add them to the configuration matter?

$wgLanguageCode = 'cs';
$wgCirrusSearchUseIcuFolding = 'yes';
$wgCirrusSearchICUFoldingUnicodeSetFilter = "[^åäöÅÄÖ]";

So after I upgrade and install the missing Wikimedia extra plugin (or just install the plugin), I will:

Do steps 1-6 from my question and continue with the suggested ones:

7. Add the configuration you suggested above: *LangCode*, *SearchIcu*, *SetFilter*

8. Run this command: php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now


Is this order correct? Please correct me if not.

DCausse (WMF) (talkcontribs)

Sorry for the broken link, it should be: https://cs.wikipedia.org/w/api.php?action=cirrus-settings-dump

I'm also sorry that the documentation is so lacking. CirrusSearch.php has some documentation about the various config options, as does docs/settings.txt.

For earlier versions of the extra plugin, the latest build of the 5.6 series is: https://repo1.maven.org/maven2/org/wikimedia/search/extra/5.6.14/extra-5.6.14.zip. To make it compatible with 5.6.16, you can try unzipping it and changing the plugin descriptor file to force the Elasticsearch version to 5.6.16; it might just work. Otherwise you will have to build it from source using the 5.6 branch: https://gerrit.wikimedia.org/r/plugins/gitiles/search/extra/+/refs/heads/5.6

About your steps: they look correct to me, but if you start from scratch, do step 7 first so that you don't need step 8.

Svrl (talkcontribs)

I will probably have to upgrade the whole installation, including PHP, MW, and ES. The installer doesn't accept a customized file of the kind you suggested. Maybe upgrading ES to 6.5.4, and all other software to matching versions, is the only way out. But I'm not sure it would meet my needs, given what @TJones (WMF) said. May I ask for your thoughts here?

Svrl (talkcontribs)

Thank you for the information provided. I will try it and report back.


Have a nice day :).

TJones (WMF) (talkcontribs)

@DCausse (WMF), I'm not sure that you can use $wgLanguageCode = 'cs'; and $wgCirrusSearchUseIcuFolding = 'yes'; together. The Czech analyzer from Elastic is "monolithic" so it isn't customizable, and adding the ICU folding filter won't do anything.

If you unpack it or use parts of it, like @Svrl did above, then you can customize it. (Note that the Czech analyzer also includes Czech stop words, which you may or may not want, @Svrl.)

Customization is always complicated.

Svrl (talkcontribs)

Thank you for the answer.

But this has confused me a little. What should I do now? Is using the extra plugin still viable, but with only one of the two settings you mentioned, or with just $wgLanguageCode = 'cs';?

I have a feeling that the ICU folding actually did something, because search was able to find some words that it could not find before, as far as I know.

TJones (WMF) (talkcontribs)

I'm not sure what the right way is to define your own custom analyzer within MediaWiki. @DCausse (WMF), do you know how to do that?

The custom stemmer you defined above does use ICU folding, so if you can get that working, you can do whatever you want/need.

Svrl (talkcontribs)

Addition:

How can I configure an index at all if indexing my MediaWiki overwrites all the settings of the predefined index? The index settings cannot be changed after MediaWiki indexing either. Should I somehow change how MediaWiki and CirrusSearch set up the index? I understood the solution suggested by @DCausse (WMF) to mean that I should not set up the index myself, but let MediaWiki set it up during indexing, with the ICU and extra plugins installed in Elasticsearch and those settings in LocalSettings.php.

Svrl (talkcontribs)

According to some Czech guides that I find viable, I think this is one of the possible right approaches. I will upgrade, check all dependencies, and test multiple settings.

Have a nice day .).

DCausse (WMF) (talkcontribs)

Indeed, thanks for pointing out this limitation, but I think it might still be somewhat effective for the plain field: stems won't be ICU folded due to the limitation you mention, but "plain" words should be. Not ideal and a bit misleading, but perhaps better than nothing?

DCausse (WMF) (talkcontribs)

Please scratch my comment above, I responded too quickly. ICU folding won't be effective even for plain words; the only place where it will be effective for cs is during completion search of titles from the search box at the top right. Sorry!

TJones (WMF) (talkcontribs)

@Svrl: @DCausse (WMF) and I talked about this some more today, and I opened a ticket to look into making this easier. We think it should be possible to do something like this:

$wgLanguageCode = 'custom';
$wgCirrusTextAnalyzer = 'CzechIcuText';
$wgCirrusPlainAnalyzer = 'CzechPlain';

But we need to look into the code more carefully and make sure it's as feasible as it seems. There are also, as always, issues of prioritization and planning, so I don't know when we'll get to it, but you can track progress and make further comments on the Phab ticket.

Svrl (talkcontribs)

Thank you for your last reply.

Let me ask, please: is what I described above possible, that is, setting up MW, CirrusSearch, Elastica, and ES so that the indexed MW data can be searched in Czech the way one would expect, combining word stemming, case-insensitive matching, and diacritic-insensitive matching (c-č, z-ž, a-á, and so on)?

We were able to prepare and configure an index the way we think is right, but we are blocked because MediaWiki overwrites the defined structure of the precreated index during indexing. The settings cannot be changed afterwards either. So MediaWiki builds the index the way it wants, only using the installed icu_folding instead of asciifolding. That is the only difference, and although it does affect search, the effect is minor.

If this is not possible at the moment, please let me know. But we know that Czech Wikipedia has already solved this issue, so there must be a way, which is why I turned to you, the source :).

Svrl (talkcontribs)

Dear community and developers,

Allow me to share my recent experience of upgrading and setting up Elasticsearch and MediaWiki in order to meet my search requirements.

I upgraded PHP to 7.3.x, Elasticsearch to 6.5.4, and MediaWiki to 1.35.1, installed the extra and icu_folding plugins into ES, updated the Composer dependencies for Elastica, and indexed MW, hoping that the new versions would offer better configuration options or compatibility. None of this brought any major improvement.


Steps I followed:

After installing MW, ES, Elastica, and CirrusSearch, installing the icu_folding and extra plugins for ES, adding the CirrusSearch and Elastica definitions to LocalSettings.php, running update.php, and updating Composer for Elastica, I carried out the expected indexing steps:


1. Add this to LocalSettings.php:

wfLoadExtension( 'Elastica' );

wfLoadExtension( 'CirrusSearch' );

$wgDisableSearchUpdate = true;


2. Now run this script to generate your elasticsearch index:

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php


3. Now remove $wgDisableSearchUpdate = true from LocalSettings.php.  Updates should start heading to Elasticsearch.


4. Next bootstrap the search index by running:

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse


5. Note that this can take some time.  For large wikis read "Bootstrapping large wikis" below.

Once that is complete add this to LocalSettings.php to funnel queries to ElasticSearch:

$wgSearchType = 'CirrusSearch';
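The order of the maintenance scripts in the steps above can be captured in a small dry-run helper. This is only a sketch: the function name `bootstrap_commands` is hypothetical, the paths are the ones quoted in the steps, and it prints the commands instead of running them. It also cannot do the manual part, namely removing $wgDisableSearchUpdate = true; from LocalSettings.php after the first command and setting $wgSearchType at the end.

```python
import shlex

# Path as quoted in the steps above, relative to the MediaWiki root.
CIRRUS_MAINT = "extensions/CirrusSearch/maintenance"


def bootstrap_commands(php="php"):
    """Return the maintenance commands from the steps above, in order."""
    return [
        [php, f"{CIRRUS_MAINT}/UpdateSearchIndexConfig.php"],
        [php, f"{CIRRUS_MAINT}/ForceSearchIndex.php", "--skipLinks", "--indexOnSkip"],
        [php, f"{CIRRUS_MAINT}/ForceSearchIndex.php", "--skipParse"],
    ]


for argv in bootstrap_commands():
    # Dry run: print each command instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```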


I have to admit I am quite lost, and I feel that I have not put the pieces of information together correctly. Am I missing some step or point? After some testing and research, I ended up with the findings below:

Comparison of my index with the cs.wikipedia.org index:

I will mention two examples of search/index settings here. One is the cs.wikipedia.org one you provided above; the second is from my wiki. My question is: which steps do I have to follow to make my index/search settings the same as or similar to those of cs.wikipedia.org?

From my own wiki (on Google Docs): https://docs.google.com/document/d/e/2PACX-1vRMnWjIrTsN9Y_V84Cxq4Ys_V899Qup9hfOx0MCYxhYX9-CKGuQ6eyhoN6eqsXy9j7OMFPHfon0-Fzq/pub


Partial indexing?

Is it possible that not all pages and categories are indexed correctly or fully? I saw that a word containing "á" was not found on the first search, and then appeared after the second or third search (refreshing the search page for this keyword).

Another time I saw something similar: a page with one of the characters "í, é, á, ý" in its title could not be found until I visited that page. Did I do something horribly wrong?

Or is this expected behaviour? Apologies for my limited knowledge here, but shouldn't all of this be explained in a structured way in documentation for this particular use case, that is, how to combine Elastica, CirrusSearch, Elasticsearch, and MediaWiki so that everything is connected and works together?

Could the problem be related to Topic:Ud6sblxvbtlzlm16? Or to Topic:V5iwq5ev1fmwnkq5?

Reindexation of customized index?

May I ask which steps I have to follow to get the same or similar settings as on Czech Wikipedia? My point is: how can I customize the way the index is created and filled? Is it correct to let MediaWiki build the index, then change the index and reindex?


Standalone ICU server lib/sw?

If I understand correctly, if I do not want to use icu_folding as an Elasticsearch plugin, I can use the ICU library as server software, as listed for Czech Wikipedia under "ICU".


Additional post-install settings of CirrusSearch?

What I have been missing most is post-install configuration guidance for CirrusSearch, because there is no clear explanation of which settings are crucial and which are optional. Example of settings in this topic: Topic:Ud6sblxvbtlzlm16


All these questions share a common ground: in my humble opinion, I am simply unable to find proper documentation that explains what is optional and what is crucial, given how terse the README files are and how even terser the official MediaWiki documentation for extensions is.

I am sorry for this long post and I thank you for your time and effort,

Svrl

TJones (WMF) (talkcontribs)

Hi @Svrl—We've been talking about this, and @DCausse (WMF) thinks he has an approach that may help, and I'll try to add some information specific to Czech. It'll take a few days to test and write it up, but we're hoping to have a real reply on Friday.

TJones (WMF) (talkcontribs)

Hi @Svrl, @DCausse (WMF) figured out a way you can insert your own custom analysis config into CirrusSearch. There is a paste on Phab with the code to add to your LocalSettings.php file. After I updated my LocalSettings.php file, I reindexed with mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now and the expected configuration was created.

Note that CirrusSearch has a final round of analysis customization code that still runs on your config. It checks $wgCirrusSearchUseIcuFolding to control whether ASCII folding gets changed to ICU folding, for example.

The customization also enables the homoglyph_norm filter if the extra plugin is available, so you will see that in your config. (The homoglyph_norm filter tries to convert mixed-script tokens to single-script tokens when it can; it keeps the original mixed-script token, too, which means it is not compatible with some other filters... only aggressive_splitting that we know of, see Phab T268730.) If having homoglyph_norm is a problem, we can look at ways to disable it (currently the config to do so is private within the relevant code).

I'm not 100% sure about whether lowercase_keyword is used correctly in the paste example, but I'll talk to DCausse about it on Monday and we will get back to you here if there is a problem.

TJones (WMF) (talkcontribs)

DCausse updated the paste to include both lowercase_keyword and plain, and added comments explaining why. It should be good to go.

Svrl (talkcontribs)

Hello, I would like to say that I very much appreciate the work and effort you have put into this issue. Thank you.


I will definitely try out the configuration you have provided on Phabricator.

I have only one question. Which software, plugins, and versions are required?

May I assume:

Mediawiki 1.35,

CirrusSearch and Elastica in relevant version,

Elasticsearch 6.5.4,

Extra plugin in relevant version,

NOT icu folding (according to the configuration definition),

Is that all? Could you please complete the list? It would really help me understand the full scope that the configuration refers to.


Thank you!

Best regards,

Svrl

DCausse (WMF) (talkcontribs)

Hi,

you need:

  • MediaWiki and its extensions: CirrusSearch and Elastica
  • elasticsearch 6.5.4
  • the extra plugin (latest version should be: 6.5.4-wmf-11)
  • the analysis-icu plugin is required

Hope it helps,

David.

Reply to "Mediawiki fulltext search for czech language"

How can I search for all the work of a particular artist? Search box is useless.

2
2601:643:8880:160:9181:5380:4E4D:62BB (talkcontribs)

How can I search for all the work of a particular artist?  Search box is useless.

Speravir (talkcontribs)

Do you speak of media in Commons? If so:

  • First you could search for a template {{Creator}} for the artist you search for. Type in search: creator: artist name or, if this lists too many results, creator: "artist name".
  • Then with the known creator template search for file: hastemplate:"creator:artist name" (strangely, this is apparently case sensitive for "artist name", which differs from the default behaviour of this filter).
  • Also, for every artist there should be a category which could be explored.

If you want to search for media that is from an artist but does not have the Creator template, search for file: insource:"artist name" -hastemplate:"creator:artist name". If the artist does not have a creator template, leave out the last part.

Reply to "How can I search for all the work of a particular artist? Search box is useless."

several corrections/improvements in description of regexps

9
Lustiger seth (talkcontribs)

hi!
i'm very used to regular expressions, but it's very hard for me to understand the corresponding paragraph here. in particular:

  1. in the sentence "These return much much faster when [...]" it's not clear to me, what "These" refers to.
  2. "All regexp searches also require that the user develop a simple filter to generate the search domain for the regex engine to search:"
    • it should be "the users develop" or "the user develops".
    • the examples following that sentence should make clearer that one part creates the search domain (if i understood it correctly).
    • in the first example: what is the difference of searching via
    insource:"debian.reproducible.net" insource:/debian\.reproducible\.net/ or via
    insource:"debian.reproducible.net"? if there is no difference, then the example is not good.
  3. after the examples there's some text about an example with "FULLPAGENAME". it's not clear to me, whether FULLPAGENAME is meant as a meta syntactic variable or literally.
  4. what is an "HTML timeout"? does it mean http/server timeout?
  5. the given link in section "Metacharacters" should be updated to https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html, right?
  6. the example
    • /"literal back\slash"/ is as good as /literal back\/slash/
    seems wrong to me. shouldn't it be
    • /"literal back\slash"/ is as good as /literal back\\slash/ ?
  7. the typical line break characters "are not reserved for matching a newline".
    • so how do i search for a string that does not contain a newline?
    • what happens, if i use \r or \n? are they treated as literal r and n respectively?
  8. "The number # sign means something": ok, but what does it mean?

-- seth (talk) 09:34, 12 December 2020 (UTC)

Speravir (talkcontribs)

Ad 1: "These" refers to regexp searches. I think the paragraph before has been rephrased, but it was overlooked, how the next paragraph starts.

Ad 2: Compare with other parts of the help text. I think the singular form "the user develops" would be right here. Everyone can edit the text. We just need a translation admin afterwards.
Someone thought it would be clear from the context that the search domain is the first part of every example. Marking the search domain is a good idea, though, but how to? Both italic and bold are already used.
insource:"debian.reproducible.net" is an indexed based search which has two consequences: It is case insensitive, and the period is a grey space character meaning that also occurrences with e.g. space in between would be found. Cf. section Words, phrases, and modifiers.

Ad 3: I thought it would be clear from the example (I did not add this) that a literal {{FULLPAGENAME}} was meant. And the following note tells you that this does not work in the search bar, but in links only (I assume using templates like en:Template:Search link).

Ad 4: I guess you are right. It’s the timeout after about 20 seconds also being warned in the warning box slightly above this text.

Ad 6: Yes, this is wrong.

Ad 7: The developers decided on a reduced feature set. (In German I always say: "The regex search is unfortunately castrated.") This means, among other things, that you cannot search for newlines. There is this one example, but it works only under certain circumstances. If you add \r or \n to your search query, they will, as far as I understand, be searched for literally.

Speravir (talkcontribs)

I have now made some fixes regarding items 2, 4 and 6, and I also marked the search domains in the one example block (item 2; I also fixed another mistake I noticed there).

@Shirayuki: Because you are in most cases the translation admin who marks new versions for translation: In my opinion the text block 286 (the one referred to in item 1) should be inserted into text block 288. What is the best way to do it? Splitting T:288? It has 2 sentences now, T:286 should be inserted between them, and afterwards the text has to be slightly adjusted (note that I added another sentence). What I think of is:

An "exact string" regexp search is a basic search; it will simply "quote" the entire regexp, or "backslash-escape" all non-alphanumeric characters in the string. Regexp searches return ''much much'' faster when you limit the regexp search-domain to the results of one or more index-based searches. All regexp searches block some server capacity for the duration of the search query. Therefore, all regexp searches also require that the user develops a simple filter to generate the search domain for the regex engine to search (in the examples, the index-based search domain is marked in bold, the regexp part in italics):

A slight issue is that a bit later the info about adding an index-based search domain is pointed out again in other words (292, 293, 565), but I think this important info is already repeated several times throughout the help text anyway.
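The "backslash-escape all non-alphanumeric characters" rule quoted above, combined with an index-based search domain, can be sketched in a few lines. This is only an illustration with hypothetical helper names (`escape_for_insource`, `build_query`); it treats just ASCII letters and digits as alphanumeric, and it assumes the literal contains no double quote, which would break the quoted insource:"..." part.

```python
import re


def escape_for_insource(literal):
    """Backslash-escape every character that is not an ASCII letter or digit."""
    return re.sub(r"([^0-9A-Za-z])", r"\\\1", literal)


def build_query(literal):
    """Pair an index-based insource:"..." filter with the exact-string regexp."""
    return f'insource:"{literal}" insource:/{escape_for_insource(literal)}/'


print(build_query("debian.reproducible.net"))
```

The first half of the generated query narrows the search domain through the index, and the second half then checks the exact string, including case and the literal periods.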

This post was hidden by Speravir (history)
Lustiger seth (talkcontribs)

thanks for your answer and some corrections! ad 1: i tried to solve that now. your hidden(?) solution is also ok for me. ad 2--6: yes, that's better now. :-) ad 7: so it is not possible to search for two strings that have to be written on the same line (in a given order, but with arbitrary chars between them), right? (in perl syntax: /foo.*bar/ or /foo(?-s:.*)bar/)

Speravir (talkcontribs)

/foo(?-s:.*)bar/ will not work, because modifiers are not supported (with exception of i after the closing slash).

/foo.*bar/ will work, but will match on

  • foo bar (and an almost unlimited number of spaces in between, should only be limited by maximum of wiki text.)
  • foo (who the heck had the idea to use this as example placeholder) bar
  • foo\n
    bar (note: not a literal \n, just marks the line break here; matches also an almost unlimited number of line breaks in between).

That’s what I meant with you cannot search for newlines.
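The matching behaviour described above can be mimicked in Python for illustration. This is an analogy only: CirrusSearch uses Lucene's regexp syntax, not Python's `re` module. In Python the dot does not match a newline by default, so the sketch switches on `re.DOTALL` to imitate a dot that also matches line breaks.

```python
import re

# re.DOTALL makes "." match newlines too, imitating the behaviour described above.
pattern = re.compile(r"foo.*bar", re.DOTALL)

samples = {
    "same line": "foo bar",
    "words in between": "foo (some placeholder text) bar",
    "line break in between": "foo\nbar",
}

for label, text in samples.items():
    print(label, "->", bool(pattern.search(text)))
```

All three samples match, which is the point: with a dot that matches newlines there is no way to express "foo and bar on the same line" using `.*` alone.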

BTW, /foo *bar/ will also work, but /foo\s*bar/ will not, at least not reliably. A real example from Dewiki I had been asked for:

(The ping for Shirayuki you can see above did not work. In the hidden contribution I tried it with an explicit signature, but this did not work either. Because this message does not contain any other information I have hidden it.)

Speravir (talkcontribs)
DCausse (WMF) (talkcontribs)

Ad 5: updated, thanks

Ad 8: The number sign is part of the Lucene RegExp syntax, it denotes the empty language which is not useful for the purpose of insource://.

Lustiger seth (talkcontribs)

ad 8: ok, thanks, i added the reference there. but i'm not sure whether i used the right syntax (with all that translate stuff). so it would be great, if somebody would check.

Reply to "several corrections/improvements in description of regexps"

Boleh sy guna bahasa melayu?

4
113.210.117.21 (talkcontribs)

May I use Malay? I do not understand English. Thank you.

TJones (WMF) (talkcontribs)

EN: We can try to communicate with Google Translate. We can ask others to help check our translations.

MS: Kami boleh cuba berkomunikasi dengan Terjemahan Google. Kami boleh meminta orang lain untuk membantu memeriksa terjemahan kami.

TJones (WMF) (talkcontribs)

@Tofeiku, @Bennylin, and @Yosri: can you help translate and/or check Google's translations? Thanks!!

Tofeiku (talkcontribs)

@TJones (WMF) Your Google translation is fine and understandable.

Reply to "Boleh sy guna bahasa melayu?"

cirrus search not giving suggestion while typing in search box

2
Pooja2425 (talkcontribs)

Hi, I am using these:

MediaWiki 1.35.1
PHP 7.4.15 (apache2handler)
MySQL 8.0.25
Elasticsearch 6.5.4

Done these steps

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

php /extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

php /extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse

Then I set $wgSearchType = 'CirrusSearch';


After that I ran all the jobs. Now I am able to search pages with my search engine, but it does not show autosuggestions while I am typing, as the default MediaWiki search engine does. I get results only when I hit Enter in the search box.


Please tell me if I missed any step. Kindly check all the steps above and let me know further steps to improve my search engine.

169.149.246.113 (talkcontribs)

It shows this error in the console:

{"error":{"code":"toomanyvalues","info":"Too many values supplied for parameter \"namespace\". The limit is 50.","limit":50,"lowlimit":50,"highlimit":500,

Reply to "cirrus search not giving suggestion while typing in search box"

Cirrus Search SearchEngine not giving results

3
Pooja2425 (talkcontribs)

I am using these:

MediaWiki 1.35.1
PHP 7.4.15 (apache2handler)
MySQL 8.0.25
Elasticsearch 6.5.4

Done these steps

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

php /extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

php /extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse

Then I set $wgSearchType = 'CirrusSearch';


The output then was: Indexed 10 pages ending at 66507 at 185/second

After that I ran php maintenance/runJobs.php for the jobs listed by php maintenance/showJobs.php --group.

Some I was unable to run, so I deleted them directly from the job table.

Now there are no jobs in the queue.


Do I have to connect it with the DB? If yes, please suggest the steps.

I ask because I am unable to get any results from my wiki search engine.

Pooja2425 (talkcontribs)

When checking the response in the console, I get this error:

{"error":{"code":"toomanyvalues","info":"Too many values supplied for parameter \"namespace\". The limit is 50.","limit":50,"lowlimit":50,"highlimit":500,"docref":"See http://wikipoc.equinor.com/wiki135/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."}}

Pooja2425 (talkcontribs)
Reply to "Cirrus Search SearchEngine not giving results"

Searching Wikipedia on Nintendo Switch

1
86.141.193.112 (talkcontribs)

I can search Wikipedia on Nintendo Switch, but the search-as-you-type dialog takes very long (2+ min) to appear. For those 2 minutes, three dots (●●●) are shown below the advanced search and above the search results.

Reply to "Searching Wikipedia on Nintendo Switch"

Issue with CirrusSearch

1
Summary by DCausse (WMF)
Pooja2425 (talkcontribs)

I am using these:

MediaWiki 1.35.1
PHP 7.4.15 (apache2handler)
MySQL 8.0.25
Elasticsearch 6.5.4

But when I try to run php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php, I get the issue below.

I haven't changed anything in my LocalSettings.php; I only added the following, and I am not using the Elasticsearch PHP client yet.

wfLoadExtension( 'Elastica' );

wfLoadExtension( 'CirrusSearch' );

$wgDisableSearchUpdate = true;

indexing namespaces...

PHP Fatal error:  Declaration of Elasticsearch\Endpoints\Indices\Exists::getParamWhitelist() must be compatible with Elasticsearch\Endpoints\AbstractEndpoint::getParamWhitelist(): array in /data/www/html/wiki135/extensions/Elastica/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Endpoints/Indices/Exists.php on line 45

please suggest.

Cirrus Search settings with elastic search

1
Summary by DCausse (WMF)

closing (asked twice)

143.97.2.35 (talkcontribs)

I have installed an Elasticsearch server for my MediaWiki project.

The Elastica and CirrusSearch extensions are also installed, using

wfLoadExtension( 'Elastica' );

wfLoadExtension( 'CirrusSearch' );

I am able to see both extensions and Elasticsearch in Special:Version.


But now I am unable to configure it as my wiki's search engine.

Please suggest how I can use Elasticsearch as the search engine.

thanks!

Elasticsearch v7.13.1 in mediawiki 1.35.1

14
143.97.2.35 (talkcontribs)

Hi, I have installed Elasticsearch version 7.13.1 for my MediaWiki 1.35.1 app with PHP 7.4.15. Elasticsearch is installed properly and running.

Now I have added the CirrusSearch and Elastica extensions.

Currently I am getting results from the default MW search engine. Nothing has changed in my search engine.

Should I downgrade Elasticsearch, or will it work fine?


Please advise.

DCausse (WMF) (talkcontribs)

Hi,

If you don't see any errors it is perhaps because CirrusSearch is not yet activated, please follow the steps in the README files:

  • run the maintenance scripts UpdateSearchIndexConfig.php and ForceSearchIndex.php to create and populate the indices
  • activate CirrusSearch by setting $wgSearchType = 'CirrusSearch';
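The order of these steps can be sketched as a small Python helper. This is only an illustration, not part of the README: the script names and flags are the ones quoted elsewhere in this thread, while `build_commands`, `run_all` and the `wiki_root` argument are hypothetical names introduced here.

```python
import subprocess

SCRIPT_DIR = "extensions/CirrusSearch/maintenance"

# Scripts and flags as quoted in this thread, in the order they are run.
STEPS = [
    ("UpdateSearchIndexConfig.php", []),
    ("ForceSearchIndex.php", ["--skipLinks", "--indexOnSkip"]),
    ("ForceSearchIndex.php", ["--skipParse"]),
]


def build_commands():
    """Return one argv list per step, with paths relative to the wiki root."""
    return [["php", f"{SCRIPT_DIR}/{script}", *flags] for script, flags in STEPS]


def run_all(wiki_root="."):
    """Run every step from the wiki root, stopping at the first failure."""
    for argv in build_commands():
        print("running:", " ".join(argv))
        # check=True raises CalledProcessError if a script exits non-zero,
        # so later steps do not run against a half-configured index.
        subprocess.run(argv, cwd=wiki_root, check=True)
```

Calling `run_all("/path/to/wiki")` would run the three commands in sequence; setting $wgSearchType = 'CirrusSearch'; in LocalSettings.php remains a manual edit afterwards.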

If you see errors, please share them so that we can better help you.

ZPapierski (WMF) (talkcontribs)

Hi,


One more thing to add: CirrusSearch isn't compatible with Elasticsearch 7.*. I recommend a 6.* version; I personally use 6.5.4.


Hope that helps.

DCausse (WMF) (talkcontribs)

Sorry, I misread your question. As ZPapierski pointed out, CirrusSearch with MediaWiki 1.35 is not compatible with Elasticsearch 7.13.1. Please check the README file for information about compatible versions.
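A quick way to confirm which version is actually running is to read it from the Elasticsearch root endpoint. The sketch below assumes, based only on the replies above, that 6.x is the supported major version for MediaWiki 1.35; the README remains the authoritative source. The function names and the default `http://localhost:9200` address are assumptions made for illustration.

```python
import json
from urllib.request import urlopen

# Assumption taken from the replies above: CirrusSearch for MediaWiki 1.35
# works with Elasticsearch 6.x and not with 7.x.
SUPPORTED_MAJOR = 6


def is_supported(version):
    """True if a version string such as '6.5.4' has the supported major version."""
    return int(version.split(".")[0]) == SUPPORTED_MAJOR


def running_version(base_url="http://localhost:9200"):
    """Read version.number from the Elasticsearch root endpoint."""
    with urlopen(base_url, timeout=5) as response:
        return json.load(response)["version"]["number"]
```

With a server running, `is_supported(running_version())` would report whether a downgrade is needed.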

Pooja2425 (talkcontribs)

Thanks for your reply!

I think CirrusSearch is not starting; I am still getting results from the default MediaWiki search engine.

MediaWiki 1.35.1
PHP 7.4.15 (apache2handler)
MySQL 8.0.25
Elasticsearch 6.5.4

I have done these steps:

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse


I have now set $wgSearchType = 'CirrusSearch';


I don't know what to do next to make Elasticsearch my wiki's search engine.

Please advise.

DCausse (WMF) (talkcontribs)

If all the previous steps worked without error, then after setting $wgSearchType = 'CirrusSearch'; CirrusSearch should be active.

You can verify this by searching on your wiki and adding &cirrusDumpQuery at the end of the search results URL. It should display a JSON document representing the query that CirrusSearch is sending to Elasticsearch. If not, it means that $wgSearchType = 'CirrusSearch'; is not taken into account or that the CirrusSearch extension is not loaded properly (make sure that it is loaded using wfLoadExtension).
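The same check can be scripted. This is a minimal sketch, assuming the wiki is reachable through a standard index.php entry point; `dump_query_url`, `looks_like_cirrus_dump` and the example host are hypothetical names used only for illustration.

```python
import json
from urllib.parse import urlencode


def dump_query_url(index_php_url, search_terms):
    """Build a Special:Search URL with cirrusDumpQuery appended."""
    params = urlencode({"title": "Special:Search", "search": search_terms})
    return f"{index_php_url}?{params}&cirrusDumpQuery"


def looks_like_cirrus_dump(body):
    """True if the response body parses as a JSON object or array."""
    try:
        parsed = json.loads(body)
    except ValueError:
        # An HTML results page means the default search engine answered.
        return False
    return isinstance(parsed, (dict, list))
```

For example, `dump_query_url("https://wiki.example.org/index.php", "main page")` gives a URL to open in a browser or fetch with any HTTP client; if `looks_like_cirrus_dump` is false for the response, the request was probably not handled by CirrusSearch.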

Pooja2425 (talkcontribs)

Thanks for replying!

Yes, I am getting a JSON document after adding &cirrusDumpQuery.

But I only get this on a freshly installed test wiki.

In my actual wiki application, when I tried:

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse

and then set $wgSearchType = 'CirrusSearch'; I got:


indexing namespaces...

PHP Fatal error:  Declaration of Elasticsearch\Endpoints\Indices\Exists::getParamWhitelist() must be compatible with Elasticsearch\Endpoints\AbstractEndpoint::getParamWhitelist(): array in /data/www/html/wiki135/extensions/Elastica/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Endpoints/Indices/Exists.php on line 45


Since Elasticsearch is installed on root, the test wiki works fine, but my actual MediaWiki application shows this error.

Please advise.

DCausse (WMF) (talkcontribs)
143.97.2.35 (talkcontribs)

Thanks for your help!

That problem is now resolved. But next I ran these:

1) php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

2)php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

3) php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse


and got: Indexed 10 pages ending at 66507 at 185/second


But when I try to search for anything on my wiki, it returns no results.

So far I have only used the 3 steps above. Do I also have to use the "Bootstrapping large wikis" methods?

DCausse (WMF) (talkcontribs)

Hi,

CirrusSearch relies on the MediaWiki JobQueue for processing its updates. Please be sure it's set up properly and that no jobs are still in the queue: php maintenance/showJobs.php --group

If it's empty, please check the MediaWiki logs and the Elasticsearch logs; there might be error messages there. If it's not empty, please see Manual:Job_queue.
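The queue check can also be scripted. The sketch below rests on one assumption that differs from the command above: it runs showJobs.php without --group, on the understanding that the script then prints only the total number of queued jobs. `queued_jobs` and `read_queue_size` are hypothetical helper names.

```python
import subprocess


def queued_jobs(show_jobs_output):
    """Parse the total printed by showJobs.php when run without --group."""
    return int(show_jobs_output.strip())


def read_queue_size(wiki_root="."):
    """Run showJobs.php from the wiki root and return the number of queued jobs."""
    result = subprocess.run(
        ["php", "maintenance/showJobs.php"],
        cwd=wiki_root,
        capture_output=True,
        text=True,
        check=True,  # raise if the maintenance script itself fails
    )
    return queued_jobs(result.stdout)
```

A result of 0 from `read_queue_size` points toward the logs as the next place to look; any other value means jobs are still waiting to be processed.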

169.149.213.185 (talkcontribs)

Thanks for the reply.

Please advise which settings I need to change for my database.

Please advise, because I am still not getting search results.

This post was hidden by Pooja2425 (history)
Pooja2425 (talkcontribs)

Hi, I have run the jobs, and now there are no jobs in the queue.

I have not yet tried PoolCounter or the "Bootstrapping large wikis" methods.

Do I have to implement all of these as well? I am still not getting results from the search engine.

Please advise.
