Help talk:CirrusSearch/2017
This page used the Structured Discussions extension to give structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.
This talk page is about the usage of CirrusSearch. See installation, or development, or bugs, or Wikipedia for related discussions.
Special page : disable text and tables/templates
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
I'd like to disable the text (You may create the page "xx" on a blank page, request its creation or create it using the New Entry Creator!) etc., as well as the tables/templates so that I can see the entries where the term xx, without a page of its own yet, can be seen directly. Thank you so much in advance. JMGN (talk) 19:11, 17 January 2017 (UTC)
- I think that's a special feature at Wiktionary. Do you want to hide wikt:en:MediaWiki:Searchmenu-new ? That can be done using your account's "CSS" pages. Whatamidoing (WMF) (talk) 20:55, 17 January 2017 (UTC)
- 'That can be done using your account's "CSS" pages'
- First of all, thanks for replying. Could you please give me some step-by-step guidance on how to proceed to take that away?
- Thank you in advance. JMGN (talk) 21:46, 17 January 2017 (UTC)
- I don't know the exact code needed, although I can tell you that you'd want to put the code in wikt:en:User:Backinstadiums/common.css (which you'd have to create – it's similar to creating a new Wiktionary entry, but the interface looks a bit different).
- @Mr. Stradivarius or @PrimeHunter, can you tell us the necessary line of code to suppress this MediaWiki message? Whatamidoing (WMF) (talk) 19:56, 19 January 2017 (UTC)
- As Whatamidoing says, the request appears to be about English Wiktionary searches like https://en.wiktionary.org/w/index.php?search=Example+text&title=Special:Search. You can hide "See whether another page links ..." and "You may create the page ..." with this in your wikt:Special:MyPage/common.css:
- .mw-search-createlink {display: none;}
- You can hide "These entry templates ..." and the rest with this:
- #searchmenu-new-preload {display: none;} PrimeHunter (talk) 22:18, 19 January 2017 (UTC)
- @PrimeHunter Hi, where should I insert those chunks of code exactly? In 'create source'? Thank you in advance. JMGN (talk) 22:49, 19 January 2017 (UTC)
- Yes, if you don't already have a page there then click "Create source", insert the code and save. PrimeHunter (talk) 23:03, 19 January 2017 (UTC)
- @PrimeHunter O.k. Should I paste them one after the other on a different line?
- .mw-search-createlink {display: none;}
- #searchmenu-new-preload {display: none;} JMGN (talk) 23:17, 19 January 2017 (UTC)
- They hide different things. You can paste either line or both. Use separate lines if you choose both. PrimeHunter (talk) 23:31, 19 January 2017 (UTC)
- Backinstadiums, it looks like you did this yesterday. Is it working the way that you want now? Whatamidoing (WMF) (talk) 06:49, 22 January 2017 (UTC)
- @Whatamidoing (WMF) Yes, it is. Thank you so much, 'cuz it does exactly what I wanted. JMGN (talk) 08:05, 22 January 2017 (UTC)
Arabic script diacritics to avoid homographs
Hi, I'd like to know how to enable searching Arabic terms using diacritics. Thank you in advance. JMGN (talk) 08:20, 21 January 2017 (UTC)
- Hi, thanks for replying. I do not understand you. What are you going to send me? JMGN (talk) 18:32, 22 January 2017 (UTC)
- thanks for your helping me. I look forward to good news then. JMGN (talk) 18:35, 22 January 2017 (UTC)
- OK, WHAT WERE YOU ASKING? 63.143.114.164 (talk) 18:39, 22 January 2017 (UTC)
- Currently, it is not possible to search for an Arabic term with diacritics. For example, to look up عَمِلَ you have to use the basic form عمل. Yet some entries appear ONLY in their ROOT form, so the only way of finding لَقًى “offal” is to look up its root, 'ل ق ي', due to the lack of a definition for لَقًى in the entry for the form لقى, in which only لُقًى appears. JMGN (talk) 18:46, 22 January 2017 (UTC)
- I'm not a developer. CirrusSearch has Arabic analyzers, and one may customize them. I suppose many others have had this problem and may have a solution to share?
- Just for grins, how big of a loss is it, the loss of diacritics? In other words, how many words does the (search-)able root have? Stemming can also add many imprecise results. Does page ranking fix it? Cpiral (talk) 05:46, 23 January 2017 (UTC)
- Hi, I do not know anything about how CirrusSearch works, and unfortunately the page 'Arabic analyzers' links to makes no sense to me. I'm just a common user. JMGN (talk) 10:46, 23 January 2017 (UTC)
- @Backinstadiums Is your question related to search on English Wiktionary?
- If yes, we will activate ICU folding very soon, which should help in dealing with non-Latin letters on English wikis such as Wiktionary.
- But reading your question, it's unclear to me what you'd like to achieve. Would it be possible to have a list of queries and the expected matches, e.g.:
- - searching for عمل should match the word عَمِلَ
- Thanks! DCausse (WMF) (talk) 18:53, 23 January 2017 (UTC)
- Hi,
- Yes, I'm on the English version of Wiktionary.
- First I'd like to focus on the second issue, namely that some entries appear ONLY in their ROOT form, so the only way of finding لَقًى “offal” is to look up its root, 'ل ق ي', due to the lack of a definition for لَقًى in the entry for the form لقى, in which only لُقًى appears.
- I do not know what 'ICU folding' is, so where can I find more info.? JMGN (talk) 19:03, 23 January 2017 (UTC)
- ICU folding will remove diacritics from almost every letter supported by Unicode.
- Could you clarify which search query you used and which document you expect to see in the results?
- The current behavior is:
- - I search for لَقًى I get ل ق ي as the first and only result.
- But you would like:
- - I search for لَقًى I want to see لقى in the search results.
- Is that right? DCausse (WMF) (talk) 19:51, 23 January 2017 (UTC)
- Hi, I chose لَقًى on purpose because it's easy to see that, having no entry of its own, it must be looked up by typing its root. Yet things get complicated with كُتُبِيّ (kutubiyy, “bookseller”). In an Arabic text, you will find its basic form without diacritics, that is كتبي, but if you look up كتبي you will never find this term with this specific meaning, only one of its many possible homographs. You cannot type its diacritics since you do not know them yet, so you have to carry out a second search for its root 'ك ت ب'.
- Nevertheless, things may get way more complicated. For instance, try to find داوا (dāwā, “to treat a disease”).
- Hopefully it's been clarified, otherwise let me know. JMGN (talk) 20:11, 23 January 2017 (UTC)
- I have no experience with Arabic lexicography, so forgive me if this is a silly question, but should كُتُبِيّ be listed under كتبي ? I see other cases, like آجر, where 9 variants with diacritics are listed. Looking at a random selection of words from here, I noticed that most have verb forms but no noun forms. Is that to be expected?
- So, I'm wondering if, in the case of كُتُبِيّ, it's a problem of incompleteness of Wiktionary. If we turn on ICU folding for English Wiktionary, searching for كُتُبِيّ should give كتبي as a result, but there won't be an entry for it there. On the other hand, searching for كتبي should match the existing instance of كُتُبِيّ under the root form—which would be helpful.
- Thanks for the interesting question, and for helping us learn more about Arabic. TJones (WMF) (talk) 20:35, 23 January 2017 (UTC)
- كُتُبِيّ should be under كتبي in classical Arabic lexicography (paper dictionaries/lexicons). Yet the real issue is finding كُتُبِيّ with the meaning 'bookseller' regardless of where it's been added. Again, داوا (dāwā, “to treat a disease”) appears in the entry for دواء, a noun which theoretically has the same root, but then again it's sometimes IMPOSSIBLE to know what root a certain term has. JMGN (talk) 20:43, 23 January 2017 (UTC)
- @Backinstadiums, I'm also not entirely sure what you are trying to figure out, but I'd like to help. Are you trying to search English Wiktionary, or was that just an example?
- Searching for لَقًى on English Wiktionary does give one result, ل ق ي, but it is at the bottom of the page, after all the template links. I've always found that confusing, especially since the results are off the page in many cases. Is that not what you expected? I'm not sure, since you indicated you had to look up the root ل ق ي.
- @Cpiral's link to the Arabic analyzer makes me wonder whether you are running your own instance of Mediawiki, instead of just searching Wiktionary. Is that the case?
- In any event, I'm happy to explain some of what's going on there; however, keep in mind it only applies to wikis that have Arabic as their main language (e.g., Arabic Wikipedia or Arabic Wiktionary), and does not apply to English Wiktionary, which uses the English Analyzer (more on that in a moment). The key part is the list at the end of the code block, before all the closing braces. The first step is lowercasing (which doesn't apply to Arabic, but does apply to any, say, Latin or Cyrillic or other text with a case distinction that happens to be present). Next comes the removal of Arabic stop words, which are words that don't carry much meaning. I can't find a link to the default list that Elasticsearch uses, but in English it would include "the, a, an, of, to, in, it", etc. The next step is Arabic "normalization", which has some Arabic-specific rules, including the removal of at least some Arabic diacritics (the harakat). "Keywords" are included, but the list is empty by default. Last comes stemming, which is an attempt to rewrite a word in such a way that related forms end up being the same. Stemming uses a combination of rules and dictionaries, and is far from perfect, as @Cpiral mentioned. Good stemming helps more than it hurts, though.
- On English Wiktionary, we use a similar process, but for English. This doesn't do much for Arabic text. We use ASCII-folding, which "folds" most accented Latin characters into their non-accented counterparts. Both the original and folded versions are indexed, so searching for [https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&fulltext=Search&search="resume" "resume"] (with quotes) matched résumé, because the folded version (resume) was also indexed, but searching for [https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&fulltext=Search&search="résumé" "résumé"] (with quotes) only matches the accented version (and so gets many fewer results). The interactions here can get pretty complex, and I'm just trying to give an overview.
- ICU-folding, which @DCausse (WMF) mentioned, is similar, but applies to non-Latin characters. Documentation of the exact behavior for all characters is hard to come by, but it's like ASCII-folding, in that it converts "complex" or "non-standard" versions of characters into "simpler" or "standard" ones. I've documented some examples that came up in a test of enabling ICU folding on English Wikipedia, which includes some Arabic characters.
- If ICU folding doesn't do enough—or does too much—for particular character sets on particular wikis, we can look into modifying its behavior, usually by telling it not to fold so aggressively, but also by adding particular characters to be folded into other characters.
- I hope that helps. If you have any more questions, I'm happy to try to answer them.
- (And of course I took so long replying that I missed some back and forth! I'll try to catch up!) TJones (WMF) (talk) 20:15, 23 January 2017 (UTC)
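The folding behaviour described in this reply can be approximated in a few lines. What follows is only a rough sketch, not the actual ASCII-folding or ICU-folding implementation used by Elasticsearch: it assumes that folding amounts to Unicode decomposition followed by dropping combining marks, and the names `fold`, `index_terms` and `index` are made up for illustration.

```python
import unicodedata


def fold(text):
    """Approximate folding: decompose, then drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def index_terms(word):
    """Return both the original and the folded form, since both get indexed."""
    return {word, fold(word)}


accented = "r\u00e9sum\u00e9"  # résumé
plain = "resume"

# Hypothetical miniature index: term -> set of page titles containing it.
index = {}
for title in [accented, plain]:
    for term in index_terms(title):
        index.setdefault(term, set()).add(title)

# The unaccented query finds both pages; the accented query finds only one.
print(sorted(index[plain]))
print(sorted(index[accented]))

# Arabic harakat are combining marks too, so the same step strips them.
with_harakat = "\u0639\u064e\u0645\u0650\u0644\u064e"  # عَمِلَ
bare = "\u0639\u0645\u0644"  # عمل
print(fold(with_harakat) == bare)
```

Indexing both forms is what makes the unaccented query broader than the accented one; real ICU folding covers many more cases than combining marks.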
- Hi, I'm on English Wiktionary.
- I chose لَقًى on purpose because it's easy to see that, having no entry of its own, it must be looked up by typing its root.
- Yet things get complicated with كُتُبِيّ (kutubiyy, “bookseller”). In an Arabic text, you will find its basic form without diacritics, that is كتبي, but if you look up كتبي you will never find this term with this specific meaning, only one of its many possible homographs. You cannot type its diacritics since you do not know them yet, so you have to carry out a second search for its root 'ك ت ب'.
- Nevertheless, things may get way more complicated. For instance, try to find داوا (dāwā, “to treat a disease”).
- Hopefully it's been clarified, otherwise let me know. JMGN (talk) 20:23, 23 January 2017 (UTC)
Avoid homograph hyperlinks
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Hi, I suppose it's an automatic process, so perhaps this is not the place to post. I'll provide an example:
In the English entry for 'pilgrim', the etymology section provides the following info : 'Middle English (early 13th century) pilegrim', yet its link directs you to an entry of its Norwegian homograph without any reference to Middle English.
I'd like to avoid it, so that I do not waste time following a link which will not provide any relevant info.
Thank you in advance. JMGN (talk) 18:35, 25 January 2017 (UTC)
- This isn't really a CirrusSearch issue, but I think I can explain. Note that the link includes "#Middle_English" at the end. That's directing your browser to the right section on the page. Unfortunately, that section doesn't exist in this case. So, the link is correct, in that it's pointing to the place where the information should be, even though there's no information there right now.
- I can see arguments for or against this practice. On the one hand, it's confusing because you follow a link and the info you want isn't there. On the other hand, the pilgrim entry already has a link that points to where the correct information should be, so that when it is eventually added, no one needs to go back and figure out where links need to be created.
- Which way to do things is best is a question for the English Wiktionary community to decide. It looks like the Beer Parlour may be the place to ask. On the Community Portal it is described as being "for policy discussion and cross-entry discussion". If not, I'm sure they can point you to the right place to discuss it. TJones (WMF) (talk) 21:36, 25 January 2017 (UTC)
First sentence
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
In the English version, the base of all translations (!), the first sentence is broken at the moment. It seems like a corruption of Help:Searching:
- H:S “The quickest way to find information in MediaWiki is to look it up directly. On every page there is a search box.”
- H:CS “CirrusSearch is the quickest way to find information in MediaWiki is to look it up directly. On every page there is a search box.”
What was the intended information? Speravir (talk) 20:56, 25 January 2017 (UTC)
- It looks like that content was moved up from another section, where it was already incorrect English. Unfortunately the diff interface in Wikimedia (and really most things that track text changes) is insufficient for finding where the source of this content was or which edits changed this particular piece of content. If I had to guess, it should read something like:
- CirrusSearch is the quickest way to find information in MediaWiki. On every page there is a search box. EBernhardson (WMF) (talk) 21:16, 25 January 2017 (UTC)
- Thanks for your answer. Are you able to fix this, or do you know how to inform a translation admin about this issue? Speravir (talk) 21:37, 25 January 2017 (UTC)
- @Speravir, I cleaned it up a little. Special:CirrusSearch is linked from all Wikimedia projects Special:Search pages (unless the local community has modified the link to point to a different, often local, page).
- Here's the Special:Search page on the Macedonian Wikipedia. Look for the "? Помош" link in the right corner to see how this help page is being referenced.
- I appreciate @Iniquity's help in trying to make the introduction more welcoming. I modified the language to reference search in a generic way, not just here on MediaWiki.org. I hope that my edits didn't break things too much. :) Additional contributions are welcome!
- One final note: Help:CirrusSearch describes how the search works on Wikimedia projects, all of which are using the CirrusSearch MediaWiki extension. However, Help:Searching describes the search feature in the default MediaWiki software. There is some overlap in how these two searches work, but do take care in copying information between the help pages. Some information does not apply to both! :) CKoerner (WMF) (talk) 23:31, 26 January 2017 (UTC)
- Thanks for help! :) Iniquity (talk) 23:35, 26 January 2017 (UTC)
Information about Deepcat
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
The information about Deepcat (at the end of Help:CirrusSearch#Intitle and incategory) is outdated:
- The maximal search depth of 15 subcats should be added. By the way you get this info, as well, in a box below the search bar, but before search results.
- The information about embedding the CSS must be removed. de:Benutzer:Christoph Fischer (WMDE)/Gadgets/DeepCat.css is empty, and the edit comment tells us “removed due to new structure”. The description of how to embed it is wrong anyway (a JavaScript command won’t work in a CSS).
This applies to wikitech:Nova Resource:Catgraph/Deepcat, too. If someone knows, how to get active people there aware of this issue, it would be nice to tell them. Speravir (talk) 18:53, 29 January 2017 (UTC)
- Thank you. I believe Phabricator T156603 is the way to go. I hope they raise the priority to "unbreak now".
- For new users I've updated the gadget source (and CSS) to point to User:Tobijat/Gadgets/DeepCat.js, but existing users will have to update their custom JavaScript. Cpiral (talk) 00:31, 30 January 2017 (UTC)
- @Cpiral: , now I noticed you changed the script source, too. Please, revert this part. The version of Tobijat tells us, it can only search up to 50 cats to a depth of 10 subcats, so it behaves differently. My request was only for removing the obsolete (and wrong) CSS embedding information.
- Edit: Ooh, I noticed, I could change it on my own now. Some days ago it was not allowed for me to edit the English version … Seems I have reached a certain level of something I do not know.
- But no, I think it would be better, if Cpiral would do the necessary change, perhaps by taking the comment of Perfektes Chaos (phab:T156603#2980853) into consideration. Speravir (talk) 18:51, 30 January 2017 (UTC)
- Well, I’ve also answered on the Phab task, but let me mention here, too, that https://de.wikipedia.org/wiki/User:Christoph_Fischer_(WMDE)/Gadgets/DeepCat.js still exists. BTW: Thank you, Cpiral for bringing it up in Phabricator. Speravir (talk) 05:00, 30 January 2017 (UTC)
- Solved for this help text. Speravir (talk) 18:59, 31 January 2017 (UTC)
Search in wiki.hu and AWB
I have some problems with the search function. When I search for a word, I get totally different results (hits) from those for the same word between quotation marks (e.g. egyenlő and "egyenlő"). This happens in both headlined searches. The biggest problem is that most of the hits (without quotation marks) don't even contain the searched word! I would be grateful for help. -Pegy22 (talk) 23:21, 30 January 2017 (UTC)
- The stems of egyenlő can match too. The stemmer is on by default but can be turned off using "egyenlő" quoted. Cpiral (talk) 01:06, 31 January 2017 (UTC)
- I still experience some error. If I search for "Rákosi" (not stem) I get 1953 hits and some of them don't contain the word Rákosi. (e.g. hu:Lakatos István (író))
- I have another question too. How can I use the operators AND and OR with search? (e.g. "Rákosi" AND "Lakatos") Thank you in advance. -Pegy22 (talk) 16:09, 31 January 2017 (UTC)
- Rákosi is present in hu:Lakatos István (író) under the section századi magyar irodalom as a link to Rákosi Viktor.
- Yes, you can use the boolean operators AND, OR and NOT, but their support is still very limited (no support for parentheses and very poor support when used with special keywords). Note that currently the default operator between words is an AND, so "Rákosi" "Lakatos" is equivalent to "Rákosi" AND "Lakatos". DCausse (WMF) (talk) 14:50, 1 February 2017 (UTC)
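The implicit AND can be pictured with a toy matcher. This is only an illustrative sketch and not how CirrusSearch parses queries: `parse_terms` and `matches` are hypothetical helpers, matching is plain substring search with no stemming, and OR, NOT and special keywords are deliberately not modelled.

```python
import re


def parse_terms(query):
    """Split a query into quoted phrases or bare words, dropping explicit ANDs."""
    tokens = re.findall(r'"[^"]*"|\S+', query)
    return [token.strip('"') for token in tokens if token != "AND"]


def matches(query, text):
    """A page matches only if every term occurs in it (AND semantics)."""
    return all(term in text for term in parse_terms(query))


implicit = '"Rákosi" "Lakatos"'
explicit = '"Rákosi" AND "Lakatos"'

# Both spellings reduce to the same list of required terms.
print(parse_terms(implicit) == parse_terms(explicit))

# Invented page text: it mentions both names, so it matches either query.
page = "Lakatos István ... Rákosi Viktor"
print(matches(implicit, page), matches(explicit, page))
```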
- Now I see it is there. I used Chrome's search function, which couldn't find "Rákosi". (!?)
- Thank you! Pegy22 (talk) 20:03, 2 February 2017 (UTC)
- @Pegy22
- 5317 62.4.55.96 (talk) 01:17, 11 March 2017 (UTC)
REST API to search?
ElasticSearch supports a REST API endpoint (typically something like /_search?q=search for something). However, I can't seem to find any endpoints that MediaWiki or Wikipedia expose that allow searching via a REST API. Is this capability part of the CirrusSearch extension? I was hoping to find one to try out and analyze the results to see if they would suit my needs.
Thanks! 2001:4898:80E8:C:0:0:0:1E4 (talk) 19:55, 27 March 2017 (UTC)
- I should note also that opensearch won't work because it doesn't provide a relevancy score at all. And even though opensearch supports a relevancy score, there is no extension for it and it isn't built into the core opensearch implementation that is part of MW core. 2601:600:8300:1B70:50B8:D76E:1B82:152A (talk) 19:37, 28 March 2017 (UTC)
- There is an API, just not a "REST api", see API:Search and discovery. 197.218.88.234 (talk) 20:30, 28 March 2017 (UTC)
- Okay, that helps, I'm able to execute the query now. However, it looks like the score data was removed. Is there any way to return a weight, relevancy or score weighting anymore? Is it possible to search elastic search directly for an MW site? 2601:600:8300:1B70:50B8:D76E:1B82:152A (talk) 21:05, 28 March 2017 (UTC)
- Err, the page has more about internal information, perhaps reading it more thoroughly would help. 197.218.88.234 (talk) 21:15, 28 March 2017 (UTC)
- There is an internal undocumented query string argument, cirrusDumpResult, which you can append. This is an undocumented debug API. You are free to use it, but I can't promise it will always work, and it has no compatibility guarantees with respect to format and such:
- https://www.mediawiki.org/w/api.php?action=query&list=search&srsearch=test&cirrusDumpResult
- There are also plans working their way through to offer Elasticsearch within the Wikimedia Labs environment with a full copy of production indices and full Elasticsearch query access. If everything goes to plan and budgets are approved, this might go live sometime in the first half of 2018. EBernhardson (WMF) (talk) 22:33, 28 March 2017 (UTC)
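For anyone scripting this, the debug URL quoted in this reply can be assembled as follows. This is a minimal sketch: `build_search_url` is a hypothetical helper, and since cirrusDumpResult is described as an undocumented debug argument, nothing here should be read as a supported interface. The block only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in the discussion.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_search_url(term, dump_result=False):
    """Build an action=query&list=search URL, optionally with the debug flag."""
    params = {"action": "query", "list": "search", "srsearch": term}
    query = urlencode(params)
    if dump_result:
        # The flag is a bare argument without a value, as in the quoted URL.
        query += "&cirrusDumpResult"
    return f"{API_ENDPOINT}?{query}"


print(build_search_url("test", dump_result=True))
```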
- This is exactly what I was looking for. The other cirrus dump functions returned too much data and didn't return the score.
- Will this be added to the Cirrus Extension or made available in some way as part MW? Even bringing back the score value which was deprecated for some reason would do the trick. Ultimately I need this for an on-premises installation in order to aggregate with an internal search engine that pulls data from multiple sources. The relevancy score is required in order to consistently and accurately merge the results.
- Thank you! 2601:600:8300:1B70:6921:732F:B07:5B2A (talk) 22:46, 28 March 2017 (UTC)
- We could look into bringing the score field back. I wasn't around when it was deprecated so can't say why exactly it was removed. According to the git history most search backends used by mediawiki don't have a score that can be exposed, and those that do use very arbitrary scores. For example whenever we change how queries are built the scores change, but those changes have no bearing other than their use in relation to other results for the same query.
- For example, the best score for https://www.mediawiki.org/wiki/?search=developer+summit&fulltext=1&cirrusDumpResult is 638, but the best score for https://www.mediawiki.org/wiki/?search=developer+summit+mediawiki&fulltext=1&cirrusDumpResult is 358. That certainly doesn't mean the top result for the first query is twice as good as the top result for the second query; it's just arbitrary. As another example, we changed how we build search queries about 6 months ago and the top score for the first query dropped from 1057 to the current 638. The resulting top result (in this case) is exactly the same, but the score dropped. This isn't an indication the result is worse; it's just an arbitrary number. EBernhardson (WMF) (talk) 21:39, 30 March 2017 (UTC)
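Because the raw numbers only mean something relative to other results of the same query, one possible way to make them comparable before merging is to rescale each result list separately. The sketch below is an assumption for illustration, not something proposed in this thread: `normalize` is a hypothetical helper, the top scores 638 and 358 are the ones quoted above, and every other title and score is invented.

```python
def normalize(results):
    """Rescale the scores of ONE query so that its best result gets 1.0."""
    top = max(score for _, score in results)
    return [(title, score / top) for title, score in results]


# Top scores (638 and 358) come from the discussion; the rest is invented.
query_a = [("Page A1", 638.0), ("Page A2", 319.0)]
query_b = [("Page B1", 358.0), ("Page B2", 179.0)]

# After rescaling, the top hit of each query is 1.0 regardless of raw score.
print(normalize(query_a))
print(normalize(query_b))
```

Whether per-query rescaling is good enough for merging with results from other search engines depends on the application and would need testing.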
- Thanks. Yeah, an arbitrary score isn't great as I could add that after the fact theoretically :). The score from the unofficial cirrusDumpResult call though, is that provided by ElasticSearch? If it is, I would expect that it isn't arbitrary then. So access to ElasticSearch's scoring results would be the most ideal. Any chance that unofficial call could get added to the public API and supported?
- Thanks. 2001:4898:80E8:4:0:0:0:4DA (talk) 15:44, 4 April 2017 (UTC)
Suggestion: Show results with category information
I would like to display the category of the article in the search results every time.
At the moment the category only shows up if the category name is also entered in the search query.
Is there any possibility to change that?
@EBernhardson (WMF)? :) Lanthanis (talk) 12:07, 6 April 2017 (UTC)
- Hi @Lanthanis,
- We have an upcoming A/B test that will be displaying metadata on the search results page for each result that is returned.
- You can actually test this out right now, in your own browser, using your logged-in Wikipedia account. We have instructions for this self-guided testing of display of categories and more, along with a couple other tests that might be of interest to you.
- Let us know if this is what you were looking for! DTankersley (WMF) (talk) 14:07, 6 April 2017 (UTC)
- se 178.222.124.29 (talk) 16:12, 8 May 2017 (UTC)
Sort by page title would be nice
I know my page on Wiktionary contains "Native American tribe" and begins with Chi, but there are tons of results in random order to wade through. Equinox (talk) 20:29, 6 April 2017 (UTC)
- https://en.wiktionary.org/w/index.php?search=%22Native+American+tribe%22+prefix%3AChi&title=Special:Search&go=Go gives 3 matches. FriedhelmW (talk) 17:58, 7 April 2017 (UTC)
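The query behind that link is the quoted phrase followed by prefix:Chi. A small sketch of how an equivalent URL could be generated is shown here; `wiktionary_search_url` is a hypothetical helper, and the output percent-encodes the colon in Special:Search, which differs cosmetically from the link above.

```python
from urllib.parse import urlencode


def wiktionary_search_url(phrase, title_prefix):
    """Combine an exact-phrase search with a prefix: filter on page titles."""
    query = f'"{phrase}" prefix:{title_prefix}'
    params = {"search": query, "title": "Special:Search", "go": "Go"}
    return "https://en.wiktionary.org/w/index.php?" + urlencode(params)


print(wiktionary_search_url("Native American tribe", "Chi"))
```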
Using deprecated arguments
I just installed CirrusSearch, Elasticsearch and Elastica. Everything in the setup process worked, but whenever I try to search for something, I get a few errors. The two most frequent errors are:
Deprecated: Deprecated: Elastica\Query\FunctionScore::addFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 90
Deprecated: Deprecated: Elastica\Query\FunctionScore::addWeightFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 242
I can see what the error is: somewhere the functions get called with the wrong parameters, but I can't find out where they get called incorrectly from. Did I do something wrong, or is it a common error?
Using:
Mediawiki: 1.28.2
PHP: 5.6.30-11+deb.sury.org~trusty+3 (apache2handler)
MySQL: 5.5.55-0ubuntu0.14.04.1
Elasticsearch: 1.7.6
CirrusSearch: 0.2 (dcb0cf9)
Elastica: 1.3.0.0 (4607acf)
EDIT:
I'm able to search and it finds results, so the errors aren't critical for the search but I suspect they are for filtering. Jonathanwinter (talk) 09:02, 31 May 2017 (UTC)
- It looks like the version of the Elastica extension is too new. Filters were deprecated in Elasticsearch 2.x, and the version of Elastica that supports 2.x emits those deprecation warnings. The REL1_28 branch for the Elastica extension is at 0959e38; it would be best to downgrade to that version. EBernhardson (WMF) (talk) 23:33, 31 May 2017 (UTC)
- EBernhardson, thank you for your answer.
- Actually, I was using the older version from branch REL1_27 and have now upgraded to REL1_28, but I still get this error.
- I left out some error messages because I thought they weren't relevant, but since your answer didn't solve the issue, I am now posting all of the error messages:
- Deprecated: Deprecated: Filters are deprecated. Use queries in filter context. See https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-filters.html in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Filter/Query.php on line 8
- Deprecated: Deprecated: Filters are deprecated. Use queries in filter context. See https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-filters.html in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Filter/AbstractFilter.php on line 8
- Deprecated: Use BoolQuery instead. Filtered query is deprecated since ES 2.0.0-beta1 and this class will be removed in further Elastica releases. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/Filtered.php on line 8
- Deprecated: Deprecated: Elastica\Query\Filtered passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/Filtered.php on line 33
- Deprecated: Deprecated: Elastica\Query\Filtered::setFilter passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/Filtered.php on line 65
- Strict Standards: Declaration of CirrusSearch\Search\FunctionScoreDecorator::addFunction() should be compatible with Elastica\Query\FunctionScore::addFunction($functionType, $functionParams, $filter = NULL, $weight = NULL) in /var/www/mediawiki-1.28.2/extensions/CirrusSearch/includes/Search/RescoreBuilders.php on line 299
- Deprecated: Deprecated: Filters are deprecated. Use queries in filter context. See https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-filters.html in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Filter/Terms.php on line 7
- Deprecated: Deprecated: Elastica\Query\Filtered passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/Filtered.php on line 33
- Deprecated: Deprecated: Elastica\Query\Filtered::setFilter passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/Filtered.php on line 65
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addWeightFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 242
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 90
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addWeightFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 242
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 90
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addWeightFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 242
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 90
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addWeightFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 242
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 90
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addWeightFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 242
- Deprecated: Deprecated: Elastica\Query\FunctionScore::addFunction passing AbstractFilter is deprecated. Pass AbstractQuery instead. in /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php on line 90
- Warning: Cannot modify header information - headers already sent by (output started at /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php:242) in /var/www/mediawiki-1.28.2/includes/WebResponse.php on line 45
- Warning: Cannot modify header information - headers already sent by (output started at /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php:242) in /var/www/mediawiki-1.28.2/includes/WebResponse.php on line 45
- Warning: Cannot modify header information - headers already sent by (output started at /var/www/mediawiki-1.28.2/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/FunctionScore.php:242) in /var/www/mediawiki-1.28.2/includes/WebResponse.php on line 45
- I don't know if it's something I did wrong in the installation or if there are other extensions installed which could interfere with CirrusSearch. Jonathanwinter (talk) 08:51, 6 June 2017 (UTC)
- These are all basically the same thing: they are emitted by the Elastica library when using a version of Elastica that is designed to support Elasticsearch 2.x. Since you are running 1.7 (the version supported by 1.28), Elastica needs to be downgraded to the appropriate version.
- Typically the Elastica library itself is pulled in by Composer; perhaps double-check which version of Elastica Composer is loading. You should have version 3.1.1 per https://github.com/wikimedia/mediawiki-extensions-Elastica/blob/REL1_28/composer.json EBernhardson (WMF) (talk) 18:01, 9 June 2017 (UTC)
Problem with ElasticsearchIllegalArgumentException[failed to execute script]; nested: ScriptException[scripts of type [inline], operation [update] and lang [groovy] are disabled]
Hello, I was attempting to set up CirrusSearch for use under CentOS 7
Versions:
MediaWiki 1.23 (From EPEL)
Elastica: Elastica-REL1_23-e112720
CirrusSearch: REL1_23-8e386cc
ElasticSearch: 1.7.6 (from https://www.elastic.co/guide/en/elasticsearch/reference/1.7/setup-repositories.html)
Following the readme,
- cat extensions/CirrusSearch/elasticsearch.yml >> /etc/elasticsearch/elasticsearch.yml (restarting elasticsearch)
- Set MW_INSTALL_PATH
- php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --startOver (seemed to work)
- php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip (works the first time, fails during additional calls)
- php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse (fails)
(Pruned output, but repeated lines of)
[ wiki] Indexed -1 pages ending at 1429 at -3/second
[ wiki] Indexed -1 pages ending at 1482 at -4/second
In the php logs, I see
update: /wiki_content_first/page/2873 caused ElasticsearchIllegalArgumentException[failed to execute script]; nested: ScriptException[scripts of type [inline], operation [update] and lang [groovy] are disabled];
update: /wiki_content_first/page/2877 caused ElasticsearchIllegalArgumentException[failed to execute script]; nested: ScriptException[scripts of type [inline], operation [update] and lang [groovy] are disabled];
update: /wiki_content_first/page/2879 caused ElasticsearchIllegalArgumentException[failed to execute script]; nested: ScriptException[scripts of type [inline], operation [update] and lang [groovy] are disabled];
[Called from CirrusSearch\ElasticsearchIntermediary::failure in /usr/share/mediawiki123/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php at line 97] in /usr/share/mediawiki123/includes/debug/Debug.php on line 303
In researching "scripts of type [inline], operation [update] and lang [groovy] are disabled", it looks like I should be able to use
script.disable_dynamic setting: false
I have also tried variations of:
script.groovy.sandbox.enabled = true,
script.engine.groovy.inline.update: on
(As well as all the groovy.*.*'s to on as listed on the above page), but so far no luck. Ncoulson bnet (talk) 21:51, 12 June 2017 (UTC)
Find function doesn't work
I have a column of numbers, some of them with the minus symbol denoting a negative number. When I select the column and use the Find function to identify the numbers with a minus symbol, nothing is found. 107.77.224.117 (talk) 14:07, 24 June 2017 (UTC)
- It seems that you asked at the wrong place. CirrusSearch searches articles, not spreadsheets or databases. Tacsipacsi (talk) 10:22, 25 June 2017 (UTC)
Multiple problems
Hello there,
I just installed my Mediawiki and added CirrusSearch but I'm having quite a number of problems:
- I don't have the completion suggester
- I don't have the "Did you mean" suggestion either
- The fuzziness isn't as good as I expected it to be: after a few tests, I've noticed that it only accepts one error (i.e. only one added/changed/removed letter), even for 10-letter words
Do you know how to solve my problems? (or at least explain to me why they exist...)
Thanks in advance ! Débutante (talk) 08:55, 4 July 2017 (UTC)
- 1. The completion suggester is not enabled by default, and it only covers search-as-you-type.
- 2. "Did you mean" suggestions are activated by default, but there are a few limitations:
- - the typo must not be in the first 2 letters
- - on small indices it may not work properly, as it requires a lot of data to work
- 3. "The fuzziness isn't as good as I expected": what do you mean? Are you using the fuzzy syntax (a word followed by a tilde, e.g. word~)? If so, this also has limitations: in general the edit distance is capped at 2 and the word must share a common prefix (usually 2 letters).
- In order to improve "Did you mean" suggestions you could do the following:
- Activate the use of the text (instead of just titles) to increase recall:
- $wgCirrusSearchPhraseSuggestUseText = true;
- You'll have to regenerate your index after activating this option:
- updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
- DCausse (WMF) (talk) 17:54, 6 July 2017 (UTC)
- thank you, I found it
- oh okay, it must be because I have a really small index as I'm only doing tests
- Yeah, I'm using this, and after a few tests I've noticed that it only corrects one error. For example, I have a document with the word pharaon and another one with pharhaon. If I search for pharaon~ it finds both, but if I search for pharaons~ it only finds pharaon, so it's only one letter and not two as expected. Débutante (talk) 14:49, 12 July 2017 (UTC)
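The distances in the pharaon example can be checked with PHP's built-in levenshtein() function. This is only a sketch of the arithmetic, assuming plain character edits; Elasticsearch compares analyzed terms and also requires a common prefix, so its fuzzy matching may well behave differently.

```php
<?php
// Plain Levenshtein distances between the words from the example above.
// This shows the edit-distance arithmetic only, not Elasticsearch's matching.
$pairs = [
    [ 'pharaon', 'pharhaon' ],   // one inserted letter
    [ 'pharaons', 'pharaon' ],   // one removed letter
    [ 'pharaons', 'pharhaon' ],  // one inserted and one removed letter
];

foreach ( $pairs as list( $searched, $indexed ) ) {
    printf( "%s -> %s: %d\n", $searched, $indexed, levenshtein( $searched, $indexed ) );
}
// pharaon -> pharhaon: 1
// pharaons -> pharaon: 1
// pharaons -> pharhaon: 2
```

So pharaons and pharhaon are two edits apart, which is within the stated cap of 2; if the document is still not found, something other than the raw distance (for example text analysis) is probably involved.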
- Hi,
- How is the completion suggester enabled? 66.77.160.179 (talk) 14:21, 16 October 2017 (UTC)
- The completion suggester is enabled by setting $wgCirrusSearchUseCompletionSuggester to 'yes'.
- You'll then need to refresh the completion suggester index regularly by running the maintenance script updateSuggesterIndex.php. DCausse (WMF) (talk) 14:32, 16 October 2017 (UTC)
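A minimal LocalSettings.php sketch of the two steps just described. The script path assumes a standard checkout under extensions/CirrusSearch, and how often to refresh the index is left open here.

```php
// LocalSettings.php (fragment)
// Enable the completion suggester for search-as-you-type.
$wgCirrusSearchUseCompletionSuggester = 'yes';

// Then build, and regularly refresh, the suggester index from the command
// line (for example from cron; the schedule is an assumption, pick your own):
//   php extensions/CirrusSearch/maintenance/updateSuggesterIndex.php
```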
CirrusSearch Version
Hello,
I'm trying to install CirrusSearch on an old wiki (1.25) and I'm currently having multiple problems due to the versions of my extensions (or so I think; I'm not 100% sure). When I try to create my index with php updateSearchIndexConfig.php, I get different errors depending on which release of CirrusSearch I'm trying to install:
CirrusSearch REL1.?? (it was given to me with the wiki but wasn't activated so maybe it's REL1.25) :
PHP Fatal error: Call to undefined method CirrusSearch\UpdateOneSearchIndexConfig::getClient() in C:\wikitest\extensions\Elastica\ElasticaConnection.php on line 46
CirrusSearch REL1.27 :
PHP Fatal error: Class 'MediaWiki\MediaWikiServices' not found in C:\wikitest\extensions\CirrusSearch\includes\Maintenance\Maintenance.php on line 64
CirrusSearch REL1.28, CirrusSearch REL1.29 :
PHP Fatal error: Class 'SearchIndexField' not found in C:\wikitest\extensions\CirrusSearch\CirrusSearch.php on line 1115
PHP Fatal error: Class 'SearchIndexField' not found in C:\wikitest\extensions\CirrusSearch\CirrusSearch.php on line 1229
I use :
MediaWiki 1.25.6
PHP 5.6.31
ElasticSearch 2.3.3
Elastica 1.3.0.0
If someone knows how to solve this, or where I could download an older version of CirrusSearch (I can't seem to find them), please let me know.
PS: I can't update my wiki as I use the LDAP Authentication extension, which is not available for newer versions. Débutante (talk) 15:45, 12 July 2017 (UTC)
- I think you need to try CirrusSearch REL1_25 but beware that you'll have to downgrade your elasticsearch cluster to a very old version (1.3.2 or above). It will be almost impossible for you to run elastic 2.3.3 with such an old codebase without considerable efforts to backport elastic 2.x support.
- You need to make sure that the Elastica extension is also set on the REL1_25 branch. DCausse (WMF) (talk) 07:51, 13 July 2017 (UTC)
- I finally solved it, I just needed an older version of ElasticSearch : 1.0.3 as I now know that I have CirrusSearch 0.2 which is REL1.24
- Thanks for everything ! Débutante (talk) 11:41, 17 July 2017 (UTC)
How do I configure Cirrus to search the Cargo extension tables
I have installed Elasticsearch and CirrusSearch and ran the indexing, but now when I search, I am trying to find data contained in the Cargo tables. How do I tell Elasticsearch and CirrusSearch to index them? Jhuff05 (talk) 14:45, 13 July 2017 (UTC)
- By cargo tables, do you mean Extension:Cargo? As far as I know that extension does not integrate with CirrusSearch in any way. The typical way for an extension to integrate with CirrusSearch is to create pages with a custom content handler, and have that content handler return appropriate data to be indexed. EBernhardson (WMF) (talk) 21:03, 13 July 2017 (UTC)
- That or you could use Manual:Hooks/CirrusSearchMappingConfig and Manual:Hooks/SearchIndexFields and Manual:Hooks/SearchDataForIndex. I think Extension:GeoData uses those, so you could probably look at it as an example. Smalyshev (WMF) (talk) 21:57, 13 July 2017 (UTC)
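A rough sketch of the hook-based approach mentioned above, assuming MediaWiki 1.28 or later (where the SearchIndexFields and SearchDataForIndex hooks exist). The field name cargo_text and the helper fetchCargoTextForPage() are hypothetical, and how to actually read rows out of Cargo is not shown. After adding a field, the index configuration would need updating and the pages reindexing, and making the new field searchable from the search box may need further work.

```php
// LocalSettings.php (fragment), assuming MediaWiki 1.28 or later.
// 'cargo_text' and fetchCargoTextForPage() are hypothetical names.

// Hypothetical helper: return the Cargo data of a page as plain text.
function fetchCargoTextForPage( Title $title ) {
    // Query the Cargo tables for rows that belong to $title here.
    return '';
}

// Declare an extra text field in the search index mapping.
$wgHooks['SearchIndexFields'][] = function ( array &$fields, SearchEngine $engine ) {
    $fields['cargo_text'] = $engine->makeSearchFieldMapping(
        'cargo_text',
        SearchIndexField::INDEX_TYPE_TEXT
    );
};

// Fill the field whenever a page is (re)indexed.
$wgHooks['SearchDataForIndex'][] = function (
    array &$fields, ContentHandler $handler, WikiPage $page,
    ParserOutput $output, SearchEngine $engine
) {
    $fields['cargo_text'] = fetchCargoTextForPage( $page->getTitle() );
};
```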
when a new essay will be shown in wikipedia?
Hi, I wrote an essay on my user talk page and submitted it to Wikipedia. My question is: when will it be published or show up in search results?
Thanks in advance. Manik mahmud (talk) 06:27, 29 July 2017 (UTC)
- Generally within a few minutes for internal search. For an essay you might have to adjust the namespaces searched, as by default only content is searched and I don't think that includes essays. If it's not coming up, provide a link to the essay and I'll see what happened. EBernhardson (WMF) (talk) 22:16, 1 August 2017 (UTC)
ForceSearchIndex.php Problems
I want to install CirrusSearch on my MediaWiki.
I am following the instructions from the README file.
At the step
Next bootstrap the search index by running:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
I get this TypeError. Can somebody help me fix this?
Software Version
MediaWiki 1.28.2
PHP 5.6.31-4+ubuntu16.04.1+deb.sury.org+4 (apache2handler)
MySQL 5.7.19-0ubuntu0.16.04.1
Elasticsearch 5.5.2
PHP TypeError: Argument 1 passed to CirrusSearch\ForceSearchIndex::attachPageConditions() must be an instance of Wikimedia\Rdbms\IDatabase, instance of DatabaseMysqli given, called in /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php on line 431 in /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php on line 447
PHP Stack trace:
PHP 1. {main}() /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php:0
PHP 2. require_once() /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php:605
PHP 3. CirrusSearch\ForceSearchIndex->execute() /var/www/html/mediawiki/maintenance/doMaintenance.php:111
PHP 4. CirrusSearch\ForceSearchIndex->getUpdatesByIdIterator() /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php:171
PHP 5. CirrusSearch\ForceSearchIndex->attachPageConditions() /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php:431
[e9c93385af653b35e2639272] [no req] TypeError from line 447 of /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php: Argument 1 passed to CirrusSearch\ForceSearchIndex::attachPageConditions() must be an instance of Wikimedia\Rdbms\IDatabase, instance of DatabaseMysqli given, called in /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php on line 431
Backtrace:
#0 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php(431): CirrusSearch\ForceSearchIndex->attachPageConditions(DatabaseMysqli, BatchRowIterator, string)
#1 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php(171): CirrusSearch\ForceSearchIndex->getUpdatesByIdIterator()
#2 /var/www/html/mediawiki/maintenance/doMaintenance.php(111): CirrusSearch\ForceSearchIndex->execute()
#3 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php(605): require_once(string)
#4 {main} 92.217.183.173 (talk) 07:54, 29 August 2017 (UTC)
- This suggests the versions of MediaWiki and CirrusSearch you are using are not in sync. CirrusSearch is expecting a newer version of MediaWiki than you have installed. Which versions do you have? EBernhardson (WMF) (talk) 23:00, 29 August 2017 (UTC)
- Hello EBernhardson,
- thanks for the info.
- Here are my versions:
- MediaWiki 1.28.2
- PHP 5.6.31-4+ubuntu16.04.1+deb.sury.org+4 (apache2handler)
- MySQL 5.7.19-0ubuntu0.16.04.1
- Elasticsearch 5.5.2
- Now I will try the latest version, MediaWiki 1.29.1.
- I will send feedback when I'm finished. Elvis2912 (talk) 03:45, 30 August 2017 (UTC)
- Thanks for helping,
- installing the newer version fixed the problem. Elvis2912 (talk) 04:27, 30 August 2017 (UTC)
- I am having problems too running this script, which part did you use a newer version on? Rishxpre55 (talk) 14:18, 30 August 2017 (UTC)
- Hello Rishxpre55,
- it was a new installation of MediaWiki with an empty database, so I was free to delete all files and tables and install the latest version, 1.29.0.
- To run CirrusSearch you need the Elastica extension too. I hope this helps. Elvis2912 (talk) 07:36, 31 August 2017 (UTC)
- Hi, I am using MSSQL Server, which is giving me no end of issues; I'm not sure that 1.29 works with MSSQL Server. I'm using 1.27. Rishxpre55 (talk) 13:39, 31 August 2017 (UTC)
Lint?
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Is there a way to search for Lint errors? e.g. pages in Special:LintErrors/misnested-tag? Thanks. Valerio Bozzolan (talk) 13:47, 18 September 2017 (UTC)
- In theory, it could be done in the Linter extension by adding a field to the Elasticsearch indices that holds the error info, and then allowing searches on that field.
- Hope that helps! DTankersley (WMF) (talk) 16:12, 18 September 2017 (UTC)
- Thanks, asked in this topic. Valerio Bozzolan (talk) 17:48, 18 September 2017 (UTC)
differences in wikis
- https://el.wikipedia.org/w/index.php?search=感染はらい菌の経鼻
- and
- https://en.wikipedia.org/w/index.php?search=感染はらい菌の経鼻
- produce different results. Vanished user Xorisdtbdfgonugyfs (talk) 15:52, 28 September 2017 (UTC)
- I think you wanted to link to the English and Greek Wikipedias for the same search of "感染はらい菌の経鼻". I hope you don't mind that I attempted to fix it. It looks like one shows search results from the Chinese Wikipedia while the other does not. @TJones (WMF), do you know why this is? CKoerner (WMF) (talk) 19:06, 28 September 2017 (UTC)
- That's exactly the question. Why does the same search engine display different results? Vanished user Xorisdtbdfgonugyfs (talk) 16:32, 30 September 2017 (UTC)
- @Xoristzatziki—of course one difference is that English Wikipedia has one result and Greek Wikipedia has no results. The other difference, which I think is the important one here, is that English Wikipedia is showing results from Chinese Wikipedia. The Chinese results are there on English Wikipedia because we have enabled language detection on English Wikipedia for queries that get 0, 1, or 2 results. If we detect a language other than English, we search the Wikipedia in that language. If that search gets any results, we display those results, too.
- Unfortunately, we've only enabled the language detection on nine Wikipedias: English, French, Italian, Spanish, German, Portuguese, Russian, Japanese, and Dutch, based roughly on the amount of search traffic they were receiving at the time we made the list. The list of languages being tested for language detection is optimized for each wiki, based on the languages that occur in queries on that wiki.
- Languages that are similar are more easily confused, like Spanish and French. On English Wikipedia, there are many more Spanish queries than French queries, for example. At the beginning, disabling French language detection improved the accuracy—while we missed a small number of French queries, we prevented a larger number of Spanish queries being incorrectly labeled as French. (Since that early deployment, accuracy has improved and French and Spanish are both enabled on English Wikipedia, though with a bias in favor of Spanish.) On French Wikipedia, obviously having French enabled is more valuable than having Spanish enabled if you have to choose between them. The exact list varies for all nine Wikipedias.
- We stopped working on enabling language identification for additional wikis about a year ago in favor of other projects. The parent task for the process was documented in Phab ticket T121541, though we've closed it. Polish, Arabic, and Chinese were next on the list. Based on search volume, Greek would be fairly far down the list, though there is a ticket (Phab ticket T140300) to enable some default list of languages everywhere else—I suggested Arabic, Armenian, Chinese, English, Greek, Hebrew, Japanese, Korean, Russian, and Thai in the ticket. But it still needs more research (maybe there's a better list other than just based on uniqueness of the script) and some testing.
- At the moment it isn't high on our priority list, but please comment on T140300.
- Also, the language detection isn't perfect. As mentioned before, Spanish and French interfere with each other sometimes. It's especially difficult because queries are often very short and many individual words occur in many languages (with the same or different meanings), or look very similar because they were borrowed. drama is my new favorite example, since it appears in at least 16 languages; it's originally Greek, but written in the Latin alphabet it looks like Spanish, English, and/or Italian to the language identification.
- As I was saying, the language identification isn't perfect. Your example query was identified as Chinese on English Wikipedia, but it looks like it is actually Japanese. (it certainly has some Japanese characters in it.) It gets zero results on Japanese Wikipedia, so if other Wikipedias with language detection enabled correctly identified it as Japanese, nothing would happen because there are no results to show. You found a very interesting corner case! TJones (WMF) (talk) 14:58, 2 October 2017 (UTC)
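The fallback described in this reply can be sketched roughly as follows. This is a hypothetical illustration, not CirrusSearch code: the function name and both callables ($search and $detect) are stand-ins, and the only rules taken from the explanation above are the threshold of at most 2 local results, the per-wiki list of enabled languages, and showing the other wiki's results only when there are any.

```php
<?php
// Hypothetical sketch of the fallback logic described above; this is not
// CirrusSearch code, and both callables are stand-ins.
function searchWithLanguageFallback(
    $query, $homeLang, array $enabledLangs, callable $search, callable $detect
) {
    $results = [ $homeLang => $search( $homeLang, $query ) ];

    // Detection only runs for queries with 0, 1, or 2 local results.
    if ( count( $results[$homeLang] ) > 2 ) {
        return $results;
    }

    // Only languages enabled for this wiki are considered.
    $lang = $detect( $query );
    if ( $lang === null || $lang === $homeLang || !in_array( $lang, $enabledLangs, true ) ) {
        return $results;
    }

    // Results from the other wiki are shown only if there are any.
    $other = $search( $lang, $query );
    if ( count( $other ) > 0 ) {
        $results[$lang] = $other;
    }
    return $results;
}
```

Passing an empty $enabledLangs list models a wiki where detection is not enabled, such as Greek Wikipedia in the example: only the local results come back.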
Missing documentation of new subpageof keyword
Apparently the newer subpageof keyword isn't documented (introduced in T159321). It also seems to come with its own quirks:
| Keyword | subpageof | prefix |
|---|---|---|
| Search string | help:namespaces | help:namespaces |
| Result (main namespace) | none | found |
| Result (all namespaces) | found | found |
| Result (help namespace) | none | found |
It seems that the new keyword is more strict. If the prefix "help:" is removed from above search string it finds it in the specific namespaces.
This is generally a good thing, but confusing to those that are used to the prefix keyword. 197.218.91.102 (talk) 20:07, 30 September 2017 (UTC)
- This keyword was added to circumvent the limitations and ambiguities of the prefix keyword and to ease the development of a user interface that will assist the user in creating complex queries.
- It's not meant to replace the prefix keyword but to specialize the usage of the prefix keyword which is often used to find subpages.
- Differences:
- prefix does not care if the searched prefix denotes a parent page, e.g. prefix:help:namespa will match, subpageof:namespa won't
- subpageof does not handle the namespace inside the keyword itself, because that was confusing for developers. The main reason is that the namespaces to search were controlled by two different sources: the namespace filter UI block and the keyword.
- prefix is greedy: everything after the keyword is consumed, making it very difficult to combine with other keywords, i.e. prefix has to be the last keyword in the query. subpageof is not greedy and quotes must be used to search pages with spaces, making it possible to combine it with other keywords anywhere in the query.
- The reason you find very different results is that you include the namespace help in the subpageof query forcing the keyword to find subpages that start with help:namespaces, so when searching on all namespaces on mw.org you'll find pages like Translations:Help:Namespaces/6/en which are subpages of Help:Namespaces in the Translations namespace.
- The equivalent of prefix:help:namespaces is searching subpageof:namespaces with the help namespace selected in the namespace filter.
- Note that you can use a shortcut to search a specific namespace: simply prefix your query with the namespace name:
- The proper way to "translate" prefix:help:namespaces with subpageof is help:subpageof:namespaces.
- Prefix was the sole keyword to work that way (namespace handling in the keyword itself).
- Sorry for the lack of documentation, I was waiting for feedback from the developers that requested a change to the prefix keyword. DCausse (WMF) (talk) 14:29, 2 October 2017 (UTC)
- @DCausse (WMF) is it OK to add this to the documentation now? CKoerner (WMF) (talk) 15:42, 31 October 2017 (UTC)
- @CKoerner (WMF) sure, I'll go ahead and document the keyword (probably this week). DCausse (WMF) (talk) 08:03, 1 November 2017 (UTC)
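The "translation" from prefix to subpageof described in this thread can be written down as a small helper. This is a hypothetical illustration (the function is not part of CirrusSearch), and it assumes the title itself contains no double quotes.

```php
<?php
// Hypothetical helper, not part of CirrusSearch: rewrite a
// "prefix:Namespace:Title" search into the subpageof form.
function prefixToSubpageof( $namespace, $title ) {
    // subpageof is not greedy, so titles with spaces must be quoted.
    $quoted = strpos( $title, ' ' ) !== false ? '"' . $title . '"' : $title;
    $keyword = 'subpageof:' . $quoted;
    // The namespace moves out of the keyword and in front of the query.
    return $namespace === '' ? $keyword : $namespace . ':' . $keyword;
}

echo prefixToSubpageof( 'help', 'namespaces' ), "\n"; // help:subpageof:namespaces
```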
Warning Regular expression
: "Do not run a bare insource:/regexp/ search."
There is no clarification of what that means. What is the alternative to a bare regexp? All examples are bare regexps. → User: Perhelion 09:32, 3 November 2017 (UTC)
- It refers to the previous paragraph, i.e. you should define a search domain in addition to the regex search. So all examples are not bare regexes, as the first four use non-regex insource:, and the fifth uses prefix: to limit the search domain on which the regex is tested. Tacsipacsi (talk) 15:05, 3 November 2017 (UTC)
Maximum number of results
I believe there is a maximum of 10,000 search results. This should be mentioned. 108.51.44.227 (talk) 16:10, 6 November 2017 (UTC)
- It's a hard-coded setting in the Elasticsearch configuration. 108.51.44.227 (talk) 16:11, 6 November 2017 (UTC)
- That is correct, there is a 10,000 search result maximum; please refer to the conversation about the search results limits in this phab ticket: https://phabricator.wikimedia.org/T177270. DTankersley (WMF) (talk) 16:47, 6 November 2017 (UTC)
Translate: how to have suggestions in the current language ?
Hi everyone,
I am running a multilingual MediaWiki with MLEB, and wanted to know if there is any way to have the search suggestions localised?
To be complete, my wiki is in French with pages translated in English.
I translate the titles of the pages (by using the namespace "Translations:Pagename/Page display title/en")
I want users who search for an English term to see the "Page display title" for their language first in the suggestions.
I think it's just a matter of a setting, but I don't know where to look.
Thanks in advance ! Tuxxic (talk) 14:12, 16 November 2017 (UTC)
- Assuming you don't use the completion suggester you may be able to tune results based on user language by using $wgCirrusSearchLanguageWeight:
$wgCirrusSearchLanguageWeight = [
    'user' => 10.0, // should favor pages in user language
    'wiki' => 1.2, // boost pages in wiki language
];
- DCausse (WMF) (talk) 17:20, 16 November 2017 (UTC)
- Hi,
- Thanks for answering, don't close too soon please.
- I don't think it's what I need. Can you please explain what the setting does ?
- Also, I need to have the translated page title in the suggestions. Tuxxic (talk) 14:37, 17 November 2017 (UTC)
- Sorry, I probably misunderstood your question.
- These settings let you favor pages in a given language.
- E.g. you have a list of pages:
- - Documentation
- - Documentation/fr
- - Documentation/en
- If my user interface is set to French, it should help to promote Documentation/fr first when I start typing Docum.
- If the title is translated, there's currently no way to make it appear in search results. This would require tighter integration between Translate and CirrusSearch; the translated title is not really a title and is not known by CirrusSearch.
- You may want to file a feature request in phabricator. DCausse (WMF) (talk) 15:22, 17 November 2017 (UTC)
- Cool, thanks for understanding better my needs :)
- I'll file a feature request in Phabricator, I hoped there was a means to do it flawlessly.
- Thanks for your time. Tuxxic (talk) 09:03, 24 November 2017 (UTC)
The search box suggests non existent pages
Type "Red link" in top right search box.
The suggestions are: "Red link" and "Manual:Red link". The latter is a non existent page: Manual:Red Link
Type "Manual:Red link" in Special:Search. There is a suggestion for the title "Manual:Red link". When you click on it, the result page is shown with the message that the page does not exist etc. It seems that a page with this title never existed.
I saw a couple of similar cases but don't remember the titles. Geraki (talk) 15:03, 29 November 2017 (UTC)
- Thanks Geraki for the report. I'd never come across that before. Kind of spooky. :)
- Thanks to 197.218 as well for the link to the relevant phabricator task. CKoerner (WMF) (talk) 15:43, 29 November 2017 (UTC)
Change Elasticsearch default port and HTTP headers
1) How do I change the default port? My cluster runs on port 8080.
2) How do I attach HTTP headers to the CirrusSearch client that will be sent on each request? 2001:700:200:9:1F9:FF59:ADCC:9F5F (talk) 10:43, 5 December 2017 (UTC)
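For the port question, one possible LocalSettings.php sketch is shown below. It rests on an assumption: that the installed version of the Elastica extension accepts an array per server in $wgCirrusSearchServers and passes its keys on to the Elastica client as connection parameters. If your version only accepts plain host names, this will not work as written. The host name is a placeholder.

```php
// LocalSettings.php (fragment)
// Assumption: the installed Elastica extension accepts an array per server
// and forwards its keys to the Elastica client as connection parameters.
$wgCirrusSearchServers = [
    [ 'host' => 'localhost', 'port' => 8080 ], // 'localhost' is a placeholder
];
```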
Increase timeout
Hello everyone, I often perform regexp searches on Wikimedia Commons that run into a timeout, e.g. https://commons.wikimedia.org/w/index.php?search=insource%3A%2F%5BDd%5Date+%2A%3D+%2A%5B0-3%5D%5B0-9%5D+%2Ad%5C%27%2F&title=Special:Search&profile=default&fulltext=1&searchToken=65mmlim2p8t95o8ue9saz8a7f
Would it be possible to extend the timeout for certain groups of users? Aschroet (talk) 12:46, 27 December 2017 (UTC)
- If you use some other filter with the regex insource: (e.g. plain text insource), it’s less likely to time out. In this case, prepending File:insource:Date (to limit to the File namespace and search for “Date” in the search index) might help, although it still timed out for me. Tacsipacsi (talk) 18:32, 27 December 2017 (UTC)
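As a sketch of that suggestion, the snippet below builds a search URL that puts the File: namespace prefix and a plain insource:Date filter in front of the regex from the original link. The regex is the one decoded from that link; the code only assembles the URL and makes no claim that the resulting search finishes within the timeout.

```php
<?php
// Combine a cheap indexed filter (File namespace plus plain insource:Date)
// with the expensive regex, as suggested above.
$regex = "insource:/[Dd]ate *= *[0-3][0-9] *d\\'/";
$query = 'File:insource:Date ' . $regex;

// http_build_query() takes care of percent-encoding the query string.
$url = 'https://commons.wikimedia.org/w/index.php?' . http_build_query( [
    'search' => $query,
    'title' => 'Special:Search',
    'fulltext' => 1,
] );

echo $url, "\n";
```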
Recently Installed and Index Ran, Yet No Search Results
Hello,
MW Version 1.24.2
PHP 5.5.9
MySQL 5.5.58
Lua 5.1.5
Elasticsearch 1.7.3
I have been working on replacing my wiki's search index with something better than the original since it is too slow with the amount of data we have (2M pages, text table of about 50GB.)
I tried installing CirrusSearch on the wiki and, after running the indexer (which took a very long time, as the guide for bootstrapping large wikis did not work), I edited my LocalSettings.php to enable CirrusSearch, only to be met with query after query returning no results.
I went by the book according to the Extension:CirrusSearch page (curl, ESearch, Elastica all installed; Elastic and Cirrussearch versions for 1.24.2 taken from github) as well as the README for Cirrussearch, following these steps:
Place the CirrusSearch extension in your extensions directory.
Make sure you have the curl php library installed (sudo apt-get install php5-curl in Debian.)
You also need to install the Elastica MediaWiki extension.
Add this to LocalSettings.php:
require_once( "$IP/extensions/Elastica/Elastica.php" );
require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );
$wgDisableSearchUpdate = true;
Configure your search servers in LocalSettings.php if you aren't running Elasticsearch on localhost:
$wgCirrusSearchServers = array( 'elasticsearch0', 'elasticsearch1', 'elasticsearch2', 'elasticsearch3' );
There are other $wgCirrusSearch variables that you might want to change from their defaults.
Now run this script to generate your elasticsearch index:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
Now remove $wgDisableSearchUpdate = true from LocalSettings.php. Updates should start heading to Elasticsearch.
Next bootstrap the search index by running:
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
Note that this can take some time. For large wikis read "Bootstrapping large wikis" below.
Once that is complete add this to LocalSettings.php to funnel queries to ElasticSearch:
$wgSearchType = 'CirrusSearch';
Running curl 'localhost:9200/_cat/indices?v' returns the three green health, open status indexes:
wikidb-ww_general_first pri: 4 rep: 0 docs.count: 14789 docs.deleted: 0 store.size: 36.4mb
wikidb-ww_content_first pri: 4 rep: 0 docs.count: 2129367 docs.deleted: 0 store.size: 98.2gb
mw_cirrus_versions pri: 1 rep: 0 docs.count: 2 docs.deleted: 2 store.size: 3.5kb.
Does anyone know about this type of error, where there are no search results even though the search bar gives results while typing? I tried installing/reconfiguring SphinxSearch before (it was put on the machine by the previous devs who worked on it) and it gave me the same error. Default search "works" in that I will get pages, but it takes 2 minutes to do a query, which is unacceptable.
Thanks in advance,
-kgmills Kgmills (talk) 21:35, 30 December 2017 (UTC)
- Apparently all maintenance scripts worked nicely, but when searching you only see results in the autocomplete suggestions but nothing is displayed when you hit search?
- You mention an error in your message but I don't see it, do you see a particular error message or simply an empty search results page (as if the index was empty)?
- Thanks! DCausse (WMF) (talk) 17:01, 2 January 2018 (UTC)