Extension talk:CirrusSearch/2017
This page used the Structured Discussions extension to give structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.
Discussion related to the CirrusSearch MediaWiki extension.
See also the open tasks for CirrusSearch on Phabricator.
CirrusSearch on Ubuntu 16.04
Will CirrusSearch work with php5.6-curl on Ubuntu 16.04 LTS? I don't think php5-curl can be installed on Ubuntu 16.04 LTS, which comes with PHP 7 by default.
Thanks. 98.229.68.160 (talk) 07:09, 4 January 2017 (UTC)
- php5.6-curl will certainly work if you are running PHP 5.6. PHP 7.0 might also work, although we haven't explicitly tested it. EBernhardson (WMF) (talk) 19:34, 17 January 2017 (UTC)
Suggestion: Surface the number of contributors and make it visible
Problem: Determining the number of contributors
Background: An important metric for the usefulness or quality of any published document is peer review. A page whose changes were all made by a single contributor is less likely to be reliable than one that has been reviewed by several editors.
Use cases:
As a reader, I want to know whether an article has been reviewed, so that I can judge whether its content is likely to be reliable before researching the references.
As an editor, I want to find all articles with few authors, so that I can improve their quality.
Possible implementation: Expose whether a page has been "reviewed" or has more than one or two contributors. This could be shown in search results alongside the date and word count. It could be exposed as: "contributors: > 3", e.g. "incategory:scientists contributors: > 3".
This may be computationally expensive on the servers, so it may be enough to determine ranges, e.g. no review = 1 contributor, minimal review = fewer than 3 contributors, reviewed = more than 3. 197.218.90.119 (talk) 12:31, 27 January 2017 (UTC)
- If I understand correctly, you'd like to see this information - number of contributors - in both the metadata that appears at the end of individual search results (like the number of words, size in bytes, and last edited date), and also as a filter one could search for specifically: "Show me all pages in category Scientists with contributors greater than 3". Is that right?
- I thought this information was exposed via the API, but I'm not having any luck confirming that. I do know it's part of the Page Information (example) and my suspicion is that it is computationally expensive. :)
- Please let me know whether I have understood correctly. CKoerner (WMF) (talk) 18:26, 27 January 2017 (UTC)
- > understand correctly you'd like to see this information - number of contributors - in both the metadata that appears at the end of individual search results (like the number of words, size in bytes, and last edited date), and also as a filter
- Yep.
- >I thought this information was exposed via the API,
- The API does support it (https://en.wikipedia.org/w/api.php?action=query&prop=contributors&titles=Main_Page). It may be computationally expensive to retrieve the full list of contributors, especially for pages with tens of thousands of revisions and thousands of contributors, but it probably isn't a big deal to expose just a few. If it were, the API would likely be disabled.
- It is possible that this will be low priority or declined because there is an alternative, but it still seems like a good thing to keep in mind in future search developments. 197.218.90.119 (talk) 19:04, 27 January 2017 (UTC)
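Since the API route above already returns per-page contributor data, the proposed ranges could be derived client-side. A minimal Python sketch, assuming the prop=contributors response carries a "contributors" list and an "anoncontributors" count for each page (continuation for pages with many contributors is ignored). The bucket labels follow the proposal; treating exactly 3 contributors as "minimal review" is a hypothetical choice, because the proposed ranges leave 3 unassigned.

```python
from __future__ import annotations


def count_contributors(page: dict) -> int:
    """Registered plus anonymous contributors for one page entry.

    Assumes a "contributors" list and an optional "anoncontributors"
    integer, as in a prop=contributors response; continuation is ignored.
    """
    registered = len(page.get("contributors", []))
    anonymous = int(page.get("anoncontributors", 0))
    return registered + anonymous


def review_bucket(n: int) -> str:
    """Map a contributor count onto the coarse ranges proposed above.

    Exactly 3 contributors is unassigned in the proposal; this sketch
    (hypothetically) treats it as minimal review.
    """
    if n <= 1:
        return "no review"
    if n <= 3:
        return "minimal review"
    return "reviewed"


def buckets_from_response(response: dict) -> dict[str, str]:
    """Return {title: bucket} for every page in the response."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page.get("title", ""): review_bucket(count_contributors(page))
        for page in pages.values()
    }
```

For example, a page with two registered contributors and three anonymous ones counts as 5 and lands in "reviewed".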
Suggestion: When "did you mean" doesn't return any results, try automatically searching with the completion suggester in search pages
Problem: The "Did you mean" tool in search pages does not always provide (useful) suggestions
Background: Some searches give worse results because of slight misspellings, and sometimes "did you mean" provides worse suggestions than the completion suggester. For example, currently searching English Wikipedia for:
| Words | Did you mean | Completion suggestion |
|---|---|---|
| bit torant | bit torent | bittorrent |
| Schwarznagg | Schwarznegger | |
| rumplestilskan | Rumpelstiltskin |
Proposed solution :
Use the first suggestion from the completion suggester whenever there are no "did you mean" suggestions. 197.218.80.181 (talk) 08:57, 30 January 2017 (UTC)
- I agree with you this is inconsistent, please see T135920 where we track a similar problem (inconsistencies between did you mean and the completion suggester).
- I agree that in some cases the first result from the completion suggester can be better. The problem is that the completion suggester is a prefix-search kind of API, so we would have to make sure that we don't suggest pages that are "too far".
- I think the opposite example would be: search for "route" misspelled as "roote"; did you mean will suggest "route", but the first result from the completion suggester is "Rooted graph".
- Your solution might work if we display the completion suggester 1st result just in case the did you mean failed. We still need to make sure that the query covers a sufficiently large prefix of the suggested article. DCausse (WMF) (talk) 10:49, 30 January 2017 (UTC)
- > Your solution might work if we display the completion suggester 1st result just in case the did you mean failed.
- Yes, that was the suggestion.
- The point is that some results are always much better than no results. In extreme cases where the query doesn't cover a sufficiently large prefix, a simple suggestion would suffice instead of a "no results" page. People who want an exact match can always be advised to use quotes instead, e.g. "Schwarznagg".
- For example, even though the word is not (currently) used at all on English Wikipedia, Google still rewrote the query and surfaced some results; see:
- https://www.google.com/search?q="Schwarznagg"+site%3Aen.wikipedia.org
- vs
- https://www.google.com/search?q=%22Schwarznagg%22+site%3Aen.wikipedia.org
- For the specific case of "bit torent", because the only result is a single page and that page is a redirect, it would make much more sense to search for that page's title instead of just returning the redirect; compare:
- "bit torent" -> https://en.wikipedia.org/w/index.php?title=Special:Search&search=bit+torent
- vs
- bittorent -> https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&fulltext=1&search=bittorent
- The "bit torent" is a hacky redirect workaround added by editors to give some results for something the search engine would likely give none. 197.218.81.182 (talk) 12:55, 30 January 2017 (UTC)
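The fallback rule discussed in this thread can be sketched in a few lines of Python. This is an illustration, not CirrusSearch code: "did you mean" wins when present; otherwise the first completion is used only if the query covers a large enough prefix of that title. The coverage measure (shared prefix length divided by title length) and the 0.5 threshold are hypothetical stand-ins for "a sufficiently large prefix"; the real completion suggester also matches fuzzily, which this check ignores.

```python
from __future__ import annotations

from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Lowercase and drop whitespace so "bit torant" compares like "bittorant"."""
    return "".join(text.lower().split())


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_suggestion(
    query: str,
    did_you_mean: Optional[str],
    completions: Sequence[str],
    min_coverage: float = 0.5,
) -> Optional[str]:
    """Prefer "did you mean"; otherwise fall back to the first completion,
    but only if the query covers a large enough prefix of that title."""
    if did_you_mean:
        return did_you_mean
    if not completions:
        return None
    candidate = completions[0]
    q, title = normalize(query), normalize(candidate)
    if not title:
        return None
    coverage = common_prefix_len(q, title) / len(title)
    return candidate if coverage >= min_coverage else None
```

With "bit torant" and no "did you mean" result, "BitTorrent" is accepted (6 of 10 characters shared), while a three-letter query such as "roo" is not allowed to jump to "Rooted graph".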
Can the REL1_27 branch of this extension work with ElasticSearch 2.x? ElasticSearch 1.7.x just entered EOL
Dear devs, could it work with ElasticSearch 2.4? If not, could you make it compatible? It doesn't seem to make sense to have to upgrade from a MediaWiki LTS (1.27) to 1.28 just for that... but I really dislike running an unmaintained branch of any software on my server.
Elastic 2.4.x will be maintained at least until 2018-02-28; still not enough time to upgrade to the next MW LTS (which will be released June 2018, I guess), but at least it's far off...
Thanks, Dror.
P.S. Elastic EOL policy and timeline: https://www.elastic.co/support/eol FFS Talk 18:07, 1 February 2017 (UTC)
- https://phabricator.wikimedia.org/T146636 doesn't look too promising :-(
- As a sidenote - using the 1.28 branch on 1.27 is not an option. I tried, it depends on features added in 1.28 and completely breaks when trying to use it with 1.27. Cboltz (talk) 20:26, 3 March 2017 (UTC)
- Darn it. I was hoping it was something trivial. Thanks for the update, @Cboltz :-) FFS Talk 23:55, 3 March 2017 (UTC)
A friendly reminder for those who just upgraded MW to 1.28: an elasticsearch.yml setting change is necessary
With MW 1.28, CirrusSearch needs Elasticsearch 2.x, and starting with 2.x, Elasticsearch binds to 127.0.0.1 instead of 0.0.0.0 as in 1.x. If you don't set network.host:, searches may fail (for example, when Elasticsearch and MediaWiki run on different hosts). Deletedaccount4567435 (talk) 18:49, 1 February 2017 (UTC)
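The fix itself is one line in elasticsearch.yml, for example "network.host: 0.0.0.0" or, more safely, the address of a specific interface. As a quick sanity check, here is a hypothetical Python helper (not part of CirrusSearch or Elasticsearch) that reads a flat elasticsearch.yml and reports the bind address, falling back to the 2.x loopback default described above.

```python
from __future__ import annotations

LOOPBACK_ONLY = {"127.0.0.1", "localhost", "::1", "_local_"}


def effective_network_host(yml_text: str, default: str = "127.0.0.1") -> str:
    """Return network.host from flat elasticsearch.yml text, else the default.

    Only the flat "network.host: value" form is recognised; the nested YAML
    form (a "network:" block with an indented "host:") is not parsed here.
    The default mirrors the 2.x behaviour of binding to loopback only.
    """
    for raw in yml_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if line.startswith("network.host:"):
            value = line[len("network.host:"):].strip().strip("'\"")
            if value:
                return value
    return default


def reachable_from_other_hosts(host: str) -> bool:
    """False when Elasticsearch would only accept local connections."""
    return host not in LOOPBACK_ONLY
```

A config without the setting yields "127.0.0.1", which is only a problem when MediaWiki connects from another machine.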
Suggestion: Expose all useful file metadata to the search engine
While there have been great advancements in exposing image metadata (e.g. filewidth:, filetype:), some very useful metadata is still missing.
For example, one can't search for:
Video or audio of a certain "playtime"
{
"name": "playtime_seconds",
"value": 113.72532879819
},
Frame count, looped images, duration
{
"name": "frameCount",
"value": 16
},
{
"name": "looped",
"value": true
},
{
"name": "duration",
"value": 15
}
Frame rate, and creation date
"bandwidth": 204608, "framerate": 15
In some cases the location of the image may be very relevant; if it is stored in the image's EXIF data, it is also available in the metadata of some files.
The use cases are numerous. For instance, when writing an article about World War I, one may want to filter images from that period. When looking for videos to add to a page, one may want short animations that showcase the concept (e.g. a moving hurricane) rather than very long videos. The same applies to animated images: in some cases they illustrate a concept better than static ones and in others they don't, so it would be good to be able to filter for or against them.
Generally it might be good to evaluate what the API exposes, and to surface the most useful metadata. 197.218.80.182 (talk) 13:53, 2 February 2017 (UTC)
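The metadata fragments above arrive as a list of name/value pairs, so before they could be indexed they would need flattening into fields. A minimal Python sketch, assuming that name/value shape; the field selection and the "short looped animation" filter (with duration assumed to be in seconds) are hypothetical illustrations of the use cases, not CirrusSearch behaviour.

```python
from __future__ import annotations

from typing import Any, Iterable

# Hypothetical selection of fields worth indexing, taken from the examples above.
WANTED_FIELDS = ("playtime_seconds", "frameCount", "looped", "duration")


def flatten_metadata(entries: Iterable[dict], wanted: Iterable[str] = WANTED_FIELDS) -> dict[str, Any]:
    """Turn [{"name": ..., "value": ...}, ...] into {name: value} for wanted names."""
    keep = set(wanted)
    return {e["name"]: e.get("value") for e in entries if e.get("name") in keep}


def is_short_looped_animation(doc: dict[str, Any], max_seconds: float = 20.0) -> bool:
    """Example filter: looped files whose duration (assumed seconds) is short."""
    length = doc.get("duration", doc.get("playtime_seconds"))
    return bool(doc.get("looped")) and length is not None and length <= max_seconds
```

With the GIF values shown above (16 frames, looped, duration 15) the filter matches, while the 113-second audio/video example does not.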
- It might also be a good idea to expose metadata from the CommonsMetadata API, which includes interesting data such as:
- GPSLatitude - latitude
- GPSLongitude - longitude
- LicenseShortName - short human-readable license name
- LicenseUrl
- DateTimeOriginal
- Extension:CommonsMetadata 197.218.80.182 (talk) 14:41, 2 February 2017 (UTC)
- See Special:ApiSandbox 197.218.80.182 (talk) 14:41, 2 February 2017 (UTC)
- I created a task for your specific request. This might overlap with the Structured data on Commons project, but it might not! Hopefully we can get some of the engineers/managers to take a look at it.
- (Bonus points for using a Disasterpiece clip in your example. I'm a big fan). :) CKoerner (WMF) (talk) 17:00, 2 February 2017 (UTC)
- We can certainly expand the amount of metadata included, but limitations of the supporting server we use (Elasticsearch) prevent us from including all the arbitrary metadata that is possible. Thanks for bringing up a few specific pieces of metadata that are useful. Anyone wishing to expand the list of explicit metadata to be included on the ticket is welcome to do so. EBernhardson (WMF) (talk) 17:10, 2 February 2017 (UTC)
- > limitations of the supporting server we use (Elasticsearch) prevent us from including all the arbitrary metadata that is possible
- Sure, that's why it says "all useful". Adding the whole dump of metadata isn't useful, since regular readers wouldn't need some of it even if it were exposed, and some of it would mostly benefit editors.
- To give a bit more supporting evidence, there are already tools that have been waiting for it for years; see:
- https://phabricator.wikimedia.org/T51662
- You can gauge its usefulness by evaluating these sites:
| Search parameter | flickr | Pixabay | Youtube | IA* | Cirrussearch | |
|---|---|---|---|---|---|---|
| Resolution | Yes | Yes | Yes | No | No | Yes |
| Category / tag | Yes | Yes | No | Yes | Yes | Yes |
| License | Yes | N / A | Yes | No | Yes | No |
| Location | No | No | No | No | Yes | No |
| Date taken (created) | Yes | No | No | No | Yes | No |
| Upload date | Yes | No | Yes | Yes | Yes | No |
| Color | Yes | Yes | Yes | No | No | No |
| Author | No | No | No | No | Yes | No |
| Uploader | No | No | No | No | Yes | No |
| File type | Yes | Yes | Yes | Yes | Yes | Yes |
| Orientation | Yes | Yes | No | No | No | No |
| Duration | No | No | No | Yes | Yes | No |
| Sort by upload date | Yes | No | No | Yes | Yes | No |
- IA* = Internet archive
- Sample rate might be useful for editors, as requested by the VisualEditor developers. Clearly, despite having more raw metadata, wikis are behind the other popular sites, and license in particular is very important yet missing from search.
- The Internet Archive is clearly the winner hands down, despite probably having fewer resources than some of the other entities in that list. Its tooling is something to strive for. 197.218.81.64 (talk) 10:45, 3 February 2017 (UTC)
- This is hands-down the most well-formatted feature request I've seen. You even did a comparison between other sites and formatted that in a table. Bravo and thank you for taking the time to do so. CKoerner (WMF) (talk) 15:23, 3 February 2017 (UTC)
Required ElasticSearch version?
Extension:CirrusSearch states that "Elasticsearch 2.x is required" rather than 1.7. Does this mean version 2.x or higher, or exactly 2.x? That is, can Elasticsearch 5.x be used with CirrusSearch? Thanks. Maiden taiwan (talk) 15:57, 2 February 2017 (UTC)
- > That is, can Elasticsearch 5.x be used with CirrusSearch?
- Not yet: https://phabricator.wikimedia.org/T154501 197.218.80.182 (talk) 16:02, 2 February 2017 (UTC)
- Thank you! Maiden taiwan (talk) 16:19, 2 February 2017 (UTC)
Suggestion: Show media license in search results
Problems:
- As a reader I can't easily find out which license the files have
- As a media reuser, I can't see licenses in search results, so I can't easily find reusable files
- As an editor, I can't easily track down files without a license in order to delete them or add licenses
Background: Search results include word count, size, and upload date. A license isn't really needed for content pages, because most pages have the same license; media licenses, however, are important for reusers.
Proposed solution: In each search result, add the license (maybe as an icon) alongside the upload date, fetched from the CommonsMetadata API.
Note: this is not the same as the previous metadata related suggestion. This suggestion merely proposes the addition of a license label next to each file result. 197.218.81.64 (talk) 10:58, 3 February 2017 (UTC)
- Hi - I've added your suggestion to Phabricator for further investigation by the Search team. Thanks for writing this up! DTankersley (WMF) (talk) 07:16, 10 February 2017 (UTC)
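To illustrate where such a label could come from, here is a minimal Python sketch that reads LicenseShortName from one prop=imageinfo&iiprop=extmetadata result. It assumes the CommonsMetadata field shape {"value": ..., "source": ..., "hidden": ...}; the helper names are hypothetical. The same check also covers the editor use case of finding files without a license.

```python
from __future__ import annotations

from typing import Optional


def license_label(imageinfo: dict) -> Optional[str]:
    """Short license name from one imageinfo entry, or None if absent.

    Assumes the entry was requested with iiprop=extmetadata, where each
    field is shaped like {"value": ..., "source": ..., "hidden": ...}.
    """
    field = imageinfo.get("extmetadata", {}).get("LicenseShortName")
    if not field:
        return None
    value = str(field.get("value", "")).strip()
    return value or None


def files_without_license(results: dict[str, dict]) -> list[str]:
    """Titles whose imageinfo carries no license name (the editor use case)."""
    return sorted(title for title, info in results.items() if license_label(info) is None)
```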
Searching with partial word/ngram matching
Does CirrusSearch currently support partial word/ngram matching? For example, the page name I am looking for is PageName2017. When searching, I would like 'ame20' to match this page as a suggestion. I dug into the AnalysisConfigBuilder.php file and it does not appear to support ngram tokenizing. I'm looking for an analyzer setting like this:
https://keyholesoftware.com/2015/11/02/anatomy-of-setting-up-an-elasticsearch-n-gram-word-analyzer/
If I just updated the analyzer settings in AnalysisConfigBuilder.php to include an ngram tokenizer and perhaps also a corresponding setting in MappingConfigBuilder.php, will partial word matching work in the search bar? Longphile (talk) 15:29, 4 February 2017 (UTC)
- We currently only use ngram tokenizing for the insource regex search; everything else is tokenized either by words or not at all (keywords). The regex search uses the custom trigram analyzer and is implemented by the SourceTextIndexField class. Something similar could be done to give a title field a trigram index. The search query would also have to be adjusted to query this trigram field and weight it appropriately.
- You mention the search bar and title suggestions. The title suggestions (in the top right corner of the Vector skin) are provided by the completion suggester, if enabled. This is different from standard search and uses its own special index. You would probably need to make sure this is disabled if you want to provide filtering/scoring based on title trigrams. EBernhardson (WMF) (talk) 18:13, 6 February 2017 (UTC)
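To see why a trigram field would let 'ame20' find PageName2017, here is a small, self-contained Python illustration of trigram matching (plain Python, not Elasticsearch configuration; the function names are hypothetical). A trigram index narrows candidates to titles containing every trigram of the query. A final check then confirms the substring, because all trigrams can match without the query appearing contiguously.

```python
from __future__ import annotations


def trigrams(text: str) -> set[str]:
    """All overlapping 3-character substrings of the lowercased text."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_candidate(query: str, title: str) -> bool:
    """True if the title contains every trigram of the query (index lookup step)."""
    q = trigrams(query)
    return bool(q) and q <= trigrams(title)


def partial_match(query: str, title: str) -> bool:
    """Candidate filter followed by an exact substring check."""
    return trigram_candidate(query, title) and query.lower() in title.lower()
```

Queries shorter than three characters produce no trigrams, so this sketch never matches them; a real setup would need a fallback for those.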
Suggestion: Provide more intuitive messages when search fails or succeeds
Problem: Some searches don't return specific, intuitive messages when they fail or succeed
Background: For both simple and advanced searches, the error message is generic, making the user believe that the words may not exist at all when they are just typos
Examples:
| Example search | Expected | Actual |
|---|---|---|
| yrannosaurus | Did you mean "tyrannosaurus" | The page X does not exist |
| banana filetype:multimedia | "No files of type 'multimedia' were found." | There were no results matching the query. |
| monkey filetype:3d | Filetype X is not currently supported, see help for valid types (or try these Y, Z, ...) | There were no results matching the query. |
| ninja filesize:"30 MB" | Only integer values (bytes) are supported. | Randomly sized file results |
| monkey filemime:3d | No files with filemime '3d' exist, see X help for valid types | There were no results matching the query. |
| Tarzan hastemplate:\|jungle | Invalid character "\|" used in template name | There were no results matching the query. |
Presumably this would also happen for malformed regex.
Proposed solutions:
- Make sure all special keywords validate their input and provide appropriate help when it is invalid.
- When the search successfully finds results, show a message clearly indicating the filters applied, e.g. instead of "50 results", something like "50 results containing files < 50 KB".
Even when the keywords are used correctly, the user can't be sure whether the results actually match them. This is especially problematic for geo keywords, since the coordinates of the page or file aren't shown in search results, and the articles or files may be very close to, very far from, or completely unrelated to the GPS coordinates. 197.218.83.132 (talk) 15:21, 6 February 2017 (UTC)
- Just a small clarification: the suggestion above isn't meant to ignore the simple search string, e.g. "ninja". So a clearer message might be: no search results found for filetype 'XXX' containing string "ninja", or something similar. 197.218.83.132 (talk) 15:28, 6 February 2017 (UTC)
- Thanks for the suggestions - I've created a phabricator ticket to be prioritized. DTankersley (WMF) (talk) 00:51, 3 March 2017 (UTC)
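A minimal Python sketch of the kind of input validation proposed above. The accepted filetype list is a hypothetical stand-in (the real set is defined by MediaWiki and CirrusSearch), and accepting only plain integers for filesize is a simplification. The point is that each keyword rejects bad input with a specific message instead of silently returning no results, and that a successful search can state which filters were applied.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical list of accepted filetype values, for illustration only.
KNOWN_FILETYPES = {"bitmap", "drawing", "audio", "video", "multimedia", "office", "text", "archive"}


def validate_keyword(name: str, value: str) -> Optional[str]:
    """Return a helpful warning for an invalid keyword value, or None if OK."""
    if name == "filetype" and value.lower() not in KNOWN_FILETYPES:
        valid = ", ".join(sorted(KNOWN_FILETYPES))
        return f"Filetype '{value}' is not supported; valid types are: {valid}."
    if name == "filesize" and not value.isdigit():
        return f"filesize expects a whole number, not '{value}'."
    if name == "hastemplate" and "|" in value:
        return "Invalid character '|' in template name."
    return None


def describe_results(count: int, text: str, filters: dict[str, str]) -> str:
    """Success message that states which filters were applied."""
    applied = ", ".join(f"{k}:{v}" for k, v in filters.items())
    if applied:
        return f'{count} results for "{text}" with {applied}'
    return f'{count} results for "{text}"'
```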
query question
I'm running MediaWiki 1.28, Elasticsearch 2.4.4, and CirrusSearch 0.2 (c23ae6a), according to Special:Version.
Say we have a number of pages whose titles contain the words "intern", "internal", and "internet".
Is there a query that can find all of them?
If I try "intitle:intern", I get results with titles containing the word "intern", as expected. Next I tried "intitle:intern*" but that returned results containing the words "internal" and "internet" but not "intern". I was under the impression that the * wildcard matched zero or more letters but that does not seem to be the case.
Next I tried using a boolean OR to get all of them, but this query just returns zero results: "intitle:intern OR intitle:intern*" Clearly there's something I'm still missing. 2620:11E:1000:120:3EA9:F4FF:FE85:CA50 (talk) 21:05, 13 February 2017 (UTC)
- I'm surprised that intitle:intern* does not yield intern as a possible result; testing locally, it works as expected. Could you paste somewhere the output of the search page after appending &cirrusDumpQuery to the URL in your browser's address bar? It will dump the actual query sent to Elasticsearch.
- A possible cause is the limit we set on the number of terms we allow the wildcard to expand to. Internally, intern* is expanded to all the words in your index that start with intern. To avoid explosions, we limit this expansion to 1024 terms. In other words, if you have more than 1024 distinct words that start with intern, some of them may be missing from the end result.
- Concerning the use of OR: it's currently a limitation of Cirrus, it does not support combining search keywords inside a boolean expression. DCausse (WMF) (talk) 09:18, 14 February 2017 (UTC)
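The expansion limit described above can be modelled in a few lines. This is a toy Python sketch over a sorted term list, not Elasticsearch's implementation (which terms survive the cap there depends on the engine). It also shows that the prefix itself counts as a match, because the wildcard matches zero or more characters.

```python
from __future__ import annotations

import bisect


def expand_prefix(sorted_terms: list[str], prefix: str, max_expansions: int = 1024) -> list[str]:
    """Return up to max_expansions terms that start with prefix.

    Terms beyond the limit are silently dropped, which is how a capped
    expansion can lose matches.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    out: list[str] = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(out) >= max_expansions:
            break
        out.append(term)
    return out
```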
Suggestion: Provide image (or media) similarity search
Problems
- As a reader, I can't easily find very similar images to help understand a concept.
- As an editor I can't easily find duplicate images (e.g. rotated images, rescaled or grayscale versions) to delete, categorize or label.
- As an editor, I can't easily find potentially better but similar images to reuse in an article.
- As a third-party user of multimedia content, I can't find similar images to download and use in my projects.
- As an uploader, I can't easily determine if my images are transformed (e.g. rotated) duplicates of existing media before uploading.
Background
It is very cumbersome to verify whether images are very similar or are transformed (rotated, rescaled or grayscale) duplicates. Finding them would be useful in many cases. For example, if someone searches for "passion fruit" there will be many results, yet after finding a fruit that looks like the one needed, one can't simply click to find similar images. It also means that images that look identical to the naked eye (but differ slightly in their pixels) may be re-uploaded, because their SHA-1 hashes differ.
Proposed solution
Implement a perceptual hashing algorithm to index similar images, and provide the ability to find them from search results (and eventually, in the distant future, by using an image itself), similar to other search engines. There have been quite a few attempts using Elasticsearch that may be feasible for this extension:
- Hacking Elasticsearch for Image Retrieval - https://www.linkedin.com/pulse/hacking-elasticsearch-image-retrieval-ashwin-saval?articleId=8828975402389796690 - very detailed implementation
- Scalable reverse image search built on Kubernetes and Elasticsearch - https://github.com/pavlovai/match
- Content Based Image Retrieval Plugin for Elasticsearch - https://github.com/kiwionly/elasticsearch-image
Quite a few research papers on it too:
http://www.ijarcce.com/upload/2016/march-16/IJARCCE%20247.pdf
http://www.cse.unsw.edu.au/~weiw/files/SSDBM13-HmSearch-Final.pdf
This can also be used for other media, but this is harder:
http://www.phash.org/audioscoutm.pdf
http://www.phash.org/ 197.218.88.65 (talk) 12:19, 1 March 2017 (UTC)
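To make the idea concrete, here is a minimal Python sketch of average hashing (aHash), one of the simplest perceptual hashes; it is not taken from the linked projects. It assumes the image has already been converted to grayscale and shrunk to a small grid (commonly 8x8). Near-duplicates have hashes with a small Hamming distance; the threshold is a tuning choice.

```python
from __future__ import annotations

from typing import Sequence


def average_hash(pixels: Sequence[Sequence[float]]) -> int:
    """aHash of a small grayscale image given as rows of intensities.

    Each bit (row-major order) is 1 when the pixel is brighter than the mean.
    A full pipeline would first convert to grayscale and shrink to e.g. 8x8.
    """
    flat = [float(p) for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-duplicates."""
    return bin(a ^ b).count("1")
```

A uniformly darkened copy of an image produces the same hash, but a mirrored or rotated copy does not, so handling the "rotated duplicates" use case needs the more elaborate approaches linked above.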
Installing Composer on wiki with shared Linux hosting w/ cPanel
Sorry if I am asking this in the wrong place, but how do I install Composer and run commands like "composer install --no-dev" on shared Linux hosting with cPanel? Does it work with PuTTY? Also, are there other commands I need to run? 493Titanollante (talk) 01:31, 4 March 2017 (UTC)
Weird (?) use case - two domains, one DB. Can it be done with one index?
I have two domains pointing at the same DB: one is the regular site, the other is for kiosk stations and loads with a somewhat different configuration (different extensions, etc.). Is there a way to set up CirrusSearch so that I don't have to maintain two indices? Having two indices would also mean manually refreshing the kiosk one, because there will be no edits there (= no refresh). FFS Talk 15:30, 26 March 2017 (UTC)
- Depending on the version you use, you may be able to tell Cirrus to use a specific index by setting:
$wgCirrusSearchIndexBaseName = 'regularwiki';
- For extra safety, you may want to disable Cirrus updates on this wiki:
$wgDisableSearchUpdate = true;
- Note that this configuration is not really supported, and I cannot assure you that it will work properly. Cirrus has some code to handle interwiki search and may be confused into thinking that the results it gets from this index are interwiki results. DCausse (WMF) (talk) 08:02, 27 March 2017 (UTC)
Suggestion: Support searching for external links
Problem
As a reader, I want to find articles that contain a specific link (e.g. a news story, a hoax or an untrustworthy site) to verify its validity.
As an editor, I want to find articles that contain a specific link and some keyword, to eliminate spam, certain vandalism or hoaxes.
Background
CirrusSearch currently allows searching for internal links, but not for external links. This means that one has to use a page such as Special:LinkSearch, or a complicated regex with "insource" that may not always find the link, because links can be constructed by templates in hard-to-find ways, e.g. "{{{mainsite}}}.com/{{stringsub}}".
Proposed solution
A new search "keyword" or predicate that indexes external links, e.g.:
banana cures aids extlinksto:/*.hoaxysite.com/ (or, to exclude pages with such links, banana cures aids -extlinksto:/*.hoaxysite.com/) 197.218.80.203 (talk) 10:00, 30 March 2017 (UTC)
- LinkSearch also doesn't resolve internationalized domain names (https://phabricator.wikimedia.org/T130482), so while these should be equivalent, they aren't:
| Form 1 | Form 2 | Form 3 |
|---|---|---|
| xn--bcher-kva.ch | buecher.de | Bücher.de |
| this | http://www.sina.com.hk/news/article/20170330/5/45/49/各款時尚耳機送給愛音樂又好動的他-7164476.html |
- That makes things way worse. 197.218.80.203 (talk) 10:41, 30 March 2017 (UTC)
- This isn't impossible, but it would take a little time to get going. Basically we already have the external links in the search index, but they are not processed in a way that is useful for this type of search. If you had to guess what is the relative usefulness of searching tokenized full urls, vs say a suffix search on domains?
- By tokenized URLs I mean we would break up http://www.sina.com.hk/news/article/20170330/5/45/49/各款時尚耳機送給愛音樂又好動的他-7164476.html into [www, sina, com, hk, news, article, 20170330, 5, 45, 49, 各款, 時尚, 耳機, 送給, 愛音樂, 又, 好動, 的, 他, 7164476, html] and allow matching individual pieces of the URL. EBernhardson (WMF) (talk) 21:23, 30 March 2017 (UTC)
- Thank you for the suggestion. I've created a task to track the request.
- https://phabricator.wikimedia.org/T161863 CKoerner (WMF) (talk) 21:52, 30 March 2017 (UTC)
- > If you had to guess what is the relative usefulness of searching tokenized full urls, vs say a suffix search on domains?
- Tokens are likely to be far more sensible and useful, and they have use cases beyond simple validity checking or vandalism fighting; for example, academics could use them to find links to specific resources, or to domains registered to one country. Right now one has to figure out the exact syntax of LinkSearch, which isn't all that intuitive.
- However, it would really depend on the syntax implemented; it should support at least wildcards if regex is not feasible for performance or technical reasons.
- >Thank you for the suggestion. I've created a task to track the request.
- You're welcome. 197.218.90.120 (talk) 22:37, 30 March 2017 (UTC)
- One other concrete use case that I recently saw:
- https://en.wikipedia.org/w/index.php?title=Wikipedia%3AVillage_pump_%28technical%29&type=revision&diff=772883257&oldid=772883163 This would have been trivial, e.g.: "extlinkto:unesdoc.unesco.org*232555e.pdf" OR "extlinkto:unesdoc.unesco.org*244676e.pdf"
"Free-content attribution" insource:http://unesdoc.unesco.org/images/0023/002325/232555e.pdf or http://unesdoc.unesco.org/images/0024/002446/244676e.pdf
- Or using regex magic: it is possible to get those using insource, but it requires a lot of hoop jumping, and there is no guarantee that the link will not be in a template somewhere or be combined differently:
unesdoc\.unesco\.org.*(232555e|244676e)\.pdf
- The greatest benefit is that it would be possible to select specific namespaces. LinkSearch mixes everything together, presumably due to performance concerns. 197.218.90.120 (talk) 23:16, 30 March 2017 (UTC)
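The two options raised in this thread (tokenized full URLs versus a domain suffix search) can be sketched in a few lines of Python, together with host normalization for the internationalized domain name problem. This is an illustration under stated assumptions, not CirrusSearch code. It uses Python's built-in "idna" codec (IDNA 2003) to map hosts such as Bücher.de to their xn-- form, and it keeps runs of CJK text as single tokens, since splitting them into words as in the example above would need a language-aware analyzer.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit


def normalize_host(host: str) -> str:
    """ASCII (punycode) form of a host name, lowercased.

    Enough to show that Bücher.de and XN--BCHER-KVA.DE name the same domain.
    """
    return host.encode("idna").decode("ascii").lower()


def url_tokens(url: str) -> list[str]:
    """Split a URL into host labels and path/query pieces.

    Runs of CJK text stay as one token; word segmentation is out of scope.
    """
    parts = urlsplit(url)
    text = normalize_host(parts.hostname or "") + " " + parts.path + " " + parts.query
    return [t for t in re.split(r"[\W_]+", text) if t]


def host_has_suffix(host: str, suffix: str) -> bool:
    """Label-wise domain suffix match, e.g. www.sina.com.hk ends with com.hk."""
    h = normalize_host(host).split(".")
    s = normalize_host(suffix).split(".")
    return len(s) <= len(h) and h[-len(s):] == s
```

The suffix match compares whole labels, so "a.com.hk" does not match "www.sina.com.hk" even though both end in "com.hk".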
Suggestion: Show results with category information
I would like to display the category of the article in the search results every time.
At the moment the category only shows up if the category name is entered in the search query too.
Is there any way to change that?
@CKoerner (WMF), @EBernhardson (WMF) Lanthanis (talk) 12:07, 6 April 2017 (UTC)
- You might be interested in the explore similar interface changes we are testing. It's not exactly what you are asking for, and it doesn't show all categories (there are often too many for a nice looking display), but it might help. Some feedback on the feature would also be useful as we evaluate it. EBernhardson (WMF) (talk) 17:56, 26 April 2017 (UTC)
Suggestion: Show geographically nearby items on search results
Problem
As a user, when searching I can't easily find "geographically" relevant search results based on my location.
As an editor, I can't easily find pages related to nearby places (or subjects) that I can improve.
Background
One of the problems with search results is that they often lack context. A tourist in Germany may simply search for "museum" or "monument", expecting to see nearby museums or monuments along with the regular results. While trying "museums in germany" may yield some relevant articles, a similar query for "museum timbuktu" currently yields "Museum of Atlanta" as the second result.
CirrusSearch does in fact provide search keywords like "neartitle" and "nearcoord", and there is "Special:Nearby". However, these aren't intuitive: "Special:Nearby" can't be searched, the keywords don't group nearby results separately, and it isn't visually clear (e.g. using colors or icons) which results are near a location. This means that readers may become confused about which results are relevant and which are just random articles.
Proposed solution
- Use the user's approximate coordinates (based on the IP) to surface points of interest / relevant results.
- Add a sidebar "box" that contains these possible articles of interest, e.g. for a user in Germany searching for museum:
Search results
| Standard results | Nearby |
|---|---|
| Museum | 📌 Augsburg Puppet Theater museum |
| British Museum | 📌 Kunstmuseum Bayreuth |
197.218.89.243 (talk) 13:30, 10 April 2017 (UTC)
- We do have advanced query functionality that allows this kind of thing to work. What we don't have is any UI that exposes it. The best you could do is choose an article you know is close to you, for example by adding 'neartitle:Istanbul', as in the query: https://en.wikipedia.org/w/index.php?search=museum+neartitle%3AIstanbul
- Adding some sort of UI for this certainly sounds useful. EBernhardson (WMF) (talk) 20:35, 11 April 2017 (UTC)
- Indeed, the biggest issue is that it doesn't show any visual cue at all. For example, it is currently impossible to know whether the results are sorted by relevance or by proximity to the coordinate or title.
- It might be worth adding the coordinates below each search result, or at least the distance from the searched location, to give the reader some clue about the results. 197.218.82.68 (talk) 20:56, 11 April 2017 (UTC)
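Computing the distance suggested above is straightforward once a result's coordinates are known (assumed here to be available, e.g. from the page's geodata). A minimal Python sketch using the haversine great-circle formula with a mean Earth radius of 6371 km; this is a standard approximation, not necessarily what CirrusSearch uses internally, and label_result is a hypothetical helper.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def label_result(title: str, origin: tuple[float, float], coord: tuple[float, float]) -> str:
    """Hypothetical result label showing the distance from the searched location."""
    return f"{title} ({distance_km(*origin, *coord):.0f} km)"
```

One degree of latitude comes out at about 111 km, so a result one degree north of the searched point would be labelled "(111 km)".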
Issue: Location-based keywords (nearcoord, neartitle) don't allow whitespace
Problem
As a reader, I expect nearcoord or neartitle to allow whitespace (or to give a useful warning).
Steps to reproduce:
- Click the search box on english wikipedia
- Enter either 'nearcoord:37.77666667, -122.39' or 'neartitle:"100km , San Francisco"'
- Press enter
Expected
- Some results close to the relevant coordinates ; OR
- A warning saying "A warning has occurred while searching: The geo coordinates '37.77666667,' provided to 'nearcoord' could not be parsed because they contain whitespace".
Actual
- Unrelated search results
- A warning:
- "A warning has occurred while searching: The geo coordinates '37.77666667,' provided to 'nearcoord' could not be parsed."; OR
- "A warning has occurred while searching: The geo distance '100km ' provided to 'neartitle' could not be parsed."
The problem seems to be that whitespace isn't allowed anywhere in the nearcoord value, and for "neartitle" it doesn't seem to be allowed before the comma (","). 197.218.82.68 (talk) 21:05, 11 April 2017 (UTC)
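As the warnings above show, the unquoted space cuts the keyword value off (nearcoord only sees '37.77666667,'), so whitespace has to be handled both where the query is split into keywords and where the value is parsed. A minimal Python sketch of the second half: a hypothetical lenient parser for a "[distance,]lat,lon" value that trims whitespace around each part and says precisely what is wrong. The accepted units (km, m) are an assumption, and this is not CirrusSearch's parser.

```python
from __future__ import annotations

import re
from typing import Optional, Tuple

# Optional distance such as "100km" or "500m"; the unit set is an assumption.
_DISTANCE = re.compile(r"^(\d+(?:\.\d+)?)\s*(km|m)$")


def parse_nearcoord(value: str) -> Tuple[Optional[float], float, float]:
    """Parse "[distance,]lat,lon", tolerating whitespace around each part.

    Returns (distance_in_metres_or_None, lat, lon); raises ValueError
    with a message that says what is wrong.
    """
    parts = [p.strip() for p in value.split(",")]
    distance: Optional[float] = None
    if len(parts) == 3:
        m = _DISTANCE.match(parts[0])
        if not m:
            raise ValueError(f"Could not parse distance '{parts[0]}'")
        distance = float(m.group(1)) * (1000.0 if m.group(2) == "km" else 1.0)
        parts = parts[1:]
    if len(parts) != 2:
        raise ValueError(f"Expected 'lat,lon' or 'distance,lat,lon', got '{value}'")
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        raise ValueError(f"Coordinates '{value}' are not numbers") from None
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"Coordinates '{value}' are out of range")
    return distance, lat, lon
```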
Internal error [f40f800686e682b54ac8f46c] 2017-05-10 09:10:11: Fatal error of type "RuntimeException"
Hi everyone!
I have CirrusSearch 0.2 and Elastica 1.3.0.0 (see Special:Version), in addition to ElasticSearch 2.3.0, installed with this guide https://www.elastic.co/guide/en/elasticsearch/reference/2.3/_installation.html (except for the last command).
But when I search for something with the search tool, it shows me this error message:
<pre>Internal error: [f40f800686e682b54ac8f46c] 2017-05-10 09:10:11: Fatal error of type "RuntimeException"</pre>
Does anyone know how to fix it, or have a clue for me? :3 192.44.63.161 (talk) 09:20, 10 May 2017 (UTC)
- Did you build the index exactly as the README documents it?
- In LocalSettings.php, leave $wgSearchType = 'CirrusSearch'; commented out and set
$wgDisableSearchUpdate = true;
- run
sudo -u wwwuser php ./extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --conf LocalSettings.php
- set
$wgDisableSearchUpdate = false;
- run
sudo -u wwwuser php ./extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip --conf LocalSettings.php
- run
sudo -u wwwuser php ./extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse --conf LocalSettings.php
- set
$wgSearchType = 'CirrusSearch';
- I also had “Internal error” messages, but once I indexed as described in the README (including when re-indexing), the “Internal error” was gone. Andreas P. 14:58, 31 July 2017 (UTC)
Suggestion: Show brief data about search term in sidebar (info box)
Issue
As a user, I find that search results do not contain enough information to know whether the right data was found (e.g. a search for "purple" doesn't immediately show any data about the color), nor do they disambiguate properly: "orange" is a color, a fruit, and a company name.
Background
While the search results contain terms highlighted in snippets, this is a plain search that doesn't take into account the wealth of metadata that wikis have. For example, searching for "elephant" in a search engine such as Google returns an information side box containing some data from Wikipedia, unlike the default CirrusSearch. However, even powerful search engines don't show such data for common terms ("vine", "apple"). Indeed, a search for "apple" on Google gives results about a company (or other things on the first page), and not the fruit, which even by their standards is pretty unusual.
Proposed solutions
Show brief data about the search term when it exactly matches an existing article. This could be retrieved in one of many ways:
- Show subset of data from a structured data store (e.g. Wikidata)
- Only show results unrelated to living or dead people (in an initial version).
- Show data from an infobox within the wiki:
- Scrape data from the page HTML produced by templates that add the "infobox" class, and index it separately
- Clean up the data to remove unnecessary HTML
- When the search matches the article, show the normal search results and a subset of the infobox data
- Make use of TemplateData to obtain this information: TemplateData contains the description and type of each field. This makes it possible to show such data, gives editors an incentive to add it, and improves discoverability and reliability (as editors will probably notice errors and possibly fix them).
Example
| Elephant (picture) |
|---|
| Animal |
| Elephants are large mammals ... |
| Average mass : 6,000 kg |
Notes: While it may be controversial to show information from a repository such as Wikidata, there are undeniable facts about the world that aren't subject to change, e.g. orange is a color (from around 1000 AD to 2017 AD). Having infobox data as the default (or fallback) also provides an alternative way to address issues related to a centralized repository. 197.218.91.103 (talk) 10:56, 12 May 2017 (UTC)
- This would likely need the introduction of at least two new classes that editors can add to templates and that CirrusSearch can scrape, e.g.:
{| class="wikitable"
! colspan="2" |Elephant (picture)
|-
|class="infobox-key" |type
|class="infobox-value" |Animal
|-
|class="infobox-key" |description
|class="infobox-value" | Elephants are large mammals ...
|-
| class="infobox-key" | Average mass
| class="infobox-value"| 6,000 kg
|}
| Elephant (picture) | |
|---|---|
| type | Animal |
| description | Elephants are large mammals ... |
| Average mass | 6,000 kg |
- Alternatively, it could just scrape the first column as the key and the second as the value, although that will fail in some cases, which is where the classes could be added. 197.218.82.227 (talk) 14:52, 12 May 2017 (UTC)
Does CirrusSearch support analyzed search for Chinese?
For languages without spaces between words (e.g. Chinese, Japanese), using an analyzer to tokenize text into individual words is vital for efficiency and accuracy.
I installed the Elasticsearch plugin "elasticsearch-analysis-smartcn" and reindexed the pages. However, search still treats the text as a whole instead of as individual Chinese words.
For example, users who search for 苹果安卓 ("Apple Android") want all Apple- and Android-related articles, not results for the single long word "appleandroid".
Is analyzed search supported? If so, how should I configure it to improve search results for languages without spaces?
Thanks in advance for your help! Deletedaccount4567435 (talk) 23:48, 14 May 2017 (UTC)
Suggestion: Ability to search an arbitrary group of pages
Problem
As a user, I can't search within a specific group of pages not covered by categories or existing search keywords.
Background
While CirrusSearch is very powerful and can search using many filters (such as "incategory" or "hastemplate"), it lacks a way to search only a group of pages that don't otherwise have any direct relationship to each other. The use cases are numerous:
- Search within grouped results: a tourist may want to know about interesting locations in both Porto and Vigo (e.g. 'museums neartitle:"porto" neartitle:vigo' yields no results). Currently this can't be done without deep knowledge of the search syntax, or without running separate searches.
- Search within pages listed in maintenance reports: for example, to add links to all orphaned pages (Special:LonelyPages) related to Sydney.
- Embedded search - for example on Special:PagesWithBadges, one could collect listed pages and search within them.
Proposed solution
This could be implemented in many ways:
- A new search keyword for arbitrary pages, "inpages:", e.g. "inpages:spain|porto|russia|vigo| ...." and/or "inpageids:1|23|4|6". Page IDs would allow it to handle a larger number of pages.
- A dedicated keyword for maintenance reports: "inquerypage:Lonelypages"
- A separate API or URL parameter that takes these, e.g. https://www.mediawiki.org/w/index.php?search=Stargate&title=Special:Search&inpages=vigo%7Cparis
Notes:
May be relevant: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html. 197.218.89.166 (talk) 13:57, 27 May 2017 (UTC)
- Some more use cases:
- Search within Special:Watchlist items (e.g. to find vandalism, or a "keyword" that one forgot to clean up)
- Search within Reading/Reading Lists (https://phabricator.wikimedia.org/T164990) - for useful research items. 197.218.89.166 (talk) 14:28, 27 May 2017 (UTC)
Is it possible to limit CirrusSearch with redis instead of the C poolcounterd?
The server-side poolcounterd software is extremely poorly documented. It can be built with dpkg-buildpackage on some versions of Debian, but mostly it cannot.
Is it possible to limit CirrusSearch with Redis instead of the C poolcounterd?
Error messages:
Failing Scenarios:
cucumber features/simulation.feature:2 # Scenario: Just readers
cucumber features/simulation.feature:12 # Scenario: Just search with per user locks
cucumber features/simulation.feature:17 # Scenario: Search and readers
77 scenarios (3 failed, 74 passed)
575 steps (3 failed, 572 passed) Deletedaccount4567435 (talk) 04:07, 14 June 2017 (UTC)
- There is a PoolCounterRedis implementation. The documentation suggests the following as an example of configuring redis:
$wgPoolCounterConf = [
    'ArticleView' => [
        'class' => 'PoolCounterRedis',
        'timeout' => 15, // wait timeout in seconds
        'workers' => 1, // maximum number of active threads in each pool
        'maxqueue' => 5, // maximum number of total threads in each pool
        'servers' => [ '127.0.0.1' ],
        'redisConfig' => []
    ]
];
- The list of pool counters used by CirrusSearch:
- CirrusSearch-Search
- CirrusSearch-Prefix
- CirrusSearch-NamespaceLookup
- CirrusSearch-Completion
- CirrusSearch-Regex EBernhardson (WMF) (talk) 18:00, 6 July 2017 (UTC)
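Applying that example to the CirrusSearch pools listed above could look roughly like the following LocalSettings.php sketch. It assumes a Redis server on 127.0.0.1 with the default port, and as far as I know PoolCounterRedis also needs the phpredis PHP extension. The `workers` and `maxqueue` numbers and the `$cirrusPoolConf` variable name are arbitrary placeholders, not tested recommendations.

```php
// Sketch: send CirrusSearch's pool counters to Redis instead of poolcounterd.
// Assumes Redis on 127.0.0.1; the limits below are illustrative placeholders.
$cirrusPoolConf = [
	'class' => 'PoolCounterRedis',
	'timeout' => 15,  // wait timeout in seconds
	'workers' => 10,  // maximum number of active threads in each pool (placeholder)
	'maxqueue' => 50, // maximum number of total threads in each pool (placeholder)
	'servers' => [ '127.0.0.1' ],
	'redisConfig' => [],
];

// Register the same settings under every pool name CirrusSearch uses.
foreach ( [
	'CirrusSearch-Search',
	'CirrusSearch-Prefix',
	'CirrusSearch-NamespaceLookup',
	'CirrusSearch-Completion',
	'CirrusSearch-Regex',
] as $poolName ) {
	$wgPoolCounterConf[$poolName] = $cirrusPoolConf;
}
```

Sharing one settings array keeps the pools consistent; giving, say, CirrusSearch-Regex its own smaller limits would just mean assigning a different array to that key.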
Job queue issue
Since upgrading to 1.28.2, I very occasionally get problems with the job queue. I'm posting here because this always relates to CirrusSearch. Looking at the table entry, it seems that job_params is incomplete, and information that should be contained within the blob has overrun into job_timestamp and other fields. This happened most recently when editing MediaWiki:Common.css. It results in an error message on every page until I delete the relevant entry in the job table.
Any ideas? Prh47bridge (talk) 09:32, 19 June 2017 (UTC)
- Given the lack of response and the fact that the problem is clearly that the software is trying to store too much data in job_params, I have modified the table so that job_params is a longblob. However, I don't believe that is the real solution. Is this a defect that has been fixed? Or am I missing something? Prh47bridge (talk) 23:22, 22 June 2017 (UTC)
- It has not been fixed yet. For now, the workaround is to avoid the DB job queue. In advanced environments with Elasticsearch available, we also require Redis to be used for the job queue. EBernhardson (WMF) (talk) 17:54, 6 July 2017 (UTC)
- Thanks for this information. Prh47bridge (talk) 08:12, 24 July 2017 (UTC)
CirrusSearch failure after re-syncing an old database
I am upgrading our wiki platform. To capture updates since the initial install, I imported the backup from the old server and ran the maintenance scripts on the database and on the extensions.
However, I now get the following error. I've tried updating the indexes with the scripts in the CirrusSearch maintenance directory. The old wiki was version 1.20.2, and it seemed to work correctly after the initial update to 1.28. An index of page titles seems to be available (the search suggestions that come up DO work), but full-text search is not working at all.
Internal error [c3a5bc4bc788b6ce0a4ef27c] /index.php?search=middleware&title=Special%3ASearch&go=Go RuntimeException from line 113 of /var/www/html/mediawiki/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php: No search request was made
Backtrace: ...
#0 /var/www/html/mediawiki/extensions/CirrusSearch/includes/Hooks.php(807): CirrusSearch\ElasticsearchIntermediary::setResultPages(array)
#1 [internal function]: CirrusSearch\Hooks::onSpecialSearchResults(string, SqlSearchResultSet, SqlSearchResultSet)
#2 /var/www/html/mediawiki/includes/Hooks.php(195): call_user_func_array(string, array)
#3 /var/www/html/mediawiki/includes/specials/SpecialSearch.php(386): Hooks::run(string, array)
#4 /var/www/html/mediawiki/includes/specials/SpecialSearch.php(242): SpecialSearch->showResults(string)
#5 /var/www/html/mediawiki/includes/specials/SpecialSearch.php(152): SpecialSearch->goResult(string)
#6 /var/www/html/mediawiki/includes/specialpage/SpecialPage.php(522): SpecialSearch->execute(NULL)
#7 /var/www/html/mediawiki/includes/specialpage/SpecialPageFactory.php(577): SpecialPage->run(NULL)
#8 /var/www/html/mediawiki/includes/MediaWiki.php(283): SpecialPageFactory::executePath(Title, RequestContext)
#9 /var/www/html/mediawiki/includes/MediaWiki.php(851): MediaWiki->performRequest()
#10 /var/www/html/mediawiki/includes/MediaWiki.php(512): MediaWiki->main()
#11 /var/www/html/mediawiki/index.php(43): MediaWiki->run()
#12 {main} Chakaal (talk) 18:23, 23 June 2017 (UTC)
- This looks to be an inconsistency between the version of CirrusSearch and the version of MediaWiki core. Could you link to the exact versions being used? EBernhardson (WMF) (talk) 17:53, 6 July 2017 (UTC)
- Same thing happened to me AGAIN. I am using MW & Extension:CirrusSearch & Extension:Elastica, ALL from "git checkout origin/REL1_29".
- " RuntimeException from line 113 No search request was made"
- So I tried the tarball versions from "Special:ExtensionDistributor/CirrusSearch" for CirrusSearch and Elastica, which did not work for MW 1.28 last year.
- and NO
- " RuntimeException from line 113 No search request was made"
- Same error. Deletedaccount4567435 (talk) 04:23, 11 November 2017 (UTC)
Error running updateSearchIndexConfig.php
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
content index...
Fetching Elasticsearch version...2.4.5...ok
Scanning available plugins...none
Inferring index identifier...cbwiki-26jun2017_content_first
Picking analyzer...english
Validating number of shards...ok
Validating replica range...ok
Validating shard allocation settings...done
Validating max shards per node...ok
Validating analyzers...ok
Validating mappings...
Validating mapping...different...corrected
Validating cache warmers...
Updating Main Page...done
Validating aliases...
Validating cbwiki-26jun2017_content alias...ok
Validating cbwiki-26jun2017 alias...ok
Updating tracking indexes...
Unexpected Elasticsearch failure.
Elasticsearch failed in an unexpected way. This is always a bug in CirrusSearch.
Error type: Elastica\Exception\Bulk\ResponseException
Message: unknown: Error in one or more bulk request actions:
index: /mw_cirrus_metastore/version/cbwiki-26jun2017_content caused [mw_cirrus_metastore][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [mw_cirrus_metastore] containing [1] requests]
index: /mw_cirrus_metastore/version/cbwiki-26jun2017_general caused [mw_cirrus_metastore][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [mw_cirrus_metastore] containing [1] requests]
Trace:
#0 /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Bulk.php(360): Elastica\Bulk->_processResponse(Object(Elastica\Response))
#1 /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Client.php(320): Elastica\Bulk->send()
#2 /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Index.php(140): Elastica\Client->addDocuments(Array)
#3 /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Type.php(199): Elastica\Index->addDocuments(Array)
#4 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/metastore.php(148): Elastica\Type->addDocuments(Array)
#5 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/metastore.php(91): CirrusSearch\Maintenance\Metastore->updateIndexVersion('cbwiki-26jun201...')
#6 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(299): CirrusSearch\Maintenance\Metastore->execute()
#7 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(267): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->updateVersions()
#8 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(58): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->execute()
#9 /var/www/html/mediawiki/maintenance/doMaintenance.php(111): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#10 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(65): require_once('/var/www/html/m...')
#11 {main}
The message says "this is always a bug in CirrusSearch", so I'm posting it here.
My setup:
| Product | Version |
|---|---|
| MediaWiki | 1.28.2 |
| PHP | 5.5.9-1ubuntu4.21 (apache2handler) |
| MySQL | 5.5.55-0ubuntu0.14.04.1 |
| Elasticsearch | 2.4.5 |
| CirrusSearch | 0.2 (c23ae6a) 16:21, October 25, 2016 |
| Elastica | 1.3.0.0 (0959e38) 19:02, October 24, 2016 |
Keeping files out of default search
With MediaWiki 1.28, images (from the File namespace) show up in search results by default. According to the CirrusSearch documentation, prefacing a query with a colon causes only the main namespace to appear in the results.
Is there a way to do this type of search by default, without requiring end users to type the colon? Benhinc (talk) 17:04, 28 June 2017 (UTC)
- I believe this is configured via Manual:$wgNamespacesToBeSearchedDefault. The odd thing though is that the default value for this is to only search NS_MAIN, and not files. EBernhardson (WMF) (talk) 17:51, 6 July 2017 (UTC)
- I found this too, and you're absolutely right: the default, which is NS_MAIN, also includes files. What I need is a $wgNamespacesToNotBeSearchedDefault. This is frustrating. Benhinc (talk) 15:41, 10 July 2017 (UTC)
- Hello,
- Are you running CirrusSearch with MediaWiki 1.28? Could you please share which version of Elasticsearch you are running? I have tried 5.5 and 2.4.5, and neither of them seems to work.
- Thank you a million,
- Karel Karelzav (talk) 09:45, 20 July 2017 (UTC)
- No problem, here's my current setup:
| Product | Version |
|---|---|
| MediaWiki | 1.28.2 |
| PHP | 5.5.9-1ubuntu4.21 (apache2handler) |
| MySQL | 5.5.55-0ubuntu0.14.04.1 |
| Elasticsearch | 2.4.5 |
Benhinc (talk) 14:16, 7 August 2017 (UTC)
Suggestion: Allow user to define max number of Interwiki search results
I have a help desk environment where we have two MediaWiki instances (one private and one public; MediaWiki 1.28.2, CirrusSearch 0.2, Elastica 1.3.0.0, Elasticsearch 1.7.6, and Interwiki 3.1). We have Interwiki set up on the private instance to also search the public instance, so that my help desk can search in one place and not have to search both.
When my users search from the private instance they get the local results in the main body and a sidebar with the remote results (as expected)... but only the top [up to] 5 results from the remote instance show up. It appears as if the results are limited by the const MAX_RESULTS = 5; on line 32 of .../includes/InterwikiSearcher.php. Would it be possible to change that to be a user-defined variable (for example, $wgCirrusSearchInterwikiMaxResults) instead of a constant? Josh Simon (talk) 17:37, 7 July 2017 (UTC)
- Yes, it will be available in the upcoming MediaWiki 1.30 release: `$wgCirrusSearchNumCrossProjectSearchResults` will allow you to set the number of results. DCausse (WMF) (talk) 08:25, 10 July 2017 (UTC)
- Thanks; that's exactly what we need! Josh Simon (talk) 14:32, 10 July 2017 (UTC)
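Once on a release that has the setting (1.30 or later, per the reply above; it is not available in 1.28), raising the sidebar limit should presumably be a one-line LocalSettings.php change. The value 10 below is an arbitrary example, not a recommended default.

```php
// Requires CirrusSearch from MediaWiki 1.30 or later (per the reply above).
// Number of cross-project (interwiki) results to show; 10 is only an example value.
$wgCirrusSearchNumCrossProjectSearchResults = 10;
```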
Why do I get an error message on Special:Version after adding (require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";) to LocalSettings.php?
Hello, I've tried to install CirrusSearch, but I got error messages.
I've installed Java and the proper versions of MediaWiki (1.28) and Elasticsearch (2.4.5).
I also installed Elastica, which I can see on the Special:Version page.
But after adding (require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";) to LocalSettings.php, I get an error message on the Special:Version page.
How can I deal with this?
Thanks,
Please excuse my limited English; I'm Korean. 111.91.137.34 (talk) 05:55, 10 July 2017 (UTC)
- Oops, sorry: I forgot to unpack the cirrus.tar file.
- Have a nice day. 111.91.137.34 (talk) 07:34, 10 July 2017 (UTC)
- Ah, I also changed the Elasticsearch version from 2.4.5 to 2.3.3. 111.91.137.34 (talk) 07:35, 10 July 2017 (UTC)
Search for MediaWiki 1.27
Versions installed:
MediaWiki 1.27.3
PHP 5.6.30 with cURL
MSSQL 10.5.2550
Elasticsearch 2.3.3
CirrusSearch 0.2
We are trying to install CirrusSearch with the above versions. On the CirrusSearch page, the dependencies only list 1.28 and 1.29, but we need to run the 1.27 LTS version. Where can I find instructions that cover that version, if it is supported?
Following the instructions as best I can for the versions listed, I receive the following error message when running the updateSearchIndexConfig.php script:
Content index…
Fetching Elasticsearch version… 2.3.3… Not supported!
Only Elasticsearch 1.x is supported. Your version is 2.3.3
I have set the following variables in LocalSettings.php:
require_once( "$IP/extensions/Elastica/Elastica.php" );
require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );
$wgDisableSearchUpdate = true;
$wgCirrusSearchServers = array( '127.0.0.1' );
The local server is where the Elasticsearch service is running.
Can anyone point me in the right direction, please? Jhuff05 (talk) 17:38, 11 July 2017 (UTC)
- Have you tried installing elasticsearch 1.7.6 instead of 2.3.3? I think that is the easiest way to get cirrus working with 1.27.3. DCausse (WMF) (talk) 08:27, 12 July 2017 (UTC)
- Yes, I have now installed 1.7.3. It appears to be working (although indexing seems to run very slowly). However, we need to stay on supported software, and version 1.x reached end of life in January 2017. Support is the same reason we need to run 1.27 of the core wiki code... Jhuff05 (talk) 14:13, 13 July 2017 (UTC)
- Yes, it's sad, but MediaWiki cannot ensure that none of its dependencies (especially an extension's dependencies) are EOL... DCausse (WMF) (talk) 14:50, 13 July 2017 (UTC)
Disable expanding templates for search
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Currently the system expands (includes) the contents of templates by default. Is there a way to disable this? Or, preferably, to disable it for specific templates? We have various templates that we use for navigation purposes, so it would be great if we could disable this for those templates. Facerafter (talk) 12:00, 12 July 2017 (UTC)
- If you want to exclude content, like say in a nav template, from appearing in the search index you can add a CSS class to that element. See Help:CirrusSearch#Exclude content from the search index. Hope that helps! CKoerner (WMF) (talk) 13:59, 12 July 2017 (UTC)
Can several wikis use one elasticsearch service?
Is it possible to use one Elasticsearch service for a wiki family, the same way the wikis can share one database with prefixed table names? Pastakhov (talk) 06:07, 13 July 2017 (UTC)
- Absolutely: as long as you use the same MediaWiki version for all your wikis, a single Elasticsearch cluster can be shared by multiple wikis. DCausse (WMF) (talk) 07:43, 13 July 2017 (UTC)
Incorrect Elasticsearch version in README
The README for the REL1_29 branch says that Elasticsearch 2.x will work, and yet this page says 5.x+ is required. Which is correct? MacFan4000 (talk) 12:10, 17 July 2017 (UTC)
- The page is correct, it's a problem in the README file. DCausse (WMF) (talk) 14:46, 17 July 2017 (UTC)
- The README was partially updated, that is correct, but if you scroll down there is a lot of old material that is obsolete or outdated... It would be much appreciated if someone knowledgeable could update the whole file.
- e.g. the section about "Production suggestions":
- ----------------------
- Elasticsearch
- All the general rules for making Elasticsearch production ready apply here. So you don't have to go
- round them up below is a list. Some of these steps are obvious, others will take some research.
- ** NOTE: this list was written for 0.90 so it may not work well for 1.0. It'll be revised when I have
- more experience with 1.0.
- ...
- ... SmartK (talk) 14:10, 16 August 2017 (UTC)
Use ElasticSearch with MediaWiki v. 1.28
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Hello,
Is anyone running CirrusSearch/Elasticsearch with MediaWiki 1.28? Could you please share which version of Elasticsearch you are running? I have tried Elasticsearch 5.5 and 2.4.5, and neither of them seems to work.
When I search, I receive the following error:
No search request was made
Backtrace:
#0 /var/www/wiki/extensions/CirrusSearch/includes/Hooks.php(807): CirrusSearch\ElasticsearchIntermediary::setResultPages(array)
...
I can share the full backtrace if you want, but I read in this discussion that this error pops up when using the wrong version of Elasticsearch.
Thank you a million,
Karel Karelzav (talk) 09:50, 20 July 2017 (UTC)
- The Elasticsearch version is checked by the maintenance scripts, so in general you should catch version issues earlier, when running updateSearchIndexConfig.php or forceSearchIndex.php. Have you run these scripts? DCausse (WMF) (talk) 11:05, 20 July 2017 (UTC)
- No, I did not run these scripts, it was not mentioned anywhere in the installation manual that I should. I will give it a try. Thank you for your help! Karelzav (talk) 11:23, 20 July 2017 (UTC)
- OK, so I ran both scripts:
- php updateSearchIndexConfig.php
- PHP Notice: Undefined index: REQUEST_URI in /var/www/wiki/LocalSettings.php on line 3
- content index...
- Fetching Elasticsearch version...2.4.5...ok
- Scanning available plugins...none
- Inferring index identifier...wiki_content_first
- Picking analyzer...english
- Creating index...ok
- Validating number of shards...ok
- Validating replica range...ok
- Validating shard allocation settings...done
- Validating max shards per node...ok
- Validating analyzers...ok
- Validating mappings...
- Validating mapping...different...corrected
- Validating cache warmers...
- Updating Main Page...done
- Validating aliases...
- Validating wiki_content alias...alias is free...corrected
- Validating wiki alias...alias not already assigned to this index...corrected
- mw_cirrus_metastore missing creating.
- Creating metastore index... mw_cirrus_metastore_first ok
- Index is red retrying...
- Green!
- Creating mw_cirrus_metastore alias to mw_cirrus_metastore_first.
- Updating tracking indexes...done
- general index...
- Fetching Elasticsearch version...2.4.5...ok
- Scanning available plugins...none
- Inferring index identifier...wiki_general_first
- Picking analyzer...english
- Creating index...ok
- Validating number of shards...ok
- Validating replica range...ok
- Validating shard allocation settings...done
- Validating max shards per node...ok
- Validating analyzers...ok
- Validating mappings...
- Validating mapping...different...corrected
- Validating cache warmers...
- Validating aliases...
- Validating wiki_general alias...alias is free...corrected
- Validating wiki alias...alias not already assigned to this index...corrected
- Updating tracking indexes...done
- Deleting namespaces...done
- Indexing namespaces...done
- php forceSearchIndex.php
- PHP Notice: Undefined index: REQUEST_URI in /var/www/wiki/LocalSettings.php on line 3
- [ wiki] Indexed 10 pages ending at 10 at 8/second
- [ wiki] Indexed 8 pages ending at 21 at 9/second
- [ wiki] Indexed 10 pages ending at 33 at 6/second
- [ wiki] Indexed 10 pages ending at 43 at 6/second
- [ wiki] Indexed 10 pages ending at 53 at 6/second
- [ wiki] Indexed 10 pages ending at 63 at 5/second
- [ wiki] Indexed 10 pages ending at 83 at 6/second
- [ wiki] Indexed 9 pages ending at 95 at 7/second
- [ wiki] Indexed 10 pages ending at 106 at 7/second
- [ wiki] Indexed 9 pages ending at 120 at 8/second
- [ wiki] Indexed 7 pages ending at 128 at 8/second
- Indexed a total of 103 pages at 8/second Karelzav (talk) 11:28, 20 July 2017 (UTC)
- If I recall correctly the README file should mention that these scripts are needed after installing.
- The maintenance scripts seem to have run properly, did that fix your issue or do you still get errors when searching?
- If so you should probably check the elasticsearch logs for any errors and post them here (located in /var/log/elasticsearch/) DCausse (WMF) (talk) 11:58, 20 July 2017 (UTC)
- There is nothing at /var/log/elasticsearch
- I found some logs in /usr/local/bin/elasticsearch-2.4.5 (that is where I installed elasticsearch):
- elasticsearch_deprecation.log, elasticsearch_index_indexing_slowlog.log and elasticsearch_index_search_slowlog.log are empty files
- In elasticsearch.log I can see log of what was shown in terminal after starting ElasticSearch, but no error messages.
- I actually think that ElasticSearch is working fine, it seems like it is MW extension or MW itself that is failing. Karelzav (talk) 12:16, 20 July 2017 (UTC)
- OK, silly me. The problem was that I had forgotten to add the line $wgSearchType = 'CirrusSearch'; to LocalSettings.php.
- Thank you very much for your help! It is very appreciated. Karelzav (talk) 14:24, 20 July 2017 (UTC)
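For reference, here is a minimal LocalSettings.php sketch combining the pieces mentioned in this thread for MediaWiki 1.27/1.28: loading the extensions, the server list, and the search type line that was missing. It assumes Elasticsearch runs on the same host, and the search type line should only be enabled after the index has been built with the maintenance scripts, as the README describes.

```php
// Load Elastica first, then CirrusSearch (entry points as used in this thread).
require_once "$IP/extensions/Elastica/Elastica.php";
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";

// Elasticsearch node(s) to talk to; assumes a local node.
$wgCirrusSearchServers = [ '127.0.0.1' ];

// Route Special:Search through CirrusSearch. Without this line the wiki keeps its
// default database search backend (note the SqlSearchResultSet in the backtraces
// above), even though the Elasticsearch index has been built.
$wgSearchType = 'CirrusSearch';
```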
cirrusSearchElasticaWrite jobs blocking runJobs
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- I have been getting the following errors:
Notice: unserialize(): Error at offset 62601 of 65535 bytes in /path/to/wiki//includes/jobqueue/JobQueueDB.php on line 803
Fatal error: Unsupported operand types in /path/to/wiki/extensions/CirrusSearch/includes/Job/ElasticaWrite.php on line 44
- The error blocks the execution of all jobs. Looking into the database, the <prefix>_job table shows the jobs which are somehow related to this error. For all these jobs, the job_cmd is cirrusSearchElasticaWrite.
- I have tried all relevant combinations of versions, i.e. Elastica REL1_28, REL1_29, master and CirrusSearch REL1_27, REL1_28, REL1_29 on MediaWiki core versions REL1_27, REL1_28, REL1_29.
- I'm not completely sure, but these failed jobs might have started when I had a production wiki and a backup wiki (running on a database with a different name, but a duplicate of the production wiki's database) connected to Elasticsearch (on an external server). On trying to run
php updateSearchIndexConfig.php --startOver
on the backup wiki, I received the error:
Looks like the index has more than one identifier. You should delete all but the one of them currently active. Here is the list...
- I think it was after deleting the additional identifiers that I started receiving the errors for cirrusSearchElasticaWrite jobs. Could this be the actual reason?
- Manually deleting the cirrusSearchElasticaWrite jobs from the database does clear the job block, so I would like to know whether this deletion is going to cause any significant bug, or whether it is safe to simply delete them. AhmadF.Cheema (talk) 07:41, 24 July 2017 (UTC)
- cirrusSearchElasticaWrite contains all the failed jobs; they will be retried 3 times.
- It's perfectly fine to delete the jobs; you can rebuild your index from scratch using the maintenance scripts (updateSearchIndexConfig and forceSearchIndex).
- I'd suggest running REL1_29 with elastic 5.3.2. DCausse (WMF) (talk) 08:12, 24 July 2017 (UTC)
PHP CLI stopped working
MediaWiki 1.28.2
PHP 7.1.7 (cgi-fcgi)
MySQL 5.7.19-log
Elasticsearch 2.4.5
Running updateSearchIndexConfig.php:
Fetching Elasticsearch version...2.4.5...ok
Scanning available plugins...none
Inferring index identifier...wikidb_content_first
Picking analyzer...german
Validating number of shards...ok
Validating replica range...ok
Validating shard allocation settings...done
Validating max shards per node...ok
Validating analyzers...ok
Validating mappings...
Validating mapping...different...corrected
Validating cache warmers...
Updating Hauptseite...done
Validating aliases...
Validating wikidb_content alias...ok
Validating wikidb alias...ok
Updating tracking indexes...done
general index...
Fetching Elasticsearch version...2.4.5...ok
Scanning available plugins...none
Inferring index identifier...wikidb_general_first
Picking analyzer...german
Validating number of shards...ok
Validating replica range...ok
Validating shard allocation settings...done
Validating max shards per node...ok
Validating analyzers...ok
Validating mappings...
Validating mapping...different...corrected
Validating cache warmers...
Validating aliases...
Validating wikidb_general alias...ok
Validating wikidb alias...ok
Updating tracking indexes...done
Deleting namespaces...done
Indexing namespaces...done
This looks fine, but running php forceSearchIndex.php --skipLinks --indexOnSkip makes the PHP CLI stop working. Lanthanis (talk) 10:23, 24 July 2017 (UTC)
- Is there any error you could add to your message that would help to debug your issue? DCausse (WMF) (talk) 12:18, 24 July 2017 (UTC)
- I'm sorry, but there is no further information.
- I started Elasticsearch in a console, but it only shows a client disconnecting (me running forceSearchIndex.php).
- There are no debug messages in my PHP debug log...
- Is there another way to see debug information? Lanthanis (talk) 12:28, 24 July 2017 (UTC)
- Okay, the PHP extension php_wincache.dll caused this error.
- I commented it out in my php.ini, and now these PHP files execute successfully!
- WinCache 2.0 for IIS caused this; I found it in the event log. Thank you for your help! Lanthanis (talk) 12:50, 24 July 2017 (UTC)
- I had the same results. The problem is that disabling WinCache slows the site down on my setup, so I disabled it only while running the forceSearchIndex script and re-enabled it afterwards. TespSam (talk) 17:59, 16 February 2018 (UTC)
- Sure, thought that was clear :) Lanthanis (talk) 18:03, 16 February 2018 (UTC)
- Yeah, what's not clear is if it breaks anything else down the line. Have you had any problems in the 7 months since? All the other bits are working fine? TespSam (talk) 18:24, 16 February 2018 (UTC)
- I didn't have any problems afterwards. Lanthanis (talk) 13:10, 19 February 2018 (UTC)
Suggestion: Allow searching for articles that include or exclude heading (section) titles
Use case:
As a reader, I'd want to search heading titles so that I can find relevant content.
Background
Searching headings would provide much more useful results. For example, searching for "Brazil tribe history" finds many unrelated articles, which obscure the results where a section actually talks about the history of the tribes.
Compare brazil tribe insource:/==\s*History\s*==/ vs. brazil tribe history.
A bit of searching revealed that this has a lot of use-cases:
- For investigative research: https://phabricator.wikimedia.org/T123096
- For FAQ-like search: https://phabricator.wikimedia.org/T152715, or in Extension:Flow or talk pages
- For other research: https://phabricator.wikimedia.org/T171007
- Widget to suggest sections for a new page - if the article title matches an existing heading on a different article, it could be used to suggest all sub-headings related to it as new headings
- Finding articles without any sections - to expand them, e.g. to add references which normally go under a section
Proposed solution
New search keyword ("inheading"):
| Action | Default | Regex | Multiple sections |
|---|---|---|---|
| Find text | inheading:"word" | inheading:/word/ | "inheading:word1" "inheading:word2" |
| Exclude sections | -inheading:word | -inheading:/word/ | -inheading:word1 -inheading:word2 |
Using insource partly works, but only for articles that don't transclude sections from elsewhere, since it searches the wikitext, not the rendered HTML. Even then it is mostly a gamble, because an article can display that markup as plain text rather than as a heading. 197.218.88.73 (talk) 16:58, 25 July 2017 (UTC)
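To see locally what the insource workaround from the comparison above can and cannot find, here is a rough sketch using `grep -E` over raw wikitext. It assumes headings sit on their own line, as wikitext headings do. The function names are hypothetical, and the pattern is a line-anchored variant of `/==\s*History\s*==/`, not CirrusSearch's own matching.

```shell
# Hypothetical helper: print wikitext heading lines whose title is exactly $1.
# Reads raw wikitext on stdin. $1 is used as an ERE fragment, so escape any
# regex metacharacters in it yourself.
find_heading() {
  grep -E "^=+[[:space:]]*$1[[:space:]]*=+[[:space:]]*$"
}

# Example page: a real heading, body text, a transcluded template that might
# render its own heading, and a level-3 heading.
sample_wikitext() {
  printf '%s\n' \
    '== History ==' \
    'The tribe settled along the river.' \
    '{{Tribe history section}}' \
    '=== History ==='
}
```

`sample_wikitext | find_heading History` prints the two literal heading lines but never the template line, even if that template renders a "History" heading on the page. That is the transclusion gap described above, and it is why a dedicated heading keyword would need the parsed output rather than the wikitext.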
Integrate mapper-attachment-plugin to Extension:CirrusSearch?
(This is also related to Topic:Search inside uploaded documents.)
I’m running MW 1.28.2, Extension:CirrusSearch REL1_28 and Elasticsearch 2.4.5, and I’m experimenting with integrating the mapper-attachments plugin to read all kinds of files with file_media_type OFFICE. Right now I can find them by querying Elasticsearch directly, but I’m unable to find them in the wiki search. My guess is that it all comes down to the proper mapping, which is tricky to get right.
- Is it possible to use copy_to somehow, or to plug into an existing search filter like “insource:”, or do I have to write a SearchResult class? (I don't have access to the CirrusSearchAddQueryFeatures hook, which requires MW 1.29+.)
- How can I direct CirrusSearch to also read from my custom file_attachment field or the subfield file_attachment.content?
- Can anybody point me in the right direction?
Thank you.
So far I have managed to index files with file_media_type OFFICE through the Elasticsearch plugin by using the CirrusSearch hooks, but the data is found only in Elasticsearch, not by CirrusSearch:
$wgHooks['CirrusSearchMappingConfig'][] = function ( array &$config, $mappingConfigBuilder ) {
    foreach ( $config['page']['properties'] as $key => &$PAGE_PROPERTIES ) {
        if ( $key == 'file_text' ) {
            /* https://stackoverflow.com/questions/36618549/is-it-possible-to-get-contents-of-copy-to-field-in-elasticsearch */
            $PAGE_PROPERTIES['store'] = true; /* add store=1 to defaults, no effect with copy_to */
        }
    }
    unset( $PAGE_PROPERTIES ); // drop the reference left over from the foreach
    // plug in mapper-attachment
    $config['page']['properties']['file_attachment'] = [
        'type' => 'attachment',
        "fields" => [
            "content" => [
                "type" => "string",
                "copy_to" => [ "all", "file_text" ], /* no effect with copy_to */
                "analyzer" => "text",
                "search_analyzer" => "text_search",
            ]
        ]
    ];
};
$wgHooks['CirrusSearchBuildDocumentParse'][] = function (
    \Elastica\Document $Doc,
    Title $ThisTitle,
    Content $PageContent,
    ParserOutput $ParserOutput
) {
    global $wgTmpDirectory;
    $log_content = "\nDEBUG \$Doc:\n";
    $ThisLocalFile = wfFindFile( $ThisTitle );
    // null when no file was found; getLocalRefPath() itself may also return false
    $localFilePath = $ThisLocalFile instanceof File ? $ThisLocalFile->getLocalRefPath() : null;
    if ( $Doc->namespace == NS_FILE
        && $Doc->has( 'file_media_type' )
    ) {
        // only read the file if a local path was actually resolved
        if ( preg_match( "@OFFICE@i", $Doc->get( 'file_media_type' ) ) && $localFilePath ) {
            // mapper-attachments expects the file contents base64-encoded
            $Doc->set( 'file_attachment', base64_encode( file_get_contents( $localFilePath ) ) );
            $log_content .= "\nDEBUG did set file_attachment\n";
        } else {
            $log_content .= "\nDEBUG file_media_type: {$Doc->file_media_type}\n";
        }
    }
    if ( $Doc->namespace == NS_FILE ) {
        $log_content .= "\nDEBUG \$ThisTitle:\n";
        $log_content .= var_export( $ThisTitle, true );
        $log_content .= "\nDEBUG \$ThisLocalFile:\n";
        $log_content .= $ThisLocalFile instanceof File ? $ThisLocalFile->getLocalRefPath() : var_export( $ThisLocalFile, true );
        $log_content .= var_export( $Doc, true );
        file_put_contents( $wgTmpDirectory . "/CirrusSearchBuildDocumentParse.log", $log_content, FILE_APPEND );
    }
    return true;
};
require_once "$IP/extensions/Elastica/Elastica.php";
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
Andreas P.
16:08, 31 July 2017 (UTC)
- Using copy_to to file_text and all sounds like a good solution to me, at least one that should involve fewer modifications. The only thing I see missing is the highlighting config for your new field file_attachment.content. In short, you'll be able to search for docs, but you won't see any text snippets.
- At a glance I don't see why it fails. Do you run updateSearchIndexConfig every time you change the hook, to update the mapping?
- Note that you can append &cirrusDumpQuery to a search results page URL to see the JSON query that will be sent to Elasticsearch; it could help with debugging.
- I don't see why you force store to true; it should not be needed.
- Glad to see someone working on this, good luck! DCausse (WMF) (talk) 16:47, 31 July 2017 (UTC)
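A minimal sketch of the check suggested here: after re-running updateSearchIndexConfig.php, confirm that the custom field actually reached the live mapping. It assumes Elasticsearch on localhost:9200, as elsewhere in this thread, and does a plain text match on the JSON rather than parsing it, so a hit is a hint, not proof. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed if the mapping JSON on stdin contains "$1" as a key.
# Plain text match, not a JSON parse: a same-named key anywhere else also matches.
mapping_has_field() {
  grep -q "\"$1\"[[:space:]]*:"
}

# Fetch the live mapping (assumed to be at localhost:9200) and report on one field.
check_live_mapping() {
  local field=$1
  if curl -s 'http://localhost:9200/_all/_mapping' | mapping_has_field "$field"; then
    echo "$field is in the mapping"
  else
    echo "$field is missing; re-run updateSearchIndexConfig.php and check again" >&2
    return 1
  fi
}
```

Running `check_live_mapping file_attachment` right after updateSearchIndexConfig.php makes it obvious when a mapping change did not make it into the index, which is easy to miss when reading the full `_all/_mapping` output by eye.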
- Yes, I did run updateSearchIndexConfig every time I changed mappings, but I also saw that some changes did not appear in curl -XGET 'http://localhost:9200/_all/_mapping'. I often deleted the whole index and mapping to see if it was working.
- And … well …, guess what: copy_to functionality was removed for subfields of type attachment in version 2.4 (see Mapping changes - Elasticsearch Reference 2.4), which is the version I have to use. In earlier versions this would work. See also the discussion at https://github.com/elastic/elasticsearch/issues/14946.
- So, the question now is:
- How can I direct Extension:CirrusSearch to also search my file_attachment.content?
- Or is there another way to hook in for REL1_28? Andreas P.
09:44, 1 August 2017 (UTC)
- Damn, it's a shame that copy_to is broken; the docs state the opposite...
- Without copy_to, I'm afraid hooks won't be sufficient and you'll have to make more profound changes to CirrusSearch.
- I'd suggest patching Cirrus instead:
- 1. Config:
  - Add a new wgCirrusSearchUseAttachmentPlugin config var to indicate that the plugin is installed.
- 2. Mapping:
  - Tweak getDefaultFields() in includes/Maintenance/MappingConfigBuilder.php to add your mapping (guarded by your new wgCirrusSearchUseAttachmentPlugin var).
  - If you can, try to add a subfield named plain to the content subfield, with the analyzers plain and plain_search.
- 3. Indexing:
  - Update buildDocumentsForPages in includes/Updater.php to add the code you've added in the CirrusSearchBuildDocumentParse hook.
  - Same here: guard your new code with the wgCirrusSearchUseAttachmentPlugin config var and namespace == NS_FILE.
- 4. Search:
  - Tweak buildFullTextSearchFields in includes/Query/FullTextQueryStringQueryBuilder.php:
    - change
        return [ "all${fieldSuffix}^${weight}" ];
    - to
        if ( $context->getConfig()->get( 'CirrusSearchUseAttachmentPlugin' ) && ( !$namespaces || in_array( NS_FILE, $namespaces ) ) ) {
            return [ "all${fieldSuffix}^${weight}", "file_attachment.content${fieldSuffix}" ];
        } else {
            return [ "all${fieldSuffix}^${weight}" ];
        }
  - But also add your new field in case the all field is not in use (in the same function):
    - change
        if ( !$namespaces || in_array( NS_FILE, $namespaces ) ) {
            $fileTextWeight = $weight * $searchWeights[ 'file_text' ];
            $fields[] = "file_text${fieldSuffix}^${fileTextWeight}";
        }
    - to
        if ( !$namespaces || in_array( NS_FILE, $namespaces ) ) {
            $fileTextWeight = $weight * $searchWeights[ 'file_text' ];
            $fields[] = "file_text${fieldSuffix}^${fileTextWeight}";
            if ( $context->getConfig()->get( 'CirrusSearchUseAttachmentPlugin' ) ) {
                $fields[] = "file_attachment.content${fieldSuffix}^${fileTextWeight}";
            }
        }
  - If you want, you can adapt FullTextSimpleMatchQueryBuilder too (not enabled by default).
- 5. Highlighting:
  - Update FullTextResultsType#getHighlightingConfiguration in includes/Search/ResultsType.php and add a line for your new field exactly the same way file_text is added.
- Sorry, this is not obvious, but without copy_to the hooks cannot be used...
- And please feel free to upload a patch to Gerrit; I'd be happy to review it. DCausse (WMF) (talk) 10:29, 1 August 2017 (UTC)
Changing the displays of the search results
I work with a template-heavy wiki, meaning that a lot of our frequently used pages get their content from templates. This information could be used to better identify links in search results. I've tried digging through CirrusSearch's files to find where the HTML for displaying the results is defined. It could be that CirrusSearch only changes what provides MediaWiki's search results and MediaWiki itself generates the display. However, if CirrusSearch does generate the HTML output of search results, I would like to know where, so that I can add things to make search results clearer for my users.
Thanks Ggjsc (talk) 15:01, 14 August 2017 (UTC)
Fatal exception of type "RuntimeException"
Hello,
I'm receiving a RuntimeException error on the search page after enabling CirrusSearch. There are a lot of "Operation timed out" errors in extensions/CirrusSearch/error.log, as follows:
"Requeueing job with delay of 64.
2017-08-14 20:36:49 PCTWB9003 mediawiki: Search backend error during prefix search for 'C' after 5026: unknown: Operation timed out
2017-08-14 20:36:57 PCTWB9003 mediawiki: Search backend error during prefix search for 'Cod' after 5012: unknown: Operation timed out
2017-08-14 20:37:04 PCTWB9003 mediawiki: Search backend error during prefix search for 'Code' after 5012: unknown: Operation timed out
2017-08-14 20:37:12 PCTWB9003 mediawiki: Search backend error during near_match search for 'code' after 5015: unknown: Operation timed out
2017-08-14 20:37:44"
Index configuration / bootstrapping:
1) updateSearchIndexConfig.php - OK
2) forceSearchIndex.php --skipLinks --indexOnSkip - many entries state "[ mediawiki] indexed X pages..."; however, quite a few rows state "'D:\Program' is not recognized as an internal or external command..."
3) forceSearchIndex.php --skipParse - OK
Machine:
Windows 7
| Product | Version |
|---|---|
| MediaWiki | 1.28.2 |
| PHP | 5.6.30 (cgi-fcgi) |
| MariaDB | 10.1.23-MariaDB |
CirrusSearch and ElasticSearch were downloaded for 1.28
ElasticSearch 2.4.6
Any help is much appreciated! Nha4601 (talk) 12:04, 15 August 2017 (UTC)
- The errors listed on the command line when executing forceSearchIndex.php were caused by the EmbedVideo extension and $wgFFmpegLocation pointing to D:\Program Files\FFmpeg\bin (the path contains a space, so only 'D:\Program' was run as the command). Disabling EmbedVideo allows step 2 to execute correctly, but I'm still getting an error on the search page.
- Also, the EmbedVideo extension stopped working (even after disabling CirrusSearch and Elastica).
- Any help (including getting EmbedVideo working again) is appreciated. Nha4601 (talk) 17:37, 15 August 2017 (UTC)
ForceSearchIndex.php Error
Hi,
My MediaWiki setup is as follows:
| Product | Version |
|---|---|
| MediaWiki | 1.27.3 |
| PHP | 7.0.21 (cgi-fcgi) |
| MS SQL Server | 11.00.5388 |
I keep getting the following error when running the forceSearchIndex.php script:
E:\TechMediaWiki\extensions\CirrusSearch\maintenance>php forceSearchIndex.php
PHP Warning: Declaration of DatabaseMssql::ignoreErrors(array $value = NULL) should be compatible with DatabaseBase::ignoreErrors($ignoreErrors = NULL) in E:\TechMediaWiki\includes\db\DatabaseMssql.php on line 1413
[562ba13588a77a82db43400d] [no req] MWException from line 105 of E:\TechMediaWiki\includes\page\WikiPage.php: Invalid or virtual namespace -1 given.
Backtrace:
#0 E:\TechMediaWiki\includes\page\WikiPage.php(160): WikiPage::factory(Title)
#1 E:\TechMediaWiki\extensions\CirrusSearch\maintenance\forceSearchIndex.php(409): WikiPage::newFromRow(stdClass, integer)
#2 E:\TechMediaWiki\extensions\CirrusSearch\maintenance\forceSearchIndex.php(392): CirrusSearch\ForceSearchIndex->decodeResults(MssqlResultWrapper, NULL)
#3 E:\TechMediaWiki\extensions\CirrusSearch\maintenance\forceSearchIndex.php(179): CirrusSearch\ForceSearchIndex->findUpdates(NULL, string, NULL)
#4 E:\TechMediaWiki\maintenance\doMaintenance.php(103): CirrusSearch\ForceSearchIndex->execute()
#5 E:\TechMediaWiki\extensions\CirrusSearch\maintenance\forceSearchIndex.php(547): require_once(string)
#6 {main}
I have tried multiple versions of Elasticsearch, including the 1.x versions per the CirrusSearch README, and the same error comes up. Please help.
Thanks Rishxpre55 (talk) 10:05, 30 August 2017 (UTC)
- Did you follow the CirrusSearch README to the letter, from the beginning, where it documents the update and forceSearchIndex process? I once had the same error, and following the README exactly worked out for me. Maybe it is related to your issue. Andreas P.
10:56, 30 August 2017 (UTC)
- Yes, I did. I'm not sure if it's the version of PHP or what. I did run updateSearchIndexConfig.php first, then forceSearchIndex.php after; not sure what the error
- [562ba13588a77a82db43400d] [no req] MWException from line 105 of E:\TechMediaWiki\includes\page\WikiPage.php: Invalid or virtual namespace -1 given.
- Sorry, it must not have copied over; I didn't see it in the original post.
- Any ideas what this means? Rishxpre55 (talk) 12:02, 30 August 2017 (UTC)
- This seems to suggest an underlying issue in MediaWiki's MS SQL support. The error about the database classes not having matching implementations comes from MediaWiki itself, as does the error from WikiPage about not being able to convert the database row into a WikiPage object. I would suggest filing a Phabricator task about it. EBernhardson (WMF) (talk) 17:34, 1 September 2017 (UTC)
Looking for a better search option - search by title
At the site I work on, we receive a lot of complaints about the search engine because it doesn't prioritize the title. If intitle: " " is used, the search results are what people expect. Is there a way to make the intitle: function the standard search, without typing it every time in the search box? If this can't be done, we might remove CirrusSearch and go back to using the regular search engine in MediaWiki; is there a reason why we shouldn't do that?
Thank you in advance Ahancie (talk) 19:38, 3 November 2017 (UTC)
- If the default search works best for you, by all means use what works best!
- If you want to try and tune CirrusSearch for your use case, there is a set of weights that determines the relative importance of fields. By default it looks like:
$wgCirrusSearchWeights = [
    'title' => 20,
    'redirect' => 15,
    'category' => 8,
    'heading' => 5,
    'opening_text' => 3,
    'text' => 1,
    'auxiliary_text' => 0.5,
    'file_text' => 0.5,
];
- There is an important caveat in the docs, though:
- Must be integers not decimals. If $wgCirrusSearchAllFields['use'] is false this can be changed on the fly. If it is true then changes to this require an in place reindex to take effect.
- You could try increasing the title (and possibly redirect) weight if you want a much stronger preference for the title field. This changes the way ranking works. The intitle keyword is a little different: it changes the retrieval phase to limit what is considered. There are currently no options available to change the default fields used for the retrieval phase. EBernhardson (WMF) (talk) 23:04, 3 November 2017 (UTC)
- Thank you for your reply; you have given me some things to try. Is there a reason to choose Extension:CirrusSearch over the default search? We just removed the CirrusSearch reference in LocalSettings to see what would happen, and the search seems to work the way we would expect: titles first, then a weighted average from the text. I'm not sure why the decision was made to install CirrusSearch on the wiki I am working on, but I am hesitant to remove it since I was not around during that decision-making process. Ahancie (talk) 22:39, 7 November 2017 (UTC)
- Generally, CirrusSearch provides much more powerful features, like the intitle: keyword, regex search against wikitext content, category filtering, indexing of uploaded files (in formats specifically supported by MediaWiki), etc. CirrusSearch also has much better support for the wide variety of languages that exist. If you only have a few thousand pages, this might be overkill for your needs, and the relatively simple SQL-based default search is plausibly appropriate. EBernhardson (WMF) (talk) 19:15, 9 November 2017 (UTC)
Keep receiving "Call to undefined method Elastica\Query::setStoredFields()" error!
Using MW 1.29.1 with Elasticsearch 5.6.3, I am quite sure Extension:CirrusSearch and Extension:Elastica are on the "origin/REL1_29" branch. They both show up on the Special:Version page, but the search just doesn't work. Why?
require_once "$IP/extensions/Elastica/Elastica.php";
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
$wgCirrusSearchServers = array( '127.0.0.1' ); //elasticsearch with redis together
$wgSearchType = 'CirrusSearch';
$wgCirrusSearchPrefixSearchStartsWithAnyWord = true;
[038842b02f5fd4ba41db7da1] /index.php?search=test Error from line 520 of /mediawiki-1.29.1/extensions/CirrusSearch/includes/Searcher.php: Call to undefined method Elastica\Query::setStoredFields()
Backtrace:
#0 /mediawiki-1.29.1/extensions/CirrusSearch/includes/Searcher.php(644): CirrusSearch\Searcher->buildSearch()
#1 /mediawiki-1.29.1/extensions/CirrusSearch/includes/Searcher.php(242): CirrusSearch\Searcher->searchOne()
#2 /mediawiki-1.29.1/extensions/CirrusSearch/includes/Hooks.php(557): CirrusSearch\Searcher->nearMatchTitleSearch(string)
#3 /mediawiki-1.29.1/includes/Hooks.php(186): CirrusSearch\Hooks::onSearchGetNearMatch(string, NULL)
#4 /mediawiki-1.29.1/includes/search/SearchNearMatcher.php(125): Hooks::run(string, array)
#5 /mediawiki-1.29.1/includes/search/SearchNearMatcher.php(33): SearchNearMatcher->getNearMatchInternal(string)
#6 /mediawiki-1.29.1/includes/specials/SpecialSearch.php(253): SearchNearMatcher->getNearMatch(string)
#7 /mediawiki-1.29.1/includes/specials/SpecialSearch.php(143): SpecialSearch->goResult(string)
#8 /mediawiki-1.29.1/includes/specialpage/SpecialPage.php(522): SpecialSearch->execute(NULL)
#9 /mediawiki-1.29.1/includes/specialpage/SpecialPageFactory.php(578): SpecialPage->run(NULL)
#10 /mediawiki-1.29.1/includes/MediaWiki.php(287): SpecialPageFactory::executePath(Title, RequestContext)
#11 /mediawiki-1.29.1/includes/MediaWiki.php(862): MediaWiki->performRequest()
#12 /mediawiki-1.29.1/includes/MediaWiki.php(523): MediaWiki->main()
#13 /mediawiki-1.29.1/index.php(43): MediaWiki->run()
#14 {main} Deletedaccount4567435 (talk) 04:12, 11 November 2017 (UTC)
- So I tried both the "origin/REL1_28" and "origin/REL1_30" versions. Both work after composer update, but "origin/REL1_29" does not. Strange. Deletedaccount4567435 (talk) 05:03, 11 November 2017 (UTC)
- REL1_29 not working is indeed strange. The composer.json of the Elastica extension specifies version 5.1.0 of the Elastica library, and at that tag the Elastica\Query class has a setStoredFields() function. I tried cloning the extension, checking out the REL1_29 branch, and running composer install. This downloaded the correct version of the library, as reported above. Sadly, I'm not sure how it's failing for you. EBernhardson (WMF) (talk) 20:15, 15 November 2017 (UTC)
Initial indexing fails
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php works fine
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip works fine
But the next step crashes
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
With the following error
PHP Catchable fatal error: Argument 1 passed to CirrusSearch\DataSender::reportUpdateMetrics() must be an instance of Elastica\Bulk\ResponseSet, null given
called in extensions/CirrusSearch/includes/DataSender.php on line 209 and defined in extensions/CirrusSearch/includes/DataSender.php on line 229
Any idea how to solve that? 108.6.160.163 (talk) 00:14, 17 November 2017 (UTC)
- I am using MediaWiki 1.29.1 and Elasticsearch 5.3.3
- I was able to index other wikis without issues but not that particular one. 108.6.160.162 (talk) 01:23, 17 November 2017 (UTC)
- This is a known bug (see T180298) and was fixed recently. I don't have a great suggestion except upgrading to master or hacking the CirrusSearch code to avoid this error:
- Simply comment out line 209 in extensions/CirrusSearch/includes/DataSender.php and run the maintenance scripts again.
- I'll try to backport the fix. DCausse (WMF) (talk) 07:50, 17 November 2017 (UTC)
- I upgraded to master, but now I am getting this error:
- PHP Fatal error: Call to undefined method CirrusSearch\ForceSearchIndex::getBatchSize() in /extensions/CirrusSearch/maintenance/forceSearchIndex.php on line 466
- But it works with the original downloaded version of CirrusSearch for MediaWiki 1.29 after commenting out line 209 in extensions/CirrusSearch/includes/DataSender.php, as you suggested.
- Thank you so much for your help 108.6.160.162 (talk) 11:43, 17 November 2017 (UTC)
- Applying the following patch to line 209 of DataSender.php also fixes the problem:
    if ( $validResponse ) {
        $this->reportUpdateMetrics( $responseSet, $indexType, count( $data ) );
    }
  108.6.160.162 (talk) 11:58, 17 November 2017 (UTC)
index_already_exists_exception: already exists
I am running a single-node demonstration server for MediaWiki.
| Product | Version |
|---|---|
| MediaWiki | 1.28.3 (e9f315c) 16:01, 15 November 2017 |
| PHP | 5.6.32 (apache2handler) |
| MariaDB | 5.5.56-MariaDB |
| ICU | 50.1.2 |
| Elasticsearch | 2.4.6 |
| Lua | 5.1.5 |
I am trying to rebuild my ES search indexes. After deleting the indexes using curl, I tried running WIKI="demo" php "/opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php" --startOver
I get this failure:
content index...
Fetching Elasticsearch version...2.4.6...ok
Scanning available plugins...
elasticsearch-migration, head, kopf
Inferring index identifier...wiki_demo_content_first
Picking analyzer...english
Validating number of shards...ok
Validating replica range...ok
Validating shard allocation settings...done
Validating max shards per node...ok
Validating analyzers...ok
Validating mappings...
Validating mapping...different...corrected
Validating cache warmers...
Updating QualityBox demo...done
Validating aliases...
Validating wiki_demo_content alias...ok
Validating wiki_demo alias...ok
mw_cirrus_metastore missing creating.
Creating metastore index... mw_cirrus_metastore_first
Unexpected Elasticsearch failure.
Elasticsearch failed in an unexpected way. This is always a bug in CirrusSearch.
Error type: Elastica\Exception\ResponseException
Message: index_already_exists_exception: already exists
Trace:
#0 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Request.php(172): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Client.php(627): Elastica\Request->send()
#2 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Index.php(515): Elastica\Client->request('mw_cirrus_metas...', 'PUT', Array, Array)
#3 /opt/htdocs/mediawiki/extensions/CirrusSearch/includes/Maintenance/MetaStoreIndex.php(164): Elastica\Index->request('', 'PUT', Array, Array)
#4 /opt/htdocs/mediawiki/extensions/CirrusSearch/includes/Maintenance/MetaStoreIndex.php(122): CirrusSearch\Maintenance\MetaStoreIndex->createNewIndex()
#5 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/metastore.php(68): CirrusSearch\Maintenance\MetaStoreIndex->createOrUpgradeIfNecessary()
#6 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(299): CirrusSearch\Maintenance\Metastore->execute()
#7 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(267): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->updateVersions()
#8 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(58): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->execute()
#9 /opt/htdocs/mediawiki/maintenance/doMaintenance.php(111): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#10 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(65): require_once('/opt/htdocs/med...')
#11 {main}
You can see my ES status here: http://demo.qualitybox.us:9201/_plugin/head/
The shards are not allocated to the node.
Is there something I can do to "start over" besides --startOver? Greg Rundlett (talk) 04:02, 13 December 2017 (UTC)
- Note: I've deleted the indexes and I'm retrying the updateSearchIndexConfig.php
- List unassigned shards:
curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
- Delete each:
curl -XDELETE 'localhost:9200/mw_cirrus_metastore_first'
- Then run the maintenance script:
WIKI="demo" php "/opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php" --startOver
- So far, it gets to "Creating metastore index... mw_cirrus_metastore_first" with "ok"
- Then, "Index is red retrying..." (repeats for 20 minutes so far, with a VERY small wiki) Greg Rundlett (talk) 04:19, 13 December 2017 (UTC)
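The steps above can be scripted. This is a sketch, assuming Elasticsearch at localhost:9200 (overridable through a hypothetical `ES_URL` variable) and that every index it selects may be deleted, which is destructive. It deliberately selects only indexes whose primary shard is unassigned: on a single node a replica can never be allocated next to its own primary, so an unassigned replica only makes the cluster yellow, while an unassigned primary is what turns it red. The function names are illustrative.

```shell
# Read `_cat/shards?h=index,shard,prirep,state,unassigned.reason` output on stdin
# and print, once each, the indexes that have an UNASSIGNED primary ("p") shard.
unassigned_primary_indices() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1 }' | sort -u
}

# Extract the "status" value (green/yellow/red) from compact _cluster/health JSON on stdin.
cluster_status() {
  grep -o '"status":"[a-z]*"' | cut -d'"' -f4
}

# Destructive: delete every index with an unassigned primary, then report health.
delete_red_indices() {
  local es=${ES_URL:-http://localhost:9200}
  curl -s "$es/_cat/shards?h=index,shard,prirep,state,unassigned.reason" |
    unassigned_primary_indices |
    while read -r idx; do
      echo "deleting $idx"
      curl -s -XDELETE "$es/$idx"
      echo
    done
  echo "cluster status: $(curl -s "$es/_cluster/health" | cluster_status)"
}
```

After `delete_red_indices`, re-run updateSearchIndexConfig.php --startOver as above. If a freshly created index goes red again, repeating the deletion will not help; the allocation problem itself needs to be found, for example in the Elasticsearch log.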
- So elastic does not want to assign shards for this index. Could you try to check elastic logs to see if you can find something? DCausse (WMF) (talk) 10:12, 13 December 2017 (UTC)
- The log is full of failures like this
[2017-12-12 22:06:59,634][DEBUG][action.search ] [meza_node_1] All shards failed for phase: [query] Greg Rundlett (talk) 16:32, 13 December 2017 (UTC)
- Eventually the script failed because the primary shard was not active for the first index. I don't know why, or what caused this condition. Should I open a phab ticket with these details?
- Unexpected Elasticsearch failure.
Elasticsearch failed in an unexpected way. This is always a bug in CirrusSearch.
Error type: Elastica\Exception\ResponseException
Message: unavailable_shards_exception: [mw_cirrus_metastore_first][0] primary shard is not active Timeout: [1m], request: [index {[mw_cirrus_metastore_first][internal][metastore_version], source[{"metastore_major_version":0,"metastore_minor_version":2}]}]
Trace:
#0 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Request.php(172): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Client.php(627): Elastica\Request->send()
#2 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Index.php(515): Elastica\Client->request('mw_cirrus_metas...', 'PUT', Array, Array)
#3 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Type.php(520): Elastica\Index->request('internal/metast...', 'PUT', Array, Array)
#4 /opt/htdocs/mediawiki/vendor/ruflin/elastica/lib/Elastica/Type.php(89): Elastica\Type->request('metastore_versi...', 'PUT', Array, Array)
#5 /opt/htdocs/mediawiki/extensions/CirrusSearch/includes/Maintenance/MetaStoreIndex.php(385): Elastica\Type->addDocument(Object(Elastica\Document))
#6 /opt/htdocs/mediawiki/extensions/CirrusSearch/includes/Maintenance/MetaStoreIndex.php(167): CirrusSearch\Maintenance\MetaStoreIndex->storeMetastoreVersion(Object(Elastica\Index))
#7 /opt/htdocs/mediawiki/extensions/CirrusSearch/includes/Maintenance/MetaStoreIndex.php(122): CirrusSearch\Maintenance\MetaStoreIndex->createNewIndex()
#8 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/metastore.php(68): CirrusSearch\Maintenance\MetaStoreIndex->createOrUpgradeIfNecessary()
#9 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(299): CirrusSearch\Maintenance\Metastore->execute()
#10 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(267): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->updateVersions()
#11 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(58): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->execute()
#12 /opt/htdocs/mediawiki/maintenance/doMaintenance.php(111): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#13 /opt/htdocs/mediawiki/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(65): require_once('/opt/htdocs/med...')
- Greg Rundlett (talk) 16:24, 13 December 2017 (UTC)