Extension talk:Lucene-search

How do you start or enable "Did you mean" feature with Lucene-search?

121.244.152.202 (talkcontribs)

Can someone please help me enable this feature?

I use MediaWiki 1.18, with Lucene-search 2.1.3 and MWSearch 1.21.


Help - port 8123 refusing all connections after Linux update & reboot

Maiden taiwan (talkcontribs)

After a "yum update" on our Linux server, and a reboot, Lucene is no longer listening on port 8123. We have not changed any Lucene config files.

$ telnet localhost 8123
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host: Connection refused

lsearchd is running and is listening on port 8321 for incremental reindexes. Java is running as well. When I start lsearchd manually, it says:

sudo  /usr/local/bin/lucene-run
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/lucene-search-2.1.3/lsearch.conf
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
727  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
730  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321

It definitely does NOT print the usual message about port 8123:

771  [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Searcher started on port 8123

Any tips? Where do I start looking? This is a critical site for our business with thousands of users daily. Thanks.

--Maiden taiwan 03:54, 12 December 2011 (UTC)
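The telnet check above can also be scripted, which may help when verifying the daemon after a reboot. A minimal sketch in Python, assuming the daemon runs on localhost with the ports mentioned in this post (8123 for the searcher, 8321 for the indexer). `port_is_open` is a hypothetical helper name, not part of lucene-search.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Port numbers are the ones mentioned in this thread; adjust to your lsearch.conf.
    for name, port in (("searcher", 8123), ("indexer", 8321)):
        state = "listening" if port_is_open("localhost", port) else "NOT listening"
        print(f"{name} port {port}: {state}")
```

A refused connection only shows that nothing is listening on that port. It does not say why the searcher was not started.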

Maiden taiwan (talkcontribs)

I should mention that the "yum update" was NOT for lucene-search, nor for Java. Just for core CentOS Linux packages. Maiden taiwan 04:00, 12 December 2011 (UTC)

Maiden taiwan (talkcontribs)

Changing the port number in lsearch.conf does not affect the problem. Maiden taiwan 04:07, 12 December 2011 (UTC)

Rainman (talkcontribs)

Did your hostname change somehow? If there was a conflict, it would print out an error message. It seems like it doesn't even want to start a searcher because it might think this is not the right host to start it up?

Maiden taiwan (talkcontribs)

The hostname is still the same. Maiden taiwan 12:43, 12 December 2011 (UTC)

Rainman (talkcontribs)

Well don't know then. My hunch is that there is something wrong with how the hostname is understood. Have you tried calling java with:

-Djava.rmi.server.hostname=<your hostname, not localhost!>

And then use the same hostname in your configuration files?

Maiden taiwan (talkcontribs)

Thanks for the tip. lsearchd currently runs this line:

java -Djava.rmi.server.codebase=file://$jardir/LuceneSearch.jar \
-Djava.rmi.server.hostname=$HOSTNAME -jar $jardir/LuceneSearch.jar $*

and $HOSTNAME expands to the correct value: I confirmed this in the output of "ps uax". Maiden taiwan 16:01, 12 December 2011 (UTC)

Maiden taiwan (talkcontribs)

I ran "strace -v" on lsearchd, and its calls to uname({sysname="Linux", nodename="mysystem", ...}) return success (zero), so I believe this shows it's looking up the right hostname.

Rainman (talkcontribs)

Well, don't know then, sorry.

Maiden taiwan (talkcontribs)

OK, I downloaded the lucene-search Java source, added some debug output, and recompiled it. Here is more data.

  • On our system, MediaWiki runs on one server (wikihost) and Lucene on another (km105).
  • The problem is that GlobalConfiguration.isSearcher() is returning false.
  • In GlobalConfiguration, the hostAddr and hostName variables are set correctly to 127.0.0.1 and the true hostname (km105). However, in the isSearcher function body, both search.get(hostAddr) and search.get(hostName) are null.
  • On a different working Lucene system in my company, search.get(hostName) is non-null.
  • I see only one line where the "search" hashtable gets set, in function processSearchRoles (search.put(host,hostroles)). I added some debugging, and this is getting called only for the MediaWiki host (wikihost) and not for the search host (km105).

Here is the debug output of lsearchd:

Trying config file at path /home/danb/.lsearch.conf
Trying config file at path /home/danb/src/lsearch.conf
setHost hostAddr = 127.0.0.1
setHost hostName = km105
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
755  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
isIndexer hostAddr = 127.0.0.1
isIndexer hostName = km105
isIndexer index.get hostAddr = null
isIndexer index.get hostName = [*]
isSearcher hostAddr = 127.0.0.1
isSearcher hostName = km105
isSearcher search.get hostAddr = null
isSearcher search.get hostName = null    (NOTE: This seems to be the problem.)
758  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321

Here is lsearch-global.conf, which is unchanged since before the problem started:

################################################
# Global search cluster layout configuration
################################################

[Database]
wikidb : (single) (spell,4,2) (language,en)

[Search-Group]
wikihost : *

[Index]
km105 : *

[Index-Path]
<default> : /search

[OAI]
<default> : http://wikihost/w/index.php

[Namespace-Boost]
<default> : (0,2) (1,0.5)

[Namespace-Prefix]
all : <all>
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15

And here is lsearch.conf:

MWConfig.global=file:///usr/local/lucene-search.2.1.3/lsearch-global.conf
Indexes.path=/usr/local/lucene-search-2.1.3/indexes
Rsync.path=/usr/bin/rsync
...
(the rest of the file is unchanged from the default)

Can you suggest any other debug output I can add to Lucene so it helps find the problem?

Maiden taiwan (talkcontribs)

I got everything working again. Part of it was my error -- km105 isn't supposed to be a search query service, just an index generator. So the above debug behavior is correct.

Cboltz (talkcontribs)

I just had the same problem after moving some wikis and lucene to a new server.

The solution: lsearch-global.conf contains the hostname. After changing it to the content of $HOSTNAME, it works again. Interestingly, on the old server I had to use the full hostname (hostname -f output) while the new server needs the short hostname (hostname -s output).

This is "just for the records" in case someone hits the same problem ;-)
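The hostname mismatch described in this thread can be checked before starting the daemon. The following is a rough sketch, assuming the `host : roles` layout of the lsearch-global.conf quoted above. It lists which of the [Search-Group] and [Index] sections mention a given hostname. `hosts_by_section` and `roles_for` are hypothetical helper names, and the parsing is a simplification, not lucene-search's own parser.

```python
import re
import socket

# A section header is a line holding only "[Name]"; "[0] : 0" is a key, not a header.
SECTION_RE = re.compile(r"^\[([^\]]+)\]\s*$")

# Excerpt of the configuration posted earlier in this thread.
SAMPLE = """\
[Database]
wikidb : (single) (spell,4,2) (language,en)

[Search-Group]
wikihost : *

[Index]
km105 : *
"""


def hosts_by_section(conf_text):
    """Map section name -> list of keys (text left of ':') in that section."""
    sections = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = SECTION_RE.match(line)
        if header:
            current = header.group(1)
            sections[current] = []
        elif current is not None and ":" in line:
            sections[current].append(line.split(":", 1)[0].strip())
    return sections


def roles_for(conf_text, candidates):
    """Return, per section, the candidate hostnames listed in [Search-Group] / [Index]."""
    sections = hosts_by_section(conf_text)
    return {
        name: sorted(set(sections.get(name, [])) & set(candidates))
        for name in ("Search-Group", "Index")
    }


if __name__ == "__main__":
    print(roles_for(SAMPLE, {"km105"}))
    # Names this machine reports; compare them with the keys in your own config.
    print(socket.gethostname(), socket.getfqdn())
```

With the sample above, km105 appears only under [Index], which matches the behavior reported earlier: the indexer starts on that host but no searcher does. Passing both the short and the fully qualified name as candidates shows which form, if any, the file uses.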


configure build problem ... possibly related to PHP BOM prepending

Freddo411 (talkcontribs)

Hello,

I am running into a blocking issue when trying to run ./build. The specific error is copied below. I am using version 2.1.3.

I am also noticing a number of oddities that may be related.

1) The inclusion of BOM markers in config.inc and lsearch-global.inc. Specifically, in config.inc:

  -- a BOM between "dbname=" and "wikidb"
  -- a BOM after "wgScriptPath=".  Note, the correct value from LocalSettings.php is 
  -- a BOM between "wgServer=" and "https://newwiki.west.isilon.com"

2) The XML dump from MediaWiki also includes a prepended BOM. I tried this outside of the Lucene context and also consistently got a BOM prepended by PHP.

Any help or clues would be appreciated.


MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/lucene-search-2.1.3/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
68   [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /usr/local/lucene-search-2.1.3/indexes/import/wikidb.links
116  [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
192  [main] FATAL org.wikimedia.lsearch.importer.Importer  - Cannot store link analytics: Content is not allowed in prolog.
java.io.IOException: Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
	at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
	at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
194  [main] ERROR org.wikimedia.lsearch.importer.BuildAll  - Error during rebuild of wikidb : Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
java.io.IOException: Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
	at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
	at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)

Extirpate (talkcontribs)

I had the same problem and solved it: the extension "semantic comments" caused it.
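A PHP source file saved with a UTF-8 byte order mark sends those three bytes before any other output, and an XML parser typically rejects leading content like that with "Content is not allowed in prolog". The sketch below is one way to find the offending file, assuming the BOM is the UTF-8 one (bytes EF BB BF). `find_boms` and `files_with_bom` are hypothetical helper names.

```python
import codecs


def find_boms(data):
    """Return the byte offsets of every UTF-8 BOM (EF BB BF) in `data`."""
    offsets = []
    pos = data.find(codecs.BOM_UTF8)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(codecs.BOM_UTF8, pos + 1)
    return offsets


def files_with_bom(paths):
    """Map each path to its BOM offsets, leaving out files that contain none."""
    report = {}
    for path in paths:
        with open(path, "rb") as fh:
            offsets = find_boms(fh.read())
        if offsets:
            report[path] = offsets
    return report
```

One possible use is to pass it every .php file under the wiki root, for example via `glob.glob(root + "/**/*.php", recursive=True)`, and look for entries with offset 0, including LocalSettings.php and extension files. Running it on config.inc would show BOMs that were copied into the generated values.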

cleanup old indexes

82.218.136.35 (talkcontribs)

I'm using crontab to generate a new index every day:

30 1 * * * /etc/init.d/lsearch stop && cd /usr/local/search/ls2 && /usr/local/search/ls2/build > /dev/null 2>&1 && /etc/init.d/lsearch start

Every day, new files and folders are generated in the update folder (wiki, wiki.hl, wiki.links, wiki.related, wiki.spell), for example /usr/local/search/ls2/indexes/update/wiki.related/20131116013015

Can I safely delete all of them (or only the old folders)? Is there an automated cleanup script that checks for old files and folders? My installation is about 2 years old and I have tons of old files and folders!

regards, josef lahmer (josy1024@gmail.com)
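This thread does not confirm whether lucene-search still needs the older snapshot folders, so the sketch below is deliberately cautious. It only lists candidates, keeps the newest `keep` snapshots per index, and deletes nothing unless `dry_run=False` is passed. It assumes snapshot folders are named with 14-digit timestamps, as in the path above. `old_snapshots` and `remove_snapshots` are hypothetical helper names.

```python
import os
import re
import shutil

SNAPSHOT_RE = re.compile(r"^\d{14}$")  # e.g. 20131116013015


def old_snapshots(update_dir, keep=2):
    """Return snapshot dirs under update_dir/<index>/ except the newest `keep` per index."""
    stale = []
    for index_name in sorted(os.listdir(update_dir)):
        index_dir = os.path.join(update_dir, index_name)
        if not os.path.isdir(index_dir):
            continue
        snapshots = sorted(
            name for name in os.listdir(index_dir)
            if SNAPSHOT_RE.match(name) and os.path.isdir(os.path.join(index_dir, name))
        )
        # Timestamps sort chronologically as strings, so the newest are at the end.
        cutoff = max(0, len(snapshots) - keep)
        stale.extend(os.path.join(index_dir, name) for name in snapshots[:cutoff])
    return stale


def remove_snapshots(paths, dry_run=True):
    """Print each path and delete it only when dry_run is False."""
    for path in paths:
        print(("would remove " if dry_run else "removing ") + path)
        if not dry_run:
            shutil.rmtree(path)
```

A cautious first run would be `remove_snapshots(old_snapshots("/usr/local/search/ls2/indexes/update"))` while the daemon is stopped, reviewing the printed list before repeating with `dry_run=False`.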

Reply to "cleanup old indexes"

Not available

Juandev (talkcontribs)

It is not available in the downloads. Where can I find it?

Reply to "Not available"

How do I get the lsearchd script to run in Upstart?

24.99.91.221 (talkcontribs)

I cannot get it to work and am a sad panda. I am using CentOS 6.4. It runs fine when I run it in a screen, but I'd like to run it on boot for obvious reasons.

Problems getting started

Subfader (talkcontribs)

MediaWiki 1.16.alpha on Ubuntu. Installed java-6-sun-1.6.0.24 and Apache Ant 1.7.0. Installed lucene-search-bin-2.1.3.tar.gz into /my/lucene/lucene-search-2.1.3.

Configs are created:

./configure /path/to/wiki/install/root/

Running

./build

returns

Dumping MyWiki...
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /root/my/lucene/lucene-search-2.1.3/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
83   [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki.links
141  [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
232  [main] FATAL org.wikimedia.lsearch.importer.Importer  - Cannot store link analytics: Premature end of file.
java.io.IOException: Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
235  [main] ERROR org.wikimedia.lsearch.importer.BuildAll  - Error during rebuild of MyWiki : Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
java.io.IOException: Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
Finished build in 0s

Creating a dump xml with maintenance/dumpBackup.php also fails. The command returns no error at all.

./lsearchd

returns

RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /root/my/lucene/lucene-search-2.1.3/lsearch.conf
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
581  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
583  [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Searcher started on port 8123
583  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321
588  [Thread-5] INFO  org.wikimedia.lsearch.search.SearcherCache  - Starting initial deployer for [MyWiki, MyWiki.hl, MyWiki.links, MyWiki.related, MyWiki.spell]

Running the test search returns a 500 server error:

Internal error in SearchEngine: MyWiki is being deployed or is not searched by this host
LSearch daemon on localhost

and adds this to the console log:

2199 [Thread-8] INFO  org.wikimedia.lsearch.frontend.HttpMonitor  - HttpMonitor thread started
2200 [pool-2-thread-1] INFO  org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/MyWiki/Boratto what:search dbname:MyWiki term:Boratto
2234 [pool-2-thread-1] INFO  org.wikimedia.lsearch.analyzers.StopWords  - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 18 ms
java.lang.RuntimeException: MyWiki is being deployed or is not searched by this host
        at org.wikimedia.lsearch.search.SearcherCache.getLocalSearcher(SearcherCache.java:388)
        at org.wikimedia.lsearch.search.WikiSearcher.<init>(WikiSearcher.java:96)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:715)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
        at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
        at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
        at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2251 [pool-2-thread-1] ERROR org.wikimedia.lsearch.search.SearchEngine  - Internal error in SearchEngine trying to make WikiSearcher: MyWiki is being deployed or is not searched by this host
java.lang.RuntimeException: MyWiki is being deployed or is not searched by this host
        at org.wikimedia.lsearch.search.SearcherCache.getLocalSearcher(SearcherCache.java:388)
        at org.wikimedia.lsearch.search.WikiSearcher.<init>(WikiSearcher.java:96)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:715)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
        at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
        at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
        at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2541 [pool-2-thread-2] WARN  org.wikimedia.lsearch.frontend.HttpHandler  - Unknown request /favicon.ico

Where to start troubleshooting? Thanks in advance!

Rainman (talkcontribs)

Can you check if the .xml file in dump/ is not empty?

Subfader (talkcontribs)

The XML created by maintenance/dumpBackup.php? Yes, it is created but empty.

I have no problems running other maintenance scripts. It returns no error message at all, which drives me crazy.

Chiefgeek157 (talkcontribs)

Just saw this problem a minute ago on my own server. I had a typo in LocalSettings.php that prevented the dump from working (I personally was missing a trailing ';').

Subfader (talkcontribs)

Hmm, I can check, but shouldn't the whole wiki break in such a case? My wiki works fine, though.

Rainman (talkcontribs)

Yes, unfortunately dumpBackup.php returns very few errors. You need to google around to make sure it works first.

212.24.186.158 (talkcontribs)

When xdebug is turned on, it writes this error message to php_errors.log:

PHP Fatal error: Maximum function nesting level of '100' reached, aborting! in /var/www/html/w/includes/GlobalFunctions.php on line 2326

Try turning off xdebug in PHP. That solved the "Premature end of file" error for me when running build.

Leucosticte (talkcontribs)

Or put xdebug.max_nesting_level = 200 in your php.ini.
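Since dumpBackup.php can fail without reporting anything, it may help to check the dump file itself before running build. In this thread the "Premature end of file" error went together with an empty dump. A minimal sketch, with `check_dump` as a hypothetical helper name and the dump path left to the caller:

```python
import os
import xml.etree.ElementTree as ET


def check_dump(path):
    """Classify an XML dump file as 'missing', 'empty', 'malformed' or 'ok'."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        # iterparse streams the file, so a large dump is not held in memory at once.
        for _event, elem in ET.iterparse(path):
            elem.clear()
    except ET.ParseError:
        return "malformed"
    return "ok"
```

An "empty" or "malformed" result points at the PHP side (LocalSettings.php, xdebug, an extension) rather than at lucene-search.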

Reply to "Problems getting started"

Special use case documentation of Lucene for MediaWiki?

Fractaliste (talkcontribs)

Hi,

Is there documentation of the special syntax I can use in my wiki's search form? I'm interested in the special features that Lucene provides for MediaWiki, such as the ":incategory" option and others.

Daemon not starting

2001:558:1414:0:0:5EFE:A0A:753A (talkcontribs)

Running Ubuntu 10.04, Java JDK 7, Ant 1.9.1.

./lsearchd stops at this line:

1213 [Thread-5] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable<wikidb.spell>$0 bound

Any help getting the daemon started is appreciated.

Reply to "Daemon not starting"

Index doesn't exist

195.124.31.219 (talkcontribs)

Hi all,

Whenever I start searching, I get:

117919 [pool-2-thread-4] INFO org.wikimedia.lsearch.frontend.HttpHandler - query:/prefix/wiki/Ddcr?namespaces=0&offset=0&limit=10&version=2.1&iwlimit=10&searchall=0 what:prefix dbname:wiki term:Ddcr
java.lang.RuntimeException: Index wiki.prefix doesn't exist

from the search server.

I went through the whole setup: ./configure /var/www/wiki && ./build

Any ideas?

Reply to "Index doesn't exist"