Extension talk:Lucene-search

About this board

Archives
Archive

Previous page history was archived for backup purposes at Extension talk:Lucene-search/LQT Archive 1 on 2015-06-10.

How do you start or enable "Did you mean" feature with Lucene-search?

One comment • 13:21, 16 May 2019 4 years ago

121.244.152.202 (talkcontribs)

Can someone please help me enable this feature?

I use MediaWiki 1.18, with Lucene-search 2.1.3 and MWSearch 1.21.

Reply 06:41, 10 June 2015 8 years ago

Help - port 8123 refusing all connections after Linux update & reboot

12 comments • 14:18, 1 November 2014 9 years ago

Maiden taiwan (talkcontribs)

After a "yum update" on our Linux server, and a reboot, Lucene is no longer listening on port 8123. We have not changed any Lucene config files.

$ telnet localhost 8123
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host: Connection refused

lsearchd is running and is listening on port 8321 for incremental reindexes. Java is running as well. When I start lsearchd manually, it says:

sudo  /usr/local/bin/lucene-run
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/lucene-search-2.1.3/lsearch.conf
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
727  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
730  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321

It definitely does NOT print the usual message about port 8123:

771  [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Searcher started on port 8123

Any tips? Where do I start looking? This is a critical site for our business with thousands of users daily. Thanks.

--Maiden taiwan 03:54, 12 December 2011 (UTC)

Reply 03:54, 12 December 2011 12 years ago
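
For anyone comparing startup output like the above, here is a minimal sketch that pulls the "started on port" lines out of lsearchd console output, so a missing searcher is easy to spot. The function name `started_services` is made up for this example, and it assumes only the log format quoted in this thread.

```shell
# Hypothetical helper: read lsearchd console output on stdin and print
# "<service> <port>" for every "... started on port N" line.
started_services() {
    awk '/ - (Searcher|Indexer) started on port [0-9]+/ { print $(NF - 4), $NF }'
}

# Example using the two log lines quoted in this thread:
printf '%s\n' \
  '730  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321' \
  '771  [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Searcher started on port 8123' \
  | started_services
```

If only an Indexer line comes back, the daemon did not start a searcher on this host, which matches the symptom described here.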

Maiden taiwan (talkcontribs)

I should mention that the "yum update" was NOT for lucene-search, nor for Java. Just for core CentOS Linux packages. Maiden taiwan 04:00, 12 December 2011 (UTC)

Reply 04:00, 12 December 2011 12 years ago

Maiden taiwan (talkcontribs)

Changing the port number in lsearch.conf does not affect the problem. Maiden taiwan 04:07, 12 December 2011 (UTC)

Reply 04:07, 12 December 2011 12 years ago

Rainman (talkcontribs)

Did your hostname change somehow? If there was a conflict, it would print out an error message. It seems like it doesn't even want to start a searcher because it might think this is not the right host to start it up?

Reply 09:22, 12 December 2011 12 years ago

Maiden taiwan (talkcontribs)

The hostname is still the same. Maiden taiwan 12:43, 12 December 2011 (UTC)

Reply 12:43, 12 December 2011 12 years ago

Rainman (talkcontribs)

Well don't know then. My hunch is that there is something wrong with how the hostname is understood. Have you tried calling java with:

-Djava.rmi.server.hostname=<your hostname, not localhost!>

And then use the same hostname in your configuration files?

Reply 15:25, 12 December 2011 12 years ago

Maiden taiwan (talkcontribs)

Thanks for the tip. lsearchd currently runs this line:

java -Djava.rmi.server.codebase=file://$jardir/LuceneSearch.jar \
-Djava.rmi.server.hostname=$HOSTNAME -jar $jardir/LuceneSearch.jar $*

and $HOSTNAME = the correct value: I ran "ps uax" and saw it. Maiden taiwan 16:01, 12 December 2011 (UTC)

Reply 16:01, 12 December 2011 12 years ago

Maiden taiwan (talkcontribs)

I ran an "strace -v" on lsearchd, and its calls to uname({sysname="Linux", nodename="mysystem", ...) return success (zero), so I believe this shows it's looking up the right hostname.

Reply 16:52, 12 December 2011 12 years ago

Rainman (talkcontribs)

Well, don't know then, sorry.

Reply 19:49, 12 December 2011 12 years ago

Maiden taiwan (talkcontribs)

OK, I downloaded the lucene-search Java source, added some debug output, and recompiled it. Here is more data.

On our system, MediaWiki runs on one server (wikihost) and Lucene on another (km105).
The problem is that GlobalConfiguration.isSearcher() is returning false.
In GlobalConfiguration, the hostAddr and hostName variables are set correctly to 127.0.0.1 and the true hostname (km105). However, in the isSearcher function body, both search.get(hostAddr) and search.get(hostName) are null.
On a different working Lucene system in my company, search.get(hostName) is non-null.
I see only one line where the "search" hashtable gets set, in function processSearchRoles (search.put(host,hostroles)). I added some debugging, and this is getting called only for the MediaWiki host (wikihost) and not for the search host (km105).

Here is the debug output of lsearchd:

Trying config file at path /home/danb/.lsearch.conf
Trying config file at path /home/danb/src/lsearch.conf
setHost hostAddr = 127.0.0.1
setHost hostName = km105
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
755  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
isIndexer hostAddr = 127.0.0.1
isIndexer hostName = km105
isIndexer index.get hostAddr = null
isIndexer index.get hostName = [*]
isSearcher hostAddr = 127.0.0.1
isSearcher hostName = km105
isSearcher search.get hostAddr = null
isSearcher search.get hostName = null    (NOTE: This seems to be the problem.)
758  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321

Here is lsearch-global.conf, which is unchanged since before the problem started:

################################################
# Global search cluster layout configuration
################################################

[Database]
wikidb : (single) (spell,4,2) (language,en)

[Search-Group]
wikihost : *

[Index]
km105 : *

[Index-Path]
<default> : /search

[OAI]
<default> : http://wikihost/w/index.php

[Namespace-Boost]
<default> : (0,2) (1,0.5)

[Namespace-Prefix]
all : <all>
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15

And here is lsearch.conf:

MWConfig.global=file:///usr/local/lucene-search.2.1.3/lsearch-global.conf
Indexes.path=/usr/local/lucene-search-2.1.3/indexes
Rsync.path=/usr/bin/rsync
...
(the rest of the file is unchanged from the default)

Can you suggest any other debug output I can add to Lucene so it helps find the problem?

Reply 18:04, 13 December 2011 12 years ago

Maiden taiwan (talkcontribs)

I got everything working again. Part of it was my error -- km105 isn't supposed to be a search query service, just an index generator. So the above debug behavior is correct.

Reply 18:59, 13 December 2011 12 years ago

Cboltz (talkcontribs)

I just had the same problem after moving some wikis and lucene to a new server.

The solution: lsearch-global.conf contains the hostname. After changing it to the content of $HOSTNAME, it works again. Interestingly, on the old server I had to use the full hostname (hostname -f output) while the new server needs the short hostname (hostname -s output).

This is "just for the records" in case someone hits the same problem ;-)

Reply 14:18, 1 November 2014 9 years ago
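
Both findings in this thread come down to which host names appear under which section of lsearch-global.conf. The sketch below lists the sections that mention a given host name, so you can compare the short and full hostnames against the file. The helper name `sections_for_host` is hypothetical, and the parsing is a rough approximation of the file layout shown above, not the parser lucene-search itself uses.

```shell
# Hypothetical helper: print every [Section] of an lsearch-global.conf style
# file in which HOST appears as the key of a "host : roles" line.
sections_for_host() {
    conf=$1
    host=$2
    awk -v h="$host" '
        /^[ \t]*#/ { next }                 # skip comment lines
        /^[ \t]*\[[^]]*\][ \t]*$/ {         # a bare [Section] header line
            section = $0
            sub(/^[ \t]*\[/, "", section)
            sub(/\][ \t]*$/, "", section)
            next
        }
        {
            n = index($0, ":")              # split "key : value" at first colon
            if (n == 0) next
            key = substr($0, 1, n - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == h) print section
        }
    ' "$conf"
}

# Usage (path taken from this thread; adjust to your install):
#   for h in "$(hostname -s)" "$(hostname -f)"; do
#       echo "$h: $(sections_for_host /usr/local/lucene-search-2.1.3/lsearch-global.conf "$h")"
#   done
```

With the configuration posted above, km105 would show up only under Index and wikihost only under Search-Group, which is consistent with the searcher not starting on km105.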

configure build problem ... possibly related to PHP BOM prepending

2 comments • 12:44, 15 October 2014 9 years ago

Freddo411 (talkcontribs)

Hello,

I am running into a blocking issue when trying to run ./build. The specific error is copied below. I am using version 2.1.3.

I am also noticing a number of oddities that may be related.

1) the inclusion of BOM markers in config.inc and lsearch-global.inc. Specifically, in config.inc

  -- a BOM between "dbname=" and "wikidb"
  -- a BOM after "wgScriptPath=".  Note, the correct value from LocalSettings.php is 
  -- a BOM between "wgServer=" and "https://newwiki.west.isilon.com"

2) the XML dump from MediaWiki also includes a prepended BOM. I tried this outside of the Lucene context and also consistently got a BOM prepended by PHP.

Any help or clues would be appreciated.

MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/lucene-search-2.1.3/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
68   [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /usr/local/lucene-search-2.1.3/indexes/import/wikidb.links
116  [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
192  [main] FATAL org.wikimedia.lsearch.importer.Importer  - Cannot store link analytics: Content is not allowed in prolog.
java.io.IOException: Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
	at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
	at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
194  [main] ERROR org.wikimedia.lsearch.importer.BuildAll  - Error during rebuild of wikidb : Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
java.io.IOException: Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
	at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
	at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
	at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)

Reply 17:45, 19 August 2013 10 years ago

Extirpate (talkcontribs)

I had the same problem and solved it. The extension "semantic comments" caused it.

Reply 12:44, 15 October 2014 9 years ago
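
If PHP is prepending a BOM, one plausible source is a PHP file (for example in an extension) that was saved with a UTF-8 BOM. This is an assumption, not something confirmed in this thread. A minimal sketch for finding such files follows; `starts_with_bom` is a made-up helper name and the wiki path in the usage comment is a placeholder.

```shell
# Hypothetical helper: succeed if FILE begins with the UTF-8 BOM (bytes EF BB BF).
starts_with_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')" = "efbbbf" ]
}

# Usage (the wiki path is a placeholder):
#   find /path/to/wiki -name '*.php' | while IFS= read -r f; do
#       if starts_with_bom "$f"; then echo "BOM: $f"; fi
#   done
```

This only checks the first three bytes of each file, so it will not find a BOM embedded later in a file.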

cleanup old indexes

One comment • 11:49, 27 November 2013 10 years ago

82.218.136.35 (talkcontribs)

I'm using crontab to generate a new index every day.

30 1 * * * /etc/init.d/lsearch stop && cd /usr/local/search/ls2 && /usr/local/search/ls2/build > /dev/null 2>&1 && /etc/init.d/lsearch start

Every day, new files and folders are generated in the update folder (wiki, wiki.hl, wiki.links, wiki.related, wiki.spell), for example:

/usr/local/search/ls2/indexes/update/wiki.related/20131116013015

Can I safely delete all of them (or only the old folders)? Is there an automated cleanup script that checks for old files and folders? My installation is about 2 years old and I have tons of old files and folders!

regards, josef lahmer (josy1024@gmail.com)

Reply 11:49, 27 November 2013 10 years ago
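
Nobody in this thread confirms which snapshot folders are safe to remove, so the sketch below only lists candidates and deletes nothing. It assumes the snapshots are the 14-digit timestamp folders shown above and prints every one except the newest few. The helper name `old_snapshots` and the default of keeping two are arbitrary choices for this example.

```shell
# Hypothetical helper: list 14-digit timestamp entries in DIR, newest first,
# skipping the newest KEEP entries (default 2). Prints names only; deletes nothing.
old_snapshots() {
    dir=$1
    keep=${2:-2}
    ls -1 "$dir" 2>/dev/null \
        | grep -E '^[0-9]{14}$' \
        | sort -r \
        | tail -n +"$((keep + 1))"
}

# Usage (path taken from this thread):
#   old_snapshots /usr/local/search/ls2/indexes/update/wiki.related 2
```

Review the list before acting on it, and stop the daemon first if you decide to remove anything.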

Not available

One comment • 17:19, 28 October 2013 10 years ago

Juandev (talkcontribs)

It is not available in the downloads. Where can I find it?

Reply 17:19, 28 October 2013 10 years ago

How do I get the lsearchd script to run in Upstart?

One comment • 19:50, 9 October 2013 10 years ago

24.99.91.221 (talkcontribs)

I cannot get it to work and am a sad panda. I am using CentOS 6.4. It runs fine when I run it in a screen, but I'd like to run it on boot for obvious reasons.

Reply 19:50, 9 October 2013 10 years ago

Problems getting started

8 comments • 05:23, 5 October 2013 10 years ago

Subfader (talkcontribs)

MediaWiki 1.16.alpha on Ubuntu. Installed java-6-sun-1.6.0.24 and Apache Ant 1.7.0. Installed lucene-search-bin-2.1.3.tar.gz into /my/lucene/lucene-search-2.1.3.

Configs are created:

./configure /path/to/wiki/install/root/

Running

./build

returns

Dumping MyWiki...
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /root/my/lucene/lucene-search-2.1.3/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
83   [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki.links
141  [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
232  [main] FATAL org.wikimedia.lsearch.importer.Importer  - Cannot store link analytics: Premature end of file.
java.io.IOException: Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
235  [main] ERROR org.wikimedia.lsearch.importer.BuildAll  - Error during rebuild of MyWiki : Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
java.io.IOException: Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
Finished build in 0s

Creating a dump xml with maintenance/dumpBackup.php also fails. The command returns no error at all.

./lsearchd

returns

RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /root/my/lucene/lucene-search-2.1.3/lsearch.conf
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
581  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
583  [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Searcher started on port 8123
583  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321
588  [Thread-5] INFO  org.wikimedia.lsearch.search.SearcherCache  - Starting initial deployer for [MyWiki, MyWiki.hl, MyWiki.links, MyWiki.related, MyWiki.spell]

Running the test search returns a 500 server error:

Internal error in SearchEngine: MyWiki is being deployed or is not searched by this host
LSearch daemon on localhost

and adds this to the console log:

2199 [Thread-8] INFO  org.wikimedia.lsearch.frontend.HttpMonitor  - HttpMonitor thread started
2200 [pool-2-thread-1] INFO  org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/MyWiki/Boratto what:search dbname:MyWiki term:Boratto
2234 [pool-2-thread-1] INFO  org.wikimedia.lsearch.analyzers.StopWords  - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 18 ms
java.lang.RuntimeException: MyWiki is being deployed or is not searched by this host
        at org.wikimedia.lsearch.search.SearcherCache.getLocalSearcher(SearcherCache.java:388)
        at org.wikimedia.lsearch.search.WikiSearcher.<init>(WikiSearcher.java:96)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:715)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
        at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
        at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
        at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2251 [pool-2-thread-1] ERROR org.wikimedia.lsearch.search.SearchEngine  - Internal error in SearchEngine trying to make WikiSearcher: MyWiki is being deployed or is not searched by this host
java.lang.RuntimeException: MyWiki is being deployed or is not searched by this host
        at org.wikimedia.lsearch.search.SearcherCache.getLocalSearcher(SearcherCache.java:388)
        at org.wikimedia.lsearch.search.WikiSearcher.<init>(WikiSearcher.java:96)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:715)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
        at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
        at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
        at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2541 [pool-2-thread-2] WARN  org.wikimedia.lsearch.frontend.HttpHandler  - Unknown request /favicon.ico

Where to start troubleshooting? Thanks in advance!

Reply Edited 17:50, 25 August 2012 11 years ago

Rainman (talkcontribs)

Can you check if the .xml file in dump/ is not empty?

Reply 20:07, 28 August 2012 11 years ago

Subfader (talkcontribs)

The xml created by maintenance/dumpBackup.php? Yes it is created but empty.

I have no problems running other maintenance scripts. It returns no error message at all, which drives me crazy.

Reply 06:40, 29 August 2012 11 years ago

Chiefgeek157 (talkcontribs)

Just saw this problem a minute ago on my own server. I had a typo in LocalSettings.php that prevented the dump from working (I personally was missing a trailing ';').

Reply 22:15, 28 August 2012 11 years ago

Subfader (talkcontribs)

Hmm, I can check, but shouldn't the whole wiki break in such a case? My wiki works fine, though.

Reply 06:41, 29 August 2012 11 years ago

Rainman (talkcontribs)

Yes, unfortunately dumpBackup.php returns very few errors. You need to google around to make sure it works first.

Reply 09:01, 29 August 2012 11 years ago

212.24.186.158 (talkcontribs)

When xdebug is turned on, it returns this error message in php_errors.log:

PHP Fatal error: Maximum function nesting level of '100' reached, aborting! in /var/www/html/w/includes/GlobalFunctions.php on line 2326

Try turning off xdebug in PHP. That solved the "Premature end of file" error for me when running build.

Reply 13:28, 9 May 2013 10 years ago

Leucosticte (talkcontribs)

Or put xdebug.max_nesting_level = 200 in your php.ini.

Reply Edited 05:23, 5 October 2013 10 years ago
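
Since dumpBackup.php can fail silently, it may help to check the dump file before running the indexer. The sketch below is a rough heuristic, not part of lucene-search: it reports an empty file (the case behind "Premature end of file" in this thread) and a file whose first byte is not "<" (which would be consistent with stray output before the XML). The helper name `check_dump` and the example file name are made up.

```shell
# Hypothetical helper: print "empty", "junk-before-xml" or "ok" for a dump file.
# Returns non-zero unless the result is "ok".
check_dump() {
    f=$1
    if [ ! -s "$f" ]; then
        echo "empty"              # missing or zero-length file
        return 1
    fi
    if [ "$(head -c 1 "$f")" != "<" ]; then
        echo "junk-before-xml"    # something precedes the first XML tag
        return 1
    fi
    echo "ok"
}

# Usage (file name is a placeholder for the .xml file in dump/):
#   check_dump dump/MyWiki.xml
```

An "ok" result only means the file is non-empty and starts like XML; it does not prove the dump is complete.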

Special use case documentation of Lucene for Mediawiki ?

One comment • 09:59, 6 September 2013 10 years ago

Fractaliste (talkcontribs)

Hi,

Is there documentation of the special syntax I can use in my wiki's search form? I'm interested in the special features that Lucene provides for MediaWiki, such as ":incategory" options and others.

Reply 09:59, 6 September 2013 10 years ago

Daemon not starting

One comment • 20:30, 3 June 2013 10 years ago

2001:558:1414:0:0:5EFE:A0A:753A (talkcontribs)

I am running Ubuntu 10.04, Java JDK 7 and Ant 1.9.1.

./lsearchd stops at this line:

1213 [Thread-5] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable<wikidb.spell>$0 bound

Any help getting the daemon started is appreciated.

Reply 20:30, 3 June 2013 10 years ago

Index doesn't exist

One comment • 10:59, 7 May 2013 10 years ago

195.124.31.219 (talkcontribs)

Hi all,

whenever I start searching, I get:

117919 [pool-2-thread-4] INFO org.wikimedia.lsearch.frontend.HttpHandler - query:/prefix/wiki/Ddcr?namespaces=0&offset=0&limit=10&version=2.1&iwlimit=10&searchall=0 what:prefix dbname:wiki term:Ddcr
java.lang.RuntimeException: Index wiki.prefix doesn't exist

from the search server.

I went all the way down: ./configure /var/www/wiki && ./build

Any ideas?

Reply 10:59, 7 May 2013 10 years ago
