Extension talk:Lucene-search


Contents

Thread title | Replies | Last modified
cleanup old indexes | 0 | 11:49, 27 November 2013
Not available | 0 | 17:19, 28 October 2013
How do I get the lsearchd script to run in Upstart? | 0 | 19:50, 9 October 2013
Problems getting started | 7 | 05:23, 5 October 2013
Special use case documentation of Lucene for MediaWiki? | 0 | 09:59, 6 September 2013
configure build problem ... possibly related to PHP BOM prepending | 0 | 17:45, 19 August 2013
Daemon not starting | 0 | 20:30, 3 June 2013
Index doesn't exist | 0 | 10:59, 7 May 2013
Two databases | 0 | 10:11, 5 May 2013
problem with ./configure | 1 | 06:24, 25 April 2013
Warnings in Lucene log | 0 | 04:10, 5 April 2013
Index step slows to an unusable speed on full Wikipedia dump | 1 | 16:59, 11 March 2013
Support for multiple page sections in results | 0 | 00:24, 19 February 2013
Init.d script for Ubuntu | 1 | 19:12, 12 February 2013
Accentless searching and hiding of redirects in suggestions | 2 | 03:41, 27 December 2012
Different results on Wikipedia mirror than live Wikipedia | 0 | 13:50, 11 December 2012
Bug when doing ./build | 2 | 01:16, 26 October 2012
Missing exact match results | 1 | 12:23, 16 October 2012
Accentless search not working | 0 | 14:26, 24 September 2012
How can I tell if search suggestions are working? | 4 | 15:38, 29 August 2012

cleanup old indexes

I'm using crontab to generate a new index every day.

30 1 * * * /etc/init.d/lsearch stop && cd /usr/local/search/ls2 && /usr/local/search/ls2/build > /dev/null 2>&1 && /etc/init.d/lsearch start

Every day, new files and folders are generated in the update folder (wiki, wiki.hl, wiki.links, wiki.related, wiki.spell), for example /usr/local/search/ls2/indexes/update/wiki.related/20131116013015.

Can I safely delete all of them (or only the old folders)? Is there an automated cleanup script that checks for the old files and folders? My installation is about two years old and I have tons of old files and folders!
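
For what it's worth, here is a minimal cleanup sketch, not an official script: it assumes the timestamped snapshot directories under indexes/update/ can be pruned once a newer one exists, and it only prints what it would delete until you swap the echo for rm.

 #!/bin/bash
 # Hypothetical cleanup: keep only the newest timestamped snapshot per index
 # under indexes/update/ and list the rest for deletion. Test before trusting it.
 UPDATE_DIR=/usr/local/search/ls2/indexes/update
 for idx in "$UPDATE_DIR"/*/; do
     # snapshots sorted oldest first; drop the newest (last) entry
     ls -1dtr "$idx"*/ 2>/dev/null | head -n -1 | while read -r old; do
         echo "would remove: $old"    # replace echo with: rm -rf "$old"
     done
 done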

regards, josef lahmer (josy1024@gmail.com)

82.218.136.35 11:49, 27 November 2013

Not available

It is not available for download; where can I find it?

Juandev (talk) 17:19, 28 October 2013

How do I get the lsearchd script to run in Upstart?

I cannot get it to work and am a sad panda. I am using CentOS 6.4. It runs fine when I run it in a screen, but I'd like to run it on boot for obvious reasons.
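
In case it helps the next person: a minimal Upstart job sketch, untested on CentOS 6.4; the /etc/init/lsearchd.conf name and the install directory are assumptions, so adjust them to your layout.

 # /etc/init/lsearchd.conf (hypothetical)
 description "lucene-search daemon"
 start on runlevel [2345]
 stop on runlevel [016]
 respawn
 script
     cd /usr/local/search/ls2
     exec ./lsearchd
 end script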

24.99.91.221 19:50, 9 October 2013

Problems getting started

MediaWiki 1.16.alpha on Ubuntu. Installed java-6-sun-1.6.0.24 and Apache Ant 1.7.0. Installed lucene-search-bin-2.1.3.tar.gz into /my/lucene/lucene-search-2.1.3.

Configs are created:

./configure /path/to/wiki/install/root/

Running

./build

returns

Dumping MyWiki...
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /root/my/lucene/lucene-search-2.1.3/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
83   [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki.links
141  [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
232  [main] FATAL org.wikimedia.lsearch.importer.Importer  - Cannot store link analytics: Premature end of file.
java.io.IOException: Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
235  [main] ERROR org.wikimedia.lsearch.importer.BuildAll  - Error during rebuild of MyWiki : Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
java.io.IOException: Trying to hardlink nonexisting file /root/my/lucene/lucene-search-2.1.3/indexes/import/MyWiki
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
Finished build in 0s

Creating a dump xml with maintenance/dumpBackup.php also fails. The command returns no error at all.

./lsearchd

returns

RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /root/my/lucene/lucene-search-2.1.3/lsearch.conf
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
581  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
583  [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Searcher started on port 8123
583  [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321
588  [Thread-5] INFO  org.wikimedia.lsearch.search.SearcherCache  - Starting initial deployer for [MyWiki, MyWiki.hl, MyWiki.links, MyWiki.related, MyWiki.spell]

Running the test search returns a 500 server error:

Internal error in SearchEngine: MyWiki is being deployed or is not searched by this host
LSearch daemon on localhost

and adds this to the console log:

2199 [Thread-8] INFO  org.wikimedia.lsearch.frontend.HttpMonitor  - HttpMonitor thread started
2200 [pool-2-thread-1] INFO  org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/MyWiki/Boratto what:search dbname:MyWiki term:Boratto
2234 [pool-2-thread-1] INFO  org.wikimedia.lsearch.analyzers.StopWords  - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 18 ms
java.lang.RuntimeException: MyWiki is being deployed or is not searched by this host
        at org.wikimedia.lsearch.search.SearcherCache.getLocalSearcher(SearcherCache.java:388)
        at org.wikimedia.lsearch.search.WikiSearcher.<init>(WikiSearcher.java:96)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:715)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
        at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
        at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
        at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2251 [pool-2-thread-1] ERROR org.wikimedia.lsearch.search.SearchEngine  - Internal error in SearchEngine trying to make WikiSearcher: MyWiki is being deployed or is not searched by this host
java.lang.RuntimeException: MyWiki is being deployed or is not searched by this host
        at org.wikimedia.lsearch.search.SearcherCache.getLocalSearcher(SearcherCache.java:388)
        at org.wikimedia.lsearch.search.WikiSearcher.<init>(WikiSearcher.java:96)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:715)
        at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
        at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
        at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
        at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2541 [pool-2-thread-2] WARN  org.wikimedia.lsearch.frontend.HttpHandler  - Unknown request /favicon.ico

Where to start troubleshooting? Thanks in advance!

Subfader (talk) 17:44, 25 August 2012

Can you check if the .xml file in dump/ is not empty?

Rainman (talk) 20:07, 28 August 2012

The xml created by maintenance/dumpBackup.php? Yes it is created but empty.

I have no problems running other maintenance scripts. It returns no error message at all, which drives me crazy.

Subfader (talk) 06:40, 29 August 2012
 

Just saw this problem a minute ago on my own server. I had a typo in LocalSettings.php that prevented the dump from working (I personally was missing a trailing ';').

Chiefgeek157 (talk) 22:15, 28 August 2012

Hmm, I can check, but shouldn't the whole wiki break in such a case? My wiki works fine though.

Subfader (talk) 06:41, 29 August 2012

Yes, unfortunately dumpBackup.php returns very few errors. You need to google around to make sure it works first.
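
A quick hand check of the dump is usually enough (a sketch; the output path is arbitrary):

 cd /path/to/wiki
 php maintenance/dumpBackup.php --current --quiet > /tmp/test-dump.xml
 ls -lh /tmp/test-dump.xml      # should be clearly non-empty
 head -n 3 /tmp/test-dump.xml   # should start with a <mediawiki ...> root element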

Rainman (talk) 09:01, 29 August 2012

When xdebug is turned on, it logs this error in php_errors.log: "PHP Fatal error: Maximum function nesting level of '100' reached, aborting! in /var/www/html/w/includes/GlobalFunctions.php on line 2326". Try turning off xdebug in PHP; it helped me solve the "premature end of file" error when running build.

212.24.186.158 13:28, 9 May 2013

Or put xdebug.max_nesting_level = 200 in your php.ini.

Leucosticte (talk) 05:21, 5 October 2013
 
 
 
 
 

Special use case documentation of Lucene for MediaWiki?

Hi,

Is there documentation of the special syntax I can use in my wiki's search form? I'm interested in the special features provided by Lucene for MediaWiki, like the "incategory:" option and others...

Fractaliste (talk) 09:59, 6 September 2013

configure build problem ... possibly related to PHP BOM prepending

Hello,

I am running into a blocking issue when trying to run ./build; the specific error is copied below. I am using version 2.1.3.

I am also noticing a number of oddities that may be related.

1) The inclusion of BOM markers in config.inc and lsearch-global.inc. Specifically, in config.inc:

  -- a BOM between "dbname=" and "wikidb"
  -- a BOM after "wgScriptPath=".  Note, the correct value from LocalSettings.php is 
  -- a BOM between "wgServer=" and "https://newwiki.west.isilon.com"

2) The XML dump from MediaWiki also includes a prepended BOM. I tried this outside of the Lucene context and consistently got a BOM prepended by PHP.

Any help or clues would be appreciated.


MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/lucene-search-2.1.3/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
68   [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /usr/local/lucene-search-2.1.3/indexes/import/wikidb.links
116  [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
192  [main] FATAL org.wikimedia.lsearch.importer.Importer  - Cannot store link analytics: Content is not allowed in prolog.
java.io.IOException: Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
194  [main] ERROR org.wikimedia.lsearch.importer.BuildAll  - Error during rebuild of wikidb : Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
java.io.IOException: Trying to hardlink nonexisting file /usr/local/lucene-search-2.1.3/indexes/import/wikidb
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
        at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
        at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
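
For later readers: "Content is not allowed in prolog" is the XML parser rejecting bytes before the root element, which fits the BOM theory. A hedged sketch for finding and stripping a UTF-8 BOM (GNU grep/sed assumed, paths are only examples; back files up first):

 # list files that contain a line starting with the UTF-8 BOM bytes EF BB BF
 grep -rlP '^\xEF\xBB\xBF' /var/www/wiki --include='*.php'
 # strip a BOM from the first line of a file, in place
 sed -i '1s/^\xEF\xBB\xBF//' /var/www/wiki/LocalSettings.php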

Freddo411 (talk) 17:45, 19 August 2013

Daemon not starting

Running Ubuntu 10.04, Java JDK 7, Ant 1.9.1.

./lsearchd stops at this line

1213 [Thread-5] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable<wikidb.spell>$0 bound

Any help getting the daemon started is appreciated.

2001:558:1414:0:0:5EFE:A0A:753A 20:30, 3 June 2013

Index doesn't exist

Hi all,

whenever I start searching, I get:

117919 [pool-2-thread-4] INFO org.wikimedia.lsearch.frontend.HttpHandler - query:/prefix/wiki/Ddcr?namespaces=0&offset=0&limit=10&version=2.1&iwlimit=10&searchall=0 what:prefix dbname:wiki term:Ddcr java.lang.RuntimeException: Index wiki.prefix doesn't exist

from the search server.

I went through the whole procedure: ./configure /var/www/wiki && ./build

Any ideas?
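
A guess based on the other threads on this page: the wiki.prefix index only gets built if the database is flagged with (prefix) in lsearch-global.conf, so something like the line below (the other flags are assumptions for your setup), followed by re-running ./build, may be what is missing:

 [Database]
 wiki : (single) (prefix) (spell,4,2) (language,en)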

195.124.31.219 10:59, 7 May 2013

Two databases

I have:

one server;

two wikis on it:

/var/www/vhost/enwik

/var/www/vhost/ruwik


and two different databases on this server:

wmos_ru

wmos_en


-- plus a shared database for users and images...

How do I configure Lucene so that it works correctly with the two databases?
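
For what it's worth, lsearch-global.conf takes one line per database in its [Database] section, so a starting point might look like the sketch below (the flags and the ru language code are assumptions). Each wiki then searches the index named after its own database.

 [Database]
 wmos_en : (single) (spell,4,2) (language,en)
 wmos_ru : (single) (spell,4,2) (language,ru)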

Serdts (talk) 17:06, 4 May 2013

problem with ./configure

When I run ./configure /var/www/mediawiki/ I get this: bash: ./configure: /bin/bash^M: bad interpreter: No such file or directory.

I tried running dos2unix, but that doesn't solve the problem.

Any tips would be great; thanks in advance, guys.

Using CentOS 6 (RHEL), lucene-search 2.1, MediaWiki 1.20.4.

Nick1092 (talk) 04:00, 25 April 2013

Never mind, I fixed the problem; apparently it was the path inside the configure script. It's OK now after I changed it.

Nick1092 (talk) 06:24, 25 April 2013
 

Warnings in Lucene log

Edited by another user.
Last edit: 04:10, 5 April 2013

Today I had an alert come across indicating my /var/log was becoming full. I tracked this down to a Lucene log file that was growing rather large. Reading through the log, I see this WARN repeated over and over:

13373876902 [Thread-8] WARN  org.wikimedia.lsearch.frontend.HttpMonitor  - Thread[Thread-203870,5,main] is waiting for 7195886678 ms on *
13373876902 [Thread-8] WARN  org.wikimedia.lsearch.frontend.HttpMonitor  - Thread[Thread-284796,5,main] is waiting for 4240451969 ms on bad397
13373876902 [Thread-8] WARN  org.wikimedia.lsearch.frontend.HttpMonitor  - Thread[Thread-128942,5,main] is waiting for 9614169952 ms on 10.146.48.5:8321
13373876902 [Thread-8] WARN  org.wikimedia.lsearch.frontend.HttpMonitor  - Thread[Thread-285721,5,main] is waiting for 4240449957 ms on 10.146.48.5
13373876902 [Thread-8] WARN  org.wikimedia.lsearch.frontend.HttpMonitor  - Thread[Thread-285716,5,main] is waiting for 4240449964 ms on 10.146.48.5:40105

What troubleshooting steps can I take? I'm not even sure what is causing the threads to wait.

Thanks.

64.29.222.75 21:54, 4 April 2013

Index step slows to an unusable speed on full Wikipedia dump

Hi,

I am trying to run the 'build' step from the instructions on a full dump of the English Wikipedia site.

I find that it runs at a reasonable rate until what appears to be a spell-correction step. This starts at ~50,000 terms/second, but slows down, and I killed it at ~600 terms/second after about a week, and only about half way through at "mo...".

Are there configuration settings I should be changing to run the build step against such a big corpus?

Thanks, Barry

64.236.163.23 16:25, 8 March 2013

We removed the 'spell correction' step in the indexer, and the time came down to a manageable level.

We would rather have this in, so if anyone has a better solution than stripping out functionality, I would still love to hear it.
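
Possibly relevant for others: judging from the config examples elsewhere on this page, the spellcheck index seems to correspond to the (spell,...) flag on the database line in lsearch-global.conf, so dropping that flag might skip the step without patching the indexer. This is an assumption we have not verified; the database name below is just the example used elsewhere on this page.

 [Database]
 wikidb : (single) (language,en)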

Barry

64.236.163.23 16:59, 11 March 2013
 

Support for multiple page sections in results

Not sure if this would be part of the Lucene Search itself or the MWSearch extension, but it appears that, when searching for a term that appears in multiple sections of a page, the section links for the page results map to the highest matching section but ignore any other high matching sections on that page.

As an example, I have an internal wiki set up to act as a knowledge base and, when searching for "Macro", the expected software page appears as the top result with a section link to the highest matching section, but there are a few other sections relating to other macro items on that page that are not mentioned or linked at all. I realize one could conceivably lump those sections together under a single "macro" heading in this case, but that seems to be an inherently limited solution that would not necessarily be appropriate in other similar instances.

Is there any way to have the top page result list multiple sections above a certain match threshold, capped at some certain count? Say, up to 3 section results with >90% match probability? Or even repeats of the page match, each with another high matching section link?

Seems like it could be a useful feature. Apologies if this is not the place for this.

Timothy.russ (talk) 00:24, 19 February 2013

Init.d script for Ubuntu

The script referenced in the text (also found in the archives) does not work for Ubuntu. There are two main issues.

  1. There is a typo in the "reload" section (stat-stop-daemon is not a valid command).
  2. The stop functionality does not work - in fact, it actually loads another copy of lsearchd and java.

Has anyone been able to overcome these issues and have a properly functioning script for Ubuntu?

Reference: LSearch Daemon Init Script for Ubuntu
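
For reference, a rough start-stop-daemon based sketch (untested; the paths, pidfile location and retry timeout are assumptions, and since lsearchd is itself a wrapper that launches java, the tracked PID may not be the JVM, which is exactly the stop problem described above):

 #!/bin/sh
 # /etc/init.d/lsearchd -- minimal sketch, not a drop-in replacement
 DIR=/usr/local/search/ls2          # install directory (assumed)
 PIDFILE=/var/run/lsearchd.pid

 case "$1" in
   start)
     start-stop-daemon --start --background --make-pidfile --pidfile "$PIDFILE" \
       --chdir "$DIR" --exec "$DIR/lsearchd"
     ;;
   stop)
     start-stop-daemon --stop --pidfile "$PIDFILE" --retry 10
     ;;
   restart)
     "$0" stop
     "$0" start
     ;;
   *)
     echo "Usage: $0 {start|stop|restart}" >&2
     exit 1
     ;;
 esac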

Jlemley 22:37, 26 January 2012

The information provided up to now has been incomplete. I've updated the documentation and forum post with a corrected init.d script, more complete instructions for configuring it, and the necessary commands to actually add it to startup. This last step would normally be done by an installer when it places an init.d script, but it is easy to overlook when implementing one from scratch.

Been working perfectly on my install for a while now.

Timothy.russ (talk) 19:12, 12 February 2013
 

Accentless searching and hiding of redirects in suggestions

Hello,

I hope this is the right place to ask this; I'm not actually sure which specific extension controls this functionality, so I'm sorry if I've posted this in error.

Anyway, I'm having trouble with the search suggestions provided through the search bar (both in the top right and on the search page). I am trying to get it set up to work like Wikipedia's, and although it's almost there, there are two issues that I can't resolve:

1. Accentless searching does not work in the suggestion box. On Wikipedia, for example, I can type in 'lubeck' and it will display the result 'Lübeck'; on my own wiki, as soon as I hit the 'u' in 'lubeck', the Lübeck result will disappear (unless I have a redirect from 'Lubeck', which leads to my second problem...).

2. Redirects are always shown in the suggestions box. On Wikipedia, if I type in 'united states of mex' (all lower-case), the only result that is shown is 'United States of Mexico' (mixed case). This is how I want mine to work, but instead I get all of the redirects that match that text, like 'United states of mexico' (sentence case) and 'United States of Mexicans' and so on.

Assuming Lucene is what controls this (and please let me know if it's not; I have MWSearch and TitleKey installed as well, so I suppose one of those could come into it), what changes might I need to make to get this working properly?

Configuration info: MW 1.18, lucene-search 2.1.3, MWSearch MW1.18-r90287, TitleKey MW1.18-r81220, Debian 6

Thank you!

75.173.170.65 02:35, 21 December 2011

Yes, on WMF wikis this is controlled by lucene-search. Basically, what you need to do is to add something like this into your global settings:

 [Database]
 yourwiki: (prefix)

Then you can re-run the build script to build the prefix index as well. Next, you need to tell MediaWiki to use Lucene as the backend for prefix matches. This is done by adding the following to your LocalSettings.php:

 # default host for mwsuggest backend
 $wgEnableLucenePrefixSearch = true;
 $wgLucenePrefixHost = '10.0.3.18'; # IP or hostname of your lucene box

For more info on WMF settings: http://noc.wikimedia.org/conf/

Rainman 00:24, 23 December 2011

Is "$wgEnableMWSuggest = true" required too? Thucproa1 (talk) 03:41, 27 December 2012 (UTC)

Thucproa1 (talk) 03:41, 27 December 2012
 
 

Different results on Wikipedia mirror than live Wikipedia

We've set up an English Wikipedia mirror featuring Lucene search. For some reason, the search results differ for certain queries from the live English Wikipedia. For example, the search for "player" returns John_Player_%26_Sons as hit #1, while live Wikipedia returns "Player" (the disambiguation page).

Interestingly, John_Player_%26_Sons is the #1 hit also on live Wikipedia using the API http://en.wikipedia.org/w/api.php?action=query&format=xml&list=search&srwhat=text&srlimit=1&srsearch=player However, if the user uses the search box in an article page, he gets the (generally more desirable) "Player" result. If the user uses the search page (http://en.wikipedia.org/w/index.php?search=&button=&title=Special%3ASearch), the #1 result is John_Player_%26_Sons.

The same happens e.g. for query "American".

What is the magic behind this? How do I programmatically get to "Player" as the #1 hit for "player" on my Wikipedia mirror?

Search result snippet:

459301

  1. info search=[ner], highlight=[ner] in 344 ms
  2. no suggestion
  3. interwiki 0 0
  4. results 20

4779.728 0 John_Player_%26_Sons

  1. h.title [] [5,11] [] John+Player+%26+Sons
  2. h.text [] [5,11,36,42] [+] John+Player+%26+Sons%2C+known+simply+as+Player%27s%2C+was+a+tobacco++and+cigarette++manufacturer+based+in+Nottingham+%2C+England+.+
  3. h.text [] [] [] It+is+today+a+
  4. h.redirect [] [0,6] [] Player%27s 0%3APlayer%27s
  5. h.date 2012-08-25T04:29:23Z
  6. h.wordcount 1475
  7. h.size 10036

4088.8413 0 Player

  1. h.title [] [0,6] [] Player
  2. h.text [] [0,6] [+] Player+may+refer+to%3A+
  3. h.text [] [0,6] [] Player+%28dating%29%2C+a+man+or+woman%2C+who+has+romantic+affairs+or+sexual+relations%2C+or+both%2C+with+other+women%2C+or+men+and+
  4. h.date 2012-08-26T00:15:02Z
  5. h.wordcount 301
  6. h.size 2359
46.30.65.92 13:50, 11 December 2012

Bug when doing ./build

When I do a ./build, I get an error message.

1327 [main] INFO  org.wikimedia.lsearch.ranks.Links  - Opening for read /opt/mediawiki/lucene-search-2.1.3/indexes/search/wiki.links
java.io.IOException: no segments* file found in org.apache.lucene.store.FSDirectory@/opt/mediawiki/lucene-search-2.1.3/indexes/search/wiki.links: files:
        at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
        at org.wikimedia.lsearch.spell.SuggestBuilder.main(SuggestBuilder.java:98)
        at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:124)
Caused by: org.xml.sax.SAXException: no segments* file found in org.apache.lucene.store.FSDirectory@/opt/mediawiki/lucene-search-2.1.3/indexes/search/wiki.links: files:
        at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:227)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
        ... 2 more
1346 [main] FATAL org.wikimedia.lsearch.spell.SuggestBuilder  - I/O error reading dump for wiki from /opt/mediawiki/lucene-search-2.1.3/dumps/dump-wiki.xml : no segments* file found in org.apache.lucene.store.FSDirectory@/opt/mediawiki/lucene-search-2.1.3/indexes/search/wiki.links: files:
217.91.108.102 06:37, 26 September 2011

Try deleting the /opt/mediawiki/lucene-search-2.1.3/indexes/search/wiki.links directory before running build.

Rainman 15:25, 27 September 2011
Edited by another user.
Last edit: 01:16, 26 October 2012

Is there any way to prevent this from happening? I have noticed that Rainman's workaround works; however, the problem seems to recur on a regular basis. I suppose I could set up a cron job to delete it periodically, but there must be a better way.

UPDATE: Never mind, guys. I just made a bash script which deletes the links file and then executes the build script (sketched below), and replaced my normal cron job build script with it. Thanks
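
Something along these lines (paths taken from the log above; untested):

 #!/bin/bash
 # remove the stale links index, then rebuild
 rm -rf /opt/mediawiki/lucene-search-2.1.3/indexes/search/wiki.links
 cd /opt/mediawiki/lucene-search-2.1.3 && ./build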

142.179.123.165 01:55, 23 October 2012
 
 

Missing exact match results

I'm using Lucene-search on a local copy of Wikipedia and the search results are missing *some* exact matches. For example, a search for 'Melbourne' does not turn up the city Melbourne (which is the top result on Wikipedia). However, other search terms have multiple exact match results. For example, a search for 'Australia' turns up the page for 'Australia' three times. I built the lucene indexes without any errors (over quite a few days) and the Melbourne page is present in the Mysql database. I'm using lucene-search-2.1.3. Any pointers much appreciated.

220.244.174.242 01:02, 15 October 2012

The pages are stored according to their database ID, so the only thing I can think of is that the XML dump is somehow corrupted in a way that gives those pages the wrong IDs. Can you verify that all the metadata attached to those pages in the XML dump is correct?

Rainman (talk) 12:23, 16 October 2012
 

Accentless search not working

Hi, I have some problems with Lucene Search 2.1 on an up-to-date Debian server with MediaWiki 1.9. I have followed the installation steps from the official extension page.

I have used these commands to rebuild the index:

php /var/www/wikidev/maintenance/dumpBackup.php --current -quiet > wikidb.xml

java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml notwiki_dev

This is my LocalSettings.php file:

 require_once("$IP./extensions/MWSearch/MWSearch.php");
 $wgSearchType = 'LuceneSearch';
 $wgLuceneHost = '127.0.0.1';
 $wgLucenePort = 8123;
 $wgEnableLucenePrefixSearch = true;
 $wgLucenePrefixHost = '127.0.0.1';
 $wgLuceneSearchVersion = 2.1;
 $wgEnableMWSuggest = true;

This is my problem: I need LuceneSearch because it provides accentless search. I have a test page "pépé", but when I search for "pepe" it can't find my page; the same happens searching "bebe" for "bébé".

Also, http://127.0.0.1:8123/search/notwiki_dev/bébé returns a 500 server error.

The daemon is running; using the search bar creates lines in my tty, with no errors in them.

Finally, I'm not even sure that MWSearch is being used...

213.30.149.58 14:26, 24 September 2012

How can I tell if search suggestions are working?

I believe I have correctly configured the Search Suggestions feature in both Lucene-search (2.1) and MediaWiki (1.19). Searches work fine and are hitting the Lucene daemon. However, search suggestions are not displayed while I type in the search box, which I am assuming is what the MWSearch extension provides when coupled with the Search Suggestions variables correctly configured.

How can I confirm if search suggestions are working? What is the expected behavior? How can I tell if I have an incorrect setting?

Chiefgeek157 (talk) 22:13, 28 August 2012

iirc you need $wgEnableAPI = true;

Subfader (talk) 06:35, 29 August 2012

In MW 1.19 at least, $wgEnableAPI is enabled by default, so that is not the problem.

My question is, what does it look like when Search Suggestions are working correctly? Should I see any INFO messages in the lsearchd process output (like I do when actually performing a search)? What should I be seeing in the browser?

I guess another question is, does anyone have this working right now?

Chiefgeek157 (talk) 14:23, 29 August 2012
 

So I used Chrome's developer console on Wikipedia and on my wiki. It is obvious that this function is working on Wikipedia for two reasons:

  1. The search suggestion box appears as I type a search
  2. The Chrome XHR console shows a Javascript request to the Wikipedia server with each keystroke in the search box

On my wiki, I do not see the Javascript interaction with my server on keystrokes, so that tells me that it is likely that the Javascript to do Search Suggestions is not being sent to the browser.

I have the following in my LocalSettings.php:

$wgSearchType = 'LuceneSearch';
$wgLuceneHost = 'mywikiserver.mydomain.com';
$wgLucenePort = 8123;
$wgEnableLucenePrefixSearch = true;
$wgLucenePrefixHost = 'mywikiserver.mydomain.com';
require_once( "$IP/extensions/MWSearch/MWSearch.php" );
$wgLuceneSearchVersion = 2.1;

My lsearch-global.conf file contains this:

[Database]
wikidb : (single) (prefix) (spell,4,2) (language,en)

When I run 'build', I see the prefix index being built. When I run 'lsearchd', I see the prefix index being loaded. I can see the searches working by watching the console output of lsearchd (just running in shell for the time being). I do not see any indication that the Search Suggestions are reaching the server (but I knew that from the Chrome console anyway).
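
For later readers, one way to test each half separately is with curl; the host and database names below are the ones from the config above, and the /w/api.php path is an assumption.

 # query the daemon's prefix endpoint directly (/prefix/<db>/<term>)
 curl 'http://mywikiserver.mydomain.com:8123/prefix/wikidb/Main'
 # check the MediaWiki opensearch API that the suggest box uses
 curl 'http://mywikiserver.mydomain.com/w/api.php?action=opensearch&search=Main'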

Am I missing some configuration parameter that would enable the Javascript for Search Suggestions to be sent to the browser?

Chiefgeek157 (talk) 15:14, 29 August 2012

Found it. I had to add:

$wgEnableMWSuggest = true;
Chiefgeek157 (talk) 15:38, 29 August 2012
 
 