User talk:Rainman

From mediawiki.org
Hey Rainman, just a spontaneous thank you for your work on Lucene. Searching really feels improved now :-) --:Bdk: 11:03, 4 July 2007 (UTC)

How is search a page action?

I guess it isn't, except in the broadest sense that it queries and operates on batches of pages (normally we think of a page action operating on a single page). You're right that that is pretty forced. I take it from your choice of "search" as a value that you feel it is a fundamental implementation type and not just a use case for the extension? Writing search engines is definitely a programming discipline unto itself, and within MediaWiki there are specific hooks one needs to work with.

That being the case, it should remain part of the implementation taxonomy. On Template_talk:Extension we've been trying to sort out the distinction between the two (when there is one - which there isn't always). Your contributions would be most helpful. The point is to make this code set helpful to developers. Egfrank 10:29, 18 September 2007 (UTC)

From the taxonomy in Template_talk:Extension I think the nearest match is special page, since LuceneSearch redefines the default MySQL search on Special:Search. "Page action" is a bit misleading, because it implies that it's a search within a page and not the whole wiki. I guess that search is a core function of any wiki, and is different from other special pages, so there is some argument to single it out, especially if there are other implementations that would fall into this type... --Rainman 20:32, 18 September 2007 (UTC) - taken from my talk page.


Thanks for your thoughts - good documentation is impossible without a partnership between those who write the code and those who like to describe it. With your permission I'd like to move my question and your response to the Template_talk:Extension talk page. In the meantime, I'll respond here.

  • page action implies "within page" - hadn't thought about it that way, but I see your point.
  • There are other extensions related to this core function, see Category:Search extensions.
  • Implementation techniques differ widely. What I've seen so far:
  • Despite being a core function, documentation is poor: we don't even have a Manual:Search overview page or a Manual:Search extensions page to describe how to customize or extend it. On the other hand, because it is core, people probably look for help using that word.
  • Even though the techniques vary widely, an extension writer still needs to make a choice among the techniques. Providing examples and documenting them as a single entity still has merit. Furthermore, back-end, any decent extension needs to consider some common questions: caching, indexing (and possibly intercepting saves to do it), and, most basically, the lovely MediaWiki version dependent multi-table join that is needed to connect a page title to its current text. Front-end, special page isn't unique to search and not all search extensions subclass special page.

So my vote would be: add it to the implementation type taxonomy in honor of it being a core function of a wiki, but don't further subtype it based on details. If we had hundreds of extensions for this core function (as we do for, e.g., parser extensions), maybe we would need subtypes, but this isn't the case at present. Egfrank 09:05, 19 September 2007 (UTC)

PS Is there any page or forum where people interested in improving MediaWiki search tend to congregate?

Just to verify, OK if I cut and paste both halves of our discussion over to Template_talk:Extension? Also, do you want me to leave the bit here as is or replace it with a link? Egfrank 12:30, 19 September 2007 (UTC)

LuceneSearch.jar

Hi Rainman,

Would you happen to have a binary of LuceneSearch.jar that I could use on a MediaWiki installation? The documentation says one can get a binary release but doesn't say where. This would be tremendously helpful.

Thanks! Ben

Hi Ben, there is no official download site for the binaries. E-mail me (at rainmansr -at- gmail) and I'll send you the binary release by mail - it's about 4.6MB. --Rainman 00:10, 14 December 2007 (UTC)

about mwsuggest.js

Hi Rainman! I'm from Korea, so my English is very poor; please forgive me.

I use mwsuggest. It is a very good tool. Thank you, Rainman.

There is one thing I need to know.

mwsuggest shows only 10 results in the search suggestions. My site has many near-duplicate suggestions, so I want to see 100 results.

What should I do?

Should I edit mwsuggest.js, or some other file?

I would like to know.

Have a good day, Rainman. ^^ (My MediaWiki version is 1.13.0 rc1.)

  • from Korea: Thank you very much, Rainman. Your efforts will bring progress to the world ^^.

Oh, and I was just watching a Korean movie :) Anyways, here is how you would do it:

Edit your LocalSettings.php and add this line to the end:

$wgMWSuggestTemplate="http://www.yoursite.com/api.php?action=opensearch&search={searchTerms}&namespace={namespaces}&limit=100";

Of course, change www.yoursite.com/api.php to point to api.php on your site. You will find api.php in the same directory as index.php. The crucial part is the limit=100 parameter at the end; you can tune that to get more or fewer results. --Rainman 18:00, 17 August 2008 (UTC)
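To preview the request that such a template produces, the placeholders can be filled in by hand. This is only a rough sketch: fill_template is a hypothetical helper and not part of MediaWiki, www.yoursite.com stands in for your own host as above, and the search term and namespace values are made-up examples.

```shell
#!/bin/sh
# Hypothetical helper (not part of MediaWiki): substitute the {searchTerms}
# and {namespaces} placeholders of an OpenSearch URL template.
# Assumption: the values are already URL-encoded and contain no '|' or '&',
# because both characters are special in the sed expressions below.
fill_template() {
    # $1 = template, $2 = search term, $3 = namespace number
    printf '%s\n' "$1" | sed -e "s|{searchTerms}|$2|" -e "s|{namespaces}|$3|"
}

template='http://www.yoursite.com/api.php?action=opensearch&search={searchTerms}&namespace={namespaces}&limit=100'

# Print the URL for the example term "Seoul" in the main namespace (0).
fill_template "$template" 'Seoul' '0'
```

Opening the printed URL in a browser (with your real host name) should show whether the API honors the larger limit before you change LocalSettings.php.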

commandline option -configfile

Hi Rainman,

just a quick idea.
Would it be much work to unify the commandline options so that every program started on the commandline can deal with -configfile? I made quick-and-dirty changes to the following classes with main methods for our installation. It works, but I don't know if there are any side effects.

  • BuildAll.java
  • IncrementelUpdater.java
  • Snapshot.java
  • SuggestedBuilder.java
  • RelatedBuilder.java

If you are interested in my changes, just send me an e-mail.

Thanks for the great work on Lucene. Kind regards, Peter --Voglerp 08:27, 23 October 2008 (UTC)

Sane default for log4j / CentOS init script

Hi Rainman,

I must say that I almost went crazy trying to get Lucene to run smoothly. I especially had problems with daemonizing Lucene, as I am not skilled in Java, and that log4j stuff took quite some time to understand. So if you could put a log4j example into your distribution that logs to a file instead of the console, I'd be glad. Anyway, here are the modifications I made to have lsearchd started at boot time on a CentOS5 machine (though I am sure this isn't the best way):

lsearchd

#!/bin/sh
# Start the Lucene search daemon in the background and record its PID.
pidfile=/var/run/lsearchd.pid
jardir=/var/lucene # put your jar dir here!
java -Djava.rmi.server.codebase=file://$jardir/LuceneSearch.jar -Djava.rmi.server.hostname=$HOSTNAME -jar $jardir/LuceneSearch.jar "$@" >/dev/null 2>&1 &
# $! is the PID of the background java process started above.
echo $! > "$pidfile"

/etc/init.d/lsearchd

#!/bin/sh
#
# chkconfig: 2345 80 20
# description: Apache Lucene is a high-performance, full-featured text \
#              search engine library written entirely in Java
# processname: lsearchd
# config: /etc/lsearch.conf
# pidfile: /var/run/lsearchd.pid

# Source function library.
. /etc/rc.d/init.d/functions

ant=/usr/bin/ant
java=/usr/bin/java
prog=lsearchd
basedir=/var/lucene
pidfile=${PIDFILE-/var/run/lsearchd.pid}
lockfile=${LOCKFILE-/var/lock/subsys/lsearchd}

start() {
    echo -n $"Starting $prog: "
    daemon --pidfile ${pidfile} ${basedir}/lsearchd
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch ${lockfile}

    return $RETVAL
}
stop() {
    echo -n $"Stopping $prog: "
    killproc -p ${pidfile}
    RETVAL=$?
    echo
    [ ${RETVAL} = 0 ] && rm -f ${lockfile} ${pidfile}
}
reload() {
    echo -n $"Reloading $prog: "
    killproc -p ${pidfile} -HUP
    RETVAL=$?
    echo
}
usage() {
    echo $"Usage: ${prog} {start|stop|restart|reload|status|help}"
    exit 1
}

# See how we were called.
case "$1" in
    start)      start;;
    stop)       stop;;
    status)     status -p ${pidfile} ${prog} ; RETVAL=$?;;
    restart)    stop && start;;
    reload)     reload;;
    *)          usage;;
esac

# Propagate the result of the requested action to the caller.
exit $RETVAL

mwsearch.log4j

# Set root logger level to ERROR and its only appender to A1.
log4j.rootLogger=ERROR, A1

# A1 is set to be a RollingFileAppender that writes to a log file.
log4j.appender.A1=org.apache.log4j.RollingFileAppender
log4j.appender.A1.File=/var/log/lsearchd.log

# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Regards, Frank.

Mono search?

What happened to the Mono search? This is Zac from the Mono team. I was curious why you guys went back to Java? --24.27.70.200 07:56, 4 December 2008 (UTC) (Wikipedia user ZacBowling)

Not working on af.wiki

Hi, Rainman

According to the Afrikaans Wikipedia's Special:Version, we do have the MWSearch extension installed. However, no suggestions are made when searching with a misspelling (for example, suggesting "Vietnam" when searching for "Wietnam"). Do we need to install anything extra for this function, or is there another reason why it isn't showing? Anrie 11:03, 26 February 2009 (UTC)

For the time being only some of the wikis have all the features enabled... We are getting some more hardware to enable it on other wikis; however, the tech people in charge of getting hardware are currently busy with other stuff, so it might take a while before it gets enabled.. :( --Rainman 19:25, 26 February 2009 (UTC)
That's too bad. Hope they'll get around to it before too long. Thanks for the explanation, though. Anrie 21:04, 26 February 2009 (UTC)

Question in Serbian

Do you have SVN access? I mean, can you add Serbian translations of the edit summaries made by the standard pywiki bots? For example, when you use fixing_redirects.py, the edit summary is "Bot: Fixing redirects". Can you translate that (it should be Бот:Поправка преусмерења)? I'm asking you this because we will probably need it in the future. If you like, reply at sr:User talk:Bokim. Regards. --94.250.45.75 19:21, 24 August 2009 (UTC)

Extension:MWSearch only works in MW 1.13+?

Hi,

http://www.mediawiki.org/w/index.php?title=Extension:MWSearch&diff=next&oldid=175730

How did you verify that Extension:MWSearch only works in MW 1.13+? I've tested it in MW 1.11.1, and it works.

--Ans 10:22, 14 September 2009 (UTC)

OK, it seems it's just that:
  • the score and timestamp are not rendered in the MW 1.11.1 search UI
  • the highlight information from the lucene-search-2.1 daemon is not used in MW 1.11.1
--Ans 13:02, 14 September 2009 (UTC)
In 1.13 we had a rework of some of the search internals, so it might work in older versions but incompletely, as you note. --Rainman 15:26, 14 September 2009 (UTC)

Search decreased to 50 results

Hi, that is a severe limitation for Commons. If you categorize images there, you normally start a search, look through some thousands of images, and pick the ones that fit the search. Will this restriction be removed any time soon? Thanks for answering; I have bookmarked this page. --Martin H. 12:50, 5 November 2009 (UTC)

It is going to get lifted when we get new servers.. Mark told me that is going to happen "soon", but it is really out of my hands now... --Rainman 13:07, 5 November 2009 (UTC)
OK, thanks for the information. I'll keep my eyes open, e.g. on the techblog. --Martin H. 13:45, 5 November 2009 (UTC)

Questions about some bugs assigned to you

Hi Rainman. Some time ago I filed some bugs in Bugzilla. Both still have the status NEW, with no indication of whether anything will happen at any point.

bugzilla:17475
This one is more than a year old. You made some statements, but none about how to proceed. Additionally, the question of what OAI is still remains unanswered.
This is a major issue and should be dealt with; however, it is currently not high in the queue for the FlaggedRevs people.. I'll push for this to be resolved because we now have 2 paid developers working on FlaggedRevs alone. A simple search is going to tell you what OAI is. I think the fix is not too difficult; it is just that FlaggedRevs development is an almost separate entity and at the moment seems not to progress as fast... --Rainman 13:00, 25 March 2010 (UTC)
bugzilla:22272
This one has received no answer at all. I filed it because of a lengthy discussion on de-wp, and I can assure you that many people are interested in a solution.
There are some assorted problems with how diacritics etc. are handled, but those are not loss-of-function bugs, just minor annoyances, so I don't see it being resolved anytime soon. --Rainman 13:00, 25 March 2010 (UTC)

Please don't get me wrong: I know you are a volunteer developer and have other things to do as well. So do I, but it is extremely frustrating when there is no feedback at all. Even if you decide to WONTFIX these issues, I think I could cope with it. But no answer is the worst answer. Regards + have a nice day, --Gnu1742 12:05, 25 March 2010 (UTC)

Unfortunately I don't have time to do any large changes; I only do server maintenance and keep everything working and online as it currently is. I think both bugs are worth fixing, but currently there is no one working to resolve these kinds of search bugs, and this is why there has been no progress. As far as I know, the Wikimedia Foundation is not planning to employ anyone to do it, and unless someone steps forward with some free time, it is not going to be dealt with soon. --Rainman 13:00, 25 March 2010 (UTC)

Lucene

Lucene is fantastic - thank you! I love the various options like fuzzy search and the results are always very good. --Robinson Weijman 12:49, 30 July 2010 (UTC)

Turn on stemming for Russian in LuceneSearch

Hi! Thanks for your extension! I see that on the Russian Wikipedia, searching for the different forms of a word works for Russian. How can I turn this on for a standalone wiki? Katkov Yury (talk) 22:02, 30 January 2014 (UTC)

--

Still no good. After I set the option wikivote: (single) (language,ru) and ran ./build, Lucene started understanding Russian searches and taking stopwords into account. Nevertheless, it can't search for word forms. For example, when I search for the word "банк" (bank), I expect Lucene to also find "банков", "банки", "банке", etc.

I can see that these word forms are present in the files LuceneSearch.jar/uzip://org/apache/lucene/analysis/ru/stemsUnicode.txt and words.Unicode.txt

Still, when I search for "банк", I only get "банк" and the following log:

18409 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  - Using FilterWrapper wrap: {} []
18414 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  - search wikivote: query=[банк] parsed=[custom(+contents:банк^0.2 relevance ([((P contents:"банк") (P sections:"банк"^0.25))^2.0], (P alttitle:"банк"~20^2.5) (P related:"банк"^12.0)) (P alttitle:"банк"~20))] hit=[0] in 7ms using IndexSearcherMul:1391088160991
18439 [pool-2-thread-1] INFO  org.wikimedia.lsearch.spell.Suggest  - wikivote for original=[банк] suggest: [банк] using=[] in 18 ms
24262 [pool-2-thread-2] INFO  org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/wikivote/%D0%B1%D0%B0%D0%BD%D0%BA?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15%2C90%2C91%2C92%2C93%2C102%2C103%2C106%2C107%2C108%2C109%2C170%2C171&offset=0&limit=20&version=2.1&iwlimit=10&searchall=1 what:search dbname:wikivote term:банк
24263 [pool-2-thread-2] INFO  org.wikimedia.lsearch.search.SearchEngine  - Using FilterWrapper wrap: {} []

Any thoughts on why that is, or how to debug it?

There is also a second issue that may be related to this one: during index building, lsearch can't load the messages for Russian, because the filenames of the PHP files are different: in MediaWiki 1.22 we use MessagesRu.php instead of Messages_ru.php. Katkov Yury (talk) 12:49, 1 February 2014 (UTC)
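
One way to confirm the file-naming mismatch described above is to check which of the two names actually exists in a given directory. The following is only a sketch under assumptions: find_messages_file is a hypothetical helper, not part of lsearch or MediaWiki, and the directory to pass in depends on your installation.

```shell
#!/bin/sh
# Hypothetical helper: report which naming scheme a directory uses for the
# messages file of a language code, e.g. "ru" -> Messages_ru.php or MessagesRu.php.
# Prints the name that exists; returns 1 if neither file is found.
find_messages_file() {
    dir=$1
    lang=$2
    # Capitalize the first letter of the language code: ru -> Ru
    first=$(printf '%s' "$lang" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$lang" | cut -c2-)
    for name in "Messages_${lang}.php" "Messages${first}${rest}.php"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Example call (the path is an assumption; adjust it to your wiki):
#   find_messages_file /path/to/mediawiki/languages/messages ru
```

If only the MessagesRu.php form is reported, that matches the mismatch described above; whether it also explains the missing stemming is a separate question.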