User talk:Rainman
From MediaWiki.org
Hey Rainman, just a spontaneous thank you for your work on Lucene. Searching really feels much improved now :-) --:Bdk: 11:03, 4 July 2007 (UTC)
How is search a page action?
I guess it isn't, except in the broadest sense that it queries and operates on batches of pages (normally we think of a page action as operating on a single page). You're right that that is pretty forced. Am I right in taking your choice of "search" as a value to mean that you feel it is a fundamental implementation type and not just a use case for the extension? Writing search engines is definitely a programming discipline unto itself, and within MediaWiki there are specific hooks one needs to work with.
That being the case, it should remain part of the implementation taxonomy. On Template_talk:Extension we've been trying to sort out the distinction between the two (when there is one - which there isn't always). Your contributions would be most helpful. The point is to make this code set helpful to developers. Egfrank 10:29, 18 September 2007 (UTC)
- From the taxonomy in Template_talk:Extension I think the nearest match is special page, since LuceneSearch replaces the default MySQL search on Special:Search. "Page action" is a bit misleading, because it implies searching within a page rather than the whole wiki. I guess that search is a core function of any wiki and is different from other special pages, so there is some argument for singling it out, especially if there are other implementations that would fall into this type... --Rainman 20:32, 18 September 2007 (UTC) - taken from my talk page.
Thanks for your thoughts - good documentation is impossible without a partnership between those who write the code and those who like to describe it. With your permission I'd like to move my question and your response to the Template_talk:Extension talk page. In the meantime, I'll respond here.
- page action implies "within page" - hadn't thought about it that way, but I see your point.
- There are other extensions related to this core function, see Category:Search extensions.
- Implementation techniques differ widely. What I've seen so far:
- page widgets using third party off-site search engines, e.g. Extension:Google
- page widgets using local search daemons, e.g. Extension:Hyper Estraier
- page widgets using default search engine, e.g. Extension:Inputbox
- special page using local search daemons, e.g. your Extension:LuceneSearch
- special page using custom SQL query, e.g. Extension:RigorousSearch
- subclassing SpecialSearch.php and then replacing the default class via the SpecialPage_initList hook. No examples yet.
- attaching functions to the SpecialSearchNogomatch hook defined in SpecialSearch.php. No examples yet.
- configuring the SearchEngine.php by setting $wgSearchType to a custom class, e.g. Extension:Wildcard search.
- patches to core code (!), e.g. Extension:Multi-select Namespace Search, Extension:GoogleSiteSearch
- Despite search being a core function, its documentation is poor: we don't even have a Manual:Search overview page or a Manual:Search extensions page describing how to customize or extend it. On the other hand, because it is core, people probably look for help using that word.
- Even though the techniques vary widely, an extension writer still needs to choose among them, so providing examples and documenting them as a single entity still has merit. Furthermore, on the back end, any decent extension needs to consider some common questions: caching, indexing (and possibly intercepting saves to do it), and, most basically, the lovely MediaWiki version-dependent multi-table join needed to connect a page title to its current text. On the front end, special pages aren't unique to search, and not all search extensions subclass SpecialPage.
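For readers who haven't met it, the join mentioned above looks roughly like this on the schema MediaWiki has used since 1.5 (`page.page_latest` points to `revision.rev_id`, and `revision.rev_text_id` points to `text.old_id`); older versions kept the current text in the `cur` table instead. This is a sketch under stated assumptions: the helper name `current_text_sql` is hypothetical, no `$wgDBprefix` is assumed, the title must already be in database form (underscores instead of spaces), and `old_flags` may say the text is compressed or in external storage, so `old_text` is not always plain wikitext.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that fetches the current text of one page
# on the MediaWiki 1.5+ schema (page -> revision -> text).
# Assumptions: no table prefix, title already in DB form (Main_Page, not "Main Page").
# The title is NOT escaped, so this is for illustration only, never for user input.
current_text_sql() {
    ns=$1
    title=$2
    cat <<EOF
SELECT t.old_text, t.old_flags
FROM page p
JOIN revision r ON r.rev_id = p.page_latest
JOIN text t ON t.old_id = r.rev_text_id
WHERE p.page_namespace = $ns AND p.page_title = '$title';
EOF
}

current_text_sql 0 Main_Page
```

The output can be piped to the MySQL client, e.g. `current_text_sql 0 Main_Page | mysql wikidb`, where `wikidb` stands for your own database name.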
So my vote would be: add it to the implementation type taxonomy in honor of its being a core function of a wiki, but don't further subtype it based on details. If we had hundreds of extensions for this core function (e.g. parser extensions) maybe we would need subtypes, but this isn't the case at present. Egfrank 09:05, 19 September 2007 (UTC)
PS Is there any page or forum where people interested in improving MediaWiki search tend to congregate?
Just to verify: is it OK if I cut and paste both halves of our discussion over to Template_talk:Extension? Also, do you want me to leave the bit here as is or replace it with a link? Egfrank 12:30, 19 September 2007 (UTC)
LuceneSearch.jar
Hi Rainman,
Would you happen to have a binary of LuceneSearch.jar that I could use on a MediaWiki installation? The documentation says one can get a binary release but doesn't say where. This would be tremendously helpful.
Thanks! Ben
- Hi Ben, there is no official download site for the binaries. E-mail me (at rainmansr -at- gmail) and I'll send you the binary release by mail - it's about 4.6MB. --Rainman 00:10, 14 December 2007 (UTC)
about mwsuggest.js
Hi Rainman! I'm from Korea, so please forgive my poor English.
I use mwsuggest. It is a very good tool. Thank you, Rainman.
There is one thing I'd like to ask.
mwsuggest shows only 10 results in the search suggestions. My site has many similar suggestions, so I want to see 100 results.
What should I do? Edit mwsuggest.js, or some other file?
I'd like to know.
Have a good day, Rainman. ^^ (My MediaWiki version is 1.13.0rc1.)
- from Korea: Thank you very much, Rainman. Your efforts will bring progress to the world ^^.
Oh, and I was just watching a Korean movie :) Anyway, here is how you would do it:
Edit your LocalSettings.php and add this line to the end:
$wgMWSuggestTemplate="http://www.yoursite.com/api.php?action=opensearch&search={searchTerms}&namespace={namespaces}&limit=100";
Of course, change www.yoursite.com/api.php to point to api.php on your site; you will find api.php in the same directory as index.php. The crucial part is the limit=100 parameter at the end; you can tune it to get more or fewer results. --Rainman 18:00, 17 August 2008 (UTC)
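To check the template before reloading the wiki, you can fill in the placeholders by hand and fetch the resulting URL. A minimal sketch, assuming the search term is already URL-encoded; the helper name `expand_suggest_template` and the sample values are hypothetical, while `{searchTerms}` and `{namespaces}` are the placeholders from the template above.

```shell
#!/bin/sh
# Hypothetical helper: substitute {searchTerms} and {namespaces} in a
# $wgMWSuggestTemplate-style URL. The term must already be URL-encoded,
# which also keeps '#', '&' and '\' (special to this sed call) out of it.
expand_suggest_template() {
    template=$1
    terms=$2
    namespaces=$3
    printf '%s\n' "$template" |
        sed -e "s#{searchTerms}#$terms#" -e "s#{namespaces}#$namespaces#"
}

template='http://www.yoursite.com/api.php?action=opensearch&search={searchTerms}&namespace={namespaces}&limit=100'
url=$(expand_suggest_template "$template" "Seoul" "0")
echo "$url"
```

Fetching the printed URL (for example with `curl -s "$url"`) shows how many suggestions api.php actually returns for that `limit`.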
commandline option -configfile
Hi Rainman,
just a quick idea.
Would it be much work to unify the command-line options so that every program accepts -configfile when it is started from the command line? I made quick-and-dirty changes to the following classes with main methods for our installation. It works, but I don't know whether there are any side effects.
- BuildAll.java
- IncrementalUpdater.java
- Snapshot.java
- SuggestBuilder.java
- RelatedBuilder.java
If you are interested in my changes, just send me an e-mail.
Thanks for the great work on Lucene. Kind regards, Peter --Voglerp 08:27, 23 October 2008 (UTC)
Sane default for log4j / CentOS init script
Hi Rainman,
I must say I almost went crazy trying to get Lucene to run smoothly. I had particular trouble daemonizing it, as I am not skilled in Java, and the log4j setup took quite a while to understand. So I'd be glad if you could include a log4j example in your distribution that logs to a file instead of the console. Anyway, here are the modifications I made to have lsearchd started at boot time on a CentOS 5 machine (though I am sure this isn't the best way):
lsearchd
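#!/bin/sh
pidfile=/var/run/lsearchd.pid
jardir=/var/lucene # put your jar dir here!
# Start the daemon in the background; `hostname` is used because
# $HOSTNAME is not set by every /bin/sh.
java -Djava.rmi.server.codebase=file://$jardir/LuceneSearch.jar \
  -Djava.rmi.server.hostname=$(hostname) \
  -jar $jardir/LuceneSearch.jar "$@" >/dev/null 2>&1 &
# Record the PID of the backgrounded java process for the init script.
echo $! > $pidfile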
/etc/init.d/lsearchd
#!/bin/sh
# chkconfig: 2345 80 20
# description: Apache Lucene is a high-performance, full-featured text \
# search engine library written entirely in Java
# processname: lsearchd
# config: /etc/lsearch.conf
# pidfile: /var/run/lsearchd.pid
# Source function library.
. /etc/rc.d/init.d/functions
ant=/usr/bin/ant
java=/usr/bin/java
prog=lsearchd
basedir=/var/lucene
pidfile=${PIDFILE-/var/run/lsearchd.pid}
lockfile=${LOCKFILE-/var/lock/subsys/lsearchd}
start() {
echo -n $"Starting $prog: "
daemon --pidfile ${pidfile} ${basedir}/lsearchd
RETVAL=$?
echo
[ $RETVAL = 0 ] && touch ${lockfile}
return $RETVAL
}
stop() {
echo -n $"Stopping $prog: "
killproc -p ${pidfile} ${prog}
RETVAL=$?
echo
[ ${RETVAL} = 0 ] && rm -f ${lockfile} ${pidfile}
return $RETVAL
}
reload() {
echo -n $"Reloading $prog: "
killproc -p ${pidfile} ${prog} -HUP
RETVAL=$?
echo
}
usage() {
echo $"Usage: ${prog} {start|stop|restart|reload|status|help}"
exit 1
}
# See how we were called.
case "$1" in
start) start;;
stop) stop;;
status) status -p ${pidfile} ${prog} ; RETVAL=$?;;
restart) stop; start;;
reload) reload;;
*) usage;;
esac
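The `status -p` call above relies on Red Hat's /etc/rc.d/init.d/functions library. On a system without it, roughly the same pid-file check can be sketched with `kill -0`, which tests whether a process exists (and may be signalled) without actually sending a signal. The function name `lsearchd_running` is hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for `status -p $pidfile`: succeed (exit 0) if the PID
# recorded in the pid file belongs to a live process we are allowed to signal.
lsearchd_running() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1      # missing or empty pid file
    pid=$(cat "$pidfile")
    kill -0 "$pid" 2>/dev/null         # signal 0: existence/permission check only
}

if lsearchd_running /var/run/lsearchd.pid; then
    echo "lsearchd is running"
else
    echo "lsearchd is stopped"
fi
```

Note that for a non-root user `kill -0` also fails on processes owned by someone else, so run the check as the same user as the daemon (or as root).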
mwsearch.log4j
# Set root logger level to ERROR and its only appender to A1.
log4j.rootLogger=ERROR, A1
# A1 is a RollingFileAppender writing to /var/log/lsearchd.log.
log4j.appender.A1=org.apache.log4j.RollingFileAppender
log4j.appender.A1.File=/var/log/lsearchd.log
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
Regards, Frank.
Mono search?
What happened to the Mono search? This is Zac from the Mono team. I was curious why you guys went back to Java. --24.27.70.200 07:56, 4 December 2008 (UTC) (Wikipedia user ZacBowling)

