Extension talk:OAIRepository

Wikimedia OAIRepository configuration
From CommonSettings.php:

(Just guessing: this is probably meant to go into LocalSettings.php rather than CommonSettings.php?) --Vigilius 20:41, 16 July 2008 (UTC)

Installation Instructions
Installation instructions are badly needed. The README mentions adding only one table, but several SQL scripts are included. I could not get this to work. Please help. --Vigilius 22:23, 17 July 2008 (UTC)

This sounds like a great extension. It would help a lot if somebody were willing to write down a bit more information about how to use it. Bjoern 14:42, 19 June 2009 (UTC)


 * Please refer to the installation instructions on the extension's page itself. They may not work for everyone, but they include a lot of notes about the "gotchas" I ran into while doing an (eventually) successful installation of this extension.
 * Hope it helps!
 * -SColombo 15:17, 19 June 2009 (UTC)


 * Thanks. I would really like to know which scripts go on the server and which go on the client. I guess the instructions describe a wiki-lucene configuration rather than a wiki-wikimirror configuration. Bjoern 17:35, 19 June 2009 (UTC)
 * I've gone through the instructions and installed the extension for use in a wiki-wikimirror configuration. My notes are here. I've added the key points below. Bjoern 11:15, 20 June 2009 (UTC)

Wiki - Wiki Mirror Configuration
The extension allows you to use one wiki as a repository ('repository') and a second wiki as a mirror of it ('client'; the harvester copies page updates from the repository into the mirror). To use the extension in a wiki mirroring configuration, you need to install it on both wikis. (Some parts of the installation are probably not needed on one wiki or the other, but installing everything on both wikis seems to work.) Once you've installed the extension, you may want to test it by running a few OAI queries (examples here).
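Test queries are ordinary HTTP GET requests against Special:OAIRepository. The minimal Python sketch below only builds the URLs. It assumes the index.php?title=Special:OAIRepository form and the http://localhost/wiki/index.php base that appear in the lucene-search logs further down this page; adjust both for your wiki. Identify and ListRecords are standard OAI-PMH verbs, and metadataPrefix=mediawiki is the prefix seen in those logs. The helper name oai_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the repository is reached as index.php?title=Special:OAIRepository,
# the form used by the lucene-search logs on this page. Adjust for your wiki.
DEFAULT_BASE = "http://localhost/wiki/index.php"


def oai_url(verb, base=DEFAULT_BASE, **args):
    """Build an OAI-PMH request URL for Special:OAIRepository."""
    params = {"title": "Special:OAIRepository", "verb": verb}
    params.update(args)
    # Keep ':' unescaped so the page title stays readable.
    return base + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    # Identify: basic information about the repository.
    print(oai_url("Identify"))
    # ListRecords: changes since a date ("from" is a Python keyword, hence the dict).
    print(oai_url("ListRecords", metadataPrefix="mediawiki", **{"from": "2011-08-31"}))
```

Paste the printed URLs into a browser (logged in as an OAI user if $oaiAuth is on) and check that you get XML back rather than an error page.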

After you have installed the extension on both wikis (i.e. on 'repository' and 'client'), you need to add the following lines to LocalSettings.php on the client to enable the harvester:

    @include( $IP.'/extensions/OAI/OAIHarvest.php' );
    $oaiSourceRepository = "http://url.to.the.repository.wiki/wiki/index.php/Special:OAIRepository";

where the URL points to Special:OAIRepository on the 'repository' wiki.

From the extensions/OAI directory on the client, you can now run "php oaiUpdate.php". You'll see messages about pages being updated, which means the harvesting is working. Page updates work, but image updates are broken from version 1.11 upwards. I also had some problems with authentication, so I had to turn authentication off for testing (see here for extended notes). Bjoern 11:32, 20 June 2009 (UTC)
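I haven't checked how oaiUpdate.php pages through results internally, but any OAI-PMH harvester has to follow the protocol's resumptionToken mechanism: a ListRecords response may end with a token, and the next request sends only the verb plus that token until the token comes back empty. The minimal Python sketch below shows that loop. The fetch callable and the helper names are hypothetical, and only record identifiers are collected; a real harvester would also import the page content from each record.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken means this was the last page.
    next_token = token.text if token is not None and token.text else None
    return ids, next_token


def harvest(fetch, from_date):
    """Follow resumption tokens until the repository reports no more pages.

    `fetch` is a hypothetical callable that takes a dict of OAI query
    arguments and returns the response body as text.
    """
    args = {"verb": "ListRecords", "metadataPrefix": "mediawiki", "from": from_date}
    all_ids = []
    while True:
        ids, token = parse_page(fetch(args))
        all_ids.extend(ids)
        if token is None:
            return all_ids
        # resumptionToken is an exclusive argument: send it with the verb only.
        args = {"verb": "ListRecords", "resumptionToken": token}
```

Injecting fetch keeps the paging logic separate from HTTP and authentication, so it can be exercised against canned XML before pointing it at a live repository.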

IIS
Proposing the following:

Enabling OAI Authentication with IIS:

To get OAI auth. working with IIS, you need to do the following:
 * 1) Add the $oaiAuth = true; line to your LocalSettings.php file (as documented above)
 * 2) Create the Audit/User database (use oaiuser_table.sql, included in the download) and set up access to the OAI audit/security database (see main page)
 * 3) Add one (or more) OAI users to the oaiuser table (see main page)
 * 4) Most important step: Open IIS Manager, right-click your wiki site (or directory) and select Properties. On the Directory Security tab, click the Edit button and uncheck everything except anonymous access; you will probably need to uncheck the 'Integrated Windows Authentication' box.

Tested with IIS7, MW 1.13.3, OAIRepository r39772.
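My understanding, which I haven't verified against every version, is that $oaiAuth makes Special:OAIRepository ask for HTTP Basic credentials that are checked against the oaiuser table. That would explain why IIS must not intercept the request with its own Windows authentication. The minimal Python sketch below builds the Authorization header that such a client would send. The user name, password and helper names are placeholders.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def oai_request(url, user, password):
    """Build (but do not send) a request carrying the OAI user's credentials."""
    return urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )
```

If a request built this way still gets a 401 or a Windows login challenge, IIS is most likely still handling authentication itself (step 4).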

Status 500 when running IncrementalUpdater
At the time of this writing, the OAIRepo_body.php code does not use. There is now a bugzilla bug for this. If you need to fix it and the bug isn't resolved yet, go into the code, find every place where one of the new table names is referenced, and prepend the $wgDBprefix value to it (you will have to add "global $wgDBprefix;" at the top of the same function).

Redirecting too many times
This is actually an open-ended question for anyone who can figure it out. When attempting to run the IncrementalUpdater, I get the following error:

    [...]
    8626 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    8732 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    8809 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    8942 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    9020 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    9088 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    9152 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    9287 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    9429 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    9600 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ...
    9601 [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - I/O exception listing records: Server redirected too many  times (20)
    java.lang.NullPointerException
        at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:181)
    9601 [main] WARN  org.wikimedia.lsearch.oai.IncrementalUpdater  - Retry later: error while processing update for lyricwiki : null
    9601 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Sleeping for 60000 ms
    [...]

Going directly to the URL that the updater accesses (of the form ) successfully shows a page, though, not infinite redirects. Any ideas? -SColombo 00:03, 18 January 2009 (UTC)


 * "Server redirected too many times" is associated with authentication failures, and Special:OAIRepository is password-protected, so the updater is probably repeatedly failing to authenticate (which looks to it like a redirect loop). —Emufarmers(T 11:16, 18 January 2009 (UTC)


 * Thanks Emufarmers! That was exactly the problem. There was one more section of lsearch.conf that needed some authentication variables. I've fixed the docs again (and corrected some parts that I had wrong). It's still not "working" yet, but that particular problem is solved. Almost there!
 * -SColombo 19:57, 18 January 2009 (UTC)


 * Got everything working now (I think)! I updated the docs as I went along and made notes about any problems I ran into that might be common, so the installation instructions should now be fairly complete and accurate, at least as a starting point. *sleeps*
 * -SColombo 21:26, 18 January 2009 (UTC)
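Java's HttpURLConnection follows redirects silently and only reports the failure after 20 hops, so the log hides what the server actually answered. One way to see the first response is to send a single request with redirects turned off and look at the status code and Location header. The minimal Python sketch below does that. NoRedirect and probe are hypothetical helper names, and the optional auth_header value is whatever credentials the updater is supposed to send.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the first 3xx response can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def probe(url, auth_header=None):
    """Return (status code, Location header or None) for one request."""
    req = urllib.request.Request(url)
    if auth_header:
        req.add_header("Authorization", auth_header)
    opener = urllib.request.build_opener(NoRedirect())
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.getcode(), resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # Refused 3xx redirects, 401 and 500 responses all end up here.
        return err.code, err.headers.get("Location")
```

A 3xx pointing at a login page (or back at the same URL) when credentials are sent, but a 200 from a logged-in browser, would be consistent with the authentication explanation above.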

classnotfound exception
    [root@testwiki lucene-search-2.1]# ls -l *.jar
    -rwxrwxrwx 1 root root 6575971 Mar 23 14:12 LuceneSearch.jar
    [root@testwiki lucene-search-2.1]# java -Xmx256m -cp LuceneSearch.jar org.wikimedia.lsearch.ranks.RankBuilder wikidb.xml wikidb
    Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/lsearch/ranks/RankBuilder
    Caused by: java.lang.ClassNotFoundException: org.wikimedia.lsearch.ranks.RankBuilder
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:336)
    Could not find the main class: org.wikimedia.lsearch.ranks.RankBuilder. Program will exit.
    [root@testwiki lucene-search-2.1]#

I am getting the error above. What is wrong?


 * I asked this question in mediawiki-l and was told most of the OAIRepository installation instructions are obsolete. All you need to do is: install OAIRepository in the  directory, create the   table in wikidb, update LocalSettings.php with the single line to include , and run the Lucene   script. Maiden taiwan 15:25, 16 September 2009 (UTC)
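A ClassNotFoundException for the main class generally means the class is not inside the jar given to -cp. A jar is a zip archive, so this is easy to check directly. The minimal Python sketch below does that check. jar_has_class is a hypothetical helper, and the jar and class names come from the session above.

```python
import zipfile


def jar_has_class(jar, class_name):
    """Return True if class_name (dotted form) is packaged in the jar.

    `jar` may be a path or a binary file object.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()


if __name__ == "__main__":
    import sys

    # Usage: python check_jar.py LuceneSearch.jar org.wikimedia.lsearch.ranks.RankBuilder
    if len(sys.argv) == 3:
        print(jar_has_class(sys.argv[1], sys.argv[2]))
```

If it prints False, the jar was built without that class (for example from a source tree that doesn't contain it), and no command-line change will make that invocation work.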

Wikimedia update feed service
This is used to implement the Wikimedia update feed service, right? Is there still any way to get an OAI feed from Wikipedia? Or are we expected to use Atom? If so, is there a way to import Atom into one's wiki and thereby create a mirror of another wiki? Thanks, Tisane 16:54, 29 March 2010 (UTC)

Error reading from url / Server returned HTTP response code: 500 for URL
I installed OAIRepository in the  directory, created the   table in wikidb and updated LocalSettings.php with

Running the Lucene  script works fine, but the   script generates the following error.

    Trying config file at path /root/.lsearch.conf
    Trying config file at path /usr/local/search/ls2/lsearch.conf
    0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
    134  [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - sigwikidb using base url: http://localhost/wiki/index.php?title=Special:OAIRepository
    134 [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - sigwikidb using base url: http://localhost/wiki/index.php?title=Special:OAIRepository
    134 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Resuming update of sigwikidb from 2011-08-31
    134 [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - Reading records from http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
    193 [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
    244 [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
    293 [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
    344 [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
    java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1403)
        at java.net.URL.openStream(URL.java:1029)
        at org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:68)
        at org.wikimedia.lsearch.oai.OAIHarvester.getRecords(OAIHarvester.java:46)
        at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:202)
    397 [main] WARN  org.wikimedia.lsearch.oai.IncrementalUpdater  - Retry later: error while processing update for sigwikidb : Server returned HTTP response code: 500 for URL: http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
    java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1403)
        at java.net.URL.openStream(URL.java:1029)
        at org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:68)
        at org.wikimedia.lsearch.oai.OAIHarvester.getRecords(OAIHarvester.java:46)
        at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:202)

When I go directly to the URL, I get an HTTP 500 error.

What am I missing to get the  working?

(Debian 6.0, mw 1.17.0)
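The Java updater throws away the body of the 500 response, and a browser may also show only a generic error page. The body may contain the PHP or MediaWiki error text that explains the failure. The minimal Python sketch below fetches a URL and keeps the body even when the status is an error. fetch_with_body is a hypothetical helper, and the URL to try is the ListRecords URL from the log above.

```python
import urllib.error
import urllib.request


def fetch_with_body(url):
    """Return (status code or None, body text), even for HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.getcode(), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError is also a response object: read() gives the error page body.
        return err.code, err.read().decode("utf-8", "replace")


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        code, body = fetch_with_body(sys.argv[1])
        print(code)
        print(body[:2000])  # the start of the page is usually enough
```

If the body is empty, the web server's error log (for example Apache's error.log on Debian) is the next place to look for the PHP error.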