Extension talk:OAIRepository

From mediawiki.org

Wikimedia OAIRepository configuration

From CommonSettings.php:

# OAI repository for update server
@include( $IP.'/extensions/OAI/OAIRepo.php' );
$oaiAgentRegex = '/experimental/';
$oaiAuth = true; # broken... squid? php config? wtf
$oaiAudit = true;
$oaiAuditDatabase = 'oai';
$wgDebugLogGroups['oai'] = '/home/wikipedia/logs/oai.log';

(Just guessing: this probably belongs in LocalSettings.php rather than CommonSettings.php?) --Vigilius 20:41, 16 July 2008 (UTC)

Installation Instructions

Installation instructions are very much needed. The README only mentions adding one table, but several SQL scripts are present. I could not get this to work. Please help. --Vigilius 22:23, 17 July 2008 (UTC)

This sounds like a great extension. It would be great if somebody were willing to write down a bit more information about how to use it. Bjoern 14:42, 19 June 2009 (UTC)

Please refer to the installation instructions on the extension's page itself. They may not work for everyone, but they include a lot of notes about the gotchas that I ran into when doing an (eventually) successful installation of this extension.
Hope it helps!
-SColombo 15:17, 19 June 2009 (UTC)
Thanks. I would really like to know which scripts go on the server and which go on the client. I guess the instructions describe a wiki-lucene configuration rather than a wiki-wikimirror configuration. Bjoern 17:35, 19 June 2009 (UTC)
I've gone through the instructions and installed the extension for use in a wiki-wikimirror configuration. My notes are here. I've added the key points below. Bjoern 11:15, 20 June 2009 (UTC)

Wiki - Wiki Mirror Configuration

The extension allows you to use one wiki as a repository ('repository') and a second wiki to mirror the repository ('client'; the harvester copies page updates from the repository into the mirror). To use the extension in a wiki mirroring configuration, you need to install the extension on both wikis. (Some parts of the installation are probably not needed depending on which wiki you are on, but installing everything on both wikis basically seems to work.) Once you've installed the extension, you may want to test running a few OAI queries (examples here).
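
For testing, OAI query URLs can also be built by hand. The following is a minimal sketch in Python, assuming the repository is reached through index.php with title=Special:OAIRepository (the URL form quoted in the logs on this page) and that it answers the standard OAI-PMH verbs such as Identify and ListRecords. The helper name oai_query_url and the example host are hypothetical.

```python
from urllib.parse import urlencode


def oai_query_url(index_php, verb, metadata_prefix=None, since=None):
    """Build an OAI-PMH request URL for Special:OAIRepository.

    index_php is the wiki's index.php URL (the one in the example below is
    an assumption); since is an OAI-PMH "from" datestamp such as 2011-08-31.
    """
    params = {"title": "Special:OAIRepository", "verb": verb}
    if metadata_prefix is not None:
        params["metadataPrefix"] = metadata_prefix
    if since is not None:
        params["from"] = since
    # Keep ":" unescaped so the URL matches the form shown in the logs.
    return index_php + "?" + urlencode(params, safe=":")


# Example call (host is a placeholder):
# oai_query_url("http://localhost/wiki/index.php", "ListRecords",
#               metadata_prefix="mediawiki", since="2011-08-31")
```

Opening the resulting URL in a browser on the repository wiki is a quick way to see whether the special page responds before setting up the harvester.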

After you have installed the extension on both wikis (i.e. on 'repository' and 'client'), you need to add the following lines to LocalSettings.php on the client to enable the harvester:

@include( $IP.'/extensions/OAI/OAIHarvest.php' );
$oaiSourceRepository = "http://url.to.the.repository.wiki/wiki/index.php/Special:OAIRepository";

(where the URL points to the 'repository' wiki, i.e. to Special:OAIRepository on that wiki).

From the extensions/OAI directory on the client, you can now run

php oaiUpdate.php

You should see messages about pages to be updated, which indicates that harvesting is working. Page updates work, but image updates are broken from version 1.11 upwards. I've also had some problems with authentication, so I had to turn authentication off for testing (see here for extended notes). Bjoern 11:32, 20 June 2009 (UTC)

IIS

Proposing the following:

Enabling OAI Authentication with IIS:

To get OAI authentication working with IIS, you need to do the following:

  1. Add the $oaiAuth = true; line to your LocalSettings.php file (as documented above)
  2. Create the Audit/User database (use oaiuser_table.sql, included in the download) and set up access to the OAI audit/security database -- see main page
  3. Add one (or more) OAI users to the oaiuser table (see main page)
  4. Most important step: Open IIS Manager, select your wiki site (or directory) and right-click, select Properties. Under the Directory Security tab, click the Edit button and uncheck everything but anonymous access -- you will probably need to un-check the 'Integrated Windows Authentication' box.

Tested with IIS7, MW 1.13.3, OAIRepository r39772.

IncrementalUpdater

Status 500 when running IncrementalUpdater

At the time of writing, the OAIRepo_body.php code does not use $wgDBprefix. There is now a Bugzilla bug for this. If you need a fix before the bug is resolved, go into the code, find every place where one of the new table names is referenced, and prepend the $wgDBprefix value to it (you will also have to put "global $wgDBprefix;" at the top of the same function).
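
To find the places that need the prefix, a small script can list the lines of OAIRepo_body.php that mention a table name without $wgDBprefix. This is only a sketch in Python: the table names (updates, oaiuser, oaiaudit) are the ones mentioned on this page and may be incomplete, and the function name is hypothetical. It only reports candidate lines; the edit itself still has to be made by hand.

```python
import re

# Table names taken from this talk page; treat the list as an assumption.
TABLE_NAMES = ("updates", "oaiuser", "oaiaudit")


def find_unprefixed_tables(php_source, table_names=TABLE_NAMES):
    """Return (line number, table name, line text) for quoted table names
    on lines that do not already mention $wgDBprefix."""
    names = "|".join(re.escape(name) for name in table_names)
    pattern = re.compile("[\"'](" + names + ")[\"']")
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        match = pattern.search(line)
        if match and "$wgDBprefix" not in line:
            hits.append((number, match.group(1), line.strip()))
    return hits


# Example use against a local copy of the file:
# with open("OAIRepo_body.php", encoding="utf-8") as handle:
#     for number, table, text in find_unprefixed_tables(handle.read()):
#         print(number, table, text)
```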

Redirecting too many times

This is actually an open-ended question for anyone who could figure it out. When attempting to run the IncrementalUpdater, I get the following error:

[...]
8626 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
8732 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
8809 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
8942 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
9020 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
9088 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
9152 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
9287 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
9429 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
9600 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Authenticating ... 
9601 [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - I/O exception listing records: Server redirected too many  times (20)
java.lang.NullPointerException
        at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:181)
9601 [main] WARN  org.wikimedia.lsearch.oai.IncrementalUpdater  - Retry later: error while processing update for lyricwiki : null
9601 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Sleeping for 60000 ms
[...]

Going directly to the URL that the updater accesses (of the form http://[MY_HOST_HERE]/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=lsearch&from=2009-01-17T12:19:56Z) successfully shows a page, though, rather than redirecting endlessly.
Any ideas?
-SColombo 00:03, 18 January 2009 (UTC)

"Server redirected too many times" is associated with authentication failures, and Special:OAIRepository is password-protected, so the updater is probably repeatedly failing to authenticate (which looks to it like a redirect loop). --Emufarmers(T|C) 11:16, 18 January 2009 (UTC)
Thanks Emufarmers! That was exactly the problem. There was one more section of lsearch.conf that needed some authentication vars. I've fixed the docs again (and fixed some parts that I had wrong). It's still not "working" yet, but that particular problem is subdued. Almost there!
-SColombo 19:57, 18 January 2009 (UTC)
Got everything working now (I think)! I updated the docs as I went along and made notes about any problems that I ran into that I think might be common, so I think the installation instructions should be fairly complete/accurate now at least as a good starting point. *sleeps*
-SColombo 21:26, 18 January 2009 (UTC)
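
A failure like this can be checked outside of lsearch by requesting the OAI URL with and without credentials. The sketch below is in Python and assumes that Special:OAIRepository is protected with HTTP Basic authentication when $oaiAuth is enabled; that is an assumption based on the behaviour described in this thread, not something confirmed from the extension's code. The function names are hypothetical, and the URL, user name, and password in the example are placeholders.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def probe(url, user=None, password=None, timeout=10):
    """Request url once and return the HTTP status code.

    Credentials are sent only when user is given, so the same URL can be
    tried both ways.
    """
    request = urllib.request.Request(url)
    if user is not None:
        request.add_header("Authorization", basic_auth_header(user, password))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 401, 403, 500 and similar arrive here; report the code as-is.
        return error.code


# Example (all values are placeholders):
# url = "http://localhost/wiki/index.php?title=Special:OAIRepository&verb=Identify"
# print(probe(url))                       # without credentials
# print(probe(url, "oaiuser", "secret"))  # with credentials
```

If the request fails with 401 without credentials but succeeds with them, the repository side is probably fine and the credentials in lsearch.conf are the more likely place to look.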

classnotfound exception

[root@testwiki lucene-search-2.1]# ls -l *.jar
-rwxrwxrwx 1 root root 6575971 Mar 23 14:12 LuceneSearch.jar
[root@testwiki lucene-search-2.1]# java -Xmx256m -cp LuceneSearch.jar org.wikimedia.lsearch.ranks.RankBuilder wikidb.xml wikidb
Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/lsearch/ranks/RankBuilder
Caused by: java.lang.ClassNotFoundException: org.wikimedia.lsearch.ranks.RankBuilder
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:336)
Could not find the main class: org.wikimedia.lsearch.ranks.RankBuilder. Program will exit.
[root@testwiki lucene-search-2.1]# 

I am getting the error above. What is wrong? -- Preceding unsigned comment added by 166.50.205.143 (talk • contribs) 19:31, 23 March 2009

I asked this question on mediawiki-l and was told that most of the OAIRepository installation instructions are obsolete. All you need to do is: install OAIRepository in the extensions directory, create the updates table in wikidb, update LocalSettings.php with the single line to include OAIRepo.php, and run the Lucene update script. Maiden taiwan 15:25, 16 September 2009 (UTC)
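
One quick check for a NoClassDefFoundError like the one above is whether the named class is in the jar at all, since the error can simply mean that this build of LuceneSearch.jar does not contain it. A jar is a zip archive, so a short Python sketch can look the class up; the function name is hypothetical.

```python
import zipfile


def jar_has_class(jar, class_name):
    """Return True if the jar contains the given fully qualified class.

    jar may be a file path or a binary file object.
    """
    # Classes are stored as path entries, e.g. org/example/Main.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()


# Example, run from the lucene-search directory:
# print(jar_has_class("LuceneSearch.jar",
#                     "org.wikimedia.lsearch.ranks.RankBuilder"))
```

If this prints False, the command line is not at fault; the class is not part of that jar.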

Wikimedia update feed service

This is used to implement the m:Wikimedia update feed service, right? Is there still any way to get an OAI feed from Wikipedia? Or are we expected to use Atom? If so, is there a way to import Atom into one's wiki and thereby create a mirror of another wiki? Thanks, Tisane 16:54, 29 March 2010 (UTC)

Error reading from url / Server returned HTTP response code: 500 for URL

I installed OAIRepository in the extensions directory, created the updates table in wikidb and updated LocalSettings.php with

# OAI repository for update server
@include( $IP.'/extensions/OAI/OAIRepo.php' );

Running the Lucene build script works fine, but the update script generates the following error.

Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/search/ls2/lsearch.conf
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
134  [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - sigwikidb using base url: http://localhost/wiki/index.php?title=Special:OAIRepository
134  [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - sigwikidb using base url: http://localhost/wiki/index.php?title=Special:OAIRepository
134  [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Resuming update of sigwikidb from 2011-08-31
134  [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - Reading records from http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
193  [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
244  [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
293  [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
344  [main] WARN  org.wikimedia.lsearch.oai.OAIHarvester  - Error reading from url (will retry): http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1403)
        at java.net.URL.openStream(URL.java:1029)
        at org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:68)
        at org.wikimedia.lsearch.oai.OAIHarvester.getRecords(OAIHarvester.java:46)
        at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:202)
397  [main] WARN  org.wikimedia.lsearch.oai.IncrementalUpdater  - Retry later: error while processing update for sigwikidb : Server returned HTTP response code: 500 for URL: http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1403)
        at java.net.URL.openStream(URL.java:1029)
        at org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:68)
        at org.wikimedia.lsearch.oai.OAIHarvester.getRecords(OAIHarvester.java:46)
        at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:202)

When going directly to the URL (http://[IP_HERE]/wiki/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki&from=2011-08-31), I get an HTTP 500 error.

What am I missing to get the Special:OAIRepository working?

(Debian 6.0, mw 1.17.0)

Same problem here. I also changed "localhost" to my site's address, but the problem remains. When I open the link in a browser, only a blank white page appears. The documentation is also in poor shape: I searched eight websites to install this single extension, and each one offered a different "solution". I just read the following suggestion: modify OAIRepo_body.php so that the table names are prefixed with $wgDBprefix, and add global $wgDBprefix; to the affected functions:

$this->auditTableName( $wgDBprefix . 'oaiuser' ),
$this->auditTableName( $wgDBprefix . 'oaiaudit' ),
$this->mAuditDb = $lb->getConnection( DB_MASTER, $wgDBprefix . 'oaiAudit', $oaiAuditDatabase );
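
To narrow down a 500 or a blank page like the ones reported above, it can help to look at what the server actually returns for the OAI URL. The Python sketch below only classifies a response body that has already been fetched (with a browser, curl, or any HTTP client); the function name and the labels it returns are made up for illustration. The OAI-PMH namespace and the error element with a code attribute come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def classify_body(body):
    """Return a short label describing a response body from the OAI URL."""
    if not body.strip():
        # A blank page often means PHP stopped before producing output;
        # the details are then in the web server or PHP error log.
        return "empty"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all, for example a plain-text PHP error message.
        return "not-xml"
    if root.tag != OAI_NS + "OAI-PMH":
        # Well-formed markup, but not an OAI-PMH response.
        return "other-xml"
    codes = [element.get("code") for element in root.findall(OAI_NS + "error")]
    if codes:
        return "oai-error:" + ",".join(codes)
    return "oai-ok"
```

An "empty" or "not-xml" result points at the PHP side (the extension or its tables), whereas an "oai-error" result means the special page is running and is rejecting the request itself.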