Extension:ArchiveLinks/Project/status

This page is designed as a status update/work log for ArchiveLinks so anyone who is interested can easily track the status of the project.

February 2012

 * After much stalling, contact has been made with Sumanah and the Internet Archive.

Projected Deadlines

 * February 9th, 2012 - Data feed of live new links made available to the Internet Archive
 * Feb 10th: Currently having problems accessing the toolserver, and the toolserver admins are swamped with outage problems, so putting the feed up live is delayed.


 * Feb 29th: By tomorrow this *will* be deployed.

March 2012
From March 12:
 * The good news is that the toolserver feed is up at http://toolserver.org/~nn123645/toolserver-feed/. The bad news is that the script I'm using to populate the DB tables isn't working, so the feed returns an empty result.


 * Also note that the link I provided returns XML, so you need to click view source to see anything.
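
As an alternative to viewing the page source, the XML can be inspected with a few lines of Python. This is only a sketch: the feed's element names are not documented on this page, so the function below makes no assumption about the schema and simply counts elements by tag name. The sample document and its `feed`/`link` tags are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_feed(xml_text):
    """Count elements by tag name in an XML document of unknown schema."""
    root = ET.fromstring(xml_text)
    # iter() walks the root element and every descendant element.
    return Counter(element.tag for element in root.iter())


# Hypothetical sample; the real feed's element names may differ.
SAMPLE = (
    "<feed>"
    "<link>http://example.org/a</link>"
    "<link>http://example.org/b</link>"
    "</feed>"
)

if __name__ == "__main__":
    for tag, count in summarize_feed(SAMPLE).items():
        print(tag, count)
```

An empty result would show up as a count for the root element only, which makes it easy to tell an unpopulated database apart from a broken feed.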

From March 28:
 * Good news is I think the script actually works: I’m getting the select queries working and am at the point where I should be inserting the data.


 * The bad news is I can’t insert the data at all on the toolserver. I keep getting MySQL Error 1290, which Google tells me is a permissions-related error.


 * I’m not sure whether they disabled user queries due to replag or what is going on (apparently they did a schema change on the main cluster, and s1, the English Wikipedia toolserver cluster, is replagged by 1 week and a few days), and it’s kind of difficult to find help at 4 in the morning. This problem persists even in phpMyAdmin, so I know it’s not just my code.

(From later on the 28th):
 * I think I figured out a workaround (using a different cluster), but I didn't have time to implement it. I'll fix the lingering problems and have it ready by Monday morning.
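
The workaround idea can be sketched roughly as follows. In MySQL's error reference, 1290 is `ER_OPTION_PREVENTS_STATEMENT`, raised when a server option such as `--read-only` blocks a statement. The sketch tries each database host in turn and moves on only when that specific error comes back. Everything here is an assumption for illustration: `DatabaseError` stands in for whatever exception a real driver raises, and the host names and insert callables are hypothetical.

```python
# MySQL error raised when a server option (e.g. --read-only) blocks a statement.
ER_OPTION_PREVENTS_STATEMENT = 1290


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL error number."""

    def __init__(self, errno, message):
        super().__init__(errno, message)
        self.errno = errno


def insert_with_fallback(hosts, rows):
    """Try each (name, insert_callable) pair in order.

    Moves to the next host only on error 1290; any other error is re-raised.
    Returns the name of the host that accepted the rows.
    """
    for name, insert in hosts:
        try:
            insert(rows)
            return name
        except DatabaseError as exc:
            if exc.errno != ER_OPTION_PREVENTS_STATEMENT:
                raise  # unrelated failure: do not mask it
    raise RuntimeError("no writable database host available")


def read_only_insert(rows):
    """Simulates a host that rejects every write with error 1290."""
    raise DatabaseError(1290, "server is running with the --read-only option")


if __name__ == "__main__":
    stored = []
    hosts = [("primary", read_only_insert), ("fallback", stored.extend)]
    print(insert_with_fallback(hosts, ["http://example.org/"]), stored)
```

Re-raising every error other than 1290 keeps genuine bugs (bad SQL, missing tables) visible instead of silently retrying them elsewhere.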

April 2012

 * So apparently about 6 hours later I did get an answer as to why the original toolserver cluster I was using didn't work:


 * [10:37]       [#wikimedia-toolserver] kevin_brown: 1290 mean the database is in read-only mode. The WMF is adding a column that'll be done sometime next month.


 * Anyhow, I have switched to sql.toolserver.org, and the feed is now up (with actual data in the database!) at:


 * http://toolserver.org/~nn123645/toolserver-feed/index.php


 * It uses the same query parameters as the ArchiveLinks extension. I'm also thinking of renaming the folder to something more descriptive, such as "en-wiki-link-feed" or something along those lines. Any ideas?

//I do have a few names, "WikiDataFeed" or "En-Wiki-Data-Flow". Or something like that. Cheers!


 * I will set the script up as a cron job to run every 20 seconds and pull 100 pages at a time.

Later on 1 April:


 * As a side note, I'm still working on cron; I'm having some issues with the scheduling.


 * For the time being, if you hit http://toolserver.org/~nn123645/toolserver-feed/cronscript.php it will pull in 100 pages and insert all the new links on those pages into the DB.


 * There are currently 11k rows in the db...
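
One possible way around the scheduling issue, sketched under assumptions: standard cron schedules at one-minute resolution, so a 20-second interval is usually approximated by starting a job once a minute and having it run three batches with a pause in between. The sketch below uses an in-memory SQLite table in place of the toolserver database, and `fetch_batch` is a hypothetical callable that returns the links found on the next batch of pages; neither reflects how cronscript.php actually works.

```python
import sqlite3
import time

BATCH_SIZE = 100       # pages per run, as described in the log
INTERVAL_SECONDS = 20  # desired spacing between runs


def open_store():
    """Create an in-memory table; the primary key rejects duplicate URLs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
    return conn


def count_links(conn):
    return conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]


def store_new_links(conn, links):
    """Insert links, skipping ones already stored; return how many were new."""
    before = count_links(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO links (url) VALUES (?)", [(url,) for url in links]
    )
    conn.commit()
    return count_links(conn) - before


def run_batches(conn, fetch_batch, runs, interval=INTERVAL_SECONDS, sleep=time.sleep):
    """Call fetch_batch(BATCH_SIZE) `runs` times, pausing between runs."""
    total_new = 0
    for i in range(runs):
        total_new += store_new_links(conn, fetch_batch(BATCH_SIZE))
        if i < runs - 1:
            sleep(interval)  # no pause after the final run
    return total_new


if __name__ == "__main__":
    conn = open_store()
    fake = iter([["http://example.org/a"], ["http://example.org/a"], []])
    # interval=0 keeps the demonstration fast.
    print(run_batches(conn, lambda n: next(fake), runs=3, interval=0))
```

Started once a minute with `runs=3`, this gives roughly the 20-second cadence described above, and the `INSERT OR IGNORE` keeps repeated runs from storing the same link twice.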