Toolserver:User:LA2

This page was moved from the Toolserver wiki.
Toolserver has been replaced by Toolforge. As such, the instructions here may no longer work, but may still be of historical interest.

LA2 is the username of Lars Aronsson, Sweden.

April 2013: Note to self: to log in, run "ssh la2@willow.toolserver.org". To change the weekly job, run "ssh hawthorn" and then "cronie -e".

January 2012: I asked for more quota, but this somehow didn't work, so my data collection for January and the first half of February consists of many files that are 0 bytes long.

December 5, 2011: I uploaded historic linkstats to the Internet Archive for 2008, 2009, Q1 2010, Q2 2010, Q3 2010, Q4 2010, and for 2011: January, February, March, April, May, June, July, August, and September.

December 3, 2011: You can download historic linkstats, based on database dumps for commonswiki, enwiki, frwiki, nowiki, svwiki. Each line contains the count of external links from the Article, File, Author, Index, and Portal namespaces, a comma, and the domain.
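A minimal sketch of reading one of these files, assuming each line is exactly "count,domain" as described above. The function name read_linkstats and the sample rows are made up for illustration and are not taken from the real files.

```python
import csv
import io


def read_linkstats(lines):
    """Parse 'count,domain' lines into a dict mapping domain -> link count."""
    stats = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        count, domain = row
        stats[domain.strip()] = int(count)
    return stats


# Made-up sample data standing in for a downloaded linkstats file.
sample = io.StringIO("1200,example.org\n15,example.com\n")
stats = read_linkstats(sample)

# Domains ordered from most to least linked.
top = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
```

For a real file, pass an open file object instead of the StringIO sample.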

November 29, 2011: My link statistics have evolved to include some more namespaces (File, Author, Index, Portal) and all available websites. This runs as an SQL query on the Toolserver for all databases (~la2/linkstats.sh), except for commons.wikimedia and en.wikipedia, where it times out. For these and for historic data, I have a different script (~la2/dumplinks.pl) that can parse the database dumps (page.sql and externallinks.sql). This is fine as a proof of concept. Now, in what format should these data be made available? And what kinds of reports are useful?

November 25, 2011: You can download linkstats.csv, containing statistics on external links for a number of Wikimedia projects. This is based on the following query:

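-- Count external links per site from articles (namespace 0); keep sites with at least 10 links.
SELECT COUNT(*) AS links,
       TRIM(LEADING 'www.' FROM SUBSTRING_INDEX(SUBSTRING_INDEX(el_to, '/', 3), '/', -1)) AS site
  FROM externallinks
  JOIN page ON el_from = page_id
 WHERE page_namespace = 0
 GROUP BY site
HAVING COUNT(*) >= 10;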
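
To make the string handling in the query easier to follow, here is a rough Python equivalent of the expression that derives the site from el_to, plus the grouping with the threshold of 10. This is only a sketch: the names site_of and count_sites are hypothetical, and this is not the code in linkstats.sh or dumplinks.pl.

```python
from collections import Counter


def site_of(url):
    """Mimic the SQL expression: host part of a URL, minus any leading 'www.'."""
    # SUBSTRING_INDEX(url, '/', 3): everything before the third '/'
    head = "/".join(url.split("/")[:3])
    # SUBSTRING_INDEX(head, '/', -1): everything after the last '/'
    host = head.split("/")[-1]
    # TRIM(LEADING 'www.' FROM host): strip every leading repetition of 'www.'
    while host.startswith("www."):
        host = host[len("www."):]
    return host


def count_sites(urls, minimum=10):
    """Count links per site and keep sites with at least `minimum` links."""
    counts = Counter(site_of(u) for u in urls)
    return {site: n for site, n in counts.items() if n >= minimum}


# Made-up URLs for illustration.
urls = ["http://www.example.org/page/%d" % i for i in range(12)]
urls += ["https://example.com:8080/x"]
result = count_sites(urls)
```

Like the query, this keeps a port number as part of the site, and it returns a URL with no slashes (for example a mailto: link) unchanged.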