Toolserver:Iterating over wikis

From mediawiki.org

This page was moved from the Toolserver wiki.
Toolserver has been replaced by Toolforge. As such, the instructions here may no longer work, but may still be of historical interest.
Please help by updating examples, links, template links, etc. If a page is still relevant, move it to a normal title and leave a redirect.

Problem[edit]

Perform the same given query (e. g. SELECT ss_users FROM site_stats;) over the set of wikis received from another query from toolserver database (e. g. list of all Wikipedias or list of all wikis in en language etc.). The wikis of course do not have to be on the same SQL server.

Solution[edit]

Bash[edit]

#!/bin/bash
# Count interwiki links to Wikimedia Commons 
# Author: User:Dispenser
# License: Public domain
WIKIS=$(mysql -BN -h sql -e 'SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL;')

for dbname in $WIKIS;
do
	echo $dbname
	echo '
/* commons_interwikis_count.sh SLOW_OK */
SELECT COUNT(*)
FROM iwlinks
WHERE iwl_prefix="commons"
-- ' | mysql -h ${dbname/_/-}.rrdb.toolserver.org -BcN $dbname
	# mysql switches
	# -c  Prevent comment stripping, need to prevent the query killer
	# -N  No column name heading
	# -B Bare formatted / -t Table formatted / -H HTML output / -X XML output
done;

C[edit]

Java[edit]

Perl[edit]

PHP[edit]

// connect (or, get DISTICT from toolserver.wiki db (if you need dbnames by a-z) or
// ORDER BY server and check if different than other and if so, connect to sql-s{row->server})

$db_link_sql = mysql_connect( 'sql' );
$db_links_sql = array();
$count = 0;
foreach ( range(1, 7) as $server )
    if ( $db_links_sql[$server] = mysql_connect( 'sql-s'.$server ) )
        $count++;

// check
if ( !$db_link_sql && !$count )
    die( 'Error in connecting to database. Please try again later.' );
elseif ( !$db_link_sql || $count < 7 )
    echo '<div><strong>Connection failed to one or more servers. Some wikis may be hidden from the results.</strong></div>';

// get wikis
$result_wiki = mysql_query( "SELECT * FROM toolserver.wiki WHERE is_closed = 0", $db_link_sql );

// loop
while ( $wiki = mysql_fetch_assoc( $result_wiki ) ) {
    $db_link = $db_links_sql[$wiki['server']];

    $result = mysql_query( <whatever>, $db_link );
    // do stuff with $result
    // perhaps while()-loop mysql_fetch_object() and push into a big global array
    // So that after this foreach loop you have all results
}

Python[edit]

"""
Extremely simple pooling implementation.  Includes page title "grep" example program.
"""
import oursql

connections = {}
def getCursor(dbname, host=None):
	if not host: host="sql-s%d" % wikiservers[dbname]
	if host not in connections:
		connections[host] = oursql.connect(host=host, read_default_file='~/.my.cnf', charset="utf8", use_unicode=False)
	cursor = connections[host].cursor()
	cursor.execute('USE `%s`'%dbname)
	return cursor

cursor = getCursor(dbname='toolserver', host='sql')
cursor.execute("SELECT dbname, server FROM toolserver.wiki WHERE domain IS NOT NULL")
wikiservers = dict( cursor.fetchall() )

def titlegrep(pattern):
	print("Finding page titles matching /%s/ across all wikis" % (pattern,))
	for dbname in wikiservers:
		cursor = getCursor(dbname)
		cursor.execute("""
		SELECT page_title
		FROM page
		WHERE page_namespace=? AND page_title REGEXP ?
		""", (0, pattern))
		for (page_title,) in cursor:
			print('%-14s : %s' % (dbname, page_title))

# Find all page titles ending with .pdf (case-sensitive)
titlegrep(r'\.pdf$')

Category:Database