User:GWicke/Notes/Storage/Cassandra testing

Hosts:
 * cerium 10.64.16.147
 * praseodymium 10.64.16.149
 * xenon 10.64.0.200

Cassandra node setup
apt-get install cassandra openjdk-7-jdk

On Debian, open /etc/cassandra/cassandra-env.sh and uncomment/edit this line (localhost is key here):

JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=localhost"

Set up /etc/cassandra/cassandra.yaml according to the docs. Main things to change:
 * listen_address : set to external IP of this node
 * seed_provider / seeds : set to list of other cluster node IPs: "10.64.16.147,10.64.16.149,10.64.0.200"
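
The two settings above could be generated per node with a small helper. This is only a sketch: the function name cassandra_yaml_fragment is made up for illustration, and the seed_provider layout assumes the stock SimpleSeedProvider entry from the default cassandra.yaml. The output is meant to be compared against or merged into the real file by hand.

```shell
# Sketch: print the cassandra.yaml settings discussed above.
# $1 = this node's external IP, remaining arguments = seed node IPs.
# The function name is hypothetical; the seed_provider layout assumes the
# stock SimpleSeedProvider entry from the default cassandra.yaml.
cassandra_yaml_fragment() {
    listen_ip=$1
    shift
    # Join the remaining arguments with commas (the IFS change stays in the subshell).
    seeds=$(IFS=,; printf '%s' "$*")
    cat <<EOF
listen_address: $listen_ip
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "$seeds"
EOF
}

# Example (on xenon):
#   cassandra_yaml_fragment 10.64.0.200 10.64.16.147 10.64.16.149 10.64.0.200
```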

(Re)start cassandra. Right after install it does not seem to be running by default, so simply starting the service should be enough. If it is already running, the restart might involve using kill, as the init scripts use the same RMI connection to control cassandra. With the hostname fix above in place, the command

nodetool status

should return information and show your node (and the other nodes) as being up. Example output:

root@xenon:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns   Host ID                               Rack
UN  10.64.16.149  91.4 KB    256     33.4%  c72025f6-8ad8-4ab6-b989-1ce2f4b8f665  rack1
UN  10.64.0.200   30.94 KB   256     32.8%  48821b0f-f378-41a7-90b1-b5cfb358addb  rack1
UN  10.64.16.147  58.75 KB   256     33.8%  a9b2ac1c-c09b-4f46-95f9-4cb639bb9eca  rack1
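
Checking the example output by eye works for three nodes; a scripted check might look like the sketch below. It assumes that the first column of each node line is the two-letter status/state code (UN = Up/Normal), as in the example output above. The function names are made up for illustration.

```shell
# Sketch: count nodes reported as Up/Normal ("UN") in nodetool status output.
# Reads the output on stdin, so it can be tried without a running cluster.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Succeeds only if exactly the expected number of nodes is Up/Normal.
# $1 = expected node count; nodetool status output on stdin.
cluster_is_healthy() {
    [ "$(count_up_normal)" -eq "$1" ]
}

# Example:
#   nodetool status | cluster_is_healthy 3 && echo "all nodes up"
```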

Rashomon setup
We need node 0.10. We are running an old Ubuntu version, so we need to do some extra work to get it:

apt-get install python-software-properties python g++ make
add-apt-repository ppa:chris-lea/node.js
apt-get update
apt-get install nodejs # this ubuntu package also includes npm and nodejs-dev

On Debian unstable, we would just install the distribution's node package and get the latest node including security fixes rather than the old Ubuntu PPA package.
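
Since the setup depends on the 0.10 series, it may be worth verifying what actually got installed before continuing. A minimal sketch, assuming the version string has the form printed by node --version (for example v0.10.26); the function name is hypothetical.

```shell
# Sketch: succeed if a version string (as printed by "node --version",
# e.g. "v0.10.26") belongs to the 0.10 series.
node_is_0_10() {
    case "$1" in
        v0.10.*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example:
#   node_is_0_10 "$(node --version)" || echo "need node 0.10" >&2
```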

Now onwards to the actual rashomon setup:

# temporary proxy setup for testing
npm config set https-proxy http://brewster.wikimedia.org:8080
npm config set proxy http://brewster.wikimedia.org:8080
cd /var/lib
git clone https://github.com/gwicke/rashomon.git
cd rashomon
# will package node_modules later
npm install
cqlsh < cassandra-revisions.cql
cp contrib/upstart/rashomon.conf /etc/init/rashomon.conf
adduser --system --no-create-home rashomon
service rashomon start

Create the revision tables (on one node only):

cqlsh < cassandra-revisions.cql

Note re nodejs version: The PPA listed above is not quite up to date with security fixes etc. Maybe we should try to build the Debian unstable source package on Ubuntu Precise and use that if successful.

Tests

 * Import dumps of several wikis, with full history, in parallel
 * Read back random revisions from random wikis

du -sh .
11G     .
ls
enwiki-20131001-pages-meta-history25.xml-p026204561p026624999.7z
enwiki-20131001-pages-meta-history26.xml-p026625002p027446124.7z
enwiki-20131001-pages-meta-history26.xml-p027446125p028014757.7z
enwiki-20131001-pages-meta-history26.xml-p028014758p028973952.7z
enwiki-20131001-pages-meta-history26.xml-p028973953p029625000.7z
enwiki-20131001-pages-meta-history27.xml-p029625001p030587586.7z
enwiki-20131001-pages-meta-history27.xml-p030587587p031240058.7z
enwiki-20131001-pages-meta-history27.xml-p031240059p031839850.7z
enwiki-20131001-pages-meta-history27.xml-p031839851p032101301.7z
enwiki-20131001-pages-meta-history27.xml-p032101302p033177808.7z
enwiki-20131001-pages-meta-history27.xml-p033177810p034316341.7z
enwiki-20131001-pages-meta-history27.xml-p034316342p035749414.7z
enwiki-20131001-pages-meta-history27.xml-p035749415p037161963.7z
enwiki-20131001-pages-meta-history27.xml-p037161964p038849072.7z
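
One way to drive the parallel import over a directory of dumps like the one listed above is xargs with -P. This is a sketch only: the helper name is made up, the import command itself is left to the caller (these notes do not name one), and the -print0, -0, -r and -P options assume GNU findutils.

```shell
# Sketch: run a command once per .7z dump in a directory, several at a time.
# $1 = directory holding the dumps, $2 = number of parallel jobs,
# remaining arguments = the import command (hypothetical; each dump path is
# appended as its last argument).
# Relies on find -print0 and xargs -0/-r/-P as provided by GNU findutils.
for_each_dump_parallel() {
    dump_dir=$1
    parallel_jobs=$2
    shift 2
    find "$dump_dir" -name '*.7z' -print0 |
        xargs -0 -r -n 1 -P "$parallel_jobs" "$@"
}

# Example (import-dump.sh is a placeholder for whatever does the import):
#   for_each_dump_parallel . 4 ./import-dump.sh
```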

Configurations

 * Commit log on SSD, data files on rotating metal
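
In cassandra.yaml this split would presumably be expressed through the commitlog_directory and data_file_directories settings. The helper below is a sketch that prints such a fragment; the function name and the example paths are assumptions, not taken from the test hosts.

```shell
# Sketch: print cassandra.yaml directory settings that put the commit log
# and the data files on different devices.
# $1 = commit log directory (on the SSD), $2 = data directory (on the rotating disk).
# Both paths are whatever the caller passes; none are taken from the notes.
storage_yaml_fragment() {
    cat <<EOF
commitlog_directory: $1
data_file_directories:
    - $2
EOF
}

# Example (hypothetical mount points):
#   storage_yaml_fragment /mnt/ssd/cassandra/commitlog /mnt/disk/cassandra/data
```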