This is a page about mysql for the Hu laboratory. This page used to be specifically about replication, but replication has been moved into a subsection and other sections added.

Usage[edit]

Starting and Stopping[edit]

I think there are about 1.2e5 different ways to start and stop mysql, some of which are:

using mysqld_safe script
- Can be run to start a mysqld daemon. mysqld_safe adds some safety features such as restarting the server when an error occurs and logging runtime information to an error log file.
using mysqldadmin
- not used to start mysqld daemons, but can be used to do other things like flush-tables, start/stop replication, shutdown. always good in a pinch.
using launchctl (for macs)
using the launchctl wrapper: sudo /opt/local/etc/LaunchDaemons/org.macports.mysql5/mysql5.wrapper [start|stop]

--or--

using launchctl directly: sudo launchctl load -w /opt/local/etc/LaunchDaemons/org.macports.mysql5/org.macports.mysql5.plist
- this method is typically used by the system and not used by the system-admin
using mysql.server script (for linux)
- similar to the aforementioned launchctl method, but for systems that use V-style run directories
using mysqld_multi[5]
- for starting one or more daemons

General Documentation[edit]

These are things that are good to know, but didn't fit in other sections.

Programs/Utilities[edit]

mysql[edit]

mysqld_safe[edit]

mysqld_multi[edit]

This program is a perl script for starting/stopping multiple daemons at the same time.
It logs to /opt/local/share/mysql5/mysqld_multi.log (for macports)
mysqld_multi at dev.mysql.com

mysql.server & mysqld_multi.server[edit]

Scripts that are typically put into /etc/init.d (like on Linux systems) so that the MySQL server(s) are started automatically upon system start.
For Macintosh computers, I don't think these are relevant, as Macintosh has migrated to the launchd system of starting/stopping processes when the system boots.

See /Library/LaunchDaemons/org.macports.mysql5.plist, and/or /opt/local/etc/LaunchDaemons/org.macports.mysql5/org.macports.mysql5.plist (the first might be a soft-link to the latter)

my_print_defaults[edit]

mysql_config[edit]

MacPorts[edit]

These might be helpful one day:

Paths[edit]

Macports installs MySQL in the following places:

/opt/local/etc/my.cnf
/opt/local/bin/mysql*5
/opt/local/var/db/mysql5/
/opt/local/lib/mysql5/
... probably lots of other places

Gotchas[edit]

The difference between % and localhost[edit]

When setting up users, you can choose between three options for the host:

%
This means any user connecting over TCP/IP.
localhost
This means any user connecing via the socket - which means they must be on this computer.

This is totally different than connecting via 127.0.0.1 (which would use the permissions set for user@%)
string - hostname
A specific host, over TCP/IP

When does it read `my.cnf`[edit]

It seems like sometimes the mysql binaries read the configuration directives that are stored in [/opt/local]/etc/my.cnf and sometimes they don't. I haven't figured out why this is; maybe just some programs/scripts know about my.cnf.

Locking[edit]

types[edit]

shared locks / read locks: many clients can read a shared-locked resource
exclusive locks / write locks: only one client can access/change this

Storage Engines[edit]

Attribute	MyISAM	InnoDB
transactions	no	yes
lock-granualrity	table	row
storage	split files (3 of them)	tablespace (possibly one file)
referential integrity	no	yes
primary key with data	no	yes
MySQL caches data records	no	yes
full-text indices	yes	no
select count(*)	fast (keeps a count on-hand)	slow (must count each time)

Replication[edit]

File:Mysql replication.png

Simple mysql replication

It is important to understand how MySQL replication works. There is a lot of documentation on the web, but here are a few notes I have made through my process of learning:

This process is as simple as:

configure the master
configure the slave
start both servers
connect the slave to the master

Russell Dyer, author of MySQL in a Nutshell explains on this website the rational for having a simple replication setup.

One of the difficulties with a large and active MySQL database is making clean backups without having to bring the server down. Otherwise, a backup may slow down the system and there may be inconsistency with data, since related tables may be changed while another is being backed up. Taking the server down will ensure consistency of data, but it means interruption of service to users. Sometimes this is necessary and unavoidable, but daily server outages for backing up data may be unacceptable. A simple alternative method to ensure reliable backups without having to shut down the server daily is to set up replication for MySQL.

For the Hu-laboratory we're going to set up a single master and slave pair on the same server. This means we're dealing with two issues:

replication between master & slave
running concurrent mysqld processes at the same time

Chapter 1 - Why Replication?[edit]

Why would you want to set up replication?

High availability: In case of catastrophe or system-failure, you can (automagically) fall back to a slave.

Load Balancing: For large systems you can spread the load out over several machines/daemons, thereby increasing efficiency and speed.

Off-site processing: Don't disturb the master server when doing database-intensive processing.

For us it is a matter of availability. We want to have one database (the master) that is always available, instead of having a long period of latency for backups. With this system we can backup the slave at any time without affecting what goes on with the production sites.

Chapter 2 - Replication Basics[edit]

In this tutorial we will be using a single master server and a single slave server. The master is the control-point of replication and all changes should happen on the master and get "pushed" to the slave. (In actuality, the slave will ask for changes and execute them.) This is not a synchronous mechanism - meaning the changes to the slave might not happen in the same order that they took place on the master. In theory this sounds bad, but in practice it turns out to be alright. Synchronous replication is slower and has higher fidelity, but is harder to setup. For our purposes using the more simple method is sufficient.

Essentially the replication process consists of these steps:

an SQL query is sent to the MASTER. MySQL records this query in the bin.log file along with the log position number
the SLAVE database checks (through an IO process) the bin.log file for any changes
when a change is encountered the SLAVE copies the SQL statement to it's relay.log file, along with the position number and records this number in the master.info file
the SLAVE executes the SQL andqueries the MASTER
the SLAVE checks the results of both queries for consistency and issues any errors if necessary

Chapter 3 - The `bin-log` file(s)[edit]

The binary log is a set of files that keep track of what changes have been made on the master server so that the same changes can be made on the slave. Here's a brief section from MySQL High Availability (available through the library as an e-book):

The previous chapter included a very brief introduction to the binary log. In this chapter, we will fill in more details and give a more thorough description of the binary log structure, the replication event format, and how to use the mysqlbinlog tool to investigate and work with the contents of binary logs.
The binary log records changes made to the database so that the same changes can be made on any of the slaves as well. Since the binary log normally keeps a record of all changes, you can also use it for auditing purposes to see what happened in the database, and for PITR by playing back the binary log to a server, repeating changes that were recorded in the binary log.
The binary log contains only statements that could change the database. Note that statements that do not change the database but that could potentially change the database are logged. The most notable statements are those that optionally make a change, such as DROP TABLE IF EXISTS or CREATE TABLE IF NOT EXISTS, along with statements such as DELETE and UPDATE that have WHERE conditions that don’t happen to match any rows.
SELECT statements are not normally logged, since they do not make any changes to any database. There are, however, exceptions.
Transactions on a server are not normally executed in sequence, one after the other, but are rather interleaved and executed in parallel. To ensure that two transactions do not conflict and generate an inconsistent result, the server ensures the transaction execution is serializable, meaning the transactions are executed in such a way that the execution yields the same result as if they were executed in a serial order—that is, in a fixed order, one transaction after another.
The binary log records each transaction in the order that the commit took place on the master. Although transactions may be interleaved on the master, each appears as an uninterrupted sequence in the binary log, the order determined by the time of the commit.

So, we'll need to configure the master/slave pair to use the bin-log.

Chapter 4 - Configuration[edit]

Replication Settings[edit]

On the master we will need a few options configured in the my.cnf file...usually found at /etc/my.cnf. The necessary options needed are:

log-bin
server-id

and optionally:

log-bin-index
binlog-do-db
binlog-ignore-db

The explanation for each parameter is as follows:

log-bin: the base name for all the files created by the binary log

log-bin-index: gives the name of the binary log index file, which keeps a list of all bin-log files; defaults to whatever the log-bin basename is

server-id: each server must have a unique id for identification purposes

binlog-do-db, binlog-ignore-db: a list of databases to replicate or skip, respectively.

On the slave all we really need is:

server_id

and optionally:

replicate-do-db
replicate-ignore-db
replicate-do-table
replicate-ignore-table

read-only
log-slave-updates
skip-slave-start
...

I think these options are pretty self-explanatory. If you want to read more, see the Replication and Binary Logging Options and Variables documentation section on the MySQL website.

concurrent `mysqld` processes[edit]

At this point let's take a moment to focus on setting up the two mysqld servers without worrying about replication. Running Multiple MySQL Instances on One Machine has some good information. At this point we have a few options:

~~Compiling different binaries for each daemon we want to run with different settings.~~ (seems like overkill, and I don't want to have to keep up with many different binaries.)
Instantiating the same binary many times with different settings:
- ~~set on the command line~~
  - we would need a shell script to instantiate both daemons with different options because doing it manually is too much work and I forget things too easily
  - we can't set all the options we want on the commandline
- set in option files
  - we would need one option file for each daemon running
- ~~or by setting environment variables~~ (not permanent enough for me)

Each daemon will need unique settings for the following things:

--datadir
This is the most important! This path is where each daemon will keep its data. We can put each directory on a different physical disk to spread out the diskIO, or not.
--port / --bind-address
The port this daemon is bound to.

If we're going move the slave to another machine we need to substitute --bind-address for --port.
--socket
Path to the local unix socket.
~~--shared-memory-base-name~~
Used only on Windows
--pid-file
Path to the file which the daemon records it's process id.

In order to increase performance, it is suggested to spread the disk IO load over separate disks with this option:

--tmpdir

See http://dev.mysql.com/doc/refman/5.1/en/multiple-servers.html for more information.

Chapter 6 - A note about managing multiple servers[edit]

There are some scripts packaged with mysql that can help us have concurrent processes:

mysqld-multi

The mysqld_multi script is (as far as I can tell) a very basic wrapper around mysqld_safe. It loops and starts each server you tell it to.

mysqlmanager
Also you can manipulate/start/stop/etc. each server with mysqladmin -- assuming you give the script the correct socket so it knows which server to connect to.

Starting[edit]

After fiddling with this for a while, I have found the mysqld_multi script adequate for starting servers. The command goes something like this:

sudo mysqld_multi start

This will start all servers in the my.cnf file (which means both master and slave in our case.) Optionally you can put a "group number" as the last parameter. See the docs for an explanation.

Reporting what's going on[edit]

Either try:

sudo mysqld_multi report

or, (which I do a lot):

ps aux | grep mysqld

and sort through the processes.

Look carefully at the process list! -- there are processes from both the root and mysql user, as well as processes for both the mysqld binary and the mysqld_safe script.

Stopping[edit]

The mysqld_multi binary should stop the servers just like starting them, but I've found that sometimes it doesn't (for unknown reasons.) If this doesn't work:

sudo mysqld_multi stop

then try this (you'll have to stop each server independently):

sudo mysqladmin --socket=/var/run/mysqld/mysql.sock        shutdown
sudo mysqladmin --socket=/var/run/mysqld/mysql-slave.sock  shutdown

If nothing works, you can kill the processes, but I'm afraid that might interrupt a transaction in the database.

Connecting[edit]

If you want to connect to the mysql server, you have two options:

using the mysql command-line interface
using phpMyAdmin through the web-browser

command line[edit]

If you want to use the command line you can connect to the master like this:

mysql --user=$USERNAME --password=XXXXXX

or if you want to be prompted for a password

mysql --user=$USERNAME -p

You can simplify some of this by creating a ~/.my.cnf file that holds your settings. You can read more at MySQL's option files page. Here is a short sample:

# personal settings for MySQL programs

[client]
user            = daniel
password        = XXXXXXXXXXXXX

[mysql]
auto-rehash
line-numbers
pager           = more

Connecting to the slave requires the --socket flag:

mysql --user=$USERNAME -p --socket=/var/run/mysq/mysql-slave.sock

In practice normal users should never need to log onto the SLAVE database, the backup-bot should be the only person to ever log-on. If you don't have a good reason for connecting to the slave, don't.

phpMyAdmin[edit]

I think most of the servers have phpMyAdmin installed. Have a look at the Servers section on the Hardware page and then on the server's page -- hopefully there is a URL to phpMyAdmin. Also ask.

Chapter 7 - Getting things running[edit]

So far we have covered the background of replication, learned about options, and written a my.cnf configuration file but have not actually set anything up. Now comes the fun part - implementation.

The process can be broken down into sizable pieces:

set up the directories & permissions
configure the servers (if not already done)
start the servers
add a dedicated replication user to the master (with correct permissions)
lock the master
dump the master to an SQL file
load the slave from SQL file
connect the slave to the master
start slave
unlock master
debug/test

Backups[edit]

Locations[edit]

Scripts[edit]

Version 1[edit]

the first script written only dumped two databases. not sufficient for reliable backups.

#!/bin/bash

# ---------- options ------------------------
DBuser="XXXXX"
DBpassword="YYYYYYY"
DUMPDIR="/path/to/dumps.txt"
TIMESTAMP=`date +%a`


# ----------- backup colipedia ------------------
ewDBname="foo"
ewDUMPFILE="foo.$TIMESTAMP.sql"


# echo "Dumping $DBname to ... $DUMPFILE"
mysqldump $ewDBname -u $DBuser -p$DBpassword > $DUMPDIR/$ewDUMPFILE
# modified to use gzip instead of tar
gzip -c $DUMPDIR/$ewDUMPFILE > $DUMPDIR/$ewDUMPFILE.gz
# remove the huge sql file. 
rm $DUMPDIR/$ewDUMPFILE

# ----------- backup GONUTS ------------------
goDBname="bar"
goDUMPFILE="bar.$TIMESTAMP.sql"

#echo "Dumping $DB2name to ... $DUMPFILE2"
mysqldump $goDBname -u $DBuser -p$DBpassword > $DUMPDIR/$goDUMPFILE
gzip -c $DUMPDIR/$goDUMPFILE > $DUMPDIR/$goDUMPFILE.gz
rm $DUMPDIR/$goDUMPFILE

Version 2[edit]

this version checks (all) the databases and repairs them if necessary. run-time is about 6 hours, which is a little long.

#!/bin/bash
# --------------------------------------------------------------------
# This is a free shell script under GNU GPL version 2.0 or above
# Copyright (C) 2011
# --------------------------------------------------------------------
# Adapted for use by the Hu Laboratory by:
#  Daniel Renfro - Jan 2011
#  <bluecurio@gmail.com>
# --------------------------------------------------------------------

# some basic variables
USERNAME="XXXXXXx"             # USERNAME
PASSWORD="YYYYYYYYY"       # PASSWORD
HOST="localhost"                  # Hostname

# Linux bin paths, change this if it can not be autodetected via which command
MYSQL="$(which mysql)"
MYSQLDUMP="$(which mysqldump)"
MYSQLCHECK="$(which mysqlcheck)"
GZIP="$(which gzip)"

# Backup Dest directory, change this if you have someother location
DEST="/path/to/dumps.txt"

# Get hostname
#HOST="$(hostname)"
HOST='localhost'

# Get date in dd-mm-yyyy format
NOW="$(date +"%d-%m-%Y")"

# File to store current backup file
FILE=""

# Store list of databases
DBS=""

# DO NOT BACKUP these databases
SKIP="test"

# Get all database list first
if [ "$DBS" == "" ];
then
    # you can get a subset by changing the query like so:
    # DBS="$($MYSQL -u $USERNAME -h $HOST -p$PASSWORD -Bse 'show databases like "subtilis%"')"

    # or you can get ALL the databases
    DBS="$($MYSQL -u $USERNAME -h $HOST -p$PASSWORD -Bse 'show databases')"
fi


for db in $DBS
do
    skipdb=-1
    if [ "$SKIP" != "" ];
    then
	for i in $SKIP
	do
	    [ "$db" == "$i" ] && skipdb=1 || :
	done
    fi

    if [ "$skipdb" == "-1" ] ; then

        echo $db;

    	# check the database for consistency (time-consuming, but safer.)
    	$MYSQLCHECK -u $USERNAME -h $HOST -p$PASSWORD --check-only-changed --optimize --auto-repair --medium-check $db

    	# set a valid filename
		FILE="$DEST/$db.$HOSTNAME.$NOW.gz"

		# do all in one job in pipe,
		# connect to mysql using mysqldump for select mysql database
		# and pipe it out to gz file in backup dir :)
        $MYSQLDUMP -u $USERNAME -h $HOST -p$PASSWORD $db | $GZIP -9 > $FILE
    fi
done

Slow Queries[edit]

The slow query log usually is defined in /etc/my.cnf, and typically lives in /var/log/mysql/. It can be analyzed using the mk-query-digest from the maatkit software suite.

mk-query-digest /var/log/mysql/mysql-slow.log > mk-query-digest.slow.2011-07-12.out

Notes[edit]

Sandbox[edit]

The MySQL Sandbox stuff looks interesting. I installed this via cpan.

You can make a single sandbox:

cd ~daniel/working/mysql/sandbox/
make_sandbox ./mysql-5.1.51-osx10.5-x86.tar.gz

Or a replication sandbox

cd ~daniel/working/mysql/sandbox/
make_replication_sandbox ./mysql-5.1.51-osx10.5-x86.tar.gz

Links[edit]

Helpful links:

http://dev.mysql.com/doc/refman/5.0/en/replication-howto-existingdata.html