User:Catrope/SQL talk

Indexes (AKA keys)
mysql> SHOW CREATE TABLE page; [snip] `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT, `page_namespace` int(11) NOT NULL DEFAULT '0', `page_title` varbinary(255) NOT NULL DEFAULT '', `page_restrictions` tinyblob NOT NULL, `page_counter` bigint(20) unsigned NOT NULL DEFAULT '0', `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0', `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0', `page_random` double unsigned NOT NULL DEFAULT '0', `page_touched` varbinary(14) NOT NULL DEFAULT '', `page_latest` int(8) unsigned NOT NULL DEFAULT '0', `page_len` int(8) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`page_id`), UNIQUE KEY `name_title` (`page_namespace`,`page_title`), KEY `page_random` (`page_random`), KEY `page_len` (`page_len`), KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`) ) ENGINE=InnoDB AUTO_INCREMENT=35610364 DEFAULT CHARSET=binary |
 * page | CREATE TABLE `page` (

ORDER BY without index
mysql> EXPLAIN SELECT * FROM page ORDER BY page_touched LIMIT 10; ++-+---+--+---+--+-+--+--++ ++-+---+--+---+--+-+--+--++ ++-+---+--+---+--+-+--+--++ 1 row in set (0.00 sec) Bad signs:
 * id | select_type | table | type | possible_keys | key | key_len | ref  | rows     | Extra          |
 * 1 | SIMPLE     | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647370 | Using filesort |
 * type=ALL: full table scan
 * key=NULL: no index used
 * Using filesort

Query execution:
 * Retrieve all 26M rows
 * Sort them by page_touched
 * Return first 10

ORDER BY with index
mysql> EXPLAIN SELECT * FROM page ORDER BY page_len LIMIT 10; ++-+---+---+---+--+-+--+--+---+ ++-+---+---+---+--+-+--+--+---+ ++-+---+---+---+--+-+--+--+---+ 1 row in set (0.00 sec) Filesort is gone, type=index, key=page_len, and rows=10
 * id | select_type | table | type | possible_keys | key      | key_len | ref  | rows | Extra |
 * 1 | SIMPLE     | page  | index | NULL          | page_len | 4       | NULL |   10 |       |

Query execution:
 * Look at page_len index (pre-sorted list of row pointers ordered by page_len)
 * Grab 10 row pointers off the top
 * Retrieve those rows and return them

WHERE without index
mysql> EXPLAIN SELECT * FROM page WHERE page_is_new=0 LIMIT 1; ++-+---+--+---+--+-+--+--+-+ ++-+---+--+---+--+-+--+--+-+ ++-+---+--+---+--+-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key | key_len | ref  | rows     | Extra       |
 * 1 | SIMPLE     | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647485 | Using where |

Query execution:
 * Iterate over all rows, in any order
 * For each row:
 * If page_namespace=0, return row and stop
 * If not, continue to next row
 * If no rows match, return empty result

Worst case: no rows match, entire table is scanned

WHERE with index
mysql> EXPLAIN SELECT * FROM page WHERE page_id=12345 LIMIT 1; ++-+---+---+---+-+-+---+--+---+ ++-+---+---+---+-+-+---+--+---+ ++-+---+---+---+-+-+---+--+---+ 1 row in set (0.00 sec) Seeing const is usually good
 * id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra |
 * 1 | SIMPLE     | page  | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |

Query execution:
 * Use binary search to find the row with page_id=12345
 * This scans at most 1+log(26M) = 27 rows
 * If found, return the row, otherwise return an empty result

WHERE .. LIMIT N
mysql> EXPLAIN SELECT * FROM page WHERE page_id >= 12345 ORDER BY page_id LIMIT 10; ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ 1 row in set (0.00 sec) This is a bug in MySQL: rows=13M is wrong. Note type=range
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows     | Extra       |
 * 1 | SIMPLE     | page  | range | PRIMARY       | PRIMARY | 4       | NULL | 13323844 | Using where |

Query execution:
 * Use binary search to find the row with page_id=12345 (or the first one that's higher)
 * Return that row plus the next 9

Examples of indexed and unindexed queries
Assuming there is an index on foo:
 * Good: WHERE foo = 'abc'
 * Bad: WHERE foo != 'abc'
 * Good: WHERE foo > 'abc' ORDER BY foo (or >=, <= , < , BETWEEN)
 * Bad: WHERE foo > 'abc' ORDER BY bar
 * Good: WHERE foo LIKE 'abc%'
 * Bad: WHERE foo LIKE '%abc' (or anything else that isn't a prefix match)
 * Good: WHERE foo LIKE 'abc%' ORDER BY foo
 * Bad: WHERE foo LIKE 'abc%' ORDER BY bar

mysql> EXPLAIN SELECT * FROM image WHERE img_name='Flag_of_Canada.svg'; ++-+---+---+---+-+-+---+--+---+ ++-+---+---+---+-+-+---+--+---+ ++-+---+---+---+-+-+---+--+---+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra |
 * 1 | SIMPLE     | image | const | PRIMARY       | PRIMARY | 257     | const |    1 |       |

mysql> EXPLAIN SELECT * FROM image WHERE img_name != 'Flag_of_Canada.svg' ++-+---+---+---+-+-+--++-+ ++-+---+---+---+-+-+--++-+ ++-+---+---+---+-+-+--++-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows   | Extra       |
 * 1 | SIMPLE     | image | range | PRIMARY       | PRIMARY | 257     | NULL | 814940 | Using where |

-- rows=400K is wrong mysql> EXPLAIN SELECT * FROM image WHERE img_name > 'Flag_of_Canada.svg' ORDER BY img_name LIMIT 10; ++-+---+---+---+-+-+--++-+ ++-+---+---+---+-+-+--++-+ ++-+---+---+---+-+-+--++-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows   | Extra       |
 * 1 | SIMPLE     | image | range | PRIMARY       | PRIMARY | 257     | NULL | 407464 | Using where |

-- You have to explicitly trigger a worst case here (img_name > 'z' is rare, img_name > 'A' is common) mysql> EXPLAIN SELECT * FROM image WHERE img_name > 'z' ORDER BY img_sha1 LIMIT 10; ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows | Extra                       |
 * 1 | SIMPLE     | image | range | PRIMARY       | PRIMARY | 257     | NULL | 1150 | Using where; Using filesort |

-- rows=1144 is wrong mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE 'Flag_of_%' ORDER BY img_name LIMIT 10; ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows | Extra       |
 * 1 | SIMPLE     | image | range | PRIMARY       | PRIMARY | 257     | NULL | 1144 | Using where |

mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE '%Canada.svg' ORDER BY img_name LIMIT 10; ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows | Extra       |
 * 1 | SIMPLE     | image | index | NULL          | PRIMARY | 257     | NULL |   10 | Using where |

mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE 'Flag%' ORDER BY img_sha1 LIMIT 10; ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ ++-+---+---+---+-+-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows | Extra                       |
 * 1 | SIMPLE     | image | range | PRIMARY       | PRIMARY | 257     | NULL | 1144 | Using where; Using filesort |

Index on two fields
This works just like a phone book: CREATE INDEX name ON phonebook (lastname, firstname); Note that the order matters. We're sorting by lastname first, then by firstname:

Examples of indexed and unindexed queries

 * Good: ORDER BY lastname, firstname
 * Bad: ORDER BY firstname, lastname
 * Good: ORDER BY lastname DESC, firstname DESC
 * Bad: ORDER BY lastname ASC, firstname DESC
 * Good: WHERE lastname = 'Roberts' AND firstname = 'Daniel'
 * Bad: WHERE lastname = 'Roberts' OR firstname = 'Adam'
 * Good: WHERE lastname = 'Roberts' ORDER BY firstname
 * Bad: WHERE firstname = 'John' ORDER BY lastname
 * Good: WHERE lastname = 'Roberts' AND firstname > 'Daniel' ORDER BY firstname (or >=, <= , < , BETWEEN)
 * Bad: WHERE firstname = 'John' AND lastname > 'Roberts' ORDER BY lastname
 * Good: WHERE lastname = 'Roberts' AND firstname LIKE 'D%' ORDER BY firstname
 * Bad: WHERE firstname = 'John' AND lastname LIKE 'R%' ORDER BY lastname

mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace, page_title LIMIT 10; ++-+---+---+---++-+--+--+---+ ++-+---+---+---++-+--+--+---+ ++-+---+---+---++-+--+--+---+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key        | key_len | ref  | rows | Extra |
 * 1 | SIMPLE     | page  | index | NULL          | name_title | 261     | NULL |   10 |       |

mysql> EXPLAIN SELECT * FROM page ORDER BY page_title, page_namespace LIMIT 10; ++-+---+--+---+--+-+--+--++ ++-+---+--+---+--+-+--+--++ ++-+---+--+---+--+-+--+--++ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key | key_len | ref  | rows     | Extra          |
 * 1 | SIMPLE     | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647899 | Using filesort |

mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace DESC, page_title DESC LIMIT 10; ++-+---+---+---++-+--+--+---+ ++-+---+---+---++-+--+--+---+ ++-+---+---+---++-+--+--+---+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key        | key_len | ref  | rows | Extra |
 * 1 | SIMPLE     | page  | index | NULL          | name_title | 261     | NULL |   10 |       |

-- Note that ASC is the default, so this is really namespace ASC, title DESC mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace, page_title DESC LIMIT 10; ++-+---+--+---+--+-+--+--++ ++-+---+--+---+--+-+--+--++ ++-+---+--+---+--+-+--+--++ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key | key_len | ref  | rows     | Extra          |
 * 1 | SIMPLE     | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647905 | Using filesort |

mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=0 AND page_title='Wikimania' LIMIT 10; ++-+---+---+---++-+-+--+---+ ++-+---+---+---++-+-+--+---+ ++-+---+---+---++-+-+--+---+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key        | key_len | ref         | rows | Extra |
 * 1 | SIMPLE     | page  | const | name_title    | name_title | 261     | const,const |    1 |       |

mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=0 OR page_title='Wikimania' LIMIT 10; ++-+---+--+---+--+-+--+--+-+ ++-+---+--+---+--+-+--+--+-+ ++-+---+--+---+--+-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key | key_len | ref  | rows     | Extra       |
 * 1 | SIMPLE     | page  | ALL  | name_title    | NULL | NULL    | NULL | 26647913 | Using where |

-- rows is wrong, this one is quite efficient mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=2 ORDER BY page_title LIMIT 10; ++-+---+--+---++-+---+-+-+ ++-+---+--+---++-+---+-+-+ ++-+---+--+---++-+---+-+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key       | key_len | ref   | rows    | Extra       |
 * 1 | SIMPLE     | page  | ref  | name_title    | name_title | 4       | const | 4506840 | Using where |

-- rows is wrong, this one is inefficient. The index is still used to sort by page_namespace though mysql> EXPLAIN SELECT * FROM page WHERE page_title='Catrope' ORDER BY page_namespace LIMIT 10; ++-+---+---+---++-+--+--+-+ ++-+---+---+---++-+--+--+-+ ++-+---+---+---++-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key        | key_len | ref  | rows | Extra       |
 * 1 | SIMPLE     | page  | index | NULL          | name_title | 261     | NULL |   10 | Using where |

-- rows is wrong, this one is actually quite efficient mysql> EXPLAIN SELECT * FROM page WHERE page_namespace = 0 AND page_title >= 'Wikimania' ORDER BY page_title LIMIT 10; ++-+---+---+---++-+--++-+ ++-+---+---+---++-+--++-+ ++-+---+---+---++-+--++-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key        | key_len | ref  | rows   | Extra       |
 * 1 | SIMPLE     | page  | range | name_title    | name_title | 261     | NULL | 681676 | Using where |

mysql> EXPLAIN SELECT * FROM page WHERE page_title = 'Wikimania' AND page_namespace >= 1 ORDER BY page_namespace LIMIT 10; ++-+---+---+---++-+--+--+-+ ++-+---+---+---++-+--+--+-+ ++-+---+---+---++-+--+--+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key        | key_len | ref  | rows     | Extra       |
 * 1 | SIMPLE     | page  | range | name_title    | name_title | 261     | NULL | 13323968 | Using where |

-- rows is wrong, this one is actually quite efficient mysql> EXPLAIN SELECT * FROM page WHERE page_namespace = 0 AND page_title LIKE 'W%' ORDER BY page_title LIMIT 10; ++-+---+---+---++-+--++-+ ++-+---+---+---++-+--++-+ ++-+---+---+---++-+--++-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key        | key_len | ref  | rows   | Extra       |
 * 1 | SIMPLE     | page  | range | name_title    | name_title | 261     | NULL | 610080 | Using where |

mysql> EXPLAIN SELECT * FROM image WHERE img_user_text LIKE 'J%' ORDER BY img_timestamp; ++-+---+---+++-+--++-+ ++-+---+---+++-+--++-+ ++-+---+---+++-+--++-+ 1 row in set (0.01 sec)
 * id | select_type | table | type | possible_keys          | key                    | key_len | ref  | rows   | Extra                       |
 * 1 | SIMPLE     | image | range | img_usertext_timestamp | img_usertext_timestamp | 257     | NULL | 112420 | Using where; Using filesort |

Things you shouldn't do
mysql> EXPLAIN SELECT * FROM revision WHERE rev_user_text='Catrope' AND rev_deleted=0 ORDER BY rev_timestamp LIMIT 10; ++-+--+--+++-+---+--+-+ ++-+--+--+++-+---+--+-+ ++-+--+--+++-+---+--+-+ 1 row in set (0.01 sec)
 * Run queries that return a potentially unlimited number of rows
 * Retrieving millions of rows from the DB is slow
 * Sending them from the DB to PHP is slow
 * Processing them in PHP is slow and can lead to memory exhaustion
 * Sending the huge HTML result to the client is slow
 * So always use LIMIT
 * Run queries that scan a potentially unlimited number of rows
 * Like full table scans, and some of the stuff below
 * Run unindexed queries :)
 * Unindexed WHERE on a condition that is rarely false (e.g. WHERE rev_deleted=0) is sometimes fine
 * Filesort is never OK. ORDER BY must always be indexed
 * ORDER BY an expression is unindexed and therefore bad
 * Use OFFSET (or LIMIT n, m which is equivalent to LIMIT m OFFSET n)
 * Doesn't scale: LIMIT 10 OFFSET 5000 scans 5010 rows
 * Instead, use something like WHERE foo_id>=12345 for paging (be sure to use a unique index for this!)
 * Use COUNT, SUM, etc., they scan a potentially unlimited number of rows
 * MAX(indexed_field) with no WHERE is fast though
 * id | select_type | table   | type | possible_keys      | key                | key_len | ref   | rows | Extra       |
 * 1 | SIMPLE     | revision | ref  | usertext_timestamp | usertext_timestamp | 257     | const | 1044 | Using where |

mysql> EXPLAIN SELECT * FROM image ORDER BY img_width/img_height LIMIT 10; ++-+---+--+---+--+-+--+++ ++-+---+--+---+--+-+--+++ ++-+---+--+---+--+-+--+++ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key | key_len | ref  | rows   | Extra          |
 * 1 | SIMPLE     | image | ALL  | NULL          | NULL | NULL    | NULL | 814945 | Using filesort |

mysql> EXPLAIN SELECT * FROM image ORDER BY img_name LIMIT 10 OFFSET 5000; ++-+---+---+---+-+-+--+--+---+ ++-+---+---+---+-+-+--+--+---+ ++-+---+---+---+-+-+--+--+---+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows | Extra |
 * 1 | SIMPLE     | image | index | NULL          | PRIMARY | 257     | NULL | 5010 |       |

mysql> EXPLAIN SELECT COUNT(*) FROM image WHERE img_name LIKE 'F%'; ++-+---+---+---+-+-+--+---+--+ ++-+---+---+---+-+-+--+---+--+ ++-+---+---+---+-+-+--+---+--+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows  | Extra                    |
 * 1 | SIMPLE     | image | range | PRIMARY       | PRIMARY | 257     | NULL | 71364 | Using where; Using index |

mysql> EXPLAIN SELECT AVG(img_size) FROM image; ++-+---+---+---+--+-+--++-+ ++-+---+---+---+--+-+--++-+ ++-+---+---+---+--+-+--++-+ 1 row in set (0.04 sec)
 * id | select_type | table | type | possible_keys | key      | key_len | ref  | rows   | Extra       |
 * 1 | SIMPLE     | image | index | NULL          | img_size | 4       | NULL | 814946 | Using index |

-- Special case: img_size is indexed, so the row with the highest img_size in the entire table is easy to find mysql> EXPLAIN SELECT MAX(img_size) FROM image; ++-+---+--+---+--+-+--+--+--+ ++-+---+--+---+--+-+--+--+--+ ++-+---+--+---+--+-+--+--+--+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key | key_len | ref  | rows | Extra                        |
 * 1 | SIMPLE     | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |

mysql> EXPLAIN SELECT MAX(img_size) FROM image WHERE img_name LIKE 'F%'; ++-+---+---+---+-+-+--+---+-+ ++-+---+---+---+-+-+--+---+-+ ++-+---+---+---+-+-+--+---+-+ 1 row in set (0.00 sec)
 * id | select_type | table | type | possible_keys | key     | key_len | ref  | rows  | Extra       |
 * 1 | SIMPLE     | image | range | PRIMARY       | PRIMARY | 257     | NULL | 71364 | Using where |