User:Catrope/SQL talk

From mediawiki.org

Indexes (AKA keys)[edit]

mysql> SHOW CREATE TABLE page;
[snip]
| page  | CREATE TABLE `page` (
  `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `page_namespace` int(11) NOT NULL DEFAULT '0',
  `page_title` varbinary(255) NOT NULL DEFAULT '',
  `page_restrictions` tinyblob NOT NULL,
  `page_counter` bigint(20) unsigned NOT NULL DEFAULT '0',
  `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `page_random` double unsigned NOT NULL DEFAULT '0',
  `page_touched` varbinary(14) NOT NULL DEFAULT '',
  `page_latest` int(8) unsigned NOT NULL DEFAULT '0',
  `page_len` int(8) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`page_id`),
  UNIQUE KEY `name_title` (`page_namespace`,`page_title`),
  KEY `page_random` (`page_random`),
  KEY `page_len` (`page_len`),
  KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`)
) ENGINE=InnoDB AUTO_INCREMENT=35610364 DEFAULT CHARSET=binary |

Index on a single field[edit]

ORDER BY without index[edit]

mysql> EXPLAIN SELECT * FROM page ORDER BY page_touched LIMIT 10;
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
|  1 | SIMPLE      | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647370 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
1 row in set (0.00 sec)

Bad signs:

  • type=ALL: full table scan
  • key=NULL: no index used
  • Using filesort

Query execution:

  • Retrieve all 26M rows
  • Sort them by page_touched
  • Return first 10

ORDER BY with index[edit]

mysql> EXPLAIN SELECT * FROM page ORDER BY page_len LIMIT 10;
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------+
|  1 | SIMPLE      | page  | index | NULL          | page_len | 4       | NULL |   10 |       |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------+
1 row in set (0.00 sec)

Filesort is gone, type=index, key=page_len, and rows=10

Query execution:

  • Look at page_len index (pre-sorted list of row pointers ordered by page_len)
  • Grab 10 row pointers off the top
  • Retrieve those rows and return them

WHERE without index[edit]

mysql> EXPLAIN SELECT * FROM page WHERE page_is_new=0 LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647485 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)

Query execution:

  • Iterate over all rows, in any order
  • For each row:
    • If page_namespace=0, return row and stop
    • If not, continue to next row
  • If no rows match, return empty result

Worst case: no rows match, entire table is scanned

WHERE with index[edit]

mysql> EXPLAIN SELECT * FROM page WHERE page_id=12345 LIMIT 1;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | page  | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

Seeing const is usually good

Query execution:

  • Use binary search to find the row with page_id=12345
    • This scans at most 1+log(26M) = 27 rows
  • If found, return the row, otherwise return an empty result

WHERE .. LIMIT N[edit]

mysql> EXPLAIN SELECT * FROM page WHERE page_id >= 12345 ORDER BY page_id LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | page  | range | PRIMARY       | PRIMARY | 4       | NULL | 13323844 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
1 row in set (0.00 sec)

This is a bug in MySQL: rows=13M is wrong. Note type=range

Query execution:

  • Use binary search to find the row with page_id=12345 (or the first one that's higher)
  • Return that row plus the next 9

Examples of indexed and unindexed queries[edit]

Assuming there is an index on foo:

  • Good: WHERE foo = 'abc'
  • Bad: WHERE foo != 'abc'
  • Good: WHERE foo > 'abc' ORDER BY foo (or >= , <= , < , BETWEEN)
  • Bad: WHERE foo > 'abc' ORDER BY bar
  • Good: WHERE foo LIKE 'abc%'
  • Bad: WHERE foo LIKE '%abc' (or anything else that isn't a prefix match)
  • Good: WHERE foo LIKE 'abc%' ORDER BY foo
  • Bad: WHERE foo LIKE 'abc%' ORDER BY bar
mysql> EXPLAIN SELECT * FROM image WHERE img_name='Flag_of_Canada.svg';
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | image | const | PRIMARY       | PRIMARY | 257     | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM image WHERE img_name != 'Flag_of_Canada.svg'
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | image | range | PRIMARY       | PRIMARY | 257     | NULL | 814940 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)

-- rows=400K is wrong
mysql> EXPLAIN SELECT * FROM image WHERE img_name > 'Flag_of_Canada.svg' ORDER BY img_name LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | image | range | PRIMARY       | PRIMARY | 257     | NULL | 407464 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)

-- You have to explicitly trigger a worst case here (img_name > 'z' is rare, img_name > 'A' is common)
mysql> EXPLAIN SELECT * FROM image WHERE img_name > 'z' ORDER BY img_sha1 LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | image | range | PRIMARY       | PRIMARY | 257     | NULL | 1150 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

-- rows=1144 is wrong
mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE 'Flag_of_%' ORDER BY img_name LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | image | range | PRIMARY       | PRIMARY | 257     | NULL | 1144 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE '%Canada.svg' ORDER BY img_name LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | image | index | NULL          | PRIMARY | 257     | NULL |   10 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE 'Flag%' ORDER BY img_sha1 LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | image | range | PRIMARY       | PRIMARY | 257     | NULL | 1144 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

Index on two fields[edit]

This works just like a phone book:

CREATE INDEX name ON phonebook (lastname, firstname);

Note that the order matters. We're sorting by lastname first, then by firstname:

lastname firstname
Roberts Daniel
Roberts John
Robertson Adam
Robertson John ... ...

Examples of indexed and unindexed queries[edit]

  • Good: ORDER BY lastname, firstname
  • Bad: ORDER BY firstname, lastname
  • Good: ORDER BY lastname DESC, firstname DESC
  • Bad: ORDER BY lastname ASC, firstname DESC
  • Good: WHERE lastname = 'Roberts' AND firstname = 'Daniel'
  • Bad: WHERE lastname = 'Roberts' OR firstname = 'Adam'
  • Good: WHERE lastname = 'Roberts' ORDER BY firstname
  • Bad: WHERE firstname = 'John' ORDER BY lastname
  • Good: WHERE lastname = 'Roberts' AND firstname > 'Daniel' ORDER BY firstname (or >= , <= , < , BETWEEN)
  • Bad: WHERE firstname = 'John' AND lastname > 'Roberts' ORDER BY lastname
  • Good: WHERE lastname = 'Roberts' AND firstname LIKE 'D%' ORDER BY firstname
  • Bad: WHERE firstname = 'John' AND lastname LIKE 'R%' ORDER BY lastname
mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace, page_title LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------+
|  1 | SIMPLE      | page  | index | NULL          | name_title | 261     | NULL |   10 |       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM page ORDER BY page_title, page_namespace LIMIT 10;
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
|  1 | SIMPLE      | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647899 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace DESC, page_title DESC LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------+
|  1 | SIMPLE      | page  | index | NULL          | name_title | 261     | NULL |   10 |       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------+
1 row in set (0.00 sec)

-- Note that ASC is the default, so this is really namespace ASC, title DESC
mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace, page_title DESC LIMIT 10;
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
|  1 | SIMPLE      | page  | ALL  | NULL          | NULL | NULL    | NULL | 26647905 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=0 AND page_title='Wikimania' LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+-------------+------+-------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref         | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+-------------+------+-------+
|  1 | SIMPLE      | page  | const | name_title    | name_title | 261     | const,const |    1 |       |
+----+-------------+-------+-------+---------------+------------+---------+-------------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=0 OR page_title='Wikimania' LIMIT 10;
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | page  | ALL  | name_title    | NULL | NULL    | NULL | 26647913 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)

-- rows is wrong, this one is quite efficient
mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=2 ORDER BY page_title LIMIT 10;
+----+-------------+-------+------+---------------+------------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key        | key_len | ref   | rows    | Extra       |
+----+-------------+-------+------+---------------+------------+---------+-------+---------+-------------+
|  1 | SIMPLE      | page  | ref  | name_title    | name_title | 4       | const | 4506840 | Using where |
+----+-------------+-------+------+---------------+------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

-- rows is wrong, this one is inefficient. The index is still used to sort by page_namespace though
mysql> EXPLAIN SELECT * FROM page WHERE page_title='Catrope' ORDER BY page_namespace LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | page  | index | NULL          | name_title | 261     | NULL |   10 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)

-- rows is wrong, this one is actually quite efficient
mysql> EXPLAIN SELECT * FROM page WHERE page_namespace = 0 AND page_title >= 'Wikimania' ORDER BY page_title LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+
|  1 | SIMPLE      | page  | range | name_title    | name_title | 261     | NULL | 681676 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM page WHERE page_title = 'Wikimania' AND page_namespace >= 1 ORDER BY page_namespace LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows     | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
|  1 | SIMPLE      | page  | range | name_title    | name_title | 261     | NULL | 13323968 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
1 row in set (0.00 sec)

-- rows is wrong, this one is actually quite efficient
mysql> EXPLAIN SELECT * FROM page WHERE page_namespace = 0 AND page_title LIKE 'W%' ORDER BY page_title LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+
|  1 | SIMPLE      | page  | range | name_title    | name_title | 261     | NULL | 610080 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM image WHERE img_user_text LIKE 'J%' ORDER BY img_timestamp;
+----+-------------+-------+-------+------------------------+------------------------+---------+------+--------+-----------------------------+
| id | select_type | table | type  | possible_keys          | key                    | key_len | ref  | rows   | Extra                       |
+----+-------------+-------+-------+------------------------+------------------------+---------+------+--------+-----------------------------+
|  1 | SIMPLE      | image | range | img_usertext_timestamp | img_usertext_timestamp | 257     | NULL | 112420 | Using where; Using filesort |
+----+-------------+-------+-------+------------------------+------------------------+---------+------+--------+-----------------------------+
1 row in set (0.01 sec)

Things you shouldn't do[edit]

  • Run queries that return a potentially unlimited number of rows
    • Retrieving millions of rows from the DB is slow
    • Sending them from the DB to PHP is slow
    • Processing them in PHP is slow and can lead to memory exhaustion
    • Sending the huge HTML result to the client is slow
    • So always use LIMIT
  • Run queries that scan a potentially unlimited number of rows
    • Like full table scans, and some of the stuff below
  • Run unindexed queries :)
    • Unindexed WHERE on a condition that is rarely false (e.g. WHERE rev_deleted=0) is sometimes fine
    • Filesort is never OK. ORDER BY must always be indexed
  • ORDER BY an expression is unindexed and therefore bad
  • Use OFFSET (or LIMIT n, m which is equivalent to LIMIT m OFFSET n)
    • Doesn't scale: LIMIT 10 OFFSET 5000 scans 5010 rows
    • Instead, use something like WHERE foo_id>=12345 for paging (be sure to use a unique index for this!)
  • Use COUNT(), SUM(), etc., they scan a potentially unlimited number of rows
    • MAX(indexed_field) with no WHERE is fast though
mysql> EXPLAIN SELECT * FROM revision WHERE rev_user_text='Catrope' AND rev_deleted=0 ORDER BY rev_timestamp LIMIT 10;
+----+-------------+----------+------+--------------------+--------------------+---------+-------+------+-------------+
| id | select_type | table    | type | possible_keys      | key                | key_len | ref   | rows | Extra       |
+----+-------------+----------+------+--------------------+--------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | revision | ref  | usertext_timestamp | usertext_timestamp | 257     | const | 1044 | Using where |
+----+-------------+----------+------+--------------------+--------------------+---------+-------+------+-------------+
1 row in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM image ORDER BY img_width/img_height LIMIT 10;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
|  1 | SIMPLE      | image | ALL  | NULL          | NULL | NULL    | NULL | 814945 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM image ORDER BY img_name LIMIT 10 OFFSET 5000;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
|  1 | SIMPLE      | image | index | NULL          | PRIMARY | 257     | NULL | 5010 |       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM image WHERE img_name LIKE 'F%';
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | image | range | PRIMARY       | PRIMARY | 257     | NULL | 71364 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT AVG(img_size) FROM image;
+----+-------------+-------+-------+---------------+----------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+------+--------+-------------+
|  1 | SIMPLE      | image | index | NULL          | img_size | 4       | NULL | 814946 | Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+--------+-------------+
1 row in set (0.04 sec)

-- Special case: img_size is indexed, so the row with the highest img_size in the entire table is easy to find
mysql> EXPLAIN SELECT MAX(img_size) FROM image;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT MAX(img_size) FROM image WHERE img_name LIKE 'F%';
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | image | range | PRIMARY       | PRIMARY | 257     | NULL | 71364 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
1 row in set (0.00 sec)