Manual:Database layout/MySQL Optimization/Tutorial

WikiMedia MySQL Optimization Tutorial
To many MediaWiki developers, SQL query performance and optimization is shrouded in mystery. Most know that there are efficient and inefficient queries, and that if they write an inefficient query, it will either be noticed during code review, or it will be noticed because it takes down a wiki, which will prompt an ops person to fix the breakage and yell at the developer that caused it. But few people seem to really understand how query performance works. How can you tell if a query is inefficient? How do you write efficient queries, and avoid inefficient ones? If so few people know this, it must be this difficult, mysterious thing, right? Fortunately, you don't have to be Domas or Tim to understand this. If you understand how a phone book works, you can learn this too.

This tutorial will cover the basics of how database engines in general, and MySQL specifically, execute different kinds of queries, and explain why certain queries are executed more efficiently than others and what role indexes play in this process. We will demonstrate better practices by writing efficient queries, and showing you how to use tables and indexes so they facilitate efficient queries, and discuss common pitfalls that result in inefficient queries and how to address them. We will also demonstrate how to obtain a query analysis from MySQL and how to make sense of it.

This will be a technical tutorial aimed at MediaWiki core and extension developers who want to avoid writing bad queries and reviewers that want to catch them. Only basic knowledge of SQL is needed to participate in the tutorial; and people who know nothing about things like PHP, MediaWiki internals or server-side database tweaking should be able to follow just fine.

Suggest a Query to be Used in the Demo
Below, add a link to a query you want to see optimized in the tutorial demo.

ContributionScores query
Extension:ContributionScores polls the wiki database to locate contributors with the highest contribution volume - this has NOT been tested on a high-volume wiki. The extension is intended for fledgling Wikis looking to add a fun metric for Contributors to see how much they are helping out.

It is used at translatewiki.net and occasionally causes (very) slow queries there. Example query: SELECT /* Erdemaslancan */ user_id, user_name, user_real_name, page_count, rev_count, page_count+SQRT(rev_count-page_count)*2 AS wiki_rank FROM `bw_user` u JOIN ((SELECT rev_user, COUNT(DISTINCT rev_page) AS page_count, COUNT(rev_id) AS rev_count FROM `bw_revision` WHERE rev_user NOT IN (SELECT ug_user FROM `bw_user_groups` WHERE ug_group='bot') GROUP BY rev_user ORDER BY page_count DESC LIMIT 50) UNION (SELECT rev_user, COUNT(DISTINCT rev_page) AS page_count, COUNT(rev_id) AS rev_count FROM `bw_revision` WHERE rev_user NOT IN (SELECT ug_user FROM `bw_user_groups` WHERE ug_group='bot') GROUP BY rev_user ORDER BY rev_count DESC LIMIT 50)) s ON (user_id=rev_user) ORDER BY wiki_rank DESC LIMIT 50;
 * 1) Time: 120525  6:51:18
 * 2) User@Host: twn[twn] @ localhost []
 * 3) Query_time: 28.124669  Lock_time: 0.000105 Rows_sent: 50  Rows_examined: 18860849

Uses:
 * Special page
 * Includable special page