Database optimization

Database optimization is an important consideration in MediaWiki coding, particularly when the code is intended for deployment on Wikimedia projects.

General
ORDER BY expression --> filesort == bad. Use COUNT, SUM, GROUP BY, etc.; there is no limit on rows scanned. MAX/MIN of indexed field on entire table is okay. In some cases, denormalize for performance. Add information duplicated from other tables. Use summary tables, counter tables, cache tables, etc. Think about how the DB will run your query. Add indexes where needed. Batch queries (when it makes sense).

Extensions generally shouldn't add fields to core tables. Instead, they should make their own table(s), and JOIN them if necessary.

Profiling
When in doubt, don't make assumptions; profile.

Select only needed columns
Unless all columns are needed, avoid 'SELECT * FROM ...' as this consumes resources transferring extraneous data. Select only the needed columns.

WHERE
Avoid full table scans, except in edge cases in which it would actually improve performance since almost everything is going to be selected anyway.

Indices
Weigh the advantages of an index in making queries more efficient against the efficiency losses of having to change the index when insertions, deletions, and updates are done. Generally, you shouldn't run unindexed queries. WHERE on rarely false conditions is usually okay. Unindexed ORDER BY (filesort) is never okay.

LIMIT
Use LIMIT to limit number of rows returned. When possible, also limit the number of rows scanned. Remember, LIMIT 10 OFFSET 5000 scans 5010 rows. If you can, use WHERE foo_id >= 12345 instead.

JOIN
Use JOINs rather than looping SELECT queries.

Calculations
Do calculations in such a way that the indexes are still being used.

Multiple row operations
Use multiple row operations to avoid looped INSERT and UPDATE queries.

Communication
If you have questions on this topic that are not answered by the documentation, you may wish to go on #wikimedia-tech and talk to binasher's successor (since he's gone now) or Preilly, both of whom are experts on Wikimedia and performance. If possible, be prepared to explain which queries are involved in the situation. Roan Kattouw, another MySQL optimization expert, answers questions on the wikitech-l mailing list and is sometimes available at #mediawiki.