User:Sharihareswara (WMF)/Performance guidelines

These performance guidelines aim to help MediaWiki developers avoid common performance problems that slow down MediaWiki and the site.

What To Do (summary)
Meta-TODOs:
 * Be prepared to be surprised by the performance of your code - we are famously bad at predicting this.
 * Be scrupulous about measuring performance and know where time is being spent
 * When latency is identified, take responsibility for it and make it a priority
 * (you have the best idea of usage patterns & what to test)
 * Performance is often related to other code smells; think about the root cause.

TODOs:
 * Use EXPLAIN & MYSQL DESCRIBE query to find out which indexes are affected by a specific query
 * Adopt the init module pattern
 * Avoid reflows (lots of good and bad examples, ask for help)

Details
In the worst case, an action that is sort of expensive but valuable, if it misses hitting the cache, should take at most 5 seconds. Strive for two seconds.


 * example: saving a new edit to a page
 * example: rendering a video thumbnail

Long-running queries
long-running queries that do reads should be on a dedicated server like we do with analytics, whether the read is longrunning or you have repeatable read - 1 transaction open for more than (ideally) seconds: bad idea on production servers. MySQL has to keep various rows open in indices, makes things slow in general. For other queries on other tables! ** There are research databases - use those.

Numbers: relative to perf server & to how often it will be run. (also more esoteric considerations)

Writes
Unless you're dealing with a tiny table, you need to index writes (similarly to reads). Watch out for deadlocks and for lock-wait timeouts (e.g., doing an update or a delete by primary query, rather than some secondary key).

gap locking - there are antipatterns (could be link tables, etc.) Entity doesn't have to be userID. BETTER: a few ways.
 * entity value attribute - key value map per entity - user id, pref id, pref value - it's tempting to - when you change prefs, delete all the rows for that userID, then reinsert new ones.
 * Have a JSON blob (hard to join on indiv rows)
 * change the query so you only delete by the primary key (which means you have to SELECT it first)
 * Locking select? don't do that.
 * So: select first, then decide what to do.... when you INSERT, you can INSERT IGNORE - if the row already exists, meh.

Mixing DB and non-DB transactions

 * Careful in mixing ops on an external thing like SWIFT or another database with a db transaction. Be careful. This is also about locking order. Every time you update or delete or insert anything, ask what you are locking, are there other callers, what are you doing after making the query all the way to making the commit?


 * Every web request, everything should happen in a transaction

Front-end performance (JavaScript and the browser)

 * Adopt the init module pattern
 * Avoid reflows (lots of good and bad examples, ask for help)

Wikimedia-specific gotchas

 * image loading?
 * cache & cachebusting?

axes:
 * avoid excessive contention
 * avoid locking things in an unnecessary order, espec slow, committing at the end
 * counter column you increment every time something happens. DON'T increment it in a hook before you parse a page for 10 seconds.
 * avoid things that, on cache miss, are ridiculously slow. (People think that it's ok to count * and put memcache in front of it, but misses and timeouts eat a lot of resources. Caches are not magic.)
 * Make your queries such that uncached is okay


 * What other hooks and weird extensions are happening?


 * We don't store files in a database. We store files Somewhere Else. Usually SWIFT for Wikimedia, but you could make the case to put it somewhere else.
 * If you are storing blobs, ... see if you can reuse current sys. In general, store resources under names that won't change.
 * We made the mistake of storing files under their "pretty names" - if you click Move, it ought to be fast (renaming title), but other versions of the file also have to be renamed. And Swift is distributed, so you can't just change the metadata on one volume of one system.

When to use the job queue
If the thing to be done is fast (~5 milliseconds) or needs to happen synchronously, then do it synchronously. Otherwise, put it in the job queue.

The job queue does things with the database.

Does a thing happen synchronously or async (maybe triggered by a user action)? * MW developers have sometimes thought things need to be synchronous - e.g., file uploads - and then squid timed out. So we had to move that to async * HTMLCacheUpdate used to be partly synchronous (a few backlinks, would do them immediately) - but then there were deadlocks. Users do not want to see deadlock notifications! Got changed maybe 2012 * GWToolset

We use Redis to do most of the heavy lifting - to store queue itself. * The runner part is still in-house;