Talk:Wikimedia Performance Team/Backend performance

Good & bad examples
Leave 'em here! Sharihareswara (WMF) (talk) 16:56, 5 May 2014 (UTC)
 * hoo has https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/client/resources/wikibase.client.linkitem.init.js as a good example of lazy loading - it used to be a script, "a big JS module in Wikibase that would have loaded ~90 KiB of JavaScript in the wikipedias". Hoo is finding it to add here. Sharihareswara (WMF) (talk) 23:01, 7 May 2014 (UTC)
 * https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/877b838aee/client/resources/Resources.php - hoo says, "note that it has *huge* dependencies on other things not usually loaded in client at all. that's the actual point, introducing dependencies" Sharihareswara (WMF) (talk) 23:04, 7 May 2014 (UTC)
 * https://github.com/wikimedia/mediawiki-extensions-examples/blob/ae0aac5f9/Example/Example.php#L104 has good examples of ResourceLoader modules. Krinkle (talk) 13:06, 9 May 2014 (UTC)
 * Migration for existing non-ResourceLoader-using extensions (bad example): ResourceLoader/Developing with ResourceLoader Krinkle (talk) 13:31, 9 May 2014 (UTC)

Bad example for "We want to deliver CSS and JavaScript fast": Extension:SyntaxHighlight GeSHi, before it was fixed – it used to put the styles directly in the <head> section of the page HTML. Matma Rex (talk) 13:17, 9 May 2014 (UTC)
 * TimedMediaHandler has historically had a lot of problems with aggressively preloading CSS/JS modules, not sure if that's been cleaned up yet. Need to dig for specific examples. --brion (talk) 15:18, 9 May 2014 (UTC)

It's kind of hard to provide examples for "We are massively cached!" that would be understandable, but I guess these each provide some kind of a bad example plus a fix for some kind of a cache. You could probably search Bugzilla for 'cache' for more :) Matma Rex (talk) 13:17, 9 May 2014 (UTC)
 * These are good -- the notion that HTML output may sit around for a long time and still needs to be supported by the CSS and JS is a basic one to hammer in. Things where old JS/CSS hang around are in some ways more obvious, but stale HTML can be insidious! --brion (talk) 15:17, 9 May 2014 (UTC)

On latency: some operations may have surprisingly variable network latency, such as looking up image files when InstantCommons is enabled. There are some ways to manage this: --brion (talk) 15:23, 9 May 2014 (UTC)
 * first, be aware of which code paths are meant to always be fast (DB, memcache) and which may be slow (fetching File info or spam blacklists that might be cross-wiki and go over the internet)
 * when creating a code path that may be intermittently slow, DOCUMENT THIS FACT
 * be careful not to pile on requests -- for instance, an external search engine might normally be fast but respond slowly under poor conditions; if requests bottleneck on it, all the web servers can end up tied up waiting.
 * Consider breaking operations into smaller pieces that can be run separately
 * Alternatively, consider running operations in parallel -- this can be tricky, though; we don't have good primitives for doing multiple HTTP fetches at once right now (see the sketch below this list)
 * Parsoid has parallel HTTP, though, using curl_multi. Superm401 - Talk 03:51, 10 May 2014 (UTC)
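A minimal sketch of the curl_multi idea (hypothetical URLs and options; not Parsoid's actual code). The point is that the batch costs roughly as much wall-clock time as the slowest response instead of the sum of all responses:

<syntaxhighlight lang="php">
// Fetch several HTTP resources in parallel with curl_multi.
$urls = [
	'https://example.org/api/one',   // hypothetical endpoints
	'https://example.org/api/two',
	'https://example.org/api/three',
];

$multi = curl_multi_init();
$handles = [];
foreach ( $urls as $url ) {
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	curl_setopt( $ch, CURLOPT_TIMEOUT, 5 ); // cap how long a slow service can hold us
	curl_multi_add_handle( $multi, $ch );
	$handles[$url] = $ch;
}

// Drive all transfers until every handle has finished.
do {
	$status = curl_multi_exec( $multi, $running );
	if ( $running ) {
		curl_multi_select( $multi ); // wait for activity instead of busy-looping
	}
} while ( $running && $status === CURLM_OK );

$results = [];
foreach ( $handles as $url => $ch ) {
	$results[$url] = curl_multi_getcontent( $ch );
	curl_multi_remove_handle( $multi, $ch );
	curl_close( $ch );
}
curl_multi_close( $multi );
</syntaxhighlight>

Even with parallelism, the per-handle timeout above still matters: without it, one hung external service can keep the whole web request waiting.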

Good example (from an in-progress change) of not poisoning the cache with request-specific data (when the cache is not split on that variable). Background: mw.cookie uses MediaWiki's cookie settings, so client-side developers don't have to think about them; these settings are passed via the ResourceLoader startup module. Issue: however, it doesn't use Manual:$wgCookieSecure (this is documented as not supported), because its default value ('detect') varies by the request protocol, and the startup module does not vary by protocol. Thus, the first hit could poison the module with data that would be inapplicable to all other requests. Superm401 - Talk 03:51, 10 May 2014 (UTC)
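A minimal sketch of the general pitfall (hypothetical hook handler and variable name; this is not the actual mw.cookie change). The exported value depends on the protocol of whichever request happens to fill the module cache first, but the cache key does not vary by protocol, so that value is then served to everyone:

<syntaxhighlight lang="php">
// Runs while building ResourceLoader config that ends up in the startup module.
$wgHooks['ResourceLoaderGetConfigVars'][] = function ( array &$vars ) {
	// Bad: this depends on whether *this particular* request used HTTPS.
	// Because the startup module's cache is not split on protocol, the first
	// request to populate the cache poisons it for all later requests.
	$vars['wgExampleCookieSecure'] = ( WebRequest::detectProtocol() === 'https' );

	// Better: only export values that are identical for every request that can
	// be served from this cache entry, and detect protocol-dependent settings
	// client-side (or split the cache on that variable).
	return true;
};
</syntaxhighlight>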

Failure to ensure that new or upgraded extensions function properly with other core extensions
It should be required that all upgraded or new extensions that permit the addition of content visible to the public operate with the revision deletion/suppression module, and that any actions related to content creation be included in the logs reported to checkusers - before installation on any non-testing project. This should be a mandatory criterion before installing, even as a test example, on any "real" project; failure to do this has resulted in extremely inappropriate content addition or difficulty for checkusers to identify and block vandals. AFT5 did not have this ability designed in, and required significant re-engineering to fix the problem; after that, a promise was made not to release future extensions on production wikis, even as tests, until the ability to revision delete/suppress and checkuser was demonstrated. Then Flow was released without the ability to checkuser contributions, or to revision delete/suppress. (Incidentally, the reverse is also true - any actions taken to revision delete/suppress any form of content addition needs to show up in the deletion and/or suppression logs.)

I am certain there are other core extensions with which anything new needs to be able to interact appropriately; these are the two I'm most familiar with, so I'm giving this example. Risker (talk) 01:22, 8 May 2014 (UTC)
 * Risker, thank you so much for leaving this detailed comment! I think you are absolutely right that MediaWiki and MediaWiki extension developers need to consider revision deletion/suppression compliance and the other criteria and tasks you mentioned. However, Performance guidelines is about *how fast* we deliver content to users, not about security concerns like the one you have mentioned. Therefore I am going to copy and paste your comment onto the talk page of Security for developers/Architecture, and I have already brought it to the attention of Chris Steipp, the Wikimedia Foundation software security expert. Thank you again! Sharihareswara (WMF) (talk) 14:59, 9 May 2014 (UTC)

What to do

 * Work with your product managers/dev manager/yourself to understand general performance targets before you start architecting your system. For example, a user-facing application might have an acceptable latency of 200 ms, while a database might need something like 20 ms or less, especially if further access is decided by the results of previous queries. You don't want to prematurely optimize, but you should understand whether your targets are physically possible.

General Principles

 * Always consider 99th-percentile numbers rather than averages. In other words, you don't want half of your users to have a good experience, you want all of them to. So you need to look at the slowest 1% of samples (the 99th percentile) to really understand performance (see the sketch below).
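A small sketch of why (hypothetical helper, not an existing MediaWiki function): with the timings below, the mean is about 35 ms and looks fine, but one user in ten actually waited 220 ms; the 99th percentile surfaces that.

<syntaxhighlight lang="php">
// Return (approximately) the p-th percentile of a set of timing samples.
function percentile( array $samples, $p = 0.99 ) {
	sort( $samples );
	$index = (int)ceil( $p * count( $samples ) ) - 1;
	return $samples[ max( 0, $index ) ];
}

$timingsMs = [ 12, 15, 14, 220, 13, 16, 14, 15, 17, 13 ];
echo 'mean: ' . array_sum( $timingsMs ) / count( $timingsMs ) . " ms\n"; // ~34.9 ms
echo 'p99:  ' . percentile( $timingsMs ) . " ms\n";                      // 220 ms
</syntaxhighlight>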

Backend

 * You must consider the cache characteristics of your underlying systems and adjust your testing methodology accordingly. For example, if your database has a 4 GB cache, you'll need to make sure that cache is cold before each test run, for instance by first reading through 4 GB of unrelated data (see the sketch below).
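A minimal sketch of what "make the cache cold" can look like in practice (hypothetical connection, table, and sizes; the details depend heavily on the database and its eviction policy):

<syntaxhighlight lang="php">
// Before a benchmark run, stream through roughly as much unrelated data as the
// database's cache holds (here ~4 GB), so the data under test is no longer hot.
$pdo = new PDO( 'mysql:host=localhost;dbname=bench', 'bench_user', 'secret' );
$pdo->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
// Unbuffered, so PHP does not try to hold gigabytes of results in memory.
$pdo->setAttribute( PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false );

$stmt = $pdo->query( 'SELECT payload FROM filler_rows' ); // ~4 GB of unrelated rows
$bytes = 0;
while ( ( $row = $stmt->fetch( PDO::FETCH_NUM ) ) !== false ) {
	$bytes += strlen( $row[0] ); // touch each row so it is actually read
}
printf( "Read %.1f GB of filler data before starting the measured queries\n", $bytes / 1e9 );
</syntaxhighlight>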


 * Particularly with databases, but really in general, performance is heavily dependent on the size of the data you are storing (and caching) -- make sure you do your testing with realistic data sizes.


 * Spinning disks are really slow; use cache or solid state storage whenever you can. However, as the data size grows, the advantage of solid state (avoiding seek times) is reduced.