Architecture Summit 2014/Performance

From MediaWiki.org

Architecture Summit 2014/RFC clusters#Performance

James: Performance is important, both for editors and readers. RfC accepted by Mark.

(Ori just found out that he has been appointed "author" of this RFC; Nemo was the author before. An unstructured conversation starts, discussing best practices.)

Advice

Ori says the following are red flags and advice:

  1. Network requests: adding network requests to the default page load is a sin against God.
    1. Use init for absolute essentials then lazy load all/most other modules

Design pattern:

  • Create your extension with a split between a thin chrome module that provides the bare minimum needed for you to render the entry-point to other functionality provided by your extension and the modules that actually implement that functionality
  • Search for mw.loader.using() if you are not familiar with it!
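
A minimal sketch of the split described above, not an official recipe. The helper `createLazyOpener` and the module name `ext.example.dialog` are hypothetical; the only assumption about the loader is that it has a `using()` method returning a promise, as `mw.loader.using()` does.

```javascript
// Thin entry point: nothing heavy is requested until the user activates it.
// `loader` is assumed to behave like mw.loader (using() returns a promise).
// `moduleName` and `open` are supplied by the caller; all names here are
// illustrative only.
function createLazyOpener(loader, moduleName, open) {
    var pending = null; // remember the request so it is only issued once

    return function onActivate() {
        if (!pending) {
            pending = loader.using(moduleName);
        }
        // Run the real feature only after its module has arrived.
        return pending.then(open);
    };
}
```

In an init module this might be wired up roughly as `link.addEventListener('click', createLazyOpener(mw.loader, 'ext.example.dialog', showDialog))`, where `link` and `showDialog` are again placeholders. The default page load then only pays for the small init module.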

Trevor: cache fragmentation.

"If you load assets after the page has loaded, it's free" --> WRONG
  • Often leads to jankiness / flickering / rendering changes after text appears
  • Users (allegedly) don't start reading until the page has "settled down"
  • Make a rule? Not allowed to cause reflows on document ready?
  • Load interfaces:
    • At the same time as the rest of the page
    • Triggered by user interaction
  • Vector sidebar mentioned as an example of something that jumps around
  • Populating an already present and correctly sized bubble mentioned as a good example
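
As a rough illustration of the "already present and correctly sized bubble" case: the sketch below assumes the placeholder element is part of the initial HTML and that CSS has already given it its final width and height. The function name and the `aria-busy` convention are assumptions made for this example, not something from the discussion.

```javascript
// Fill a placeholder whose box size was reserved up front.
// Assumption: `slot` is a DOM element with fixed dimensions set in CSS,
// so swapping its text should not push surrounding content around.
function populateReservedSlot(slot, text) {
    slot.textContent = text;
    // Mark the slot as loaded; this attribute does not affect layout by itself.
    slot.setAttribute('aria-busy', 'false');
}
```

The contrast is with inserting a brand-new element after document ready, which forces the content below it to move.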

Quim: What about the process implications from the RFC? Should Ori be able to block a deployment?

Roan: I think he should be, but let's keep talking about frontend perf rather than process for now
Ori: This isn't about "what is the minimal effort to make Ori happy", it's about how we can make our users' experience better. Performance considerations are just a part of making users happy

James mentions some numbers

Ori: I don't want to set numbers as guidelines to prevent the argument of "I followed your guidelines therefore there are no performance issues".
James: Twitter/Facebook don't meet these numbers either
Quim: regressions might be taken into account
Matt: We need good metrics, and we need to know how to do things better
James: AJAXy pages like Twitter can be a performance improvement from the user's point of view
Inez: We tried this at Wikia and we didn't find evidence it was better. But we have some new insights and are considering rerunning that experiment
Trevor: It's a disruptive architectural change. Even if there's a lot of gain there, it's not low-hanging fruit
Steven: Our designers would LOVE if we did this. A great deal of the friction and lack of smooth workflows in our UX is due to having to send users back and forth from page to page, rather than have them perform tasks which can update endlessly on one page/in one UI frame. For example: compare the new page patrolling approach of putting a toolbar floated on every page to review, versus providing a review frame inside which the content updates.
Ori: Performance is only as strong as the weakest link (ResourceLoaderLanguageDataModule example); it's easy to break. Good performance is highly dependent on people being proactive and diligent, as opposed to external supervision
Nik: Monitor, identify, slap people on the wrist when they regress things, repeat several times a year? Is that OK?
Ori: That's my top priority right now
Timo: Like security, performance is not a feature, it mostly naturally flows from architecting things right. Lots of performance regressions come from things that are already architecturally unsound
Victor: Performance is a correctness issue, just like security

James's summary

  1. Adopt the init module pattern
  2. Avoid reflows (lots of good and bad examples, ask for help)
  3. Performance is often related to other code smells; think about root cause

Ori to product managers

  • If you find a spike of cranky, (seemingly) unrelated, unreasonable bugs in BZ (Bugzilla) --> probably a performance problem. People notice latency subconsciously; more latency makes them grumpy
    • Users notice latency even if they don't notice that they notice
    • Dan: Do we have the ability to regression-test performance from Jenkins or something?
    • Ori: We should provide feedback to devs, we're not doing it yet; it's coming
    • Steven: Launch new feature, is slow, angry bugs come in, we think it's probably slow, devs deny it's slow, how do we prove to devs it is slow? How do we know?
      • Firebug net tab is a good start.
    • Ori: There's no burden of proof, if users are angry something is wrong. Re diagnosis, I need to build tools
    • Ori: Historic tendency for deployment to be a flag on the hill, you have to defend it to the death and if your code gets undeployed you lose. We need to move to a model where we can undeploy things
    • Matt: Don't be afraid to ask for help cross-team
    • Timo: Re tools, don't trust them blindly. Often trend lines are more meaningful than snapshots. We can recommend a list of tools, but we should caution against trusting them blindly
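
One way to read Timo's "trend lines over snapshots" point is to smooth a series of timing samples before drawing conclusions. The sketch below uses a rolling median; the function name and the sample numbers are made up for illustration and were not part of the discussion.

```javascript
// Rolling median over a series of timing samples (e.g. page load times in ms).
// A single slow sample barely moves the median, so the output follows the
// trend rather than individual outliers.
function rollingMedian(samples, windowSize) {
    var out = [];
    for (var i = 0; i + windowSize <= samples.length; i++) {
        // Copy the window and sort numerically (the default sort is lexical).
        var win = samples.slice(i, i + windowSize).sort(function (a, b) {
            return a - b;
        });
        var mid = Math.floor(win.length / 2);
        out.push(win.length % 2 ? win[mid] : (win[mid - 1] + win[mid]) / 2);
    }
    return out;
}

// Hypothetical samples with one outlier (900):
// rollingMedian([100, 105, 900, 110, 115], 3) returns [105, 110, 115]
```

A snapshot taken at the third sample would suggest a disaster; the smoothed series shows a slow upward drift instead.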
Ori had a second point but forgot it

On shipping, deployment, recovering from bad first impressions

Nik: We gracefully undeployed CirrusSearch in November, went back and fixed problems. Went to the community that suffered, apologized, said "here's what we're gonna do to fix it" and they were cool with it.
Trevor: VE was deployed prematurely due to pressure. Was functional but had issues. Spent the next 3-6 months stabilizing, that was on our roadmap. Should have done a broader deployment /after/ stabilization. Product managers need to negotiate with stakeholders to build in time for limited deployments, then stabilization, then wider deployments. Stabilization also allows rush jobs to be cleaned up, which helps performance too. It's the first thing that goes out of the window when you're just trying to get something to work
Nik: I think BetaFeatures is a wonderful way to deploy things in a limited way
Andrew: I worry BetaFeatures will become a graveyard
James: I'm PM for BetaFeatures. There's a 6-month sundown clause. James promises Greg he'll document that
Steven: Re Trevor, arbitrary deadlines are poison. This is a larger conversation.
Trevor: We don't need to do secretive grand unveilings, there's no value in that for us. Iterate instead
Brian: First impressions matter. Things getting released prematurely hurts reputation
Trevor: I think Vector was done well
James: Some people still don't use Vector because they never looked at it since the first iteration. So I suppose we should only ever ship perfect code :P

To Do

Wrapping up re reflows. Roan says we should write some minimal guidelines for what features should *at least* do, like not cause reflows from document ready, lazy-load on user interaction, etc. Avoid people complying with letter but not spirit, but there are some hard and fast (-ish) rules we can put out there.