Wikimedia Performance Team/Perceived Performance

The field of web performance tends to assume that faster is always better, and has produced many guidelines, such as RAIL, that promote absolute, universal thresholds for what feels instantaneous, fast or slow. Over the years, many academic studies have looked into performance perception, and we attempt here to provide a meaningful literature review of this body of work, to shed some light on what is known, which common beliefs are myths, and what remains to be studied.

Magic numbers
Specific duration values are often cited in performance guidelines, and they come up regularly in academia, but their origin always traces back to essays that offered no research to support those values. One paper in particular, written by Robert B. Miller in 1968, is the source of the most popular magic numbers; it comes up constantly and has been used as the basis for many performance guidelines, including RAIL.

Miller's numbers are pulled out of thin air: his influential paper is only an essay, with no original research to back up the dogmatic magic numbers that have proven so popular. Miller develops some intellectually appealing arguments to explain his point of view, which is probably why his essay became so popular, but none of those arguments is demonstrated through experiments. Miller also describes ideas that look bizarre nowadays, such as delaying error messages on purpose to let users give up on their train of thought naturally, rather than being abruptly stopped by an instantaneous error.

The second most popular source of magic numbers is Jakob Nielsen, who, like Miller, positions himself as an expert on the matter and repackages Miller's magic numbers into a more modern, yet equally arbitrary and unproven, form, filtering out the wildest of Miller's theories. Nielsen's essays are probably popular for their simplicity and intellectual appeal, but they put forward magic numbers that are not backed by any research.

Study quality
The majority of studies on performance perception are of limited quality. People tend to quote only the one-line finding, but the nature of the studies that "proved" those statements often leaves much to be desired.

Many of them are dated, looking at waiting times in increments of dozens of seconds or even minutes. Such waiting durations were relevant at the beginning of the web, but people's expectations have changed greatly over time, and it seems far-fetched to transpose how people reacted to waiting 30 seconds for a web page to load on a desktop computer in the early 2000s onto the current experience of mobile web browsing.

Other studies were conducted on entirely different mediums than the web, making it questionable to treat their results as universal psychological phenomena. How can the behavior of people waiting in line at a bank in the pre-internet era translate to the way people experience web performance?

Another frequent offender is the use of fake browsers and fake mobile devices. Conclusions are drawn about intricate interactions with the medium, but subjects aren't using the real thing, let alone in a real-life context.

And the most common weakness of all is that many of these studies are conducted in a lab with subjects who are all students from the same university. Young people with access to higher education are not a representative sample of the human population, and their tech savviness in particular introduces biases that make it hard to accept these studies' results as universal truths that apply to everyone.

Time perception
While time perception doesn't tell us directly about the positive or negative feeling associated with a waiting experience, it's useful to know how granular people's perception can be when deciding whether a specific amount of performance improvement is worth pursuing.

At equal duration, people seem to under-estimate a black screen's duration compared to a waiting animation, and they tend to over-estimate short durations and under-estimate long ones. However, that study was done on a fake mobile device, using an unusual loading animation, in the context of loading an app: a context for which people probably have de-facto expectations of behavior, based on the other apps they use.

Context matters
The busier people are physically, the shorter their attention windows on their mobile devices become.

While WPO stats is full of real-world studies correlating web performance improvements with better business KPIs, these studies always take place in highly competitive contexts, where people have plenty of alternatives to pick from should the site at hand display sub-par performance. While at first glance we might consider that Wikipedia doesn't have direct competition for the in-depth information people seek on it, it is still competing for people's attention.

Asking users about performance
A real-world study showed a correlation between the frequency of backend performance issues and user reports of bad performance. This was used to establish a baseline performance target: improvements were made until user reports of bad performance stopped. This black-box approach is effective at gauging users' real opinion of performance, but by itself it doesn't help target where the improvements need to be made.
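
As an illustration only, here is a minimal sketch of what such a correlation could look like in practice, assuming weekly counts of backend performance alerts and of user reports of bad performance. All names and numbers below are hypothetical, not data from the study.

```typescript
// Hypothetical sketch of the black-box approach described above:
// correlate weekly counts of backend performance alerts with weekly
// counts of user reports of bad performance. All data are made up.

// Pearson correlation coefficient between two equal-length series.
function pearson(xs: number[], ys: number[]): number {
  const mean = (v: number[]) => v.reduce((a, b) => a + b, 0) / v.length;
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < xs.length; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
}

// Made-up weekly counts over the same ten weeks.
const backendAlerts = [42, 38, 51, 7, 5, 6, 40, 4, 3, 5];
const userReports = [9, 7, 11, 1, 0, 2, 8, 0, 1, 1];

// A strong positive correlation suggests that user reports track real
// backend regressions; reports staying near zero after fixes is what
// would serve as the baseline target described above.
console.log(pearson(backendAlerts, userReports).toFixed(2)); // close to 1
```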

When asked to rate their experience of loading sites over HTTP/1 and HTTP/2, people were unable to feel the difference. However, the objective difference in that study was between 20 and 40ms, which might be below the granularity that people can actually perceive.
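
For context on where such objective numbers come from, here is a minimal sketch of how a page load duration can be measured in the browser with the standard Navigation Timing API. The logging and protocol bucketing shown are our own illustration, not the methodology of that study.

```typescript
// Minimal sketch: measuring page load duration with the Navigation
// Timing Level 2 API, and noting which HTTP version served the page.
// This shows how objective differences of a few dozen milliseconds
// are typically captured; it is not the cited study's setup.
const [nav] = performance.getEntriesByType(
  'navigation',
) as PerformanceNavigationTiming[];

if (nav) {
  // Time from the start of navigation until the load event completed.
  const loadTime = nav.loadEventEnd - nav.startTime;
  // nextHopProtocol reports the transport actually used, e.g.
  // "http/1.1" or "h2", allowing samples to be bucketed by protocol.
  console.log(`${nav.nextHopProtocol}: ${loadTime.toFixed(0)} ms`);
}
```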