There is no such thing as an average user experience. No two experiences of a web page will ever be the same. A host of factors impact a user’s experience, including their physical location, time of day, network congestion, whether they are on a wired or wireless connection, their device and its specs, personalization of the application – the list goes on and on. Is a user accessing an application with Chrome from Brisbane, Australia on a wired connection going to have the same experience as a user accessing the application with Safari from Miami, FL in the US across their mobile provider’s network? I highly doubt it. Reporting on the average across a single parameter such as location, browser, or connectivity does not take into account the large number of other variables that also influence the user experience. In experiments, typically only one independent variable is changed at a time to ensure fair and valid results. With multiple independent variables changing at once, isolating the impact of any single variable is impossible. So why are we in the web performance community focused on defining the average user experience when the list of independent variables is so large and in constant flux?
2. Most metrics don’t represent real life
Most performance testing is synthetic, which means that most metrics are synthetic and cannot accurately quantify the true end user experience. Organizations rely on synthetic data for most of their decision making. When running synthetic tests, it is important to mimic real world conditions as closely as possible in terms of the locations of testing agents, device types, and connectivity. If the majority of users access an application with Chrome from Europe using a mobile device, testing should not be conducted from the United States on Internet Explorer using a cable modem connection.
Real User Monitoring (RUM) is attempting to change the reliance on synthetic data, but is still in its infancy. User Timing and Resource Timing are not yet supported by all browsers. As more browsers adopt these specifications, more decisions will be made based on RUM data.
3. Averages obscure the truth
Averages are commonly used to report on web performance metrics, such as the average page size or average page load time. I’m not sure why averages are used – maybe it’s because it is a metric that most people understand... but remember secret #1. Ilya Grigorik recently wrote about the myth of averages when it comes to page size. If the average page size is a myth, can’t the same be said for any average?
Averages make sense when the data forms a nice bell curve, but in the web performance space it is rare that we see such a distribution. Often our data looks like the image below.
I would venture to say there is no such thing as an average when it comes to web performance given the variability of the user experience. As a result, the truth is often obscured. Averages can be misleading based on outliers or based on the number of test runs performed. If only three test runs were performed, is that really indicative of average performance?
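To make the effect of an outlier concrete, here is a minimal sketch in Python. The load times are made-up illustrative values, not measurements, and the `percentile` helper is my own nearest-rank implementation, which is only one of several common percentile definitions.

```python
import math
import statistics

# Hypothetical page load times in milliseconds (illustrative values only).
load_times_ms = [1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 12200]

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

mean_ms = statistics.mean(load_times_ms)
median_ms = statistics.median(load_times_ms)
p90_ms = percentile(load_times_ms, 90)

# Count how many runs were actually slower than the reported average.
slower_than_mean = sum(t > mean_ms for t in load_times_ms)

print(f"mean:   {mean_ms} ms")
print(f"median: {median_ms} ms")
print(f"p90:    {p90_ms} ms")
print(f"runs slower than the mean: {slower_than_mean} of {len(load_times_ms)}")
```

With these sample values a single slow run pulls the mean to 3200 ms, even though nine of the ten runs finished in 2600 ms or less.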
For example, say you conducted an A/B test for a configuration change to determine its impact on page timings. The average across ten runs was 4.1 seconds for both configurations. Looking just at the average, one might say there is no difference in the user experience, but the individual test runs tell a different story.
Looking at the individual data points, it is harder to say that the performance is the same for configurations A and B. In configuration B response times go up for 8 out of 10 users, while with configuration A response times are significantly higher for 2 out of 10 users. Which is better?
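The pattern described above can be reproduced with a short Python sketch. The timings below are hypothetical values I picked so that both configurations average exactly 4.1 seconds; they are not the data from the actual test, and `summarize` is just an illustrative helper.

```python
import statistics

# Hypothetical timings in milliseconds, chosen so both means equal 4100 ms (4.1 s).
# These are not the data from the test described in the text.
config_a_ms = [3000] * 8 + [8500] * 2   # most runs fast, two much slower
config_b_ms = [4400] * 8 + [2900] * 2   # most runs somewhat slower, two faster

def summarize(samples):
    """Return the mean, median, and slowest run for a list of timings."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }

summary_a = summarize(config_a_ms)
summary_b = summarize(config_b_ms)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(f"config {name}: {summary}")
```

Under these assumed numbers the means are identical, yet the medians (3000 ms versus 4400 ms) and the slowest runs (8500 ms versus 4400 ms) differ considerably. Which configuration is "better" depends on whether you care more about the typical user or the worst case, and the average alone cannot answer that.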
4. User perception matters more than numbers
Traditional web performance metrics have included items such as time to first byte (TTFB) and page load times. The problem is that these metrics don’t accurately reflect how the end user perceives page loading. The user doesn’t have to (and most likely won’t) wait until all items on the page have loaded before they begin interacting with the page. There has been a shift in the web performance community to focus on metrics that have more to do with page rendering and interactivity, as these are more indicative of the user experience. Understanding how a user perceives the page load is much more important than how long it takes for all resources to actually load. Take a look at this side-by-side comparison of two pages loading. Which do you think loads faster?
Would you be surprised to hear that they both load in the exact same amount of time? Yet it feels like the J.Crew page loads faster (at least I think it does).
Humans have memory bias and perceive time differently. We have a tendency to perceive things as taking longer than they actually do; then, when recalling an event, we think it took even longer than we originally perceived it to. The bottom line is, if a user perceives a page as loading slowly, that’s all that matters.
5. Metrics can lie
Measurements and metrics can be manipulated to present anything in a good or bad light. If the average isn’t showing improvement, try reporting on the median or a percentile. If start render times aren’t showing a good enough gain, look at TTFB or page load times. Not all pages are showing an improvement? Present results for only the home page. In web performance there are many metrics that matter; reporting on a single metric doesn’t show the whole picture. But when some metrics show improvement and others don’t, how do you make an informed decision?
When presented with performance results, ask questions and ask for additional metrics – especially if you are only presented with a limited set of data (Tip: always ask for full histograms to make judgments). While it is frustrating to have to explain to customers why one page out of ten tested didn’t show the expected results, I always prefer to share more data rather than less when conducting A/B or performance tests.
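If nobody hands you a histogram, building a rough one from raw timings takes only a few lines. The following is a minimal Python sketch, assuming you have the individual measurements in milliseconds; the sample values, the `text_histogram` name, and the 1000 ms bucket width are all illustrative choices of mine.

```python
from collections import Counter

def text_histogram(samples_ms, bucket_ms=1000):
    """Group timings into fixed-width buckets and return one text row per bucket."""
    counts = Counter((t // bucket_ms) * bucket_ms for t in samples_ms)
    rows = []
    # Walk every bucket from the fastest to the slowest so empty gaps stay visible.
    for start in range(min(counts), max(counts) + bucket_ms, bucket_ms):
        n = counts.get(start, 0)
        rows.append(f"{start:>6}-{start + bucket_ms - 1:<6} ms | {'#' * n} ({n})")
    return rows

# Hypothetical load times in milliseconds (illustrative values only).
samples = [1800, 1900, 2100, 2200, 2400, 2600, 2900, 3100, 5200, 12200]

for row in text_histogram(samples):
    print(row)
```

Empty buckets are printed on purpose: the gap between the main cluster and a lone slow run is exactly the kind of detail a single summary number hides.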
In the coming weeks we will share more details on how to find the truth in the maze of web performance.