Understanding Customer Perceived Latency

Measuring web performance through the end user's lens

When we talk about a web app's performance, there are many metrics that come into the picture, like load time, time to first byte, first contentful paint, etc.

Out of these, some metrics describe network latency, the time a web page spends waiting for the connection to be established & for files/scripts to be served over the network, while others are specific to what happens after that, for example time to first paint, first contentful paint, time to interactive & first input delay.

[Figure: Navigation Timing overview, https://www.w3.org/TR/navigation-timing/timing-overview.png]

All of this happens before we hit the page's onLoad handler, but it does not include the latency that end users may actually perceive.

A real user's perceived latency will, most of the time, be higher than loadEventEnd after hitting a URL in the browser.
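For reference, most of these standard timings can be read in the browser via the Performance API. Here is a minimal sketch using Navigation Timing & Paint Timing entries (values are in milliseconds relative to the start of navigation):

```js
// Read standard load metrics from the browser's Performance API.
const [nav] = performance.getEntriesByType('navigation');
if (nav) {
  console.log('Time to first byte:', nav.responseStart);
  console.log('DOMContentLoaded end:', nav.domContentLoadedEventEnd);
  console.log('loadEventEnd:', nav.loadEventEnd);
}

// Paint timings: first-paint & first-contentful-paint.
for (const paint of performance.getEntriesByType('paint')) {
  console.log(paint.name, paint.startTime);
}
```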

What prevents the web page from being ready, fully loaded & interactable after the content has been downloaded by the browser?

A busy CPU (main) thread at the time of page load is what causes a page to be rendered fully or partially yet still not be usable or interactable. Let us look at some of the reasons for that:

1. There is a lot of JS computation happening on page load. Take a look at these recommendations to avoid it: https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/javascript-startup-optimization/ (a small sketch of one such mitigation follows this list).
[Image credit: Patrick Meenan]

2. The web app is image-heavy. In such a case you will see images being rendered with a curtain-drop effect & the page is not usable.

3. The CPU is busy doing something else, so it cannot keep up with the event loop & misses the cycles needed to complete the page rendering. (This is mostly environmental & something we, as web app developers, have little control over.) This is what makes time to interactive difficult to capture as a real user monitoring metric & to base business decisions on: real user devices vary drastically in RAM/CPU/OS configuration, browser, platform & operating conditions like % CPU busy, etc.
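For the first reason, one common mitigation (the linked guide covers several) is to keep non-critical work off the startup path. A minimal sketch, assuming requestIdleCallback where available & a plain setTimeout fallback elsewhere; initAnalytics is a hypothetical non-critical task, not an API from any particular library:

```js
// Defer non-critical startup work so it does not block first interaction.
// requestIdleCallback is not supported in every browser, so fall back to setTimeout.
const scheduleIdleWork = window.requestIdleCallback
  ? (fn) => window.requestIdleCallback(fn, { timeout: 2000 })
  : (fn) => setTimeout(fn, 200);

scheduleIdleWork(() => {
  // Hypothetical non-critical work: analytics bootstrapping, prefetching, etc.
  initAnalytics();
});
```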

Time to first interactive vs Time to consistently interactive

[Image credit: Patrick Meenan]

Since time to interactive was coined as a web performance metric, there has been debate about whether we should be interested in time to first interactive (the first time the CPU thread is available to respond to the user's interaction) or time to consistently interactive (the point after page load from which the CPU stays consistently idle & is available to respond to the user's interactions without any delay).

The major difference between the two also lies in the algorithm used to calculate them. We will look at the algorithm at a high level in a short while.

Vendors who provide time to interactive have chosen one or the other as their standard.

How to capture time to interactive?

Different vendors, like Google & Akamai, capture time to interactive in different ways & for different purposes. The time to interactive numbers that we get in Chrome DevTools' Lighthouse are restricted to a constant platform & environment.

There are also vendors that provide time to interactive for Real User Monitoring (RUM), like Akamai's Boomerang.

At a high level, this is what the algorithm looks like:

https://github.com/WICG/time-to-interactive#definition

Please note that since this algorithm waits for a 5-second CPU-idle period before reporting time to interactive, the number it reports is closer to time to consistently interactive.

If we instead report the first time after the reference point at which we see the network & CPU idle, that is time to first interactive.
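To make the quiet-window idea concrete, here is a hedged sketch of one way to approximate it in the browser. It is not the official WICG polyfill: it ignores in-flight network requests, assumes Long Tasks API support & simply reports the end of the last long task (bounded below by first contentful paint) once 5 seconds pass without any long task:

```js
// Hedged sketch of a quiet-window TTI estimate (not the official WICG polyfill).
// Candidate TTI = end of the last long task before a 5-second period with no long tasks,
// lower-bounded by first contentful paint. Network quietness is ignored for simplicity.
function observeTTI(report) {
  const QUIET_WINDOW_MS = 5000;
  let lastLongTaskEnd = 0;
  let fcp = 0;
  let timer;

  const fcpObserver = new PerformanceObserver((list) => {
    const entry = list.getEntriesByName('first-contentful-paint')[0];
    if (entry) fcp = entry.startTime;
  });
  fcpObserver.observe({ type: 'paint', buffered: true });

  const longTaskObserver = new PerformanceObserver((list) => {
    for (const task of list.getEntries()) {
      lastLongTaskEnd = Math.max(lastLongTaskEnd, task.startTime + task.duration);
    }
    scheduleCheck();
  });
  // Long tasks that ran before this observer was registered are missed (a sketch limitation).
  longTaskObserver.observe({ entryTypes: ['longtask'] });

  function scheduleCheck() {
    clearTimeout(timer);
    timer = setTimeout(() => {
      // 5 seconds without a long task: report the candidate & stop observing.
      report(Math.max(lastLongTaskEnd, fcp));
      longTaskObserver.disconnect();
      fcpObserver.disconnect();
    }, QUIET_WINDOW_MS);
  }

  scheduleCheck();
}

// Usage: observeTTI((tti) => console.log('Estimated TTI (ms):', tti));
```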

Many vendors prefer time to consistently interactive as a number to report, due to its stability.
https://docs.google.com/document/d/1GGiI9-7KeY3TPqS3YT271upUVimo-XiL5mwWorDUD4c/edit

Tools to capture time to interactive

  1. Google’s Lighthouse (Lab data) https://developers.google.com/web/tools/lighthouse/
  2. Akamai’s Boomerang (RUM) https://developer.akamai.com/tools/boomerang/

Custom Implementation

For my use case, I tried a custom implementation of the algorithm.

Support for detecting active network requests, the behavior of timers & support for the Long Tasks API vary across browsers. To have RUM in place supporting the majority of browsers, we fall back to alternatives where an API is not supported.

1. We look for 4 consecutive 50 ms intervals where we see:
   1.1 No resource being downloaded.
   1.2 CPU busy % under 25.
   1.3 No disabled submit buttons on the page.
   1.4 The user is not already interacting with the page.
2. If we see a window of 4 such cycles where all the above conditions are met, we declare TTI as the start of that window.
3. ELSE we move the window ahead.
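Below is a hedged sketch of how such a polling check might look in the browser. It is illustrative only, not the actual implementation: the CPU-busy figure is approximated by timer drift, "resource being downloaded" is approximated by growth of Resource Timing entries, & the DOM/interaction checks follow the list above.

```js
// Illustrative sketch of the polling approach described above (not a published library).
// Assumptions: CPU busyness is approximated by timer drift, resource activity by growth
// of Resource Timing entries, and the thresholds follow the 4 x 50 ms window & 25% CPU
// figure from the list above.
const INTERVAL_MS = 50;
const REQUIRED_QUIET_CYCLES = 4;
const CPU_BUSY_THRESHOLD = 25; // percent

let quietCycles = 0;
let lastResourceCount = performance.getEntriesByType('resource').length;
let lastTick = performance.now();
let userInteracted = false;

// Any of these events counts as the user already interacting with the page.
['click', 'keydown', 'scroll', 'touchstart'].forEach((evt) =>
  addEventListener(evt, () => { userInteracted = true; }, { passive: true, capture: true })
);

const poll = setInterval(() => {
  const now = performance.now();
  // If the timer fired much later than expected, the CPU was busy during the interval.
  const drift = Math.max(0, now - lastTick - INTERVAL_MS);
  const cpuBusyPercent = Math.min(100, (drift / INTERVAL_MS) * 100);
  lastTick = now;

  // New Resource Timing entries since the last tick are treated as "resources downloading".
  const resourceCount = performance.getEntriesByType('resource').length;
  const downloading = resourceCount !== lastResourceCount;
  lastResourceCount = resourceCount;

  const disabledSubmit = document.querySelector(
    'button[disabled], input[type="submit"][disabled]'
  );
  const interactedThisCycle = userInteracted;
  userInteracted = false;

  if (!downloading && cpuBusyPercent < CPU_BUSY_THRESHOLD && !disabledSubmit && !interactedThisCycle) {
    quietCycles += 1;
    if (quietCycles >= REQUIRED_QUIET_CYCLES) {
      clearInterval(poll);
      // TTI candidate = start of the quiet window.
      console.log('Estimated time to interactive (ms):', now - REQUIRED_QUIET_CYCLES * INTERVAL_MS);
    }
  } else {
    quietCycles = 0; // move the window ahead
  }
}, INTERVAL_MS);
```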

Learnings

"TTI can be difficult to track in the wild." Time to interactive is not something that, as a developer, I would be very interested in on its own, since platform/environmental factors can inflate the numbers. At the same time, it is good to see how end users perceive the web page through such metrics via real user monitoring. For a business decision-maker, customer perceived latency (CPL) makes it clear where to invest in terms of performance improvements.

