I had a fantastic time hosting our talk, Going Beyond DevEx Metrics, with two incredible guests: Chas Mastin, Senior Engineering Manager from YouTube, and Jon Kern, Co-author of the Agile Manifesto from Adaptavist. Both are true enthusiasts and experienced practitioners when it comes to building and “protecting” metrics that deliver magical user experiences. Here’s the webinar recording, and below are a few of my reflections.
Watch the 10-minute recap video!
Here are the main highlights from our discussion—I hope you find them valuable!
Please note that Chas is not speaking on behalf of YouTube or Google. Instead, he shares his personal experiences across his career, which have certainly been influenced by his time working at YouTube in recent years.
Focus on what truly matters: the users’ actual experiences and the value they gain while using your product, even down to the level of endorphins. That means the responses that drive reward and motivation, like clicking a link and expecting it to load instantly, as well as those that sustain long-term satisfaction. These are the experiences your software can strive to deliver and optimize for.
Now, let’s dive into this topic with an example from video: creating experiences for millions of users, especially since video makes up over 80% of all internet traffic.
You really have to think about the human beings experiencing the video. It starts with each individual user. What we want is for them to have a connection—with the creator, with a great movie, or with amazing educational content. That connection creates all the good chemicals we want: serotonin and oxytocin. We want people to have these great experiences. But what happens if there's buffering? What happens if playback fails? What happens if an ad comes on and messes everything up? What happens is frustration—it’s cortisol, it’s adrenaline, it’s all these bad chemicals and experiences we want to avoid. - Chas
You don't add quality at the end. Like you don't add performance at the end. If you look deeper, those are parts of the very beginning of understanding: What do we need to achieve for our users? - Jon
When chosen wisely, metrics are more than just numbers—they reflect user delight and, ultimately, the quality of your software.
Quality, to me, has to start with the human beings. It starts with the people. Then we use proxy metrics to determine how people are experiencing the video. If I could detect people's endorphin levels, I would—but I can’t. Instead, I have these proxies. From there, we derive metrics that reflect delivery issues, device issues, individual stream issues, or encoding problems that can happen. – Chas
Identify key metrics that capture the quality of user experience—both positive ones to elevate and protect, and negative ones to minimize or avoid.
Latency is the most important thing—specifically, join latency on a video. When you click on a video on YouTube, you expect it to start as instantly as possible. The magic happens with the speed—it’s just boom. It starts, and it starts all the time. - Chas
Latency is a leading indicator of a problem. When there’s a problem, start latency is maybe the first thing we think about. – Chas
Latency is also one of the easiest metrics to reason about. You can sit there and more easily correlate it with other factors, whereas rebuffering is very noisy. With 3 billion people on the planet using YouTube, there are countless reasons for rebuffering to happen. But when everyone starts experiencing join latency at the same time, you can say, ‘Hey, we understand why this is happening,’ especially when conducting experiments or making tradeoffs. – Chas
With your leading indicator as your service level indicator (SLI), set a clear threshold for the experiences you want to protect; this becomes your service level objective (SLO). From there, you can build on it to create a service level agreement (SLA). Chas recommended the book ‘Site Reliability Engineering’ for learning more about this approach.
So, if you have all of these proxies for how users experience your software, how do you think about them, and how do you manage them? The secret to this actually comes from the SRE culture. - Chas
So, I'm going to draw on this pad of paper here. I'm going to draw a time series. Everything is a time series because we don’t really know the truth unless we have the context for it. In this case, we’re looking at, let’s say, CPU usage within a particular service. This line here represents our CPU usage, and that is our service level indicator. It tells us where we are within that service over time. - Chas
Then, we might put a horizontal line on our graph. That horizontal line is our service level objective. It means when we go over that line, we’ve got a problem. We’ve spiked our CPU, and that correlates to bad user experience. If I could detect people’s dopamine and endorphin levels, I would, but I can’t. Instead, I have these other proxies. So, out of these two things, you say, ‘Okay, now I’ve gone over this line.’ - Chas
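The whiteboard picture Chas describes can be sketched in a few lines of Python: an SLI as a time series of samples, and an SLO as a horizontal threshold that flags the samples that breach it. The values and the choice of CPU usage are illustrative, matching his example rather than any real system.

```python
# Chas's whiteboard sketch: an SLI time series (here, CPU usage as a
# fraction of capacity) checked against an SLO threshold line.
# All numbers are made up for illustration.

cpu_usage = [0.42, 0.55, 0.61, 0.93, 0.58, 0.97, 0.49]  # SLI samples over time
SLO_THRESHOLD = 0.90  # the horizontal line: above this, users feel it

# Every (index, value) pair above the line is a breach worth investigating.
breaches = [(i, v) for i, v in enumerate(cpu_usage) if v > SLO_THRESHOLD]
print(f"SLO breaches at samples: {breaches}")
```

The point of the exercise is that "going over the line" becomes a mechanical check rather than a judgment call, which is what makes the error-budget idea below possible.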
Unite leadership and teams to prioritize and safeguard the thresholds.
The idea of the SREs was to create something called an error budget. Every time I go over this line, I spend some of my error budget. And when I’ve spent too much of it, then my service is broken. I’ve got things I need to fix. The agreement you create out of this is a service-level agreement. This agreement, whether with my customers or with other teams, says, 'I’m not going to go over this level for this thing.’– Chas
It is really about saying, 'Hey, look, I don’t want to just work randomly to improve CPU. I want to know what my goal is.' And so it’s like, 'I have a goal.' It doesn’t mean the CPU is always going to be at zero—because that’s not realistic—but we’re going to make sure that we identify the real outlier problems and fix them. - Chas
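The error-budget bookkeeping Chas describes can be made concrete with a small sketch. This is a simplified model, assuming a fixed number of tolerated breaches per reporting window; real SRE implementations usually express the budget as allowed unreliability over a rolling period.

```python
# Illustrative error-budget accounting in the SRE style Chas describes:
# each SLO breach spends budget, and when the budget runs out, the
# service-level agreement is at risk and fixing takes priority.

SLO_THRESHOLD = 0.90   # e.g. max acceptable CPU utilization
ERROR_BUDGET = 3       # breaches we tolerate per reporting window (assumed)

def budget_remaining(samples, threshold=SLO_THRESHOLD, budget=ERROR_BUDGET):
    """Return how much error budget is left after the given SLI samples."""
    spent = sum(1 for s in samples if s > threshold)
    return budget - spent

window = [0.42, 0.95, 0.61, 0.93, 0.58, 0.97, 0.49]
print("budget remaining:", budget_remaining(window))  # 0 or below: act now
```

This captures the cultural shift Chas points at: the goal is not a CPU of zero, but an explicit, shared ceiling on how often the line gets crossed.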
This is a great way to think about individual services, services that get a remote procedure call and need to respond within a certain amount of time. But after the SREs developed and propagated these ideas, they became so effective that people asked, 'Why can’t I push this threshold down? Why can’t I improve all our services by setting goals around these service-level objectives?’ The answer is you can’t—because there are millions of servers, and they’re all encountering issues at different times. It’s really hard to focus on just one thing, like CPU, RAM usage, or latency for a specific service. – Chas
Select the right indicators to build synthetic metrics that capture diverse user experiences in actionable ways.
If you just look at any individual metric from a quality perspective, even at a global scale, it’s going to be unbelievably choppy, right? Take something like rebuffer rates—it depends on how many people are on your system and other systems. An individual neighbourhood could have an outage or something similar. – Chas
So, even with millions of concurrent users, you’re going to see all this noise in the system. And you can’t improve a system that noisy—at least not at first glance. – Chas
You need to come up with a synthetic way to think about experience. What you have to do is develop what I call synthetic metrics. These are metrics that represent many different states of users experiencing your product—states you want to try to avoid. - Chas
We don’t want customers to experience a group of bad conditions, right? Like playback failures, crashes, or other issues. So, we combine these together and look not just at individual playbacks but at playbacks over a longer period of time, and we normalise the data. These generate metrics that we can then improve. - Chas
Not only can we break these metrics down to see when they’re getting worse, but each experiment looks at these overall metrics and asks, 'Are we trying to improve this synthetic metric?' – Chas
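One way to picture a synthetic metric in this spirit is to combine several bad states into a single normalised rate over a window of playbacks. The state names, record shape, and aggregation below are hypothetical stand-ins, not YouTube's actual metric.

```python
# Hypothetical "synthetic metric": the fraction of playbacks in a window
# that hit ANY of a set of bad user-experience states. Combining states and
# normalising over many playbacks smooths out the noise of any single one.

BAD_STATES = {"playback_failure", "crash", "excessive_rebuffer"}  # assumed

def bad_playback_rate(playbacks):
    """Fraction of playbacks that experienced at least one bad state."""
    if not playbacks:
        return 0.0
    bad = sum(1 for p in playbacks if BAD_STATES & set(p["events"]))
    return bad / len(playbacks)

window = [
    {"events": []},
    {"events": ["excessive_rebuffer"]},
    {"events": ["ad_shown"]},                     # not a tracked bad state
    {"events": ["crash", "playback_failure"]},
]
print(f"bad playback rate: {bad_playback_rate(window):.2f}")  # 2 of 4 playbacks
```

Because the metric is a single number per window, it can be trended, broken down by device or region when it worsens, and used as the target an experiment tries to move.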
Continuously improve by modeling, testing, and iterating to enhance experiences reflected in your synthetic metrics.
The next evolution was, hey, we can use these same processes to set a North Star, head in a direction, and continuously work on improvement. – Chas
Once you have that synthetic way to think about experiences, then you can create a service level agreement between leadership and the engineers—across many teams—to say, 'Hey, look, our goal is to improve whatever metric we’re focusing on.' We’ll call it a proxy latency. We’re going to improve video quality, and we’re going to move the line up on video quality. How are we going to do that? - Chas
What’s so agile about this is that, beyond setting those targets and figuring out what they should be with the engineers, the engineers themselves answer that question. Leadership isn’t coming in and saying, 'You have to do X, Y, and Z.' Instead, they’re saying, 'We’re going to agree on what this SLO should be for these overall experience metrics.' Then everyone figures out how to coordinate and drive improvement together. - Chas
To illustrate this approach, Jon showcased a table featuring a long-running synthetic metric representing an end-to-end user journey—a set of features, so to speak. The table also visualized how he implemented synthetic testing to simulate a user completing this journey, enabling him to track the synthetic metric alongside the individual metrics contributing to the overall user experience. Jon pointed out that there were a few metrics that were “above the line,” so his team worked to dramatically improve those specific metrics and increase “User Magic.”
The secret is being agile in implementation and being data-driven in deciding what you’re going to improve. And you remain data-driven throughout this process. And the secret to OKRs working is when they’re not just words, but rather data. - Chas
How do we coordinate all these teams working together to improve latency, rebuffering, or other metrics? It’s through top-level YouTube OKRs. For example, the objective might be to reduce rebuffering, with a key result to lower it from X to Y. Each team gets that key result and participates in achieving it. This year, we had over a dozen teams contributing to 100+ performance improvement projects. That doesn’t happen magically—it happens because we plan. The planning isn’t about dictating what engineers will refactor; it’s about aligning on OKRs, identifying opportunities, and agreeing on both the wording and, more importantly, the data that defines our targets. And it works—it’s magic. – Chas
If you don’t show people what good looks like or give them the goal, you’re restricting their ability to make their own judgments. But the most effective way to work is to let people make micro-judgments. Over time, they’ll learn—maybe they’ll take a risky step and realize it was too much, or they’ll know when to ask for guidance. – Jon
When you let go of the obsession with power, control, and telling people what to do, the magic happens. People start thinking, 'Oh, we just need to work toward this goal, and we’ll be recognized.' You can even tell them, 'Yes, you will be rewarded if we achieve these challenging goals.' But you also need a team to act as gatekeepers for protecting key metrics and thresholds. This includes launch reviews where teams present any launches impacting those metrics. We analyze, discuss, and sometimes help improve their metrics if something isn’t optimal. – Chas
Validate what drives impact and uncover any side effects.
You need the ability to conduct experiments, because without it, you’re just pushing things out and hoping—which is incredibly inefficient. You need to be able to turn things off. Hope might be a strategy, but it’s a bad one. – Chas
To reach the level of conducting experiments, you need the basics in place—like test coverage, automation, and so on. But what’s often overlooked is the importance of feature flag-driven development. You need the ability to release something, turn it on or off, roll it out to a specific percentage, and direct where that traffic goes. – Chas
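The percentage-rollout capability Chas calls out is often implemented by hashing a stable user identifier into a bucket. The sketch below shows one common pattern; the function names and the hashing scheme are illustrative, not any particular flag system's API.

```python
# Minimal sketch of a feature-flag rollout: deterministically bucket each
# user so a change can be turned on, off, or ramped to a percentage.

import hashlib

def flag_enabled(flag_name: str, user_id: str, rollout_percent: float) -> bool:
    """Deterministically decide whether a user is in the rollout."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000        # stable bucket in 0..9999
    return bucket < rollout_percent * 100    # e.g. 5.0% covers buckets 0..499

print(flag_enabled("new_player_ui", "user-123", 100.0))  # True at full rollout
```

Because the same user always lands in the same bucket, ramping from 1% to 5% to 100% only ever adds users; nobody flickers between the old and new behavior mid-rollout.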
We have an incredibly advanced experimental system that enables everything to happen. For every feature released or change made—even something as small as adding or moving a button—an experiment is conducted. We track dozens of metrics, analyze the impact with deep statistical significance, and gain the confidence to approach development like science. This system prevents us from just throwing features at a product and hoping it works. – Chas
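A toy version of the significance analysis Chas describes: compare one guardrail metric, say a playback-failure rate, between control and treatment with a two-proportion z-test. Real systems track dozens of metrics with far more machinery; this shows just the core statistical check, with made-up counts.

```python
# Two-proportion z-test: did the treatment group's failure rate change
# by more than chance would explain? Counts below are invented.

import math

def two_proportion_z(fail_a, n_a, fail_b, n_b):
    """Return (z, two-sided p-value) for the difference in failure rates."""
    p_a, p_b = fail_a / n_a, fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Control: 120 failures in 10k playbacks; treatment: 180 in 10k.
z, p = two_proportion_z(fail_a=120, n_a=10_000, fail_b=180, n_b=10_000)
print(f"z={z:.2f}, p={p:.4f}")  # small p: the regression is likely real
```

This is what turns "the metric looks worse" into a launch-review decision: a regression that clears a significance bar blocks or rolls back the experiment.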
If you really want to take the magic to the next level, it’s not just about guessing numbers—it’s about using machine learning. You’ve got to build a model and tune it for the specific experience you’re optimizing for. So you’re like, 'Hey, how do I actually improve this?' Well, at the heart of many delivery systems are what I call magic numbers. For example, 'I’m going to wait three seconds to retry against this service.' Okay, that’s great. But what if you try two seconds? Well, two seconds might cause too much load on another system. So, how can we solve this in a better way? With machine learning, instead of relying on a magic number, you can look at a distribution of different experiences. You build a model to detect where in that distribution you are and provide the right retry logic, the right prefetch logic, or determine how many bytes to prefetch.– Chas
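A first step toward Chas's point can be shown without any ML at all: derive the retry timeout from the observed latency distribution instead of hard-coding "three seconds." The percentile sketch below is a deliberately simple stand-in for the learned model he describes; the sample values are invented.

```python
# Replacing a "magic number" retry timeout with one derived from data:
# wait just past the tail of recently observed latencies before retrying.

import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (seconds)."""
    ordered = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

recent_latencies = [0.8, 1.1, 0.9, 4.2, 1.0, 1.3, 0.7, 1.2, 6.5, 1.1]
retry_after = percentile(recent_latencies, 90)  # wait past the p90 tail
print(f"retry timeout: {retry_after:.1f}s")
```

A model, as Chas describes, goes further: it places the current request within the distribution (device, network, region) and picks retry or prefetch behavior per situation rather than per fleet.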
You can’t just wiggle one part of the system without affecting the rest. What I’ve always done is build those mental models and make them explicit. With those models, you understand how the system works and the effects of wiggling different parts of it. – Jon
Let data guide decisions and prove progress. Take it seriously and make commitments.
Words are just bad technology from the Bronze Age. They’re just tech, like a phone or anything else. We’re so used to them, so influenced by the way we speak, that we mistake them for the ground truth of reality. But they’re not. Words are merely indicators—sometimes showing when things are right or wrong—but they’re not the only or best tool to represent reality. Reality is found in the actual empirical outcomes of our actions, and good data works well for this. – Chas
What we do every day is align the thought bubbles above our heads because words are ambiguous. Collaboration takes effort—talking, slowing down, and being okay with complexity and ambiguity. Some embrace it, while others compartmentalize, thinking more prediction and planning will solve everything—but it won’t. It takes time, effort, and collaboration to build a culture where people can navigate uncertainty. That’s where innovation happens, where we outdo the competition. – Jon
How to create and maintain a data-informed culture? While it starts top-down, it’s also about the practical day-to-day. Every morning, I look at the metrics. People often say, “Chas, you could set an alert to notify you of changes.” But I tell them, “No, I am the alert.” I’m responsible, so the first thing I do every morning is open my browser and check. – Chas
Do I need to see the user experience data every day? No. But did I look at it? Yes. Because it kind of washes over you—you develop a feel for it. You start to sense what’s going on. Sure, we could create alerts, but that misses the trends and the overall intuition. – Jon
We celebrate wins—we send out newsletters, including a metrics quality newsletter. I’m a software engineer, so why am I writing a newsletter? Because sharing this information is really important. It helps people feel connected to the larger organization and lets them share in the data-informed wins. – Chas
Within any organization, people build fiefdoms—they take ownership of their services and get defensive when questioned, saying, 'Don’t comment on my service or my code; I built this.' Metrics help disentangle that reaction by shifting the focus to objective discussions. Rather than critiquing code, they enable teams to align on optimizing proxy metrics that reflect what customers truly want. As an engineer, whether testing equipment or delivering video to the world, you need to take personal responsibility—not for your code or system, but for the totality of the experience you’re creating. – Chas
Exceptional user experiences start—and end—with how developers work.
Any organization can transform into one dedicated to the customer, but it has to start at the top. – Chas
The key part of developer experience is having autonomy and a clear 'flag in the distance,' a direction of travel that’s continuously communicated. It permeates everything we do because we hire smart people. If we don’t have that flag in the distance, a way for individuals to measure their ideas or actions against the goal—like, 'Am I getting warmer or colder?'—then you’re left with command and control. – Jon
It’s about people—both outside your company and within it. We don’t build software for any reason other than to make ourselves and our species smarter, faster, and better. – Chas
Software would be easy if it weren’t for all the people. – Jon
PS. Think Bigger About 2025—Or, More Realistically, 2125
As we step into 2025, let’s move beyond maintaining the status quo. Reflecting on Chas’s insights during our webinar, imagine a world where data drives decisions, and transparency builds trust—not just in software, but across all of society.
Wishing you a year filled with clarity, innovation, and meaningful connections!