May 22, 2024

What are Guardrail Metrics? With Examples

Protect your SaaS business during experiments. Learn what guardrail metrics are, why they're vital, real-world examples, and how to set them up in A/B tests.

Ryan Lucht

Experimentation evangelist focused on sharing ambitious ideas for getting everyone testing. Before joining Eppo, Ryan was an experimentation consultant helping companies like DoorDash, Zillow, and Clorox grow their programs.

Monitoring your SaaS product’s key business metrics and running experiments to improve them can often feel like a balancing act. Making a big change to one part of your app can positively impact some metrics, while negatively impacting others.

For example, if your new search algorithm provides more relevant results to users but takes much longer to load, it may still have a net negative impact on important business metrics like revenue.

What’s worse, sometimes you might not remember to monitor those “tangential” metrics closely, meaning negative results might not get caught early enough.

This is why it’s important to pay close attention to your guardrail metrics.

In this article, we’ll explore what guardrail metrics are and why you should make them an integral part of every experiment you run.

We’ll go over:

A definition of guardrail metrics and reasons why they matter
Two real-world examples of guardrail metrics in action
An easy guide for choosing the right guardrail metrics
How to integrate and balance guardrail metrics
How to set up guardrail metrics in A/B testing

Let’s jump right into it.

What are guardrail metrics?

Guardrail metrics are critical business indicators you closely monitor during experiments like A/B tests. Think of them as safety nets designed to catch potential negative side effects while you try to improve one area of your SaaS product.

Unlike your primary success metrics, which focus on the specific target of your experiment, guardrail metrics are designed to safeguard other important areas of your business.

They help make sure your changes don't unintentionally cause problems like reduced website speed, declining customer satisfaction, or a drop in overall revenue.

By tracking guardrail metrics, you get an early warning system. If an experiment starts to negatively impact a guardrail, it signals that adjustments or even a complete halt to your experiment might be needed.

This approach helps you make sure that your efforts to improve one metric don't end up hurting other vital aspects of your business.

Why are guardrail metrics so important?

Guardrail metrics play a vital role in keeping your business healthy and on the right track. Here's why they matter so much:

They help with managing risk: Experiments are inherently about introducing change, and with change comes risk. By monitoring these metrics, you can minimize the chances of unexpected outcomes that could hurt revenue, user experience, or overall product stability.
They foster experimentation: Knowing that guardrail metrics are in place gives you more confidence to experiment. You don't have to worry so much about accidentally breaking something important while trying to make improvements. This encourages a culture of innovation and calculated risk-taking.
They help explain experiment outcomes: Guardrail metrics can also become helpful “storytelling” metrics when outcomes to primary metrics or business-level KPIs show surprising results. They can help paint a picture of the broader impact a change is having beyond the original intention.
Team coordination becomes easier: In larger organizations, teams often work on different goals simultaneously. Guardrail metrics promote collaboration and trust. One team can set guardrails for their key metrics, assuring other teams that their experiments won't cause unintended harm. This allows for smoother operations and prevents conflicts between teams.
Guardrail metrics protect against short-term gains at the expense of long-term health. By paying attention to things like user experience and core business indicators, you ensure that your growth isn't just a temporary spike, but a steady and sustainable trajectory that benefits both you and your customers.

Real-world examples of guardrail metrics in action

Example 1: How The RealReal could balance supply and demand

The RealReal operates a two-sided marketplace, bringing together both buyers and sellers of luxury goods. This dynamic presents a challenge: How do you encourage one side of the marketplace to grow without hindering the other?

Guardrail metrics offer a solution. Let's take a closer look:

Imagine the consignor (seller) team wants to boost the number of people selling their clothes on The RealReal.

They decide to experiment with a pop-up ad that encourages visitors to become sellers. This sounds great in theory, but what if the pop-up is so distracting that it drives potential buyers away, leading to fewer sales?

This is where guardrail metrics come in. By setting the "orders placed" metric (a key indicator for the buyer team) as a guardrail, the consignor team can track the impact of their experiment in real time.

If the pop-up causes a significant drop in orders placed, it signals a potential problem. The consignor team can then investigate and adjust the pop-up to ensure it doesn't negatively impact the buyer experience.
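To make "a significant drop in orders placed" concrete, here is a minimal sketch of one way such a check could work: a one-sided two-proportion z-test on the order rate. The function name, the visitor and order counts, and the 5% significance level are all illustrative assumptions, not figures from The RealReal or a description of how any particular platform computes this.

```python
from math import sqrt
from statistics import NormalDist


def orders_guardrail_check(control_orders, control_visitors,
                           treatment_orders, treatment_visitors,
                           alpha=0.05):
    """One-sided two-proportion z-test: is the treatment order rate lower?"""
    p_c = control_orders / control_visitors
    p_t = treatment_orders / treatment_visitors
    # Pooled rate under the null hypothesis of no difference between groups.
    pooled = (control_orders + treatment_orders) / (control_visitors + treatment_visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / treatment_visitors))
    z = (p_t - p_c) / se
    # Lower-tail p-value: small when the treatment rate is clearly below control.
    p_value = NormalDist().cdf(z)
    return {
        "control_rate": p_c,
        "treatment_rate": p_t,
        "z": z,
        "p_value": p_value,
        "guardrail_breached": p_value < alpha,
    }


# Hypothetical counts: the pop-up variant converts 4.3% of visitors vs. 5.0% in control.
result = orders_guardrail_check(500, 10_000, 430, 10_000)
print(result["guardrail_breached"], round(result["p_value"], 4))
```

With these made-up numbers the guardrail is flagged, which would be the consignor team's cue to investigate the pop-up before rolling it out further.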

So, what’s the takeaway?

This hypothetical scenario shows how guardrail metrics can promote collaboration and help maintain balance within a two-sided marketplace.

By watching the key metrics of both buyers and sellers, teams can experiment confidently, knowing that efforts to drive growth on one side won't unintentionally damage the other. This creates a win-win situation where the platform thrives as a whole.

Example 2: Netflix and how it’s all about the content

Netflix uses guardrail metrics to make sure its content is resonating with its subscriber base. Here's how some of their “do no harm” guardrails work:

Average watch time per user: Netflix wants to make sure that experiments, like changes to the user interface or recommendation algorithms, don't unintentionally lead to users spending less time watching content. So, a decrease here could indicate that the changes are hurting the overall viewing experience.
Churn rate: While Netflix is always looking to grow, they're also very focused on retaining existing customers. An experiment that increases churn, even if it leads to more new sign-ups, could be a red flag that the changes are alienating loyal viewers.
New subscriber sign-ups: Attracting new customers is important, but Netflix also needs to make sure that experiments don't discourage potential subscribers from joining. This guardrail ensures that any changes made to the sign-up process or overall platform don't create barriers or negative impressions that might deter people from signing up.

Netflix uses specialized statistical checks like equivalence testing and non-inferiority testing to analyze these guardrails. Their focus is ensuring that even if an experiment isn't a win, it also doesn't cause unintended harm.

For more on these statistical checks, you can read the full report here.
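As a rough illustration of what a non-inferiority check looks like, the sketch below tests whether a "higher is better" metric, such as average watch time, is worse in the treatment group by no more than an agreed margin. It uses a simple normal approximation with summary statistics. The function name, the margin, and the sample numbers are assumptions for illustration only, and this is not Netflix's actual implementation.

```python
from math import sqrt
from statistics import NormalDist


def non_inferiority_test(mean_c, sd_c, n_c, mean_t, sd_t, n_t, margin, alpha=0.05):
    """Test H0: treatment is worse than control by at least `margin`.

    Assumes a metric where higher is better. Rejecting H0 means the
    treatment is "non-inferior": any harm is smaller than the margin.
    """
    diff = mean_t - mean_c
    se = sqrt(sd_c**2 / n_c + sd_t**2 / n_t)
    # Shift the difference by the margin so that H0 sits at zero.
    z = (diff + margin) / se
    p_value = 1 - NormalDist().cdf(z)
    # One-sided lower confidence bound on the difference.
    lower_bound = diff - NormalDist().inv_cdf(1 - alpha) * se
    return {
        "diff": diff,
        "lower_bound": lower_bound,
        "p_value": p_value,
        "non_inferior": p_value < alpha,
    }


# Hypothetical watch time in minutes, with a tolerated loss of 2 minutes.
small_drop = non_inferiority_test(120.0, 60.0, 20_000, 119.5, 60.0, 20_000, margin=2.0)
large_drop = non_inferiority_test(120.0, 60.0, 20_000, 118.5, 60.0, 20_000, margin=2.0)
print(small_drop["non_inferior"], large_drop["non_inferior"])
```

The design choice worth noting is the margin: instead of asking "is there any difference?", the test asks "can we rule out a loss bigger than what we agreed to tolerate?", which matches the "do no harm" framing of a guardrail.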

Quick guidelines for choosing your guardrail metrics

Choosing the right guardrail metrics is like setting up a good security system. It's about knowing your weak points and putting alarms in the right places. Here are three key guidelines to help you pick the metrics that will truly protect your business:

Identify risks tied to your core goals

Before anything else, take a step back and ask yourself:

What are the biggest risks to my business or product?

A new feature might boost one area, but what could it potentially damage? Maybe faster checkout speeds could lead to more accidental orders, or a focus on new user acquisition might make existing customers feel neglected. Pinpointing these potential pitfalls is the first step toward choosing effective guardrail metrics.

This list should extend to risks posed to all business goals, not just the primary focus of an experiment.

For example, many businesses know that increases in website load time negatively impact revenue. To ensure that no experiment introduces additional load time, they measure this as a guardrail metric on all experiments.

Monitor the risks directly

Once you have a list of risks, you need metrics that directly track them. Let's say you're worried about customer satisfaction dropping. Don't just assume this will show up in revenue — have metrics like customer feedback scores or support ticket volume to give you specific and fast feedback if customer happiness starts slipping.

Remember: The key is to choose metrics that give you a clear signal of trouble in the areas you're most concerned about.

Set meaningful thresholds

It's not just about the metrics themselves, but also about when they trigger an alert. A 1% drop in revenue might be negligible one week, but a serious warning sign the next.

Your thresholds should be strict enough to catch real problems, but not so strict that they constantly slow you down. Think of them as boundaries that give you time to react before things reach crisis level.
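One way to build in "time to react" is to give each guardrail two levels: a softer one that prompts a closer look and a harder one that pauses the experiment. The sketch below assumes this two-tier scheme. The class, the metric names, and the threshold values are all hypothetical, and it compares point estimates only, so in practice you would pair it with a statistical test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardrailThreshold:
    metric: str
    warn_at: float  # relative harm that prompts investigation, e.g. 0.01 = 1%
    halt_at: float  # relative harm that prompts pausing the experiment
    higher_is_better: bool = True


def evaluate(threshold, control_value, treatment_value):
    """Return 'ok', 'warn', or 'halt' based on relative change in the harmful direction."""
    change = (treatment_value - control_value) / control_value
    # For "higher is better" metrics a drop is harm; for latency-style metrics a rise is.
    harm = -change if threshold.higher_is_better else change
    if harm >= threshold.halt_at:
        return "halt"
    if harm >= threshold.warn_at:
        return "warn"
    return "ok"


revenue = GuardrailThreshold("revenue_per_user", warn_at=0.01, halt_at=0.03)
latency = GuardrailThreshold("p95_load_time_ms", warn_at=0.05, halt_at=0.10,
                             higher_is_better=False)

print(evaluate(revenue, 10.00, 9.85))   # 1.5% drop in revenue -> warn
print(evaluate(latency, 800, 900))      # 12.5% slower -> halt
print(evaluate(revenue, 10.00, 10.20))  # revenue improved -> ok
```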

Tips for integrating and balancing guardrail metrics

Finding the right balance between protection and innovation is key. Here are some tips on how to integrate guardrail metrics effectively:

Make them part of the conversation: Include discussions about potential risks and relevant guardrail metrics whenever you're brainstorming new experiments or features. This fosters a more holistic view of product development.
Establish clear processes: Have guidelines for what happens when a guardrail triggers. Define who needs to be involved, what additional analyses are required, and the decision-making process for either pausing or continuing an experiment.
Use contextual thresholds: Avoid unnecessary alerts by adjusting your guardrail thresholds based on experiment size or rollout scope. A small-scale test can likely tolerate slightly larger shifts than a major launch.
Learn from your guardrails: Track which guardrails trigger most often. Use this information to spot recurring risks and improve the quality of your hypotheses over time.
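The last two tips can be sketched in a few lines. The first function loosens a threshold for small rollouts using a simple linear rule, and the second counts which guardrails have fired most often. Both the scaling rule and the shape of the alert log are assumptions made up for this example, and your team may well prefer a different rule.

```python
from collections import Counter


def contextual_threshold(base_threshold, rollout_fraction, max_multiplier=2.0):
    """Loosen a guardrail threshold for small rollouts.

    At 100% rollout the base threshold applies. As the rollout shrinks toward 0%,
    the tolerated shift grows linearly up to base_threshold * max_multiplier.
    """
    if not 0 < rollout_fraction <= 1:
        raise ValueError("rollout_fraction must be in (0, 1]")
    multiplier = 1 + (max_multiplier - 1) * (1 - rollout_fraction)
    return base_threshold * multiplier


def most_triggered(alert_log, top_n=3):
    """Count how often each guardrail fired across past experiments."""
    return Counter(alert["guardrail"] for alert in alert_log).most_common(top_n)


# Hypothetical history of guardrail alerts.
alert_log = [
    {"experiment": "exp-1", "guardrail": "page_load_time"},
    {"experiment": "exp-2", "guardrail": "support_tickets"},
    {"experiment": "exp-3", "guardrail": "page_load_time"},
]

print(contextual_threshold(0.02, rollout_fraction=1.0))  # full launch: base threshold
print(contextual_threshold(0.02, rollout_fraction=0.1))  # 10% test: looser threshold
print(most_triggered(alert_log, top_n=1))
```

If one guardrail, such as page load time here, keeps showing up at the top of the list, that is a signal to address the underlying risk earlier in the design of future experiments.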

Setting up guardrail metrics in A/B testing

A/B tests allow you to focus on specific changes. But it's easy to miss the bigger picture — how does this tweak impact other aspects of your product? Here's how to make guardrail metrics an integrated part of your A/B testing:

Make them part of the planning stage: As you design your experiment, don't just think about the primary metric you're hoping to improve. Also, identify key business, user experience, or strategic priority metrics that you want to safeguard. These become your guardrails.
Setting up in your platform: Modern experimentation platforms, like Eppo, allow you to define guardrail metrics alongside your main success metrics. This ensures they are tracked automatically throughout the test.
Establish alert thresholds: Work with your team to set meaningful thresholds for each guardrail. When a guardrail triggers an alert, it signals a potential problem needing deeper investigation.
Review, act, and keep experimenting: Having a clear process for reviewing any alerts is crucial. This might involve pausing the experiment, running additional analyses, or gathering more data before deciding to continue.
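Put together, the four steps above might look something like the sketch below: declare guardrails next to the primary metric at planning time, then review observed changes against the agreed thresholds. This is a platform-agnostic illustration. The class names, metric names, and numbers are hypothetical and do not represent Eppo's API or any other vendor's.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    metric: str
    max_relative_harm: float  # alert threshold agreed with the team
    higher_is_better: bool = True


@dataclass
class ExperimentPlan:
    name: str
    primary_metric: str
    guardrails: list = field(default_factory=list)


def review(plan, relative_changes):
    """Return the guardrail metrics whose observed relative change breaches the threshold."""
    alerts = []
    for guardrail in plan.guardrails:
        change = relative_changes.get(guardrail.metric)
        if change is None:
            continue  # metric not reported yet
        harm = -change if guardrail.higher_is_better else change
        if harm > guardrail.max_relative_harm:
            alerts.append(guardrail.metric)
    return alerts


# Planning stage: the primary metric and the guardrails are declared together.
plan = ExperimentPlan(
    name="new_search_ranking",
    primary_metric="search_click_through_rate",
    guardrails=[
        Guardrail("revenue_per_user", max_relative_harm=0.02),
        Guardrail("page_load_time_ms", max_relative_harm=0.05, higher_is_better=False),
    ],
)

# Review stage: hypothetical observed relative changes (treatment vs. control).
alerts = review(plan, {"revenue_per_user": -0.005, "page_load_time_ms": 0.12})
decision = "pause and investigate" if alerts else "continue"
print(alerts, decision)
```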

Next steps

You now understand the importance of guardrail metrics for mitigating risk and promoting experimentation. However, the real challenge lies in setting them up with precision and a focus on statistical rigor.

This is where Eppo excels.

Eppo is a powerful experimentation and feature management platform that simplifies your experimentation process, from setting up robust feature flags to providing in-depth analysis tools that protect against unintended outcomes.

Designed for data-driven teams who value accuracy, Eppo allows you to conduct and analyze experiments confidently and define powerful guardrail metrics alongside your key success goals.

Eppo also allows teams to create “collections” of guardrail metrics to ensure they’re added to every experiment, even when anyone in the organization can self-serve experiment creation.

Here’s how Eppo makes the difference:

Protect your bottom line: Eppo's data warehouse-native architecture means you work with your most trusted data sources, ensuring metric calculations are accurate and reliable.
Mitigate risk: Eppo’s sophisticated feature flagging capabilities help you stop experiments early if negative trends emerge, protecting your revenue and customer experience.
Build trust across teams: Eppo’s trustworthy data helps you make sure your teams are aligned for data-driven decision-making. With its highly detailed and shareable experiment reports, transparency is guaranteed and collaboration is much easier.
Experiment with confidence: Rigorous guardrail setup and automated diagnostics alerts prevent costly errors and give the peace of mind you need to accelerate experimentation without compromising statistical rigor.
Data-powered product development: Eppo empowers you to design experiments with guardrails at their core, ensuring your A/B tests always align with your key business goals.

Ready to see Eppo in action? Book a Demo and Explore Eppo.
