Experimentation Protocols: Your Practical Path to Better, Faster Testing
TL;DR:
Confidence levels are central to making data-driven decisions. They’re an essential part of our experiment design process because they replace guesswork with evidence when it’s time to ship new features or optimize products. But understanding and choosing confidence levels can be tricky: you’ll want to know how they shape confidence intervals, how small sample sizes limit them, and which statistical methods you can apply them to.
What You’ll Learn in This Blog:
Confidence levels tell you the frequency with which you’d expect your test results’ confidence intervals to capture the actual value of what you’re measuring if you run the experiment many times. For example, choosing a 95% confidence level and thus calculating a 95% confidence interval means that if you repeated your test 100 times, about 95 of those intervals would include the “true mean” of your metric.
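The "95 out of 100 repetitions" idea is easy to check with a simulation. The sketch below (standard library only; the population mean and spread are invented for illustration) repeatedly samples from a population with a known true mean and counts how often a 95% interval actually captures it:

```python
import random
import statistics

def confidence_interval(sample, z=1.96):
    """95% confidence interval for the mean, normal approximation."""
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * std_err, mean + z * std_err

# Draw 1,000 samples from a population with a known true mean and
# count how often the 95% interval captures it.
random.seed(42)
TRUE_MEAN = 10.0
trials = 1000
covered = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(100)]
    lo, hi = confidence_interval(sample)
    covered += lo <= TRUE_MEAN <= hi

print(f"Coverage: {covered / trials:.1%}")  # should land near 95%
```

Note that the guarantee is about the long-run procedure, not any single interval: each individual interval either contains the true mean or it doesn't.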
Confidence intervals show you a range of plausible values for your metric rather than focusing on one single number. Instead of incorrectly extrapolating what we observed during the experiment and saying, “This new feature improved conversion by 2%,” a confidence interval correctly quantifies the precision of our experiment design, saying, “Our true improvement lies somewhere between 1% and 3%.” This extra context helps you understand how much wiggle room there is around your estimate based on how you planned and conducted the experiment.
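As a concrete illustration of the "somewhere between 1% and 3%" reading, here is a minimal normal-approximation interval for the difference between two conversion rates. The traffic and conversion counts are made up for the example:

```python
def lift_confidence_interval(conversions_a, n_a, conversions_b, n_b, z=1.96):
    """95% CI for (treatment rate - control rate), normal approximation."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    std_err = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    diff = p_b - p_a
    return diff - z * std_err, diff + z * std_err

# Control: 1,000 of 10,000 users convert (10%); treatment: 1,200 of 10,000 (12%).
low, high = lift_confidence_interval(1000, 10_000, 1200, 10_000)
print(f"Observed lift: 2.0%, plausible range: {low:.1%} to {high:.1%}")
```

With these numbers the point estimate is a 2-point lift, but the interval runs from roughly 1.1% to 2.9% — exactly the "wiggle room" the paragraph above describes.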
Before running an experiment, you need to set the “certainty bar” you aim for. This is your confidence level (often 90%, 95%, or 99%). While choosing a confidence level is generally up to the person or team running the experiment, it’s influenced by your sample size, baseline rates, and variance, and how quickly you need to make decisions. Setting this level at the start establishes a clear standard for interpreting your results once the data comes in.
While 95% is common, it’s not your only option. Increasing to 99% makes you more certain but widens your confidence interval; the cost of that extra certainty is a less precise estimate and slower decision-making. Dropping to 90% narrows your interval and speeds things up but raises the risk of missing the true outcome. It’s a trade-off: pick the level that aligns with how quickly you need answers and how sure you want to be.
The choice also depends on your data’s quantity and stability. Larger sample sizes, higher baseline values for the metrics you care about, and/or consistent data patterns can make obtaining conclusive results at higher confidence levels easier. Smaller samples or higher-variance data might push you to choose a slightly lower confidence level to avoid overly broad, less actionable intervals.
Finally, consider what’s on the line. If a poor decision would be costly or hard to undo, a higher confidence level could be worth the extra caution. If you need to move fast to respond to market changes, a lower confidence level might help you act decisively, even if it means accepting more uncertainty.
The goal of any experiment is first to determine if we have sufficient evidence to reject our null hypothesis (i.e., to reject the assumption that our treatment has no measurable impact) and second, to get as close as possible to the true population parameter for the group you're studying. Statistical significance helps us measure the former, and confidence intervals help us measure the latter.
If you’re still not sure which confidence level to choose, seeing how that choice impacts the confidence interval might help you make a more informed decision, because the level you choose has a significant effect on the range of values you end up with at the end of your experiment.
In short, the higher you set your confidence level, the wider your intervals tend to become, reflecting that you’re being more cautious and allowing more room for the “true value” to fall inside. On the other hand, lower confidence levels produce tighter intervals but carry a higher risk of missing that true population parameter.
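To see the trade-off numerically, here is a small sketch (standard library only; the standard deviation of 10 and sample size of 400 are invented for illustration) of how the half-width of a normal-approximation interval grows with the confidence level:

```python
from statistics import NormalDist

def margin_of_error(std_dev, n, confidence):
    """Half-width of a normal-approximation CI for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * std_dev / n ** 0.5

# Same data, three certainty bars: the interval widens as confidence rises.
for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: \u00b1{margin_of_error(10.0, 400, conf):.2f}")
```

Holding the data fixed, moving from 90% to 99% confidence widens the margin of error by roughly half again — more caution, less precision.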
If your data shows more spread (higher standard deviation), your confidence intervals naturally widen due to the data points spreading farther from the mean. This spread makes it harder to pinpoint the true value with precision. To help with this, Eppo employs techniques like CUPED and winsorization, which reduce the standard errors in your results and bring those intervals back into a more useful, narrower range.
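To make the variance-reduction idea concrete, here is a generic winsorization sketch — not Eppo’s actual implementation, and CUPED (which adjusts metrics using pre-experiment data) is omitted for brevity. Clamping extreme observations to percentile bounds shrinks the spread, and with it the standard error:

```python
import statistics

def winsorize(values, lower_pct=0.01, upper_pct=0.99):
    """Clamp extreme observations to percentile bounds to cut variance."""
    ordered = sorted(values)
    lo = ordered[int(lower_pct * (len(ordered) - 1))]
    hi = ordered[int(upper_pct * (len(ordered) - 1))]
    return [min(max(v, lo), hi) for v in values]

# A single extreme outlier inflates the spread of otherwise tame data.
revenue = list(range(1, 100)) + [1000]
print(f"raw stdev:        {statistics.stdev(revenue):.1f}")
print(f"winsorized stdev: {statistics.stdev(winsorize(revenue)):.1f}")
```

The trade-off is a small amount of bias in exchange for a much tighter interval, which is usually worthwhile for heavy-tailed metrics like revenue.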
Bigger samples mean less guesswork. A larger sample size drives down the margin of error, giving you tighter intervals for the same confidence level. But if you’re working with a small or niche dataset, expect wider intervals since you have fewer data points to anchor your estimate. In these cases, you might have to consider adjusting your confidence levels.
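The sample-size effect follows directly from the margin-of-error formula: the margin shrinks with the square root of n, so halving it requires roughly four times the data. A quick sketch with invented numbers:

```python
import math

def required_sample_size(std_dev, target_margin, z=1.96):
    """Smallest n with z * std_dev / sqrt(n) <= target_margin (95% CI)."""
    return math.ceil((z * std_dev / target_margin) ** 2)

# Halving the margin of error roughly quadruples the sample you need.
print(required_sample_size(10.0, 1.0))  # 385
print(required_sample_size(10.0, 0.5))  # 1537
```

This is why niche datasets force a choice: accept a wider interval, lower the confidence level, or wait longer to collect more data.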
In the real world, confidence levels play a huge role in important decisions like “ship” or “no ship” regarding product launches or feature rollouts. Here’s how:
If the lower bound of the confidence interval is above zero, the data suggests a positive outcome (e.g., the feature is likely to have a beneficial impact), so the decision may be to proceed and "Ship" the product or feature.
If the confidence interval includes zero (i.e., it spans both negative and positive values), the results are inconclusive. The data doesn't strongly support either a positive or a negative effect, so the decision may be to hold off, or "No-Ship," until further testing or data is gathered.
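The ship/no-ship rule above can be sketched as a tiny helper (a hypothetical function, not part of any particular tool; it adds a third outcome to distinguish a clearly negative interval from an inconclusive one):

```python
def ship_decision(ci_lower, ci_upper):
    """Turn a confidence interval on the treatment effect into a call."""
    if ci_lower > 0:
        return "ship"          # the whole interval is positive
    if ci_upper < 0:
        return "no-ship"       # the whole interval is negative
    return "inconclusive"      # the interval crosses zero: keep testing

print(ship_decision(0.01, 0.03))   # ship
print(ship_decision(-0.01, 0.02))  # inconclusive
```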
Bell curve graphs can really help explain the relationship between the sample mean and the confidence interval. The sample mean is at the center, and the confidence interval is shown as a range that extends to either side. The confidence level indicates the likelihood that the true population mean lies within that range.
Before we move forward, we want to make sure we clear up some common misunderstandings about confidence intervals:
The interval represents the range of values where we expect the true mean to lie based on the sample data. However, even if the interval is calculated with a high confidence level, there's always a chance that the true mean falls outside this range.
The interval is based on the specific sample chosen and can vary if you collect different samples. This uncertainty is captured in the interval, which shows the potential range of values for the true mean, not a fixed point.
Eppo simplifies statistical analysis by automatically calculating confidence intervals and precision. Using advanced techniques like CUPED (Controlled-experiment Using Pre-Experiment Data) and sequential analysis, Eppo ensures that experiment results are reliable and based on sound statistical methods.
Eppo's tools are designed to reduce variance, delivering more precise results even with small sample sizes. This allows you to make informed decisions when data is limited, reducing the risk of misleading conclusions.
Eppo’s dashboards make it easy to understand the data by displaying essential metrics in an easy-to-digest, visual way. No statistical background is required to understand the experiment results in Eppo. This helps you interpret experiment results at a glance and turn insights into actionable decisions quickly.
Eppo supports a wide range of statistical methods to cater to the unique needs of different experiments. Whether you're testing new product features or updated marketing campaigns, Eppo’s flexibility ensures you can rely on the best methodology for your specific experiment.
Ready to simplify your confidence level calculations and make more reliable decisions? Explore how Eppo can help you automate confidence intervals, reduce variance, and streamline your experimentation process for better outcomes. Request a demo today!