September 4, 2024

Introducing the Experiment Performance Scorecard

Measuring Experiment Program Hygiene

Eric Metelka

Before joining Eppo as Head of Product, Eric led experimentation programs at companies like Cameo and Cars.com.

Good product teams ship quickly and drive impact. Bad product teams move slowly or they have bad experimentation hygiene, making impact reporting untrustworthy. It can be hard for product leaders to discern one type of team from the other. Because there’s so much going on, these leaders hear a lot of success stories but lack the means to verify that teams are operating well.

We built the Performance Scorecard to solve these problems. It is designed to bridge the gap between individual experiments and overall program performance, and ultimately impact. This provides a comprehensive view of your experimentation efforts.

The Performance Scorecard goes beyond traditional metrics by measuring both inputs (how you're experimenting) and outputs (what you're achieving). This dual approach ensures that you can see whether teams are operating in ways that lead to velocity and impact.

Aggregate Impact: Quantifying Your Program's Success

One of the most significant challenges of experimentation programs is demonstrating their overall impact. Previously, teams had to manually compile data every quarter during planning sessions, often relying on a handful of standout experiments to make their case.

As a PM, I compiled the same table of results every quarterly planning cycle. I needed to show leadership which experiments we ran and how they impacted our north star metric. But leadership wasn't in the weeds, and this data always seemed new to them. By planning time, it couldn't break through the assumptions they had already formed.

The Performance Scorecard changes this by providing:

- Holdout-based Aggregate Impact: A robust measure of your program's total impact, based on rigorous holdout experiments.

- Bayesian Aggregate Impact Estimate: A new feature that allows teams without holdout capabilities to estimate their overall impact accurately.

- Top experiments: A table view of the top experiments shipped in the specified timeframe, ranked by the selected measure.

With these tools, you can now answer the critical questions: "What is the total value our experimentation program has delivered?" and "What experiments impact our core metrics?" This data is invaluable for securing continued support and resources from leadership. Best of all, this view is always available, making it easy to share on a consistent basis, not just at planning time. This means you don’t just get to report on the impact you drive, you get to celebrate it!
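To make the holdout idea concrete, here is a minimal sketch of how a holdout comparison is commonly computed. It is an illustration under stated assumptions, not a description of how the Performance Scorecard calculates its numbers: the function name `holdout_lift` and the sample values are hypothetical, and a real analysis would also report uncertainty around the estimate.

```python
from statistics import mean

def holdout_lift(exposed_values, holdout_values):
    """Relative lift of users who received shipped changes over a holdout group.

    exposed_values: per-user metric values for users who saw every shipped change
    holdout_values: per-user metric values for users held back from those changes
    """
    exposed_mean = mean(exposed_values)
    holdout_mean = mean(holdout_values)
    if holdout_mean == 0:
        raise ValueError("holdout mean is zero, so relative lift is undefined")
    return (exposed_mean - holdout_mean) / holdout_mean

# Hypothetical per-user revenue for the quarter
exposed = [12.0, 10.0, 11.0]
holdout = [10.0, 9.0, 11.0]
print(f"Aggregate lift vs. holdout: {holdout_lift(exposed, holdout):.1%}")
```

The point of the comparison is that the holdout group captures the combined effect of everything shipped in the period, rather than summing the results of individual experiments.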


Velocity: Aligning Your Organization on Input Metrics

While outcomes are crucial, the Performance Scorecard also focuses on the inputs that drive those results. Experimentation velocity is one of those key inputs.

Once we decided on a quarterly goal, as the PM I would always advocate for loading up the team’s roadmap with a number of projects I believed would move that metric, with the one I believed would have the biggest impact first. Even if that project failed, the team would have more shots on goal before the quarter was over to move that metric.

With the Performance Scorecard, leaders can track whether their teams are also taking enough shots on goal. By tracking experiment velocity, you can:

- Set and monitor organization-wide goals (e.g., running 10 tests per quarter)

- Identify teams that may need additional support or resources

- Encourage a culture of continuous experimentation across your organization
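As a rough illustration of what tracking velocity against a goal involves, the sketch below counts experiment launches per team per quarter and lists the ones that fall short. The launch log, the team names, and the choice to apply the goal per team are all assumptions made for the example; this is not the Scorecard's implementation.

```python
from collections import Counter
from datetime import date

QUARTERLY_GOAL = 10  # example target from the text; set your own

def quarter_label(d: date) -> str:
    """Map a date to a label such as '2024-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def launches_per_quarter(launches):
    """Count experiment launches per (team, quarter) pair."""
    return Counter((team, quarter_label(day)) for team, day in launches)

def below_goal(launches, goal=QUARTERLY_GOAL):
    """Return (team, quarter, count) rows that fall short of the goal.

    Teams with no launches at all in a quarter do not appear in the log,
    so they have to be checked separately.
    """
    counts = launches_per_quarter(launches)
    return sorted((team, q, n) for (team, q), n in counts.items() if n < goal)

# Hypothetical launch log: (team name, launch date)
launch_log = [
    ("checkout", date(2024, 7, 2)),
    ("checkout", date(2024, 8, 15)),
    ("search", date(2024, 9, 4)),
]
print(below_goal(launch_log, goal=2))
```

With a goal of two launches per quarter, only the hypothetical search team is flagged, which is the kind of signal that tells a leader where to offer support.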


Quality: Ensuring Rigorous Experimentation Practices

The quality of your experiments is just as important an input as their quantity. The Performance Scorecard helps you maintain high standards by tracking key quality metrics. This feature addresses common misconceptions and ensures that your organization is following best practices.

For example, I worked for a leader who suggested running experiments with a 20% control and 80% variant split to get new features to users faster. While well intentioned, this actually led to slower experiments and lower velocity: for the same amount of traffic, a 50%/50% split gives much more signal and ultimately a faster decision.
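The arithmetic behind this is short enough to sketch. Assuming a simple difference-in-means comparison with equal per-user variance in both groups, the variance of the estimated effect is proportional to 1/n_control + 1/n_variant, which is smallest when the groups are equal in size. The function name `relative_sample_needed` is made up for this example.

```python
def relative_sample_needed(control_share: float) -> float:
    """Total sample needed, relative to a 50/50 split, for the same precision.

    Assumes a difference in means with equal per-user variance in both groups,
    so the variance of the estimate scales with 1/n_control + 1/n_variant.
    """
    if not 0.0 < control_share < 1.0:
        raise ValueError("control_share must be strictly between 0 and 1")
    variant_share = 1.0 - control_share
    variance_factor = 1.0 / control_share + 1.0 / variant_share
    balanced_factor = 1.0 / 0.5 + 1.0 / 0.5  # the 50/50 case, equal to 4
    return variance_factor / balanced_factor

for share in (0.5, 0.2):
    ratio = relative_sample_needed(share)
    print(f"{share:.0%} control needs {ratio:.2f}x the sample of a 50/50 split")
```

Under these assumptions a 20%/80% split needs roughly 56% more total traffic than a 50%/50% split to reach the same precision, which is why the unbalanced design takes longer to reach a decision.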

The Quality section of the scorecard helps you:

- Monitor experiment design parameters across your organization

- Identify and address potential issues before they impact your results

- Educate stakeholders on the importance of rigorous experimental design


Win-Rate: Testing the Right Ideas

According to research, the success rate of experiments ranges from 8% to 33%. Over a large sample of experiments, this is where most teams' win rates should land. Yet we see many teams that fall outside this range.

If a team’s win rate is below this range, that indicates that the hypotheses aren’t good enough, and perhaps not founded in good customer insights. On the other end, if a team’s win rate is above this range, the experiments usually don’t have enough impact. These are easy, small ideas that win but don’t move the needle enough to achieve the goals the team has set.
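The check described above can be sketched in a few lines. The 8% and 33% bounds come from the range quoted in this post; the `TeamResults` type, the team names, and the counts are hypothetical, and the sketch ignores sample size, so treat a flag on a team with only a handful of experiments as noise rather than a verdict.

```python
from dataclasses import dataclass

# Benchmark range quoted in this post (8% to 33%)
BENCHMARK_LOW = 0.08
BENCHMARK_HIGH = 0.33

@dataclass
class TeamResults:
    name: str
    wins: int   # experiments that shipped as winners
    total: int  # all concluded experiments

def classify_win_rate(team: TeamResults) -> str:
    """Place a team's win rate relative to the benchmark range."""
    if team.total == 0:
        return "no experiments"
    rate = team.wins / team.total
    if rate < BENCHMARK_LOW:
        return "below range"  # hypotheses may lack grounding in customer insight
    if rate > BENCHMARK_HIGH:
        return "above range"  # ideas may be too small or too safe
    return "within range"

# Hypothetical teams
teams = [
    TeamResults("growth", wins=1, total=20),
    TeamResults("checkout", wins=5, total=20),
    TeamResults("search", wins=10, total=20),
]
for team in teams:
    print(f"{team.name}: {classify_win_rate(team)}")
```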

With the Win-rate section of the scorecard you can:

- Understand how team win-rate compares to industry benchmarks

- Celebrate teams that are shipping a mix of successful, neutral, and unsuccessful experiments

- Investigate if a team has unusually high or low win-rates


Empowering Program Leaders and Winning Executive Buy-In

The Performance Scorecard is more than just a reporting tool – it's a catalyst for building a true culture of experimentation. By providing program leaders with a comprehensive view of their experimentation efforts, we're enabling them to:

- Evaluate performance across teams and over time

- Identify areas for improvement and optimization

- Demonstrate the value of experimentation to skeptical stakeholders

- Make data-driven decisions about resource allocation and program direction

For executives, the Scorecard offers a clear, holistic view of the experimentation program's impact on the business. This transparency fosters trust and encourages continued investment in data-driven decision-making.

Transforming Experimentation from a Tool to a Strategy

With the introduction of the Performance Scorecard, Eppo is taking experimentation to the next level. We're moving beyond individual tests to create a comprehensive system for measuring, monitoring, and optimizing your entire experimentation program.

By providing insights into aggregate impact, velocity, quality, and win rate, the Performance Scorecard empowers you to build a more effective, more efficient, and more impactful experimentation culture. It's not just about running tests anymore – it's about transforming experimentation into a core strategic advantage for your business.

Ready to elevate your experimentation program? Contact us today to learn more about the Performance Scorecard and how it can drive growth for your organization.