A/B testing can be a powerful tool for optimization, but it often involves a patience-testing wait.
You meticulously set up your test, carefully define your metrics... and then you wait.
Not only do you wait, but you promise not to look at the results until your sample size is reached. Otherwise, it’s very likely that you’ll walk away with incorrect conclusions, thanks to what’s known as the “peeking problem.”
That means you’ll need time and discipline before it's clear whether your change is a winner, a loser, or just needs a larger sample size.
Sequential testing offers a solution. It proposes a different approach to A/B testing by letting you continuously analyze data and potentially reach conclusions much earlier.
In this primer, you’ll learn everything you need to know about sequential testing.
We’ll cover:
- What is sequential testing?
- What are the advantages of sequential testing?
- How sequential testing works step-by-step
- An example of sequential testing in action
- Best practices in sequential testing
- Challenges and considerations to keep in mind during implementation
What is sequential testing?
In A/B testing, sequential testing is a method that allows you to continuously analyze collected data and potentially make decisions about your experiments before reaching a predetermined sample size.
It is an evolution of traditional fixed-sample A/B testing, which is based on Frequentist statistics (named after its definition of probability).
Let's dive into how these differ and their ideal uses.
Key differences: Sequential vs. traditional A/B testing
| | Frequentist A/B Testing | Sequential A/B Testing |
| --- | --- | --- |
| Decision Timing | Decisions are made after reaching a predetermined sample size. | Decisions can be made as data accumulates, potentially saving time and resources. |
| Data Monitoring | Interim data monitoring must be avoided to control false-positive rates. | Allows continuous data monitoring with statistical adjustments for multiple looks. |
| Focus | Better for small, incremental effects where precision is critical. | Best for major changes to quickly identify significant effects. |
| Use Cases | Ideal for fine-tuning existing features, where measuring effects with precision is crucial, or scenarios where large sample sizes are readily available. | Valuable for mitigating risks by stopping failing variations early, allowing for intuitive peeking by excited teams, and saving time when large effects are present. |
For more on comparing these two statistical approaches (including two different flavors of sequential testing), read our article on Frequentist vs. Bayesian vs. Sequential methods.
The statistical basis of sequential testing
To maintain the validity of your results within this continuous analysis framework, sequential testing relies on specialized statistical techniques.
Instead of a single decision point at the end, it employs dynamic boundaries for efficacy and futility that adjust based on the amount of data collected. These boundaries control the likelihood of false positives and false negatives, even with frequent monitoring.
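To make the idea of dynamic boundaries concrete, here is a minimal sketch in Python. It assumes five equally spaced looks and uses an O'Brien-Fleming-shaped efficacy boundary, one classic group sequential design (not necessarily what any particular platform implements). The constant 2.040 is the standard published value for five looks at a two-sided 5% error rate, and the function names are hypothetical. The simulation runs A/A tests (no true effect) and counts how often a "significant" result shows up at any look.

```python
import math
import random

def obf_boundaries(n_looks, c=2.040):
    """O'Brien-Fleming-shaped z thresholds: very strict early, close to 1.96 at the end.

    The default c is the tabulated constant for 5 looks at two-sided alpha = 0.05;
    other look counts need a different constant.
    """
    return [c * math.sqrt(n_looks / k) for k in range(1, n_looks + 1)]

def crossing_rate(boundaries, n_sims=20000, seed=1):
    """Fraction of simulated A/A tests whose |z| crosses its boundary at any look."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(n_sims):
        total = 0.0
        for k, bound in enumerate(boundaries, start=1):
            total += rng.gauss(0.0, 1.0)  # one equally sized batch of data, no true effect
            z = total / math.sqrt(k)      # z-statistic after k batches
            if abs(z) > bound:
                crossed += 1
                break
    return crossed / n_sims

naive = [1.96] * 5            # reuse the fixed-sample threshold at every peek
adjusted = obf_boundaries(5)  # thresholds that relax as data accumulates

print([round(b, 2) for b in adjusted])  # [4.56, 3.23, 2.63, 2.28, 2.04]
naive_rate = crossing_rate(naive)
adjusted_rate = crossing_rate(adjusted)
print(f"false positives with naive peeking:    {naive_rate:.3f}")
print(f"false positives with moving boundary:  {adjusted_rate:.3f}")
```

In this setup the naive approach should flag far more than 5% of no-effect tests, while the moving boundary should stay close to 5%. The price is a much higher bar for declaring a winner early on.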
The advantages of sequential testing
Sequential testing can introduce valuable new dimensions to your A/B testing strategy. Let's explore some of the key benefits it offers:
- Adaptability in analysis: With sequential testing, you gain the ability to look at experiment results as they unfold. Teams can “peek” at any time without inflating the false-positive rate, because the statistics account for repeated looks.
- Potential resource savings: Sequential testing can reduce the sample size needed compared to traditional A/B tests when larger effects (positive or negative) are present. This can lead to significant savings in both time and the resources required to run the experiment.
- Faster, data-driven decisions: Perhaps the most notable advantage of sequential testing is the ability to make early decisions when clear winners or losers emerge.
This means you can roll out successful variations sooner and stop underperforming ones quickly, avoiding wasting time and resources.
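The resource-savings claim can be checked with a small simulation. This is a sketch under simplifying assumptions: five equally spaced looks, illustrative O'Brien-Fleming-style thresholds, and an effect expressed in standard-deviation units per observation. The function name and numbers are hypothetical, chosen only to show the pattern.

```python
import math
import random

# Illustrative O'Brien-Fleming-style z thresholds for 5 equally spaced looks.
BOUNDS = [4.56, 3.23, 2.63, 2.28, 2.04]

def average_stopping_size(effect, batch=200, n_sims=5000, seed=7):
    """Mean number of observations used before the test stops.

    `effect` is the true standardized effect per observation (0 means no effect).
    """
    rng = random.Random(seed)
    used = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for bound in BOUNDS:
            # Sum of `batch` observations, each with mean `effect` and unit variance.
            total += rng.gauss(effect * batch, math.sqrt(batch))
            n += batch
            if abs(total / math.sqrt(n)) > bound:
                break  # boundary crossed: stop early
        used += n
    return used / n_sims

max_size = 200 * len(BOUNDS)
large = average_stopping_size(0.15)
small = average_stopping_size(0.02)
print(f"maximum sample size:          {max_size}")
print(f"average used, large effect:   {large:.0f}")
print(f"average used, small effect:   {small:.0f}")
```

With a large effect, most simulated tests should stop well before the maximum. With a small effect, nearly all of them run to the end, which is why the savings depend on how big the true effect is.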
How sequential testing works
While the statistical underpinnings of sequential testing can get quite technical, let's outline the practical steps involved in setting up and running a sequential test:
- Define your boundaries: Instead of a fixed sample size, sequential testing uses statistical boundaries, often called "efficacy" and "futility" boundaries. These boundaries change as more data comes in, and they control the likelihood of reaching false conclusions.
Don’t worry though, this is easy when using an experimentation platform that supports sequential analysis (like Eppo), which brings us to…
- Choose your tools: Besides needing an experimentation platform that can run sequential analysis, you’ll also want to ensure you have the right data pipelines for regular updates on results.
The advantage of sequential testing is in the interim peeks, so you’ll want your data to be updated as frequently as you plan on peeking.
- Monitor and analyze: As your test runs, you'll continuously monitor the results. Specific metrics (like conversion rate) will be compared against the statistical boundaries you defined.
Decision points
The key to using sequential testing smartly lies in these decision points:
- Cross an upper boundary: If your results exceed the upper (efficacy) boundary, you have enough evidence to declare a statistically significant winner! Consider rolling out the change more broadly.
- Cross a lower boundary: If your results fall below the lower (futility) boundary, it suggests the variation is performing negatively, or is unlikely to reach significance at your minimum effect of interest (depending on what type of test you’re using). Consider stopping the test early to avoid wasting resources.
- In between: If results remain between the boundaries, you can continue collecting data, or deem the test inconclusive if no effect is detected even at your desired precision.
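The three decision points above boil down to a simple rule. The sketch below assumes you already have the current test statistic and the two boundary values for this look (in practice your platform computes them); the function name and return strings are hypothetical.

```python
def decide(z, efficacy, futility):
    """Map the current test statistic to one of the three decision points."""
    if futility >= efficacy:
        raise ValueError("futility boundary must sit below the efficacy boundary")
    if z >= efficacy:
        return "stop: significant winner"
    if z <= futility:
        return "stop: losing or unlikely to reach significance"
    return "continue collecting data"

# Example values for a single interim look (made up for illustration).
print(decide(3.40, efficacy=3.23, futility=-0.50))   # stop: significant winner
print(decide(-1.10, efficacy=3.23, futility=-0.50))  # stop: losing or unlikely to reach significance
print(decide(1.20, efficacy=3.23, futility=-0.50))   # continue collecting data
```

Because the boundaries move as data accumulates, the same statistic can mean "continue" at one look and "stop" at a later one.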
It’s important to make sure stakeholders understand result readouts. Powerful experimentation platforms like Eppo will calculate these boundaries behind-the-scenes so that stakeholders can easily read a scorecard of confidence intervals for each metric without needing to understand concepts like these boundaries or p-values.
(To keep describing results simple, we’ll use Eppo’s scorecards as an illustration.)
Example: Testing website layouts
- Goal: Improve user engagement by testing a redesigned website layout.
- Setup: Sequential testing is used because the redesign is a significant change, and the team wants to identify a winner (or stop a failing design) quickly.
- Interim analysis: The key metric (e.g., conversions) is monitored continuously.
Possible outcomes
- Clear winner: If conversion surges and the difference is statistically significant even when adjusting for the early nature of results, Eppo’s scorecard will light up green (and even throw some confetti to celebrate). Assuming no conflicting signal from guardrail metrics, the test can be stopped, and the new layout implemented broadly.
- Clear loser: If conversion drops drastically and the difference is statistically significant even when adjusting for peeking, the metric will light up red on Eppo’s scorecard. The test is stopped to prevent further decline.
- In between: Metrics will stay gray while Eppo’s Progress Bar displays how much more data would be needed to reach a desired precision target.
Implementing sequential testing: Best practices
To ensure your sequential A/B tests are both effective and reliable, let's dive into some key considerations:
Designing the test
- Choose the right situation: Sequential testing shines when you want to quickly identify major winners or losers, and when even sampling across days of the week is less important. It's ideal for high-risk changes or when quick action is paramount.
- Set meaningful boundaries: Work with a statistics expert or use specialized software to design the experiment and set goals that align with your risk tolerance and potential effect size.
- Sample size considerations: Even with sequential testing, consider a minimum sample size to ensure enough data for meaningful initial assessments. Understand that smaller effect sizes will also take longer to measure with sequential testing than with a traditional Frequentist A/B test.
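To get a feel for how effect size drives sample size, it helps to compute the fixed-sample benchmark first. The sketch below uses the standard normal-approximation formula for comparing two conversion rates. It is a benchmark for a traditional test, not a sequential calculation, and the function name and example rates are hypothetical.

```python
import math
from statistics import NormalDist

def fixed_sample_size(baseline, lift, alpha=0.05, power=0.80):
    """Users per arm for a two-sided, fixed-sample z-test on two conversion rates.

    `baseline` is the control conversion rate; `lift` is the absolute difference
    you want to be able to detect.
    """
    treated = baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + treated * (1 - treated)
    return math.ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)

n_small_lift = fixed_sample_size(baseline=0.10, lift=0.01)
n_large_lift = fixed_sample_size(baseline=0.10, lift=0.02)
print(f"detect a 1-point lift: {n_small_lift} users per arm")
print(f"detect a 2-point lift: {n_large_lift} users per arm")
```

Halving the lift you care about roughly quadruples the sample you need. A sequential design generally needs a somewhat larger maximum than this benchmark, so the gap matters most when effects are small.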
Monitoring and adjusting
- Choose the right tools: Use reliable statistical software or experimentation platforms for setting up the test, tracking metrics, and automatically calculating results and statistics.
- Regular analysis: Depending on the type of sequential test you use, you may need to establish a schedule for analyzing interim results in advance. With these “Group Sequential Tests,” resist the temptation to "peek" too often, but ensure you can act in a timely manner when needed.
Avoiding common pitfalls
- Ignoring confidence intervals: Don't focus solely on reaching statistical significance. Early in sequential tests, you may have wide confidence intervals, making it more difficult to gauge the business impact accurately.
- Using sequential testing to detect small effects: As mentioned earlier, when you care about detecting very small changes to your metrics, sequential testing will be a more “costly” method (in terms of time) than a traditional A/B test.
- Statistical expertise: Consider working with a statistician to ensure your testing methodology is sound and your interpretations of the results are accurate.
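The first pitfall is easy to see with numbers. The sketch below computes a plain (unadjusted) Wald confidence interval for the difference in conversion rates at an early and a late point in a test. The function name and counts are made up for illustration, and a sequential method would report an even wider interval early on.

```python
import math
from statistics import NormalDist

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Unadjusted Wald interval for the difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Same observed rates (10% vs. 13%), very different amounts of data.
early = lift_confidence_interval(50, 500, 65, 500)
late = lift_confidence_interval(1000, 10000, 1300, 10000)
print(f"early: {early[0]:+.3f} to {early[1]:+.3f}")
print(f"late:  {late[0]:+.3f} to {late[1]:+.3f}")
```

The early interval spans both a loss and a large gain, while the late one pins the lift down to a narrow range. That difference is what you give up when you stop at the first significant result.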
The “gotchas” of sequential testing
While sequential testing offers exciting possibilities, it's important to acknowledge the additional considerations and challenges it brings:
- Understanding the statistics: There are multiple flavors of sequential test that use different approaches and accept different tradeoffs. Group Sequential Testing, for example, provides more power but requires pre-planning your peeks. Fully sequential tests, on the other hand, allow peeking at any time without additional planning, but trade off some statistical power.
For more on these topics, read our article comparing Group vs. Fully vs. Hybrid sequential tests.
- Speed vs. accuracy: The ability to make faster decisions is a major appeal of sequential testing. However, reaching conclusions early in the test often implies wider confidence intervals. It's essential to strike a balance and not sacrifice the ability to assess potential business impact for a quick win.
- The importance of tools: Effective sequential testing often relies on specialized experimentation and A/B testing platforms that include these capabilities.
Features like automatic boundary calculations, real-time analysis dashboards, and alerts when results cross thresholds — all part of Eppo’s feature offerings — make implementing and managing the process much easier.
Next steps
You should now have a clearer picture of how sequential testing helps you make quicker decisions and refine your experiments. Let's explore how Eppo amplifies these benefits while ensuring the accuracy and trustworthiness of your results.
Eppo is a unique experimentation platform built with statistical rigor at its core. For your sequential testing needs, this translates to:
- Pinpointing insights early: Eppo's granular tracking and real-time metrics mean you won't miss those crucial early signals within your experiment data. Trends and patterns emerge quickly, allowing for more decisive action.
- Visualizing results for confidence: Eppo's intuitive dashboards and analysis tools make it easy to monitor results against your dynamic boundaries. This clear visualization reinforces confidence in your decision-making process.
- Tailored analysis with segmentation: Slice and dice your data to understand how different user segments respond within your sequential tests. Identify where a variation performs exceptionally well (or poorly) within specific cohorts, uncovering deeper insights faster.
- Rigorous A/B testing for validation: Eppo's thorough A/B testing capabilities are tightly integrated with its sequential analysis. Seamlessly confirm early wins from sequential tests or refine strategies further with traditional methodologies, all in one platform.
- Data-driven decisions you can trust: Eppo's advanced statistical engine, diagnostics suite, and emphasis on transparency ensure your results are unimpeachable. Make those early calls with peace of mind.
Ready to start using Eppo for your sequential tests? Book a Demo and Explore Eppo.