January 17, 2025

Production Testing Guide For Safely Experimenting with New Features

Ryan Lucht
Before joining Eppo, Ryan spent 6 years in the experimentation space consulting for companies like Clorox, Braintree, Yami, and DoorDash.

TL;DR:

  • Testing in production is crucial for real-world software performance.
  • Different types of testing in production include unit, integration, performance, regression, functional, A/B, and chaos testing.
  • Production testing is broken down into deploy, release, and post-release phases.
  • Best practices for safe and effective production testing include using feature flags, real-time monitoring, starting small, scaling gradually, automated testing, and having rollback and recovery plans.
  • Eppo's platform offers tools for automated testing in production to simplify workflows and ensure stable new feature releases.


Testing in production can feel like risky business, but when it's done right, it's one of the most effective ways to make sure your software meets real-world demands. By testing in the live environment, you get a clearer picture of how new features perform, catch bugs early, and make data-informed improvements.

In this blog, we'll walk you through how to implement production testing safely and efficiently.

What You'll Learn:

  • Why testing in production is critical for delivering high-quality software that performs under real-world conditions.
  • Best practices for safely testing new features in production, including feature flags and real-time monitoring.
  • How to leverage tools like Eppo to automate and streamline production testing while minimizing risk.

What Do We Mean By 'Testing in Production'?

A production testing environment allows you to run your code and test new features directly on live systems, using real data and real users.

Instead of running tests in a controlled staging environment, you're verifying how your software performs in the actual production environment where it will eventually live. This process usually involves tools like feature flags and limited rollouts so you can test with a smaller group of users first and minimize risk.

The Benefits of Testing in Production Include:

Real-World Validation: Testing in production provides the most accurate insights into how new features will perform under real-world conditions. While staging and development environments are valuable, they can't fully replicate the unpredictability of live user traffic, edge cases, or real user behavior.

Faster Feedback Loops: Unlike in controlled environments, where feedback is slower and may not reflect the full range of user behaviors, production testing allows you to identify issues in real time. This rapid feedback helps you catch bugs or performance issues early so you can roll out fixes quickly and minimize downtime.

Let's Take a Look at Some of the Most Common Types of Testing in Production

Unit Testing

Unit testing involves verifying the smallest isolated units of code to make sure they function as expected. Even after deploying new code changes to production, running unit tests helps confirm that individual components still work correctly in the live environment.

Integration Testing

In a live production environment, any updates or changes to one part of the system could affect others unexpectedly. Integration testing's job is to make sure that different components of the application (e.g. APIs, databases, external services, etc.) are interacting smoothly.

Performance Testing (Load & Stress Testing)

Performance testing in production allows you to see how new features perform under real-world conditions.

Load testing checks how the system handles regular user traffic, while stress testing pushes the system to its limits to see how it responds under extreme conditions like unexpected spikes in user activity.
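As a rough illustration of what a load test measures, the sketch below fires a number of concurrent calls and reports a latency percentile. The names measure_latencies and percentile are hypothetical helpers, and call stands in for whatever request you want to exercise. Dedicated load-testing tools do far more than this; the point is only to show the shape of the measurement.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def measure_latencies(call: Callable[[], object], total_requests: int,
                      concurrency: int) -> List[float]:
    """Invoke `call` total_requests times across `concurrency` threads.

    Returns the duration of each call in seconds.
    """

    def timed(_: int) -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(total_requests)))


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

For a load test, keep total_requests and concurrency close to normal traffic and watch the 95th or 99th percentile. For a stress test, raise concurrency step by step until that percentile or the error count degrades.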

Regression Testing

Regression testing verifies that new updates or features haven't broken any previously functioning parts of the system. This is especially critical in production environments where any disruption—no matter how small—can negatively impact the user experience.

Functional Testing

Functional testing in production includes verifying all user-facing elements like logging in, completing transactions, or navigating through the app. By running functional tests, you can confirm that end users can interact with the new feature or product as expected before a broader release.

A/B Testing

A/B testing allows you to run experiments by serving different versions of features to subsets of real users. Testing in production lets you gather real-time data on how users interact with these variations and gives you insight into which feature version to roll out to the entire user base.

Chaos Engineering

Chaos engineering involves intentionally introducing controlled disruptions to test the resilience of your system. By simulating unexpected events like server crashes, network failures, or traffic surges, chaos testing helps you identify weaknesses in how your system recovers.
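One small-scale way to picture this is a wrapper that makes a dependency fail some fraction of the time, so you can check that the caller degrades gracefully. The sketch below is an assumption-laden toy: with_chaos, InjectedFault, and recommendations_or_default are hypothetical names, and real chaos tooling usually injects faults at the infrastructure level, not in application code.

```python
import random
import time
from typing import Callable, List, Optional


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a failing dependency."""


def with_chaos(func: Callable, failure_rate: float,
               max_delay_s: float = 0.0,
               rng: Optional[random.Random] = None) -> Callable:
    """Wrap `func` so that a fraction of calls is delayed and/or fails."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if max_delay_s > 0:
            time.sleep(rng.uniform(0, max_delay_s))  # simulated network lag
        if rng.random() < failure_rate:
            raise InjectedFault("simulated dependency failure")
        return func(*args, **kwargs)

    return wrapper


def recommendations_or_default(fetch: Callable[[str], List[str]],
                               user_id: str) -> List[str]:
    """Caller under test: fall back to an empty list when the dependency fails.

    Only InjectedFault is caught here to keep the sketch small; production
    code would also handle the real failure types of the dependency.
    """
    try:
        return fetch(user_id)
    except InjectedFault:
        return []
```

Keeping failure_rate low and applying the wrapper to a small slice of traffic limits the blast radius of the experiment.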

Testing in Production is Typically Broken Out Into Three Different Phases

Deploy Phase

The deploy phase is the first step where new code is introduced to the production environment. At this stage, the focus is on making sure the system will be stable after deployment and that core features are functioning as expected. Testing during this phase includes integration testing to confirm that all parts of the system interact properly and load testing to make sure the system can handle real user traffic without issues.

Release Phase

Once the code has been deployed and basic functionality is verified, the release phase begins. In this phase, controlled rollouts are key to managing risk.

One common rollout strategy during this phase is called blue/green deployment. This process involves maintaining two identical production environments: one that serves active users and one where the new version is tested. By gradually switching traffic between the two environments, teams can validate new code in a real-world context. If the latest version experiences problems, traffic can be switched back to the previous version.

Another common method is the canary release, in which new features are rolled out to a small group of users before expanding to the entire user base. By testing the feature with a limited audience, developers can assess its impact and address potential problems before it affects a larger group of users.

Post-Release Phase

After a feature has been fully deployed and exposed to the wider user base, testing and monitoring continue. The post-release phase focuses on measuring the effectiveness of the feature and making sure that it doesn't introduce any unforeseen issues. This phase typically involves A/B testing to compare different versions of a feature and determine which performs best in terms of user engagement, conversions, or other relevant metrics.
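To make "which performs best" concrete, here is a minimal sketch of one textbook way to compare conversion rates between two variants: a pooled two-proportion z-test. It is an illustrative assumption, not a description of how Eppo or any other platform computes results, and the function name is hypothetical. It also assumes the sample size was fixed in advance; checking results repeatedly while the test runs calls for other methods.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Compare conversion rates of variants A and B.

    Returns (z, two_sided_p_value). A positive z means B converted
    at a higher rate than A.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (p_b - p_a) / math.sqrt(variance)
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the difference between variants is unlikely to be noise alone. Whether the difference is large enough to matter is a separate product decision.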

Chaos engineering can also be used during the post-release phase to simulate disruptions. The goal is to identify vulnerabilities and make sure that the application's recovery mechanisms can maintain stability and minimize downtime.

When Should You Test in Production?

After New Code Deployments

Once new code is deployed, run quick smoke tests in production to check whether core features are still functioning as expected. This initial test checks for stability before diving into more detailed testing.
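A smoke test can be as simple as requesting a handful of critical pages and confirming each one responds successfully. The sketch below assumes plain HTTP endpoints; the paths in the usage note are hypothetical placeholders, and the fetch parameter exists so the check can be exercised without a network.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for `url`, or 0 if the request failed."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # server answered with a 4xx/5xx status
        return err.code
    except (URLError, OSError):  # DNS failure, refused connection, timeout
        return 0


def run_smoke_tests(base_url: str, paths: Iterable[str],
                    fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each path to True if it answered with HTTP 200."""
    return {path: fetch(base_url + path) == 200 for path in paths}
```

Calling run_smoke_tests("https://example.com", ["/healthz", "/login"]) right after a deploy, and failing the pipeline if any value is False, gives a quick stability signal before deeper testing starts.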

For Complex Features

When introducing complex features that rely on real user interactions, like personalized recommendations or payment processing, testing in production is going to be your best friend. By testing in the live environment, you confirm that the feature works as intended with actual user data and traffic.

In Continuous Delivery Pipelines

Incorporating production testing into your continuous integration and deployment pipelines allows you to test continuously and gather immediate feedback on every change. This real-time validation helps identify issues early.

Best Practices for Safe and Effective Production Testing

Feature Flags

Use feature flags to manage who sees new features and when. Gradual rollouts allow you to minimize risk by exposing new code to only a subset of users first. This reduces the potential blast radius in case something goes wrong.
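Under the hood, a percentage rollout is often implemented by hashing each user into a stable bucket. The sketch below shows that general idea under stated assumptions: in_rollout and the flag key are hypothetical, and this is not the API of Eppo's SDK or any other flagging product.

```python
import hashlib


def in_rollout(flag_key: str, user_id: str, percent: float) -> bool:
    """Return True if `user_id` is inside the rollout for `flag_key`.

    `percent` ranges from 0 to 100. Hashing the flag key together with the
    user id gives each user a stable bucket per flag, so the same user gets
    the same answer on every request, and raising `percent` only ever adds
    users to the rollout.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < percent * 100


def checkout_page(user_id: str) -> str:
    """Usage sketch: serve the new code path to 5% of users."""
    if in_rollout("new-checkout", user_id, percent=5):
        return "new checkout"
    return "current checkout"
```

Because bucketing is deterministic, no per-user state has to be stored, and expanding from 5% to 25% keeps the original 5% in the new experience.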

Real-Time Monitoring

Set up strong monitoring and observability tools to track system performance in real time. By integrating this with your CI/CD pipeline, you'll be able to spot failures and trigger alerts or automatic rollbacks so you can stay on top of issues before they affect users.
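The alert-or-rollback trigger described here can be pictured as a guard that watches a sliding window of recent requests. This is a minimal sketch with hypothetical names (ErrorRateGuard, on_breach); a real setup would typically rely on the alerting rules of your monitoring system instead of in-process code.

```python
from collections import deque
from typing import Callable


class ErrorRateGuard:
    """Calls `on_breach` once when the error rate over the last `window`
    requests exceeds `max_error_rate`."""

    def __init__(self, window: int, max_error_rate: float,
                 on_breach: Callable[[float], None]) -> None:
        self.results = deque(maxlen=window)  # True = success, False = error
        self.max_error_rate = max_error_rate
        self.on_breach = on_breach
        self.tripped = False

    def record(self, ok: bool) -> None:
        self.results.append(ok)
        # Wait for a full window so a single early error cannot trip the guard.
        if self.tripped or len(self.results) < self.results.maxlen:
            return
        error_rate = self.results.count(False) / len(self.results)
        if error_rate > self.max_error_rate:
            self.tripped = True
            self.on_breach(error_rate)
```

The on_breach callback is where an alert or an automatic rollback, such as turning off a feature flag, would be wired in.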

Start Small, Scale Gradually

Testing a new feature with a small user group is key to managing the risk of testing. From here, you can gradually expand the rollout as confidence in the feature's stability and performance grows.
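A gradual rollout can be expressed as a fixed ladder of exposure levels that only advances while the feature looks healthy. The stage values and the function name below are illustrative assumptions, not a recommended schedule.

```python
from typing import Sequence


def next_rollout_percent(current: int, healthy: bool,
                         stages: Sequence[int] = (1, 5, 25, 50, 100)) -> int:
    """Return the exposure level to use next.

    If the feature is unhealthy, pull it back to 0%. Otherwise move to the
    first stage above the current level, or stay at the final stage.
    """
    if not healthy:
        return 0
    for stage in stages:
        if stage > current:
            return stage
    return stages[-1]
```

How "healthy" is determined is up to you, for example error rates and key metrics staying within agreed bounds for a set period at each stage.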

Automated Testing

Manual testing is time-consuming and resource-intensive. Automate your production tests wherever you can, especially for high-traffic apps.

Rollback and Recovery Plans

Always have a clear recovery plan in place. If something goes wrong, feature flags let you quickly disable problematic features without redeploying, so you can address issues on the fly without causing downtime or additional disruptions.
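The reason a flag can act as a rollback is that the old code path stays in place behind it. The sketch below uses an in-memory FlagStore as a stand-in for a real flag service; the class, the flag key, and the two checkout functions are all hypothetical.

```python
import threading
from typing import Dict


class FlagStore:
    """Thread-safe in-memory flag values that can change at runtime."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}
        self._lock = threading.Lock()

    def set(self, key: str, enabled: bool) -> None:
        with self._lock:
            self._flags[key] = enabled

    def is_enabled(self, key: str, default: bool = False) -> bool:
        with self._lock:
            return self._flags.get(key, default)


def legacy_checkout(total: float) -> str:
    return f"legacy:{total:.2f}"


def new_checkout(total: float) -> str:
    return f"new:{total:.2f}"


def checkout(flags: FlagStore, total: float) -> str:
    """Route to the new flow only while its flag is on.

    An unknown or disabled flag falls back to the legacy flow.
    """
    if flags.is_enabled("new-checkout"):
        return new_checkout(total)
    return legacy_checkout(total)
```

Calling flags.set("new-checkout", False) sends every subsequent request back to the legacy flow, with no deploy involved.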

Production Test with Confidence With Eppo

Eppo's platform offers test automation tools like feature flagging, real-time metrics, and full integration into your CI/CD pipeline to simplify your workflows and make sure new features are stable before full release.

Ready to take your testing in production to the next level? Request a demo today and see Eppo in action!
