Ever release a new feature, only to face unexpected issues or negative user reactions? Feature flags take the guesswork out of development. They let you control the rollout and visibility of changes, minimizing risk and allowing you to get useful customer feedback early.
In this primer, we’ll explore what feature flags are and how they can help you test out features with a portion of your audience before they go live for everyone.
We’ll go over:
What are feature flags exactly?
How do they work?
What are some practical uses of feature flags?
Example scenarios where feature flags work
A quick guide for using feature flags
Key benefits of feature flagging
Pitfalls to avoid when using feature flags
FAQs
Let’s jump right into it.
Feature flags (or feature toggles) are a software development and experimentation technique that allows you to turn features on or off within your code without redeploying the entire application.
Think of them as light switches that give you granular control over which parts of your software are visible and accessible to different users or segments.
Here are some of the main reasons why feature flags are so important:
Minimize the risk of breaking things: Feature flags provide insurance against shipping half-finished or buggy features to your main product. This protects your reputation and keeps users happy.
Validate features before committing fully: Don't gamble on what users want. Experiment directly in production and gather concrete data on which changes actually improve your metrics.
Meet the needs of a diverse audience: Not every feature is right for every user. Personalize experiences so your product fits differing needs, increasing its value for each customer segment.
Protect uptime during critical moments: When problems arise, feature flags let you react without a full-scale panic. Keep core services running while you address the issue.
Innovate faster and with greater confidence: Feature flags let you push development boundaries without jeopardizing release reliability. This promotes a culture of experimentation that keeps your product moving forward.
At their most basic level, feature flags are conditional statements ("if-else" logic) within your code. However, the true power and complexity of feature flags come from how you manage them at scale.
Let's illustrate this with an example. Imagine you've built a new image editing feature for your photo-sharing app. You wrap the relevant code for this feature within a feature flag named something like "advanced-image-editing."
When a user tries to access this functionality, your code checks the central configuration to see if the "advanced-image-editing" flag is enabled for that user.
If the flag is enabled: The feature functions as intended, and the user has access to the new editing tools.
If the flag is disabled: Your code executes an alternative path, either hiding the feature from the user interface or perhaps providing a simplified experience.
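At its simplest, the check described above is just an if-else. Here is a minimal sketch, assuming a hypothetical in-memory `FLAGS` dictionary standing in for the central configuration and hypothetical `render_advanced_editor` / `render_basic_editor` functions; a real system would fetch this configuration from a flag service rather than hard-code it.

```python
# Hypothetical central configuration: flag name -> set of user IDs it is enabled for.
FLAGS: dict[str, set[str]] = {
    "advanced-image-editing": {"user-42", "user-7"},
}


def is_enabled(flag_name: str, user_id: str) -> bool:
    """Return True if the flag is enabled for this user; unknown flags default to off."""
    return user_id in FLAGS.get(flag_name, set())


def render_advanced_editor() -> str:
    return "advanced editor"


def render_basic_editor() -> str:
    return "basic editor"


def open_image_editor(user_id: str) -> str:
    # The feature flag is a conditional around the new code path.
    if is_enabled("advanced-image-editing", user_id):
        return render_advanced_editor()
    # Alternative path: the simplified experience.
    return render_basic_editor()


print(open_image_editor("user-42"))  # advanced editor
print(open_image_editor("user-99"))  # basic editor
```

Defaulting unknown flags to off means a typo in a flag name fails safe: the new code path stays hidden rather than leaking to everyone.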
This is a very simple example — it’s just one new feature controlled by a single flag. It’s easy to handle directly in your code.
But what if you have dozens of features in development, experiments running, and targeted rollouts happening for different user groups? That's where a dedicated experimentation and feature flag management platform becomes essential.
Here’s why:
Centralized control: A feature flag management platform provides a dashboard to control the state of all your flags without editing the codebase for each change. This is vital when numerous features or experiments are active at once.
Refined targeting: These systems allow you to define complex rules for who sees what. Target users based on IDs, subscription tiers, locations, behaviors, and more. The possibilities go far beyond a simple "on/off" toggle.
Experimentation at scale: Tracking experiments across multiple feature variations, each potentially impacting metrics differently, requires careful analysis. Feature flag platforms often integrate with your analytics tools to make sense of this data, streamlining the process of finding winning changes.
Performance and reliability: Evaluating a large number of flags needs to be a lightning-fast process to avoid slowing down your app. Dedicated systems are optimized for this, so flag checks don't become a bottleneck.
Remember: While feature flags are simple in concept, a robust feature flagging system involves more than just code snippets. It includes a management interface, intricate flagging rules, integration with your development workflow, and the ability to handle a high volume of flag evaluations without impacting your application's performance.
Now that we’ve gone over what feature flags are and how they work, let’s look at some practical uses in SaaS:
Feature flags let you confidently ship code even when features aren't fully complete. Imagine you're building a complex new recommendation system. With feature flags, you can integrate that system into your main codebase without exposing it to users.
Thorough testing happens behind the scenes, and if a major bug appears, you simply toggle the feature off, avoiding a panicked rollback of your whole release. This is especially valuable when you're aiming for frequent updates.
Not sure if a new button color will boost sales? Or whether a revamped signup flow is actually easier for users? Feature flags make A/B testing much easier.
You code up both versions, wrap them in flags, and let the data speak for itself. Half your users see option A, half see option B. By tracking key metrics, you can pick a winner based on evidence instead of gut feelings.
This doesn't just apply to visual changes — you can test API changes, LLM outputs, or even backend logic variations.
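One common way to implement a 50/50 split is deterministic hashing: hash the user ID together with the experiment name and map the result to a bucket, so each user consistently sees the same variant across sessions. This is a minimal sketch assuming a hypothetical experiment key `signup-flow-test`; it illustrates a widely used technique, not necessarily how any particular platform assigns users.

```python
import hashlib


def assign_variant(experiment_key: str, user_id: str, variants: list[str]) -> str:
    """Deterministically map a user to one of the variants with equal probability."""
    # Salting with the experiment key keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


variants = ["A", "B"]
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant("signup-flow-test", f"user-{i}", variants)] += 1
print(counts)  # roughly 5,000 users in each variant
```

Because assignment is a pure function of the inputs, no per-user state needs to be stored, and a returning user lands in the same variant every time.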
Sometimes, you want to give new features to specific audiences first. Perhaps your premium customers get early access to a beta feature, or you gradually roll out a change across different countries to test infrastructure load.
Feature flags let you control exactly who sees what. This is great for getting feedback from select groups or carefully managing the strain on your systems.
Let's face it: Things sometimes go wrong in production. A third-party service you rely on might misbehave, or sudden traffic spikes could threaten performance. Feature flags act like quick-access "kill switches."
You can instantly disable the troublesome feature or degrade functionality in a controlled way. This buys you time to fix the root problem without your whole app crashing and burning.
Let’s take a quick look at some more practical uses for feature flags that go beyond the basics:
Simplified compliance: If regulations change or you need to comply with region-specific privacy laws, feature flags let you quickly adapt the experience for the affected users.
Dark launches: Release a feature to a tiny percentage of users or even just internal staff. This is great for early stress testing, identifying scaling issues, or gathering initial feedback before the "big reveal."
Load management: Degrade non-critical features gracefully during unexpected traffic spikes, preserving core functionality and preventing total outages.
Migration support: Run old and new systems side-by-side, toggling traffic between them to ease complex migrations or mitigate risk.
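Migration support is often done in stages: serve the old system, then run the new one side-by-side in "shadow" mode while recording differences, and finally switch over. A minimal sketch, assuming a hypothetical `search-backend` flag with the values `legacy`, `shadow`, and `new`, and toy `legacy_search` / `new_search` functions that deliberately behave differently so a mismatch shows up.

```python
# Hypothetical flag value controlling the migration: "legacy", "shadow", or "new".
MIGRATION_FLAGS = {"search-backend": "shadow"}

CATALOG = ["apple", "apricot", "banana", "pineapple"]

# Differences recorded during shadow mode, for later review.
mismatches: list[tuple[str, list[str], list[str]]] = []


def legacy_search(query: str) -> list[str]:
    return [item for item in CATALOG if item.startswith(query)]  # prefix match


def new_search(query: str) -> list[str]:
    return [item for item in CATALOG if query in item]  # substring match


def search(query: str) -> list[str]:
    # Default to the known-good legacy path if the flag is missing.
    mode = MIGRATION_FLAGS.get("search-backend", "legacy")
    if mode == "new":
        return new_search(query)
    old_result = legacy_search(query)
    if mode == "shadow":
        # Run the new system alongside the old one, but keep serving the legacy result.
        try:
            new_result = new_search(query)
            if new_result != old_result:
                mismatches.append((query, old_result, new_result))
        except Exception as exc:
            mismatches.append((query, old_result, [f"error: {exc}"]))
    return old_result


print(search("ap"))  # ['apple', 'apricot'] (legacy result still served)
print(mismatches)    # the new system also returned 'pineapple'
```

Flipping the flag back to `legacy` at any stage restores the old behavior without a deploy, which is what makes the migration reversible.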
You've invested heavily in a new image search tool for your e-commerce site, allowing shoppers to find products visually. However, this feature relies on complex algorithms that could strain your servers. Feature flagging provides a safety net.
First, you wrap the image search functionality in a feature flag, starting it in the “off” mode. You then introduce the feature to a small percentage of users (say 1%), keeping a keen eye on server metrics. Since your feature flag system lets you dynamically adjust configurations, you steadily increase exposure as performance proves stable.
If problems arise, a simple toggle flips the feature off, buying time to troubleshoot without disrupting your whole site.
Additionally, you could initially restrict the image search to internal employees for real-world testing and feedback before a wider release.
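The rollout above can be modeled as a flag with a master switch, a percentage, and an allowlist for internal employees. This is a minimal sketch with a hypothetical `RolloutFlag` class, not any particular platform's API. Users are hashed into stable buckets from 0 to 99, so raising the percentage only adds users rather than reshuffling who has the feature.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RolloutFlag:
    """Hypothetical percentage-rollout flag with an allowlist for internal testers."""
    name: str
    enabled: bool = False  # master switch: False turns the feature off for everyone
    percent: int = 0       # share of users (0-100) who get the feature
    allowlist: set[str] = field(default_factory=set)  # e.g. internal employee IDs

    def _bucket(self, user_id: str) -> int:
        # Stable bucket in [0, 100) derived from the flag name and user ID.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def is_enabled_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        if user_id in self.allowlist:
            return True
        return self._bucket(user_id) < self.percent


image_search = RolloutFlag("image-search", enabled=True, percent=1, allowlist={"employee-1"})
users = [f"user-{i}" for i in range(10_000)]
print(sum(image_search.is_enabled_for(u) for u in users))  # roughly 100 users (about 1%)
```

Ramping up is a matter of raising `percent` as server metrics stay healthy, and setting `enabled = False` acts as the kill switch for everyone, employees included.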
You're torn between two revamped checkout page designs — one prioritizes speed, the other emphasizes extra product recommendations. How do you know which will boost your sales more effectively?
Here, feature flags and A/B testing work together. You code both designs, each controlled by its own feature flag. Your feature flag system randomly assigns 50% of users to see Design A and the rest to see Design B.
Remember, metrics tracking is vital: you compare the percentage of users who successfully complete checkout under each design. Once the difference is statistically conclusive, the winning design gets permanently enabled via your feature flag system, backed by concrete data.
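Comparing the two designs comes down to the completion rate for each variant, plus a check that the gap is larger than random noise. A minimal sketch using a standard two-proportion z-test; the counts are made up purely for illustration, and experimentation platforms often apply more sophisticated methods than this.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for the difference in completion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both designs perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("completion rate is 0% or 100% across both groups; test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative (made-up) numbers: users who started checkout and users who completed it.
z, p = two_proportion_z_test(conv_a=480, n_a=5000, conv_b=540, n_b=5000)
print(f"Design A: {480 / 5000:.1%}, Design B: {540 / 5000:.1%}, z={z:.2f}, p={p:.3f}")
```

A p-value below a threshold chosen in advance (0.05 is a common convention) suggests the difference is unlikely to be noise alone. Fix the sample size and threshold before the test starts rather than stopping as soon as the numbers look good.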
Bad news: your integration with a major payment processor suddenly fails for some users, throwing errors and disrupting orders. Feature flags can prevent this from spiraling into a full-blown crisis.
If the payment feature is already wrapped in a feature flag, you can quickly disable it altogether and show a clear message to those affected instead of confusing error messages.
For a more nuanced solution, implement a temporary fallback — perhaps a simplified payment option like "cash on delivery" — enabled via a second feature flag.
What’s more, it's even possible to target this fix geographically — if the bug only impacts one region, a feature flag lets you disable payments there while keeping service online for everyone else.
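The incident response above combines a regional kill switch with a second flag for the fallback. A minimal sketch, assuming two hypothetical flags held in a plain dictionary: `card-payments-disabled-regions` (the regions where the failing integration is switched off) and `cash-on-delivery-fallback` (whether the temporary alternative is offered).

```python
from dataclasses import dataclass

# Hypothetical flag state during the incident.
FLAGS = {
    "card-payments-disabled-regions": {"EU"},  # regional kill switch
    "cash-on-delivery-fallback": True,         # temporary fallback option
}


@dataclass
class User:
    user_id: str
    region: str


def payment_options(user: User) -> list[str]:
    """Return the payment methods to show, honoring the kill switch and fallback flags."""
    options: list[str] = []
    if user.region not in FLAGS["card-payments-disabled-regions"]:
        options.append("card")
    elif FLAGS["cash-on-delivery-fallback"]:
        options.append("cash-on-delivery")
    return options


def checkout_message(user: User) -> str:
    options = payment_options(user)
    if not options:
        # A clear message instead of a confusing processor error.
        return "Card payments are temporarily unavailable in your region. Please try again later."
    return "Pay with: " + ", ".join(options)


print(checkout_message(User("u1", "US")))  # Pay with: card
print(checkout_message(User("u2", "EU")))  # Pay with: cash-on-delivery
```

Users outside the affected region never notice the incident, and clearing the region set restores card payments once the integration is fixed.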
The right way to manage feature flags depends on your project's size and how you'll use them. For smaller projects, you can often start with a simple configuration file (think YAML or JSON). This stores your feature names and on/off states, keeping things quick and easy with just a few toggles.
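A minimal sketch of the configuration-file approach, assuming a hypothetical `flags.json` whose keys are flag names and whose values are booleans. The JSON is inlined as a string so the example is self-contained; in a real project you would read it with `with open("flags.json") as f: data = json.load(f)`.

```python
import json

# Example contents of a hypothetical flags.json file.
FLAGS_JSON = """
{
  "advanced-image-editing": true,
  "new-checkout-page": false
}
"""


def load_flags(raw: str) -> dict[str, bool]:
    """Parse flag states, ignoring any entry that isn't a plain boolean."""
    data = json.loads(raw)
    return {name: value for name, value in data.items() if isinstance(value, bool)}


def is_enabled(flags: dict[str, bool], name: str) -> bool:
    # Flags missing from the file default to off.
    return flags.get(name, False)


flags = load_flags(FLAGS_JSON)
print(is_enabled(flags, "advanced-image-editing"))  # True
print(is_enabled(flags, "unknown-flag"))            # False
```

Because a file like this is usually read at startup, changing a flag typically means a restart or redeploy unless your app reloads the file at runtime. That limitation is one reason teams move to a dedicated platform as usage grows.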
As your feature flag usage expands, you'll likely want a dedicated feature flag management platform like Eppo.
This powerful experimentation platform offers user-friendly dashboards, visual editing for toggles, and the ability to define complex targeting rules (like enabling a feature for users of a certain subscription tier) — plus many more features we’ll discuss in our “Next steps” section.
Where you insert feature flags in your codebase strongly impacts what you can do with them. There are two main locations to think about:
At the edge: This means in your frontend code, like your web app or mobile app interface. These toggles are perfect for directly controlling what users see: new layouts, hidden features for select groups, or experiments comparing different designs.
In the core: If a feature involves your backend services or complex algorithms (say, a new search system or caching strategy), your flags need to live within those codebases. This lets you toggle the internal workings of your system.
Remember, feature flags are more than simple on/off switches. Think creatively about these use cases:
Gradual rollouts: Start by exposing a new feature to a portion of your traffic, keeping a close eye on performance and errors. Slowly increase the percentage as you build confidence. This careful approach is especially vital for major overhauls.
Targeting: Zero in on specific user groups. Enable a beta feature for select power users before a wider launch, roll out changes country-by-country to test the waters, or grant access based on subscription tiers.
Experiments: Let cold hard data guide your decisions. Run A/B tests where your system randomly assigns users to different variations of a feature, tracking how they perform against your key business metrics.
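Targeting can be sketched as a set of attribute rules evaluated against each user's attributes. This is a minimal sketch with hypothetical `Rule` and `TargetedFlag` classes in which every rule must match (logical AND); dedicated platforms typically offer richer operators, which are omitted here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Rule:
    """Hypothetical targeting rule: the user attribute must take one of the allowed values."""
    attribute: str
    allowed: set[Any]

    def matches(self, attributes: dict[str, Any]) -> bool:
        # A missing attribute yields None, which won't be in the allowed set.
        return attributes.get(self.attribute) in self.allowed


@dataclass
class TargetedFlag:
    name: str
    rules: list[Rule]  # every rule must match; an empty list enables the flag for everyone

    def is_enabled_for(self, attributes: dict[str, Any]) -> bool:
        return all(rule.matches(attributes) for rule in self.rules)


# Enable a beta feature for premium or enterprise subscribers in Canada and Germany only.
beta_reports = TargetedFlag(
    name="beta-reports",
    rules=[
        Rule("subscription_tier", {"premium", "enterprise"}),
        Rule("country", {"CA", "DE"}),
    ],
)

print(beta_reports.is_enabled_for({"subscription_tier": "premium", "country": "CA"}))  # True
print(beta_reports.is_enabled_for({"subscription_tier": "free", "country": "CA"}))     # False
print(beta_reports.is_enabled_for({"subscription_tier": "premium", "country": "US"}))  # False
```

Keeping rules as data rather than hard-coded conditionals is what lets a dashboard change who sees a feature without a code change.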
As you use more feature flags, it's easy for things to become cluttered. Here's how to keep your codebase clean and understandable:
Clear naming conventions: Don't just use generic names like "test1". Choose names that clearly describe what the feature flag controls.
Documentation: Even a brief comment attached to a feature flag definition helps. Note why the flag exists, when to remove it, and who's responsible for it.
Keep it clean: Set reminders or schedule recurring reviews to evaluate your feature flags. Remove those that are no longer needed to prevent code bloat. Some feature flag platforms allow you to set automatic "expiry dates" on flags, enforcing cleanup.
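Naming, documentation, and expiry dates can live together in the flag definition itself. A minimal sketch of a hypothetical in-code flag registry where each flag carries a description, an owner, and an expiry date, plus a check (which could run in CI, for example) that lists expired flags. The flag names and dates are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagDefinition:
    """Hypothetical registry entry carrying the documentation recommended above."""
    name: str         # descriptive, e.g. "checkout-one-page-layout", not "test1"
    description: str  # why the flag exists
    owner: str        # who is responsible for removing it
    expires_on: date  # when the flag should be reviewed or deleted


REGISTRY = [
    FlagDefinition("checkout-one-page-layout", "A/B test of single-page checkout",
                   "payments-team", date(2024, 3, 1)),
    FlagDefinition("image-search-rollout", "Gradual rollout of visual search",
                   "search-team", date(2030, 1, 1)),
]


def expired_flags(registry: list[FlagDefinition], today: date) -> list[FlagDefinition]:
    """Return flags whose expiry date has passed, so they can be queued for cleanup."""
    return [flag for flag in registry if flag.expires_on < today]


for flag in expired_flags(REGISTRY, date.today()):
    print(f"Flag '{flag.name}' expired on {flag.expires_on}; ask {flag.owner} to remove it.")
```

Failing a build when `expired_flags` returns anything is one way to turn the cleanup reminder into an enforced rule.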
We've already seen how feature flags enable safer releases, controlled rollouts, and testing. But their benefits for SaaS companies go even deeper:
Reducing risk: Feature flags act like safety nets. When incomplete features ship hidden behind flags, you greatly reduce the risk of major disruptions should something go wrong. This is vital for SaaS, where constant uptime is expected.
Empowering product teams: Data-driven decisions are key to SaaS success. Feature flags let product teams easily run experiments directly in their production environment, getting real-world results. This reduces guesswork about which features will drive growth, engagement, and customer satisfaction.
Embracing modern development practices: Feature flags encourage trunk-based development. This approach, where small code changes are merged frequently, is a hallmark of fast-moving SaaS teams. It leads to smaller, more manageable releases with fewer potential conflicts.
Adapting to change: SaaS often means serving clients across various regions or evolving with shifting regulations. Feature flagging gives you the ability to tailor experiences based on the user's location or other attributes, ensuring compliance and meeting local needs effectively.
Feature flags are a useful tool, but a poor implementation brings real pitfalls. Here are some mistakes you should avoid:
Each feature flag you add creates a bit of technical debt — extra code branches and logic. Leaving old, unused flags clutters your codebase, making it harder for developers to understand what's actually active and what can be removed.
Prevent this by having a clear plan for removing flags after a feature is fully released or an experiment is concluded. Remember that some platforms allow you to set automatic expiration dates, so consider using them to make cleanup easier.
Imagine seeing a flag called “test_12” without any further explanation. Weeks later, no one remembers why it exists or what it does.
Avoid this frustration by using descriptive names for your feature flags.
Additionally, add brief comments explaining their purpose. This is invaluable when it's time to tidy up or when a new developer joins your team.
Feature flags let you test code in your production environment, but that's no excuse to skimp on regular and continuous testing. It's important to have a solid testing suite for each possible variation that a flag controls, ensuring your changes don't break existing functionality.
While basic feature flags are simple, using a lot of them at once can become tricky. If every little tweak is behind a flag, it becomes hard to anticipate how different combinations of features will interact.
This issue gets worse if multiple teams manage feature flags without proper coordination. Have a plan for managing and testing a growing set of flags, and keep refining it so that experimentation can become a core part of your business strategy.
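One way to test how flags interact is to enumerate every on/off combination and check invariants that must hold in all of them. A minimal sketch with three hypothetical checkout flags and a toy renderer, where the invariant is that every variant still shows a pay button. Since combinations grow as 2^n, exhaustive checks like this suit only small groups of flags that touch the same code; for larger sets, teams typically test each flag's variants individually plus the specific combinations known to interact.

```python
import itertools

# Hypothetical flags that all affect the same checkout page.
FLAG_NAMES = ["new-checkout-layout", "express-pay-button", "product-recommendations"]


def render_checkout(flags: dict[str, bool]) -> str:
    """Toy page renderer whose output depends on several flags at once."""
    parts = ["one-page" if flags["new-checkout-layout"] else "multi-step"]
    if flags["express-pay-button"]:
        parts.append("express-pay")
    if flags["product-recommendations"]:
        parts.append("recommendations")
    parts.append("pay-button")  # every variant must still let the user pay
    return " | ".join(parts)


def check_all_flag_combinations() -> int:
    """Render every on/off combination, assert the invariant, and return how many were checked."""
    checked = 0
    for states in itertools.product([False, True], repeat=len(FLAG_NAMES)):
        flags = dict(zip(FLAG_NAMES, states))
        page = render_checkout(flags)
        assert "pay-button" in page, f"pay button missing for {flags}"
        checked += 1
    return checked


print(check_all_flag_combinations())  # 8 combinations for 3 flags
```

With pytest, the same enumeration could feed `@pytest.mark.parametrize` so each combination is reported as its own test case.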
Feature flags are absolutely worth it for most software businesses, especially those in SaaS. They reduce the risk of major release problems, enable data-driven experiments, and simplify operations.
You can manage feature flags with simple configuration files for smaller projects, or invest in a dedicated feature flag management platform for larger-scale use.
Remove a feature flag once its purpose is fulfilled — when a feature is fully released, an experiment has concluded, or a temporary toggle is no longer needed. Leaving unused flags behind creates technical debt and confusion.
Now that you understand the core concepts and benefits of feature flags, you might be wondering how best to implement them for rapid, safe development. Instead of burdening your teams with managing feature flags by hand, use specialized feature flag software.
Consider Eppo, an experimentation and feature management platform built to facilitate the use of feature flags for experiments of all types, such as A/B testing.
Here’s how you can go beyond just toggles with Eppo:
Know your audience, target your impact: Eppo lets you deliver new features or experiments to specific user segments. Tailor experiences based on demographics, behavior, device types — anything that helps you create a more focused release or test.
Protect your user experience with kill switch control: Unexpected issues happen. Eppo's feature management gives you the ability to quickly disable buggy features without a full rollback. Minimize headaches, save time, and protect your users from frustration.
Decisions built on trustworthy data: Eppo sits on top of your existing data warehouse (Snowflake, BigQuery, Redshift, etc.), so your experiments use the same up-to-date metrics your team already relies on, supporting informed choices for your product's success.
Experiment smarter, not harder: With advanced statistical tools like CUPED and sequential analysis, Eppo helps you find winning variations quickly. Gain insights faster and iterate on your product improvements with greater efficiency.
Accelerate your experimentation with precision: Use Eppo's support for contextual bandits to create personalized experiences for your customers. Ensure features are relevant to each user, boosting engagement and satisfaction.
Discover how feature flags empower product and engineering teams to release with confidence, test rigorously, and optimize based on real-world data.