January 12, 2023

Introduction to Feature Flagging and Randomization

Feature flagging and randomization are two prerequisites for running experiments. Here's how they work.
Justin Gage

Once you’re ready to get your first experiment going, the first, and arguably most important, part is setting it up properly. This blog post covers feature flagging and randomization, two of the key cogs in the experiment setup machine. By the end of this post you should understand how to split traffic between user groups and how to randomize which users are in each of those groups.

The basics: feature flagging and randomization

Feature flagging is simply the ability to show different segments of your user base different things in your app. Experimentation is one use case for feature flagging – where you want to randomize who sees those different things – but it’s definitely not the only one. Engineering teams will often want to split traffic to different app builds by geography, account size, or even age, for anything from gated feature rollouts to geo-specific features. That’s why feature flagging is often owned by engineering, while experimentation and metrics might be owned by a data team.

Randomization, on the other hand, is pretty much an experimentation-specific thing (as far as we’re aware). While standard feature flagging use cases will take a user attribute like location as the determining factor for which variant they see, with experimentation, you want that factor to be completely random – otherwise your experiment is not really an experiment.

That doesn’t mean you have to run the experiment on your whole user base – often you’ll want to limit it to users in a specific geo, or users who share some other trait – but once you shrink your population to that circle, the factor that determines who sees which variant needs to be random.

Once you’ve got feature flagging and randomization set up, all that’s left is creating some sort of switch code that figures out which group a user is in, and serves the right variant based on that information. That will usually take the form of a simple if statement that maps the user’s experiment group to the experience you want to deliver (e.g. change some CSS, remove some code, etc.). This is ephemeral code that you’ll remove during experiment cleanup.
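As a rough illustration, the switch code can be as small as the Python sketch below. The variant names ("control", "treatment") and the button colors are assumptions made up for this example, not values from any real experiment.

```python
def checkout_button_color(variant: str) -> str:
    """Map a user's experiment group to the experience they should see."""
    if variant == "treatment":
        # Hypothetical new color being tested.
        return "green"
    # Control, and any unrecognized group, falls back to the existing experience.
    return "blue"
```

Defaulting to the existing experience means a missing or unexpected group never breaks the page, and the whole function can be deleted during experiment cleanup.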

Randomization: a deeper dive

The randomization function is an interesting one. At the core, you’ll need some function somewhere (this could be a third-party SaaS tool) that, for an individual experiment and an individual user, generates a group that the user belongs to. How does that work exactly?

Standard practice here is a stateless (this may surprise you) function that does a few things:

  • Takes an experiment config object as an input (more on this later)
  • Hashes the user ID (together with the experiment key) into a string
  • Converts the string into a number
  • Outputs a variant to assign the user to based on the number

A basic version in semi-pseudo-code might look something like:
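One way to sketch those four steps in Python is shown here. MD5 is an assumption (any stable hash would do), and the config field names subjectShards, variations, and shardRange are a hypothetical shape chosen for illustration.

```python
import hashlib
from typing import Optional

# Hypothetical config entry for one experiment; field names are assumptions.
EXPERIMENT = {
    "subjectShards": 10000,
    "variations": [
        {"name": "control", "shardRange": {"start": 0, "end": 5000}},
        {"name": "treatment", "shardRange": {"start": 5000, "end": 10000}},
    ],
}

def get_bucket(experiment_key: str, user_id: str, total_buckets: int) -> int:
    """Hash experiment key + user ID into a string, then into a bucket number."""
    digest = hashlib.md5(f"{experiment_key}-{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex characters (32 bits) as an integer.
    return int(digest[:8], 16) % total_buckets

def assign_variant(experiment_key: str, experiment: dict, user_id: str) -> Optional[str]:
    """Return the variant whose bucket range contains this user's bucket."""
    bucket = get_bucket(experiment_key, user_id, experiment["subjectShards"])
    for variation in experiment["variations"]:
        shard_range = variation["shardRange"]
        if shard_range["start"] <= bucket < shard_range["end"]:
            return variation["name"]
    return None  # bucket not covered by any variation
```

Including the experiment key in the hashed string means the same user can land in different groups across different experiments, while staying in the same group within one experiment.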

Because of the way hashing works, we don’t need this function to be stateful – given an experiment key and entity / user ID, the function will always return the same variant. So this is something you’d run every time the application loads, as opposed to running it once, storing the results, and querying that data at runtime.

Storing assignments for later analysis

Though the randomization function is stateless, we do want to store our assignment data for another purpose: analysis down the road. To do so we’ll want a basic append-only table in our warehouse (or whatever database you’re using) that looks something like this:
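As a sketch, here is one possible shape for that table, created in an in-memory SQLite database standing in for the warehouse. The table name and columns (assignments, experiment, subject_id, variant, assigned_at) are assumptions for illustration; your warehouse's types and naming will differ.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignments (
        experiment   TEXT NOT NULL,  -- experiment key
        subject_id   TEXT NOT NULL,  -- user (or other entity) ID
        variant      TEXT NOT NULL,  -- group the subject was assigned to
        assigned_at  TEXT NOT NULL   -- ISO 8601 timestamp of the assignment
    )
    """
)

# List the column names to confirm the table's shape.
columns = [row[1] for row in conn.execute("PRAGMA table_info(assignments)")]
print(columns)
```

There is deliberately no primary key or uniqueness constraint here, since the table is only ever appended to.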

To reiterate, this isn’t an operational table that gets queried at experiment runtime – it’s an analytical one to look at experiment results later. 

To get data into this table, we’ll add something like the below. If you’re using an ORM, whatever the standard insert syntax is will work.
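One way this could look in Python, using the standard library's sqlite3 module as a stand-in for your warehouse client. The log_assignment helper and the assignments table layout are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

def log_assignment(conn: sqlite3.Connection, experiment: str,
                   subject_id: str, variant: str) -> None:
    """Append one assignment row; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO assignments (experiment, subject_id, variant, assigned_at) "
        "VALUES (?, ?, ?, ?)",
        (experiment, subject_id, variant, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

# Usage against an in-memory database with an assumed table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignments "
    "(experiment TEXT, subject_id TEXT, variant TEXT, assigned_at TEXT)"
)
log_assignment(conn, "checkout_button_color", "user-123", "treatment")
```

The parameterized query (the ? placeholders) keeps user IDs out of the SQL string itself.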

If you’d rather not write raw SQL, it’s common to use an event logger (such as Segment or Rudderstack) to abstract the flow of data from your app into your data sink. The important thing here is that for a given experiment, a given user always gets the same assignment. Given that, it’s OK if there are duplicate rows, as long as the data matches (you can always dedupe later).
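A later dedupe step might look like the sketch below. It assumes assignment rows have been loaded as (experiment, subject_id, variant) tuples, and the dedupe_assignments name is made up for this example. It also raises an error if the same user shows up with two different variants for one experiment, since that would break the guarantee described above.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (experiment, subject_id, variant)

def dedupe_assignments(rows: Iterable[Row]) -> List[Row]:
    """Collapse duplicate rows per (experiment, subject_id), rejecting conflicts."""
    seen: Dict[Tuple[str, str], str] = {}
    for experiment, subject_id, variant in rows:
        key = (experiment, subject_id)
        if key in seen and seen[key] != variant:
            raise ValueError(f"conflicting assignments for {key}")
        seen[key] = variant
    # Dicts preserve insertion order, so output follows first appearance.
    return [(exp, subj, variant) for (exp, subj), variant in seen.items()]
```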

The experiment config object

All of your feature flagging and experimentation efforts need to sit on top of an experiment config object – it stores state for your running experiments and their parameters. Standard practice here is to create a JSON object and store it somewhere like Redis. Here’s a sample object that has one experiment in it: a simple change to the checkout button color.
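A minimal sketch of such an object, written as a Python dict and serialized to JSON. The keys experiments, subjectShards, percentExposure, and variations come from the discussion in this post; the experiment name, the shardRange field, and all of the numeric values are assumptions for illustration.

```python
import json

experiment_config = {
    "experiments": {
        # Experiment name is the key (hypothetical name).
        "checkout_button_color": {
            "subjectShards": 10000,   # assumed total number of buckets
            "percentExposure": 0.5,   # assumed to be a fraction: 50% of users
            "variations": [
                # Assumed layout: each variation claims a half-open bucket range.
                {"name": "control", "shardRange": {"start": 0, "end": 5000}},
                {"name": "treatment", "shardRange": {"start": 5000, "end": 10000}},
            ],
        }
    }
}

# Serialize to JSON, e.g. before writing it to a store like Redis.
config_json = json.dumps(experiment_config, indent=2)
print(config_json)
```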

Let’s break this down. Each experiment occupies one entry in the experiments object, with a name as a key (this one is pretty straightforward). The subjectShards key corresponds to the buckets that the randomization function sorts users into – this helps our randomization function translate hashes into numbers and numbers into groups.

The variations key is a list of all of our variants. This experiment has only two, but we can add an arbitrary number. For each, we specify which buckets from our randomization function should be placed into the variant. Note that for this experiment, we’re only selecting (at random) 50% of our user base, as set by the percentExposure property.
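To make the exposure idea concrete, here is one way it could be layered on top of bucket assignment. This is a sketch under assumptions: MD5 as the hash, a separate "exposure" hash to decide whether a user is in the experiment at all, percentExposure stored as a fraction, and a hypothetical shardRange field on each variation.

```python
import hashlib
from typing import Optional

# Hypothetical config entry; field values and shardRange are assumptions.
EXPERIMENT = {
    "subjectShards": 10000,
    "percentExposure": 0.5,
    "variations": [
        {"name": "control", "shardRange": {"start": 0, "end": 5000}},
        {"name": "treatment", "shardRange": {"start": 5000, "end": 10000}},
    ],
}

def shard(key: str, total_shards: int) -> int:
    """Deterministically map a string to a bucket in [0, total_shards)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % total_shards

def get_variation(name: str, experiment: dict, user_id: str) -> Optional[str]:
    total = experiment["subjectShards"]
    # Step 1: decide whether this user is in the exposed slice of traffic.
    if shard(f"exposure-{name}-{user_id}", total) >= experiment["percentExposure"] * total:
        return None  # not in the experiment; serve the default experience
    # Step 2: an independent hash picks the variation among exposed users.
    assignment_shard = shard(f"assignment-{name}-{user_id}", total)
    for variation in experiment["variations"]:
        shard_range = variation["shardRange"]
        if shard_range["start"] <= assignment_shard < shard_range["end"]:
            return variation["name"]
    return None
```

Using different prefixes for the two hashes keeps the exposure decision independent of the variant decision, so raising percentExposure later adds new users without moving existing ones between variants.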

Serving these configurations can be a challenge if there’s decent throughput. Even if the absolute number of users in your experiment isn’t particularly high, depending on how the experiment is designed, you might have several page loads every few seconds that require access to the experiment config object. A simple solution is to cache the object so your users don’t need to download it from the server every time; the downside, of course, is that you now have to deal with cache invalidation, one of the few hard problems in computer science, as they say.
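One simple (and admittedly blunt) way to handle invalidation is a time-to-live: re-fetch the config once the cached copy is older than some threshold. The sketch below assumes a hypothetical fetch callable that loads the config from wherever it is stored; the CachedConfig class and its parameters are invented for this example.

```python
import time
from typing import Callable, Optional

class CachedConfig:
    """Cache a config object in memory and refresh it after a TTL expires."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical loader, e.g. reads from Redis
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._value: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or stale copy: reload and reset the expiry time.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The trade-off is that a config change can take up to ttl_seconds to reach every client, so the TTL should be shorter than the delay you can tolerate when starting or stopping an experiment.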

Summary and quick takeaways

  • Feature flagging and randomization are two prerequisites for running experiments
  • Feature flagging is the ability to route application features to specific user groups
  • Randomization assigns users at random (surprise) to any number of experiment variants
  • The experiment config object stores operational data about running experiments and their variants
