AB Testing 101 for Engineers
What I wish I knew about AB testing when I started my career
Once you’re ready to get your first experiment going, the first and arguably most important step is setting it up properly. This post covers feature flagging and randomization, two of the key cogs in the experiment setup machine. By the end, you should understand how to split traffic between user groups and how to randomize which users land in each group.
Feature flagging is simply the ability to show different segments of your user base different things in your app. Experimentation is one use case for feature flagging – where you want to randomize who sees those different things – but it’s definitely not the only one. Engineering teams will often want to split traffic to different app builds by geography, account size, or even age, for anything from gated feature rollouts to geo-specific features. That’s why feature flagging is often owned by engineering, while experimentation and metrics might be owned by a data team.
Randomization, on the other hand, is pretty much an experimentation-specific thing (as far as we’re aware). While standard feature flagging use cases take a user attribute like location as the determining factor for which variant a user sees, with experimentation you want that factor to be completely random – otherwise your experiment is not really an experiment.
That doesn’t mean you have to run the experiment on your whole user base – often you’ll want to limit it to users in a specific geo, or users who share some other trait – but once you shrink your population to that circle, the factor that decides which variant each user sees needs to be random.
Once you’ve got feature flagging and randomization set up, all that’s left is creating some sort of switch code that figures out which group a user is in, and serves the right variant based on that information. That will usually take the form of a simple if statement that maps the user’s experiment group to the experience you want to deliver (e.g. change some CSS, remove some code, etc.). This is ephemeral code that you’ll remove during experiment cleanup.
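As a rough illustration of that switch code, here is a minimal Python sketch. The function name, the variant names, and the button colors are all hypothetical; in a real app the variant would come from your randomization function or a feature flagging SDK rather than a hard-coded dict.

```python
def checkout_button_color(variant: str) -> str:
    """Map a user's experiment group to the experience to serve."""
    if variant == "treatment":
        return "green"  # hypothetical new color under test
    # Control, and any user who is not in the experiment, keeps the existing color.
    return "blue"


# Hypothetical assignments, standing in for the output of a randomization function.
assignments = {"user-1": "control", "user-2": "treatment"}

for user_id, variant in assignments.items():
    print(user_id, checkout_button_color(variant))
```

Defaulting to the control experience for any unrecognized group keeps the app safe if the experiment is switched off before this branch is cleaned up.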
The randomization function is an interesting one. At the core, you’ll need some function somewhere (this could be a third party SaaS tool) that for an individual experiment and an individual user generates a group that the user belongs to. How does that work exactly?
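Standard practice here is a stateless (this may surprise you) function that does a few things: it hashes the experiment key together with the user ID, converts that hash into a number, maps the number to one of a fixed set of buckets, and maps that bucket to a variant.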
A basic version in semi-pseudo-code might look something like:
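One way to sketch this in Python, under a few assumptions: SHA-256 is used as the hash (any stable, well-distributed hash would do), there are 10,000 buckets, and buckets are split evenly across the variants. The names `get_bucket` and `assign_variant` are made up for this example.

```python
import hashlib
from typing import Sequence


def get_bucket(experiment_key: str, subject_id: str, total_buckets: int = 10000) -> int:
    """Deterministically map (experiment, subject) to a bucket in [0, total_buckets)."""
    hash_input = f"{experiment_key}-{subject_id}".encode("utf-8")
    digest = hashlib.sha256(hash_input).hexdigest()
    # Interpret the first 8 hex characters (32 bits) of the hash as an integer.
    return int(digest[:8], 16) % total_buckets


def assign_variant(
    experiment_key: str,
    subject_id: str,
    variants: Sequence[str],
    total_buckets: int = 10000,
) -> str:
    """Split the buckets evenly across variants and return the subject's variant."""
    bucket = get_bucket(experiment_key, subject_id, total_buckets)
    index = bucket * len(variants) // total_buckets
    return variants[index]


# The same inputs always produce the same variant, with no stored state.
first = assign_variant("checkout_button_color", "user-123", ["control", "treatment"])
second = assign_variant("checkout_button_color", "user-123", ["control", "treatment"])
print(first == second)  # True
```

Including the experiment key in the hash input means a user's group in one experiment tells you nothing about their group in another.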
Because of the way hashing works, we don’t need this function to be stateful – given an experiment key and entity / user ID, the function will always return the same variant. So this is something you’d run every time the application loads, as opposed to running it once, storing the results, and querying that data at runtime.
Though the randomization function is stateless, we do want to store our assignment data for another purpose: analysis down the road. To do so we’ll want a basic append-only table in our warehouse (or whatever database you’re using) that looks something like this:
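One possible shape for that table, sketched with Python's built-in `sqlite3` module so it runs anywhere. The table name `experiment_assignments` and its four columns are assumptions for illustration; in a real warehouse you would use its own DDL and column types.

```python
import sqlite3

# Hypothetical schema: one row per assignment event, never updated in place.
CREATE_ASSIGNMENTS_TABLE = """
CREATE TABLE IF NOT EXISTS experiment_assignments (
    assigned_at TEXT NOT NULL,   -- ISO 8601 timestamp of the assignment
    experiment  TEXT NOT NULL,   -- experiment key
    subject_id  TEXT NOT NULL,   -- user (or other entity) ID
    variant     TEXT NOT NULL    -- group the subject was assigned to
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_ASSIGNMENTS_TABLE)

# List the column names to confirm the table's shape.
columns = [row[1] for row in conn.execute("PRAGMA table_info(experiment_assignments)")]
print(columns)  # ['assigned_at', 'experiment', 'subject_id', 'variant']
```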
To reiterate, this isn’t an operational table that gets queried at experiment runtime – it’s an analytical one to look at experiment results later.
To get data into this table, we’ll add something like the below. If you’re using an ORM, whatever the standard insert syntax is will work.
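A minimal sketch of that insert, again using Python's `sqlite3` as a stand-in for your warehouse client. The `log_assignment` helper, the table name, and the column names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def log_assignment(conn: sqlite3.Connection, experiment: str, subject_id: str, variant: str) -> None:
    """Append one assignment row; existing rows are never updated."""
    conn.execute(
        "INSERT INTO experiment_assignments (assigned_at, experiment, subject_id, variant) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), experiment, subject_id, variant),
    )
    conn.commit()


# In-memory database with an assumed schema, just to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment_assignments "
    "(assigned_at TEXT, experiment TEXT, subject_id TEXT, variant TEXT)"
)

log_assignment(conn, "checkout_button_color", "user-123", "treatment")
```

The parameterized query (the `?` placeholders) keeps user-supplied IDs from being interpreted as SQL.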
If you’d rather not write raw SQL, it’s common to use an event logger (such as Segment or RudderStack) to abstract the flow of data from your app into your data sink. The important thing here is that for a given experiment, a given user always gets the same assignment. Given that, duplicate rows are OK as long as the data matches (you can always dedupe later).
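To make the "dedupe later" step concrete, here is a hedged sketch of two analysis-time queries: one that collapses duplicate rows, and one that flags users who somehow received more than one variant. The table layout and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment_assignments "
    "(assigned_at TEXT, experiment TEXT, subject_id TEXT, variant TEXT)"
)
# Made-up sample rows: user-1 was logged twice with the same variant.
conn.executemany(
    "INSERT INTO experiment_assignments VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01T00:00:00+00:00", "checkout_button_color", "user-1", "control"),
        ("2024-01-02T00:00:00+00:00", "checkout_button_color", "user-1", "control"),
        ("2024-01-01T00:00:00+00:00", "checkout_button_color", "user-2", "treatment"),
    ],
)

# One row per (experiment, subject, variant), keeping the earliest timestamp.
deduped = conn.execute(
    """
    SELECT experiment, subject_id, variant, MIN(assigned_at) AS first_assigned_at
    FROM experiment_assignments
    GROUP BY experiment, subject_id, variant
    ORDER BY subject_id
    """
).fetchall()

# Subjects with more than one variant in the same experiment indicate a bug.
conflicts = conn.execute(
    """
    SELECT experiment, subject_id
    FROM experiment_assignments
    GROUP BY experiment, subject_id
    HAVING COUNT(DISTINCT variant) > 1
    """
).fetchall()

print(len(deduped), conflicts)  # 2 []
```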
Underlying all of your feature flagging and experimentation efforts needs to be an experiment config object – it stores state for your running experiments and their parameters. Standard practice here is to create a JSON object and store it somewhere like Redis. Here’s a sample object that has one experiment in it: a simple change to the checkout button color.
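A hedged sketch of what such an object could contain, written as a Python dict and serialized to JSON. The keys `experiments`, `subjectShards`, `variations`, and `percentExposure` are the ones this post refers to; the experiment name, the `name` and `shardRange` fields, the placement of `percentExposure`, and all of the values are assumptions.

```python
import json

experiment_config = {
    "experiments": {
        # Each experiment is keyed by its name.
        "checkout_button_color": {
            "subjectShards": 10000,   # total number of buckets
            "percentExposure": 0.5,   # share of users included in the experiment
            "variations": [
                # Half-open bucket ranges [start, end) per variant (assumed format).
                {"name": "control", "shardRange": {"start": 0, "end": 5000}},
                {"name": "treatment", "shardRange": {"start": 5000, "end": 10000}},
            ],
        }
    }
}

# Serialize for storage in whatever key-value store or file you use.
config_json = json.dumps(experiment_config, indent=2)
print(config_json)
```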
Let’s break this down. Each experiment occupies one entry in the experiments object, with a name as a key (this one is pretty straightforward). The subjectShards key corresponds to the buckets concept we mentioned earlier – this helps our randomization function translate hashes into numbers and numbers into groups.
The variations key is a list of all of our variants. This experiment has only two, but we can add an arbitrary number. For each, we specify which buckets from our randomization function should be placed into the variant. Note that for this experiment, we’re only selecting (at random) 50% of our user base, as set by the percentExposure property.
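To show how these properties might fit together at runtime, here is a sketch that resolves a user's variant from one experiment entry. It assumes `percentExposure` is a fraction between 0 and 1, that each variation carries a hypothetical half-open `shardRange`, and that exposure and assignment use separately salted SHA-256 hashes so the two decisions are independent. None of these details are confirmed by the post.

```python
import hashlib
from typing import Optional

# Hypothetical experiment entry; only subjectShards, variations, and
# percentExposure are named in the post, the rest is assumed.
EXPERIMENT = {
    "subjectShards": 10000,
    "percentExposure": 0.5,
    "variations": [
        {"name": "control", "shardRange": {"start": 0, "end": 5000}},
        {"name": "treatment", "shardRange": {"start": 5000, "end": 10000}},
    ],
}


def shard(key: str, total_shards: int) -> int:
    """Hash a string key into a bucket number in [0, total_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % total_shards


def resolve_variation(experiment_key: str, experiment: dict, subject_id: str) -> Optional[str]:
    """Return the subject's variant name, or None if they are not in the experiment."""
    total = experiment["subjectShards"]

    # Step 1: decide whether this subject is exposed to the experiment at all.
    exposure_shard = shard(f"exposure-{experiment_key}-{subject_id}", total)
    if exposure_shard >= experiment["percentExposure"] * total:
        return None

    # Step 2: pick the variation whose bucket range contains the assignment shard.
    assignment_shard = shard(f"assignment-{experiment_key}-{subject_id}", total)
    for variation in experiment["variations"]:
        bounds = variation["shardRange"]
        if bounds["start"] <= assignment_shard < bounds["end"]:
            return variation["name"]
    return None


print(resolve_variation("checkout_button_color", EXPERIMENT, "user-123"))
```

A `None` result means the user falls outside the exposed population and should simply get the default experience.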
Serving these configurations can be a challenge if there’s decent throughput. Even if the absolute number of users in your experiment isn’t particularly high, depending on how the experiment is designed, you might have several page loads every few seconds that require access to the experiment config object. A simple solution is to cache the object so your users don’t need to download it from the server every time; the downside is that you then have to deal with cache invalidation, famously one of the few hard problems in computer science.
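One simple (and deliberately naive) way to handle this is a time-based cache: keep the config in memory and refetch it once it is older than some TTL. The sketch below assumes a hypothetical `fetch` callable that loads the config from wherever it is stored; the class name and the 60-second default are illustrative.

```python
import time
from typing import Any, Callable


class CachedConfig:
    """Hold a config object in memory and refetch it after ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._loaded = False
        self._value: Any = None
        self._expires_at = 0.0

    def get(self) -> Any:
        now = self._clock()
        if not self._loaded or now >= self._expires_at:
            # Cache miss or stale entry: go back to the source of truth.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
            self._loaded = True
        return self._value


def load_config() -> dict:
    # Stand-in for a real read from a key-value store or config service.
    return {"experiments": {}}


config_cache = CachedConfig(load_config, ttl_seconds=60.0)
print(config_cache.get())
```

The trade-off is staleness: with a TTL, a change to an experiment can take up to `ttl_seconds` to reach every client, which matters if you need to stop a misbehaving experiment quickly.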