How Netflix, Lyft, and Yahoo use Contextual Bandits for Personalization
Inspiration from some of the world's biggest tech orgs
Personalized user experiences are a ubiquitous part of products and marketing today, to the point that they’re easy to miss. When a personalized approach is missing, however, it can be blatantly obvious. Have you ever gotten multiple sales emails for a product you already owned? Or what if a music streaming service recommended the same songs to every listener?
Any business is sure to have many places where personalization could make all the difference - but the bigger question is how to actually enable that personalization. An A/B test or analytics tool could help uncover heterogeneous treatment effects (where preferences vary by user attributes), but won’t actually enable personalization. Rules-based personalization may help with very broad use cases that have few possible outcomes (e.g. treating returning visitors to a website differently than new visitors), but won’t scale to making many and/or highly specific decisions.
When Product, Marketing, or Growth teams need to create personalized user experiences, two tools stand out in the machine learning toolkit to help enable them: Contextual Bandit Algorithms and Recommendation Systems. Both can make individualized decisions on which of many possible user experiences to serve based on known information, but they use distinct methods that each have ideal use cases.
For starters, each approach “learns” about optimal decisions in a different way. Recommendation systems consider vast historical datasets specific to each user and the desired action in question, identifying patterns and predicting user preferences using a variety of potential techniques. Contextual bandit algorithms, on the other hand, focus on learning from and adapting to each interaction in real time, making them applicable even without historical data, though better suited to smaller problem spaces.
In this post, we’ll explore contextual bandit algorithms vs. recommendation systems and the relevant considerations for when to use each. We’ll also compare both to traditional A/B experiments, a gold-standard decision-making methodology that makes for a useful reference point.
Perhaps the simplest distinction in applications for recommendation systems vs. contextual bandit algorithms is “how big is the job at hand?”
If you need to choose between thousands of potential experiences (or more), and need to make that choice many times for each user, recommendation systems are highly complex models that will give you the necessary power for the job. Contextual bandits, on the other hand, are for slightly smaller jobs - considering tens to hundreds of experiences, and probably only choosing a few times for each user.
Recommendation systems can handle much larger tasks than contextual bandits, but they require far more in the way of resourcing too: more data, more computational power, and more people working on them. They’re like a cargo ship that can “transport” thousands of treatment arms, staffed by a team of professionals.
Many businesses may only have a single use case that calls for the heft of a recommendation system: an eCommerce brand recommending specific products for users to browse, or a streaming service choosing which titles to feature on a user’s home page.
This still leaves many smaller problems that could benefit from personalized user experiences. They may not warrant an entire team’s optimization effort, or they may involve actions users perform only a handful of times - and these use cases are where contextual bandits shine.
Take the example of a streaming video service: in addition to choosing which titles to feature on a user’s home page, they may also want to experiment with different “posters”/featured images for each title. Here, the problem space is much smaller: each title has a handful of potential posters, not thousands. We also won’t observe users taking the desired action as often: a user will typically click on a given title a few times, not hundreds. But it still makes sense that certain images would appeal to some users more than others. Here, a contextual bandit algorithm would be a perfect, far more lightweight tool to personalize the user experience.
In that way, a contextual bandit is like a speedboat - it will carry less cargo, but require less fuel too. You’ll also be able to step on the accelerator a lot quicker, because contextual bandits address the cold start problem.
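To make this concrete, here’s a minimal sketch of the disjoint LinUCB algorithm (one common contextual bandit approach) applied to the poster-selection example. The arm count, feature names, and alpha value are all illustrative assumptions, not a prescription:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (poster)."""

    def __init__(self, n_arms: int, n_features: int, alpha: float = 1.0):
        self.alpha = alpha  # width of the confidence bound (higher = more exploration)
        self.A = [np.eye(n_features) for _ in range(n_arms)]    # per-arm Gram matrix
        self.b = [np.zeros(n_features) for _ in range(n_arms)]  # per-arm reward vector

    def choose(self, x: np.ndarray) -> int:
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # point estimate of this arm's payoff model
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)  # optimism bonus
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold the observed reward (e.g. 1 if the user clicked) back in."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Illustrative usage: 4 candidate posters, 3 hypothetical context features
bandit = LinUCB(n_arms=4, n_features=3)
x = np.array([1.0, 0.0, 1.0])      # e.g. [likes_comedy, likes_horror, is_mobile]
arm = bandit.choose(x)             # poster to show this user
bandit.update(arm, x, reward=1.0)  # user clicked the title
```

The alpha parameter governs the exploration/exploitation trade-off: a higher value keeps the bandit chasing uncertain arms longer before it settles on the apparent winner.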
Another immediate consideration is how much data relevant to the optimization problem is available.
For a recommendation system to work, it needs plenty of historical data about the specific action and user in question. This means our audience needs to be known users, not newcomers to an app or website, and that the action we’re optimizing is something we can observe them doing many times. For this reason, recommendation systems have a notable cold start problem: for a new user or action, there is not enough information available to make good recommendations.
Contextual bandits tackle the cold start problem head on, because they dynamically balance exploration and exploitation to efficiently learn which actions are best for specific contexts. This makes contextual bandits more appropriate in scenarios with less historical data to lean on - for example, when the available actions change quickly over time, or when there is little interaction with individual users.
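To illustrate that exploration/exploitation balance, here’s a sketch of Thompson sampling with Beta-Bernoulli posteriors (context is omitted for brevity; a contextual variant would condition each arm’s model on user features). The key point is that a brand-new arm starts with a flat prior and gets explored immediately, with no historical data required:

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over binary rewards (e.g. clicks)."""

    def __init__(self, n_arms: int):
        self.alpha = [1] * n_arms  # prior successes + 1
        self.beta = [1] * n_arms   # prior failures + 1

    def add_arm(self) -> None:
        # A brand-new experience joins with a flat Beta(1, 1) prior -
        # maximal uncertainty, so it is sampled often until evidence accumulates.
        self.alpha.append(1)
        self.beta.append(1)

    def choose(self) -> int:
        # Draw one plausible click-rate per arm and play the best draw.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, arm: int, clicked: bool) -> None:
        if clicked:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```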
The impetus for either of these tools probably originates with the same ask from a marketing or product team: how can we enable 1:1 (as opposed to broader, rules-based) personalization? When you hear a question like this, it’s probably time to start asking questions about the use case to help inform a contextual bandit vs. recommender system decision.
Many commercial tools for both contextual bandits and recommendation systems exist today that are marketed as personalization tools and sold largely to marketing teams. Rather than tools to build bespoke models, these are usually “off the shelf” models that consider some pre-determined list of characteristics (usually ones that are easily determined on any website): which browser the user is on, their location, the time of day, whether they’re a new or returning user.
The real power of either tool, though, is unlocked by building solutions that are specific to each use case. You can use far more informative characteristics or data, which goes a long way towards actually achieving positive business impact.
It’s important to strategically consider which “features” your solution should learn about and optimize for. You want to choose enough features to paint a meaningful picture of each user. But add too many, and your bandit will need to spend more time exploring vs. exploiting, or your recommendation system could become prone to overfitting. Make sure that each feature you add carries highly informative, valuable signal. A (usually) bad feature would be a user’s country: it is probably not informative to the action being optimized, and it will be very hard to gather sufficient data about users in e.g. Luxembourg, diminishing efficacy.
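In code, that discipline might look like the sketch below - a few dense, informative features rather than a sprawling sparse encoding. Every field name here is a hypothetical example, not a required schema:

```python
import numpy as np

def build_context(user: dict) -> np.ndarray:
    """Assemble a lean context vector from a handful of informative signals."""
    return np.array([
        1.0 if user["is_returning"] else 0.0,       # returning vs. new visitor
        min(user["sessions_last_30d"], 30) / 30.0,  # engagement, capped and scaled
        user["pct_comedy_watch_time"],              # taste signal in [0, 1]
    ])

# Three dense features - versus, say, a 200-column one-hot country
# encoding that would dilute the signal and slow down learning.
x = build_context({"is_returning": True,
                   "sessions_last_30d": 12,
                   "pct_comedy_watch_time": 0.4})
```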
It’s also worth quickly clarifying that most multi-armed bandits are not contextual bandits. A traditional multi-armed bandit presumes that there is a single optimal action for all users, and its goal is to zero in on that one best choice as quickly as possible (while minimizing error).
While contextual bandit algorithms and recommendation systems work on similar “flavors” of the same problem (personalization), a more apt comparison for a traditional multi-armed bandit would be an A/B test. Both try to discover a treatment that globally outperforms the status quo, but while the A/B test maintains a fixed, randomized allocation of users to treatments, the multi-armed bandit shifts traffic from exploration to exploitation in real time.
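To make the contrast concrete, here’s a minimal epsilon-greedy sketch of a traditional multi-armed bandit. Notice that no user context enters the decision, so every user is served by the same policy:

```python
import random

def choose_arm(values: list[float], epsilon: float = 0.1) -> int:
    """Non-contextual bandit step: one policy for all users."""
    if random.random() < epsilon:
        return random.randrange(len(values))  # explore a random arm
    return values.index(max(values))          # exploit the current best

def update(counts: list[int], values: list[float], arm: int, reward: float) -> None:
    """Incrementally update the chosen arm's running mean reward."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Illustrative usage with three treatment arms
counts, values = [0, 0, 0], [0.0, 0.0, 0.0]
arm = choose_arm(values)
update(counts, values, arm, reward=1.0)  # e.g. the user converted
```

Compare this to the LinUCB sketch earlier: the difference in spirit is that the context vector x never appears here, so the bandit converges on one global winner rather than a personalized policy.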
For more on this, you can read Sven Schmit’s “multi-armed bandit vs. A/B testing” guide on this blog.
When you need a truly personalized user experience, the parameters of the proposed personalization will determine whether you reach for a contextual bandit or a recommendation system. To summarize, here are some easy-to-reference bullet points to help you discern the best tool for the job:
How many possible user experiences are we deciding between?
Contextual bandits are well poised to handle 10-100 arms; when you have thousands, use a recommendation system.
Do we have large amounts of historical user- and action-specific data to evaluate?
If so, recommendation systems are available to you - but if you’re facing a cold start problem, you’ll be better served by a contextual bandit.
How many times do we need to make this decision for each user?
If many times, a recommendation system is likely best. If only a few times, contextual bandits will work great.
How large is the problem we’re solving, and is it worth the resources required to build a complex model?
For the biggest problems (and biggest opportunities), a recommendation system may be warranted. For everything else, there’s contextual bandits.