
At the core, in gaming as elsewhere, experimentation and AB testing are about learning from customers, both free and paid, to identify better designs and strategies that drive relevant performance indicators such as engagement, retention, monetization, and user satisfaction. A successful experimentation program needs a roadmap for hypothesis-driven learning and innovation, centrally ensured rigor in experiment design and analysis, mechanisms for knowledge accumulation and dissemination, executive endorsement, and a culture that allows for risk-taking and failure.
Gaming, however, offers unique challenges and opportunities for experimentation. Important differences exist across the stages of the game development and publishing process. Both pre- and post-launch, experimentation informs important steps in bringing a game to market and managing it: while emergent game creation is the focus during earlier stages, commercial optimization becomes more important once a game is live in the market. Among game designers, there can be concern that experiment-driven optimization compromises the artform that is game design – something that needs to be carefully managed through cross-departmental collaboration and a shared learning agenda.
Further aspects are unique to experimentation in gaming: Highly engaged player communities can make it hard to run certain types of experiments, e.g., with different prices, matchmaking algorithms, or other variations that might be perceived as unfair. As games are highly immersive and interactive experiences, the set of relevant indicators can be larger and more diverse than in other areas of experimentation. The publishing model (premium versus free-to-play), platform strategy (mobile, console, PC, cross-platform), growth trajectory (paid and/or organic), and ongoing development tactics (LiveOps, content updates) impact to what extent and how you can leverage experimentation. In selecting your game’s technology stack, you should also consider what experimentation and personalization strategies you want to support once the game is live in the market.
This article provides an introduction to touchpoints between game making and experimentation, structured against a matrix of a four-stage game development and publishing process and the four Ps of the marketing mix (product, place, price, promotion). To get started, let’s develop this conceptual matrix and populate it with key tasks essential to getting a game to market. We will then discuss how these tasks interact with experimentation, highlight key challenges and opportunities, and outline pathways that will help you succeed with experimentation in gaming.
A 2023 blog post by Egor Piskunov, an experienced game developer, offers a detailed account with eight essential stages. For our purposes, let’s simplify it a bit to four stages: planning, development, (soft) launch, and post-release live operations.
As noted, experimentation in digital media and interactive entertainment is about learning from consumers, i.e., players. Our conceptualization consolidates earlier consumer-distant stages while giving comparatively more attention to later in-market stages.
To identify the areas where experimentation can drive value at each stage, let’s lean into a concept that captures the strategy set available to companies when interacting with consumers: the marketing mix with its four Ps – product, place (aka distribution), price, and promotion. By combining our game lifecycle with the marketing mix, we obtain a matrix that we can populate with key tasks involved in bringing a game idea to market:
To get on the same page about experimentation: it exposes different versions of a user experience – from button colors to difficulty adaptation systems, promotional strategies, and whole product features – to customers to learn about their reactions. Pre-launch, experimentation tends to occur on small samples, often without randomization, and with mostly qualitative performance measurement, e.g., through player or expert feedback (is it fun?). During soft launch (i.e., the launch of the game in select test markets), sample sizes grow, randomization becomes a crucial ingredient, and qualitative and quantitative performance measurements both tend to be important. After worldwide launch, with live games and large audiences, the emphasis shifts noticeably towards quantitative experimentation with large samples, randomization, and quantitative performance measurement.
Now, let’s use our matrix to highlight key tasks that interact significantly with qualitative experimentation (the pre-launch type with small samples), quantitative experimentation (the launch and post-release type with large samples), or both. The following matrix will be the backbone for our further discussion:
Working from left to right in our matrix, the pricing and monetization model you choose for your game (1.6) and the intended growth strategy (1.7) will majorly impact your ability to leverage experimentation. Freemium (F2P, free-to-play) games with strong organic growth tend to attract larger audiences, providing a stronger basis and larger sample sizes for experimentation. Monetization design elements such as lootboxes and high-price IAPs (in-app purchases) that skew spending distributions will impact the variance you face in measuring experimental effects. Similarly, if you plan for a paid growth strategy with high-value players and heavy monetization mechanics, you will likely have a (much) smaller player base, impacting your ability to run quantitative experiments. If you also plan to monetize via ads, you may need to run more experiments to optimize overall monetization. If you publish a premium game that players need to buy upfront, you will mostly care about player engagement and retention metrics (and maybe ad monetization).
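To make the variance point concrete, here is a minimal sketch of how a fatter spending tail inflates the sample size needed to detect the same relative revenue lift. The payer share and spending distributions are illustrative assumptions, not benchmarks:

```python
# Minimal sketch: how a heavier spending tail inflates the sample size needed
# to detect the same relative revenue lift. All numbers are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def required_n_per_arm(revenue, relative_lift=0.03, alpha=0.05, power=0.8):
    """Standard two-sample approximation for a lift in mean revenue per user."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    delta = relative_lift * revenue.mean()
    return int(np.ceil(2 * (z * revenue.std() / delta) ** 2))

n_users = 1_000_000
payers = rng.random(n_users) < 0.02  # assume ~2% of players ever pay

# Baseline economy: lognormal spend among payers.
spend = np.where(payers, rng.lognormal(mean=1.5, sigma=1.5, size=n_users), 0.0)
# Whale-heavy economy: same payer share, fatter spending tail.
spend_whales = np.where(payers, rng.lognormal(1.5, 2.2, n_users), 0.0)

print(required_n_per_arm(spend), required_n_per_arm(spend_whales))
```

In simulations like this, the whale-heavy economy can easily require an order of magnitude more players per arm for the same lift – worth knowing before you commit to a monetization design.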
A type of experimentation, sometimes called pre-release experimentation, often happens in user testing during the game planning and development phases (1.1, 1.2, and 2.2). This type of experimentation is mostly qualitative and quite different from experimentation and AB testing on end users at scale. Sample sizes are much smaller. Examples of pre-release experiments include having test users and experts play different prototypes or inviting mock reviews by game journalists and experts. To the extent that generative AI models are available to emulate in-market player preferences, experiments with prototypes (or later, release candidates in LiveOps, 4.4) can also be run on AI agents. At a minimum, such experimentation can identify bugs and technical problems.
A typical task in the planning stage involving large-scale experiments is marketability testing (1.3), e.g., of game themes and creative strategies. This involves releasing images and videos via social media or digital advertising campaigns and conducting experiments to gather customer feedback. The advanced ad testing capabilities of today’s large ad platforms can facilitate quantitative experimentation on large samples for this task.
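As an illustration, the offline readout of such a test can be as simple as a two-proportion z-test on click-through rates; a minimal sketch with hypothetical counts:

```python
# Minimal sketch: offline readout of a creative marketability test as a
# two-proportion z-test on click-through rates. Counts are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

clicks = [1840, 2112]            # clicks for theme A vs. theme B
impressions = [120000, 121500]   # impressions per theme

z_stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```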
When you decide on the technologies to publish and distribute your game(s) (2.3 and 2.4), you should watch for solutions that integrate well with AB testing and experimentation at scale. Developing backend services such that they can flexibly place players into different configurations and experiences from the get-go can save substantial time and cost later on.
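A pattern worth building in early is deterministic, hash-based variant assignment: any backend service can then recompute a player’s variant without shared state, and assignment stays stable across sessions. A minimal sketch (experiment and variant names are hypothetical):

```python
# Minimal sketch: deterministic, hash-based variant assignment. Any service
# can recompute a player's variant without shared state, and assignment
# stays stable across sessions. Names below are hypothetical.
import hashlib

def assign_variant(player_id: str, experiment: str, variants: list[str]) -> str:
    digest = hashlib.sha256(f"{experiment}:{player_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("player-123", "ftue_v2", ["control", "short_tutorial"]))
```

Salting the hash with the experiment key keeps assignments independent across concurrent experiments.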
As your game develops and approaches launch, you will usually beta test and then soft launch the game in a few test markets before entering worldwide release. Many companies experiment heavily during these steps, with both qualitative and quantitative experiments, to verify important hypotheses on onboarding and FTUE (first-time user experience) as well as game difficulty and balancing (3.2). It is also the time to either experiment with or define the strategy for personalization in product (3.3) and pricing (3.6). Having clarity on the approach here can help you ready the systems to test DDA (dynamic difficulty adaptation), matching (3.3, 4.2), and price and offer targeting strategies (3.6, 4.8) quickly when you reach scale.
Soft launch can be crucial in assessing PMF (product-market fit) and making launch-or-kill decisions. As sample sizes are small and time is limited during soft launch, you should only experiment with changes that you expect to have a major impact, e.g., drastically different FTUEs, balancing scenarios (3.2), and personalization strategies (3.3, 3.6), and keep the number of different experimental conditions to a minimum. This is not the time to look for precise quantification of treatment effects but to test major assumptions and design differences.
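Before committing to a soft-launch experiment, compute the smallest effect you could reliably detect with the players you will actually have. A minimal sketch, with illustrative baseline retention and sample size:

```python
# Minimal sketch: the smallest retention lift a soft-launch test can reliably
# detect. Baseline retention and sample size are illustrative assumptions.
import numpy as np
from statsmodels.stats.power import NormalIndPower

baseline_d7 = 0.12   # assumed D7 retention in the test market
n_per_arm = 2500     # players available per variant during soft launch

# Solve for the detectable effect size (Cohen's h), then convert it back
# to an absolute retention lift.
h = NormalIndPower().solve_power(nobs1=n_per_arm, alpha=0.05, power=0.8,
                                 ratio=1.0, alternative="two-sided")
lift = np.sin(np.arcsin(np.sqrt(baseline_d7)) + h / 2) ** 2 - baseline_d7
print(f"Smallest reliably detectable D7 lift: {lift:.3f}")
```

With 2,500 players per arm, lifts below roughly 2-3 percentage points on D7 retention are invisible – which is exactly why soft launch is for testing drastic differences, not fine-tuning.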
Pre-launch is also the time to get in gear and prepare experiments on key advertising and cross-promotion channels (3.7). Your launch and distribution strategy (3.4), e.g., location and budget, will impact what types of ad experiments you can expect to run. Be aware of the possibilities and limitations here and plan accordingly. These experiments can then validate and calibrate your marketing measurement solutions (3.8), e.g., media and marketing mix models. The experimentation strategy here will usually be developed by the company’s publishing organization, e.g., the marketing or user acquisition team. The product-centered experiment strategy (1.1-3.3), on the other hand, is commonly owned by the studio or game team. Our matrix above can structure conversations and align which team should lead which experimentation use cases – more on this later.
Post-release, as your game’s audience grows, experimentation really kicks into gear. You now have the player numbers (and hence sample sizes) to test hypotheses, design elements, and systems that remained untested pre-launch. Typical applications are level and economy balancing, reward systems, currency endowments and giveaways, and algorithmic systems for adaptation and personalization of the user experience (4.1-4.2). Precise and reliable tracking of player feedback and analytics (4.3) is a crucial ingredient to experimentation in gaming as it makes measurement of outcome metrics and diagnostic implementation checks possible.
Live operations in digital gaming (4.4, LiveOps for short) refer to the continuous release of new content, e.g., new level packs, maps, or game events. LiveOps teams often operate on weekly schedules. Experimentation can be an important tool to inform well-balanced and -tailored LiveOps. However, the time pressure can be prohibitive for rigorous experimentation practices. Measurement windows tend to be short (with weekly events, a maximum of a week by definition), and a bias towards action can make it hard to spell out well-rounded hypotheses and set up well-designed experiments. In my experience, rigorous experiment design and analysis, e.g., enforced by a central platform, is crucially important for reliable decision support despite the pace of operations.
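One lightweight way to enforce such rigor under time pressure is to require a pre-registered spec before any LiveOps test ships. A minimal sketch of what a central platform could validate – fields and thresholds are illustrative, not any specific product’s API:

```python
# Minimal sketch: a lightweight, pre-registered experiment spec that a central
# platform could enforce before a LiveOps test goes live. Fields and checks
# are illustrative assumptions, not a specific product's API.
from dataclasses import dataclass

@dataclass
class ExperimentSpec:
    name: str
    hypothesis: str          # spelled out before launch, not after
    primary_metric: str      # one decision metric per test
    measurement_days: int    # bounded by the weekly event cadence
    min_users_per_arm: int   # from a power analysis, not a guess

    def validate(self) -> None:
        assert self.measurement_days <= 7, "window exceeds weekly event cycle"
        assert self.min_users_per_arm > 0, "run a power analysis first"

spec = ExperimentSpec(
    name="summer_event_reward_tuning",
    hypothesis="Higher event-currency payouts increase day-3 event completion.",
    primary_metric="event_completion_d3",
    measurement_days=5,
    min_users_per_arm=40_000,
)
spec.validate()
```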
Additionally, I have sometimes observed what can be dubbed the “measure-it-all” fallacy, where LiveOps managers want to understand the impact of different actions in detail, ideally every week. This is a hard-to-achieve ambition, and education by game analysts and data scientists is important to set realistic expectations. Given the speed of iteration and focus on action, and depending on audience size, only a few quantitative experiments with short measurement windows will be possible for each content release. A structured experimentation roadmap and a knowledge repository summarizing design assumptions, hypotheses to be measured, and results can help facilitate learning across event cycles and in the long run. Test one thing at a time and improve the designs you release each week using a learning agenda and knowledge repository. Consider conducting a hackathon with experimentation data scientists and the LiveOps team to set up the learning roadmap. Complement quantitative with qualitative experimentation to fill in gaps where you lack sample size or face other constraints.
Ad monetization (4.5) and price setting and personalization (4.8) are further areas where quantitative experimentation can drive much value by helping find revenue-optimal placements and incentive schedules (discounts or bonuses in the case of prices, rewards in the case of incentivized in-game ads). Similarly, experimentation can inform what value localization and personalization initiatives (4.7) provide, helping decide which ones have merit and which ones do not.
Finally, core marketing efforts such as user acquisition (4.6), advertising and cross-promotion (4.9), and measurement (4.10) can benefit significantly from quantitative experimentation, in particular through precise estimates of marketing effects. These precise estimates can precede major marketing investments and calibrate and validate observational analytics models such as media and marketing mix models (4.10).
For continuous LiveOps decision support (4.4), the analytics team could maintain a similarly calibrated observational model that continuously estimates the effects of different actions. This model could be regularly and repeatedly calibrated using highly valid experimental results from the AB tests that the LiveOps team is able to run (see above – this helps with the “measure-it-all” ambition). Analytics users – LiveOps managers in this instance – should understand that observational estimates carry a much larger risk of being wrong; instilling this understanding is, again, a job for analysts and data scientists.
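A minimal sketch of this calibration idea, with hypothetical lift estimates: derive a correction factor where both observational and experimental estimates exist, then shrink the observational-only estimates accordingly:

```python
# Minimal sketch of the calibration idea: where a LiveOps action has both an
# observational estimate and an AB-test estimate, derive a correction factor
# and apply it to actions measured observationally only. Data is hypothetical.
import numpy as np

# Estimated revenue lifts (%) from the observational model vs. matched AB tests.
observational = {"bonus_currency": 6.0, "double_xp": 4.0, "new_map": 9.0}
experimental  = {"bonus_currency": 3.5, "double_xp": 2.6}  # subset with AB tests

factors = [experimental[k] / observational[k] for k in experimental]
correction = np.median(factors)  # robust to a single odd experiment

calibrated = {k: v * correction for k, v in observational.items()}
print(calibrated)  # "new_map", measured observationally only, is shrunk toward AB scale
```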
As mentioned in the beginning, success with experimentation in gaming requires carefully bridging the artform of game design and analytical commercial aptitude. Operationally, this means that game design, product management, marketing, and analytics (incl. user research) need to work together smoothly and with clear direction. Game design (and art) develop content and systems, product management packages them for the market, marketing gets customers to engage with them, and analytics is essential to all three.
Product management and marketing are usually the business owners of experimentation, and analytics is the technological and methods owner. Our matrix can help you coordinate what areas of experimentation should be owned by product management, marketing, or both and what areas require deeper involvement by analytics and data science:
In this stylized representation of ownership structures, it is product management’s job to loop in game design as appropriate. In my experience, this looping-in of designers can be helpful across all of stages 1 through 3 and the product dimension of stage 4. It can even unlock high value in highly analytical exercises, e.g., by leveraging the full richness of creative ideation when considering applications of algorithmic matching, targeting, and personalization systems (3.3, 3.6, 4.2, 4.7, 4.8).
The right ownership structure for experimentation also depends on the company's wider publishing model. For example, a large portfolio of hypercasual games will likely mean more control for marketing. A small portfolio with one or two games with deep and complex gameplay, on the other hand, will likely mean more control for product.
If LiveOps is a strong focus and revenue driver, product management and marketing may want to make specific arrangements for collaborating on weekly releases. This can take the form of standing sync meetings or personnel embedded in both teams to ensure the messaging and theme of events and related marketing materials are aligned.
I have often seen success when certain areas involved all three experimentation owners – product, marketing, and analytics. Those are marketability testing (1.3), the tech and service delivery stack (2.3, 2.4), setting the personalization strategy (3.3), LiveOps (4.4), and price and offer personalization in the live game (4.8). Having all three stakeholders participate when you develop a strategy in these areas can help ensure you leverage the right external tools, methods, and treatments.
A key point of contention often arises around price personalization. While marketers want to leverage this important revenue optimization technique, game teams may push back because of fairness and community-related concerns. A safe strategy I’ve often seen in practice is to personalize prices at the country level within a reasonable corridor (e.g., a maximum factor of two between the cheapest and most expensive pricing) and to use targeted offers for consumer-level personalization from there.
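A minimal sketch of the corridor logic, with made-up country multipliers:

```python
# Minimal sketch: country-level price multipliers clamped to a corridor so the
# most expensive market pays at most 2x the cheapest. Multipliers are made up.
raw_multipliers = {"US": 1.00, "DE": 0.95, "BR": 0.45, "JP": 1.20, "IN": 0.35}

lo = max(raw_multipliers.values()) / 2  # corridor: max factor of two
clamped = {c: max(m, lo) for c, m in raw_multipliers.items()}

base_price = 9.99
prices = {c: round(base_price * m, 2) for c, m in clamped.items()}
print(prices)  # IN and BR are lifted to 0.60x so JP (1.20x) stays within 2x of them
```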
To develop effective offers for targeting in free-to-play games, product management should work with game designers to create bundles that truly drive value for players across different stages of the player journey in the game. Each bundle should come with a nice discount or bonus, ranging in magnitude from 30% to a maximum of 70%. The bundles should cover different price points, after discount/bonus, from $1.99 to $99.99, and even $199.99 for long-tail revenue games. I like to give higher discounts on smaller bundles as it can be hard to make small bundles have a clear value proposition (regarding impact on the user experience). Experimentation can then test different targeting strategies such as skimming, country- and device-based, and RFM-style (recency, frequency, monetary value) personalization. A clear learning agenda synced across product, marketing, and analytics can help identify a close-to-optimal setup for the game step-by-step.
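As a starting point for such targeting, here is a minimal sketch of RFM-style segmentation – thresholds and bundle mappings are illustrative assumptions to be refined via experimentation:

```python
# Minimal sketch of RFM-style segmentation for offer targeting. Thresholds and
# bundle mappings are illustrative and would be tuned via experimentation.
import pandas as pd

players = pd.DataFrame({
    "player_id": [1, 2, 3, 4],
    "days_since_last_purchase": [2, 45, 10, 90],   # recency
    "purchases_90d": [12, 1, 4, 0],                # frequency
    "spend_90d": [240.0, 4.99, 29.99, 0.0],        # monetary value
})

def rfm_segment(row):
    if row.spend_90d >= 100 and row.days_since_last_purchase <= 7:
        return "active_big_spender"   # e.g., $49.99+ bundle, smaller discount
    if row.purchases_90d >= 1 and row.days_since_last_purchase > 30:
        return "lapsing_payer"        # e.g., mid-price bundle, strong discount
    if row.purchases_90d == 0:
        return "non_payer"            # e.g., $1.99 starter bundle, ~70% bonus
    return "regular_payer"

players["segment"] = players.apply(rfm_segment, axis=1)
print(players[["player_id", "segment"]])
```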
When you operate a live game, quantitative experimentation can truly make a difference. In my experience, it is important to be cognizant of four unique characteristics of gaming that present both challenges and opportunities in experimentation. Let’s dive in:
Challenge: Gaming experiments often involve changes to core gameplay elements or the game’s balancing and economy, which can disrupt players’ experiences. Players may notice differences between test groups, potentially leading to dissatisfaction or even churn if the changes are perceived as negative.
Opportunity: Successful games have hugely engaged audiences with a strong sense of community and pervasive peer and social effects. Engaged audiences make detecting subtle effects of experimental changes easier, enabling tests on nuanced design elements like pacing, narrative delivery, or even audio-visual cues. Matchmaking and messaging between players offer ample opportunity to test various mechanisms and approaches.
What to do:
Challenge: Games can be highly complex systems with many interdependent elements, such as mechanics, narratives, economies, and multiplayer dynamics. A change in one area often affects others, complicating the interpretation of test results. Understanding causal relationships in such intricate systems can be difficult as changes may have unintended ripple or delayed effects, making it challenging to capture their full impact within short time frames. For instance, adjustments to progression mechanics might influence long-term retention rather than immediate behavior.
Opportunity: Games are immersive and self-contained. Players often interact with the game world in ways that can be tightly controlled by developers, reducing external confounding factors. This control can allow for precise isolation of variables in AB tests, such as how changing the spawn rate of rare items impacts player satisfaction. Many games feature in-game economies with virtual currencies, items, and other monetizable assets. These economies mimic real-world economic systems but operate in fully controlled environments, yielding immense amounts of rich data and making them ideal for testing pricing, scarcity effects, or promotional tactics.
What to do:
Challenge: Gaming audiences tend to be highly diverse, with varying skill levels, preferences, and play styles. A change that improves the experience for one segment of players may detract from it for another. Player heterogeneity may skew experiment results, leading to suboptimal decisions for specific segments.
Opportunity: Heterogeneity in play styles, skill, monetization behavior, and engagement patterns calls for personalization. Large amounts of granular and rich data are available from gameplay interactions, in-game purchasing, player progression, and social interactions. This data can be used to precisely measure user behavior and experimental outcomes and personalize experiences.
What to do:
Challenge: New game content and features can lead to pervasive novelty effects, i.e., they excite players just because they are new. Early engagement indicators can then be severely misleading about long-term retention with the feature or design (see the sketch below).
Opportunity: Continuous development and the constant release of new game content and features, e.g., in LiveOps events, create ongoing opportunities for experimentation. Developers can deploy AB tests during these updates to refine mechanics, improve monetization, or enhance user experience.
What to do:
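One concrete guard against novelty effects: compare the measured lift in an early window against a later window of the same test before rolling out. A minimal sketch on simulated data:

```python
# Minimal sketch: flagging a novelty effect by comparing the treatment lift in
# an early window vs. a later window of the same test. Data is simulated.
import numpy as np

rng = np.random.default_rng(7)
days = np.arange(1, 29)

# Hypothetical daily engagement lift (%): strong at launch, decaying to ~zero.
observed_lift = 8.0 * np.exp(-days / 6) + rng.normal(0, 0.4, days.size)

early = observed_lift[:7].mean()    # week 1
late = observed_lift[-7:].mean()    # week 4

if late < 0.5 * early:
    print(f"Likely novelty effect: week-1 lift {early:.1f}% vs. week-4 {late:.1f}%")
```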
To innovate effectively with experimentation in gaming, it is paramount that you learn over time, as a game makes its way to and through the market and across the games in your portfolio. Facilitate learning with a knowledge repository, collaborative learning agendas, and smart analytical frameworks. For example, if you have a larger portfolio of games, experiment with more extreme treatments and disruptive changes in your smaller games and transfer learnings to the large games that generate the bulk of your revenue. This strategy mitigates the risk of upsetting or otherwise harming the communities of your largest games and allows for effective innovation across the portfolio.
When it comes to analytical frameworks, game economists have modeling frameworks to design and balance economies for various types of games. An analytical tool I often make use of is what I call the Engagement Engineering framework. This framework draws on a general theory of human motivation, the basic needs concept within self-determination theory. According to this concept, humans need to experience competence, relatedness, and autonomy to become intrinsically motivated for an activity. In leveraging this framework, I assume that strong long-term engagement is most sustainably achieved via intrinsic player motivation. Accordingly, you should strive to maximize players’ experiences of competence, relatedness, and autonomy to obtain the highest levels of engagement and then work to optimize the monetization of this engagement via ads and IAPs.
For a top-down application of the framework, consider using it to develop innovations that drive competence, relatedness, or monetization. Then, check that your ideas do not interfere with players’ autonomy and free choice. Innovations fueling player feelings of competence include strong FTUEs, good tutorials, clearly set and communicated goals, e.g., via a continuous mission system, and systems for dynamic difficulty adaptation. Especially with DDA systems, though, apply the autonomy filter to check that the system will not unfairly impact players or foster addictive behaviors with certain player types (see the first case study here for autonomy-safe design of a DDA system for a puzzle game).
Smart matchmaking, messaging, and other social systems like guilds are innovative ideas that elevate feelings of relatedness. Successful approaches often group players based on skill and engagement patterns (e.g., see the second case study here). Also, use the autonomy component of the Engagement Engineering framework to ensure that you don’t create player teams and groupings that are too similar within and too different between groups. Extreme approaches to matching and matchmaking can harm your player community's long-term cohesion, decreasing long-term engagement.
In monetization, offer personalization can drive a lot of value and, if tuned appropriately, cater to the specific choice situations of players and delight them. If you, however, target low-price offers to players who would like to buy large bundles of in-game goods (or high-price offers to players who would like small bundles), you curtail their autonomy to purchase what they want, leading to unsatisfying experiences.
Similarly, fair and balanced lootboxes that reward players commensurate with the overall pricing of in-game goods can create an engaging and fun monetization system. Lootboxes, however, that trick players into sinking money into them in pursuit of one extremely rare (and largely elusive) high-powered item can harmfully interfere with feelings of autonomy and free choice. Especially players who are prone to overspending on lotteries and gambling may end up with unsatisfactory experiences, harming their long-term engagement. By using such a monetization system, you are essentially at risk of trading a high-value gaming brand and long-term revenue and engagement for short-term revenue spikes.
For a bottom-up application of the framework, pull analytics on how your players spend their time in the game, either overall or segmented by key player types (which can, e.g., be identified via clustering). Then, assess whether what you find aligns with what you expect and want, and think of ways to effectively increase engagement with the game given observed player behavior.
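A minimal sketch of the segmentation step, clustering players on how they allocate their session time – the feature set and cluster count are illustrative:

```python
# Minimal sketch: identifying player types by clustering time-allocation data.
# Feature names and cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical share of session time per activity for 1,000 players:
# columns = core loop, social, shop, events.
features = rng.dirichlet(alpha=[2, 1, 1, 0.5], size=1000)

X = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Inspect how each cluster allocates its time, on average:
for label in range(4):
    print(label, features[kmeans.labels_ == label].mean(axis=0).round(2))
```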
There are many valid and useful analytical frameworks to support you in experimentation-based innovation in gaming. For accessible examples, consider Luton (2013), Lehdonvirta and Castronova (2014), and other works by these authors. For more advanced readings, consider the section on “Video game design” on C. T. Ryan’s website, e.g., Li et al. (2022) or Li et al. (2013), with backward and forward citations.
The key is to have a behavioral and conceptual understanding of your players and the systems that bring them together to anticipate where changes will impact them and why. On a technology level, you must have an experimentation platform with a knowledge repository that allows for effective and efficient setup, analysis, and documentation of experiments.
The often long development processes for games and the unique nature of live games – with highly engaged communities, complex systems, diverse audiences, and continuous development – present both unique challenges and opportunities for experimentation. This article equips you with key knowledge to set up a successful experimentation practice. Doing so will enable you to create deeply engaging experiences, adapt to changing player preferences, and drive sustained growth in a highly competitive industry.
Current developments, such as the proliferation of Generative AI, will significantly impact the gaming industry, the game development process, and continuous release cycles in LiveOps. Blending the abilities of Generative AI for content production with smart learning from customers through experimentation is bound to become a distinguishing competitive advantage.
Experimentation is here to stay, and its future in gaming is truly exciting.
Julian Runge is a behavioral economist and data scientist. He is an Assistant Professor of Marketing at Northwestern University’s Medill School of Journalism, Media, Integrated Marketing Communications. Previously, Julian worked as a researcher on game data science and marketing analytics at Northeastern, Duke and Stanford University, and at Facebook. His work has appeared in the proceedings of premier machine learning conferences such as IEEE COG and AAAI AIIDE, and in leading journals such as Information Systems Research and Quantitative Marketing and Economics. Julian is also Principal Scientist at Game Data Pros, an advanced analytics consultancy focused on revenue optimization in games.