As part of Eppo’s new interview series with data leaders, we sat down with John Sears, who has built a tech career leading Data Science teams at Uber and Motive, but has also worked on quantitative strategy across the world of sports.
In this conversation, John unpacks the specific thrills and challenges of developing predictive data models in the front offices of the NBA and MLB.
Can you introduce yourself, what you do, and how you got there?
I'm a special assistant working with the Los Angeles Dodgers on quantitative modeling.
My journey here has been a bit roundabout. After beginning my career as a software engineer doing mostly internal-facing applications at HBO, I ended up taking a job in Washington, DC as a research assistant for the Federal Reserve Board during the financial crisis. My work involved modeling financial derivatives – things like options, futures prices, swaps – to help the Fed understand the consequences of their actions, and especially how people were anticipating their future policy decisions.
Working with Fed economists, I learned the basics of statistics and modeling, and it made me realize there's a super-fascinating field out there that I just didn't know about in undergrad.
I started thinking about where to train that focus in the future. I wasn't sold on pursuing a career on Wall Street. I loved working at the Fed, but the path forward would probably involve getting a finance PhD or going to a big bank to work as a quant, and neither of those seemed enticing.
So I started looking around at more creative ways to apply those modeling skills. All my life I’ve been passionate about sports, and specifically player personnel stuff, like making strategic decisions around trades and free agency.
My favorite sport was basketball. And there was one team doing this sort of work at a very high level: the Houston Rockets. So I just sent in a cold resume and got lucky. They picked it out of a big stack and started talking to me, and I went through their interview process and realized that this really could be a viable career.
And then I did something a little controversial and unexpected. I actually withdrew from the process and enrolled in grad school for a couple of years. I realized that I needed more training to really do this work at a high level.
I went to Stanford for a two-year Masters in Operations Research, which is the study of decision-making under uncertainty. While I was there, I got connected with the Philadelphia 76ers, who were pursuing a really ambitious approach based on evidence-driven decision-making. I worked for them part-time while I was in grad school and then full-time once I got out.
Since then, I’ve gone back and forth between industry and sports. Even while working in industry, I've been able to maintain a part-time foot in the door of the professional sports world, doing stuff on the side. I eventually became VP of Basketball Analytics for the Minnesota Timberwolves, where I led the engineering, data science and analytics departments. And then I started with the Dodgers in October of last year.
Is it fair to assume your job is just like “Moneyball”?
I’ve never actually seen the movie “Moneyball.” But my job involves a lot of the same things. It’s not dissimilar from the world of investing. You need to find the players that are undervalued by the market, who are underappreciated for what they can do, and find ways to acquire those players that other teams haven’t figured out yet.
Back in the “Moneyball” days, it was a little more straightforward, because there wasn't a lot of understanding of how to interpret the data behind sports. Evaluation was very heavily skewed towards inferring talent from how players looked while performing. And that's a hugely important piece, but it’s not the full picture.
I think there’s a misconception that there’s a stats vs. eye-test conflict going on all the time with sports teams. At this point, it’s not really the case anymore. It’s really just a search for the best information.
Nowadays, the advantage gained by using the statistics that they used in “Moneyball” (I did read the book!) has been gobbled up by market efficiency, and the cutting edges are way further out into the harder-to-find regions of stats.
In today’s environment, we’re doing a lot of the same things, but it takes way more turns of the crank to find those super-valuable insights. People have figured out how to take advantage of most of the insights that used to exist.
Now it’s things like swing trajectories or pitcher windups. The arc of the ball off a shooting guard’s hand on certain types of shots. You have to look a lot harder, but in the end it’s the same scientific method: you generate hypotheses, you test them in the data, and if you think it could really be something, you run a test in the field.
How do you work with scouts? What types of information are teams looking to develop about players?
We work really closely with scouts. With the Timberwolves, we built draft models to try to forecast future player performance. And then we get into what’s effectively a reconciliation process, where we have a series of models for every player that includes forecasts of their future abilities. And then you have a bunch of scouts who have scouted them in person, have done reference calls, generated all that intel.
We sit in a room and we debate it, and typically there'll be a moderator keeping order. When there's a ton of agreement, we can move quickly.
And when there's disagreement, then we kind of get into it. Then it really comes down to evidence. If a scout rated him as an X on jump shooting, and on the stats side we have him as a Y, what’s the evidence for each case?
It’s a process that takes about a month of sitting in a conference room. But in the end, we hope to end up with a fairly unified view that internalizes all the possible information that we've had access to.
And sometimes the result of that is that we need to go back and get more information. For example, can we look at guys with a similar build or style of play? We would do a deep dive, and the scouts would go back, make more calls, watch more film, and then we'd circle back the next day and try to hammer it home.
When assessing players on the basketball side, we’re more focused on things that lead to winning, rather than just the individual accumulation of statistics. Points and rebounds don’t have a ton of correlation to future outcomes, but things like steals and assists historically do.
You run into interesting situations. For example, Syracuse has famously always run this 2-3 zone defense, and their wings generate a lot of steals, but the predictive relationship breaks down because it’s not the same type of environment that most other college prospects are operating in.
So you have to think about these domain niches that can cause your predictive model to misperform.
Is baseball a more statistically "pure" sport than basketball?
Basketball and baseball are definitely different, from a data science perspective. There’s certainly been more progress in quantitative strategy on the baseball side. There’s more low-hanging fruit, and teams have been able to develop winning strategies based on it pretty quickly.
The staffs in baseball can be 5x larger than in basketball, and to a degree it’s a simpler set of problems to solve. It’s a series of one-on-one interactions for the most part: pitcher vs. batter, batted ball vs. fielder, runner vs. catcher, etc.
Whereas in basketball, it’s so fluid – it’s continuous space, continuous time, and players can do whatever they want on the court. There aren’t defined positions, batting order, etc. And so it’s been easier to find market-moving opportunities in baseball. In basketball, there’s no lack of opportunity to find strategically valuable insights, but it’s just harder.
So if you’re motivated by that kind of stuff, I think they both offer tremendously challenging problems to solve that can have a very important influence on the teams’ outcomes over time.
Do predictive data models take the fun out of sports? Can you be a fan and a data scientist?
In the end, the thing that excites me about sports isn’t that it’s random or predictable. It’s that human beings can excel beyond our expectations.
I remember the 2021 NBA Finals when Giannis Antetokounmpo just went absolutely crazy in the elimination game. He blocked five shots, he was dominant offensively, he made great reads passing-wise… He was a two-time MVP and I still didn’t know he could do that. I’m getting goosebumps thinking about it.
To me, those are the magical moments of sports. And I don’t think modeling has any influence on the ability to appreciate that.
How is a sports data org different from a tech data org?
Sports teams are loosely analogous to a Series B or Series C startup, with an enterprise value of roughly $1-4 billion. Typically, there’s not a Product team. Maybe there will be one or two Data Engineers, a Software Engineer or two, some folks who build up predictive models, and others who conduct ad hoc analysis. It’s not unlike a typical analytics team at a tech company. The main difference is that there’s not so much emphasis on the product; it’s really about the insights generated.
Do you ever run experiments?
An experiment in our world is called a game. The difference between sports and tech is that oftentimes, even a medium-sized tech company will get millions of bites at the apple every day to run an experiment, for something like landing-page optimization or a new product feature.
In our world, we get 82 or 162 games per year. So n is just so much smaller. Our tests are highly analog, and they require making a significant change. So maybe we start defending this type of pick-and-roll action in a different way, or we change our substitution patterns. Those are not easy decisions, and frankly, the coach has ownership over those. We can give advice there, but we’ll never be able to just A/B test “who does better, this guy or this guy?”
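To put rough numbers on the sample-size gap described above, here is a minimal Python sketch. It assumes every game or page view is an independent success/failure trial with a true rate of 50%, and it uses the normal-approximation margin of error for a proportion. The function name and the one-million-visitor figure are illustrative assumptions, not something taken from the interview, and real games are neither independent nor identical.

```python
import math

def proportion_margin(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed rate over n independent trials."""
    return z * math.sqrt(p * (1 - p) / n)

# Season lengths from the interview (NBA: 82 games, MLB: 162 games) versus a
# hypothetical web experiment with one million visitors.
for label, n in [("NBA season", 82), ("MLB season", 162), ("web test", 1_000_000)]:
    print(f"{label:>10}: n={n:>9,}  margin = +/- {proportion_margin(n):.1%}")
```

Under these assumptions, a full 82-game season pins down a win rate only to within roughly 11 percentage points, and a 162-game season to within about 8, while a million-visitor test resolves differences of about a tenth of a point. A strategy change would have to be very large to show up clearly in a single season.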
What’s in the Dodgers’ data stack?
The Dodgers do have quite a modern data stack – everything is in the cloud, neatly orchestrated. I can’t get into the specifics, but it’s the kind of stuff you would expect to see in a well-run Series C startup, where there’s been a data team in place for several years.
You’re also an investor with the InvestInData collective. What's most exciting about the data space right now?
It’s fun again. Back when I was at Uber, we would run Hive queries that would take hours. And before that, I spent a summer interning at Amazon, writing big Hadoop map-reduce jobs in Java to try to extract data. Everything has just gotten a lot nicer. It’s crazy to see the proliferation of startups that are each tackling important chunks of the Data Scientist’s day-to-day.
I’m also a very firm believer in warehouse-native. Eppo is really leading that charge on the A/B testing side, and Hightouch is really disrupting CDPs.
The era of balkanized data, where you have all these third-party places where your data lands, and then you have to bring it back into your warehouse… that’s really hard to manage. It’s too easy for the system to get out of sync.
Snowflake has really changed the game by making a super-performant data warehouse that’s fast enough to power all these tools under the hood.
What advice would you offer someone looking to work with sports and data?
There’s still a misconception that sports is a particularly hard industry to get into.
When I was a college student, I started a sabermetrics club at my undergrad, but didn't even seriously consider making a career out of it. I thought that sports was such a small world that only people with connections would be able to access it.
In fact, now that I've been on the other side, scaling sports organizations, I've found the opposite is true. We're hungry for talented people who are excited about this work. It is a hard career, but to get started you can still just apply.
We look at every single resume that comes in. And if you have the skills that we're looking for – interest in data, ability to work independently, willingness to make hard judgments about nebulous data – we'll give you a call.