Goldbelly transforms experimentation culture with the transparency and rigor of Eppo

About

Goldbelly is an online delivery platform for specialty food makers and artisans to sell and distribute their goods to audiences across the United States. With Goldbelly, authentic Chicago deep dish pizza can be sent to hungry folks in San Francisco, and sourdough bread from San Francisco can be bought and shipped to watering mouths in Dallas, Texas.

The Problem

“We used another full-stack experimentation service in the past, sending them our raw data to analyze experiments in their backend. But the results were questionable. We saw major discrepancies between their results and our calculations. Our leadership team started questioning our work and we quickly realized this fallibility would not convince the leadership team of the benefits of A/B testing.”

Mun Kim
ML Engineering Manager at Goldbelly

Mun Kim joined Goldbelly as its first Machine Learning Engineer, focusing on Search, Recommendation and Discovery. From the outset, he understood that Machine Learning was a new practice within the company and that he would need to prove the worth of ML models and systems to the rest of the organization. To do this, he turned to A/B testing.

Trust Deficit

Mun and his team initially tried another experimentation service, but it backfired: “We used another full-stack experimentation service in the past, sending them our raw data to analyze experiments in their backend. But the results were questionable. We saw major discrepancies between their results and our calculations. Our leadership team started questioning our work and we quickly realized this fallibility would not convince the leadership team of the benefits of A/B testing.”

With each experiment they ran, it took Mun and his team hours to recalculate the metrics and achieve internal consensus. This taxing process was the only way the stakeholders could trust the interpretation of every experiment.

Mun knew they needed something different. His frustration surfaced in conversations with colleagues, and the prospect of faster, more trustworthy experiments led him to Eppo.

Right, From the Start

The value of Eppo was immediately obvious: “Integration and setup was done within an hour and we were able to run initial experiments the same day.”

Being warehouse-native allowed Eppo to directly query Snowflake source-of-truth tables and calculate results, bringing transparency to a process that was previously a black box. Anyone with access to Snowflake could now audit the data and associated metrics at every step. “Since Eppo wasn’t egressing data out of Snowflake, it was easy to alleviate our security and compliance concerns as well.”

Additionally, Eppo’s Randomization SDK allowed Goldbelly to streamline its experimentation workflow end to end. The team launches experiments, randomizes users into each experiment, and drives traffic to the winning variant, all within Eppo.

Renewed Confidence in Experimentation

With Eppo, the discrepancies between internal data and experiment results disappeared, saving Mun and his team the hours they once spent recalculating results internally. Eppo also helped stakeholders across Goldbelly build trust in the data and gave them far more assurance about their metrics. The ML team now focuses on shipping ML models and system changes faster instead of building internal consensus.

Mun specifically credits Goldbelly’s renewed confidence in experimentation to Eppo’s stats engine and intuitive front-end: “My team wants to share the most trustworthy analysis with stakeholders and Eppo’s rigorous stats engine provides us just that. It’s like having a statistician on the team. I also find Eppo’s interface intuitive where stakeholders can easily check confidence bounds and other details around the experiment.”

The Path Forward

With a solid foundation to build on, Mun looks forward to continuing to develop a culture of experimentation at Goldbelly. His immediate plan is to increase experiment velocity by using Eppo’s CUPED feature, a variance reduction technique, across all experiments. He estimates it could decrease experiment runtimes by 10 to 20%.

“Our team is very happy with the way we're running experiments with Eppo across front-end, back-end and Machine Learning use cases. Our business stakeholders now utilize the experiment results without questioning them, and our data team can self-serve using Eppo’s intuitive interface. I'm confident that Eppo is going to be the leader in the experimentation and product analytics space.”

Mun Kim
ML Engineering Manager at Goldbelly