AI Evaluations Crash Course in 50 Minutes (2025) | Hamel Husain

Author: Peter Yang
Date: September 28, 2025
Duration: 52:29

Today, I want to share a new episode with Hamel Husain.


Hamel has trained 2,000+ PMs and engineers from companies like OpenAI, Anthropic, and Google on how to run AI evals. In my new episode, he shares a free master class on how to build evals for a real AI agent in just 50 minutes using a simple spreadsheet. I learned a lot from Hamel and I think you will too.


Hamel and I talked about:

(00:00) What the most valuable part of evals is

(01:25) Live walkthrough: Analyzing 100 real production traces

(09:50) Creating the eval criteria using a simple spreadsheet

(24:44) Why binary pass/fail ratings beat 1-5 scores every time

(28:52) The agreement metric trap that fools most PMs

(30:08) True positive and negative rates explained

(36:00) How to set up continuous evals in production
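
The chapters on the agreement metric and on true positive and negative rates can be illustrated with a small sketch. This is not taken from the episode: the function name `judge_rates` and the data are made up for illustration, and it assumes binary pass/fail labels from a human reviewer and from an automated judge. It shows how raw agreement can look high while the judge never catches a single failure.

```python
def judge_rates(human, judge):
    """Compare automated judge labels to human labels (True = pass)."""
    if len(human) != len(judge):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(human, judge))
    # True positives: both human and judge say pass.
    tp = sum(1 for h, j in pairs if h and j)
    # True negatives: both human and judge say fail.
    tn = sum(1 for h, j in pairs if not h and not j)
    pos = sum(1 for h in human if h)   # traces the human passed
    neg = len(human) - pos             # traces the human failed
    return {
        "agreement": (tp + tn) / len(pairs),
        "tpr": tp / pos if pos else None,  # share of real passes the judge passes
        "tnr": tn / neg if neg else None,  # share of real failures the judge fails
    }


# Hypothetical data: a human passed 90 traces and failed 10.
human = [True] * 90 + [False] * 10
# A hypothetical judge that simply passes everything.
judge = [True] * 100

rates = judge_rates(human, judge)
print(rates)
```

With these made-up labels the judge agrees with the human on 90% of traces, yet its true negative rate is 0, so it would miss every failing trace. Reporting the two rates separately makes that gap visible.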


Get the takeaways: https://creatoreconomy.so/p/ai-evaluations-crash-course-in-50-minutes-hamel-husain


Where to find Hamel:

X: https://x.com/HamelHusain

Website: https://hamel.dev/


For anyone building the future, Behind the Craft is a conversation with Peter Yang that moves beyond theory and into the tangible details of creation. This podcast lives in the messy, rewarding space where ideas become real products. Each episode is built on candid interviews with experts who have been in the trenches, dissecting the pivotal decisions, the unexpected hurdles, and the hard-won lessons that rarely make it into a polished case study. You’ll hear the unvarnished stories behind the features and companies shaping our world, focusing on the practical frameworks and mental models that effective product leaders and creators rely on daily. It’s about understanding the craft from the inside out: the strategic shifts, the team dynamics, and the user insights that truly move the needle. Tune in for a direct, no-fluff dialogue designed to accelerate your own journey, providing actionable guidance you can apply immediately to level up your own work. This is where the blueprint meets the build.
Language: English
Episodes: 100

Behind the Craft
Podcast Episodes
Master Google AI Studio in 40 Minutes | Logan Kilpatrick

Duration: 39:17
Logan is the Product Lead for Google AI Studio. I got him to give us an inside look at how he uses AI Studio to build AI Studio (very meta) and how his team ships at startup speed inside Google. AI Studio is the feature…
Build an AI Analyst with Claude Code in 50 Min | Sumeet Marwaha

Duration: 51:47
Sumeet is the Head of Data at Brex and came personally recommended by the Claude Code team. In our episode, he showed me how to use Claude Code to build a data explorer that lets anyone ask questions and get insights wit…
A Founder's Playbook for Shipping 10x Faster with AI | Yana Welinder

Duration: 41:53
Yana is Head of AI at Amplitude and a good friend. We had some real talk about how to stay scrappy inside a big company, including how to avoid decision by committee and endless internal debates. Yana also demoed her fav…
A Proven 5-Step System to Prototype Apps with AI | Xinran Ma

Duration: 38:14
Xinran is a top AI instructor who has taught 100s of product teams how to design and prototype with AI. He shared 5 practical techniques to avoid generating AI slop — from side-by-side exploration to reverse prompting to…