903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Author: Jon Krohn July 8, 2025 Duration: 1:28:20

Super Data Science: ML & AI Podcast with Jon Krohn

Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models.

Additional materials: www.superdatascience.com/903

This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
(16:48) Sinan’s new podcast, Practically Intelligent
(21:54) What to know about the limits of AI benchmarking
(53:22) Alternatives to AI benchmarks
(1:01:23) The difficulties in getting a model to recognize its mistakes

Hosted by Dr. Jon Krohn, Super Data Science: ML & AI Podcast with Jon Krohn is a deep and accessible exploration of how artificial intelligence and machine learning are reshaping our world. Each episode features conversations with leading researchers, engineers, and entrepreneurs from both academia and industry, breaking down complex ideas into something tangible and relevant. You'll hear firsthand about emerging techniques, practical applications, and the evolving landscape of data-driven careers. The sheer volume of data in our world is growing at a staggering rate, and this podcast serves as a guide to understanding that expansion and finding your place within it. Rather than offering abstract theory, these discussions focus on real-world impact, from cutting-edge algorithms to the human stories behind major breakthroughs. Tune in for a thoughtful, nuanced look at the tools and trends that are defining the future, all through the lens of experts who are building that future every day. Whether you're actively working in the field or simply curious about the forces driving technological change, this podcast provides a consistent source of insight and inspiration, demystifying the science that is quietly transforming every aspect of our lives.

Author: Jon Krohn Language: English Episodes: 100

Podcast Episodes

976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems

20.03.2026

Duration: 10:12

NVIDIA just dropped Nemotron 3 Super, a 120-billion-parameter open-weight model that activates only 12 billion parameters at a time and is built for the agentic AI era. In this Five-Minute Friday, Jon Krohn breaks down…

975: Unmetered Intelligence is Heralding the Next Renaissance, with Zack Kass

17.03.2026

Duration: 1:12:50

Zack Kass speaks to Jon Krohn about his bestselling, tech-positive book, The Next Renaissance, that charts the rapid progress of humanity and the benefits that artificial intelligence will bring to us, as well as why a f…

974: When Will The AI Bubble Burst? How Bad Will It Be?

13.03.2026

Duration: 13:56

In this week’s Five-Minute Friday, Jon Krohn holds the AI bubble up to the light. He points to the deep grey zone occupied by AI startups like Cluely that are founded on dubious ideas (Cluely’s tagline was “cheat on ever…

973: AI Systems Performance Engineering, with Chris Fregly

10.03.2026

Duration: 1:12:10

No one should be manually writing code in 2026, thinks Chris Fregly, Jon Krohn’s guest on this week’s episode. In this interview about Chris’ latest book, AI Systems Performance Engineering, he explains why it’s so impor…

972: In Case You Missed It in February 2026

06.03.2026

Duration: 26:44

Jon Krohn recaps the month of February in this episode of In Case You Missed It. Across four interviews with Will Falcon (Episode 965), Tom Griffiths (Episode 969), Antje Barth (Episode 963), and Praveen Murugesan (Episo…

971: 90% of The World’s Data is Private; Lin Qiao’s Fireworks AI is Unlocking It

03.03.2026

Duration: 59:47

Lin Qiao, CEO of Fireworks AI, talks to Jon Krohn about how she builds effective models quickly, why coding agents can perform at the level of a junior engineer, and to what she attributes the success of Fireworks AI: Tr…

970: The “100x Engineer”: How to Be One, But Should You?

27.02.2026

Duration: 14:37

Working with code-gen models and Claude Code: In this Five-Minute Friday, Jon Krohn addresses how AI superstars like Andrej Karpathy are using AI agents in their coding work, the outlook for code-gen in 2026, and how you…

969: The Laws of Thought: The Math of Minds and Machines, with Prof. Tom Griffiths

24.02.2026

Duration: 1:11:15

Princeton Professor Tom Griffiths talks to Jon Krohn about his new book, The Laws of Thought, which grapples with the mathematical models behind biological and artificial intelligence, and what makes the human brain so f…

968: Is AI Automating Away All Coding Jobs?

20.02.2026

Duration: 14:56

Now that AI agents can develop new apps from product development to delivery, do AI developers have reason to worry about their careers? Podcast host Jon Krohn addresses the stark predictions that AI could “eliminate hal…

967: AI for the Physical World, with Samsara's Praveen Murugesan

17.02.2026

Duration: 55:10

VP of Engineering at Samsara Praveen Murugesan talks to Jon Krohn about processing 20 trillion data points covering 90 billion miles across private and public sectors, how the company helps truckers who operate long hour…