903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Author: Jon Krohn

July 8, 2025

Duration: 1:28:20

Super Data Science: ML & AI Podcast with Jon Krohn

Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models.

Additional materials: www.superdatascience.com/903

This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:

(16:48) Sinan’s new podcast, Practically Intelligent
(21:54) What to know about the limits of AI benchmarking
(53:22) Alternatives to AI benchmarks
(1:01:23) The difficulties in getting a model to recognize its mistakes

Super Data Science: ML & AI Podcast with Jon Krohn

Hosted by Dr. Jon Krohn, Super Data Science: ML & AI Podcast with Jon Krohn is a deep and accessible exploration of how artificial intelligence and machine learning are reshaping our world. Each episode features conversations with leading researchers, engineers, and entrepreneurs from both academia and industry, breaking down complex ideas into something tangible and relevant. You'll hear firsthand about emerging techniques, practical applications, and the evolving landscape of data-driven careers. The sheer volume of data in our world is growing at a staggering rate, and this podcast serves as a guide to understanding that expansion and finding your place within it. Rather than offering abstract theory, these discussions focus on real-world impact, from cutting-edge algorithms to the human stories behind major breakthroughs. Tune in for a thoughtful, nuanced look at the tools and trends that are defining the future, all through the lens of experts who are building that future every day. Whether you're actively working in the field or simply curious about the forces driving technological change, this podcast provides a consistent source of insight and inspiration, demystifying the science that is quietly transforming every aspect of our lives.

Author: Jon Krohn Language: English Episodes: 100

Podcast Episodes

966: The Moltbook Phenomenon: OpenClaw Unleashed

13.02.2026

Duration: 10:13

Jon Krohn gives Five-Minute Friday listeners all the details about the new social network causing a stir, Moltbook. What makes Moltbook so unique is that this is the first network designed just for AI agents. It’s an exc…

965: From PhD Side Project to $500M ARR: Will Falcon’s PyTorch Lightning Story

10.02.2026

Duration: 1:17:11

CEO of Lightning AI Will Falcon speaks to podcast host and Lightning AI fellow Jon Krohn about the company’s merger with Voltage Park, and why Will has named it the “full-stack AI neo-cloud for enterprises and frontier l…

964: In Case You Missed It in January 2026

06.02.2026

Duration: 25:20

In this first of the year ICYMI episode, Jon Krohn selects his favorite moments from January’s SuperDataScience interviews. Listen to why incentivizing workers is the best way to get them to disclose their use of AI tool…

963: Reinforcement Learning for Agents, with Amazon AGI Labs’ Antje Barth

03.02.2026

Duration: 51:09

Bestselling author and Gen AI instructor Antje Barth talks to Jon Krohn about her work at Amazon’s AGI Labs and their newest product Nova Act, as well as where we will see the most success with AI agents and how AI devel…

962: Wharton Prof Ethan Mollick on Why Your AI Strategy Is Already Obsolete

30.01.2026

Duration: 12:23

Bestselling author of Co-Intelligence: Living and Working with AI Ethan Mollick speaks to Jon Krohn about just how much US firms have to gain from a willingness to adopt and experiment with AI, as well as the reality beh…

961: Distributed Artificial Superintelligence, with Dr. Vijoy Pandey

27.01.2026

Duration: 1:09:24

Dr. Vijoy Pandey returns to the show to talk to Jon Krohn about Cisco’s work to advance medicine and mitigate the impact of climate change with distributed artificial super-intelligence. Dr. Vijoy Pandey believes in a fu…

960: In Case You Missed It in December 2025

23.01.2026

Duration: 40:45

For 2026’s first episode of In Case You Missed It (ICYMI), Jon Krohn selects 6 clips from December for a wide-ranging look at the current state of AI in business and beyond. Hear from Joel Beasley (Episode 945), Jeff Li…

959: Building Agents 101: Design Patterns, Evals and Optimization (with Sinan Ozdemir)

20.01.2026

Duration: 1:04:49

AI entrepreneur and bestselling author Sinan Ozdemir speaks to Jon Krohn about the practical differences between agentic AI and AI workflows, why evaluating accuracy on its own won’t tell you enough about AI models, and…

958: Without Trusted Context, Agents are Stupid (featuring Salesforce’s Rahul Auradkar)

16.01.2026

Duration: 23:59

In this #sponsored Feature Friday episode, Salesforce’s Rahul Auradkar speaks to Jon Krohn about the company’s unified data engine and how its acquisition of Informatica provides the missing context layer for AI models a…

957: How AI Agents Are Automating Enterprise Data Operations, with Ashwin Rajeeva

13.01.2026

Duration: 59:36

AI agents, data lakes, and managing data sprawl: Ashwin Rajeeva, cofounder and CTO of Acceldata, speaks to Jon Krohn about how the agentic data management startup raised over $100 million in venture capital to expand its…