903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Author: Jon Krohn
July 8, 2025
Duration: 1:28:20
Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models.

Additional materials: www.superdatascience.com/903

This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
(16:48) Sinan’s new podcast, Practically Intelligent
(21:54) What to know about the limits of AI benchmarking
(53:22) Alternatives to AI benchmarks
(1:01:23) The difficulties in getting a model to recognize its mistakes

Hosted by Dr. Jon Krohn, Super Data Science: ML & AI Podcast with Jon Krohn is a deep and accessible exploration of how artificial intelligence and machine learning are reshaping our world. Each episode features conversations with leading researchers, engineers, and entrepreneurs from both academia and industry, breaking down complex ideas into something tangible and relevant. You'll hear firsthand about emerging techniques, practical applications, and the evolving landscape of data-driven careers. The sheer volume of data in our world is growing at a staggering rate, and this podcast serves as a guide to understanding that expansion and finding your place within it. Rather than offering abstract theory, these discussions focus on real-world impact, from cutting-edge algorithms to the human stories behind major breakthroughs. Tune in for a thoughtful, nuanced look at the tools and trends that are defining the future, all through the lens of experts who are building that future every day. Whether you're actively working in the field or simply curious about the forces driving technological change, this podcast provides a consistent source of insight and inspiration, demystifying the science that is quietly transforming every aspect of our lives.
Author: Language: English Episodes: 100

Super Data Science: ML & AI Podcast with Jon Krohn
Podcast Episodes
966: The Moltbook Phenomenon: OpenClaw Unleashed

Duration: 10:13
Jon Krohn gives Five-Minute Friday listeners all the details about the new social network causing a stir, Moltbook. What makes Moltbook so unique is that this is the first network designed just for AI agents. It’s an exc…
964: In Case You Missed It in January 2026

Duration: 25:20
In this first of the year ICYMI episode, Jon Krohn selects his favorite moments from January’s SuperDataScience interviews. Listen to why incentivizing workers is the best way to get them to disclose their use of AI tool…
961: Distributed Artificial Superintelligence, with Dr. Vijoy Pandey

Duration: 1:09:24
Dr. Vijoy Pandey returns to the show to talk to Jon Krohn about Cisco’s work to advance medicine and mitigate the impact of climate change with distributed artificial super-intelligence. Dr. Vijoy Pandey believes in a fu…
960: In Case You Missed It in December 2025

Duration: 40:45
For 2026’s first episode of In Case You Missed It (ICYMI), Jon Krohn selects 6 clips from December for a wide-ranging look at the current state of AI in business and beyond. Hear from Joel Beasley (Episode 945), Jeff Li…