Are Evals Dead?

Author: Demetrios
Date: September 26, 2025
Duration: 25:24

AI Conversations Powered by Prosus Group 


Your AI agent isn’t failing because it’s dumb—it’s failing because you refuse to test it. Chiara Caratelli cuts through the hype to show why evaluations—not bigger models or fancier prompts—decide whether agents succeed in the real world. If you’re not stress-testing, simulating, and iterating on failures, you’re not building AI—you’re shipping experiments disguised as products.


Guest speaker: Chiara Caratelli - Data Scientist @ Prosus Group

Host: Demetrios Brinkmann - Founder of MLOps Community


~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

Sign up for the next meetup: [https://go.mlops.community/register]

MLOps Swag/Merch: [https://shop.mlops.community/]


Hosted by Demetrios, MLOps.community is a space for honest, meandering talks about the real work of making artificial intelligence systems actually work. This isn't about hype or theoretical papers; it's about the messy, practical, and often surprising journey of taking models from a notebook into a live environment. You'll hear from engineers and practitioners who are in the trenches, discussing the tools, the frustrations, and the occasional breakthroughs that define the day-to-day. The conversations are deliberately relaxed, covering everything from traditional machine learning pipelines to the new world of large language models and even the intangible "vibes" of team culture and process. Each episode peels back a layer on what "production" really means, whether that involves deploying a predictive service, managing an agentic system, or maintaining reliability as everything scales. Tuning into this podcast feels like grabbing a coffee with colleagues who aren't afraid to dig into the technical nitty-gritty while keeping the tone conversational and accessible. It's for anyone who builds, manages, or is just curious about the operational backbone that allows AI to deliver value, offering a grounded perspective often missing from the broader conversation.
Language: en-us
Episodes: 100

MLOps.community
Podcast Episodes
Real time features, AI search, Agentic similarities

Duration: 29:27
Varant Zanoyan is the Co-founder & CEO at Zipline AI, working on building a next-generation AI/ML infrastructure platform that streamlines data pipelines, model deployment, observability, and governance to accelerate ent…
Tool definitions are the new Prompt Engineering

Duration: 58:08
Alex Salazar is the CEO and Co-Founder of Arcade.dev, working on secure AI agents and real-world automation integrations. Chiara Caratelli is a Data Scientist at Prosus Group, working on AI agents, web automation, and eva…
The Future of AI Agents is Sandboxed

Duration: 58:03
Jonathan Wall is the CEO at Runloop.ai, working on enterprise-grade infrastructure and execution environments for AI coding agents. The Future of AI Agents is Sandboxed // MLOps Podcast #353 with Jonathan Wall, CEO at Run…
Does AgenticRAG Really Work?

Duration: 1:01:39
Satish Bhambri is a Sr Data Scientist at Walmart Labs, working on large-scale recommendation systems and conversational AI, including RAG-powered GroceryBot agents, vector-search personalization, and transformer-based ad…
How Sierra AI Does Context Engineering

Duration: 1:04:03
Zack Reneau-Wedeen is the Head of Product at Sierra, leading the development of enterprise-ready AI agents — from Agent Studio 2.0 to the Agent Data Platform — with a focus on richer workflows, persistent memory, and hig…
Building Cursor: A Fireside Chat with VP Solutions Ricky Doar

Duration: 26:44
Ricky Doar is the VP of Solutions at Cursor, where he leads forward-deployed engineers. A seasoned product and technical leader with over a decade of experience in developer tools and data platforms, Ricky previously ser…