Optimizing Agent Behavior in Production with Gideon Mendels

Author: softwareengineeringdaily.com February 17, 2026 Duration: 52:25
LLM-powered systems continue to move steadily into production, but this process is presenting teams with challenges that traditional software practices don't commonly encounter. Models and agents are non-deterministic systems, which makes it difficult to test changes, reason about failures, and confidently ship updates. This has created the need for new evaluation tooling designed specifically around the properties of LLMs.

Comet is a platform with roots in MLOps that extends to the rapidly evolving world of agent-based systems by treating prompts, tools, and workflows as optimizable components that can be evaluated and improved over time.

Gideon Mendels is the co-founder and CEO of Comet. He previously worked at Google on hate speech and deception detection, and he founded GroupWise, which trained and deployed NLP models processing billions of chats.

In this episode, Gideon joins Kevin Ball to discuss how agent development sits between software engineering and ML, why evals are the missing foundation for most AI teams, prompt optimization as a search problem, and the future for continuously improving agents in production. Full Disclosure: This episode is sponsored by Comet.

Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

For anyone curious about how the code running our world actually gets built, Software Engineering Daily offers a clear and consistent look behind the curtain. This isn't about hype cycles or surface-level news; it's a deep, technical conversation with the engineers, architects, and thinkers who are shaping our digital infrastructure. Each episode focuses on a specific technology, practice, or problem, breaking down complex systems into understandable parts. You'll hear detailed discussions on everything from database architectures and programming language design to the organizational challenges of scaling teams and the real-world trade-offs made in production systems. Hosted by softwareengineeringdaily.com, the podcast serves as a reliable source for developers who want to stay informed and inspired, translating the rapid pace of technological change into substantive, lasting knowledge. It’s for professionals who believe that understanding the "how" and "why" is just as important as knowing the "what." By dedicating time to thorough exploration, this podcast provides context that shorter formats simply cannot, making it an essential resource for anyone building the future, one line of code at a time. Tune in to hear unfiltered insights from the people on the front lines, discussing the tools and decisions that define modern software engineering.
Language: en-us Episodes: 100

Software Engineering Daily
Podcast Episodes
Running Doom in TypeScript with Dimitri Mitropoulos

Duration: 1:01:25
Doom has seemingly been ported to every electronic device imaginable, including picture frames, lamps, and coffee machines. The meme of “it runs Doom” has become so widespread that it spawned the r/itrunsdoom subreddit.…
Drone Warfare in Ukraine with Simon Shuster

Duration: 55:13
Simon Shuster is a journalist who has reported on Russia and Ukraine for over 15 years, most of that time as a staff correspondent for TIME Magazine. He was born in Moscow, and he and his family came to the United States…
Radix UI with Chance Strickland

Duration: 57:56
Radix UI is an open-source library of React components. Its “headless” primitives handle the complex logic and accessibility concerns—like dialogs, dropdowns, and tabs—while leaving styling completely up to the developer…
Building an Open-Source Laptop with Byran Huang

Duration: 54:58
Byran Huang is a full stack developer who recently made headlines in the hacker space when he created the anyon_e, which is a highly integrated, open source laptop. The effort was a massive undertaking and showcased grea…
The Architecture of the Internet with Erik Seidel

Duration: 51:46
The modern internet is a vast web of independent networks bound together by billions of routing decisions made every second. It’s an architecture so reliable we mostly take it for granted, but behind the scenes it repres…
Building AI Agents on the Frontend with Sam Bhagwat and Abhi Aiyer

Duration: 58:08
Most AI agent frameworks are backend-focused and written in Python, which introduces complexity when building full-stack AI applications with JavaScript or TypeScript frontends. This gap makes it harder for frontend deve…
The X-Plane Flight Simulator with Ben Supnik

Duration: 57:39
X-Plane is a popular flight simulator developed by Laminar Research. It features a first-principles physics engine, realistic aircraft systems, and a wide variety of aircraft. We wanted to understand the engineering that…
Turning Agent Autonomy into Productivity with Chris Weichel

Duration: 1:01:21
A common challenge in software development is creating and maintaining robust development environments. The rise of AI agents has amplified this complexity by adding new demands around permission controls, environment is…