SE Radio 703: Sahaj Garg on Low Latency AI

Author: team@se-radio.net (SE-Radio Team)
January 14, 2026
Duration: 54:47

In this episode, Sahaj Garg, CTO of wispr.ai, joins SE Radio host Robert Blumen to talk about the challenges of building low-latency AI applications. They discuss latency's effect on consumer behavior as well as interactive applications. The conversation explores how to measure latency and how scale impacts it. Then Sahaj and Robert shift to themes around AI, including whether "AI" means LLMs or something broader, as they look at latency requirements and challenges around subtypes of AI applications. The final part of the episode explores techniques for managing latency in AI: speed vs accuracy trade-offs; speed vs cost; latency vs cost; choosing the right model; reducing quantization; distillation; and guessing + validating.

Brought to you by IEEE Computer Society and IEEE Software magazine.


For developers who build the world's most critical systems, Software Engineering Radio offers deep, substantive conversations that move beyond the hype cycle. This isn't about quick tips or news flashes; it's a dedicated audio library for career engineers seeking to solidify their foundational knowledge and explore advanced concepts. Each episode is crafted as an enduring resource, featuring either a comprehensive tutorial breaking down a specific technology or methodology, or a detailed interview with a leading practitioner shaping the field. You'll hear focused discussions on everything from low-level systems architecture and programming language design to team dynamics and project management, all through the lens of professional software creation. The content is exclusively produced for this podcast, ensuring thoughtful, in-depth analysis you won't find simply repackaged from conference talks. If your work demands a rigorous understanding of the craft, this is the podcast for you.

Software Engineering Radio - the podcast for professional software developers
Podcast Episodes
SE Radio 686: François Daoust on W3C

Duration: 1:02:36
François Daoust, W3C staff member and co-chair of the Web Developer Experience Community Group, discusses the origins of the W3C, the browser standardization process, and how it relates to other organizations like TC39,…
SE Radio 685: Will Wilson on Deterministic Simulation Testing

Duration: 1:01:14
In this episode, Will Wilson, CEO and co-founder of Antithesis, explores Deterministic Simulation Testing (DST) with host Sriram Panyam. Wilson was part of the pioneering team at FoundationDB that developed this revoluti…
SE Radio 679: Wesley Beary on API Design

Duration: 47:51
Wesley Beary of Anchor speaks with host Sam Taggart about designing APIs with a particular emphasis on user experience. Wesley discusses what it means to be an "API connoisseur"— paying attention to what makes the APIs w…
SE Radio 678: Chris Love on Kubernetes Security

Duration: 54:36
Chris Love, co-author of the book Core Kubernetes, joins host Robert Blumen for a conversation about Kubernetes security. Chris identifies the node layer, secrets management, the network layer, containers, and pods as the…