On-Device AI Agents in Production: Privacy, Performance, and Scale // Varun Khare & Neeraj Poddar // #340

Author: Demetrios | Date: September 30, 2025 | Duration: 46:10

Technology

On-Device AI Agents in Production: Privacy, Performance, and Scale // MLOps Podcast #340 with NimbleEdge's Varun Khare, Founder/CEO, and Neeraj Poddar, Co-founder & CTO.

Join the Community:

https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract

AI agents are moving from experiments to real work in production, but so far they have largely been limited to backend task automation. A critical frontier in this evolution is the on-device AI agent, which enables sophisticated, AI-native experiences directly on mobile and embedded devices. Cloud-based AI faces challenges such as constant connectivity requirements, added latency, privacy risks, and high operational costs; on-device AI sidesteps these trade-offs.

We'll dig into the practical side of building and deploying AI agents with DeliteAI, an open-source agentic framework for on-device AI. We'll explore how lightweight Python runtimes orchestrate end-to-end workflows directly on devices, allowing AI/ML teams to define data preprocessing, feature computation, model execution, and post-processing logic independently of frontend code. This architecture lets agents adapt to varying tasks and user contexts through an ecosystem of tools natively supported on Android and iOS, which handle permissions, model lifecycles, and more.

// Bio

Varun Khare

Varun is the Founder and CEO of NimbleEdge, an AI startup pioneering privacy-first, on-device intelligence. With an academic foundation in AI and neuroscience from UC Berkeley, MPI Frankfurt, and IIT Kanpur, Varun brings deep expertise at the intersection of technology and science. Before founding NimbleEdge, Varun led open-source projects at OpenMined, focusing on privacy-aware AI, and published research in computer vision.

Neeraj Poddar

Neeraj Poddar is the Co-founder and CTO at NimbleEdge. Prior to NimbleEdge, he was the Co-founder of Aspen Mesh, VP of Engineering at Solo.io, and led the Istio open source community. Over his career he has worked on AI, networking, security, and distributed systems. Neeraj focuses on applying open source technologies across industries, with an emphasis on scalability and security. When not working on AI, you can find him playing racquetball and regaining the calories he burned by trying out new restaurants.

// Related Links

Website: https://www.nimbleedge.com/

https://www.nimbleedge.com/blog/why-ai-is-not-working-for-you

https://www.nimbleedge.com/blog/state-of-on-device-ai

https://www.youtube.com/watch?v=Qqj_Nl2MihE

https://www.linkedin.com/events/7343237917982527488/comments/

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

MLOps Swag/Merch: [https://shop.mlops.community/]

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Varun on LinkedIn: /vkkhare/

Connect with Neeraj on LinkedIn: /nrjpoddar/

Timestamps:

[00:00] On-device AI skepticism

[02:47] Word suggestion for AI

[06:40] Optimizing unique challenges

[13:39] LLM on-device challenges

[20:34] Agent overlord tension

[23:56] AI app constraints

[29:23] Siri limitations and trust gap

[32:01] Voice-driven app privacy

[35:49] Platform lock-in vs aggregation

[42:26] On-device AI optimizations

[45:38] Wrap up

MLOps.community

Hosted by Demetrios, MLOps.community is a space for honest, meandering talks about the real work of making artificial intelligence systems actually work. This isn't about hype or theoretical papers; it's about the messy, practical, and often surprising journey of taking models from a notebook into a live environment. You'll hear from engineers and practitioners who are in the trenches, discussing the tools, the frustrations, and the occasional breakthroughs that define the day-to-day. The conversations are deliberately relaxed, covering everything from traditional machine learning pipelines to the new world of large language models and even the intangible "vibes" of team culture and process. Each episode peels back a layer on what "production" really means, whether that involves deploying a predictive service, managing an agentic system, or maintaining reliability as everything scales. Tuning into this podcast feels like grabbing a coffee with colleagues who aren't afraid to dig into the technical nitty-gritty while keeping the tone conversational and accessible. It's for anyone who builds, manages, or is just curious about the operational backbone that allows AI to deliver value, offering a grounded perspective often missing from the broader conversation.

Author: Demetrios | Language: en-us | Episodes: 100

Podcast Episodes

Real time features, AI search, Agentic similarities

28.12.2025

Duration: 29:27

Varant Zanoyan is the Co-founder & CEO at Zipline AI, working on building a next-generation AI/ML infrastructure platform that streamlines data pipelines, model deployment, observability, and governance to accelerate ent…

Tool definitions are the new Prompt Engineering

23.12.2025

Duration: 58:08

Alex Salazar is the CEO and Co-Founder of Arcade.dev, working on secure AI agents and real-world automation integrations. Chiara Caratelli is a Data Scientist at Prosus Group, working on AI agents, web automation, and eva…

The Future of AI Agents is Sandboxed

19.12.2025

Duration: 58:03

Jonathan Wall is the CEO at Runloop.ai, working on enterprise-grade infrastructure and execution environments for AI coding agents. The Future of AI Agents is Sandboxed // MLOps Podcast #353 with Jonathan Wall, CEO at Run…

Context engineering 2.0, Agents + Structured Data, and the Redis Context Engine

16.12.2025

Duration: 45:33

Simba Khadder is the founder and CEO of Featureform, now at Redis, working on real-time feature orchestration and building a context engine for AI and agents. Context Engineering 2.0, Simba Khadder // MLOps Podcast #352 Jo…

Does AgenticRAG Really Work?

12.12.2025

Duration: 1:01:39

Satish Bhambri is a Sr Data Scientist at Walmart Labs, working on large-scale recommendation systems and conversational AI, including RAG-powered GroceryBot agents, vector-search personalization, and transformer-based ad…

How Sierra AI Does Context Engineering

10.12.2025

Duration: 1:04:03

Zack Reneau-Wedeen is the Head of Product at Sierra, leading the development of enterprise-ready AI agents — from Agent Studio 2.0 to the Agent Data Platform — with a focus on richer workflows, persistent memory, and hig…

Overcoming Challenges in AI Agent Deployment: The Sweet Spot for Governance and Security // Spencer Reagan // #349

05.12.2025

Duration: 54:17

Spencer Reagan leads R&D at Airia, working on secure AI-agent orchestration, data governance systems, and real-time signal fusion technologies for regulated and defense environments. Overcoming Challenges in AI Agent Depl…

Hardening Agents for E-commerce Scale: From RL Alignment to Reliability // Panel 2

02.12.2025

Duration: 29:16

Thanks to Prosus Group for collaborating on the Agents in Production Virtual Conference 2025. Abstract // The discussion centers on highly technical yet practical themes, such as the use of advanced post-training technique…

Building Cursor: A Fireside Chat with VP Solutions Ricky Doar

27.11.2025

Duration: 26:44

Ricky Doar is the VP of Solutions at Cursor, where he leads forward-deployed engineers. A seasoned product and technical leader with over a decade of experience in developer tools and data platforms, Ricky previously ser…

Relational Foundation Models: Unlocking the Next Frontier of Enterprise AI // Jure Leskovec // #348

25.11.2025

Duration: 49:00

Dr. Jure Leskovec is the Chief Scientist at Kumo.AI and a Stanford professor, working on relational foundation models and graph-transformer systems that bring enterprise databases into the foundation-model era. Relational…