On-Device AI Agents in Production: Privacy, Performance, and Scale // Varun Khare & Neeraj Poddar // #340

Author: Demetrios | September 30, 2025 | Duration: 46:10

On-Device AI Agents in Production: Privacy, Performance, and Scale // MLOps Podcast #340 with NimbleEdge's Varun Khare, Founder & CEO, and Neeraj Poddar, Co-founder & CTO.


Join the Community:

https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter


// Abstract

AI agents are moving from experimentation to doing real work in production, but so far they have largely been limited to backend task automation. A critical frontier in this evolution is the on-device AI agent, which enables sophisticated, AI-native experiences directly on mobile and embedded devices. Cloud-based AI demands constant connectivity and brings added latency, privacy risks, and high operational costs; running agents on-device avoids these trade-offs.


We'll delve into the practical side of building and deploying AI agents with DeliteAI, an open-source framework for on-device AI agents. We'll explore how lightweight Python runtimes orchestrate end-to-end workflows directly on devices, allowing AI/ML teams to define data preprocessing, feature computation, model execution, and post-processing logic independently of frontend code. This architecture lets agents adapt to different tasks and user contexts through an ecosystem of tools natively supported on Android and iOS, which handle permissions, model lifecycles, and more.


// Bio

Varun Khare

Varun is the Founder and CEO of NimbleEdge, an AI startup pioneering privacy-first, on-device intelligence. With an academic foundation in AI and neuroscience from UC Berkeley, MPI Frankfurt, and IIT Kanpur, Varun brings deep expertise at the intersection of technology and science. Before founding NimbleEdge, Varun led open-source projects at OpenMined, focusing on privacy-aware AI, and published research in computer vision.


Neeraj Poddar

Neeraj Poddar is the Co-founder and CTO of NimbleEdge. Before NimbleEdge, he co-founded Aspen Mesh, served as VP of Engineering at Solo.io, and led the Istio open-source community. Over his career he has worked on AI, networking, security, and distributed systems. Neeraj focuses on applying open-source technologies across industries, with an emphasis on scalability and security. When he's not working on AI, you can find him playing racquetball and then regaining the calories by trying out new restaurants.


// Related Links

Website: https://www.nimbleedge.com/

https://www.nimbleedge.com/blog/why-ai-is-not-working-for-you

https://www.nimbleedge.com/blog/state-of-on-device-ai

https://www.youtube.com/watch?v=Qqj_Nl2MihE

https://www.linkedin.com/events/7343237917982527488/comments/


~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

Sign up for the next meetup: [https://go.mlops.community/register]

MLOps Swag/Merch: [https://shop.mlops.community/]


Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Varun on LinkedIn: /vkkhare/

Connect with Neeraj on LinkedIn: /nrjpoddar/


Timestamps:

[00:00] On-device AI skepticism

[02:47] Word suggestion for AI

[06:40] Optimizing unique challenges

[13:39] LLM on-device challenges

[20:34] Agent overlord tension

[23:56] AI app constraints

[29:23] Siri limitations and trust gap

[32:01] Voice-driven app privacy

[35:49] Platform lock-in vs aggregation

[42:26] On-device AI optimizations

[45:38] Wrap up


Hosted by Demetrios, MLOps.community is a space for honest, meandering talks about the real work of making artificial intelligence systems actually work. This isn't about hype or theoretical papers; it's about the messy, practical, and often surprising journey of taking models from a notebook into a live environment. You'll hear from engineers and practitioners who are in the trenches, discussing the tools, the frustrations, and the occasional breakthroughs that define the day-to-day. The conversations are deliberately relaxed, covering everything from traditional machine learning pipelines to the new world of large language models and even the intangible "vibes" of team culture and process. Each episode peels back a layer on what "production" really means, whether that involves deploying a predictive service, managing an agentic system, or maintaining reliability as everything scales. Tuning into this podcast feels like grabbing a coffee with colleagues who aren't afraid to dig into the technical nitty-gritty while keeping the tone conversational and accessible. It's for anyone who builds, manages, or is just curious about the operational backbone that allows AI to deliver value, offering a grounded perspective often missing from the broader conversation.