GPU Uptime with VAST Data CTO

Author: Demetrios

November 11, 2025

Duration: 1:33:45

Technology

Andy Pernsteiner is the Field CTO at VAST Data, working on large-scale AI infrastructure, serverless compute near data, and the rollout of VAST’s AI Operating System.

The GPU Uptime Battle // MLOps Podcast #346 with Andy Pernsteiner, Field CTO of VAST Data. Huge thanks to VAST Data for supporting this episode!

Join the Community:

https://go.mlops.community/YTJoinIn

Get the newsletter:

https://go.mlops.community/YTNewsletter

// Abstract

Most AI projects don’t fail because of bad models; they fail because of bad data plumbing. Andy Pernsteiner joins the podcast to talk about what it actually takes to build production-grade AI systems that aren’t held together by brittle ETL scripts and data copies. He unpacks why unifying data - rather than moving it - is key to real-time, secure inference, and how event-driven, Kubernetes-native pipelines are reshaping the way developers build AI applications. It’s a conversation about cutting out the complexity, keeping data live, and building systems smart enough to keep up with your models.

// Bio

Andy is the Field Chief Technology Officer at VAST, helping customers build, deploy, and scale some of the world’s largest and most demanding computing environments.

Andy has spent the past 15 years supporting and building large-scale, high-performance data platform solutions. From humble beginnings as an escalations engineer at pre-IPO Isilon to leading a team of technical ninjas at MapR, he has consistently been on the front lines, solving some of the toughest challenges customers face when implementing big data analytics and next-generation AI solutions.

// Related Links

Website: www.vastdata.com

https://www.youtube.com/watch?v=HYIEgFyHaxk

https://www.youtube.com/watch?v=RyDHIMniLro

The Mom Test by Rob Fitzpatrick: https://www.momtestbook.com/

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community

https://go.mlops.community/slack

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

MLOps Swag/Merch: https://shop.mlops.community/

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Andy on LinkedIn: /andypernsteiner

Timestamps:

[00:00] Prototype to production gap

[00:21] AI expectations vs reality

[03:00] Prototype vs production costs

[07:47] Technical debt awareness

[10:13] The Mom Test

[15:40] Chaos engineering

[22:25] Data messiness reflection

[26:50] Small data value

[30:53] Platform engineer mindset shift

[34:26] Gradient description comparison

[38:12] Empathy in MLOps

[45:48] Empathy in Engineering

[51:04] GPU clusters rolling updates

[1:03:14] Checkpointing strategy comparison

[1:09:44] Predictive vs Generative AI

[1:17:51] On Growth, Community, and New Directions

[1:24:21] UX of agents

[1:32:05] Wrap up

MLOps.community

Hosted by Demetrios, MLOps.community is a space for honest, meandering talks about the real work of making artificial intelligence systems actually work. This isn't about hype or theoretical papers; it's about the messy, practical, and often surprising journey of taking models from a notebook into a live environment. You'll hear from engineers and practitioners who are in the trenches, discussing the tools, the frustrations, and the occasional breakthroughs that define the day-to-day. The conversations are deliberately relaxed, covering everything from traditional machine learning pipelines to the new world of large language models and even the intangible "vibes" of team culture and process. Each episode peels back a layer on what "production" really means, whether that involves deploying a predictive service, managing an agentic system, or maintaining reliability as everything scales. Tuning into this podcast feels like grabbing a coffee with colleagues who aren't afraid to dig into the technical nitty-gritty while keeping the tone conversational and accessible. It's for anyone who builds, manages, or is just curious about the operational backbone that allows AI to deliver value, offering a grounded perspective often missing from the broader conversation.

Author: Demetrios

Language: en-us

Episodes: 100

Podcast Episodes

Building Out GPU Clouds // Mohan Atreya // #317

24.05.2025

Duration: 47:57

Demetrios and Mohan Atreya break down the GPU madness behind AI — from supply headaches and sky-high prices to the rise of nimble GPU clouds trying to outsmart the giants. They cover power-hungry hardware, failed experim…

A Candid Conversation Around MCP and A2A // Rahul Parundekar and Sam Partee // #316 SF Live

21.05.2025

Duration: 1:04:42

Demetrios, Sam Partee, and Rahul Parundekar unpack the chaos of AI agent tools and the evolving world of MCP (Model Context Protocol). With sharp insights and plenty of laughs, they dig into tool permissions, security qu…

AI in M&A: Building, Buying, and the Future of Dealmaking // Kison Patel // #315

16.05.2025

Duration: 55:32

AI in M&A: Building, Buying, and the Future of Dealmaking // MLOps Podcast #315 with Kison Patel, CEO and M&A Science at DealRoom. Join the Community: https://go.mlops.community/YTJoinIn Get the newsletter: https://go.mlop…

AI, Marketing, and Human Decision Making // Fausto Albers // #313

14.05.2025

Duration: 49:40

AI, Marketing, and Human Decision Making // MLOps Podcast #313 with Fausto Albers, AI Engineer & Community Lead at AI Builders Club. Join the Community: https://go.mlops.community/YTJoinIn Get the newsletter: https://go.m…

MLOps with Databricks // Maria Vechtomova // #314

13.05.2025

Duration: 52:43

MLOps with Databricks // MLOps Podcast #314 with Maria Vechtomova, MLOps Tech Lead | Founder at Ahold Delhaize | Marvelous MLOps. Join the Community: https://go.mlops.community/YTJoinIn Get the newsletter: https://go.mlop…

Making AI Reliable is the Greatest Challenge of the 2020s // Alon Bochman // #312

06.05.2025

Duration: 1:01:37

Making AI Reliable is the Greatest Challenge of the 2020s // MLOps Podcast #312 with Alon Bochman, CEO of RagMetrics. Join the Community: https://go.mlops.community/YTJoinIn Get the newsletter: https://go.mlops.community/…

Behavior Modeling, Secondary AI Effects, Bias Reduction & Synthetic Data // Devansh Devansh // #311

02.05.2025

Duration: 1:01:35

Behavior Modeling, Secondary AI Effects, Bias Reduction & Synthetic Data // MLOps Podcast #311 with Devansh Devansh, Head of AI at Stealth AI Startup. Join the Community: https://go.mlops.community/YTJoinIn Get the newsle…

GraphBI: Expanding Analytics to All Data Through the Combination of GenAI, Graph, & Visual Analytics // Paco Nathan & Weidong Yang // #310

29.04.2025

Duration: 1:14:01

GraphBI: Expanding Analytics to All Data Through the Combination of GenAI, Graph, & Visual Analytics // MLOps Podcast #310 with Paco Nathan, Principal DevRel Engineer at Senzing & Weidong Yang, CEO of Kineviz. Join the Co…

AI Data Engineers - Data Engineering After AI // Vikram Chennai // #309

25.04.2025

Duration: 49:40

AI Data Engineers - Data Engineering after AI // MLOps Podcast #309 with Vikram Chennai, Founder/CEO of Ardent AI. Join the Community: https://go.mlops.community/YTJoinIn Get the newsletter: https://go.mlops.community/YTN…