Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs

Author: Demetrios

February 24, 2026

Duration: 1:25:49

Technology

Join us at the Coding Agents Conference on March 3rd at the Computer History Museum while tickets are still available.

Chris Fregly is currently focused on building and scaling high-performance AI systems, writing and teaching about AI infrastructure, helping organizations adopt generative AI and performance engineering principles on AWS, and fostering large developer communities around these topics.

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs // MLOps Podcast #363 with Chris Fregly, Founder, AI Performance Engineer, and Investor

Join the Community: https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

MLOps GPU Guide: https://go.mlops.community/gpuguide

// Abstract

In today’s era of massive generative models, it's important to understand the full scope of AI systems' performance engineering. This talk discusses the new O'Reilly book, AI Systems Performance Engineering, and the accompanying GitHub repo (https://github.com/cfregly/ai-performance-engineering).

This talk provides engineers, researchers, and developers with a set of actionable optimization strategies. You'll learn techniques to co-design and co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems for both training and inference.

// Bio

Chris Fregly is an AI performance engineer and startup founder with experience at AWS, Databricks, and Netflix. He is the author of three O'Reilly books: Data Science on AWS (2021), Generative AI on AWS (2023), and AI Systems Performance Engineering (2025). He also runs the global AI Performance Engineering meetup and speaks at many AI-related conferences, including NVIDIA GTC, ODSC, and Big Data London.

// Related Links

AI Systems Performance Engineering: Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch 1st Edition by Chris Fregly: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

Coding Agents Conference: https://luma.com/codingagents

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community: https://go.mlops.community/slack

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

MLOps Swag/Merch: https://shop.mlops.community/

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Chris on LinkedIn: /cfregly

Timestamps:

[00:00] SageMaker HyperPod Resilience

[00:27] Book Creation and Software Engineering

[04:57] Software Engineers and Maintenance

[11:49] AI Systems Performance Engineering

[22:03] Cognitive Biases and Optimization / "Mechanical Sympathy"

[29:36] GPU Rack-Scale Architecture

[33:58] Data Center Reliability Issues

[43:52] AI Compute Platforms

[49:05] Hardware vs Ecosystem Choice

[1:00:05] Claude vs Codex vs Gemini

[1:14:53] Kernel Budget Allocation

[1:18:49] Steerable Reasoning Challenges

[1:24:18] Data Chain Value Awareness

MLOps.community

Hosted by Demetrios, MLOps.community is a space for honest, meandering talks about the real work of making artificial intelligence systems actually work. This isn't about hype or theoretical papers; it's about the messy, practical, and often surprising journey of taking models from a notebook into a live environment. You'll hear from engineers and practitioners who are in the trenches, discussing the tools, the frustrations, and the occasional breakthroughs that define the day-to-day. The conversations are deliberately relaxed, covering everything from traditional machine learning pipelines to the new world of large language models and even the intangible "vibes" of team culture and process. Each episode peels back a layer on what "production" really means, whether that involves deploying a predictive service, managing an agentic system, or maintaining reliability as everything scales. Tuning into this podcast feels like grabbing a coffee with colleagues who aren't afraid to dig into the technical nitty-gritty while keeping the tone conversational and accessible. It's for anyone who builds, manages, or is just curious about the operational backbone that allows AI to deliver value, offering a grounded perspective often missing from the broader conversation.

Author: Demetrios

Language: en-us

Episodes: 100

Podcast Episodes

How Sierra AI Does Context Engineering

10.12.2025

Duration: 1:04:03

Zack Reneau-Wedeen is the Head of Product at Sierra, leading the development of enterprise-ready AI agents — from Agent Studio 2.0 to the Agent Data Platform — with a focus on richer workflows, persistent memory, and hig…

Overcoming Challenges in AI Agent Deployment: The Sweet Spot for Governance and Security // Spencer Reagan // #349

05.12.2025

Duration: 54:17

Spencer Reagan leads R&D at Airia, working on secure AI-agent orchestration, data governance systems, and real-time signal fusion technologies for regulated and defense environments. Overcoming Challenges in AI Agent Depl…

Hardening Agents for E-commerce Scale: From RL Alignment to Reliability // Panel 2

02.12.2025

Duration: 29:16

Thanks to Prosus Group for collaborating on the Agents in Production Virtual Conference 2025. Abstract // The discussion centers on highly technical yet practical themes, such as the use of advanced post-training technique…

Building Cursor: A Fireside Chat with VP Solutions Ricky Doar

27.11.2025

Duration: 26:44

Ricky Doar is the VP of Solutions at Cursor, where he leads forward-deployed engineers. A seasoned product and technical leader with over a decade of experience in developer tools and data platforms, Ricky previously ser…

Relational Foundation Models: Unlocking the Next Frontier of Enterprise AI // Jure Leskovec // #348

25.11.2025

Duration: 49:00

Dr. Jure Leskovec is the Chief Scientist at Kumo.AI and a Stanford professor, working on relational foundation models and graph-transformer systems that bring enterprise databases into the foundation-model era. Relational…

Context Engineering, Context Rot, & Agentic Search with the CEO of Chroma, Jeff Huber

21.11.2025

Duration: 44:55

Jeff Huber is the CEO of Chroma, working on context engineering and building reliable retrieval infrastructure for AI systems. Context Engineering, Context Rot, & Agentic Search with the CEO of Chroma, Jeff Huber // MLO…

Reliable Voice Agents

18.11.2025

Duration: 38:21

Brooke Hopkins is the CEO of Coval, a company making voice agents more reliable. Reliable Voice Agents // MLOps Podcast #347 with Brooke Hopkins, Founder of Coval. Join the Community: https://go.mlops.community/YTJoinIn Ge…

The Future of AI Operations: Insights from PwC AI Managed Services

14.11.2025

Duration: 41:27

Rani Radhakrishnan is a Principal at PwC US, leading work on AI-managed services, autonomous agents, and data-driven transformation for enterprises. The Future of AI Operations: Insights from PwC AI Managed Services // ML…

GPU Uptime with VAST Data CTO

11.11.2025

Duration: 1:33:45

Andy Pernsteiner is the Field CTO at VAST Data, working on large-scale AI infrastructure, serverless compute near data, and the rollout of VAST’s AI Operating System. The GPU Uptime Battle // MLOps Podcast #346 with Andy…

The Evolution of AI in Cyber Security // Jeff Schwartzentruber // #344

04.11.2025

Duration: 35:14

Dr. Jeff Schwartzentruber is a Senior Machine Learning Scientist at eSentire, working on anomaly detection pipelines and the use of large language models to enhance cybersecurity operations. The Evolution of AI in Cyber S…