Snap’s Secret to Processing 10 Petabytes a Day: GPU-Accelerated Spark | NVIDIA AI Podcast Ep. 298

Author: NVIDIA May 13, 2026 Duration: 23:35
Snap processes more than 10 petabytes of experimentation data every single morning, and with NVIDIA GPU-accelerated Apache Spark on Google Cloud, Snap cut job costs by 76%, reduced memory usage by 80%, and eliminated 120 terabytes of disk spill from its pipelines. Prudhvi Vatala, head of engineering platforms at Snap, joins the NVIDIA AI Podcast to break down how he and his team completely modernized data infrastructure for a social platform serving nearly a billion monthly active users, using the NVIDIA cuDF plugin (formerly referred to as NVIDIA RAPIDS plugin) for Apache Spark on Google Kubernetes Engine, with zero application code changes.

🔬 Topics covered:
How Snap runs A/B tests at planetary scale using rigorous statistical methods like heterogeneous treatment effect detection and variance reduction
Why Snap reuses idle inference GPUs between 1–5 a.m. for batch data processing, and how it built a Kubernetes-based platform to do it
How NVIDIA cuDF delivered 3x+ speedups on join-heavy Spark jobs with no code rewrites
The full business impact: 76% cost reduction, 62% fewer cores, 80% less memory, 120 TB of spill eliminated
How a three-way partnership between Snap, NVIDIA, and Google Cloud made it possible in just 8–9 months

Chapters:
0:00 Introduction and Snap overview
3:35 What is Snap’s experimentation platform?
4:05 Why experimentation, safety, and privacy are core at Snap
4:52 How A/B testing works at billion-user scale
8:14 Discovering NVIDIA cuDF plugin
9:06 Benchmarking results: join, union, and aggregation jobs
12:00 Reusing idle GPUs overnight via GKE
13:24 Building a bottom-up GPU data platform at Snap
17:48 Results: 76% cost reduction and partnership impact
20:56 Snap’s evolution and what’s next

Learn more:
NVIDIA cuDF: https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf#accel-apache
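
The "zero application code changes" point follows from how the plugin is typically switched on: through Spark configuration at submit time rather than inside the job itself. The sketch below is illustrative only and is not Snap's actual setup. It builds a `spark-submit` argument list using the publicly documented plugin class `com.nvidia.spark.SQLPlugin` and the standard Spark GPU resource settings. The jar path, the job script name, and the helper `build_spark_submit` are hypothetical, and a real Kubernetes deployment would need further settings (container image, GPU discovery) that are omitted here.

```python
import shlex


def build_spark_submit(app_path, plugin_jar, gpus_per_executor=1, tasks_per_gpu=4):
    """Return a spark-submit argument list that enables GPU acceleration
    purely through configuration; the application script is left untouched.

    app_path and plugin_jar are hypothetical placeholders supplied by the caller.
    """
    conf = {
        # Loads the GPU SQL plugin into the driver and executors.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Turns GPU execution of supported SQL/DataFrame operators on.
        "spark.rapids.sql.enabled": "true",
        # Standard Spark resource scheduling: GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU each task claims, so several tasks share one device.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }
    args = ["spark-submit", "--jars", plugin_jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_spark_submit("etl_job.py", "/opt/jars/rapids-4-spark.jar")
    print(shlex.join(cmd))
```

Because every GPU-related setting lives in the `--conf` flags, the same job script can be submitted with or without them, which is what makes a side-by-side CPU and GPU comparison possible without touching the job's code.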

Behind every major shift in how we live and work, there's a story about the technology that made it possible. The NVIDIA AI Podcast, produced by NVIDIA, delves into those narratives, moving beyond headlines to explore the human and technical ingenuity driving progress. Each episode connects with creators, researchers, and pioneers who are applying artificial intelligence and accelerated computing in surprising ways. You'll hear conversations that unpack complex ideas, from how AI is accelerating scientific discovery in medicine and climate science to its role in reimagining creative industries and building more sustainable systems. This isn't about abstract futures; it's a grounded look at the tools and collaborations solving real-world problems today. The discussions are crafted to be accessible, offering clarity on transformative topics without oversimplifying the profound work being done. Tuning into this podcast provides a unique vantage point into the ecosystem of innovation, where the focus is on practical applications and the thinkers turning possibility into reality. It's an ongoing series for anyone curious about the mechanics of change and how computational power is being harnessed to tackle some of our most pressing challenges and unlock new opportunities across every field.
Language: English Episodes: 100

NVIDIA AI Podcast
Podcast Episodes
One Brain, Any Robot: Skild AI's Skild Brain Explained - Ep. 295

Duration: 29:47
What if one AI brain could run every robot on the planet—a humanoid, a warehouse arm, and a dog-like inspection bot—all at once? That's not a thought experiment. That's what Skild AI is building right now. Deepak Pathak…
How AI Will Change Quantum Computing - Ep. 294

Duration: 31:28
What happens when you combine AI with quantum computing? NVIDIA's Nic Harrigan joins the AI Podcast to break down the state of quantum, explain why error correction is the pivotal challenge, and reveal how NVIDIA Ising—t…
Powering the AI Inference Wave with EPRI's Ben Sooter - Ep. 292

Duration: 32:20
AI is reshaping electricity demand. What does increased demand, and the shape of that demand, mean for the electric grid? Ben Sooter, Director of R&D at EPRI joins the podcast to explain why most of an AI model’s lifetim…