Snap’s Secret to Processing 10 Petabytes a Day: GPU-Accelerated Spark | NVIDIA AI Podcast Ep. 298

Author: NVIDIA
Date: May 13, 2026
Duration: 23:35
Snap processes more than 10 petabytes of experimentation data every single morning. With NVIDIA GPU-accelerated Apache Spark on Google Cloud, Snap cut job costs by 76%, reduced memory usage by 80%, and eliminated 120 terabytes of disk spill from its pipelines. Prudhvi Vatala, head of engineering platforms at Snap, joins the NVIDIA AI Podcast to break down how he and his team completely modernized data infrastructure for a social platform serving nearly a billion monthly active users, using the NVIDIA cuDF plugin (formerly referred to as the NVIDIA RAPIDS plugin) for Apache Spark on Google Kubernetes Engine, with zero application code changes.

Topics covered:
How Snap runs A/B tests at planetary scale using rigorous statistical methods like heterogeneous treatment effect detection and variance reduction
Why Snap reuses idle inference GPUs between 1–5 a.m. for batch data processing, and how it built a Kubernetes-based platform to do it
How NVIDIA cuDF delivered 3x+ speedups on join-heavy Spark jobs with no code rewrites
The full business impact: 76% cost reduction, 62% fewer cores, 80% less memory, 120 TB of spill eliminated
How a three-way partnership between Snap, NVIDIA, and Google Cloud made it possible in just 8–9 months

Chapters:
0:00 Introduction and Snap overview
3:35 What is Snap’s experimentation platform?
4:05 Why experimentation, safety, and privacy are core at Snap
4:52 How A/B testing works at billion-user scale
8:14 Discovering NVIDIA cuDF plugin
9:06 Benchmarking results: join, union, and aggregation jobs
12:00 Reusing idle GPUs overnight via GKE
13:24 Building a bottom-up GPU data platform at Snap
17:48 Results: 76% cost reduction and partnership impact
20:56 Snap’s evolution and what’s next

Learn more:
NVIDIA cuDF: https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf#accel-apache
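
For readers wondering what "zero application code changes" can look like in practice: the plugin is typically switched on through Spark configuration rather than by editing the job itself. The sketch below is illustrative only and is not Snap's setup. The keys `spark.plugins` (set to `com.nvidia.spark.SQLPlugin`) and `spark.rapids.sql.enabled` come from NVIDIA's documentation for the Spark accelerator, and the two `resource.gpu.amount` keys are standard Spark settings. The jar path, the job file name, the GPU amounts, and the helper `build_submit_args` are hypothetical placeholders.

```python
from typing import Dict, List

# Configuration that turns on GPU acceleration for an unmodified Spark job.
# The values are illustrative assumptions, not Snap's production settings.
GPU_CONF: Dict[str, str] = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "0.25",
}


def build_submit_args(app: str, plugin_jar: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit argument list; the application file is passed through untouched."""
    args = ["spark-submit", "--jars", plugin_jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Hypothetical jar path and job name, for illustration only.
    print(" ".join(build_submit_args("etl_job.py", "/path/to/plugin.jar", GPU_CONF)))
```

The same job file can be submitted with or without `GPU_CONF`, so a team could compare CPU and GPU runs of one pipeline without touching its code.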

Behind every major shift in how we live and work, there's a story about the technology that made it possible. The NVIDIA AI Podcast, produced by NVIDIA, delves into those narratives, moving beyond headlines to explore the human and technical ingenuity driving progress. Each episode connects with creators, researchers, and pioneers who are applying artificial intelligence and accelerated computing in surprising ways. You'll hear conversations that unpack complex ideas, from how AI is accelerating scientific discovery in medicine and climate science to its role in reimagining creative industries and building more sustainable systems. This isn't about abstract futures; it's a grounded look at the tools and collaborations solving real-world problems today. The discussions are crafted to be accessible, offering clarity on transformative topics without oversimplifying the profound work being done. Tuning into this podcast provides a unique vantage point into the ecosystem of innovation, where the focus is on practical applications and the thinkers turning possibility into reality. It's an ongoing series for anyone curious about the mechanics of change and how computational power is being harnessed to tackle some of our most pressing challenges and unlock new opportunities across every field.
Language: English
Episodes: 100

NVIDIA AI Podcast
Podcast Episodes
Lowering the Cost of Intelligence With NVIDIA's Ian Buck - Ep. 284

Duration: 38:15
Discover how mixture‑of‑experts (MoE) architecture is enabling smarter AI models without a proportional increase in the required compute and cost. Using vivid analogies and real-world examples, NVIDIA’s Ian Buck breaks d…
How Anyone Can Build Meaningful AI Without Code - Ep. 283

Duration: 40:29
Empromptu CEO Shanea Leven shares how her company helps people without coding experience build meaningful, production-ready AI applications — fast and accurately. Powered by NVIDIA CUDA, Empromptu’s “AI that builds AI” p…
AI in 2025: From Agents to Factories - Ep. 282

Duration: 29:39
The year in AI began with agents and brought us creative superpowers, robots on farms and in operating rooms, and so much more. Look back on AI in 2025 through the voices of the people who created it in this recap episod…
Mayor Matt Mahan on How AI Is Changing City Life in San Jose - Ep. 280

Duration: 46:38
Mayor Matt Mahan and NVIDIA’s Jumbi Edulbehram reveal how AI is making San Jose smarter—optimizing transit, translating meetings in real time, upskilling city staff, and powering pioneering civic programs. Learn how AI i…
AI for Science | GTC Live Washington, D.C. Chapter 4

Duration: 34:08
Coverage from the keynote pregame show, GTC Live Washington, D.C. Chapter 4: AI for Science. In laboratories and research centers, AI is becoming a core instrument of discovery. Scientists and technologists explore how computa…