Snap’s Secret to Processing 10 Petabytes a Day: GPU-Accelerated Spark | NVIDIA AI Podcast Ep. 298

Author: NVIDIA May 13, 2026 Duration: 23:35

Technology

Snap processes more than 10 petabytes of experimentation data every single morning. With NVIDIA GPU-accelerated Apache Spark on Google Cloud, Snap cut job costs by 76%, reduced memory usage by 80%, and eliminated 120 terabytes of disk spill from its pipelines. Prudhvi Vatala, head of engineering platforms at Snap, joins the NVIDIA AI Podcast to break down how he and his team completely modernized data infrastructure for a social platform serving nearly a billion monthly active users, using the NVIDIA cuDF plugin (formerly referred to as the NVIDIA RAPIDS plugin) for Apache Spark on Google Kubernetes Engine, with zero application code changes.

🔬 Topics covered:
How Snap runs A/B tests at planetary scale using rigorous statistical methods like heterogeneous treatment effect detection and variance reduction
Why Snap reuses idle inference GPUs between 1–5 a.m. for batch data processing, and how it built a Kubernetes-based platform to do it
How NVIDIA cuDF delivered 3x+ speedups on join-heavy Spark jobs with no code rewrites
The full business impact: 76% cost reduction, 62% fewer cores, 80% less memory, 120 TB of spill eliminated
How a three-way partnership between Snap, NVIDIA, and Google Cloud made it possible in just 8–9 months

Chapters:
0:00 Introduction and Snap overview
3:35 What is Snap’s experimentation platform?
4:05 Why experimentation, safety, and privacy are core at Snap
4:52 How A/B testing works at billion-user scale
8:14 Discovering NVIDIA cuDF plugin
9:06 Benchmarking results: join, union, and aggregation jobs
12:00 Reusing idle GPUs overnight via GKE
13:24 Building a bottom-up GPU data platform at Snap
17:48 Results: 76% cost reduction and partnership impact
20:56 Snap’s evolution and what’s next

Learn more:
NVIDIA cuDF: https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf#accel-apache
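The episode description says the plugin was adopted with zero application code changes. As a rough illustration of what that can look like, the sketch below builds a spark-submit command in which GPU acceleration is switched on purely through configuration. The configuration keys are the ones documented for the RAPIDS Accelerator for Apache Spark and for Spark's GPU scheduling; the values, the jar path, and the job name are hypothetical placeholders, not Snap's settings, and a real Kubernetes deployment would need further options such as GPU discovery.

```python
"""Sketch: turning on the GPU plugin through Spark configuration only."""

# Keys documented for the RAPIDS Accelerator for Apache Spark and for
# Spark's GPU resource scheduling. The values are illustrative assumptions.
GPU_SPARK_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "0.25",
}


def spark_submit_args(app, plugin_jar, conf=GPU_SPARK_CONF):
    """Build a spark-submit argument list; the application file is unchanged."""
    args = ["spark-submit", "--jars", plugin_jar]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Both paths are hypothetical placeholders.
    print(" ".join(spark_submit_args("etl_job.py", "/opt/jars/rapids-4-spark.jar")))
```

The point of the design is that the job script is passed through untouched: the same `etl_job.py` runs on CPU when the `--conf` and `--jars` arguments are left out.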

NVIDIA AI Podcast

Behind every major shift in how we live and work, there's a story about the technology that made it possible. The NVIDIA AI Podcast, produced by NVIDIA, delves into those narratives, moving beyond headlines to explore the human and technical ingenuity driving progress. Each episode connects with creators, researchers, and pioneers who are applying artificial intelligence and accelerated computing in surprising ways. You'll hear conversations that unpack complex ideas, from how AI is accelerating scientific discovery in medicine and climate science to its role in reimagining creative industries and building more sustainable systems. This isn't about abstract futures; it's a grounded look at the tools and collaborations solving real-world problems today. The discussions are crafted to be accessible, offering clarity on transformative topics without oversimplifying the profound work being done. Tuning into this podcast provides a unique vantage point into the ecosystem of innovation, where the focus is on practical applications and the thinkers turning possibility into reality. It's an ongoing series for anyone curious about the mechanics of change and how computational power is being harnessed to tackle some of our most pressing challenges and unlock new opportunities across every field.

Author: NVIDIA Language: English Episodes: 100

Podcast Episodes

Accelerating Disaster Response with GiveDirectly's Nick Allardice - Ep. 287

28.01.2026

Duration: 48:35

GiveDirectly president and CEO Nick Allardice explains how his team uses AI, mobile money, and satellite imagery to send cash directly to people living in poverty and crisis, often within days of a disaster. He describes…

From Warehouses to Robot Shoppers: Jason Goldberg Talks Retail’s AI Makeover - Ep. 286

21.01.2026

Duration: 49:52

Jason “Retailgeek” Goldberg, Chief Commerce Strategy Officer at Publicis Groupe, discusses how AI is optimizing retail operations and rewriting the consumer shopping experience. Learn why AI acceleration is able to rei…

Safer, Smarter Construction Sites with Edge AI and Caterpillar Autonomous Machines - Ep. 285

14.01.2026

Duration: 39:39

Brandon Hootman, Vice President of Data and Artificial Intelligence at Caterpillar, joins the AI Podcast to discuss how the company uses NVIDIA’s AI Factory, Omniverse digital twins, and edge AI to streamline manufacturi…

Lowering the Cost of Intelligence With NVIDIA's Ian Buck - Ep. 284

29.12.2025

Duration: 38:15

Discover how mixture‑of‑experts (MoE) architecture is enabling smarter AI models without a proportional increase in the required compute and cost. Using vivid analogies and real-world examples, NVIDIA’s Ian Buck breaks d…

How Anyone Can Build Meaningful AI Without Code - Ep. 283

17.12.2025

Duration: 40:29

Empromptu CEO Shanea Leven shares how her company helps people without coding experience build meaningful, production-ready AI applications — fast and accurately. Powered by NVIDIA CUDA, Empromptu’s “AI that builds AI” p…

AI in 2025: From Agents to Factories - Ep. 282

10.12.2025

Duration: 29:39

The year in AI began with agents and brought us creative superpowers, robots on farms and in operating rooms, and so much more. Look back on AI in 2025 through the voices of the people who created it in this recap episod…

How AI Data Platforms Are Shaping the Future of Enterprise Storage - Ep. 281

18.11.2025

Duration: 35:18

Bringing GPUs to your data is a game changer for the modern enterprise. Jacob Liberman, Director of Enterprise Product Management at NVIDIA, details the AI Data Platform, a GPU-accelerated storage platform built for AI.…

Mayor Matt Mahan on How AI Is Changing City Life in San Jose - Ep. 280

12.11.2025

Duration: 46:38

Mayor Matt Mahan and NVIDIA’s Jumbi Edulbehram reveal how AI is making San Jose smarter—optimizing transit, translating meetings in real time, upskilling city staff, and powering pioneering civic programs. Learn how AI i…

AI for Robotics and Manufacturing | GTC Live Washington, D.C. Chapter 5

11.11.2025

Duration: 26:16

Coverage from the keynote pregame show, GTC Live Washington, D.C. Chapter 5: AI for Robotics and Manufacturing. The boundary between digital intelligence and physical action is disappearing. Industry pioneers show how robotics…

AI for Science | GTC Live Washington, D.C. Chapter 4

11.11.2025

Duration: 34:08

Coverage from the keynote pregame show, GTC Live Washington, D.C. Chapter 4: AI for Science. In laboratories and research centers, AI is becoming a core instrument of discovery. Scientists and technologists explore how computa…