Lessons from Transcribing and Indexing 3.5 Million Podcasts with Arvid Kahl

Author: Software Huddle

July 8, 2025

Duration: 1:18:00

Technology

Big time guest today as Arvid Kahl joins us. Arvid is my favorite type of guest -- a deeply technical founder that can talk about both the technical and business challenges of a startup. Lots to enjoy from this episode.

Arvid is known as the Bootstrapped Founder and has documented his path to selling Feedback Panda back in 2019. He's now building Podscan and sharing his journey as he goes.

Podscan is a fascinating project. It's making the content of *every* podcast episode around the world fully searchable. He currently has 3.5 million episodes transcribed and adds another 30,000 - 50,000 episodes every day. This involves a ton of technical challenges, including how to get the best transcription results from the latest LLMs, whether you should use APIs from public providers or run your own LLMs, and how to efficiently provide full-text search across terabytes of transcription data. Arvid shares the lessons he's learned and the various strategies he's tried over the years.

But there are also unique business challenges. For most technical businesses, your infrastructure costs grow in line with your customers. More customers == more data == more servers. With Podscan, Arvid has to index the entire podcast ecosystem regardless of his customers. This means a lot of upfront investment as he looks to grow his customer base. Arvid tells us how he's optimized his infrastructure to account for this unique challenge.

Software Huddle

Every week on Software Huddle, Alex DeBrie and Sean Falconer sit down with a different expert from across the tech landscape. The conversations are less about quick tips and more about substantive discussions, digging into the real challenges and decisions behind building software, launching products, and navigating the industry's constant shifts. You'll hear from practitioners who have been in the trenches, offering perspectives that blend deep technical knowledge with hard-won business and entrepreneurial experience. Alex brings his specialized expertise as the author of The DynamoDB Book and an AWS Data Hero, while Sean contributes a unique viewpoint shaped by over two decades as an engineer, founder, and marketing executive, recognized as a Snowflake Data Superhero. Together, they create a space where complex topics in software development and technology trends become accessible and genuinely engaging. This podcast is for anyone who wants to move beyond surface-level news and understand the "why" behind the tools and strategies shaping our digital world. Tune in for a thoughtful huddle that feels more like a candid conversation between colleagues than a formal interview.

Author: Software Huddle

Language: en-us

Episodes: 79

Podcast Episodes

Rewriting in Rust + Being a Learning Machine with AJ Stuyvenberg

06.05.2025

Duration: 1:21:36

Today's guest is AJ Stuyvenberg, a Staff Engineer at Datadog working on their Serverless observability project. He had a great article recently about how they rewrote their AWS Lambda extension in Rust. It's a really int…

Software Reliability Agents with Amal Kiran

29.04.2025

Duration: 51:07

So if you're writing code or keeping systems running, you probably know the drill. Late night pages, chasing down weird bugs, dealing with alert storms. It's tough! It costs money when things break, and honestly, nobody…

From ORM to Infra: Prisma Postgres with Søren Bramer Schmidt

22.04.2025

Duration: 1:02:22

Today we have Søren from Prisma on the show. Prisma has been the most popular ORM in the TypeScript world for a while, and now they’re moving more into hosted infrastructure. We spend a lot of time talking about their ne…

Fast Inference with Hassan El Mghari

08.04.2025

Duration: 53:06

Today we have Hassan back on the show. Hassan was one of our first guests for Huddle when he was working at Vercel, but since then, he's joined Together AI, one of the hottest companies in the world. They just raised a m…

Seattle Startups, AI’s Future & Big Acquisitions with Yujian Tang

14.03.2025

Duration: 1:02:54

Today on the show, we talked with Yujian Tang. He was on the show previously when he worked at Zilliz, when we talked about vector databases and RAG. He's since branched out on his own, building the tech startup scene in…

Faster & Cheaper on PlanetScale Metal with Sam Lambert

12.03.2025

Duration: 1:19:43

Today, we have Sam Lambert back on the show! Sam is the CEO of PlanetScale, and if you follow him on X, you know he’s one of the sharpest voices in the database space—cutting through the hype with deep experience and a n…

Redis but Faster With Roman Gershman

04.03.2025

Duration: 1:00:51

Redis is consistently one of the most beloved pieces of infrastructure for developers. And in the last few years, we've seen a number of new Redis-compatible projects that aim to improve on the core of Redis in some way.…

Lessons from Building Tagged.com + AI-Driven Database Optimization with Johann Schleier-Smith

11.12.2024

Duration: 56:13

Today, we’re joined by Johann Schleier-Smith. Johann co-founded Tagged during the early days of social media, a time when building scalable systems for the web was uncharted territory. Back then, cloud computing didn’t e…

Building + Evolving Sentry's Architecture and Funding Open Source with David Cramer

13.11.2024

Duration: 1:13:06

Today, we have David Cramer on the show. David is one of the co-founders of Sentry, an application monitoring tool that's one of the most widely-adopted tools for developers. Sentry does over 300,000 events per second on…

Deep Dive into Inference Optimization for LLMs with Philip Kiely

06.11.2024

Duration: 1:04:05

Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI workloads. We go deep on Inference Optimization. We cover choosing a model, discuss the hype a…