SQL Meets Vector Search with Linpeng Tang of MyScale

Author: Software Huddle April 2, 2024 Duration: 1:01:38
Welcome back to an episode where we're talking Vectors, Vector Databases, and AI with Linpeng Tang, CTO and co-founder of MyScale. MyScale is a super interesting technology: it combines the best of OLAP databases with Vector Search. The project started back in 2019, when the team forked ClickHouse and adapted it to support Vector Storage, Indexing, and Search. The really unique and cool thing is that you get the familiarity and usability of SQL with the power of being able to compare the similarity between pieces of unstructured data. We think this has really fascinating use cases for analytics, well beyond what we're seeing with other vector database technology, which is mostly restricted to building RAG applications for LLMs. Also, because it's built on ClickHouse, MyScale is massively scalable, which is an area that many of the dedicated vector databases actually struggle with. We cover a lot about how vector databases work, why they decided to build off of ClickHouse, and how they plan to open source the database.

Timestamps
02:29 Introduction
06:22 Value of a Vector Database
12:40 Forking ClickHouse
18:53 Transforming ClickHouse into a SQL vector database
32:08 Data modeling
32:56 What data can be Vectorized
38:37 Indexing
43:35 Achieving Scale
46:35 Bottlenecks
48:41 MyScale vs other dedicated Vector Databases
51:38 Going Open Source
56:04 Closing thoughts

Every week on Software Huddle, Alex DeBrie and Sean Falconer sit down with a different expert from across the tech landscape. The conversations are less about quick tips and more about substantive discussions, digging into the real challenges and decisions behind building software, launching products, and navigating the industry's constant shifts. You'll hear from practitioners who have been in the trenches, offering perspectives that blend deep technical knowledge with hard-won business and entrepreneurial experience. Alex brings his specialized expertise as the author of The DynamoDB Book and an AWS Data Hero, while Sean contributes a unique viewpoint shaped by over two decades as an engineer, founder, and marketing executive, recognized as a Snowflake Data Superhero. Together, they create a space where complex topics in software development and technology trends become accessible and genuinely engaging. This podcast is for anyone who wants to move beyond surface-level news and understand the "why" behind the tools and strategies shaping our digital world. Tune in for a thoughtful huddle that feels more like a candid conversation between colleagues than a formal interview.
Language: en-us Episodes: 79

Software Huddle
Podcast Episodes
Why Building an API for Email is Hard with Christine Spang

Duration: 56:33
Today, on the show we have Christine Spang, Co-founder and CTO of Nylas. Christine was the keynote at the recent Shift Developer Conference in Miami, and we caught up with her there. Nylas is a unified API for email, cal…
Enterprise-grade Dev Environments with Ivan Burazin

Duration: 51:33
Today’s guest is Ivan Burazin, the co-founder and CEO of Daytona and the creator of the Shift Developer Conference, which he sold some time ago to Infobip. Ivan has tons of experience building developer tools, he has be…
Operational Data Warehouse with Nikhil Benesch

Duration: 1:05:56
Today's episode is with Nikhil Benesch, who's the co-founder and CTO at Materialize, an Operational Data Warehouse. Materialize gets you the best of both worlds, combining the capabilities of your data warehouse with the…
Multi-tenancy with Khawaja Shams

Duration: 1:09:04
Today's episode is with Khawaja Shams. Khawaja is the CEO and co-founder of Momento, which is a Serverless Cache. He used to lead the DynamoDB team at AWS and now he's doing Momento. We talk about a lot of different thin…
All about Rust with Tim McNamara

Duration: 1:51:56
In today's episode with Tim McNamara, we talk all about Rust. Tim is one of the leading educators in the Rust space. He wrote the Rust in Action book, which is probably the best Rust book out there. He…
Becoming an Epic Web Developer with Kent C Dodds

Duration: 55:39
Today, we have Kent C Dodds on the show. If you don't know Kent, he's a well-known expert in JavaScript, Web Development, and Teaching. His courses like Testing JavaScript, Epic React, and Epic Web Dev have helped countle…
What is a Vector Database with Yujian Tang

Duration: 50:44
Today's guest is Yujian Tang from Zilliz, one of the big players in the vector database market. This is the first episode in a series of episodes we’re doing on vectors and vector databases. We start with the basics, wha…
Serverless ClickHouse with Tyler Wells

Duration: 1:12:12
Today's episode is with Tyler Wells. Tyler is the CTO and co-founder at Propel. He was an early employee at Skype (and Microsoft after the acquisition) as well as Twilio. While at Twilio, Tyler helped build a data platfo…
Elasticsearch Fundamentals with Philipp Krenn

Duration: 1:20:09
Today, we have Philipp Krenn on the show. He's the head of DevRel for Elastic, and we took a deep dive on all the Elasticsearch stuff like Indexes, Mappings, Shards and Replicas and how to think about performance and all…
Building a Better C with Loris Cro from Zig Software Foundation

Duration: 1:10:25
Zig is a new programming language with big ambitions: to be a better C. Loris Cro is the VP of Community at the Zig Software Foundation, and he takes us through the ins and outs of Zig -- how it was created, what problem…