SQL Meets Vector Search with Linpeng Tang of MyScale

Author: Software Huddle April 2, 2024 Duration: 1:01:38
Welcome back to an episode where we're talking Vectors, Vector Databases, and AI with Linpeng Tang, CTO and co-founder of MyScale. MyScale is a super interesting technology. They're combining the best of OLAP databases with Vector Search. The project started back in 2019 where they forked ClickHouse and then adapted it to support Vector Storage, Indexing, and Search. The really unique and cool thing is you get the familiarity and usability of SQL with the power of being able to compare the similarity between unstructured data. We think this has really fascinating use cases for analytics well beyond what we're seeing with other vector database technology that's mostly restricted to building RAG models for LLMs. Also, because it's built on ClickHouse, MyScale is massively scalable, which is an area that many of the dedicated vector databases actually struggle with. We cover a lot about how vector databases work, why they decided to build off of ClickHouse, and how they plan to open source the database.

Timestamps
02:29 Introduction
06:22 Value of a Vector Database
12:40 Forking ClickHouse
18:53 Transforming ClickHouse into a SQL vector database
32:08 Data modeling
32:56 What data can be Vectorized
38:37 Indexing
43:35 Achieving Scale
46:35 Bottlenecks
48:41 MyScale vs other dedicated Vector Databases
51:38 Going Open Source
56:04 Closing thoughts

Every week on Software Huddle, Alex DeBrie and Sean Falconer sit down with a different expert from across the tech landscape. The conversations are less about quick tips and more about substantive discussions, digging into the real challenges and decisions behind building software, launching products, and navigating the industry's constant shifts. You'll hear from practitioners who have been in the trenches, offering perspectives that blend deep technical knowledge with hard-won business and entrepreneurial experience. Alex brings his specialized expertise as the author of The DynamoDB Book and an AWS Data Hero, while Sean contributes a unique viewpoint shaped by over two decades as an engineer, founder, and marketing executive, recognized as a Snowflake Data Superhero. Together, they create a space where complex topics in software development and technology trends become accessible and genuinely engaging. This podcast is for anyone who wants to move beyond surface-level news and understand the "why" behind the tools and strategies shaping our digital world. Tune in for a thoughtful huddle that feels more like a candid conversation between colleagues than a formal interview.
Language: en-us Episodes: 79

Software Huddle
Podcast Episodes
Navigating Large Language Models with Vino Duraisamy from Snowflake

Duration: 59:42
In this episode, we spoke with Vino Duraisamy, Developer Advocate at Snowflake. Vino has been working as a data and AI engineer for her entire career across companies like Apple, Treeverse, and now Snowflake. And in this…
AGI is Surely Coming with Former Snowflake CEO Bob Muglia

Duration: 59:14
Today we have Bob Muglia, the former CEO of Snowflake and a 23-year veteran of Microsoft, on the show. In this interview, we discuss Bob's book, Datapreneurs, which takes you on a journey about the people behind the first re…
reInvent BTS, Sam Altman, SEC on Solarwinds, Apple RCS, and more

Duration: 51:19
Our special episode is back, and we have a special guest this time. Join Sean, Alex & Merritt in this fun conversation. Timestamps: 00:00 Introduction 01:19 What is a CISO 08:10 Balance of Power 13:50 reInvent BTS 19:45…
AI-driven Database Cache with Ben Hagan from PolyScale

Duration: 57:10
PolyScale is a database cache, specifically designed to cache just your database. It is completely Plug and Play and it allows you to scale a database without a huge amount of effort, cost, and complexity. PolyScale curr…
Building for Scale with Mario Žagar from Infobip

Duration: 50:00
In this episode, we spoke with Mario Žagar, a Distinguished Engineer at Infobip. Infobip is a tech unicorn based out of Croatia that is a global leader in omnichannel communication, bootstrapping its way to a staggering…
Distributed Financial Databases with Joran Dirk Greef of TigerBeetle

Duration: 1:03:39
In this episode we spoke with Joran Dirk Greef, who's the co-founder at TigerBeetle. TigerBeetle is a Financial Transactions Database that's focused on correctness and safety while hitting orders of magnitude more perfor…
First Year as a Startup Founder and CEO with Nucleus's Evis Drenova

Duration: 44:42
In this episode, we spoke with Evis Drenova, CEO and co-founder of Nucleus, a Y Combinator graduate from 2022 focused on making it easy to deploy, build, and manage on Kubernetes. Evis left Skyflow, where he was one of t…
Architecting Real-time Analytics with Dhruba Borthakur of Rockset

Duration: 1:09:30
In this episode, we spoke with Dhruba Borthakur, the CTO and co-founder at Rockset. Rockset is a search and analytics database hosted on the cloud. Dhruba was the founding engineer of the RocksDB project at Fac…