SQL Meets Vector Search with Linpeng Tang of MyScale

Author: Software Huddle April 2, 2024 Duration: 1:01:38
Welcome back to an episode where we're talking Vectors, Vector Databases, and AI with Linpeng Tang, CTO and co-founder of MyScale. MyScale is a super interesting technology. They're combining the best of OLAP databases with Vector Search. The project started back in 2019, when they forked ClickHouse and then adapted it to support Vector Storage, Indexing, and Search. The really unique and cool thing is that you get the familiarity and usability of SQL with the power of being able to compare the similarity between unstructured data. We think this has really fascinating use cases for analytics, well beyond what we're seeing with other vector database technology, which is mostly restricted to building RAG pipelines for LLMs. Also, because it's built on ClickHouse, MyScale is massively scalable, which is an area that many of the dedicated vector databases actually struggle with. We cover a lot about how vector databases work, why they decided to build off of ClickHouse, and how they plan to open source the database.

Timestamps
02:29 Introduction
06:22 Value of a Vector Database
12:40 Forking ClickHouse
18:53 Transforming ClickHouse into a SQL vector database
32:08 Data modeling
32:56 What data can be Vectorized
38:37 Indexing
43:35 Achieving Scale
46:35 Bottlenecks
48:41 MyScale vs other dedicated Vector Databases
51:38 Going Open Source
56:04 Closing thoughts

Every week on Software Huddle, Alex DeBrie and Sean Falconer sit down with a different expert from across the tech landscape. The conversations are less about quick tips and more about substantive discussions, digging into the real challenges and decisions behind building software, launching products, and navigating the industry's constant shifts. You'll hear from practitioners who have been in the trenches, offering perspectives that blend deep technical knowledge with hard-won business and entrepreneurial experience. Alex brings his specialized expertise as the author of The DynamoDB Book and an AWS Data Hero, while Sean contributes a unique viewpoint shaped by over two decades as an engineer, founder, and marketing executive, recognized as a Snowflake Data Superhero. Together, they create a space where complex topics in software development and technology trends become accessible and genuinely engaging. This podcast is for anyone who wants to move beyond surface-level news and understand the "why" behind the tools and strategies shaping our digital world. Tune in for a thoughtful huddle that feels more like a candid conversation between colleagues than a formal interview.
Language: en-us Episodes: 79

Software Huddle
Podcast Episodes
Deep Dive into Inference Optimization for LLMs with Philip Kiely

Duration: 1:04:05
Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI workloads. We go deep on Inference Optimization. We cover choosing a model, discuss the hype a…
Java and Building AI Applications with Kevin Dubois

Duration: 56:58
Today on the show, we have Kevin Dubois. Kevin is a Senior Principal Developer Advocate at Red Hat, Java Champion, and well known open source contributor. In our conversation with Kevin, we talk about his history with Ja…
SQLite, Turso, and the State of Databases with Glauber Costa

Duration: 1:12:07
Today we have Glauber Costa on the show, who's the CEO and founder at Turso. They provide a managed SQLite service with some really interesting capabilities that's changing some of the application patterns you can do. He…
Blocking Bots & Moving from Redis to SQLite with Mike Buckbee

Duration: 53:00
Today, we have Mike Buckbee on the show. Mike is the co-founder of Wafris, and he wrote a really insightful article last week about moving from Redis to SQLite for an aspect of their architecture. The article was nuanced…
AI Engineer, Web Frameworks, & more with Tejas Kumar

Duration: 1:21:58
Today we have Tejas Kumar on the show. Tejas is part of the Developer Relations team at Datastax. He's really good at frontend, got a great podcast and he has written a book called Fluent React. He spoke recently at the…
The Data Engineering Landscape with Peter Hanssens

Duration: 54:52
Today on the show, we have Peter Hanssens, the CEO and founder of Cloud Shuttle and creator of the DataEngBytes Conference. Peter has helped build an incredible data engineering community in Australia. He runs meetups, u…
Infrastructure, AWS, AI and Jobs, HTMX & more

Duration: 1:34:19
Today we have a special guest. We have Jeremy Daly, who’s been in the cloud space for a while. Jeremy is the co-founder of Ampt, which is building an abstraction infrastructure layer on top of AWS, just to make it simple…
Introduction to GraphRAG with Stephen Chin

Duration: 1:03:10
Today we have Stephen Chin, VP of developer relations at Neo4j, on the show. Stephen is an author, speaker, and Java expert. We’ll actually be crossing paths in person at the upcoming Infobip Shift conference in September…
Infrastructure as Code with Dax Raad

Duration: 1:17:47
Today, we have Dax Raad on the show. Dax is a must-follow on tech Twitter, known for his blend of humor and insightful tech opinions. We talked a lot about SST, which is the infrastructure as code tool that he works on.…