AI Agents: Substance or Snake Oil with Arvind Narayanan - #704

Author: Sam Charrington October 7, 2024 Duration: 54:22

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University, to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter”, we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap’, which creates risks when deploying AI agents in real-world applications. We also discuss the importance of verifiers as a technique for safeguarding agent behavior. We then dig into the AI Snake Oil book, which uncovers examples of problematic and overhyped claims in AI. Arvind shares various examples of failed AI applications, outlines a taxonomy of AI risks, and offers his insights on AI’s catastrophic risks. We also touch on different approaches to LLM-based reasoning, his views on tech policy and regulation, and his work on CORE-Bench, a benchmark designed to measure AI agents' accuracy in computational reproducibility tasks. The complete show notes for this episode can be found at https://twimlai.com/go/704.

Hosted by industry analyst and commentator Sam Charrington, The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) serves as a vital conduit between cutting-edge research and its real-world implications. This isn't just a series of technical lectures; it's a series of conversations that unpack how AI and machine learning are actively reshaping industries and societal structures. Each episode connects you directly with leading researchers, engineers, and innovative thinkers who are defining the frontiers of the field. The discussions go beyond abstract theory to explore the practical challenges, ethical considerations, and business transformations driven by these technologies. Whether you're a data scientist deep in the code, a tech-savvy leader strategizing implementation, or simply fascinated by the future of intelligent systems, this podcast provides the context and depth needed to stay informed. By focusing on the people behind the algorithms and the ideas powering the platforms, Sam creates a resource that is both intellectually substantive and genuinely engaging, building a thoughtful community around one of the most significant technological shifts of our time.

Author: Sam Charrington Language: English Episodes: 100

Podcast Episodes

Building Real-World LLM Products with Fine-Tuning and More with Hamel Husain - #694

24.07.2024

Duration: 1:20:05

Today, we're joined by Hamel Husain, founder of Parlance Labs, to discuss the ins and outs of building real-world products using large language models (LLMs). We kick things off discussing novel applications of LLMs and…

Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

17.07.2024

Duration: 57:54

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in gene…

Decoding Animal Behavior to Train Robots with EgoPet with Amir Bar - #692

09.07.2024

Duration: 43:16

Today, we're joined by Amir Bar, a PhD candidate at Tel Aviv University and UC Berkeley, to discuss his research on visual-based learning, including his recent paper, “EgoPet: Egomotion and Interaction Data from an Animal…

How Microsoft Scales Testing and Safety for Generative AI with Sarah Bird - #691

01.07.2024

Duration: 57:12

Today, we're joined by Sarah Bird, chief product officer of responsible AI at Microsoft. We discuss the testing and evaluation techniques Microsoft applies to ensure safe deployment and use of generative AI, large langua…

Long Context Language Models and their Biological Applications with Eric Nguyen - #690

25.06.2024

Duration: 45:41

Today, we're joined by Eric Nguyen, PhD student at Stanford University. In our conversation, we explore his research on long context foundation models and their application to biology, particularly Hyena, and its evolutio…

Accelerating Sustainability with AI with Andres Ravinet - #689

18.06.2024

Duration: 47:46

Today, we're joined by Andres Ravinet, sustainability global black belt at Microsoft, to discuss the role of AI in sustainability. We explore real-world use cases where AI-driven solutions are leveraged to help tackle en…

Gen AI at the Edge: Qualcomm AI Research at CVPR 2024 with Fatih Porikli - #688

11.06.2024

Duration: 1:10:41

Today we’re joined by Fatih Porikli, senior director of technology at Qualcomm AI Research. In our conversation, we covered several of the Qualcomm team’s 16 accepted main track and workshop papers at this year’s CVPR co…

Energy Star Ratings for AI Models with Sasha Luccioni - #687

04.06.2024

Duration: 48:26

Today, we're joined by Sasha Luccioni, AI and Climate lead at Hugging Face, to discuss the environmental impact of AI models. We dig into her recent research into the relative energy consumption of general purpose pre-tr…

Language Understanding and LLMs with Christopher Manning - #686

27.05.2024

Duration: 56:10

Today, we're joined by Christopher Manning, the Thomas M. Siebel professor in Machine Learning at Stanford University and a recent recipient of the 2024 IEEE John von Neumann medal. In our conversation with Chris, we dis…

Chronos: Learning the Language of Time Series with Abdul Fatir Ansari - #685

20.05.2024

Duration: 43:05

Today we're joined by Abdul Fatir Ansari, a machine learning scientist at AWS AI Labs in Berlin, to discuss his paper, "Chronos: Learning the Language of Time Series." Fatir explains the challenges of leveraging pre-trai…