#201 - GPT 4.5, Sonnet 3.7, Grok 3, Phi 4

Author: Skynet Today
Date: March 5, 2025
Duration: 58:37
Our 201st episode with a summary and discussion of last week's big AI news! Recorded on 03/02/2025.

Join our brand new Discord here! https://discord.gg/nTyezGSKwP

Hosted by Andrey Kurenkov and guest host Sharon Zhou. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai.

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:
- The release of GPT-4.5 from OpenAI, Anthropic's Claude 3.7, and Grok 3 from xAI, comparing their features, costs, and capabilities.
- Discussion of new tools and applications, including Sesame's new voice assistant and Google's AI coding assistant, Gemini Code Assist, highlighting their unique benefits.
- OpenAI's continued user growth despite competition, pricing for Google's text-to-video platform, and HP acquiring Humane and shutting down its AI Pin.
- Insights into new research on alignment and specification gaming in LLMs, including papers on fine-tuning causing broad misalignment and Google's multi-agent system for scientific collaboration.

Timestamps + Links:
(00:00:00) Intro / Banter
(00:01:36) News Preview

Tools & Apps
(00:02:33) OpenAI announces GPT-4.5, warns it’s not a frontier AI model
(00:07:22) Anthropic launches a new AI model that ‘thinks’ as long as you want
(00:11:14) New Grok 3 release tops LLM leaderboards
(00:16:43) Sesame is the first voice assistant I’ve ever wanted to talk to more than once
(00:18:30) Google launches a free AI coding assistant with very high usage caps
(00:20:45) Rabbit shows off the AI agent it should have launched with
(00:22:23) Mistral’s Le Chat tops 1M downloads in just 14 days

Applications & Business
(00:24:06) OpenAI Tops 400 Million Users Despite DeepSeek’s Emergence
(00:27:37) Google’s new AI video model Veo 2 will cost 50 cents per second
(00:29:52) HP is buying Humane and shutting down the AI Pin

Projects & Open Source
(00:31:44) Microsoft launches next-gen Phi AI models
(00:33:47) OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work
(00:37:12) SWE-Bench+: Enhanced Coding Benchmark for LLMs

Research & Advancements
(00:40:00) Towards an AI co-scientist
(00:42:52) Magma: A Foundation Model for Multimodal AI Agents

Policy & Safety
(00:47:32) Demonstrating specification gaming in reasoning models
(00:51:03) Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Keeping up with artificial intelligence can feel like drinking from a firehose. Every week brings a new breakthrough, a surprising application, or an urgent ethical debate. Last Week in AI, from the team at Skynet Today, is here to turn that torrent into a clear, digestible stream. Instead of getting lost in the noise, you'll get a thoughtful rundown of the developments that actually have impact, explained without unnecessary jargon. Each episode feels like a conversation with well-informed friends who have done the homework for you, sifting through research papers, product launches, and industry announcements to highlight what's substantive. You'll hear nuanced discussions that go beyond the headlines, considering the real-world implications of new models, policy shifts, and corporate moves in the tech landscape. This podcast doesn't just tell you what happened; it provides context on why it matters for developers, businesses, and society at large. It’s an efficient way to stay informed and critically engaged with a field that is reshaping our world at a breathtaking pace. Tune in for a consistently insightful analysis that makes the complex world of AI feel accessible and relevant, week after week.
Language: English
Episodes: 100

Last Week in AI
Podcast Episodes