Open LLM Upgrades 🆕 // Gemma 2 Performance 💎 // SeaKR's Self-aware Learning 🧠

Author: Earkind June 28, 2024 Duration: 13:48

News Daily

HuggingFace has upgraded the Open LLM Leaderboard to v2, adding new benchmarks and improving the evaluation suite for easier reproducibility.

Gemma 2, a new addition to the Gemma family of lightweight open models, delivers the best performance for its size and offers competitive alternatives to models that are 2-3× bigger.

SeaKR is a new adaptive RAG method that re-ranks retrieved knowledge based on the LLM's self-aware uncertainty, outperforming existing adaptive RAG methods in generating text with relevant and accurate information.

Step-DPO is a new method that applies preference optimization to individual reasoning steps rather than to whole answers, achieving strong results in long-chain mathematical reasoning.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:21 HuggingFace Updates Open LLM Leaderboard

03:19 Gemma 2: Improving Open Language Models at a Practical Size

04:16 From bare metal to a 70B model: infrastructure set-up and scripts

05:21 Fake sponsor

07:11 SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation

08:47 Simulating Classroom Education with LLM-Empowered Agents

10:16 Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

12:31 Outro

GPT Reviews

Each morning, GPT Reviews serves up a fresh, slightly chaotic conversation about everything happening in artificial intelligence. This daily podcast from Earkind is actually crafted by AI, offering a unique blend of the latest headlines, major announcements, and intriguing research plucked from sources like arXiv. But it’s far from a dry briefing. The dynamic comes from its four distinct hosts: Giovani Pete Tizzano brings relentless optimism as an AI enthusiast, while Robert, the analyst, provides a grounded and often skeptical counterpoint. Olivia, who’s deeply embedded in online communities, shares the buzz and broader reactions, and Belinda, the witty research expert, helps unpack the technical details with clarity and a sharp sense of humor. Tuning in feels like dropping into a lively roundtable where complex ideas are debated, explained, and occasionally laughed about. You’ll get a comprehensive yet digestible overview of the AI landscape, all wrapped in a format that’s as entertaining as it is informative. The result is a consistently engaging listen that keeps you updated without feeling like homework, making it a standout in the daily news podcast space.

Author: Earkind Language: English Episodes: 100

Podcast Episodes

Amazon AI Detects Damaged Goods 📦 // Musk Prioritizes xAI 🚘 // Uncertainty in LLMs 🤔

05.06.2024

Duration: 14:14

Amazon's new AI system to detect damaged or incorrect items before they ship. Elon Musk's controversial decision to prioritize X and xAI over Tesla for AI chips. "To Believe or Not to Believe Your LLM" paper on uncertain…

Microsoft's Latest $3.2B AI Investment 🇸🇪 // Grokfast Algorithm 💪 // Zipper Decoder Architecture 🎧

04.06.2024

Duration: 14:53

Microsoft is investing $3.2 billion in Sweden for cloud and AI infrastructure, deploying 20,000 advanced graphics processing units and training 250,000 Swedes with AI skills over three years. "Grokfast" is a new algorith…

Nvidia's AI Factories 🏭 // AI Gadget for Recycling 🌍 // Intellectual Obesity Crisis 📚

03.06.2024

Duration: 14:50

Nvidia unveils plans to accelerate the advance of artificial intelligence, partnering with companies and countries to build AI factories and releasing Nvidia ACE generative AI. Finnish startup Binit develops an AI gadget…

Google's Apology 🤖 // Nvidia's Top-Ranked Embedding Model 🥇 // Matryoshka Query Transformer 🌟

31.05.2024

Duration: 14:50

Google's AI Overviews are improving to provide accurate and helpful information. Nvidia's new embedding model, NV-Embed-v1, ranks number one on the Massive Text Embedding Benchmark. Matryoshka Query Transformer (MQT) off…

OpenAI Partnerships 🤝 // Codestral Model for Coding 🤖 // Transparent Language Models 🔍

30.05.2024

Duration: 14:38

OpenAI announces new content and product partnerships with Vox Media and The Atlantic, making their reporting and stories more discoverable to millions of OpenAI users. Mistral AI releases Codestral, a 22B parameter, ope…

OpenAI Starts Training GPT-5 🤖 // Jan Leike joins Anthropic's Superalignment Team 👥 // MoEUT Outperforms Standard Transformers 💥

29.05.2024

Duration: 15:02

OpenAI has formed a new safety team to address concerns about AI safety and ethics, led by CEO Sam Altman and board members Adam D’Angelo and Nicole Seligman. Jan Leike, a leading AI researcher, has left OpenAI and joine…

xAI Raises $6B 🚀 // Google's AI Overviews Controversy 🤔 // Transformers Master Arithmetic 🧮

28.05.2024

Duration: 12:57

xAI, founded by Elon Musk, raises $6 billion in funding to accelerate the research and development of future technologies in the AI race. Google's new 'AI Overviews' search feature causes uproar with bizarre and inaccura…

OpenAI Drama 💥 // Synthetic Data Theorem Proving 🧪 // Dense Vision-Language Connector 🤝

27.05.2024

Duration: 13:48

OpenAI drama: Leaked documents and a resignation from a policy researcher. DeepSeek-Prover: A new approach to formal theorem proving using synthetic data. Dense Connector for MLLMs: A plug-and-play vision-language connec…

Cohere's Open-source Aya 🌎 // Anthropic Interpretability 🧠 // Video Editing AI 🎥

24.05.2024

Duration: 13:50

Cohere's Aya model and dataset for multilingual AI in 101 languages through open science. "Mapping the Mind of a Large Language Model", a post on the Anthropic blog, providing a detailed look inside a modern, production-grade…

Nvidia's Record Revenue 📈 // OpenAI's News Corp Deal 📰 // Your Transformer is Secretly Linear 🔍

23.05.2024

Duration: 15:05

Nvidia's Q1 revenue up 262% to $26.0B, beating estimates. OpenAI's News Corp deal licenses content from WSJ, New York Post and more. PyramidInfer compresses KV cache to save memory during inference for Large Language Mod…