Multi-Task Language Understanding 📈 // Composable Interventions 🤝 // ARMT Sets Performance Record 💪


Author: Earkind July 10, 2024 Duration: 14:40

The MMLU-Pro dataset is a more robust and challenging massive multi-task language understanding benchmark, designed to more rigorously test large language models' capabilities.

The Composable Interventions framework lets researchers study the effects of applying multiple interventions to the same language model, and the order in which interventions are applied can significantly affect their effectiveness.
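A toy sketch of why composition order matters (this example is not from the paper; the `edit` and `compress` functions and the weight values are purely illustrative): composing a knowledge edit with weight compression yields different final weights depending on which intervention is applied first.

```python
# Illustrative only: two toy "interventions" on a model's weights.

def edit(weights):
    """Stand-in for a knowledge edit: nudge one weight."""
    w = dict(weights)
    w["fact"] = w["fact"] + 0.26
    return w

def compress(weights):
    """Stand-in for compression: round weights to one decimal place."""
    return {k: round(v, 1) for k, v in weights.items()}

base = {"fact": 0.11}

# The two orderings produce different final weights.
edit_then_compress = compress(edit(base))
compress_then_edit = edit(compress(base))
```

Because compression discards information, an edit applied before compression can be partially erased, while the same edit applied afterward survives intact, which is the kind of interaction the framework is built to measure.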

The MJ-Bench benchmark evaluates how effectively different types of multimodal judges provide feedback for text-to-image generation models, and the experiments reveal that closed-source VLMs generally provide better feedback.

The Associative Recurrent Memory Transformer (ARMT) is an approach that combines transformer self-attention for local context with segment-level recurrence for storage of task-specific information distributed over a long context, and it sets a new performance record in the recent BABILong multi-task long-context benchmark.
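A minimal sketch of the segment-level recurrence idea, under assumptions not taken from the ARMT paper: the long input is split into fixed-size segments, a memory value carried across segments stands in for the associative memory, and a simple per-segment summary stands in for local self-attention. All function names here are hypothetical.

```python
# Illustrative sketch: processing a long sequence segment by segment,
# threading a memory state across segments.

def process_segment(segment, memory):
    """Placeholder for a transformer block: summarizes the local segment
    and blends it into the running memory."""
    local_summary = sum(segment) / len(segment)      # stand-in for self-attention
    new_memory = 0.9 * memory + 0.1 * local_summary  # stand-in for the memory update
    output = [x + new_memory for x in segment]
    return output, new_memory

def long_context_forward(tokens, segment_len=4):
    """Walk the full sequence in segments, carrying memory forward so
    later segments can use information from much earlier context."""
    memory = 0.0
    outputs = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        out, memory = process_segment(segment, memory)
        outputs.extend(out)
    return outputs, memory

outs, final_mem = long_context_forward(list(range(16)), segment_len=4)
```

The point of the recurrence is that per-segment compute stays constant while the carried memory lets information from early segments influence arbitrarily late ones, which is what long-context benchmarks like BABILong stress.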

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:32 MMLU-Pro Release on HuggingFace Datasets

03:48 Extrinsic Hallucinations in LLMs

04:53ย RouteLLM

06:13 Fake sponsor

08:14 Composable Interventions for Language Models

09:45 MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

11:31 Associative Recurrent Memory Transformer

13:30 Outro


Each morning, GPT Reviews serves up a fresh, slightly chaotic conversation about everything happening in artificial intelligence. This daily podcast from Earkind is actually crafted by AI, offering a unique blend of the latest headlines, major announcements, and intriguing research plucked from sources like arXiv. But it's far from a dry briefing. The dynamic comes from its four distinct hosts: Giovani Pete Tizzano brings relentless optimism as an AI enthusiast, while Robert, the analyst, provides a grounded and often skeptical counterpoint. Olivia, who's deeply embedded in online communities, shares the buzz and broader reactions, and Belinda, the witty research expert, helps unpack the technical details with clarity and a sharp sense of humor.

Tuning in feels like dropping into a lively roundtable where complex ideas are debated, explained, and occasionally laughed about. You'll get a comprehensive yet digestible overview of the AI landscape, all wrapped in a format that's as entertaining as it is informative. The result is a consistently engaging listen that keeps you updated without feeling like homework, making it a standout in the daily news podcast space.
Author: Language: English Episodes: 100
