GPT Reviews
Introducing Devin, the first AI software engineer that can plan and execute complex engineering tasks requiring thousands of decisions.
Google's AI chatbot won't answer questions about upcoming elections to prevent inaccurate or misleading responses.
WorkArena is a benchmark that measures how well large language model-based agents perform tasks that align with the daily work of knowledge workers using enterprise software systems.
Synth$^2$ is a novel approach that leverages Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient and effective Visual-Language Model (VLM) training.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:59 Introducing Devin, the first AI software engineer
05:35 AI Datacenter Energy Dilemma - Race for AI Datacenter Space
06:45 Fake sponsor
09:08 Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
10:26 WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
11:52 Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
13:38 Outro