GPT Reviews
Nvidia's Q1 revenue up 262% to $26.0B, beating estimates.
OpenAI's News Corp deal licenses content from WSJ, New York Post, and more.
PyramidInfer compresses the KV cache to save memory during LLM inference.
Your Transformer is Secretly Linear finds near-linear structure in transformer layers, challenging our existing understanding of the architecture.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:55 Nvidia's Q1 revenue up 262% to $26.0B, beating estimates
03:23 OpenAI's News Corp deal licenses content from WSJ, New York Post, and more
04:57Β Systematically Improving Your RAG
06:18 Fake sponsor
08:17 PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
09:49 Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
11:48 Your Transformer is Secretly Linear
13:26 Outro