GPT Reviews
Nvidia's Q1 revenue up 262% to $26.0B, beating estimates.
OpenAI's News Corp deal licenses content from WSJ, New York Post and more.
PyramidInfer compresses KV cache to save memory during inference for Large Language Models.
Your Transformer is Secretly Linear challenges the standard view of transformers as deeply nonlinear architectures.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:55 Nvidia's Q1 revenue up 262% to $26.0B, beating estimates
03:23 OpenAI's News Corp deal licenses content from WSJ, New York Post, and more
04:57ย Systematically Improving Your RAG
06:18 Fake sponsor
08:17 PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
09:49 Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
11:48 Your Transformer is Secretly Linear
13:26 Outro