Deep Papers
Arize AI
Sleep-time Compute: Beyond Inference Scaling at Test-time
30 minutes. Posted May 2, 2025 at 9:00 pm.
Show notes
What if your LLM could think ahead, preparing answers before questions are even asked? In this week's paper read, we dive into a new paper from researchers at Letta that introduces sleep-time compute: a technique that lets models do their heavy lifting offline, well before the user query arrives. By predicting likely questions and precomputing key reasoning steps, sleep-time compute reduces test-time latency and cost without sacrificing performance. We explore n...