Deep Papers
Arize AI
Accurate KV Cache Quantization with Outlier Tokens Tracing
25 minutes. Posted Jun 4, 2025 at 2:00 pm.
Show notes
Join us as we discuss "Accurate KV Cache Quantization with Outlier Tokens Tracing," a paper on making LLM inference more efficient. KV cache quantization is a technique for reducing memory and compute costs during inference. The authors improve it with a method that identifies and excludes the outlier tokens that hurt quantization accuracy, striking a better balance between efficiency and performance.
Paper: https://arxiv.org/abs/2505.10938
Slides: https://bit.ly/45wolpr
Join us for Ar...
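To make the idea in the description concrete, here is a minimal toy sketch in plain Python. It is not the paper's algorithm. The min-max quantizer, the median-distance outlier score, and every function name below are assumptions chosen purely for illustration. It only shows why keeping a few unusual tokens in full precision can tighten the quantization range for everything else.

```python
from statistics import median


def quantize_channel(values, bits):
    """Uniform min-max quantization of one channel; returns dequantized values."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        return list(values)
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def find_outlier_tokens(cache, num_outliers):
    """Hypothetical criterion (an assumption, not the paper's):
    pick the tokens farthest from the per-channel median."""
    medians = [median(channel) for channel in zip(*cache)]
    scores = [sum(abs(v - m) for v, m in zip(token, medians)) for token in cache]
    ranked = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:num_outliers])


def quantize_cache(cache, bits=2, num_outliers=0):
    """Quantize a (tokens x channels) cache per channel, skipping outlier tokens."""
    outliers = find_outlier_tokens(cache, num_outliers) if num_outliers else set()
    kept = [i for i in range(len(cache)) if i not in outliers]
    result = [list(token) for token in cache]  # outlier rows stay in full precision
    for c in range(len(cache[0])):
        dequantized = quantize_channel([cache[i][c] for i in kept], bits)
        for i, v in zip(kept, dequantized):
            result[i][c] = v
    return result


def max_abs_error(a, b):
    """Largest elementwise reconstruction error between two caches."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy cache: four similar tokens plus one that stretches the per-channel range.
cache = [[1.0, 2.0], [1.1, 2.1], [1.2, 1.9], [0.9, 2.2], [9.0, -7.0]]
err_plain = max_abs_error(cache, quantize_cache(cache, bits=2, num_outliers=0))
err_traced = max_abs_error(cache, quantize_cache(cache, bits=2, num_outliers=1))
print(f"max error without exclusion: {err_plain:.4f}")
print(f"max error with one token excluded: {err_traced:.4f}")
```

In this toy setup the single unusual token widens the min-max range of both channels, so the 2-bit grid becomes too coarse for the other four tokens. Excluding it costs one full-precision row but lets the remaining tokens be reconstructed with a smaller error.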