The K-V Cache: Solving AI’s Invisible Memory Tax
Season 2 · Episode 1081

Why does your AI get slower as you chat? Discover the K-V cache, the invisible bottleneck of generative AI, and how we're fixing it in 2026.

My Weird Prompts · Daniel Rosehill

March 10, 2026 · 23m 50s

Show Notes

Ever wonder why long AI conversations suddenly crawl or crash your GPU? Join the discussion as we dive into the "invisible tax" of the generative era: the K-V cache. We explore the cutting-edge architectural breakthroughs, from PagedAttention to Flash KV, that are keeping 2026’s million-token models running smoothly. Learn how the industry is winning the memory wars to make high-speed, local agentic AI a reality for everyone.