Google's TurboQuant will ease bottlenecks, not cut memory demand: Analysts
Episode 37


Korea JoongAng Daily - Daily News from Korea · LEE JAE-LIM

April 1, 2026 · 7m 19s

Show Notes

This article is by Lee Jae-lim and read by an artificial voice.

[NEWS ANALYSIS]
TurboQuant, Google's latest AI efficiency breakthrough, has rattled memory semiconductor markets, dragging down shares of Samsung Electronics, SK hynix and Micron amid concerns that its compression technology could dampen memory demand.
Those concerns have intensified on the belief that easing memory bottlenecks in data processing could reduce the need for additional capacity.
Samsung Electronics shares slipped 4.7 percent and SK hynix shares fell 6.2 percent on March 26 from the previous day's close, after Google Research posted a paper detailing the breakthrough on its blog. The shares tumbled on the announcement but rebounded sharply on Wednesday amid signs of a potential end to the Iran war. Shares of U.S. memory suppliers Micron and SanDisk also dropped 6.9 percent and 11 percent, respectively, over the same period.
Analysts and academics, however, say the reaction is overblown, arguing that the technology should be better understood as a more efficient way to process data rather than a factor that would significantly curb long-term memory demand or the ongoing supply shortage.
TurboQuant compresses an AI model's short-term memory, known as the Key-Value (KV) cache, reducing the amount of data that must be stored and transferred. The technology cuts KV cache usage to one-sixth while maintaining near-original accuracy, according to Google, resulting in up to an eightfold boost in inference speed on Nvidia H100 GPUs. This allows AI systems to run faster, handle longer inputs and serve more users simultaneously without needing more hardware.
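In practice, KV-cache compression of this kind is usually done with low-bit quantization of the cached key and value tensors. The article does not describe TurboQuant's internals, so the sketch below is only a generic illustration of per-channel KV-cache quantization in Python; the shapes, bit width and function names are assumptions, not Google's algorithm.

    import numpy as np

    def quantize_kv(kv, bits=4):
        # Per-channel symmetric quantization of a KV-cache tensor.
        # kv: float16 array of shape (seq_len, num_heads, head_dim).
        # Returns integer codes plus one scale per (head, dim) channel.
        qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit codes
        scale = np.abs(kv).max(axis=0, keepdims=True) / qmax
        scale = np.where(scale == 0, 1.0, scale)      # guard all-zero channels
        codes = np.round(kv / scale).clip(-qmax - 1, qmax).astype(np.int8)
        # Codes are held in int8 here for simplicity; a real 4-bit scheme
        # would pack two codes per byte to realize the memory savings.
        return codes, scale.astype(np.float16)

    def dequantize_kv(codes, scale):
        # Reconstruct an approximate cache for the attention computation.
        return codes.astype(np.float16) * scale

    # Example: one layer's cache for a 4,096-token context (assumed sizes)
    kv = np.random.randn(4096, 32, 128).astype(np.float16)
    codes, scale = quantize_kv(kv)
    approx = dequantize_kv(codes, scale)
    print(float(np.abs(kv - approx).mean()))          # small quantization error

Storing 4-bit codes plus float16 scales in place of float16 values is roughly a four-to-one reduction; reaching the reported one-sixth footprint would require a more aggressive scheme, which is why the bit width above is only illustrative.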
The KV cache has long been a major bottleneck in AI inference, adding memory latency and driving up compute costs as models process larger volumes of information over ever-longer interactions with users. Since models must retain prior interactions to generate contextually relevant responses, memory demands grow with the length of a conversation.
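A back-of-the-envelope sizing formula makes the bottleneck concrete: the cache must hold one key and one value vector per token, per layer, per attention head, so its footprint grows linearly with context length. The Python sketch below assumes a hypothetical 70-billion-parameter-class model with grouped-query attention; the figures are illustrative, not from the article.

    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2):
        # Factor of 2 covers keys and values; bytes_per_val=2 assumes fp16.
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val

    # Assumed configuration: 80 layers, 8 KV heads, head dim 128, 32K context
    full = kv_cache_bytes(80, 8, 128, 32_768)
    print(full / 2**30)        # 10.0 GiB per sequence at fp16
    print(full / 6 / 2**30)    # ~1.7 GiB per sequence at one-sixth

At one-sixth the footprint, a GPU could hold roughly six times as many concurrent contexts in the same memory, which is broadly consistent with the reported throughput gains.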

Will TurboQuant reduce memory demand?
The market consensus maintains that the memory upcycle will persist, supported by long-term supply agreements, often running three years or longer, that major tech companies such as Google and Microsoft are already finalizing. Such commitments would be unlikely if a near-term price decline were expected.
However, some investors point to the possibility that a scale-back in price hikes could dampen the appeal of memory stocks. Even so, with supply still tight and higher memory prices constraining consumer electronics production, prices are likely to remain elevated. Moreover, some argue that relieving key bottlenecks in AI infrastructure will drive memory demand higher, as improved efficiency allows a broader range of applications, from agents to more advanced AI models, to be scaled up.

"By reducing memory usage during inference, TurboQuant lowers the cost of running AI models, which in turn reduces the overall cost of AI services," said KB Securities analyst Kim Il-hyuk. "At a time when AI demand is outpacing the construction of new data centers, this kind of software-level innovation could significantly boost infrastructure efficiency. For hyperscalers, it effectively allows existing data centers to process more workloads, delivering benefits comparable to building entirely new facilities."
Experts say memory demand will continue to rise with AI, even as KV cache technologies advance. Kim Jung-ho, a professor of electrical engineering at KAIST, said these technologies may slow the growth of demand, but won't reduce it overall.
"Memory demand in AI will keep rising," the professor said. "Technologies like this may moderate the pace, but they won't change the direction. KV cache usage is structurally tied to AI evolution. As models handle longer contexts — whether in physical AI or agent-based systems — memory requirements will inevitably scale with them."
Academics also point out that the KV ca...