Memory and the KV cache become the new battlefield! SK Hynix, SanDisk, and Samsung race to stake out HBF

 7:59am, 6 November 2025

Kim Joung-ho, a professor at South Korea's KAIST known as the father of HBM (High Bandwidth Memory), stated bluntly on a YouTube program that "dominance in the AI era is shifting from the GPU to memory!" Given memory's growing importance, he speculated that NVIDIA may acquire memory companies such as Micron or SanDisk. He also stressed the importance of HBF (High Bandwidth Flash), which he expects to show new progress in January or February next year and to formally debut between 2027 and 2028.

US-based brokerages have also noted that, with the supply of traditional hard disk drives (HDDs) likely to hit bottlenecks from late 2026 to early 2027, nearline SSDs are expected to be adopted faster as a more cost-effective alternative. At the same time, HBF is seen as a key technology for breaking the memory-capacity bottleneck of AI clusters.

As mentioned in the previous article, memory capacity matters more than ever in the AI inference era, and how effectively the major players use memory has become a critical focus. In particular, the KV cache (key-value cache) serves as an AI model's short-term memory and largely determines how fast it can respond. Kim Joung-ho therefore believes HBF is poised to become a key memory technology of the next AI generation, developing in parallel with HBM to drive performance gains for the major chipmakers.
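To make the "short-term memory" point concrete, here is a minimal single-head sketch (Python with NumPy; names and shapes are illustrative, not any vendor's implementation) of why a KV cache speeds up decoding: keys and values for past tokens are stored once, so each new token attends over the saved history instead of recomputing it.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

class KVCache:
    """Toy single-head KV cache: keys/values of past tokens are stored
    once, so each decoding step only projects the newest token."""
    def __init__(self, d_model):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def step(self, q, k, v):
        # Append this token's key/value, then attend over the full history.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])
        return attention(q, self.keys, self.values)

# Without the cache, each new token would recompute K/V for the whole
# prefix, so generating n tokens costs O(n^2) projections; with it, O(n).
rng = np.random.default_rng(0)
cache = KVCache(d_model=64)
for _ in range(5):                      # five decoding steps
    q, k, v = rng.standard_normal((3, 64))
    out = cache.step(q, k, v)           # out: context vector, shape (64,)
```

The cache's growth with context length is exactly why its placement in memory, and not just raw compute, now governs inference speed.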

What is HBF? How are the two Korean memory giants positioning themselves?

HBF's design concept is similar to HBM's: multiple dies are stacked and interconnected with through-silicon vias (TSVs). The difference is that HBF stacks NAND flash rather than DRAM, which gives it "larger capacity at a better cost."

Kim Joung-ho pointed out that although NAND is slower than DRAM, its capacity is often more than ten times greater. Stacked hundreds or even thousands of layers high, it could meet the enormous storage demands of AI models, effectively becoming a NAND version of HBM. On the program he went as far as to predict that "the HBM era is coming to an end, and the HBF era is coming!"

Kim Joung-ho predicts that AI memory architecture will become a multi-tier design, like an entire smart library: the SRAM inside the GPU is like a notebook on your desk, the fastest but smallest in capacity. He also believes GPUs will eventually carry both HBM and HBF, forming a complementary architecture.
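Extending that library analogy into a sketch (the capacities and latencies below are order-of-magnitude placeholders, not vendor specifications), a tiered hierarchy keeps everything somewhere and promotes hot data toward the GPU on access:

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    capacity_gb: float   # illustrative placeholder, not a vendor spec
    latency_ns: float    # order-of-magnitude guess for the analogy only
    data: set = field(default_factory=set)

# Small-and-fast on top, huge-and-slower underneath, per the analogy.
hierarchy = [
    Tier("SRAM", 0.05, 1),        # the notebook on the desk
    Tier("HBM",  192,  100),      # the shelf within arm's reach
    Tier("HBF",  4096, 10_000),   # the library stacks
]

def lookup(key: str) -> str:
    """Return the fastest tier holding `key`, promoting it one level up."""
    for i, tier in enumerate(hierarchy):
        if key in tier.data:
            if i > 0:                        # hot data migrates toward the GPU
                hierarchy[i - 1].data.add(key)
            return tier.name
    raise KeyError(key)

hierarchy[2].data.add("kv_block_42")   # a cold KV block parked in HBF
print(lookup("kv_block_42"))           # "HBF" -- and now cached in HBM
print(lookup("kv_block_42"))           # "HBM" -- one level faster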

On HBF progress: SanDisk and SK Hynix are working together to advance HBF and push it as a global standard, targeting sample shipments in 2026 and mass production in 2027. Samsung has also recently joined the race, starting conceptual design and early development of HBF products.

With memory in short supply, Huawei turns to software to free up space

Because China has difficulty obtaining key resources such as HBM, Huawei has developed a new software tool, Unified Cache Manager (UCM), to accelerate large language model (LLM) training and inference without relying on HBM. Notably, this software is an inference-acceleration suite centered on the KV cache: it manages the KV cache data generated during inference in a tiered fashion, expands the inference context window, delivers a high-throughput, low-latency inference experience, and lowers the inference cost per token.

According to Huawei, the software distributes AI data across HBM, standard DRAM, and SSDs based on the latency characteristics of each memory type and the latency requirements of different AI workloads.
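A toy placement policy in that spirit might pick the fastest tier that still meets a request's latency target. The tier numbers and the function name below are assumptions made for this sketch; Huawei has not published UCM's actual interface.

```python
TIERS = {
    "HBM":  {"latency_us": 0.1,   "free_blocks": 8},
    "DRAM": {"latency_us": 1.0,   "free_blocks": 64},
    "SSD":  {"latency_us": 100.0, "free_blocks": 4096},
}

def place_kv_block(max_latency_us: float) -> str:
    """Pick the fastest tier that meets the latency target and has room."""
    for name in ("HBM", "DRAM", "SSD"):   # fastest first
        tier = TIERS[name]
        if tier["latency_us"] <= max_latency_us and tier["free_blocks"] > 0:
            tier["free_blocks"] -= 1
            return name
    raise MemoryError("no tier satisfies this latency requirement")

# An interactive chat turn wants the fastest memory; a rarely re-read
# long-context block can tolerate a slower, larger tier once HBM fills.
print(place_kv_block(0.5))     # -> "HBM"
print(place_kv_block(200.0))   # -> "HBM" while it has room, then spills
```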

UCM consists of three layers. At the top, the "Connector" flexibly interfaces with the industry's diverse inference engines and compute platforms, such as Huawei Ascend and NVIDIA. In the middle, the "Accelerator" handles dynamic multi-level KV cache management, splitting algorithms into forms suited to fast computation so processing runs more efficiently. At the bottom, the storage-collaboration "Adapter", an access interface that works with professional shared storage, reads and writes stored data more efficiently to cut waiting time.
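Schematically, the three layers compose as below. All class and method names here are invented for illustration only; this is not Huawei's code.

```python
class Adapter:
    """Bottom layer: efficient reads/writes against shared storage."""
    def write(self, kv_block: dict) -> None:
        pass  # in practice: batch and hand off to the storage backend

class Accelerator:
    """Middle layer: dynamic multi-level KV cache management."""
    def __init__(self, adapter: Adapter):
        self.adapter = adapter
    def cache(self, kv_block: dict) -> str:
        tier = "HBM" if kv_block["hot"] else "SSD"   # toy decision rule
        if tier == "SSD":
            self.adapter.write(kv_block)
        return tier

class Connector:
    """Top layer: adapts requests from different inference engines."""
    def __init__(self, accelerator: Accelerator):
        self.accelerator = accelerator
    def on_token(self, engine: str, kv_block: dict) -> str:
        return self.accelerator.cache(kv_block)

ucm = Connector(Accelerator(Adapter()))
print(ucm.on_token("Ascend", {"id": 1, "hot": True}))    # -> "HBM"
print(ucm.on_token("NVIDIA", {"id": 2, "hot": False}))   # -> "SSD"
```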

Enfabrica, a chip startup backed by NVIDIA, also starts from software, using an innovative architecture to reduce memory costs. Its proprietary software moves data between AI chips and large pools of low-cost memory, keeping costs under control while preserving data-center performance.

To break the memory bottleneck, NVIDIA has not ruled out expanding further into hardware, possibly by investing in memory companies. With SanDisk in particular investing heavily in HBF R&D and positioning, the HBM father's speculation is not unreasonable.

Further reading:
"AI dominance shifts to memory!" The father of HBM boldly speculates: NVIDIA may buy memory companies, SanDisk and Micron are on the list
Break through the HBM capacity problem! Huawei's UCM technology and an NVIDIA-backed startup seek new solutions in the "KV cache"