High Bandwidth Memory (HBM) is a specialized type of DRAM that stacks multiple memory chips vertically to deliver dramatically higher data throughput than conventional memory. It has become the critical ingredient in modern AI accelerators — without it, the GPUs and chips that train and run large language models would spend most of their time waiting for data instead of computing.
How HBM Works Differently from Regular Memory
Conventional DRAM — the kind used in laptops and servers — moves data through a relatively narrow interface: 64 bits wide per module. HBM takes a different approach. It stacks up to 16 DRAM dies on top of each other, connected by thousands of tiny vertical channels called through-silicon vias (TSVs), and uses an ultra-wide 1,024-bit interface to move data in parallel.
The result: HBM3E, the generation shipping in today’s flagship AI chips, delivers over 1.2 terabytes per second of bandwidth per memory stack. A typical DDR5 module manages roughly 50–100 GB/s, so a single HBM3E stack is roughly 12–24 times faster than a single module.
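The per-stack and per-module figures follow from a simple product: interface width times per-pin data rate, divided by 8 bits per byte. The sketch below applies it to one HBM3E stack and one DDR5 module. The pin rates (9.6 Gb/s for HBM3E, 6.4 Gb/s for a DDR5-6400 module) are assumptions for illustration and are not stated in the text above; DDR5-6400 sits at the low end of the 50–100 GB/s range, so the gap it shows is at the wide end.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, rate_gbit_s_per_pin: float) -> float:
    """Peak bandwidth in GB/s: each pin moves `rate_gbit_s_per_pin` gigabits per second."""
    return bus_width_bits * rate_gbit_s_per_pin / 8  # 8 bits per byte


# Assumed pin rates (illustrative, not from the article).
hbm3e_stack = peak_bandwidth_gb_s(1024, 9.6)  # one HBM3E stack, 1,024-bit interface
ddr5_module = peak_bandwidth_gb_s(64, 6.4)    # one DDR5-6400 module, 64-bit interface

print(f"HBM3E stack:      {hbm3e_stack:.1f} GB/s")
print(f"DDR5-6400 module: {ddr5_module:.1f} GB/s")
print(f"Ratio:            {hbm3e_stack / ddr5_module:.1f}x")
```

Both terms matter: HBM3E's per-pin rate is only modestly higher than DDR5's under these assumptions, so most of the gap comes from the 16-times-wider interface.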
The stacked dies sit directly beside the processor on the same package — an arrangement called 2.5D packaging — shortening the data path and cutting power consumption compared to memory banks mounted far away on a circuit board.
Why AI Chips Are Starved for Memory Bandwidth
Training and running large language models involves moving enormous amounts of data: model weights, activations, and during inference, a structure called a key-value (KV) cache that grows with the length of a conversation.
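To give a feel for how the KV cache grows with conversation length, here is a minimal sizing sketch. It assumes a standard transformer that stores one key vector and one value vector per layer, per KV head, per token. The model configuration (80 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical 70B-class setup chosen for illustration, not a figure from this article.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache for one sequence.

    The leading factor of 2 covers the key and the value stored
    per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len


# Hypothetical 70B-class configuration (assumed, not from the article).
config = dict(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_value=2)

for tokens in (1_000, 32_000, 128_000):
    gb = kv_cache_bytes(tokens, **config) / 1e9
    print(f"{tokens:>7} tokens -> {gb:6.2f} GB of KV cache")
```

The cache scales linearly with sequence length and with the number of concurrent sequences, which is why long contexts put pressure on both memory capacity and memory bandwidth.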
The problem is that GPU compute cores can crunch numbers far faster than data can arrive from memory. A GPU that could theoretically perform quadrillions of operations per second often runs at 30–50% utilization in practice — not because the arithmetic is slow, but because the cores sit idle waiting for the next batch of data.
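One common way to reason about this gap is a roofline-style estimate: achievable throughput is the lower of the compute peak and what memory can feed, where the second term is bandwidth multiplied by the operations performed per byte moved. The sketch below uses round illustrative numbers (1,000 TFLOP/s peak, 3.35 TB/s of memory bandwidth); they are assumptions, not measurements of any particular chip.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: capped by compute or by memory traffic, whichever is lower."""
    # TB/s * FLOP/byte = TFLOP/s, so the two limits share a unit.
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


# Illustrative round numbers (assumed, not a specific product).
PEAK_TFLOPS = 1000.0
BANDWIDTH_TB_S = 3.35

for intensity in (2, 50, 300, 1000):
    achieved = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_TB_S, intensity)
    share = achieved / PEAK_TFLOPS
    print(f"{intensity:>5} FLOP/byte -> {achieved:7.1f} TFLOP/s ({share:.0%} of peak)")
```

Under these assumptions a workload needs roughly 300 operations per byte before the compute cores, rather than memory, become the limit. Workloads below that threshold leave cores idle no matter how fast the arithmetic units are.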
This is the memory bandwidth wall. During the token-by-token decode phase of LLM inference, the entire set of model weights must be read from memory for each generated token. For a 70-billion-parameter model stored at 16-bit precision (2 bytes per parameter), that is roughly 140 GB of data per token. This traffic limits how quickly responses can flow, regardless of how fast the chip computes.
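The same arithmetic gives a rough ceiling on decode speed: divide memory bandwidth by the bytes read per token. The sketch below assumes 2 bytes per parameter (16-bit weights), a single sequence with no batching, and that weight reads are the only memory traffic. It ignores KV-cache reads and whether the weights fit in one device's memory, so treat the results as upper bounds rather than predictions.

```python
def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                            bandwidth_tb_s: float) -> float:
    """Upper bound on single-sequence decode rate when all weights are read once per token."""
    weights_gb = params_billion * bytes_per_param  # billions of params * bytes each = GB
    return bandwidth_tb_s * 1000.0 / weights_gb    # (GB/s) / (GB per token)


# 70B parameters at an assumed 2 bytes each, across three bandwidth levels.
for bandwidth_tb_s in (3.35, 4.8, 7.7):
    rate = max_decode_tokens_per_s(70, 2, bandwidth_tb_s)
    print(f"{bandwidth_tb_s:>5} TB/s -> at most {rate:.1f} tokens/s per sequence")
```

Batching several sequences together reuses each weight read across the batch, which is one reason serving systems batch requests aggressively.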
HBM directly attacks this bottleneck. By placing terabytes-per-second of bandwidth beside the chip, it keeps compute cores fed, enabling higher token throughput, longer context windows, and more efficient training runs.
HBM Generations: from HBM1 to HBM4
The standard has evolved rapidly since Samsung, AMD, and SK Hynix developed the first HBM specification in 2013:
| Generation | Bandwidth per stack | Notable use |
|---|---|---|
| HBM1 (2013) | ~128 GB/s | AMD Fiji GPU |
| HBM2 (2016) | ~256 GB/s | NVIDIA V100 |
| HBM2E (2019) | ~410–460 GB/s | NVIDIA A100 (80 GB) |
| HBM3 (2022) | ~819 GB/s | NVIDIA H100; AMD MI300X |
| HBM3E (2023) | ~1,229 GB/s | NVIDIA H200, B200 |
| HBM4 (2025) | ~2,048 GB/s | Next-generation AI platforms |
HBM4 doubles the interface width to 2,048 bits. SK Hynix completed HBM4 development in 2025, and it is expected to appear in next-generation AI accelerators from 2026 onward.
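The per-stack figures for five of the generations in the table can be reproduced from interface width and per-pin data rate. The pin rates below are assumptions, chosen as the commonly quoted peak rate for each generation; they do not appear in this article.

```python
def stack_bandwidth_gb_s(width_bits: int, rate_gbit_s_per_pin: float) -> float:
    """Peak per-stack bandwidth in GB/s."""
    return width_bits * rate_gbit_s_per_pin / 8


# generation: (interface width in bits, assumed per-pin rate in Gb/s)
GENERATIONS = {
    "HBM1": (1024, 1.0),
    "HBM2": (1024, 2.0),
    "HBM3": (1024, 6.4),
    "HBM3E": (1024, 9.6),
    "HBM4": (2048, 8.0),
}

for name, (width, rate) in GENERATIONS.items():
    bandwidth = stack_bandwidth_gb_s(width, rate)
    print(f"{name:<6} {width:>5} bits x {rate:>4} Gb/s/pin = {bandwidth:7.1f} GB/s")
```

Under these assumptions, HBM4's gain over HBM3E comes entirely from the wider interface: its per-pin rate is lower, but it drives twice as many pins.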
Who Makes HBM — and Which AI Chips Use It
Only three companies currently manufacture HBM at scale: SK Hynix, Samsung, and Micron. SK Hynix holds roughly half the global market and was first to mass-produce HBM3E; it has secured a large share of orders for HBM4 as well.
Every major AI training accelerator depends on HBM:
- NVIDIA H100: 80 GB of HBM3, 3.35 TB/s bandwidth
- NVIDIA H200: 141 GB of HBM3E, 4.8 TB/s bandwidth
- NVIDIA B200: 180 GB of HBM3E, 7.7 TB/s bandwidth
- AMD MI300X: 192 GB of HBM3, 5.3 TB/s bandwidth
Because only a handful of factories worldwide can produce HBM, and demand from AI data centers has outpaced supply, HBM has become a strategic resource — influencing chip availability, GPU pricing, and competition over semiconductor supply chains.
In the News
SK Hynix, the world’s largest HBM producer, announced plans for a $29 billion Nasdaq listing in June 2026 to fund expanded production — a sign of how central HBM has become to the AI hardware industry.
FAQ
Can AI models run without HBM?
Yes. Smaller models run on devices with conventional LPDDR or GDDR memory, and many consumer AI applications do exactly that. But large-scale training and inference for frontier models (with hundreds of billions of parameters) require the bandwidth that only HBM provides.
Why don’t all computers use HBM?
HBM is expensive to manufacture and requires a specialized packaging process that places memory and processor on the same substrate. For general-purpose computing, the cost-per-gigabyte advantage of DDR memory outweighs the bandwidth benefit.
What is HBM4 and when will it ship?
HBM4 doubles the interface width to 2,048 bits and offers around 2 TB/s per stack, about 1.7 times the roughly 1.2 TB/s of HBM3E. SK Hynix completed development in 2025; it is expected in next-generation AI accelerators from 2026 onward.
How much does HBM cost?
Pricing is negotiated in bulk contracts and fluctuates with demand. As of mid-2026, HBM3E stacks are estimated at roughly $8–10 per gigabyte — making memory one of the largest cost components in a high-end AI accelerator. See SK Hynix’s product page for current product details.
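As a rough illustration of what that per-gigabyte range implies, the sketch below multiplies it by the HBM3E capacities quoted earlier for the H200 and B200. It assumes the $8–10 estimate applies uniformly to every gigabyte on the package, which real contract pricing may not.

```python
def memory_cost_range_usd(capacity_gb: float, low_per_gb: float = 8.0,
                          high_per_gb: float = 10.0) -> tuple[float, float]:
    """Estimated (low, high) HBM cost in USD for a given capacity."""
    return capacity_gb * low_per_gb, capacity_gb * high_per_gb


# HBM3E capacities as quoted earlier in the article.
for name, capacity_gb in {"NVIDIA H200": 141, "NVIDIA B200": 180}.items():
    low, high = memory_cost_range_usd(capacity_gb)
    print(f"{name}: {capacity_gb} GB -> roughly ${low:,.0f} to ${high:,.0f} of HBM")
```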
Sources: High Bandwidth Memory — Wikipedia; SK Hynix product documentation; NVIDIA H100/H200/B200 datasheets; AMD MI300X specifications.