Scale-up fabrics
Interconnect Catalog
Comparison of node-local and rack-scale interconnects used to connect CPUs, GPUs, accelerators, memory expanders, and switches.
How to read this table
Bandwidth can mean unidirectional, bidirectional, per-link, per-device, or aggregate fabric bandwidth. This table keeps the wording explicit.
Topology and software stack matter as much as link rate for collective communication and model-parallel training.
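
To make the point about topology concrete, here is a minimal sketch of the textbook ring all-reduce cost model. It is an illustrative assumption, not a description of any vendor's collective library. The function name `ring_allreduce_seconds`, its parameters, and the example link rates are hypothetical. The model ignores software overhead, congestion, and switch behavior.

```python
def ring_allreduce_seconds(message_bytes: float,
                           num_devices: int,
                           per_direction_gb_s: float,
                           hop_latency_s: float = 0.0) -> float:
    """Estimate ring all-reduce time with the standard bandwidth + latency model.

    A ring all-reduce runs 2 * (N - 1) steps (reduce-scatter, then all-gather).
    In each step every device sends one chunk of size message_bytes / N to its
    neighbor, so only the per-direction link rate matters, not the
    bidirectional figure.
    """
    if num_devices < 2:
        return 0.0  # nothing to exchange with a single device
    steps = 2 * (num_devices - 1)
    chunk_bytes = message_bytes / num_devices
    seconds_per_step = chunk_bytes / (per_direction_gb_s * 1e9) + hop_latency_s
    return steps * seconds_per_step


# Hypothetical comparison: same 1 GB message and 8 devices, two link rates
# and two per-hop latencies (all values are illustrative assumptions).
for label, gb_s, latency in [
    ("fast link, low latency", 450.0, 1e-6),
    ("slow link, higher latency", 64.0, 5e-6),
]:
    t = ring_allreduce_seconds(1e9, 8, gb_s, latency)
    print(f"{label}: {t * 1e3:.3f} ms")
```

The step count grows with the number of devices, so per-hop latency and the shape of the topology contribute to the total alongside the raw link rate.
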
| Metric | NVIDIA NVLink 5 / NVSwitch | AMD Infinity Fabric | PCIe Gen5 | CXL 3.x | UALink |
|---|---|---|---|---|---|
| Scope | GPU-to-GPU and rack-scale accelerator fabric | GPU package, CPU socket, and accelerator baseboard fabric | Host I/O bus | Coherent host-device and memory expansion fabric | Open accelerator scale-up fabric |
| Topology | Direct GPU links plus switched NVLink domains | Product-specific mesh, die-to-die, and board-level links | Root complex to endpoints and switches | PCIe physical layer; switching added in CXL 2.0, multi-level switching and fabric capabilities in CXL 3.x | Accelerator-to-accelerator scale-up network |
| Coherence | GPU memory fabric semantics; coherent CPU-GPU via NVLink-C2C in Grace Hopper-class systems | Coherent within AMD CPU complexes; accelerator semantics vary by platform | Non-coherent by default; CXL layers add coherent protocols where supported | CXL.cache and CXL.mem provide coherent semantics | Intended for AI accelerator memory and collective communication semantics |
| Example bandwidth | Blackwell-class GPUs quote up to 1.8 TB/s bidirectional NVLink bandwidth per GPU | MI300X platform lists 896 GB/s bidirectional Infinity Fabric bandwidth per GPU | 32 GT/s per lane; about 64 GB/s per direction (about 128 GB/s bidirectional) for x16 | Follows the underlying PCIe generation and lane width; CXL 3.x builds on the PCIe 6.0 physical layer at 64 GT/s per lane | Version and implementation dependent |
| Common usage | DGX B200, GB200/GB300 NVL systems, large model training and inference | EPYC chiplet fabrics, Instinct accelerators, multi-GPU platforms | GPUs, NICs, SSDs, CXL devices, accelerator attachment | Memory expansion, memory pooling, coherent accelerators | Emerging alternative for multi-vendor accelerator scale-up systems |
| Notes | Best thought of as a scale-up accelerator fabric rather than a general-purpose I/O bus. | The same brand covers several related fabrics, so always cite the specific product context. | PCIe is universal and flexible, but it offers lower bandwidth and typically higher software overhead than dedicated scale-up fabrics. | CXL changes the memory hierarchy more than raw link bandwidth; latency and the coherency model are what matter most. | Worth tracking because open scale-up fabrics may matter for non-NVIDIA AI systems. |
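
The PCIe figure in the table can be checked with a short calculation. This is a rough sketch: the helper name `pcie_bandwidth_gb_s` is hypothetical, and the result is raw link bandwidth after 128b/130b line encoding, before packet and protocol overhead. The last two lines assume symmetric links when halving the quoted bidirectional figures.

```python
def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int,
                        encoding_efficiency: float = 128 / 130) -> dict:
    """Approximate raw PCIe link bandwidth in GB/s.

    gt_per_s is the per-lane transfer rate (32 for Gen5). The default
    efficiency is the 128b/130b encoding used by PCIe Gen3 through Gen5.
    """
    gbit_per_direction = gt_per_s * lanes * encoding_efficiency
    per_direction = gbit_per_direction / 8  # bits to bytes
    return {
        "per_direction": per_direction,
        "bidirectional": 2 * per_direction,
    }


gen5_x16 = pcie_bandwidth_gb_s(32, 16)
print(f"PCIe Gen5 x16: {gen5_x16['per_direction']:.1f} GB/s per direction, "
      f"{gen5_x16['bidirectional']:.1f} GB/s bidirectional")

# Quoted bidirectional figures from the table, halved under the assumption
# of symmetric links, to compare everything on a per-direction basis.
quoted_bidirectional_gb_s = {"NVLink 5 per GPU": 1800.0,
                             "MI300X Infinity Fabric per GPU": 896.0}
per_direction_gb_s = {name: value / 2
                      for name, value in quoted_bidirectional_gb_s.items()}
print(per_direction_gb_s)
```

The "about 128 GB/s" in the table is the rounded figure that ignores encoding overhead (32 GT/s × 16 lanes ÷ 8 bits, doubled); with 128b/130b encoding the result is slightly lower.
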