Scale-up fabrics

Interconnect Catalog

A comparison of node-local and rack-scale interconnects that link CPUs, GPUs, accelerators, memory expanders, and switches.

How to read this table

Bandwidth can mean unidirectional, bidirectional, per-link, per-device, or aggregate fabric bandwidth. This table keeps the wording explicit.
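As a worked example of why the convention matters, the PCIe Gen5 x16 entry in the table (32 GT/s per lane, about 128 GB/s bidirectional) can be derived from the per-lane transfer rate. This is a minimal sketch, not a vendor formula: the function name `link_bandwidth_gb_s` and its parameters are hypothetical, and it assumes one bit per transfer per lane and the 128b/130b line encoding used by PCIe Gen3 through Gen5. Protocol overhead above the encoding layer (packet headers, flow control) is ignored.

```python
def link_bandwidth_gb_s(gt_per_s, lanes, encoding_efficiency=128 / 130):
    """Estimate link bandwidth in GB/s under three different conventions.

    gt_per_s            -- per-lane transfer rate in GT/s (assumed 1 bit per transfer)
    lanes               -- link width, e.g. 16 for an x16 slot
    encoding_efficiency -- payload fraction of the line code (assumed 128b/130b)
    """
    # Raw rate in one direction: transfers/s * lanes, converted from bits to bytes.
    raw_unidirectional = gt_per_s * lanes / 8.0
    # Same direction after subtracting line-encoding overhead.
    effective_unidirectional = raw_unidirectional * encoding_efficiency
    # Both directions summed; this is the number marketing material often quotes.
    raw_bidirectional = 2.0 * raw_unidirectional
    return {
        "raw_unidirectional": raw_unidirectional,
        "effective_unidirectional": effective_unidirectional,
        "raw_bidirectional": raw_bidirectional,
    }


if __name__ == "__main__":
    # PCIe Gen5 x16, using the per-lane rate quoted in the table.
    for name, value in link_bandwidth_gb_s(32, 16).items():
        print(f"{name}: {value:.1f} GB/s")
```

The same link is therefore "64 GB/s" or "128 GB/s" depending on whether the figure is unidirectional or bidirectional, which is why each bandwidth entry states its convention.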

Topology and software stack matter as much as link rate for collective communication and model-parallel training.
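One rough way to see this is the textbook latency-bandwidth cost model for a ring all-reduce, in which each of N devices performs 2(N-1) steps and sends 1/N of the message per step. The sketch below is an illustration under simplifying assumptions, not a model of any specific fabric or library: the function name `ring_allreduce_seconds` and all input values are hypothetical, every link is assumed to have the same unidirectional bandwidth, and compute time for the reduction is ignored.

```python
def ring_allreduce_seconds(message_bytes, num_devices, link_bytes_per_s, step_latency_s):
    """Estimate ring all-reduce time with a simple latency-bandwidth model.

    message_bytes    -- size of the buffer being reduced on each device
    num_devices      -- number of devices in the ring
    link_bytes_per_s -- unidirectional bandwidth of each ring link (assumed uniform)
    step_latency_s   -- fixed per-step cost (link latency plus software overhead)
    """
    if num_devices < 2:
        return 0.0  # nothing to exchange with a single device
    # Reduce-scatter takes N-1 steps, all-gather takes another N-1 steps.
    steps = 2 * (num_devices - 1)
    # Each step moves one chunk of size message/N to the neighbouring device.
    bytes_per_step = message_bytes / num_devices
    return steps * (step_latency_s + bytes_per_step / link_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical inputs: 8 devices, 100 GB/s per direction, 5 microseconds per step.
    for size in (8e3, 8e9):
        t = ring_allreduce_seconds(size, 8, 100e9, 5e-6)
        print(f"{size:.0e} bytes: {t * 1e6:.1f} microseconds")
```

In this model small messages are dominated by the per-step latency term, which depends on topology and software overhead, while only large messages approach the limit set by link rate.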

| Metric | NVIDIA NVLink 5 / NVSwitch | AMD Infinity Fabric | PCIe Gen5 | CXL 3.x | UALink |
| --- | --- | --- | --- | --- | --- |
| Scope | GPU-to-GPU and rack-scale accelerator fabric | GPU package, CPU socket, and accelerator baseboard fabric | Host I/O bus | Coherent host-device and memory expansion fabric | Open accelerator scale-up fabric |
| Topology | Direct GPU links plus switched NVLink domains | Product-specific mesh, die-to-die, and board-level links | Root complex to endpoints and switches | PCIe physical layer with switching and fabric capabilities in newer generations | Accelerator-to-accelerator scale-up network |
| Coherence | GPU memory fabric semantics; coherent CPU-GPU via NVLink-C2C in Grace Hopper class systems | Coherent within AMD CPU complexes; accelerator semantics vary by platform | Non-coherent by default; CXL layers add coherent protocols where supported | CXL.cache and CXL.mem provide coherent semantics | Intended for AI accelerator memory and collective communication semantics |
| Example bandwidth | Blackwell-class GPUs quote up to 1.8 TB/s per GPU | MI300X platform lists 896 GB/s bidirectional Infinity Fabric bandwidth per GPU | 32 GT/s per lane; about 128 GB/s bidirectional for x16 | Follows PCIe generation and lane width | Version and implementation dependent |
| Common usage | DGX B200, GB200/GB300 NVL systems, large model training and inference | EPYC chiplet fabrics, Instinct accelerators, multi-GPU platforms | GPUs, NICs, SSDs, CXL devices, accelerator attachment | Memory expansion, memory pooling, coherent accelerators | Emerging alternative for multi-vendor accelerator scale-up systems |
| Notes | Best thought of as a scale-up accelerator fabric rather than a general-purpose I/O bus. | The same brand covers several related fabrics, so always cite the specific product context. | PCIe is universal and flexible, but lower bandwidth and higher software overhead than dedicated scale-up fabrics. | CXL changes the memory hierarchy more than raw link bandwidth; latency and coherency model are the interesting bits. | Worth tracking because open scale-up fabrics may matter for non-NVIDIA AI systems. |
Sources