Accelerator

B200 SXM

NVIDIA Blackwell SXM GPU summary for training, inference, and roofline-style performance analysis.


Vendor
NVIDIA
Architecture
Blackwell
Unit
SXM GPU
Form factor
SXM
Launch
2024-03-18
Memory
180 GB HBM3E
HBM bandwidth
8 TB/s
BF16 peak
2.25 PFLOPS
FP16 peak
2.25 PFLOPS
FP8 dense peak
4.5 PFLOPS
FP8 sparse peak
9 PFLOPS
FP4 dense peak
9 PFLOPS
FP4 sparse peak
18 PFLOPS
FP64 peak
40 TFLOPS
INT8 peak
n/a
Interconnect
NVIDIA NVLink 5 / NVSwitch, 1.8 TB/s per GPU (derived from DGX B200 aggregate)
Power
Platform dependent; DGX B200 is ~14.3 kW max for 8 GPUs
Software stack
CUDA, TensorRT-LLM, NVIDIA AI Enterprise
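
The per-GPU peaks and HBM bandwidth above are all that a basic roofline model needs. A minimal sketch: the ridge point (peak FLOP/s divided by peak bandwidth) is the arithmetic intensity at which a kernel shifts from bandwidth-bound to compute-bound on this part.

```python
# Roofline ridge points for the B200 SXM figures listed above.
# Ridge point = peak FLOP/s / peak HBM bandwidth; kernels whose arithmetic
# intensity (FLOPs per byte of HBM traffic) falls below it are bandwidth-bound.

HBM_BW_BYTES_PER_S = 8e12  # 8 TB/s, from the table above

PEAKS_FLOPS = {
    "BF16":      2.25e15,  # 2.25 PFLOPS
    "FP8 dense": 4.5e15,   # 4.5 PFLOPS
    "FP4 dense": 9e15,     # 9 PFLOPS
}

def ridge_point(peak_flops: float, bw: float = HBM_BW_BYTES_PER_S) -> float:
    """Arithmetic intensity (FLOPs/byte) where compute and bandwidth balance."""
    return peak_flops / bw

for name, peak in PEAKS_FLOPS.items():
    print(f"{name}: ridge point = {ridge_point(peak):.2f} FLOPs/byte")
```

At BF16 the ridge point works out to about 281 FLOPs/byte, so memory-light workloads such as decode-phase LLM inference sit well into the bandwidth-bound region.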

Notes

  • Per-GPU memory, bandwidth, FP4, and FP8 entries are derived from NVIDIA DGX B200 8-GPU aggregate specifications.
  • NVIDIA Blackwell adds FP4 tensor core support and a two-die GPU package connected by a 10 TB/s chip-to-chip link.
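
The derivation described in the first note above is a straight division by the GPU count. A sketch, with the DGX B200 aggregate figures taken as assumptions from NVIDIA's published system specifications (verify against the current datasheet before relying on them):

```python
# Hedged sketch: per-GPU figures derived from DGX B200 8-GPU aggregates,
# as the note above describes. The aggregate values below are assumptions
# drawn from NVIDIA's DGX B200 system specifications.

GPUS_PER_NODE = 8

dgx_b200_aggregate = {
    "memory_gb":          1440,  # assumed total HBM3E across 8 GPUs
    "hbm_bw_tb_per_s":    64,    # assumed aggregate HBM bandwidth
    "fp8_sparse_pflops":  72,    # assumed aggregate FP8 (sparse)
    "fp4_sparse_pflops":  144,   # assumed aggregate FP4 (sparse)
}

per_gpu = {k: v / GPUS_PER_NODE for k, v in dgx_b200_aggregate.items()}
for key, value in per_gpu.items():
    print(f"{key}: {value:g}")
```

Dividing through reproduces the table above: 180 GB of memory, 8 TB/s of bandwidth, 9 PFLOPS FP8 sparse, and 18 PFLOPS FP4 sparse per GPU.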
