Trainium2

Summary of the AWS Trainium2 cloud accelerator chip (NeuronCore-v3 architecture) for training, inference, and roofline-style performance analysis.

Vendor: AWS
Architecture: NeuronCore-v3
Unit: Cloud accelerator chip
Form factor: Trn2 instance chip
Launch: 2024-12-03
Memory: 96 GiB HBM
HBM bandwidth: 2.9 TB/s
BF16 peak: 667 TFLOPS
FP16 peak: 667 TFLOPS
FP8 dense peak: 1.3 PFLOPS
FP8 sparse peak: 2.56 PFLOPS
FP4 dense peak: n/a
FP4 sparse peak: n/a
FP64 peak: n/a
INT8 peak: n/a
Interconnect: NeuronLink-v3, 1.28 TB/s per chip
Power: Not published per chip
Software stack: AWS Neuron SDK
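
For roofline-style analysis, the figures above fix the chip's ridge points: the arithmetic intensity (FLOP/byte) at which a kernel moves from memory-bound to compute-bound. A minimal sketch, using only the peak and bandwidth values from this page (the helper name is illustrative, not an AWS API):

```python
# Ridge points for Trainium2, computed from the spec values above.
HBM_BW_BYTES_PER_S = 2.9e12        # 2.9 TB/s HBM bandwidth

PEAKS_FLOPS = {
    "BF16":       667e12,          # 667 TFLOPS
    "FP16":       667e12,          # 667 TFLOPS
    "FP8 dense":  1.3e15,          # 1.3 PFLOPS
    "FP8 sparse": 2.56e15,         # 2.56 PFLOPS
}

def ridge_point(peak_flops, bw=HBM_BW_BYTES_PER_S):
    """Arithmetic intensity above which a kernel is compute-bound."""
    return peak_flops / bw

for fmt, peak in PEAKS_FLOPS.items():
    # e.g. BF16 lands at roughly 230 FLOP/byte
    print(f"{fmt}: ridge at {ridge_point(peak):.0f} FLOP/byte")
```

Kernels whose intensity falls below a ridge point are limited by the 2.9 TB/s HBM bandwidth rather than by compute; higher-throughput formats (FP8, sparse modes) push the ridge further right.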

Notes

  • A trn2.48xlarge instance contains 16 Trainium2 chips with 1.5 TB of total accelerator memory (16 × 96 GiB).
  • AWS's sparse-throughput figure applies across the FP8/FP16/BF16/TF32 sparse modes, not only FP8.
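
The per-instance note follows directly from the per-chip numbers. A quick aggregation check, assuming simple multiplication by the 16-chip count (derived here, not an AWS-published table):

```python
# Aggregate trn2.48xlarge figures from the per-chip specs on this page.
CHIPS = 16
HBM_GIB_PER_CHIP = 96        # 96 GiB HBM per chip
HBM_TBPS_PER_CHIP = 2.9      # 2.9 TB/s per chip
BF16_TFLOPS_PER_CHIP = 667   # 667 TFLOPS per chip

total_mem_gib = CHIPS * HBM_GIB_PER_CHIP          # 1536 GiB, i.e. the 1.5 TB in the note
total_hbm_tbps = CHIPS * HBM_TBPS_PER_CHIP        # aggregate HBM bandwidth
total_bf16_pflops = CHIPS * BF16_TFLOPS_PER_CHIP / 1000

print(f"{total_mem_gib} GiB, {total_hbm_tbps:.1f} TB/s, {total_bf16_pflops:.1f} BF16 PFLOPS")
```

Note that NeuronLink bandwidth (1.28 TB/s per chip) bounds cross-chip collectives, so instance-level peaks assume workloads that keep traffic on-chip.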

Sources