Trainium
Summary of the AWS NeuronCore-v2 cloud accelerator chip for training,
inference, and roofline-style performance analysis.
Vendor AWS
Architecture NeuronCore-v2
Unit Cloud accelerator chip
Form factor Trn1 instance chip
Launch 2022-11-28
Memory 32 GiB HBM
HBM bandwidth 0.82 TB/s
BF16 peak 190 TFLOPS
FP16 peak 190 TFLOPS
FP8 dense peak n/a
FP8 sparse peak n/a
FP4 dense peak n/a
FP4 sparse peak n/a
FP64 peak n/a
INT8 peak 380 TOPS
Interconnect NeuronLink-v2 - 384 GB/s inter-chip
Power Not published per chip
Software stack AWS Neuron SDK
Notes
- Trainium is available to users through EC2 Trn1 instances, normally 16 chips per full-size instance.
- AWS quotes FP16/BF16/cFP8/TF32 together for Trainium1 peak throughput.
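The BF16 peak and HBM bandwidth figures above are enough for a first-order roofline check. A minimal sketch (the function name and the 0.82 TB/s and 190 TFLOPS constants are taken from this table; everything else is illustrative):

```python
# Roofline-style sketch for Trainium1 using the figures from the table above.
PEAK_BF16_TFLOPS = 190.0  # BF16 peak throughput, TFLOPS
HBM_BW_TBPS = 0.82        # HBM bandwidth, TB/s

def attainable_tflops(arith_intensity: float) -> float:
    """Attainable throughput = min(peak compute, bandwidth x intensity).

    arith_intensity is in FLOP per byte moved from HBM.
    """
    return min(PEAK_BF16_TFLOPS, HBM_BW_TBPS * arith_intensity)

# Ridge point: arithmetic intensity at which a kernel
# stops being bandwidth-bound and becomes compute-bound.
ridge = PEAK_BF16_TFLOPS / HBM_BW_TBPS  # about 232 FLOP/byte

print(f"ridge point: {ridge:.0f} FLOP/byte")
print(f"at 100 FLOP/byte: {attainable_tflops(100):.1f} TFLOPS")  # bandwidth-bound
print(f"at 500 FLOP/byte: {attainable_tflops(500):.1f} TFLOPS")  # compute-bound
```

A kernel needs roughly 232 FLOP per byte of HBM traffic to saturate the BF16 units; below that, throughput is capped by the 0.82 TB/s memory system.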