Trainium2

Summary of the AWS Trainium2 cloud accelerator chip (NeuronCore-v3 architecture) for training, inference, and roofline-style performance analysis.

Vendor: AWS
Architecture: NeuronCore-v3
Unit: Cloud accelerator chip
Form factor: Trn2 instance chip
Launch: 2024-12-03
Memory: 96 GiB HBM
HBM bandwidth: 2.9 TB/s
BF16 peak: 667 TFLOPS
FP16 peak: 667 TFLOPS
FP8 dense peak: 1.3 PFLOPS
FP8 sparse peak: 2.56 PFLOPS
FP4 dense peak: n/a
FP4 sparse peak: n/a
FP64 peak: n/a
INT8 peak: n/a
Interconnect: NeuronLink-v3, 1.28 TB/s per chip
Power: Not published per chip
Software stack: AWS Neuron SDK
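
For roofline-style analysis, the figures above fix the chip's ridge points: the arithmetic intensity (FLOP/byte) at which a kernel moves from memory-bound to compute-bound. A minimal sketch, using only the peak and bandwidth values from this page (the helper name is illustrative, not an AWS API):

```python
# Ridge points for Trainium2, computed from the spec values above.
HBM_BW_BYTES_PER_S = 2.9e12        # 2.9 TB/s HBM bandwidth

PEAKS_FLOPS = {
    "BF16":       667e12,          # 667 TFLOPS
    "FP16":       667e12,          # 667 TFLOPS
    "FP8 dense":  1.3e15,          # 1.3 PFLOPS
    "FP8 sparse": 2.56e15,         # 2.56 PFLOPS
}

def ridge_point(peak_flops, bw=HBM_BW_BYTES_PER_S):
    """Arithmetic intensity above which a kernel is compute-bound."""
    return peak_flops / bw

for fmt, peak in PEAKS_FLOPS.items():
    # e.g. BF16 lands at roughly 230 FLOP/byte
    print(f"{fmt}: ridge at {ridge_point(peak):.0f} FLOP/byte")
```

Kernels whose intensity falls below a ridge point are limited by the 2.9 TB/s HBM bandwidth rather than by compute; higher-throughput formats (FP8, sparse modes) push the ridge further right.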

Notes

  • A trn2.48xlarge instance contains 16 Trainium2 chips with 1.5 TB of total accelerator memory (16 × 96 GiB).
  • AWS's sparse-throughput figure applies across the FP8/FP16/BF16/TF32 sparse modes, not only FP8.
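
The per-instance note follows directly from the per-chip numbers. A quick aggregation check, assuming simple multiplication by the 16-chip count (derived here, not an AWS-published table):

```python
# Aggregate trn2.48xlarge figures from the per-chip specs on this page.
CHIPS = 16
HBM_GIB_PER_CHIP = 96        # 96 GiB HBM per chip
HBM_TBPS_PER_CHIP = 2.9      # 2.9 TB/s per chip
BF16_TFLOPS_PER_CHIP = 667   # 667 TFLOPS per chip

total_mem_gib = CHIPS * HBM_GIB_PER_CHIP          # 1536 GiB, i.e. the 1.5 TB in the note
total_hbm_tbps = CHIPS * HBM_TBPS_PER_CHIP        # aggregate HBM bandwidth
total_bf16_pflops = CHIPS * BF16_TFLOPS_PER_CHIP / 1000

print(f"{total_mem_gib} GiB, {total_hbm_tbps:.1f} TB/s, {total_bf16_pflops:.1f} BF16 PFLOPS")
```

Note that NeuronLink bandwidth (1.28 TB/s per chip) bounds cross-chip collectives, so instance-level peaks assume workloads that keep traffic on-chip.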

Sources