Cloud TPU v6e / Trillium

Summary of the Google TPU v6e TensorCore Cloud TPU chip for training, inference, and roofline-style performance analysis.

Vendor
Google
Architecture
TPU v6e TensorCore
Unit
Cloud TPU chip
Form factor
Cloud TPU slice chip
Launch
2024-12-11
Memory
32 GB HBM
HBM bandwidth
1.6 TB/s
BF16 peak
918 TFLOPS
FP16 peak
n/a
FP8 dense peak
n/a
FP8 sparse peak
n/a
FP4 dense peak
n/a
FP4 sparse peak
n/a
FP64 peak
n/a
INT8 peak
1.84 POPS
Interconnect
ICI 2D torus - 800 GB/s bidirectional per chip
Power
Not published per chip
Software stack
JAX, XLA, TensorFlow, PyTorch/XLA
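
Since the page is meant for roofline-style analysis, the two headline figures above (918 TFLOPS BF16 peak, 1.6 TB/s HBM bandwidth) determine the chip's ridge point. A minimal sketch; the helper name and example intensities are illustrative, not from any Google API:

```python
# Roofline sketch for TPU v6e using the spec-sheet figures above.
PEAK_BF16_FLOPS = 918e12  # BF16 peak, FLOP/s
HBM_BW = 1.6e12           # HBM bandwidth, bytes/s

def attainable_flops(arithmetic_intensity):
    """Roofline model: attainable FLOP/s is the lesser of the
    compute roof and the memory roof at a given FLOP/byte."""
    return min(PEAK_BF16_FLOPS, HBM_BW * arithmetic_intensity)

# Ridge point: the arithmetic intensity at which a kernel
# transitions from memory-bound to compute-bound.
ridge = PEAK_BF16_FLOPS / HBM_BW
print(f"ridge point: {ridge:.1f} FLOP/byte")  # ~573.8 FLOP/byte
```

Kernels below roughly 574 FLOP/byte are HBM-bandwidth-bound on this chip; dense matmuls at large batch sizes typically sit above it.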

Notes

  • Google positions v6e, the Trillium generation, for training, fine-tuning, and serving transformer, text-to-image, and CNN models.
  • A v6e pod has 256 chips and 234.9 PFLOPS of BF16 peak compute.