TorchTPU: PyTorch Now Runs Natively on Google’s TPUs

At a glance: Google presents TorchTPU, a native integration between PyTorch and TPUs. It is intended to let developers migrate their models to Google’s custom chips with minimal code changes. The focus is on ease of use, portability, and maximum performance in distributed AI systems at hyperscale.

Google has introduced TorchTPU, a new technology that runs PyTorch directly on Tensor Processing Units. It lets developers port existing machine learning models to Google’s specialized hardware with minimal code changes while still using the hardware’s full performance.

The requirements for modern AI infrastructure have fundamentally changed. The machine learning frontier requires distributed systems that can scale across thousands of accelerators. For models running on clusters with around 100,000 chips, the supporting software systems must meet the highest standards for performance, hardware portability, and reliability.

At Google, Tensor Processing Units (TPUs) form the foundation of the supercomputing infrastructure. These custom ASICs train and run inference for Google’s own AI platforms such as Gemini and Veo – as well as for large-scale workloads from cloud customers.

Since many developers create their models in PyTorch, a native and high-performance integration between PyTorch and TPUs was urgently needed. This is exactly where TorchTPU comes in.

Google’s development team set out to build a technology stack that prioritizes ease of use, portability, and exceptional performance. Developers should be able to migrate existing PyTorch workloads with minimal code changes while gaining access to APIs and tools that fully exploit the hardware’s computing power. Google also offers a behind-the-scenes look at TorchTPU’s engineering principles, its architecture, and the roadmap for 2026.
