At a glance: Gemini 3.1 Flash TTS extends Google DeepMind’s speech synthesis with audio tags for granular control over voice characteristics in over 70 languages.

Google has released Gemini 3.1 Flash TTS, a text-to-speech model with detailed audio tags for more precise control over speech style and speaking rate. The system supports more than 70 languages.

Gemini 3.1 Flash TTS is Google’s new audio model for refining AI-generated voices. Its core feature is audio tags, which let users precisely adjust voice style and speaking rate without retraining the underlying language model.

Improved speech quality makes the synthesized output more expressive than that of previous generations. With support for more than 70 languages, the model targets practitioners who need to integrate speech output into global applications or produce localized voice content.

For developers and product teams, this means that voice applications such as voicebots, audiobook production, or accessible content can sound more varied and natural without having to run several separate models in parallel.

Source: ainews-dev.lumi-systems.io · Published 17 May 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.5.2.

Gemini 3.1 Flash TTS: Google Introduces Text-to-Speech Model with Audio Tags
