The Bottom Line: OpenAI has released GPT-Realtime-2 with a 15.2 percent performance improvement. New features for developers include configurable preambles, parallel tool calls, better error handling, a context length expanded to 128K tokens, and finer control over voice nuance.
OpenAI has released three new models with significantly improved speech processing. GPT-Realtime-2 shows a 15.2 percent performance improvement on the Big Bench Audio test, driven by substantially better speech comprehension.
The realtime model released three months ago offered only a five percent improvement and was based on 4o’s intelligence, so today’s Realtime-2 release marks a significant leap forward. OpenAI is providing three new models: one for speech input, one for speech output, and one for speech-to-speech conversion.
The focus is less on voice quality than on practical applicability. Developers can now configure custom preamble phrases such as “Let me check that” or “One moment, let me look that up.” The model supports parallel tool calls and can verbalize what it is doing, with phrases such as “Checking your calendar” or “I’m looking into that right now.”
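To make the idea of a preamble combined with parallel tool calls more concrete, here is a minimal sketch in plain Python. It does not use OpenAI's SDK or reflect the actual Realtime API; the tool functions (`check_calendar`, `check_weather`), the `handle_turn` helper, and the `speak` callback are all hypothetical names chosen for illustration. It only shows the general pattern: say a short filler phrase first, then run several tool calls concurrently instead of one after another.

```python
import asyncio

# Preamble phrase taken from the examples quoted in the article.
PREAMBLE = "One moment, let me look that up."


async def check_calendar(day: str, log: list[str]) -> str:
    """Hypothetical tool: stands in for a calendar lookup."""
    log.append("start:calendar")
    await asyncio.sleep(0.05)  # simulated network latency
    log.append("end:calendar")
    return f"calendar({day}): 2 meetings"


async def check_weather(city: str, log: list[str]) -> str:
    """Hypothetical tool: stands in for a weather lookup."""
    log.append("start:weather")
    await asyncio.sleep(0.05)  # simulated network latency
    log.append("end:weather")
    return f"weather({city}): sunny"


async def handle_turn(speak, log: list[str]) -> list[str]:
    """Speak the preamble, then run both tools concurrently."""
    speak(PREAMBLE)
    # asyncio.gather starts both coroutines before waiting on either,
    # and returns the results in the order the calls were passed in.
    results = await asyncio.gather(
        check_calendar("Monday", log),
        check_weather("Berlin", log),
    )
    return list(results)


def run_demo():
    spoken: list[str] = []
    log: list[str] = []
    results = asyncio.run(handle_turn(spoken.append, log))
    return spoken, log, results


if __name__ == "__main__":
    spoken, log, results = run_demo()
    print(spoken)
    print(log)
    print(results)
```

In this sketch the user hears the preamble immediately, while both lookups are in flight at the same time, so the total wait is roughly that of the slowest tool rather than the sum of both.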
Error handling has been improved: the model responds more naturally to problems with phrases like “I’m having difficulty with that at the moment.” The context window has been expanded from 32,000 to 128,000 tokens, and the model better preserves technical terms, proper names, and medical concepts. Developers can precisely control tone and speaking style and choose from five levels of reasoning effort. The demo video also shows improved interruption detection.