Thinking Machines’ TML-Interaction-Small: Setting New Standards in Real-Time Speech Processing

In Brief: Thinking Machines presents TML-Interaction-Small with 276B parameters for natural real-time speech interaction. The encoder-free model uses 200ms microturns and demonstrates outstanding cache efficiency. Skepticism grows around TurboQuant while open-source models continue to gain performance rapidly and outpace Moore’s Law.

Thinking Machines has made significant progress in real-time speech interaction with its TML-Interaction-Small model. With 276 billion parameters, of which 12 billion are active, the model is presented as the first practical realization of what experts have described as necessary for natural human-AI collaboration.

Thinking Machines published the work under the title “Interaction Models: A Scalable Approach to Human-AI Collaboration” as its approach to real-time speech processing. TML-Interaction-Small is built on a mixture-of-experts architecture, so only 12 billion of its 276 billion total parameters are active at a time.

The system uses an encoder-free early-fusion approach that processes both images and audio with 30x processing. The stability of the method stands out: cache hit rates of 80–96 percent and a processing time per task more than 353 times longer are cited as evidence of the approach's efficiency. The model works in “Time-Aligned Microturns” of 200 milliseconds each, which allows near-continuous interaction.
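To give a feel for what a 200-millisecond microturn means in terms of raw audio, here is a minimal Python sketch that slices a sample buffer into fixed 200 ms windows. Only the 200 ms duration comes from the description above. The 16 kHz sample rate, the plain list of samples, and the function names are assumptions made for illustration, and this is not a description of how the model itself ingests audio.

```python
from typing import Iterator, Sequence

MICROTURN_MS = 200        # microturn length stated in the article
SAMPLE_RATE_HZ = 16_000   # assumed sample rate, not taken from the article


def samples_per_microturn(sample_rate_hz: int = SAMPLE_RATE_HZ,
                          microturn_ms: int = MICROTURN_MS) -> int:
    """Number of audio samples that fit into one microturn."""
    return sample_rate_hz * microturn_ms // 1000


def split_into_microturns(samples: Sequence[float],
                          sample_rate_hz: int = SAMPLE_RATE_HZ,
                          microturn_ms: int = MICROTURN_MS) -> Iterator[Sequence[float]]:
    """Yield consecutive windows of one microturn each.

    The last window may be shorter if the buffer length is not a
    multiple of the microturn size.
    """
    step = samples_per_microturn(sample_rate_hz, microturn_ms)
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


if __name__ == "__main__":
    audio = [0.0] * 17_600  # 1.1 seconds of silence at the assumed 16 kHz
    for index, chunk in enumerate(split_into_microturns(audio)):
        print(f"microturn {index}: {len(chunk)} samples")
```

At the assumed 16 kHz, each microturn holds 3,200 samples, so a system working at this granularity handles five such windows per second of audio.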

The publication revives the well-known “Her” demo concept from GPT-4o and goes beyond it with more detailed and realistic demonstrations that come closer to practical use. In parallel, other advances are emerging: OpenHands updated its software engineering benchmark, while Claw-Eval introduced a more comprehensive test set for agent-based tasks.

Skepticism is growing about the hyped TurboQuant method. Independent analyses suggest that the quantization and serving method is less capable than hoped. Meanwhile, open-source models are showing impressive progress: on a MacBook Pro of the same memory size, the best runnable open-source architecture has advanced from Llama-3-70B level to DeepSeek-V4-Flash level, a gain of approximately 473 percent in just 24 months. Taken at face value, this corresponds to a doubling roughly every nine to ten months, which still outpaces Moore’s Law.
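The growth figures can be checked with a short calculation. The sketch below assumes that a "gain of 473 percent" means the final level is 5.73 times the starting level and that growth is exponential over the 24 months. Both are interpretations, not statements from the underlying analysis, and the function names are illustrative.

```python
import math


def doubling_time_months(gain_percent: float, period_months: float) -> float:
    """Doubling time implied by a percentage gain over a period,
    assuming steady exponential growth."""
    factor = 1.0 + gain_percent / 100.0   # 473 % gain -> factor 5.73 (assumed reading)
    return period_months * math.log(2) / math.log(factor)


def growth_factor(period_months: float, doubling_months: float) -> float:
    """Total growth factor over a period for a given doubling time."""
    return 2 ** (period_months / doubling_months)


if __name__ == "__main__":
    implied = doubling_time_months(473, 24)
    print(f"implied doubling time: {implied:.1f} months")
    print(f"factor after 24 months at a 3-month doubling: {growth_factor(24, 3):.0f}x")
    print(f"factor after 24 months at a 24-month doubling: {growth_factor(24, 24):.0f}x")
```

Under these assumptions, the implied doubling time is about 9.5 months. For comparison, a doubling every three months would compound to a factor of 256 over the same 24 months, while a doubling every 24 months, a common reading of Moore’s Law, gives a factor of 2.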
