In a nutshell: Gemini 3.5 Flash can now capture screen content and independently execute computer-controlled workflows, opening up new integration possibilities for enterprise applications.
Google has extended Gemini 3.5 Flash with computer use functions, enabling the model to independently analyze screen recordings and execute software-driven tasks.
Google has extended its language model Gemini 3.5 Flash with a computer use capability. It lets the model interpret screen recordings, understand user interfaces, and independently execute actions on a computer, without requiring dedicated APIs or additional integrations for the software being operated.
The computer use capability addresses a practical challenge for CTOs and developers: legacy systems and proprietary applications that lack modern APIs can still be integrated into AI-driven workflows. By interpreting screen content, the model can enter data, fill out forms, or orchestrate business processes.
In practice, development teams can build AI agents that operate existing enterprise software directly instead of developing custom interfaces for it. Integration runs through Google’s API ecosystem and essentially consists of passing screenshots to the model, which then suggests or executes appropriate actions.
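The screenshot-in, action-out cycle described above can be sketched as a simple loop. This is a minimal illustration and not Google's actual API: the `Action` schema, the `run_agent` function, and the three callables (`capture_screenshot`, `propose_action`, `execute_action`) are hypothetical names chosen for this example. In a real integration, `propose_action` would wrap the model call and `execute_action` would drive the mouse and keyboard.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One UI step proposed by the model (hypothetical schema)."""
    kind: str          # assumed values: "click", "type", or "done"
    x: int = 0         # screen coordinates for a click
    y: int = 0
    text: str = ""     # text to enter for a "type" action


def run_agent(
    goal: str,
    capture_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute_action: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: take a screenshot, ask the model for an action, execute it.

    Stops when the model returns a "done" action or after max_steps,
    and returns the list of actions that were executed.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = propose_action(goal, screenshot, history)
        if action.kind == "done":
            break
        execute_action(action)
        history.append(action)
    return history


if __name__ == "__main__":
    # Stand-ins for the real components: a scripted "model" and a no-op executor.
    scripted = [Action("click", x=120, y=80), Action("type", text="INV-001"), Action("done")]
    steps = run_agent(
        goal="Enter the invoice number",
        capture_screenshot=lambda: b"",                  # placeholder image bytes
        propose_action=lambda g, s, h: scripted[len(h)],  # next scripted step
        execute_action=lambda a: print("executing", a),
    )
    print(len(steps), "actions executed")
```

Passing the components in as callables keeps the loop independent of any particular SDK, and the `max_steps` cap is a simple guard against an agent that never signals completion. Whether the model only suggests actions or also executes them then comes down to what `execute_action` does, for example asking a human to confirm each step first.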
Source: deepmind.google · Published June 24, 2026
Lumi AI News — AI-assisted curation in accordance with Article 50 EU AI Act. Paraphrasing and classification by Lumi News Pipeline v1.7.1.