The point: Video generation models can produce visually convincing movements, but visual quality is not a reliable predictor of whether a robot can actually execute them. Standard metrics overlook this criterion.
Researchers have introduced Dream.exe, an evaluation framework that measures whether video generation models understand physical laws well enough for their generated videos to yield executable robot actions. The system converts AI-synthesized manipulation videos into executable robot trajectories and tests them in a physics simulation.
The pipeline has three stages: given a scene and a task description, the model under test generates a manipulation video; the framework extracts robot trajectories from that video; and these trajectories are executed in a physics simulator. The result is a measurable success signal that purely visual metrics cannot provide.
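The three stages can be sketched as a simple evaluation loop. This is a minimal Python sketch under stated assumptions, not the Dream.exe implementation: the names `Task`, `generate_video`, `extract_trajectory`, `simulate` and `evaluate` are hypothetical, each stage is passed in as a plain callable, and a task counts as a success only if the simulator reports it as one. The article does not say how Dream.exe handles a video from which no trajectory can be extracted; here that case is assumed to count as a failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical data types: the article does not describe Dream.exe's formats.
Video = Sequence[object]                 # e.g. a list of frames
Trajectory = Sequence[Sequence[float]]   # e.g. a list of end-effector poses


@dataclass
class Task:
    scene: str        # identifier of the simulated scene
    instruction: str  # natural-language task description
    level: int        # complexity level (the article mentions three)


def evaluate(
    tasks: Sequence[Task],
    generate_video: Callable[[str, str], Video],
    extract_trajectory: Callable[[Video], Optional[Trajectory]],
    simulate: Callable[[str, Trajectory], bool],
) -> float:
    """Return the fraction of tasks whose extracted trajectory succeeds in simulation."""
    if not tasks:
        raise ValueError("need at least one task")
    successes = 0
    for task in tasks:
        # Stage 1: the video model generates the manipulation video.
        video = generate_video(task.scene, task.instruction)
        # Stage 2: recover a robot trajectory from the video (assumed step).
        trajectory = extract_trajectory(video)
        if trajectory is None:
            continue  # assumption: failed extraction counts as a failure
        # Stage 3: execute the trajectory in a physics simulator.
        if simulate(task.scene, trajectory):
            successes += 1
    return successes / len(tasks)


if __name__ == "__main__":
    # Toy stubs standing in for a video model, an extractor and a simulator.
    demo_tasks = [
        Task("scene_a", "put the cup on the plate", level=1),
        Task("scene_b", "open the drawer", level=2),
    ]
    rate = evaluate(
        demo_tasks,
        generate_video=lambda scene, instruction: ["frame"] * 8,
        extract_trajectory=lambda video: [[0.0, 0.0, 0.0] for _ in video],
        simulate=lambda scene, trajectory: scene == "scene_a",
    )
    print(rate)  # → 0.5
```

Passing the stages as callables keeps the loop independent of any particular video model or simulator; in a real setup each stub would wrap a generator, a video-to-trajectory method and a physics engine.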
The evaluation covered eight models from three categories: frontier closed-source generators, open-source models, and specialized robotics models. They were tested on 101 manually curated manipulation tasks spanning three complexity levels and scored on visual quality, trajectory reliability, and execution success. Several models achieved measurable execution success, suggesting that the generative priors learned from internet-scale data already encode meaningful physical knowledge.
A central finding: visual quality is not a reliable predictor of executability. A model can generate visually convincing videos whose derived robot movements fail in simulation, a failure mode that standard evaluations do not capture. For the development of robotics AI systems, this means new evaluation criteria are needed that measure genuine physical competence rather than visual persuasiveness. The framework is to be released as open source.
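One way to quantify whether visual quality predicts executability is a rank correlation across models between a visual-quality score and the execution success rate. The sketch below assumes Spearman's rho as the statistic; the article does not say which analysis the authors used, and the function names `average_ranks` and `spearman` are illustrative. A rho close to zero would mean that ranking models by visual quality says little about how they rank on execution success.

```python
from statistics import mean
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based mean position of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented scores for four hypothetical models, not results from the paper.
    visual_quality = [0.9, 0.8, 0.7, 0.6]
    success_rate = [0.1, 0.4, 0.2, 0.3]
    print(spearman(visual_quality, success_rate))  # → -0.4
```

With only eight models, such a correlation rests on few data points and is a coarse indicator at best. In practice, `scipy.stats.spearmanr` computes the same statistic with the same tie handling.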
Source: arxiv.org · Published 3 June 2026
Lumi AI News — AI-assisted curation in accordance with Art. 50 EU AI Act. Paraphrasing and classification by Lumi News Pipeline v1.6.3.