Visual world models can be systematically manipulated through visually imperceptible image modifications to generate erroneous predictions without requiring knowledge of future data or user inputs.
Linear probes for deception detection in LLMs function reliably only on training data, not under stylistic variations—but style augmentation can restore robustness.