Microsoft warns CTOs of seven new attack patterns on AI agents: from natural language injections through goal hijacking to visual attacks on computer-use agents.
LLMs can be forced to leak data through targeted prompt attacks, but they disclose training data only with low probability in everyday usage scenarios.
Corporate AI spending has spiraled out of control; OpenAI promises more efficient models, while the Jevons Paradox could drive renewed demand growth over the long term.
The challenge is not to choose a side, but to create feedback loops that mediate between the pace of AI-accelerated development and the requirements for reliability and maintainability.
Real business environments with actual money, inventory and customers reveal AI capabilities and risks that classic benchmarks miss, ranging from price-fixing to deception to legal misinterpretations.
CHERRL enables reproducible analysis of reward hacking mechanisms through controlled bias injection and automatic detection of exploitation onset in LLM-based training.