ICALens: Interpretability Method for Language Models Without Training Additional Autoencoders
11 June 2026 | AI Models, Claude AI
ICA-based analysis enables rapid exploration of interpretable directions in language models without the expensive training of additional autoencoders.

Linear Probes for Deception Detection in LLMs Show Critical Robustness Gaps
3 June 2026 | AI Models, Cybersecurity
Linear probes for deception detection in LLMs work reliably only on their training data and fail under stylistic variations, but style augmentation can restore robustness.