
Language Compression in LLMs: Output Optimization Saves Costs, Input Reduction Increases Them

In brief: Output compression effectively reduces inference costs, while input compression increases overall costs and degrades response quality.

A study evaluates how linguistic compression affects the cost and accuracy of large language models. It finds that output compression reduces inference costs by a factor of 1.4 to 2.4, whereas input compression raises them roughly 1.15-fold and degrades response quality at the same time.

The “Cavewoman” protocol tests eight language models on five datasets at five compression levels. Two channels are measured separately: the input prompt and the generated output. Each generation is scored on task accuracy, realized per-item cost, and agreement with the same model's uncompressed reference generation.

Output compression has a consistently positive effect: for most API models it cuts realized costs by a factor of 1.4 to 2.4, and by a factor of 3 in the best case. Costs also fall for all four evaluated open-weight models under public pricing. The well-known principle “Talk short. Drop grammar. Save token” therefore does work for output.

Input compression, by contrast, is a strict loss: costs rise and accuracy falls. Net costs increase roughly 1.15-fold on average across five benchmarks, by a factor of 1.8 in the worst case, and by as much as 2.7 under stronger compression. The cause is that models compensate for shortened inputs by generating longer responses, while response accuracy declines at the same time.
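The arithmetic behind that effect can be illustrated with a minimal sketch. It assumes per-token API billing in which output tokens cost more than input tokens, a common pricing pattern but not something the article states. All prices and token counts below are hypothetical and are not taken from the study.

```python
def per_item_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request in dollars; prices are dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical prices, chosen only for illustration (not from the study).
INPUT_PRICE = 1.0   # dollars per million input tokens (assumed)
OUTPUT_PRICE = 5.0  # dollars per million output tokens (assumed)

# Uncompressed prompt: 400 input tokens, 300 output tokens (assumed counts).
baseline = per_item_cost(400, 300, INPUT_PRICE, OUTPUT_PRICE)

# Compressed prompt: the input is halved, but the model writes a longer answer.
compressed_input = per_item_cost(200, 380, INPUT_PRICE, OUTPUT_PRICE)

# A ratio above 1 means the "saving" on input was outweighed by extra output.
cost_ratio = compressed_input / baseline
print(f"net cost ratio: {cost_ratio:.2f}")
```

Under these assumed numbers, saving 200 input tokens is more than offset by 80 additional output tokens, so the net cost per item goes up even though the prompt got shorter.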

Input compression causes a further problem: the surface form of the generated text diverges from the model's uncompressed reference generation. For non-reasoning models, roughly half of all generations are factually correct but worded differently from what the model would produce without input compression. This divergence persists under length-controlled re-evaluation and under complementary semantic measures.
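The difference between "correct" and "matches the reference" can be made concrete with a small sketch. The article does not say which agreement metric the study uses, so the token-level similarity from Python's standard `difflib` module stands in here as an assumption. The helper names, the substring-based correctness check, and the example texts are likewise hypothetical.

```python
from difflib import SequenceMatcher

def surface_agreement(reference: str, candidate: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in metric, not the study's."""
    return SequenceMatcher(None, reference.split(), candidate.split()).ratio()

def is_correct(answer_text: str, gold: str) -> bool:
    """Simplistic correctness check: the gold answer appears in the text."""
    return gold in answer_text

# Hypothetical generations for the same question (gold answer: "42").
reference = "The answer is 42 because 6 times 7 equals 42."  # uncompressed input
candidate = "6 x 7 gives 42, so the result is 42."           # compressed input

both_correct = is_correct(reference, "42") and is_correct(candidate, "42")
agreement = surface_agreement(reference, candidate)
print(f"both correct: {both_correct}, surface agreement: {agreement:.2f}")
```

Both texts pass the correctness check, yet their wording overlaps only partially. That is the pattern described above: task accuracy and agreement with the reference generation are separate measurements and can disagree.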


Source: arxiv.org · Published June 22, 2026
Lumi AI News — AI-assisted curation in accordance with Art. 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.
