Start with the input context
An AI request can include the current instruction, conversation history, files, tool results, and other context. In a long coding or research session, that input can become much larger than the answer. Resubmitting it on every turn is often a major part of token use.
Input-context compression removes or condenses material before the request reaches the model. For suitable workloads, SOTA Token Plan can reduce submitted input-token volume by as much as 85%-90%. That range is an upper bound, not a promise for every task.
Compression and caching are different
Compression reduces the token volume submitted for the current request. A cache hit reuses eligible context under the upstream model provider's rules. Both can lower cost, but each is calculated differently.
Prehendo supplies a one-hour cache configuration where the selected model supports it. The model provider and the shape of later requests determine whether the cache is hit. Because that result is outside Prehendo's control, the examples below do not count a cache discount.
How the input-context calculation works
If compression removes 85%-90% of the original input, 10%-15% remains. Multiply that remaining share by the displayed model multiplier to get the input-context component as a share of the comparable official input price.
For a model displayed at 0.3x, the arithmetic is 0.3 × 0.15 = 0.045 and 0.3 × 0.10 = 0.03. Under those conditions, the input-context component is 3%-4.5% of the comparable official input price.
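The same arithmetic can be written as a short Python sketch. The function name `input_component` is hypothetical and chosen only for illustration. The multipliers and reduction ratios are the ones quoted in this article, and `Decimal` is used so the percentages are not distorted by binary floating-point rounding.

```python
from decimal import Decimal


def input_component(multiplier: str, reduction: str) -> Decimal:
    """Input-context component as a share of the official input price.

    multiplier: displayed model multiplier, e.g. "0.3" for 0.3x
    reduction:  share of input removed by compression, e.g. "0.85"
    """
    remaining = Decimal("1") - Decimal(reduction)  # share of input still submitted
    return Decimal(multiplier) * remaining


# Upper-range reductions only; real workloads may compress less.
for m in ("0.3", "0.9", "1.1"):
    low = input_component(m, "0.90")
    high = input_component(m, "0.85")
    print(f"{m}x: {low:.1%} to {high:.1%} of official input price")
```

Passing a smaller reduction, such as `"0.50"`, shows how quickly the component grows when a workload compresses less than the upper range.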
Why total request cost varies
The 3%-4.5% figure is not the price of the whole request. Output tokens are not reduced by the input-compression ratio. Cache writes and cache reads may have separate rates. A response-heavy task will therefore have a different effective discount from a task dominated by a long repeated input.
Model choice, package terms, the input/output mix, actual compression, and cache reuse all affect the final amount. The honest way to compare cost is to look at a real workload and keep each component separate.
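To see why the input-context figure is not the whole-request figure, the components can be kept separate in a rough sketch like the one below. Everything numeric here is an assumption for illustration: the per-token prices are invented, the function `request_cost` is hypothetical, cache writes and reads are left out, and the sketch assumes the displayed multiplier also applies to output tokens, which should be checked against the actual package terms.

```python
from decimal import Decimal


def request_cost(input_tokens, output_tokens, input_price, output_price,
                 multiplier, reduction):
    """Return (official_cost, plan_cost) for one request, cache excluded.

    Assumption: the multiplier applies to both input and output tokens.
    Only the input side is reduced by compression.
    """
    official = input_tokens * input_price + output_tokens * output_price
    compressed_input = input_tokens * (Decimal("1") - reduction)
    plan = multiplier * (compressed_input * input_price
                         + output_tokens * output_price)
    return official, plan


# Invented prices: 3 and 15 currency units per million tokens.
IN_PRICE = Decimal("3") / Decimal(1_000_000)
OUT_PRICE = Decimal("15") / Decimal(1_000_000)

workloads = {
    "input-heavy": (Decimal(1_000_000), Decimal(10_000)),
    "response-heavy": (Decimal(100_000), Decimal(100_000)),
}

for name, (tokens_in, tokens_out) in workloads.items():
    official, plan = request_cost(tokens_in, tokens_out, IN_PRICE, OUT_PRICE,
                                  Decimal("0.3"), Decimal("0.85"))
    print(f"{name}: plan cost is {plan / official:.1%} of official cost")
```

With these invented numbers, the response-heavy workload ends up at a much larger share of the official cost than the input-heavy one, because output tokens are not compressed.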
Check quality on your own workload
Compression is designed to retain context that matters to the task, but no compression method should be treated as invisible in every situation. A repository refactor, a legal document, and a long creative draft do not need the same details.
Test representative work before relying on a savings estimate. Compare the answer, not just the token count. If a task depends on small details spread across a large context, use a less aggressive setup or send the original material.
| Displayed multiplier | 85% input reduction | 90% input reduction |
|---|---|---|
| 0.3x | 4.5% of official input price | 3% of official input price |
| 0.9x | 13.5% of official input price | 9% of official input price |
| 1.1x | 16.5% of official input price | 11% of official input price |
These figures describe the input-context component, not the complete request. Total request cost varies.