Savings library

Cut your token spend

A practical, opinionated set of techniques for reducing AI API costs without sacrificing quality.

CachingBeginnerHigh impact

Use prompt caching for repeated system prompts

Long, repeated system or tool-definition blocks can cost 90% less when cached. Restructure prompts so static content comes first.

OpenAIAnthropicGoogleDeepSeek
BatchingBeginnerHigh impact

Route async workloads to Batch APIs

Most providers offer 50% off when results can return within 24 hours. Ideal for evals, backfills, and enrichment.

OpenAIAnthropicGoogle
RoutingIntermediateHigh impact

Route by task complexity, not by default to your largest model

Classify intent with a small model, then escalate. A simple router can cut spend 60%+ on mixed workloads.

Prompt designBeginnerMedium impact

Cap output tokens deliberately

Output tokens are typically 4–5x the price of input. Set max_tokens, request concise schemas, and prefer structured outputs.

ContextIntermediateMedium impact

Prune context aggressively

Drop turn-history older than what the model needs. Summarize older state into a short memory block.

Prompt designBeginnerMedium impact

Use structured outputs over freeform JSON parsing

Native structured outputs reduce retries and wasted tokens from malformed JSON.

OpenAIAnthropicGoogle
Model selectionBeginnerHigh impact

Use a small model for classification and extraction

Sub-$0.30 models handle most extraction and routing tasks at near-frontier accuracy.

OpenAIGoogleDeepSeek
MonitoringIntermediateMedium impact

Instrument per-feature token spend

Tag every call with feature, route, and model. You cannot optimize what you cannot attribute.

EstimationIntermediateMedium impact

Estimate before you ship

Run a 1000-call simulation against representative traffic. Project monthly spend before launch.