Cost Optimization · 2026-05-20 · 15 min · PromptShelf Team

How We Reduced LLM Costs by 60%: A Real Optimization Case Study

Through model routing, prompt compression, caching, and quality gates, we cut monthly AI costs from $12,000 to $4,800.

Cost Optimization · Model Routing · Caching

Background

A SaaS company's AI customer service costs soared from $2,000 to $12,000/month because:

All requests used GPT-4o (the most expensive model)

Average prompt length: 800 tokens (40% redundant)

No caching — identical questions repeatedly called the API

Strategy 1: Intelligent Model Routing

Not all requests need the strongest model.

| Request Type | Before | After | Quality Impact | Cost Reduction |
|---|---|---|---|---|
| Simple FAQ | GPT-4o ($0.005) | DeepSeek V4 ($0.00014) | -2 pts | 97% |
| Standard CS | GPT-4o ($0.005) | GPT-4o Mini ($0.00015) | -5 pts | 97% |
| Complex complaint | GPT-4o ($0.005) | GPT-4o ($0.005) | None | 0% |
| Escalation | GPT-4o ($0.005) | Claude Sonnet ($0.003) | +3 pts | 40% |

**Key insight**: 80% of requests are simple or standard and can be served by cheaper models. Only 20% need the strongest model.
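The Cost Reduction column in the table above follows directly from the listed prices. As a quick check, here is a minimal sketch that recomputes it; the prices are copied from the table (in whatever unit the table uses), and the names `PRICES`, `cost_reduction`, and `reduction_table` are illustrative only.

```python
# Prices copied from the routing table: (before, after) per request type.
PRICES = {
    "Simple FAQ": (0.005, 0.00014),
    "Standard CS": (0.005, 0.00015),
    "Complex complaint": (0.005, 0.005),
    "Escalation": (0.005, 0.003),
}


def cost_reduction(before: float, after: float) -> float:
    """Fractional saving from switching models: (before - after) / before."""
    return (before - after) / before


def reduction_table(prices: dict) -> dict:
    """Map each request type to its saving, rounded to a whole percent."""
    return {
        name: round(cost_reduction(before, after) * 100)
        for name, (before, after) in prices.items()
    }
```

Calling `reduction_table(PRICES)` reproduces the 97%, 97%, 0%, and 40% figures. The overall saving depends on how traffic is split across the four request types, which the table does not break down.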

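from promptshelf import Router

router = Router(
    rules=[
        # Low-complexity requests go to the cheapest model.
        {"condition": "complexity < 3", "model": "deepseek-v4"},
        # Mid-complexity requests go to a small general-purpose model.
        {"condition": "complexity < 7", "model": "gpt-4o-mini"},
        # High-complexity or negative-sentiment requests keep the strongest model.
        # If rules are evaluated first-match, this rule may need to come first
        # so that negative-sentiment requests are not caught by the rules above.
        {"condition": "complexity >= 7 OR sentiment == 'negative'", "model": "gpt-4o"},
    ],
    fallback="gpt-4o-mini",
)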
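The configuration above does not show how rules are resolved. As a rough illustration, here is a first-match router in plain Python. It is a sketch under stated assumptions, not PromptShelf's implementation: the `Request` and `Rule` types, the `route` function, and the 0-10 complexity score are all hypothetical. Because the first matching rule wins, the negative-sentiment check is placed first.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    complexity: int  # hypothetical 0-10 score from an upstream classifier
    sentiment: str   # e.g. "positive", "neutral", "negative"


@dataclass
class Rule:
    matches: Callable[[Request], bool]
    model: str


# Order matters: the first matching rule wins, so the rule for hard or
# unhappy requests is checked before the cheaper tiers.
RULES: List[Rule] = [
    Rule(lambda r: r.complexity >= 7 or r.sentiment == "negative", "gpt-4o"),
    Rule(lambda r: r.complexity < 3, "deepseek-v4"),
    Rule(lambda r: r.complexity < 7, "gpt-4o-mini"),
]


def route(request: Request, rules: List[Rule] = RULES,
          fallback: str = "gpt-4o-mini") -> str:
    """Return the model of the first matching rule, or the fallback."""
    for rule in rules:
        if rule.matches(request):
            return rule.model
    return fallback
```

With this ordering, `route(Request(complexity=1, sentiment="negative"))` goes to `gpt-4o` even though its complexity is low, which is the behavior the third rule in the configuration appears to intend.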

Strategy 2: Prompt Compression

Original (800 tokens) → Compressed (320 tokens)

**Removed**:

"You are a senior expert with 10 years experience" — no accuracy impact

Repeated format requirements (appeared 3x) — keep 1

Verbose examples (5 total) — keep 2 most typical

**Result**: 60% fewer input tokens, quality dropped only 2 points.
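The three removals listed above can be sketched as a small preprocessing step. This is an illustrative example, not the tooling used in the case study: it assumes the prompt is available as a list of instruction lines plus a list of examples, and the names `compress_prompt`, `rough_token_count`, and `FILLER_PREFIXES` are hypothetical. The whitespace-based token count is only a rough proxy, since real tokenizers count differently.

```python
from typing import List, Sequence

# Persona boilerplate to drop (the phrase quoted in the list above).
FILLER_PREFIXES = ("You are a senior expert",)


def compress_prompt(instructions: Sequence[str], examples: Sequence[str],
                    max_examples: int = 2) -> str:
    """Drop persona filler, de-duplicate instructions, and cap the examples."""
    seen = set()
    kept: List[str] = []
    for line in instructions:
        text = line.strip()
        key = text.lower()
        if not text or key in seen:
            continue  # skip blank lines and repeated instructions
        if text.startswith(FILLER_PREFIXES):
            continue  # skip persona boilerplate
        seen.add(key)
        kept.append(text)
    # Keeps the first `max_examples` entries; picking the "most typical"
    # examples is an editorial choice made before calling this function.
    kept.extend(examples[:max_examples])
    return "\n".join(kept)


def rough_token_count(text: str) -> int:
    """Whitespace word count, used here only as a crude stand-in for tokens."""
    return len(text.split())
```

Comparing `rough_token_count` before and after compression gives a quick estimate of the saving, but any quality impact still has to be measured on real traffic.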

Strategy 3: Semantic Caching

30% of customer service questions are repetitive ("how to reset password", "refund policy").

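result = client.execute(
    # `client` is assumed to be an already-initialized PromptShelf client,
    # and `user_input` the customer's question.
    prompt="Customer service reply",
    variables={"question": user_input},
    cache={
        "enabled": True,
        "similarity_threshold": 0.92,  # minimum similarity for a cache hit
        "ttl": 86400,  # 24 hours
    },
)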
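To make `similarity_threshold` and `ttl` concrete, here is a minimal, self-contained sketch of a semantic cache. It is not how PromptShelf implements caching: the `SemanticCache` class and its methods are hypothetical, and `embed` is a toy word-count embedding standing in for a real embedding model, which would be needed to match paraphrases rather than near-identical wording.

```python
import math
import time
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, similarity_threshold: float = 0.92, ttl: float = 86400,
                 clock: Callable[[], float] = time.time):
        self.threshold = similarity_threshold
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested
        self.entries: List[Tuple[Dict[str, int], str, float]] = []

    def get(self, question: str) -> Optional[str]:
        """Return the cached answer of the most similar unexpired question."""
        now = self.clock()
        # Drop entries older than the TTL before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        query = embed(question)
        best_answer, best_score = None, 0.0
        for vector, answer, _ in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.threshold else None

    def put(self, question: str, answer: str) -> None:
        """Store an answer together with its embedding and creation time."""
        self.entries.append((embed(question), answer, self.clock()))
```

A caller would try `cache.get(question)` first, call the model only on a miss, and then store the reply with `cache.put(question, reply)`. The linear scan is fine for a sketch; a larger cache would typically use a vector index instead.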

Final Results

| Metric | Before | After | Change |
|---|---|---|---|
| Monthly cost | $12,000 | $4,800 | -60% |
| Avg latency | 1,200 ms | 680 ms | -43% |
| Customer satisfaction | 4.2/5 | 4.5/5 | +7% |
| Quality score | 88 | 86 | -2 pts |

Summary

LLM cost optimization is **systems engineering**, not a single technique. In this case, model routing, prompt compression, and semantic caching together cut monthly costs by 60% while the quality score dropped by only 2 points.