Cost Optimization · 2026-05-20 · 15 min · PromptShelf Team

How We Reduced LLM Costs by 60%: A Real Optimization Case Study

Through model routing, prompt compression, caching, and quality gates, we cut monthly AI costs from $12,000 to $4,800.


Background

A SaaS company's AI customer service costs soared from $2,000 to $12,000/month because:

  • All requests used GPT-4o (the most expensive model)
  • Average prompt length: 800 tokens (40% redundant)
  • No caching: identical questions repeatedly triggered new API calls

Strategy 1: Intelligent Model Routing

Not all requests need the strongest model.

| Request type | Before (USD per 1K input tokens) | After (USD per 1K input tokens) | Quality impact | Cost reduction |
|---|---|---|---|---|
| Simple FAQ | GPT-4o ($0.005) | DeepSeek V4 ($0.00014) | -2 pts | 97% |
| Standard CS | GPT-4o ($0.005) | GPT-4o Mini ($0.00015) | -5 pts | 97% |
| Complex complaint | GPT-4o ($0.005) | GPT-4o ($0.005) | None | 0% |
| Escalation | GPT-4o ($0.005) | Claude Sonnet ($0.003) | +3 pts | 40% |
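The Cost reduction column follows directly from the per-token prices: reduction = 1 - after / before. A quick check in Python, using only the prices from the table:

```python
# Prices copied from the table above: (before, after), USD per 1K input tokens.
ROUTES = {
    "Simple FAQ": (0.005, 0.00014),
    "Standard CS": (0.005, 0.00015),
    "Complex complaint": (0.005, 0.005),
    "Escalation": (0.005, 0.003),
}

def cost_reduction(before: float, after: float) -> float:
    """Percentage saved per token when moving from the `before` price to the `after` price."""
    return (1 - after / before) * 100

for request_type, (before, after) in ROUTES.items():
    print(f"{request_type}: {cost_reduction(before, after):.0f}%")
```

These are per-token savings for each request type; the overall monthly reduction also depends on the traffic mix and on the other two strategies below.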

**Key insight**: 80% of requests are simple or standard, so they can go to cheaper models. Only 20% need the strongest one.

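```python
from promptshelf import Router

# Rules map a request's complexity score (and sentiment) to a model.
router = Router(
    rules=[
        {"condition": "complexity < 3", "model": "deepseek-v4"},  # simple FAQ
        {"condition": "complexity < 7", "model": "gpt-4o-mini"},  # standard requests
        {"condition": "complexity >= 7 OR sentiment == 'negative'", "model": "gpt-4o"},  # complex or unhappy customers
    ],
    fallback="gpt-4o-mini",
)
```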
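How PromptShelf's `Router` evaluates its rules (for example, whether the first matching rule wins) is not stated here. As a self-contained sketch, assuming first-match evaluation and assuming an upstream classifier supplies a 0-10 complexity score and a sentiment label, the same policy can be written in plain Python. The names `Request`, `RULES`, and `route` are hypothetical. Under first-match semantics the order in the config above would send a low-complexity negative message to a cheaper model before the sentiment rule is reached, so this sketch checks the escalation condition first.

```python
from dataclasses import dataclass

@dataclass
class Request:
    text: str
    complexity: int  # assumed 0-10 score from an upstream classifier
    sentiment: str   # assumed label: "positive", "neutral", or "negative"

# Hypothetical first-match rule list of (predicate, model); order matters.
RULES = [
    (lambda r: r.complexity >= 7 or r.sentiment == "negative", "gpt-4o"),
    (lambda r: r.complexity < 3, "deepseek-v4"),
    (lambda r: r.complexity < 7, "gpt-4o-mini"),
]
FALLBACK = "gpt-4o-mini"

def route(req: Request) -> str:
    """Return the model of the first rule whose predicate matches, else the fallback."""
    for predicate, model in RULES:
        if predicate(req):
            return model
    return FALLBACK
```

For example, `route(Request("How do I reset my password?", 1, "neutral"))` picks `"deepseek-v4"`, while the same score with `"negative"` sentiment goes to `"gpt-4o"`.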

Strategy 2: Prompt Compression

Original prompt (800 tokens) → compressed prompt (320 tokens).

**Removed**:

  • "You are a senior expert with 10 years experience": no accuracy impact
  • Format requirements stated three times: kept one
  • Five verbose examples: kept the two most typical

**Result**: 60% fewer input tokens, with the quality score dropping only 2 points.
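The three removals are mechanical enough to sketch. The function below is hypothetical, not part of PromptShelf: it drops a persona line matched by a simple regex (an assumed rule), removes instructions that repeat verbatim, and keeps the first two examples, assuming the caller has already ordered examples by how typical they are. A whitespace word count stands in for real token counts, which depend on the model's tokenizer.

```python
import re

# Hypothetical rule: treat "you are a ... expert ..." lines as persona boilerplate.
PERSONA = re.compile(r"^you are a .*expert", re.IGNORECASE)

def compress_prompt(instructions: list[str], examples: list[str], max_examples: int = 2) -> str:
    """Drop persona lines and duplicate instructions, then keep the first `max_examples` examples."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in instructions:
        key = line.strip().lower()
        if not key or PERSONA.match(key):
            continue  # skip blank lines and persona boilerplate
        if key in seen:
            continue  # skip instructions already stated (case-insensitive)
        seen.add(key)
        kept.append(line.strip())
    # Assumes examples are pre-sorted so the most typical come first.
    kept.extend(examples[:max_examples])
    return "\n".join(kept)

def rough_token_count(text: str) -> int:
    """Whitespace word count as a crude proxy for tokens."""
    return len(text.split())
```

Whatever rules are used, the quality score has to be re-measured after compressing; the 2-point drop reported above applies only to the changes made in this case.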

Strategy 3: Semantic Caching

30% of customer service questions are repetitive (e.g. "how to reset password", "refund policy").

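```python
# `client` is an initialized PromptShelf client and `user_input` is the
# customer's question (setup not shown).
result = client.execute(
    prompt="Customer service reply",
    variables={"question": user_input},
    cache={
        "enabled": True,
        "similarity_threshold": 0.92,  # minimum similarity for a cache hit
        "ttl": 86400,  # 24 hours, in seconds
    },
)
```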
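Semantic caching returns a stored answer when a new question is close enough to a previous one, rather than only on an exact string match. The sketch below is hypothetical and is not PromptShelf's implementation: it uses a bag-of-words vector with cosine similarity as a stand-in for a real embedding model (a real embedding is what would let paraphrases clear a 0.92 threshold), plus the same threshold and 24-hour TTL as the config above. The clock is injectable so expiry can be tested.

```python
import math
import time
from collections import Counter
from typing import Callable, Optional

def toy_embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model (assumption for illustration)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.92, ttl: float = 86400,
                 clock: Callable[[], float] = time.time):
        self.threshold = threshold
        self.ttl = ttl  # seconds
        self.clock = clock
        self.entries: list[tuple[Counter, str, float]] = []  # (vector, answer, stored_at)

    def get(self, question: str) -> Optional[str]:
        now = self.clock()
        # Evict expired entries before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        query = toy_embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer, _ in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        # Only the closest match counts, and only if it clears the threshold.
        return best_answer if best_score >= self.threshold else None

    def put(self, question: str, answer: str) -> None:
        self.entries.append((toy_embed(question), answer, self.clock()))
```

The linear scan keeps the sketch short; a production cache would typically search a vector index instead. The threshold trades hit rate against the risk of answering a question that only looks similar.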

Final Results

| Metric | Before | After | Change |
|---|---|---|---|
| Monthly cost | $12,000 | $4,800 | -60% |
| Avg. latency | 1,200 ms | 680 ms | -43% |
| Customer satisfaction | 4.2/5 | 4.5/5 | +7% |
| Quality score | 88 | 86 | -2 pts |

Summary

LLM cost optimization is **systems engineering**, not a single technique. In this case, combining model routing, prompt compression, and semantic caching cut monthly costs by 60% while the quality score fell by only 2 points.
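The post does not spell out how the three techniques are chained for each request. One plausible ordering, sketched with hypothetical callables so that any cache, compressor, router, or model client can be plugged in: check the cache first (a hit avoids any model call), otherwise build the compressed prompt, route by complexity, call the model, and store the answer for next time.

```python
from typing import Callable, Optional

def handle_request(
    question: str,
    complexity: int,
    cache_lookup: Callable[[str], Optional[str]],
    cache_store: Callable[[str, str], None],
    build_prompt: Callable[[str], str],
    route: Callable[[int], str],
    call_model: Callable[[str, str], str],
) -> str:
    # 1. Cache first: a hit skips the model call entirely.
    cached = cache_lookup(question)
    if cached is not None:
        return cached
    # 2. Build the (compressed) prompt for this question.
    prompt = build_prompt(question)
    # 3. Pick the cheapest model suited to the request's complexity.
    model = route(complexity)
    answer = call_model(model, prompt)
    # 4. Store the answer so repeated questions become cache hits.
    cache_store(question, answer)
    return answer
```

Putting the cache before routing means repeated questions cost nothing regardless of which model would have handled them.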
