
AI API Pricing Guide 2026: Complete Cost Breakdown

AI API pricing is complex — input tokens, output tokens, cached tokens, batch pricing, and more. This guide breaks down costs across every major provider so you can make informed decisions. All data is live and updated hourly.

How AI API Pricing Works

AI APIs charge per token (one token is roughly 0.75 English words). Every request has two costs:

📥 Input Tokens

What you send to the model (prompt, context, system instructions). Usually cheaper.

📤 Output Tokens

What the model generates (response text). Usually 2-5x more expensive than input.

Prices are quoted per 1 million tokens ($/M). A typical conversation uses 1,000-5,000 tokens total.
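The arithmetic above can be sketched in a few lines. This is an illustrative helper, not any provider's SDK; the prices and token counts are example values:

```python
def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Estimate a single request's cost from $/M token prices."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# A 1,000-token prompt with a 500-token response at $2.50/M in, $15.00/M out:
cost = request_cost(1_000, 500, 2.50, 15.00)
print(f"${cost:.4f}")  # → $0.0100
```

Note how the 500 output tokens account for three quarters of the bill even though they are only a third of the total tokens.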

Provider-by-Provider Pricing

Flagship model pricing per provider. Use our calculator for exact costs.

Provider | Flagship Model | Input $/M | Output $/M | Blended $/M | Models
OpenAI | GPT-5.4 (xhigh) | $2.50 | $15.00 | $5.63 | 55
Anthropic | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $5.00 | $25.00 | $10.00 | 28
Google | Gemini 3.1 Pro Preview | $2.00 | $12.00 | $4.50 | 42
DeepSeek | DeepSeek V3.2 (Reasoning) | $0.28 | $0.42 | $0.32 | 25
Mistral | Magistral Medium 1.2 | $2.00 | $5.00 | $2.75 | 31
xAI | Grok 4.20 Beta 0309 (Reasoning) | $2.00 | $6.00 | $3.00 | 14
Alibaba (Qwen) | Qwen3.5 397B A17B (Reasoning) | $0.60 | $3.60 | $1.35 | 71
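The "Blended" column is consistent with a 3:1 input-to-output token weighting, a common assumption for chat-style workloads. That weighting is inferred from the numbers in the table, not a formula the providers publish, so treat it as an approximation:

```python
def blended_price(input_per_m, output_per_m, input_weight=3, output_weight=1):
    """Weighted-average $/M price, assuming a 3:1 input:output token mix."""
    total = input_weight + output_weight
    return (input_per_m * input_weight + output_per_m * output_weight) / total

# OpenAI row from the table: $2.50/M in, $15.00/M out
print(blended_price(2.50, 15.00))  # 5.625, shown as $5.63 in the table
```

Every row in the table matches this 3:1 formula, e.g. Anthropic: (3 × $5.00 + $25.00) / 4 = $10.00.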

💰 Cheapest AI Models (Overall)

1. Gemma 3n E4B Instruct (Google): $0.025/M
2. LFM2 24B A2B (Liquid AI): $0.052/M
3. Nova Micro (Amazon): $0.061/M
4. NVIDIA Nemotron Nano 9B V2 (Reasoning) (NVIDIA): $0.070/M
5. Llama 3 Instruct 8B (Meta): $0.070/M

Budget Options by Provider

Provider | Cheapest Model | Input $/M | Output $/M
OpenAI | gpt-oss-20B (high) | $0.060 | $0.200
Anthropic | Claude 3 Haiku | $0.250 | $1.250
Google | Gemma 3n E4B Instruct | $0.020 | $0.040
DeepSeek | DeepSeek R1 Distill Qwen 32B | $0.270 | $0.270
Mistral | Devstral Small (May '25) | $0.060 | $0.120
xAI | Grok 4 Fast (Non-reasoning) | $0.200 | $0.500
Alibaba (Qwen) | Qwen2.5 Turbo | $0.050 | $0.200

Cost Optimization Tips

1. Use cached input pricing. If you send the same system prompt or context repeatedly, enable prompt caching. Most providers offer 50-90% discounts on cached input tokens.

2. Right-size your model. Don't use GPT-5.4 for tasks a smaller model handles well. Use our recommender to find the right model for your use case.

3. Use batch APIs. OpenAI and other providers offer 50% discounts for non-real-time batch processing. Great for data processing, content generation, and classification tasks.

4. Monitor your token usage. Output tokens cost 2-5x more than input tokens. Reduce output by requesting concise responses, using max_tokens limits, and structuring prompts efficiently.

5. Consider open-source models. For high-volume workloads, self-hosting models like Llama or Qwen can be cheaper at scale. But factor in infrastructure and engineering costs.
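Tips 1 and 3 compound: caching discounts part of your input tokens, and the batch discount applies on top. Here is a rough sketch of the combined effect. The 90% cache discount and 50% batch discount are the figures quoted above; the 80% cached fraction is a hypothetical workload, so check your provider's pricing page before relying on these numbers:

```python
def effective_input_price(base_per_m, cached_fraction=0.0,
                          cache_discount=0.0, batch_discount=0.0):
    """Effective $/M input price after prompt-cache and batch discounts.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_discount:  per-token discount on cached tokens (0.9 = 90% off).
    batch_discount:  flat discount for batch-API requests (0.5 = 50% off).
    """
    cached_price = base_per_m * (1 - cache_discount)
    # Blend full-price and cached tokens, then apply the batch discount.
    mixed = base_per_m * (1 - cached_fraction) + cached_price * cached_fraction
    return mixed * (1 - batch_discount)

# $2.50/M input, 80% of the prompt cached at 90% off, sent via the batch API:
print(effective_input_price(2.50, cached_fraction=0.8,
                            cache_discount=0.9, batch_discount=0.5))
```

Under these assumptions the effective input price drops from $2.50/M to $0.35/M, an 86% reduction, before you touch the output side at all.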

Calculate Your Exact Costs

Use our free calculator to compare costs across all 446+ models.

Open Cost Calculator →
