Live data · Updated hourly

PinchBench – Real-World AI Agent Benchmarks

How do AI models perform on real agent tasks? PinchBench scores 510+ models across coding, reasoning, tool use, and instruction following, with live pricing data.

Models Tested: 510 · Scenarios: 6 · Avg Score: 32.2 · Best Value: Qwen3.5 0.8B (Non-reasoning)
โญ Overall

Balanced score across all agent capabilities

Weights: Intelligence Index (15%), Coding Index (15%), Math Index (10%), GPQA (10%), LiveCodeBench (10%), IFBench (10%), Tau2 (10%), TerminalBench Hard (10%), HLE (10%).
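Concretely, the Overall score is just this weighted sum. A minimal sketch, assuming each component is already on the 0-100 scale the FAQ describes (the field names are illustrative, not PinchBench's actual schema):

```typescript
// Component results for one model, each already normalized to 0-100.
// Field names are illustrative, not PinchBench's actual schema.
type ComponentScores = {
  intelligenceIndex: number; // 15%
  codingIndex: number;       // 15%
  mathIndex: number;         // 10%
  gpqa: number;              // 10%
  livecodebench: number;     // 10%
  ifbench: number;           // 10%
  tau2: number;              // 10%
  terminalbenchHard: number; // 10%
  hle: number;               // 10%
};

// Weights from the Overall scenario above; they sum to 1.0.
const OVERALL_WEIGHTS: { [K in keyof ComponentScores]: number } = {
  intelligenceIndex: 0.15,
  codingIndex: 0.15,
  mathIndex: 0.1,
  gpqa: 0.1,
  livecodebench: 0.1,
  ifbench: 0.1,
  tau2: 0.1,
  terminalbenchHard: 0.1,
  hle: 0.1,
};

// Overall = weighted sum of the component scores.
function overallScore(scores: ComponentScores): number {
  return (Object.keys(OVERALL_WEIGHTS) as (keyof ComponentScores)[]).reduce(
    (sum, key) => sum + OVERALL_WEIGHTS[key] * scores[key],
    0
  );
}
```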
🥇 #1 · OpenAI · GPT-5.5 (xhigh)
Score: 68.4 · Price: $11.25 · Speed: 63 · Efficiency: 6.1

🥈 #2 · OpenAI · GPT-5.5 (high)
Score: 67.1 · Price: $11.25 · Speed: 61 · Efficiency: 6.0

🥉 #3 · OpenAI · GPT-5.2 (xhigh)
Score: 67.1 · Price: $4.81 · Speed: 69 · Efficiency: 13.9
| # | Model | Provider | Score | Input $/M | Output $/M | Speed (tok/s) | TTFT | Efficiency (score/$) |
|---|-------|----------|-------|-----------|------------|---------------|------|----------------------|
| 1 | GPT-5.5 (xhigh) | OpenAI | 68.4 | $5.00 | $30.00 | 63 | 46.82s | 6.1 |
| 2 | GPT-5.5 (high) | OpenAI | 67.1 | $5.00 | $30.00 | 61 | 19.21s | 6.0 |
| 3 | GPT-5.2 (xhigh) | OpenAI | 67.1 | $1.75 | $14.00 | 69 | 62.91s | 13.9 |
| 4 | Gemini 3.1 Pro Preview | Google | 66.8 | $2.00 | $12.00 | 133 | 23.51s | 14.8 |
| 5 | Gemini 3 Pro Preview (high) | Google | 65.7 | $2.00 | $12.00 | 125 | 69.37s | 14.6 |
| 6 | GPT-5.4 (xhigh) | OpenAI | 65.4 | $2.50 | $15.00 | 80 | 159.82s | 11.6 |
| 7 | GPT-5.5 (medium) | OpenAI | 65.4 | $5.00 | $30.00 | 61 | 3.52s | 5.8 |
| 8 | Gemini 3 Flash Preview (Reasoning) | Google | 64.3 | $0.50 | $3.00 | 197 | 5.65s | 57.1 |
| 9 | Claude Opus 4.5 (Reasoning) | Anthropic | 63.4 | $6.25 | $25.00 | 62 | 11.93s | 5.8 |
| 10 | GPT-5.1 (high) | OpenAI | 63.3 | $1.25 | $10.00 | 118 | 21.14s | 18.4 |
| 11 | GPT-5.3 Codex (xhigh) | OpenAI | 63.2 | $1.75 | $14.00 | 80 | 52.48s | 13.1 |
| 12 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 61.8 | $6.25 | $25.00 | 64 | 14.54s | 5.7 |
| 13 | Kimi K2.6 | Kimi | 61.8 | $0.95 | $4.00 | 40 | 1.35s | 36.1 |
| 14 | GPT-5.2 (medium) | OpenAI | 61.6 | $1.75 | $14.00 | – | – | 12.8 |
| 15 | GPT-5 Codex (high) | OpenAI | 61.6 | $1.25 | $10.00 | 171 | 6.50s | 17.9 |
| 16 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 61.5 | $1.74 | $3.48 | 31 | 1.19s | 28.3 |
| 17 | Muse Spark | Meta | 61.3 | – | – | – | – | – |
| 18 | GLM-4.7 (Reasoning) | Z AI | 60.9 | $0.60 | $2.20 | 107 | 0.77s | 60.9 |
| 19 | MiMo-V2.5-Pro | Xiaomi | 60.8 | $1.00 | $3.00 | 55 | 2.32s | 40.5 |
| 20 | Grok 4.3 | xAI | 60.4 | $1.25 | $2.50 | 86 | 10.26s | 38.6 |

💰 Best Cost Efficiency · Overall

Score per dollar (higher = better value). Only models with pricing data are included; a sketch of the calculation follows the table.

| # | Model | Score per $ | Price ($/M) |
|---|-------|-------------|-------------|
| 1 | Qwen3.5 0.8B (Non-reasoning) | 822.6 | $0.02 |
| 2 | Qwen3.5 4B (Reasoning) | 654.3 | $0.06 |
| 3 | Qwen3.5 0.8B (Reasoning) | 607.5 | $0.02 |
| 4 | Qwen3.5 2B (Non-reasoning) | 601.8 | $0.04 |
| 5 | Qwen3.5 2B (Reasoning) | 567.8 | $0.04 |
| 6 | Qwen3.5 4B (Non-reasoning) | 553.1 | $0.06 |
| 7 | gpt-oss-20B (high) | 506.9 | $0.09 |
| 8 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 460.0 | $0.10 |
| 9 | Gemma 3n E4B Instruct | 455.7 | $0.03 |
| 10 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 413.4 | $0.07 |
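Per the FAQ below, cost efficiency is score divided by price per million tokens. The single "price" on the cards appears to be a 3:1 input:output blend; that ratio is an inference from the listed numbers ($5.00 in / $30.00 out blends to the $11.25 shown for GPT-5.5), not a documented formula. A minimal sketch under that assumption:

```typescript
// ASSUMPTION: blended price uses a 3:1 input:output token mix. This is
// inferred from the page's numbers ($5.00 in / $30.00 out -> $11.25),
// not from any documented PinchBench formula.
function blendedPrice(inputPerM: number, outputPerM: number): number {
  return (3 * inputPerM + outputPerM) / 4;
}

// Cost efficiency per the FAQ: score divided by price per million tokens.
function costEfficiency(score: number, inputPerM: number, outputPerM: number): number {
  return score / blendedPrice(inputPerM, outputPerM);
}

// Sanity checks against the leaderboard:
// costEfficiency(68.4, 5.0, 30.0) ≈ 6.1   (GPT-5.5 xhigh)
// costEfficiency(60.9, 0.6, 2.2)  ≈ 60.9  (GLM-4.7 Reasoning)
```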

⚡ Score vs Speed · Overall

Models in the top-right are both fast and capable. The points recovered from the chart are tabulated below; a sketch for picking out that frontier follows the table.

| Provider | Model | Score | Speed (tok/s) |
|----------|-------|-------|---------------|
| Inception | Mercury 2 | 44.3 | 736 |
| IBM | Granite 3.3 8B (Non-reasoning) | 10.6 | 380 |
| Alibaba | Qwen3.5 2B (Non-reasoning) | 24.1 | 333 |
| Google | Gemini 3.1 Flash-Lite Preview | 40.8 | 277 |
| NVIDIA | Nemotron 3 Nano Omni 30B A3B Reasoning | 27.9 | 308 |
| Amazon | Nova Micro | 12.7 | 341 |
| OpenAI | gpt-oss-20B (high) | 44.6 | 256 |
| Google | Gemini 3 Flash Preview (Reasoning) | 64.3 | 197 |
| OpenAI | gpt-oss-20B (low) | 35.9 | 266 |
| OpenAI | gpt-oss-120B (high) | 52.9 | 208 |
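One way to make "top-right" precise is the Pareto frontier: the models that no other model beats on both axes at once. A minimal sketch over points shaped like the table above (the helper is hypothetical, not part of PinchBench):

```typescript
type Point = { model: string; score: number; speed: number };

// A point is on the frontier ("top-right") if no other point is at least as
// good on both axes and strictly better on at least one.
function paretoFrontier(points: Point[]): Point[] {
  return points.filter(
    (p) =>
      !points.some(
        (q) =>
          q !== p &&
          q.score >= p.score &&
          q.speed >= p.speed &&
          (q.score > p.score || q.speed > p.speed)
      )
  );
}

// Applied to the table above, this keeps Mercury 2 (fastest), Gemini 3 Flash
// Preview (Reasoning) (highest score), and the two gpt-oss (high) models
// between them; every other point is dominated.
```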

Frequently Asked Questions

What is PinchBench and how does it differ from traditional benchmarks?

PinchBench evaluates AI models on real-world agent tasks spanning coding, reasoning, tool use, and instruction following. Unlike academic benchmarks that test isolated capabilities, PinchBench combines multiple benchmark dimensions to reflect how models perform as autonomous agents in practical workflows.

Which scenarios does PinchBench test?

PinchBench covers 6 scenarios: Coding Agent (code generation, debugging, terminal use), Reasoning & Logic (math, science, multi-step problems), Instruction Following (format compliance, structured output), Research & Analysis (scientific reasoning, knowledge), Tool Use & Agentic (multi-turn orchestration, planning), and an Overall balanced score.

How are scores calculated?

Each scenario uses a weighted combination of relevant benchmarks. For example, Coding Agent combines LiveCodeBench, TerminalBench, SciCode, and the Artificial Analysis Coding Index. Scores are normalized to 0-100. Cost efficiency is calculated as score divided by price per million tokens.
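The page doesn't spell out how raw benchmark results are mapped to 0-100; min-max scaling is one plausible reading, sketched here as an assumption rather than PinchBench's documented method. The weighted combination itself follows the pattern shown under the Overall card above.

```typescript
// ASSUMPTION: min-max scaling to 0-100. PinchBench does not document its
// normalization; this is one plausible implementation, not the actual one.
function normalizeToHundred(raw: number, min: number, max: number): number {
  if (max === min) return 0; // no spread: nothing meaningful to scale
  return (100 * (raw - min)) / (max - min);
}
```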

Why do real-world results differ from academic benchmarks?

Academic benchmarks test specific skills in controlled conditions. Real agent tasks require combining multiple skills: a model might score well on individual benchmarks but struggle when a task requires coding, tool use, and instruction following simultaneously. PinchBench's weighted scenario scores better approximate this combined performance.

How often is the data updated?

PinchBench data refreshes hourly from the Artificial Analysis API, ensuring you see the latest benchmark scores and pricing for all models.
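For illustration, an hourly refresh loop might look like the sketch below. The URL and response handling are placeholders; the real Artificial Analysis endpoint and schema are not shown on this page.

```typescript
// Placeholder URL: the actual Artificial Analysis endpoint is not documented here.
const DATA_URL = "https://example.com/api/llm-benchmarks";
const HOUR_MS = 60 * 60 * 1000;

let cache: unknown = null; // latest successfully fetched payload

async function refresh(): Promise<void> {
  try {
    const res = await fetch(DATA_URL);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    cache = await res.json(); // replace cache only on success
  } catch (err) {
    console.error("refresh failed, keeping stale data:", err); // stale-on-error
  }
}

refresh();                     // fetch once at startup
setInterval(refresh, HOUR_MS); // then once per hour
```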