How much does the OpenAI GPT-5.4 API cost?

GPT-5.4 API pricing is $2.50 per million input tokens and $15.00 per million output tokens. Use our calculator at aiapicost.com for exact cost estimates based on your usage.

Which AI model is cheapest for API usage?

The cheapest AI API models change frequently. Use aiapicost.com to compare real-time pricing across 400+ models from OpenAI, Anthropic, Google, DeepSeek, and more. DeepSeek and open-source models typically offer the lowest per-token costs.

How do AI API token costs work?

AI APIs charge per token (roughly 0.75 words). Costs are split into input tokens (what you send) and output tokens (what the model generates). Output tokens are typically 2-5x more expensive. Prices are quoted per 1 million tokens.
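To make the per-token arithmetic concrete, here is a minimal sketch. It assumes the rough 0.75-words-per-token rule of thumb and the GPT-5.4 prices quoted above ($2.50 input, $15.00 output per million tokens). The function names are illustrative, and a real tokenizer will produce different counts.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text, not a real tokenizer


def estimate_tokens(word_count: int) -> int:
    """Approximate a token count from a word count, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD; prices are quoted per 1 million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


prompt_tokens = estimate_tokens(1500)  # a 1,500-word prompt
reply_tokens = estimate_tokens(300)    # a 300-word reply
cost = request_cost(prompt_tokens, reply_tokens, 2.50, 15.00)
print(f"{prompt_tokens} in + {reply_tokens} out = ${cost:.4f}")
```

Note that the reply is a fifth of the prompt's length yet accounts for more than half of the cost here, which is why output-heavy workloads are the most price-sensitive.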

Claude vs ChatGPT: which is better?

Both are top-tier models. Claude excels at coding and instruction-following, while GPT-5.4 offers broader multimodal capabilities. Compare them head-to-head at aiapicost.com/compare with real benchmark data.

Which performs better on benchmarks, gpt-oss-120b (high) or Claude Opus 4.6 (Non-reasoning, High Effort)?

gpt-oss-120b (high) wins 6 out of 12 benchmarks vs 5 for Claude Opus 4.6 (Non-reasoning, High Effort).

gpt-oss-120b (high) vs Claude Opus 4.6 (Non-reasoning, High Effort)

Side-by-side comparison of pricing, 12 benchmarks, and generation speed.

Provider	Model	Input	Output	Speed	TTFT
OpenAI	gpt-oss-120b (high)	$0.15/M	$0.60/M	217 tok/s	0.54s
Anthropic	Claude Opus 4.6 (Non-reasoning, High Effort)	$5/M	$25/M	n/a	n/a

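Winner by Category

Category	Winner
Cheaper	gpt-oss-120b (high)
Faster (tok/s)	gpt-oss-120b (high), by default: no speed is listed for Claude Opus 4.6 (Non-reasoning, High Effort)
Lower Latency	Not determinable: no TTFT is listed for Claude Opus 4.6 (Non-reasoning, High Effort)
Benchmarks (6-5)	gpt-oss-120b (high)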

Pricing Comparison

Metric	gpt-oss-120b (high)	Claude Opus 4.6 (Non-reasoning, High Effort)
Input ($/M tokens)	$0.15	$5
Output ($/M tokens)	$0.6	$25

Cost for 1M input + 100K output tokens:

gpt-oss-120b (high): $0.21

Claude Opus 4.6 (Non-reasoning, High Effort): $7.50

Speed Comparison

Metric	gpt-oss-120b (high)	Claude Opus 4.6 (Non-reasoning, High Effort)
Output speed (tokens/s, higher is better)	217	n/a
Time to first token (seconds, lower is better)	0.54	n/a

Editorial Analysis

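Verdict. gpt-oss-120b (high) wins the overall benchmark tally 6-5 across the 11 benchmarks that have at least one score, but the tally needs context: only 6 benchmarks have scores for both models, and Claude Opus 4.6 (Non-reasoning, High Effort) leads 5 of those 6. The other 5 wins for gpt-oss-120b (high) come from benchmarks where no Claude score is listed. Raw benchmark count is also only one input to the decision.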

Pricing. The two models sit in different brackets for output-token pricing: gpt-oss-120b (high) is budget ($0.60/M) and Claude Opus 4.6 (Non-reasoning, High Effort) is premium ($25/M). At roughly 0.024× the per-million output-token cost, gpt-oss-120b (high) is meaningfully cheaper if your traffic is output-heavy (long completions, document generation, agent loops). Claude Opus 4.6 (Non-reasoning, High Effort) makes more sense when output volume is low and absolute reasoning quality justifies the premium.

Strengths. gpt-oss-120b (high) is strongest on AIME 2025 (93%), Math Index (93.4), LiveCodeBench (88%). Claude Opus 4.6 (Non-reasoning, High Effort) leads on GPQA Diamond (84%), TerminalBench (48%), SciCode (46%).

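Speed. gpt-oss-120b (high) generates tokens at 217 tok/s with a time-to-first-token of 0.54s (535ms). No throughput or time-to-first-token figures are listed for Claude Opus 4.6 (Non-reasoning, High Effort), so this data supports neither a speed ratio nor a latency winner. Time-to-first-token matters most for chat-style UIs, so measure it yourself if latency is a deciding factor.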

Provider. OpenAI and Anthropic sell to overlapping but distinct developer audiences. In this pairing, gpt-oss-120b is OpenAI's open-weight model and is listed at budget prices, while Claude Opus 4.6 carries premium pricing. Your existing vendor relationships, billing, and SLA preferences may matter as much as the raw numbers above.

Workload cost. At a baseline volume of 30M input + 15M output tokens per month, gpt-oss-120b (high) costs $13.50 ($162/year) and Claude Opus 4.6 (Non-reasoning, High Effort) costs $525.00 ($6,300/year). At a smaller 5M-input/2M-output scale (single-developer tool or prototype), gpt-oss-120b (high) costs about $1.95 and Claude Opus 4.6 (Non-reasoning, High Effort) about $75.00. At agent/realtime scale (200M input / 100M output), gpt-oss-120b (high) costs about $90 and Claude Opus 4.6 (Non-reasoning, High Effort) about $3,500. The absolute gap grows with volume, so gpt-oss-120b (high) becomes more attractive the more traffic you ship.
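The scenario figures above all follow from one formula: input volume times input price plus output volume times output price. The sketch below reproduces them from the listed per-million prices; the scenario labels and the `workload_cost` helper are illustrative names, not part of any API.

```python
# Prices in USD per 1M tokens, taken from the pricing table on this page.
PRICES = {
    "gpt-oss-120b (high)": (0.15, 0.60),
    "Claude Opus 4.6 (Non-reasoning, High Effort)": (5.00, 25.00),
}

# (input, output) volumes in millions of tokens; labels are illustrative.
SCENARIOS = {
    "prototype": (5, 2),
    "baseline": (30, 15),
    "agent/realtime": (200, 100),
}


def workload_cost(model: str, input_m: float, output_m: float) -> float:
    """Cost in USD for a volume given in millions of tokens."""
    in_price, out_price = PRICES[model]
    return round(input_m * in_price + output_m * out_price, 2)


for name, (inp, out) in SCENARIOS.items():
    for model in PRICES:
        cost = workload_cost(model, inp, out)
        print(f"{name:>15} | {model}: ${cost:,.2f}")
```

Swapping in your own token volumes is usually more informative than the preset scenarios, since the input-to-output mix changes the ratio between the two models.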

Recommendation. The 6-5 benchmark tally is too close to call (a 1-win swing) and is affected by missing Claude scores, but Claude Opus 4.6 (Non-reasoning, High Effort) is 41.7× more expensive per million output tokens. For most workloads the cheaper option, gpt-oss-120b (high), wins on the cost-quality tradeoff. Only pay the premium for Claude Opus 4.6 (Non-reasoning, High Effort) if you have a measured lift on a task your product depends on.

Head-to-head deltas

Throughput: gpt-oss-120b (high) runs at 217 tok/s. No throughput is listed for Claude Opus 4.6 (Non-reasoning, High Effort), so no ratio can be computed. For streaming chat or real-time agents, measure both before deciding.
Time-to-first-token: gpt-oss-120b (high) responds in 535ms. No TTFT is listed for Claude Opus 4.6 (Non-reasoning, High Effort). For interactive chat UIs this can matter more than raw benchmark wins.
Price: gpt-oss-120b (high) is 41.67× cheaper per million output tokens than Claude Opus 4.6 (Non-reasoning, High Effort). At scale, that ratio carries over almost directly to your monthly bill, which makes it the deciding factor for most production teams.

Benchmark Comparison

Data from Artificial Analysis API — 12 benchmarks

Benchmark	gpt-oss-120b (high)	Claude Opus 4.6 (Non-reasoning, High Effort)
Intelligence Index	23.8	37.8
Coding Index	30.4	n/a
Math Index	93.4	n/a
GPQA Diamond	78.2%	84.0%
MMLU-Pro	80.8%	n/a
LiveCodeBench	87.8%	n/a
AIME 2025	93.4%	n/a
MATH-500	n/a	n/a
Humanity's Last Exam	18.5%	18.6%
SciCode	38.9%	45.7%
IFBench	69.0%	44.6%
TerminalBench	23.5%	48.5%

Wins: gpt-oss-120b (high) 6, Claude Opus 4.6 (Non-reasoning, High Effort) 5
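The win count depends on how missing scores are treated. The sketch below tallies the benchmark scores listed above in two ways: counting a benchmark as a win when only one model has a score, and counting only benchmarks where both models have one. That the page uses the first rule is an assumption inferred from its 6-5 result; the `tally` helper is illustrative.

```python
# (gpt-oss-120b (high), Claude Opus 4.6) scores; None means no score listed.
SCORES = {
    "Intelligence Index": (23.8, 37.8),
    "Coding Index": (30.4, None),
    "Math Index": (93.4, None),
    "GPQA Diamond": (78.2, 84.0),
    "MMLU-Pro": (80.8, None),
    "LiveCodeBench": (87.8, None),
    "AIME 2025": (93.4, None),
    "MATH-500": (None, None),
    "Humanity's Last Exam": (18.5, 18.6),
    "SciCode": (38.9, 45.7),
    "IFBench": (69.0, 44.6),
    "TerminalBench": (23.5, 48.5),
}


def tally(scores, count_uncontested):
    """Return (wins_a, wins_b) over all benchmarks.

    If count_uncontested is True, a model with a score beats a model
    without one; otherwise such benchmarks are skipped.
    """
    wins_a = wins_b = 0
    for a, b in scores.values():
        if a is None and b is None:
            continue  # neither model has a score
        if a is None or b is None:
            if count_uncontested:
                if a is None:
                    wins_b += 1
                else:
                    wins_a += 1
            continue
        if a > b:
            wins_a += 1
        elif b > a:
            wins_b += 1
    return wins_a, wins_b


print("counting uncontested:", tally(SCORES, True))
print("head-to-head only:   ", tally(SCORES, False))
```

Under the first rule gpt-oss-120b (high) leads, while on the six benchmarks scored for both models Claude Opus 4.6 (Non-reasoning, High Effort) leads, so the headline count is best read alongside the per-benchmark rows.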

Frequently Asked Questions

Which is cheaper, gpt-oss-120b (high) or Claude Opus 4.6 (Non-reasoning, High Effort)?

gpt-oss-120b (high) is cheaper overall. Its blended price (3:1 input/output ratio) is $0.26/M tokens vs $10.00/M for Claude Opus 4.6 (Non-reasoning, High Effort).
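The blended price is a weighted average of the input and output prices. A minimal sketch, assuming the 3:1 ratio means three input tokens for every output token (the `blended_price` helper is an illustrative name):

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price
            + output_parts * output_price) / total_parts


gpt_oss = blended_price(0.15, 0.60)   # (3 * 0.15 + 0.60) / 4
claude = blended_price(5.00, 25.00)   # (3 * 5 + 25) / 4
print(f"gpt-oss-120b (high): ${gpt_oss:.2f}/M")
print(f"Claude Opus 4.6 (Non-reasoning, High Effort): ${claude:.2f}/M")
```

If your traffic is more output-heavy than 3:1, pass different `input_parts` and `output_parts` values; the gap between the two models widens as the output share grows.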

Which model performs better on benchmarks?

gpt-oss-120b (high) wins 6 out of 12 benchmarks compared to 5 for Claude Opus 4.6 (Non-reasoning, High Effort). See the detailed benchmark chart above for per-category results.

Which is faster for real-time applications?

gpt-oss-120b (high) generates tokens at 217 tok/s with a time-to-first-token of 0.54s. No speed or time-to-first-token data is listed for Claude Opus 4.6 (Non-reasoning, High Effort), so a direct comparison is not possible from this data.

When should I use gpt-oss-120b (high) vs Claude Opus 4.6 (Non-reasoning, High Effort)?

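Choose based on your priorities. gpt-oss-120b (high) leads on cost, on the overall benchmark tally (6-5), and on measured generation speed. Consider Claude Opus 4.6 (Non-reasoning, High Effort) if your task depends on the benchmarks where it leads, such as GPQA Diamond, SciCode, or TerminalBench. For latency-sensitive apps, measure TTFT yourself, since none is listed for Claude Opus 4.6 (Non-reasoning, High Effort).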
