
gpt-oss-120B (low) vs Gemini 2.0 Flash (Feb '25)

Side-by-side comparison of pricing, 12 benchmarks, and generation speed.

OpenAI

gpt-oss-120B (low)

Input: $0.15/M
Output: $0.60/M
Speed: 232 tok/s
TTFT: 0.55s
Google

Gemini 2.0 Flash (Feb '25)

Input: $0.15/M
Output: $0.60/M
Speed: not reported
TTFT: not reported

Winner by Category

Cheaper: Tie
Faster (tok/s): gpt-oss-120B (low)
Lower Latency: not determined (no TTFT data reported for Gemini 2.0 Flash (Feb '25))
Benchmarks (9-3): gpt-oss-120B (low)

Pricing Comparison

Metric | gpt-oss-120B (low) | Gemini 2.0 Flash (Feb '25)
Input ($/M tokens) | $0.15 | $0.15
Output ($/M tokens) | $0.60 | $0.60

Cost for 1M input + 100K output tokens:
gpt-oss-120B (low): $0.21
Gemini 2.0 Flash (Feb '25): $0.21
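The blended figure follows directly from the per-million-token prices. A minimal sketch of the arithmetic (prices taken from the table above; the helper name is illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars, given per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_m \
         + (output_tokens / 1e6) * out_price_per_m

# Both models charge $0.15/M input and $0.60/M output
print(f"${request_cost(1_000_000, 100_000, 0.15, 0.60):.2f}")  # → $0.21
```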

Speed Comparison

Output Speed (tokens/s) — higher is better
gpt-oss-120B (low): 232 tok/s
Gemini 2.0 Flash (Feb '25): not reported
Time to First Token (seconds) — lower is better
gpt-oss-120B (low): 0.55s
Gemini 2.0 Flash (Feb '25): not reported
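With only gpt-oss-120B (low)'s figures available, its end-to-end response time can still be estimated as TTFT plus streaming time. A rough sketch, assuming a steady decode rate:

```python
def generation_time(n_tokens: int, ttft_s: float, tokens_per_s: float) -> float:
    """Rough wall-clock estimate: time to first token, then steady streaming."""
    return ttft_s + n_tokens / tokens_per_s

# gpt-oss-120B (low): 0.55s TTFT, 232 tok/s
print(f"{generation_time(500, 0.55, 232):.2f}s")  # ~2.71s for a 500-token reply
```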

Benchmark Comparison

Data from Artificial Analysis API — 12 benchmarks

Benchmark | gpt-oss-120B (low) | Gemini 2.0 Flash (Feb '25)
Intelligence Index | 24.5 | 18.5
Coding Index | 15.5 | 13.6
Math Index | 66.7 | 21.7
GPQA Diamond | 67.2% | 62.3%
MMLU-Pro | 77.5% | 77.9%
LiveCodeBench | 70.7% | 33.4%
AIME 2025 | 66.7% | 21.7%
MATH-500 | 93.0% | —
Humanity's Last Exam | 5.2% | 5.3%
SciCode | 36.0% | 33.3%
IFBench | 58.3% | 40.2%
TerminalBench | 5.3% | 3.8%

gpt-oss-120B (low): 9 wins · Gemini 2.0 Flash (Feb '25): 3 wins
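The win tally can be checked directly against the table. A sketch using the eleven rows where both scores are shown (MATH-500 is excluded because one of its values is missing above, which presumably accounts for the page's third Gemini win); higher is better for every metric listed:

```python
# (gpt-oss-120B (low), Gemini 2.0 Flash (Feb '25)) scores from the table above
scores = {
    "Intelligence Index": (24.5, 18.5),
    "Coding Index": (15.5, 13.6),
    "Math Index": (66.7, 21.7),
    "GPQA Diamond": (67.2, 62.3),
    "MMLU-Pro": (77.5, 77.9),
    "LiveCodeBench": (70.7, 33.4),
    "AIME 2025": (66.7, 21.7),
    "Humanity's Last Exam": (5.2, 5.3),
    "SciCode": (36.0, 33.3),
    "IFBench": (58.3, 40.2),
    "TerminalBench": (5.3, 3.8),
}
gpt_wins = sum(a > b for a, b in scores.values())
gemini_wins = sum(b > a for a, b in scores.values())
print(gpt_wins, gemini_wins)  # → 9 2 (of the 11 rows with both values)
```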

Frequently Asked Questions

Which is cheaper, gpt-oss-120B (low) or Gemini 2.0 Flash (Feb '25)?

Neither: the two models are priced identically at $0.15 per million input tokens and $0.60 per million output tokens, which works out to $0.21 for 1M input + 100K output tokens on either model.

Which model performs better on benchmarks?

gpt-oss-120B (low) wins 9 out of 12 benchmarks compared to 3 for Gemini 2.0 Flash (Feb '25). See the detailed benchmark chart above for per-category results.

Which is faster for real-time applications?

gpt-oss-120B (low) generates 232 tok/s with a 0.55s time-to-first-token. No output-speed or TTFT figures are reported for Gemini 2.0 Flash (Feb '25) in this comparison, so a speed winner cannot be called; benchmark both under your own workload for latency-sensitive applications.

When should I use gpt-oss-120B (low) vs Gemini 2.0 Flash (Feb '25)?

Choose based on your priorities: pricing is identical, and gpt-oss-120B (low) leads on both benchmark performance (9 of 12 wins) and measured generation speed (232 tok/s). Since no TTFT data is reported for Gemini 2.0 Flash (Feb '25), run your own latency tests before committing a latency-sensitive app.