Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) vs GLM-4.5V (Non-reasoning)

Side-by-side comparison of pricing, 12 benchmarks, and generation speed.

NVIDIA

Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)

Input: $0.6/M
Output: $1.8/M
Speed: 42 tok/s
TTFT: 0.68s
Z AI

GLM-4.5V (Non-reasoning)

Input: $0.6/M
Output: $1.8/M
Speed: 69 tok/s
TTFT: 26.97s

Winner by Category

Cheaper: Tie
Faster (tok/s): GLM-4.5V (Non-reasoning)
Lower Latency: Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
Benchmarks (11-1): Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)

Pricing Comparison

Metric | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | GLM-4.5V (Non-reasoning)
Input ($/M tokens) | $0.6 | $0.6
Output ($/M tokens) | $1.8 | $1.8

Cost for 1M input + 100K output tokens:
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning): $0.78
GLM-4.5V (Non-reasoning): $0.78
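
The $0.78 figure follows directly from the per-million-token prices listed above. A minimal sketch of the arithmetic, where the `PRICES` table and the `request_cost` helper are illustrative names rather than part of any provider API:

```python
# Prices in USD per million tokens, copied from the pricing comparison above.
PRICES = {
    "Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)": {"input": 0.6, "output": 1.8},
    "GLM-4.5V (Non-reasoning)": {"input": 0.6, "output": 1.8},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD for the given token counts (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    for model in PRICES:
        # 1M input tokens + 100K output tokens, as in the example above.
        print(f"{model}: ${request_cost(model, 1_000_000, 100_000):.2f}")
```

Because a reasoning model typically emits extra output tokens while it reasons, the same prompt may cost more on the reasoning model in practice even though the listed prices are identical.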

Speed Comparison

Output Speed (tokens/s; higher is better)
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning): 42 tok/s
GLM-4.5V (Non-reasoning): 69 tok/s

Time to First Token (seconds; lower is better)
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning): 0.68s
GLM-4.5V (Non-reasoning): 26.97s
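
Output speed and time to first token pull in opposite directions here, so it can help to combine them into one end-to-end estimate. The sketch below assumes a simple model that is not stated on this page: total time equals TTFT plus output tokens divided by tokens per second, with a constant generation rate. The function names are illustrative.

```python
def total_time(ttft_s, tok_per_s, output_tokens):
    """Estimated seconds to receive a full response (assumed simple model)."""
    return ttft_s + output_tokens / tok_per_s


def break_even_tokens(ttft_a, speed_a, ttft_b, speed_b):
    """Output length at which models A and B finish at the same time.

    Solves ttft_a + n / speed_a == ttft_b + n / speed_b for n.
    Only meaningful when one model starts sooner and the other generates faster.
    """
    return (ttft_b - ttft_a) / (1 / speed_a - 1 / speed_b)


if __name__ == "__main__":
    nemotron = (0.68, 42)   # (TTFT in seconds, tok/s) from the figures above
    glm = (26.97, 69)

    for n in (100, 500, 2000, 5000):
        print(n, round(total_time(*nemotron, n), 1), round(total_time(*glm, n), 1))

    print("break-even tokens:", round(break_even_tokens(*nemotron, *glm)))
```

Under these assumptions the break-even point is roughly 2,800 output tokens: below that, the lower TTFT of Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) outweighs GLM-4.5V's higher throughput. Measured values vary by provider and load, so treat the result as a rough guide.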

Benchmark Comparison

Data from Artificial Analysis API — 12 benchmarks

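Benchmark | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | GLM-4.5V (Non-reasoning)
Intelligence Index | 15.0 | 12.7
Coding Index | 13.1 | 10.8
Math Index | 63.7 | 15.3
GPQA Diamond | 72.8% | 57.3%
MMLU-Pro | 82.5% | 75.1%
LiveCodeBench | 64.1% | 35.2%
AIME 2025 | 63.7% | 15.3%
MATH-500 | 95.2% | (no score listed)
Humanity's Last Exam | 8.1% | 3.6%
SciCode | 34.7% | 18.8%
IFBench | 38.2% | 28.6%
TerminalBench | 2.3% | 6.8%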
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning): 11 wins
GLM-4.5V (Non-reasoning): 1 win
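
The 11-1 tally can be reproduced from the scores listed above. This is a minimal sketch under two assumptions that the page does not confirm: the single MATH-500 score belongs to Llama 3.1 Nemotron Ultra 253B v1 (Reasoning), and a benchmark with only one listed score counts as a win for the model that has it. `SCORES` and `tally_wins` are illustrative names.

```python
# (Nemotron score, GLM-4.5V score); None marks a score that is not listed.
SCORES = {
    "Intelligence Index": (15.0, 12.7),
    "Coding Index": (13.1, 10.8),
    "Math Index": (63.7, 15.3),
    "GPQA Diamond": (72.8, 57.3),
    "MMLU-Pro": (82.5, 75.1),
    "LiveCodeBench": (64.1, 35.2),
    "AIME 2025": (63.7, 15.3),
    "MATH-500": (95.2, None),  # assumption: the lone score is Nemotron's
    "Humanity's Last Exam": (8.1, 3.6),
    "SciCode": (34.7, 18.8),
    "IFBench": (38.2, 28.6),
    "TerminalBench": (2.3, 6.8),
}


def tally_wins(scores):
    """Count wins per model; a missing score loses, and equal scores count for neither."""
    wins_a = wins_b = 0
    for a, b in scores.values():
        if a is None and b is None:
            continue  # nothing to compare
        if b is None or (a is not None and a > b):
            wins_a += 1
        elif a is None or b > a:
            wins_b += 1
    return wins_a, wins_b


if __name__ == "__main__":
    nemotron_wins, glm_wins = tally_wins(SCORES)
    print(f"Nemotron: {nemotron_wins} wins, GLM-4.5V: {glm_wins} wins")
```

If the missing MATH-500 score is excluded instead of counted as a loss, the tally becomes 10-1 over 11 benchmarks.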

Frequently Asked Questions

Which is cheaper, Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) or GLM-4.5V (Non-reasoning)?

Neither. Both models are priced identically at $0.6 per million input tokens and $1.8 per million output tokens, so a workload of 1M input and 100K output tokens costs $0.78 on either one.

Which model performs better on benchmarks?

Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) wins 11 out of 12 benchmarks compared to 1 for GLM-4.5V (Non-reasoning). See the detailed benchmark chart above for per-category results.

Which is faster for real-time applications?

GLM-4.5V (Non-reasoning) generates tokens faster, at 69 tok/s vs 42 tok/s. However, Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) has a much lower time-to-first-token (0.68s vs 26.97s), so it starts responding far sooner.

When should I use Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) vs GLM-4.5V (Non-reasoning)?

The two models are priced identically, so choose based on performance. Pick Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) for stronger benchmark results (11 of 12 benchmarks) and for latency-sensitive apps, since its time-to-first-token is 0.68s vs 26.97s. Pick GLM-4.5V (Non-reasoning) when sustained output speed matters most (69 tok/s vs 42 tok/s).