NVIDIA

Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)

AI model by NVIDIA. Real-time pricing and benchmark data.

Pricing (per 1M tokens)

Input: $0.60
Output: $1.80
Blended (3:1): $0.90

Source: Artificial Analysis
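
The blended figure can be reproduced from the input and output prices. The sketch below assumes that "3:1" means three input tokens for every output token and that the blend is a simple weighted average; the function name `blended_price` is hypothetical and used only for illustration.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output token ratio.

    Assumption: "Blended (3:1)" weights input tokens 3x relative to output tokens.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Prices per 1M tokens, taken from the listing above.
blended = blended_price(input_price=0.60, output_price=1.80)
print(f"Blended (3:1): ${blended:.2f}")  # prints "Blended (3:1): $0.90"
```

Under that assumption the result matches the listed $0.90, since (3 x 0.60 + 1 x 1.80) / 4 = 0.90.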

Performance

Output Speed: 41 tok/s
Time to First Token: 723 ms

Median values from Artificial Analysis
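
The two figures above can be combined into a rough estimate of how long a full response takes. This is a simplified sketch, not a measurement: it assumes generation runs at the median output speed for the whole response, and it ignores network overhead and any additional reasoning tokens the model may emit. The function name `estimated_response_seconds` is hypothetical.

```python
def estimated_response_seconds(output_tokens: int,
                               ttft_ms: float = 723.0,
                               tokens_per_second: float = 41.0) -> float:
    """Rough latency: time to first token plus steady-state generation time.

    Assumption: throughput stays at the median speed for the whole response.
    """
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


# Example: a 500-token response at the median figures listed above.
estimate = estimated_response_seconds(500)
print(f"~{estimate:.1f} s for 500 output tokens")  # prints "~12.9 s for 500 output tokens"
```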

Compare with similar models

Model                                                    Input (per 1M)   Output (per 1M)   Speed
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) (current)   $0.60            $1.80             41 tok/s
MiMo-V2.5                                                $0.36            $1.80             88 tok/s
GLM-4.5V (Non-reasoning)                                 $0.60            $1.80             53 tok/s
GLM-4.5V (Reasoning)                                     $0.60            $1.80             52 tok/s
Qwen3 Coder 480B A35B Instruct                           $0.30            $1.80             67 tok/s
Qwen3 235B A22B (Non-reasoning)                          $0.45            $1.80             68 tok/s
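
Every model in the comparison above has the same output price ($1.80), so the listed differences come down to input price and output speed. The sketch below is one way to rank the rows on those two columns: cheapest input first, with faster speed breaking ties. The `ModelRow` type and the ordering rule are illustrative choices, not part of the source data.

```python
from typing import NamedTuple


class ModelRow(NamedTuple):
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    speed: int           # output tokens per second


# Rows copied from the comparison table above.
rows = [
    ModelRow("Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)", 0.60, 1.80, 41),
    ModelRow("MiMo-V2.5", 0.36, 1.80, 88),
    ModelRow("GLM-4.5V (Non-reasoning)", 0.60, 1.80, 53),
    ModelRow("GLM-4.5V (Reasoning)", 0.60, 1.80, 52),
    ModelRow("Qwen3 Coder 480B A35B Instruct", 0.30, 1.80, 67),
    ModelRow("Qwen3 235B A22B (Non-reasoning)", 0.45, 1.80, 68),
]

# Sort by input price ascending, then by speed descending (illustrative rule).
ranked = sorted(rows, key=lambda r: (r.input_price, -r.speed))

for row in ranked:
    print(f"{row.name}: ${row.input_price:.2f} in, {row.speed} tok/s")
```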

Example Costs

Single Request: $0.0015 (1.0K in / 500 out)
1K Requests/day: $1.50 (1.0M in / 500.0K out)
10K Requests/day: $15.00 (10.0M in / 5.0M out)
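
The example costs above follow directly from the per-1M-token prices. A minimal sketch of the arithmetic, assuming each request uses 1,000 input tokens and 500 output tokens as in the single-request example; `request_cost` is a hypothetical helper name.

```python
INPUT_PRICE_PER_M = 0.60   # USD per 1M input tokens (from the pricing section)
OUTPUT_PRICE_PER_M = 1.80  # USD per 1M output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-1M-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


single = request_cost(1_000, 500)
print(f"Single request:   ${single:.4f}")           # prints "$0.0015"
print(f"1K requests/day:  ${single * 1_000:.2f}")   # prints "$1.50"
print(f"10K requests/day: ${single * 10_000:.2f}")  # prints "$15.00"
```

The input side contributes $0.0006 and the output side $0.0009 per request, so output tokens account for most of the cost even though there are half as many of them.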