Fastest AI Models

Find the fastest AI models ranked by tokens per second and time to first token. Perfect for real-time applications, streaming, and latency-sensitive workloads.

  • Output speed (tokens/s)
  • Time to first token (TTFT)
  • Throughput at scale
  • Quality-speed tradeoff
🥇 #1 Pick: Mercury 2 (Inception)
Overall Score: 90
Price: $0.38/M
Speed: 907 tok/s

🥈 #2 Pick: Granite 4.0 H Small (IBM)
Overall Score: 65
Price: $0.11/M
Speed: 524 tok/s

🥉 #3 Pick: NVIDIA Nemotron 3 Super 120B A12B (Reasoning) (NVIDIA)
Overall Score: 65
Price: $0.41/M
Speed: 370 tok/s
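| # | Model | Score | Benchmarks | Input $/M | Output $/M | Speed (tok/s) | TTFT (s) |
|---|-------|-------|------------|-----------|------------|---------------|----------|
| 1 | Mercury 2 (Inception) | 90 | 56 | $0.25 | $0.75 | 907 | 3.76 |
| 2 | Granite 4.0 H Small (IBM) | 65 | 29 | $0.06 | $0.25 | 524 | 8.68 |
| 3 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) (NVIDIA) | 65 | 59 | $0.30 | $0.75 | 370 | 0.55 |
| 4 | | 65 | 62 | $0.10 | $0.40 | 365 | 3.39 |
| 5 | | 64 | 69 | $0.06 | $0.20 | 316 | 0.41 |
| 6 | | 63 | 80 | $0.15 | $0.60 | 253 | 0.49 |
| 7 | | 62 | 51 | $0.10 | $0.40 | 344 | 0.36 |
| 8 | | 62 | 57 | $0.06 | $0.20 | 321 | 0.46 |
| 9 | | 62 | 93 | $1.25 | $10.00 | 216 | 12.05 |
| 10 | | 62 | 96 | $0.50 | $3.00 | 180 | 6.33 |
| 11 | | 61 | 78 | $0.75 | $4.50 | 219 | 3.82 |
| 12 | | 60 | 77 | $2.00 | $6.00 | 238 | 10.94 |
| 13 | | 60 | 85 | $0.25 | $2.00 | 186 | 6.58 |
| 14 | | 60 | 62 | $0.15 | $0.60 | 255 | 0.51 |
| 15 | | 59 | 72 | $0.30 | $0.50 | 197 | 0.38 |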

Scoring Weights for Fastest AI Models

Models are scored using a weighted combination of benchmarks, pricing, and speed metrics relevant to this use case.

Intelligence Index: 6%
Coding Index: 4%
MMLU-Pro: 4%
IFBench: 3%
Math Index: 3%
Price: 10%
Speed: 45%
Latency: 25%
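
As a rough illustration of how a weighted combination like this could be computed, here is a minimal Python sketch. The weights are copied from the list above. How each metric is normalized is not stated on this page, so the sketch simply assumes every metric has already been mapped to a 0-100 scale where higher is better; the metric keys and the `overall_score` function are hypothetical names, not the site's actual method.

```python
# Weights copied from the list above; they add up to 100%.
WEIGHTS = {
    "intelligence_index": 0.06,
    "coding_index": 0.04,
    "mmlu_pro": 0.04,
    "ifbench": 0.03,
    "math_index": 0.03,
    "price": 0.10,
    "speed": 0.45,
    "latency": 0.25,
}


def overall_score(normalized: dict) -> float:
    """Weighted sum of per-metric scores.

    Assumption: each value is already normalized to 0-100, higher is better
    (so a cheap price or a low latency maps to a HIGH normalized score).
    """
    missing = WEIGHTS.keys() - normalized.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return sum(WEIGHTS[name] * normalized[name] for name in WEIGHTS)


# Hypothetical model: top marks on speed and latency, 50 on everything else.
example = {name: 50.0 for name in WEIGHTS}
example["speed"] = 100.0
example["latency"] = 100.0
print(round(overall_score(example), 1))  # 85.0
```

Because speed and latency together carry 70% of the weight, a model with middling benchmark scores can still rank near the top of this list.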

💡 Tips

  • For real-time streaming UIs, TTFT under 0.5s feels instant to users
  • Faster models aren't always worse — some achieve excellent quality at high speed
  • Consider using fast models for draft generation, then a stronger model for refinement

⚠️ Things to Consider

  • Speed varies by provider, region, and current load
  • Benchmarked speeds are median values — peak and off-peak can differ significantly

Frequently Asked Questions

Which AI model is the fastest in 2026?

Speed depends on the metric. Check the rankings above for both output speed (tokens/s) and TTFT (time to first token). Smaller models and those optimized for inference typically lead.
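
To see why the metric matters, the sketch below estimates total response time as TTFT plus output tokens divided by output speed. This is a simplification that assumes a constant generation rate after the first token. The figures are the TTFT and speed values listed in rows 1 and 7 of the table above; the function names are illustrative.

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimated seconds until the last token arrives (constant-rate assumption)."""
    return ttft_s + output_tokens / tokens_per_s


def break_even_tokens(ttft_a: float, speed_a: float,
                      ttft_b: float, speed_b: float) -> float:
    """Output length at which models A and B finish at the same time.

    Solves ttft_a + n / speed_a == ttft_b + n / speed_b for n.
    """
    return (ttft_a - ttft_b) / (1 / speed_b - 1 / speed_a)


# Row 1: 3.76 s TTFT, 907 tok/s. Row 7: 0.36 s TTFT, 344 tok/s.
high_throughput = (3.76, 907.0)
low_ttft = (0.36, 344.0)

for n in (100, 5000):
    a = response_time(*high_throughput, n)
    b = response_time(*low_ttft, n)
    print(f"{n} tokens: row 1 = {a:.2f} s, row 7 = {b:.2f} s")

# About 1,884 tokens: below that the low-TTFT model finishes first.
print(round(break_even_tokens(*high_throughput, *low_ttft)))
```

Under these assumptions, short replies favor the low-TTFT model, while long generations favor the model with the higher tokens-per-second figure.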

Does faster mean worse quality?

Not necessarily. Some smaller models achieve excellent benchmark scores while being much faster. The rankings above show both speed and quality so you can find the best tradeoff.

What speed do I need for a real-time chat application?

For a good user experience, aim for a TTFT under 1 second and an output speed above 50 tok/s. For a premium feel, aim for a TTFT under 0.3 s and an output speed above 100 tok/s. Streaming helps mask latency.
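
The thresholds in this answer can be turned into a quick check. The following Python sketch is only an illustration: the `chat_experience` function and its tier labels are hypothetical, and the cut-offs are the ones quoted above.

```python
def chat_experience(ttft_s: float, tokens_per_s: float) -> str:
    """Classify a model against the chat thresholds quoted above."""
    if ttft_s < 0.3 and tokens_per_s > 100:
        return "premium"
    if ttft_s < 1.0 and tokens_per_s > 50:
        return "good"
    return "below target"


# TTFT and speed values as listed in the ranking table above.
print(chat_experience(0.36, 344))  # row 7: good
print(chat_experience(0.55, 370))  # row 3: good
print(chat_experience(3.76, 907))  # row 1: below target (TTFT over 1 s)
```

Note that the top-ranked model by overall score misses the TTFT target here, which is why it is worth checking both metrics rather than output speed alone.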