Find the fastest AI models ranked by tokens per second and time to first token. Perfect for real-time applications, streaming, and latency-sensitive workloads.

| # | Model | Score | Benchmarks | Input ($/M tokens) | Output ($/M tokens) | Speed (tok/s) | TTFT |
|---|---|---|---|---|---|---|---|
| 1 | Mercury 2 Inception | 90 | 56 | $0.25 | $0.75 | 907 | 3.76s |
| 2 | | 65 | 29 | $0.06 | $0.25 | 524 | 8.68s |
| 3 | | 65 | 59 | $0.30 | $0.75 | 370 | 0.55s |
| 4 | | 65 | 62 | $0.10 | $0.40 | 365 | 3.39s |
| 5 | gpt-oss-20B (high) OpenAI | 64 | 69 | $0.06 | $0.20 | 316 | 0.41s |
| 6 | gpt-oss-120B (high) OpenAI | 63 | 80 | $0.15 | $0.60 | 253 | 0.49s |
| 7 | | 62 | 51 | $0.10 | $0.40 | 344 | 0.36s |
| 8 | gpt-oss-20B (low) OpenAI | 62 | 57 | $0.06 | $0.20 | 321 | 0.46s |
| 9 | GPT-5 Codex (high) OpenAI | 62 | 93 | $1.25 | $10.00 | 216 | 12.05s |
| 10 | | 62 | 96 | $0.50 | $3.00 | 180 | 6.33s |
| 11 | GPT-5.4 mini (xhigh) OpenAI | 61 | 78 | $0.75 | $4.50 | 219 | 3.82s |
| 12 | | 60 | 77 | $2.00 | $6.00 | 238 | 10.94s |
| 13 | | 60 | 85 | $0.25 | $2.00 | 186 | 6.58s |
| 14 | gpt-oss-120B (low) OpenAI | 60 | 62 | $0.15 | $0.60 | 255 | 0.51s |
| 15 | | 59 | 72 | $0.30 | $0.50 | 197 | 0.38s |

The Score column is a weighted combination of benchmark results, pricing, and the speed metrics relevant to this use case.
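
Which model is fastest depends on the metric. Output speed (tokens/s) measures how quickly text is generated once the response starts, while TTFT (time to first token) measures how long the response takes to start. Check the rankings above for both. Smaller models and those optimized for inference typically lead.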
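
To see how the two metrics interact, here is a rough sketch that estimates the time until a full response has arrived. It assumes a simplified model that is not stated on this page: total time equals TTFT plus output tokens divided by output speed, with speed held constant. The function names are hypothetical, and the figures are taken from the table rows for Mercury 2 and gpt-oss-20B (high).

```python
def total_latency(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimated seconds until the last token arrives (assumes constant output speed)."""
    return ttft_s + output_tokens / tokens_per_s


def break_even_tokens(ttft_a: float, speed_a: float, ttft_b: float, speed_b: float) -> float:
    """Output length at which model A (slower start, faster output) catches up with model B.

    Solves ttft_a + n / speed_a == ttft_b + n / speed_b for n.
    Only meaningful when ttft_a > ttft_b and speed_a > speed_b.
    """
    return (ttft_a - ttft_b) / (1 / speed_b - 1 / speed_a)


# Figures from the table above: (TTFT in seconds, output speed in tok/s)
mercury_2 = (3.76, 907)
gpt_oss_20b_high = (0.41, 316)

for n in (500, 5000):
    a = total_latency(*mercury_2, n)
    b = total_latency(*gpt_oss_20b_high, n)
    print(f"{n} tokens: Mercury 2 ~{a:.2f}s, gpt-oss-20B (high) ~{b:.2f}s")

print(f"break-even near {break_even_tokens(*mercury_2, *gpt_oss_20b_high):.0f} tokens")
```

Under these assumptions, the model with the lower TTFT finishes short responses first, and the model with the higher output speed only pulls ahead on long outputs, so the better choice depends on typical response length.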

Faster models are not necessarily lower quality. Some smaller models achieve excellent benchmark scores while being much faster. The rankings above show both speed and quality so you can find the best tradeoff.

For a good user experience, aim for TTFT under 1 second and output speed above 50 tok/s. For a premium feel, aim for TTFT under 0.3 s and output speed above 100 tok/s. Streaming helps mask latency.
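
The thresholds above can be turned into a simple check. This is a minimal sketch: the function name and the tier labels ("premium", "good", "below target") are illustrative choices, not terms defined by the rankings.

```python
def experience_tier(ttft_s: float, tokens_per_s: float) -> str:
    """Classify a model against the TTFT and output-speed targets described above."""
    if ttft_s < 0.3 and tokens_per_s > 100:
        return "premium"
    if ttft_s < 1.0 and tokens_per_s > 50:
        return "good"
    return "below target"


# Rows from the table above: (TTFT in seconds, output speed in tok/s)
print(experience_tier(0.41, 316))  # gpt-oss-20B (high)
print(experience_tier(3.76, 907))  # Mercury 2
```

Both conditions must hold for a tier, so a model with very high output speed can still miss the target if its TTFT is several seconds.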