
LFM2 24B A2B vs Llama 3.1 Instruct 8B

Side-by-side comparison of pricing, 12 benchmarks, and generation speed.

LFM2 24B A2B (Liquid AI)
Input: $0.03/M
Output: $0.12/M
Speed: 211 tok/s
TTFT: 0.22s

Llama 3.1 Instruct 8B (Meta)
Input: $0.10/M
Output: $0.10/M
Speed: 194 tok/s
TTFT: 0.47s

Winner by Category

Cheaper: LFM2 24B A2B
Faster (tok/s): LFM2 24B A2B
Lower latency: LFM2 24B A2B
Benchmarks (2 vs 10 wins): Llama 3.1 Instruct 8B

Pricing Comparison

Metric               LFM2 24B A2B   Llama 3.1 Instruct 8B
Input ($/M tokens)   $0.03          $0.10
Output ($/M tokens)  $0.12          $0.10

Cost for 1M input + 100K output tokens:
LFM2 24B A2B: $0.04
Llama 3.1 Instruct 8B: $0.11
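
The workload cost above follows directly from the per-million-token prices. The following is a minimal sketch of that arithmetic; the `PRICES` dictionary and the `workload_cost` helper are illustrative names, not part of any provider's API, and the prices are simply the ones listed on this page.

```python
# Prices in USD per million tokens, copied from the pricing table on this page.
PRICES = {
    "LFM2 24B A2B": {"input": 0.03, "output": 0.12},
    "Llama 3.1 Instruct 8B": {"input": 0.10, "output": 0.10},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Reproduce the example workload: 1M input tokens plus 100K output tokens.
for name in PRICES:
    print(f"{name}: ${workload_cost(name, 1_000_000, 100_000):.3f}")
```

For this workload the sketch gives about $0.042 for LFM2 24B A2B and $0.110 for Llama 3.1 Instruct 8B, which the page rounds to $0.04 and $0.11.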

Speed Comparison

Output speed (tokens/s, higher is better)
LFM2 24B A2B: 211 tok/s
Llama 3.1 Instruct 8B: 194 tok/s

Time to first token (seconds, lower is better)
LFM2 24B A2B: 0.22s
Llama 3.1 Instruct 8B: 0.47s

Benchmark Comparison

Data from the Artificial Analysis API (12 benchmarks).

Benchmark              LFM2 24B A2B   Llama 3.1 Instruct 8B
Intelligence Index     10.5           11.8
Coding Index           3.6            4.9
Math Index             -              4.3
GPQA Diamond           47.4%          25.9%
MMLU-Pro               -              47.6%
LiveCodeBench          -              11.6%
AIME 2025              -              4.3%
MATH-500               -              51.9%
Humanity's Last Exam   4.4%           5.1%
SciCode                10.9%          13.2%
IFBench                45.9%          28.6%
TerminalBench          0.0%           0.8%

Wins: LFM2 24B A2B 2, Llama 3.1 Instruct 8B 10

Frequently Asked Questions

Which is cheaper, LFM2 24B A2B or Llama 3.1 Instruct 8B?

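LFM2 24B A2B is cheaper for most workloads. Its blended price (3:1 input/output ratio) is $0.05/M tokens vs $0.10/M for Llama 3.1 Instruct 8B. Its output price is higher ($0.12/M vs $0.10/M), so Llama 3.1 Instruct 8B becomes the cheaper option only when output tokens exceed 3.5 times input tokens.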
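
The blended price in this answer is a weighted average of the input and output prices. The sketch below shows one way to compute it, assuming the 3:1 ratio means three input tokens for every output token; `blended_price` and `break_even_output_ratio` are hypothetical helper names introduced here for illustration.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3, output_parts: float = 1) -> float:
    """Weighted average price ($/M tokens) for a given input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def break_even_output_ratio(in_a: float, out_a: float,
                            in_b: float, out_b: float) -> float:
    """Output-to-input token ratio at which models A and B cost the same.

    Solves in_a + out_a * r == in_b + out_b * r for r.
    Assumes the two output prices differ.
    """
    return (in_b - in_a) / (out_a - out_b)


# Prices in $/M tokens as listed on this page.
lfm2 = blended_price(0.03, 0.12)
llama = blended_price(0.10, 0.10)
ratio = break_even_output_ratio(0.03, 0.12, 0.10, 0.10)

print(f"LFM2 24B A2B blended: ${lfm2:.4f}/M")
print(f"Llama 3.1 Instruct 8B blended: ${llama:.4f}/M")
print(f"Break-even output/input ratio: {ratio:.2f}")
```

At the listed prices the blended figures come to roughly $0.0525/M and $0.10/M, and the two models cost the same when output volume is about 3.5 times input volume.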

Which model performs better on benchmarks?

Llama 3.1 Instruct 8B wins 10 of the 12 benchmarks. LFM2 24B A2B wins the other 2: GPQA Diamond (47.4% vs 25.9%) and IFBench (45.9% vs 28.6%). See the benchmark comparison above for per-benchmark results.

Which is faster for real-time applications?

LFM2 24B A2B is faster on both measures: it generates 211 tok/s vs 194 tok/s and has a lower time to first token (0.22s vs 0.47s).
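
To see how the two speed metrics combine, a rough end-to-end estimate adds the time to first token to the time needed to stream the remaining output. This is a simplified model that assumes a constant generation rate and ignores network and queueing variance; `response_time` is an illustrative helper, not a measured figure from Artificial Analysis.

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimate seconds to receive a full response.

    Simplifying assumption: after the first token arrives, tokens stream
    at a constant rate equal to the reported output speed.
    """
    return ttft_s + output_tokens / tokens_per_s


# Speed and TTFT values as listed on this page; 500 output tokens is an arbitrary example.
lfm2_time = response_time(0.22, 211, 500)
llama_time = response_time(0.47, 194, 500)

print(f"LFM2 24B A2B: {lfm2_time:.2f}s")
print(f"Llama 3.1 Instruct 8B: {llama_time:.2f}s")
```

Under these assumptions a 500-token reply takes about 2.59s on LFM2 24B A2B and about 3.05s on Llama 3.1 Instruct 8B. For short replies the TTFT gap dominates; for long ones the tok/s gap matters more.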

When should I use LFM2 24B A2B vs Llama 3.1 Instruct 8B?

Choose based on your priorities. Pick LFM2 24B A2B for lower cost, faster generation (211 vs 194 tok/s), and lower latency (0.22s vs 0.47s TTFT). Pick Llama 3.1 Instruct 8B for stronger overall benchmark performance (10 of 12 wins).