Compare Models

A side-by-side comparison of Llama 3.1 405B and Mistral Large 2 across benchmarks, capabilities, pricing, and performance metrics.

[Capability radar chart: Llama 3.1 405B vs Mistral Large 2]

Metric                        Llama 3.1 405B (Meta)    Mistral Large 2 (Mistral AI)
--------------------------    ---------------------    ----------------------------
Tags                          Flagship, Open source    Flagship, Open source
Overall Score                 77.6                     73.9
Arena ELO                     1266                     1232
MMLU                          88.6%                    84.0%
HumanEval                     89.0%                    92.0%
MATH                          73.8%                    69.3%
GPQA                          51.1%                    45.0%
MT-Bench                      8.9/10                   8.6/10
Context Window                128K tokens              128K tokens
Avg Response Time             1200 ms                  650 ms
Input Cost / 1M tokens        $3.00                    $3.00
Output Cost / 1M tokens       $3.00                    $9.00
Open source                   Yes                      Yes
Multimodal                    No                       No
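
Input pricing is identical, but output pricing differs threefold, so which model is cheaper depends on your traffic mix. Below is a minimal sketch of the per-request cost arithmetic using the list prices from the table; the PRICES dictionary and request_cost function are illustrative helpers, not any provider's API.

    # Per-1M-token list prices from the comparison above (USD).
    # Illustrative data structure, not a provider API.
    PRICES = {
        "llama-3.1-405b":  {"input": 3.00, "output": 3.00},
        "mistral-large-2": {"input": 3.00, "output": 9.00},
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimated USD cost of one request at the listed per-1M rates."""
        p = PRICES[model]
        return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

    # Example: a generation-heavy request (2K prompt tokens, 8K completion tokens).
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 2_000, 8_000):.4f}")
    # llama-3.1-405b: $0.0300
    # mistral-large-2: $0.0780

For generation-heavy workloads like the example above, Mistral Large 2's $9.00 output rate makes it roughly 2.6x more expensive per request, despite identical input pricing.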
Overall Winner: Llama 3.1 405B 🏆
Leads by 3.7 points on overall score (77.6 vs 73.9).
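
The overall score masks per-benchmark splits: Mistral Large 2 leads on HumanEval. A quick sketch tallying head-to-head wins across the six benchmarks in the table; the BENCHMARKS dictionary is an illustrative restatement of the data above.

    # Benchmark scores from the table above, as (Llama 3.1 405B, Mistral Large 2).
    # Higher is better for every metric listed here.
    BENCHMARKS = {
        "Arena ELO": (1266, 1232),
        "MMLU":      (88.6, 84.0),
        "HumanEval": (89.0, 92.0),
        "MATH":      (73.8, 69.3),
        "GPQA":      (51.1, 45.0),
        "MT-Bench":  (8.9, 8.6),
    }

    llama_wins = sum(a > b for a, b in BENCHMARKS.values())
    mistral_wins = sum(b > a for a, b in BENCHMARKS.values())
    print(f"Llama 3.1 405B wins {llama_wins}/6, Mistral Large 2 wins {mistral_wins}/6")
    # Llama 3.1 405B wins 5/6, Mistral Large 2 wins 1/6

So Llama 3.1 405B leads on five of six benchmarks, while Mistral Large 2 takes HumanEval and responds roughly twice as fast (650 ms vs 1200 ms average).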