Compare Models: Side by Side
A side-by-side comparison of two flagship models across benchmarks, capabilities, pricing, and performance metrics.
[Capability radar chart: Llama 3.1 405B vs Mistral Large 2]
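The radar chart plots metrics with different scales (percentages, Arena ELO, a /10 rating) on shared axes, which implies some rescaling step. The page's actual method isn't shown; below is a minimal TypeScript sketch assuming min-max normalization onto 0-100 axes, with the axis ranges chosen purely for illustration.

```ts
// Hypothetical rescaling of heterogeneous benchmark scales onto shared
// 0-100 radar axes. Axis ranges are illustrative assumptions, not the
// page's actual configuration.
type Axis = { name: string; min: number; max: number };

const axes: Axis[] = [
  { name: "MMLU", min: 0, max: 100 },          // percentage
  { name: "Arena ELO", min: 1000, max: 1400 }, // assumed plotting range
  { name: "MT-Bench", min: 0, max: 10 },       // rating out of 10
];

function toRadarValue(raw: number, axis: Axis): number {
  // Clamp to the axis range, then rescale linearly to 0-100.
  const clamped = Math.min(Math.max(raw, axis.min), axis.max);
  return ((clamped - axis.min) / (axis.max - axis.min)) * 100;
}

// Llama 3.1 405B's Arena ELO of 1266 lands at 66.5 on its axis:
console.log(toRadarValue(1266, axes[1])); // 66.5
```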
| Metric                   | Llama 3.1 405B (Meta) | Mistral Large 2 (Mistral AI) |
|--------------------------|-----------------------|------------------------------|
| Tier                     | Flagship              | Flagship                     |
| Overall Score            | 77.6                  | 73.9                         |
| Arena ELO                | 1266                  | 1232                         |
| MMLU                     | 88.6%                 | 84.0%                        |
| HumanEval                | 89.0%                 | 92.0%                        |
| MATH                     | 73.8%                 | 69.3%                        |
| GPQA                     | 51.1%                 | 45.0%                        |
| MT-Bench                 | 8.9/10                | 8.6/10                       |
| Context Window           | 128K tokens           | 128K tokens                  |
| Avg Response Time        | 1,200 ms              | 650 ms                       |
| Input Cost / 1M tokens   | $3.00                 | $3.00                        |
| Output Cost / 1M tokens  | $3.00                 | $9.00                        |
| Open Source              | ✓ Yes                 | ✓ Yes                        |
| Multimodal               | ✗ No                  | ✗ No                         |
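The per-1M-token prices make workload cost estimates straightforward, assuming cost scales linearly with token counts (the usual convention for this pricing model). A short sketch; `estimateCost` and the `pricing` record are illustrative names, not an API from either vendor:

```ts
// Estimate a workload's cost from per-1M-token prices (linear scaling assumed).
interface Pricing { inputPer1M: number; outputPer1M: number }

const pricing: Record<string, Pricing> = {
  "Llama 3.1 405B": { inputPer1M: 3.0, outputPer1M: 3.0 },
  "Mistral Large 2": { inputPer1M: 3.0, outputPer1M: 9.0 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = pricing[model];
  return (inputTokens / 1e6) * p.inputPer1M + (outputTokens / 1e6) * p.outputPer1M;
}

// A generation-heavy workload: 200K input tokens, 800K output tokens.
// Llama 3.1 405B:  0.2 * $3 + 0.8 * $3 = $3.00
// Mistral Large 2: 0.2 * $3 + 0.8 * $9 = $7.80
console.log(estimateCost("Llama 3.1 405B", 200_000, 800_000));  // 3.0
console.log(estimateCost("Mistral Large 2", 200_000, 800_000)); // 7.8
```

Identical input pricing means the 3x gap in output pricing dominates for generation-heavy workloads, despite Mistral Large 2's faster average response time.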
Overall Winner: Llama 3.1 405B 🏆
Leads by 3.7 points on overall score (77.6 vs 73.9).
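The verdict follows directly from the two overall scores. A minimal sketch of that comparison, assuming the winner is simply whichever model has the higher aggregate score (how the overall score is weighted from the individual benchmarks isn't shown on this page):

```ts
// Pick the overall winner by comparing aggregate scores and report the margin.
interface ModelScore { name: string; overall: number }

function overallWinner(a: ModelScore, b: ModelScore): string {
  const [lead, trail] = a.overall >= b.overall ? [a, b] : [b, a];
  const margin = (lead.overall - trail.overall).toFixed(1);
  return `${lead.name} leads by ${margin} points (${lead.overall} vs ${trail.overall})`;
}

console.log(overallWinner(
  { name: "Llama 3.1 405B", overall: 77.6 },
  { name: "Mistral Large 2", overall: 73.9 },
));
// "Llama 3.1 405B leads by 3.7 points (77.6 vs 73.9)"
```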