Side by Side
Compare Models
Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.
Capability Radar (chart)
Llama 4 Maverick
Meta
Flagship
Open source
Multimodal
Overall Score: 78.9
Human Votes (Arena ELO): 1285
General Knowledge (MMLU): 88.7%
Coding (HumanEval): 89.8%
Maths (MATH): 74.9%
Science (GPQA): 52.8%
Conversation (MT-Bench): 8.9/10
Context Window: 1.0M tokens
Avg Response: 1150 ms
Input Cost / 1M tokens: $0.19
Output Cost / 1M tokens: $0.65
Free & Open Source: ✓ Free
Multimodal: ✓ Yes
Mistral Large 2
Mistral AI
Flagship
Open source
Overall Score: 73.9
Human Votes (Arena ELO): 1232
General Knowledge (MMLU): 84.0%
Coding (HumanEval): 92.0%
Maths (MATH): 69.3%
Science (GPQA): 45.0%
Conversation (MT-Bench): 8.6/10
Context Window: 128K tokens
Avg Response: 650 ms
Input Cost / 1M tokens: $3.00
Output Cost / 1M tokens: $9.00
Free & Open Source: ✓ Free
Multimodal: ✗ No
Highest Overall Score
Llama 4 Maverick 🏆
Scores 78.9 vs 73.9, a lead of 5.0 points out of 100
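The lead quoted above is a plain difference of the two overall scores. A minimal sketch of that arithmetic follows. The dictionary layout and the `score_lead` helper are assumptions made for illustration, not something the page exposes, and the sketch does not attempt to reproduce how the overall score itself is derived from the individual benchmarks.

```python
# Overall scores as listed on the page; the dict layout is an illustrative assumption.
overall_scores = {
    "Llama 4 Maverick": 78.9,
    "Mistral Large 2": 73.9,
}


def score_lead(scores):
    """Return (leader, runner_up, lead) for a two-model score mapping.

    Hypothetical helper: ranks the two entries by score and reports the
    difference, rounded to one decimal place to match the page's display.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (leader, top), (runner_up, second) = ranked
    return leader, runner_up, round(top - second, 1)


leader, runner_up, lead = score_lead(overall_scores)
print(f"{leader} leads {runner_up} by {lead} points out of 100")
```

Rounding to one decimal place keeps floating-point noise out of the displayed difference.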