Side by Side
Compare Models
Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.
[Chart: Capability Radar]
Gemini 1.5 Pro
Google DeepMind
Flagship
Multimodal
Overall Score
74.4
Human Votes (Arena ELO)
1266
General Knowledge (MMLU)
85.9%
Coding (HumanEval)
84.1%
Maths (MATH)
67.7%
Science (GPQA)
46.2%
Conversation (MT-Bench)
8.9/10
Context Window
1.0M tokens
Avg Response
920 ms
Input Cost / 1M tokens
$3.50
Output Cost / 1M tokens
$10.50
Free & Open Source
Paid API
Multimodal
✓ Yes
Llama 4 Scout
Meta
Efficient
Open source
Multimodal
Overall Score
74.1
Human Votes (Arena ELO)
1248
General Knowledge (MMLU)
87.1%
Coding (HumanEval)
86.5%
Maths (MATH)
67.4%
Science (GPQA)
47.1%
Conversation (MT-Bench)
8.6/10
Context Window
10.0M tokens
Avg Response
680 ms
Input Cost / 1M tokens
$0.08
Output Cost / 1M tokens
$0.30
Free & Open Source
✓ Free
Multimodal
✓ Yes
Highest Overall Score
Gemini 1.5 Pro 🏆
Scores 74.4 vs 74.1, a lead of 0.3 points out of 100
💡 Llama 4 Scout is free and open source, so it is worth considering if cost matters.
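To put the cost difference in concrete terms, the per-1M-token prices listed above can be turned into an estimate for a given workload. The snippet below is a minimal sketch, not part of the comparison tool: the `PRICES` table and the `workload_cost` helper are hypothetical names, and the token counts are an assumed example workload.

```python
# Prices in USD per 1M tokens, copied from the comparison above.
PRICES = {
    "Gemini 1.5 Pro": {"input": 3.50, "output": 10.50},
    "Llama 4 Scout": {"input": 0.08, "output": 0.30},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token counts (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed example workload: 2M input tokens and 0.5M output tokens.
    for model in PRICES:
        cost = workload_cost(model, 2_000_000, 500_000)
        print(f"{model}: ${cost:.2f}")
```

The estimate covers listed API token prices only; it ignores response latency, rate limits, and any hosting cost for running an open model yourself.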