Side by Side
Compare Models
Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.
[Figure: Capability Radar]
Gemini 1.5 Pro (Google DeepMind)
Tags: Flagship, Multimodal
Overall Score: 74.4
Arena ELO: 1266
MMLU: 85.9%
HumanEval: 84.1%
MATH: 67.7%
GPQA: 46.2%
MT-Bench: 8.9/10
Context Window: 1.0M
Avg Response: 920ms
Input Cost / 1M: $3.50
Output Cost / 1M: $10.50
Open source: ✗ No
Multimodal: ✓ Yes
GPT-4o (OpenAI)
Tags: Flagship, Multimodal
Overall Score: 79.5
Arena ELO: 1285
MMLU: 88.7%
HumanEval: 90.2%
MATH: 76.6%
GPQA: 53.6%
MT-Bench: 9.0/10
Context Window: 128K
Avg Response: 850ms
Input Cost / 1M: $5.00
Output Cost / 1M: $15.00
Open source: ✗ No
Multimodal: ✓ Yes
Overall Winner: GPT-4o 🏆
Leads by 5.1 points (79.5 vs 74.4)
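
The lead quoted above can be reproduced from the two Overall Score values. The sketch below assumes the winner is simply the model with the higher Overall Score and that the margin is rounded to one decimal place; the page does not state how the Overall Score itself is derived, and the function name `overall_winner` is hypothetical.

```python
def overall_winner(scores: dict[str, float]) -> tuple[str, float]:
    """Return the top-scoring model and its lead over the runner-up.

    Assumption: "winner" means the highest Overall Score, and the lead
    is reported to one decimal place, as on the comparison page.
    """
    # Sort (model, score) pairs from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top), (_, second) = ranked[0], ranked[1]
    # Round to absorb floating-point noise in the subtraction.
    return winner, round(top - second, 1)


# Overall Score values taken from the two model cards.
scores = {"Gemini 1.5 Pro": 74.4, "GPT-4o": 79.5}
winner, lead = overall_winner(scores)
print(f"{winner} leads by {lead} points")
```

Rounding matters here because the raw difference of the two floats is not stored exactly as 5.1 in binary floating point.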