Side by Side
Compare Models
Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.
[Figure: Capability Radar]
GPT-4o
Provider: OpenAI
Tags: Flagship, Multimodal
Overall Score: 79.5
Human Votes (Arena ELO): 1285
General Knowledge (MMLU): 88.7%
Coding (HumanEval): 90.2%
Maths (MATH): 76.6%
Science (GPQA): 53.6%
Conversation (MT-Bench): 9.0/10
Context Window: 128K
Avg Response: 850ms
Input Cost / 1M: $5.0
Output Cost / 1M: $15.0
Free & Open Source: Paid API
Multimodal: ✓ Yes
DeepSeek V3
Provider: DeepSeek
Tags: Flagship, Open source
Overall Score: 81.0
Human Votes (Arena ELO): 1302
General Knowledge (MMLU): 88.5%
Coding (HumanEval): 89.1%
Maths (MATH): 87.2%
Science (GPQA): 51.3%
Conversation (MT-Bench): 8.9/10
Context Window: 128K
Avg Response: 680ms
Input Cost / 1M: $0.27
Output Cost / 1M: $1.1
Free & Open Source: ✓ Free
Multimodal: ✗ No
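The figures on the two cards above can also be compared metric by metric. The following is a minimal sketch, not part of the comparison tool itself: the `SCORES` table and the `compare` helper are hypothetical names introduced here, and the values are copied by hand from the cards. It makes no assumption about how the Overall Score is derived from the individual benchmarks.

```python
# Figures copied from the two cards above, as (GPT-4o, DeepSeek V3) pairs.
# Most are percentages; Arena ELO and MT-Bench use their own scales.
SCORES = {
    "Overall Score": (79.5, 81.0),
    "Arena ELO": (1285, 1302),
    "MMLU": (88.7, 88.5),
    "HumanEval": (90.2, 89.1),
    "MATH": (76.6, 87.2),
    "GPQA": (53.6, 51.3),
    "MT-Bench": (9.0, 8.9),
}


def compare(scores):
    """Return {metric: (leader, margin)} for each (gpt4o, deepseek) pair."""
    result = {}
    for metric, (gpt4o, deepseek) in scores.items():
        # Round to one decimal to hide floating-point noise in the difference.
        margin = round(abs(deepseek - gpt4o), 1)
        if deepseek > gpt4o:
            leader = "DeepSeek V3"
        elif gpt4o > deepseek:
            leader = "GPT-4o"
        else:
            leader = "tie"
        result[metric] = (leader, margin)
    return result


if __name__ == "__main__":
    for metric, (leader, margin) in compare(SCORES).items():
        print(f"{metric}: {leader} leads by {margin}")
```

On these figures GPT-4o is narrowly ahead on MMLU, HumanEval, GPQA, and MT-Bench, while DeepSeek V3 leads on Arena ELO, on the overall score, and on MATH, where the 10.6-point gap is the largest percentage-point difference between the two.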
Highest Overall Score
DeepSeek V3 🏆
DeepSeek V3 scores 81.0 vs 79.5 for GPT-4o, a lead of 1.5 points out of 100.
💡 DeepSeek V3 is free and open source, and it has the higher overall score (81.0 vs 79.5) with no subscription cost. Its listed API prices are $0.27 input and $1.1 output per 1M.
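The per-1M prices listed on the cards can be turned into a rough workload estimate. This is a minimal sketch under two assumptions that the page does not state explicitly: that "per 1M" means per one million tokens, and that pricing is flat with no tiers, caching discounts, or minimums. The `PRICES` table, the `estimate_cost` helper, and the example workload are hypothetical and for illustration only.

```python
# Prices in USD per 1M tokens (assumed unit), copied from the cards above.
PRICES = {
    "GPT-4o": {"input": 5.00, "output": 15.00},
    "DeepSeek V3": {"input": 0.27, "output": 1.10},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of a workload, assuming flat per-token pricing."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for model in PRICES:
        cost = estimate_cost(model, 2_000_000, 500_000)
        print(f"{model}: ${cost:.2f}")
```

For that hypothetical workload the listed prices work out to about $17.50 for GPT-4o and about $1.09 for DeepSeek V3. Running the open-source weights on your own hardware would replace the API charge with infrastructure costs, which this sketch does not model.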