Side by Side: Compare Models

A head-to-head comparison of two models across benchmarks, capabilities, pricing, and performance metrics.
[Figure: Capability Radar comparing the two models; chart not reproduced]
| Metric | DeepSeek V3 | Claude 3.5 Sonnet |
|---|---|---|
| Provider | DeepSeek | Anthropic |
| Tags | Flagship, Open source | Flagship, Multimodal |
| Overall Score | 81.0 | 80.1 |
| Arena ELO | 1302 | 1289 |
| MMLU | 88.5% | 88.7% |
| HumanEval | 89.1% | 92.0% |
| MATH | 87.2% | 71.1% |
| GPQA | 51.3% | 59.4% |
| MT-Bench | 8.9/10 | 9.0/10 |
| Context Window | 128K tokens | 200K tokens |
| Avg Response Time | 680 ms | 780 ms |
| Input Cost (per 1M tokens) | $0.27 | $3.00 |
| Output Cost (per 1M tokens) | $1.10 | $15.00 |
| Open Source | ✓ Yes | ✗ No |
| Multimodal | ✗ No | ✓ Yes |
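The pricing rows translate directly into per-request spend. Below is a minimal Python sketch that turns the per-1M-token rates from the table into an estimated monthly bill; the workload figures (requests per month, tokens per request) are illustrative assumptions, not data from the comparison.

```python
# Estimate monthly API cost from per-1M-token rates.
# Rates are taken from the comparison table above; the workload
# numbers (requests/month, tokens/request) are assumptions
# chosen purely for illustration.

PRICES = {
    # model: (input $/1M tokens, output $/1M tokens)
    "DeepSeek V3": (0.27, 1.10),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

def monthly_cost(model: str, requests: int,
                 in_tokens: int, out_tokens: int) -> float:
    """USD cost for `requests` calls, each consuming `in_tokens`
    of input and producing `out_tokens` of output."""
    in_rate, out_rate = PRICES[model]
    per_request = (in_tokens / 1e6) * in_rate + (out_tokens / 1e6) * out_rate
    return requests * per_request

# Hypothetical workload: 100k requests/month, 2,000 input tokens
# and 500 output tokens per request.
for model in PRICES:
    cost = monthly_cost(model, requests=100_000,
                        in_tokens=2_000, out_tokens=500)
    print(f"{model}: ${cost:,.2f}/month")
```

Under these assumed volumes the gap works out to roughly 12x ($109.00 vs. $1,350.00 per month).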
Overall Winner: DeepSeek V3 🏆, leading by 0.9 points (81.0 vs. 80.1).
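The page does not say how the Overall Score is derived from the individual benchmarks. As a purely hypothetical illustration of how such a composite could be formed, the sketch below takes an equal-weight average of min-max-normalized metrics; the weights and bounds are assumptions, and the result will not reproduce the 81.0/80.1 figures above.

```python
# Hypothetical composite score. The comparison does not disclose
# its Overall Score formula, so this equal-weight average of
# min-max-normalized metrics is only one plausible scheme,
# not the page's actual method.

METRICS = {
    # metric: (DeepSeek V3, Claude 3.5 Sonnet)
    "Arena ELO": (1302, 1289),
    "MMLU": (88.5, 88.7),
    "HumanEval": (89.1, 92.0),
    "MATH": (87.2, 71.1),
    "GPQA": (51.3, 59.4),
    "MT-Bench": (8.9, 9.0),
}

# Assumed normalization bounds per metric (illustrative only).
BOUNDS = {
    "Arena ELO": (1000, 1400),
    "MMLU": (0, 100),
    "HumanEval": (0, 100),
    "MATH": (0, 100),
    "GPQA": (0, 100),
    "MT-Bench": (0, 10),
}

def normalize(value: float, lo: float, hi: float) -> float:
    """Scale `value` to [0, 1] against the assumed bounds."""
    return (value - lo) / (hi - lo)

for idx, name in enumerate(["DeepSeek V3", "Claude 3.5 Sonnet"]):
    parts = [normalize(vals[idx], *BOUNDS[m]) for m, vals in METRICS.items()]
    composite = 100 * sum(parts) / len(parts)
    print(f"{name}: composite score {composite:.1f}")
```

This only shows the mechanics of collapsing heterogeneous benchmarks into one comparable number; a real leaderboard may weight metrics differently or normalize against a larger model pool.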