Compare Models

Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.

| Metric | DeepSeek V3 (DeepSeek) | Claude 3.5 Sonnet (Anthropic) |
|---|---|---|
| Tier | Flagship, open source | Flagship, multimodal |
| Overall Score | 81.0 | 80.1 |
| Arena ELO | 1302 | 1289 |
| MMLU | 88.5% | 88.7% |
| HumanEval | 89.1% | 92.0% |
| MATH | 87.2% | 71.1% |
| GPQA | 51.3% | 59.4% |
| MT-Bench | 8.9/10 | 9.0/10 |
| Context Window | 128K | 200K |
| Avg Response | 680 ms | 780 ms |
| Input Cost / 1M tokens | $0.27 | $3.00 |
| Output Cost / 1M tokens | $1.10 | $15.00 |
| Open source | ✓ Yes | ✗ No |
| Multimodal | ✗ No | ✓ Yes |
Overall Winner: DeepSeek V3 🏆 (leads by 0.9 points, 81.0 vs 80.1)
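The per-1M-token prices above translate directly into per-request cost. A minimal sketch of that arithmetic (the model names and prices come from this comparison; the token counts in the example are made up for illustration):

```python
# USD per million tokens, taken from the comparison above.
PRICES = {
    "DeepSeek V3":       {"input": 0.27, "output": 1.10},
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request: tokens / 1e6 * price per 1M tokens."""
    p = PRICES[model]
    return input_tokens / 1e6 * p["input"] + output_tokens / 1e6 * p["output"]

# Example: a 10K-token prompt with a 2K-token completion.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
# DeepSeek V3:       $0.0049
# Claude 3.5 Sonnet: $0.0600
```

At these list prices the cost gap is roughly an order of magnitude, which is why the pricing rows matter alongside the benchmark scores.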