Side by Side: Compare Models

A head-to-head comparison of two models across benchmarks, capabilities, pricing, and performance metrics.
[Figure: Capability Radar comparing the two models; chart not reproduced]
| Metric | DeepSeek V3 | Claude 3.5 Sonnet |
|---|---|---|
| Provider | DeepSeek | Anthropic |
| Tags | Flagship, Open source | Flagship, Multimodal |
| Overall Score | 81.0 | 80.1 |
| Arena ELO | 1302 | 1289 |
| MMLU | 88.5% | 88.7% |
| HumanEval | 89.1% | 92.0% |
| MATH | 87.2% | 71.1% |
| GPQA | 51.3% | 59.4% |
| MT-Bench | 8.9/10 | 9.0/10 |
| Context Window | 128K tokens | 200K tokens |
| Avg Response Time | 680 ms | 780 ms |
| Input Cost (per 1M tokens) | $0.27 | $3.00 |
| Output Cost (per 1M tokens) | $1.10 | $15.00 |
| Open Source | ✓ Yes | ✗ No |
| Multimodal | ✗ No | ✓ Yes |
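The pricing rows translate directly into per-request spend. Below is a minimal Python sketch that turns the per-1M-token rates from the table into an estimated monthly bill; the workload figures (requests per month, tokens per request) are illustrative assumptions, not data from the comparison.

```python
# Estimate monthly API cost from per-1M-token rates.
# Rates are taken from the comparison table above; the workload
# numbers (requests/month, tokens/request) are assumptions
# chosen purely for illustration.

PRICES = {
    # model: (input $/1M tokens, output $/1M tokens)
    "DeepSeek V3": (0.27, 1.10),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

def monthly_cost(model: str, requests: int,
                 in_tokens: int, out_tokens: int) -> float:
    """USD cost for `requests` calls, each consuming `in_tokens`
    of input and producing `out_tokens` of output."""
    in_rate, out_rate = PRICES[model]
    per_request = (in_tokens / 1e6) * in_rate + (out_tokens / 1e6) * out_rate
    return requests * per_request

# Hypothetical workload: 100k requests/month, 2,000 input tokens
# and 500 output tokens per request.
for model in PRICES:
    cost = monthly_cost(model, requests=100_000,
                        in_tokens=2_000, out_tokens=500)
    print(f"{model}: ${cost:,.2f}/month")
```

Under these assumed volumes the gap works out to roughly 12x ($109.00 vs. $1,350.00 per month).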
Overall Winner: DeepSeek V3 🏆, leading by 0.9 points (81.0 vs. 80.1).
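The page does not say how the Overall Score is derived from the individual benchmarks. As a purely hypothetical illustration of how such a composite could be formed, the sketch below takes an equal-weight average of min-max-normalized metrics; the weights and bounds are assumptions, and the result will not reproduce the 81.0/80.1 figures above.

```python
# Hypothetical composite score. The comparison does not disclose
# its Overall Score formula, so this equal-weight average of
# min-max-normalized metrics is only one plausible scheme,
# not the page's actual method.

METRICS = {
    # metric: (DeepSeek V3, Claude 3.5 Sonnet)
    "Arena ELO": (1302, 1289),
    "MMLU": (88.5, 88.7),
    "HumanEval": (89.1, 92.0),
    "MATH": (87.2, 71.1),
    "GPQA": (51.3, 59.4),
    "MT-Bench": (8.9, 9.0),
}

# Assumed normalization bounds per metric (illustrative only).
BOUNDS = {
    "Arena ELO": (1000, 1400),
    "MMLU": (0, 100),
    "HumanEval": (0, 100),
    "MATH": (0, 100),
    "GPQA": (0, 100),
    "MT-Bench": (0, 10),
}

def normalize(value: float, lo: float, hi: float) -> float:
    """Scale `value` to [0, 1] against the assumed bounds."""
    return (value - lo) / (hi - lo)

for idx, name in enumerate(["DeepSeek V3", "Claude 3.5 Sonnet"]):
    parts = [normalize(vals[idx], *BOUNDS[m]) for m, vals in METRICS.items()]
    composite = 100 * sum(parts) / len(parts)
    print(f"{name}: composite score {composite:.1f}")
```

This only shows the mechanics of collapsing heterogeneous benchmarks into one comparable number; a real leaderboard may weight metrics differently or normalize against a larger model pool.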