Side by Side

Compare Models

Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.

GPT-4o

OpenAI

Flagship Multimodal

Overall Score

79.5

Human Votes (Arena ELO)

1285

General Knowledge (MMLU)

88.7%

Coding (HumanEval)

90.2%

Maths (MATH)

76.6%

Science (GPQA)

53.6%

Conversation (MT-Bench)

9.0/10

Context Window

128K

Avg Response

850ms

Input Cost / 1M tokens

$5.00

Output Cost / 1M tokens

$15.00

Free & Open Source

✗ No (Paid API)

Multimodal

✓ Yes

Claude 3.5 Sonnet

Anthropic

Flagship Multimodal

Overall Score

80.1

Human Votes (Arena ELO)

1289

General Knowledge (MMLU)

88.7%

Coding (HumanEval)

92.0%

Maths (MATH)

71.1%

Science (GPQA)

59.4%

Conversation (MT-Bench)

9.0/10

Context Window

200K

Avg Response

780ms

Input Cost / 1M tokens

$3.00