Side by Side

Compare Models

Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.

Metric                      Gemini 1.5 Pro        GPT-4o
Developer                   Google DeepMind       OpenAI
Category                    Flagship Multimodal   Flagship Multimodal
Overall Score               74.4                  79.5
Arena ELO                   1266                  1285
MMLU                        85.9%                 88.7%
HumanEval                   84.1%                 90.2%
MATH                        67.7%                 76.6%
GPQA                        46.2%                 53.6%
MT-Bench                    8.9/10                9.0/10
Context Window (tokens)     1.0M                  128K
Avg Response                920 ms                850 ms
Input Cost / 1M tokens      $3.50                 $5.00
Output Cost / 1M tokens     $10.50                $15.00
Open source                 ✗ No                  ✗ No
Multimodal                  ✓ Yes                 ✓ Yes

Overall Winner

GPT-4o 🏆

Leads by 5.1 points (79.5 vs 74.4)
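
The lead quoted above is the difference between the two Overall Score values. A minimal sketch of that arithmetic follows. The function name `overall_winner` is hypothetical, and the page does not say how the Overall Score itself is derived from the individual benchmarks, so the sketch takes the two published scores as given.

```python
def overall_winner(scores):
    """Return (leader, margin) for a two-model mapping of name -> overall score."""
    (name_a, score_a), (name_b, score_b) = scores.items()
    leader = name_a if score_a >= score_b else name_b
    # Round to one decimal place to match the precision shown on the page.
    margin = round(abs(score_a - score_b), 1)
    return leader, margin


# Overall Score values as listed for each model.
scores = {"Gemini 1.5 Pro": 74.4, "GPT-4o": 79.5}
leader, margin = overall_winner(scores)
print(f"{leader} leads by {margin} points")  # GPT-4o leads by 5.1 points
```

The rounding step is only there because subtracting the two floats directly does not give exactly 5.1.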