Side by Side

Compare Models

Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.

[Figure: Capability Radar]

Gemini 1.5 Pro
Google DeepMind
Tags: Flagship, Multimodal

Overall Score: 74.4
Human Votes (Arena ELO): 1266
General Knowledge (MMLU): 85.9%
Coding (HumanEval): 84.1%
Maths (MATH): 67.7%
Science (GPQA): 46.2%
Conversation (MT-Bench): 8.9/10
Context Window: 1.0M tokens
Avg Response: 920 ms
Input Cost / 1M tokens: $3.5
Output Cost / 1M tokens: $10.5
Free & Open Source: No (paid API)
Multimodal: Yes

Llama 4 Scout
Meta
Tags: Efficient, Open source, Multimodal

Overall Score: 74.1
Human Votes (Arena ELO): 1248
General Knowledge (MMLU): 87.1%
Coding (HumanEval): 86.5%
Maths (MATH): 67.4%
Science (GPQA): 47.1%
Conversation (MT-Bench): 8.6/10
Context Window: 10.0M tokens
Avg Response: 680 ms
Input Cost / 1M tokens: $0.08
Output Cost / 1M tokens: $0.3
Free & Open Source: Yes (free)
Multimodal: Yes

Highest Overall Score

Gemini 1.5 Pro 🏆

Gemini 1.5 Pro scores 74.4 to Llama 4 Scout's 74.1, a lead of 0.3 points out of 100.

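💡 Llama 4 Scout is free and open source, and its listed prices ($0.08 input and $0.3 output per 1M) are well below Gemini 1.5 Pro's ($3.5 and $10.5), so it is worth considering if cost matters.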
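
To make the cost difference concrete, here is a minimal sketch that applies the listed input and output prices to a workload. It assumes "/ 1M" means per 1M tokens; the function name `workload_cost` and the token volumes are hypothetical, chosen only for illustration.

```python
# Prices in USD as listed on this page.
# Assumption: "/ 1M" means per 1M tokens.
PRICES = {
    "Gemini 1.5 Pro": {"input": 3.5, "output": 10.5},
    "Llama 4 Scout": {"input": 0.08, "output": 0.3},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-1M rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Hypothetical monthly volumes, not taken from the page.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

for name in PRICES:
    cost = workload_cost(name, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{name}: ${cost:,.2f}")
```

Under these assumed volumes the listed prices work out to $280.00 for Gemini 1.5 Pro and $7.00 for Llama 4 Scout. The gap scales with volume, so substitute your own token counts before drawing conclusions.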