Side by Side

Compare Models

Select two models to compare their benchmarks, capabilities, pricing, and performance metrics.

Llama 4 Maverick

Meta

Flagship · Open source · Multimodal

Overall Score

78.9

Human Votes (Arena ELO)

1285

General Knowledge (MMLU)

88.7%

Coding (HumanEval)

89.8%

Maths (MATH)

74.9%

Science (GPQA)

52.8%

Conversation (MT-Bench)

8.9/10

Context Window

1.0M

Avg Response

1150ms

Input Cost / 1M tokens

$0.19

Output Cost / 1M tokens

$0.65

Free & Open Source

✓ Free

Multimodal

✓ Yes

Mistral Large 2

Mistral AI

Flagship · Open source

Overall Score

73.9

Human Votes (Arena ELO)

1232

General Knowledge (MMLU)

84.0%

Coding (HumanEval)

92.0%

Maths (MATH)

69.3%

Science (GPQA)

45.0%

Conversation (MT-Bench)

8.6/10

Context Window

128K

Avg Response

650ms

Input Cost / 1M tokens

$3.00

Output Cost / 1M tokens

$9.00

Free & Open Source

✓ Free

Multimodal

✗ No
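
The per-million prices listed in the two cards can be turned into a rough per-request estimate. The following is a minimal sketch, assuming the prices are US dollars per one million tokens. The helper name `request_cost` and the request size of 2,000 input and 500 output tokens are hypothetical and chosen only for illustration.

```python
# Prices copied from the two cards, assumed to be dollars per 1M tokens.
PRICES = {
    "Llama 4 Maverick": {"input": 0.19, "output": 0.65},
    "Mistral Large 2": {"input": 3.00, "output": 9.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Illustrative request size, not taken from the page.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 2_000, 500):.6f}")
```

For this illustrative request the estimate is about $0.0007 for Llama 4 Maverick and about $0.0105 for Mistral Large 2, roughly a 15x difference at the listed prices.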


Highest Overall Score

Llama 4 Maverick 🏆

Scores 78.9 vs 73.9, a lead of 5.0 points out of 100
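
The headline lead and the per-benchmark leaders can be recomputed from the figures listed in the two cards. This is a minimal sketch: the dictionary layout and the helper name `compare` are assumptions made for illustration, and it treats a higher value as better for every metric included.

```python
# Benchmark figures copied from the two cards (higher is better for all of these).
LLAMA = {"Overall": 78.9, "Arena ELO": 1285, "MMLU": 88.7, "HumanEval": 89.8,
         "MATH": 74.9, "GPQA": 52.8, "MT-Bench": 8.9}
MISTRAL = {"Overall": 73.9, "Arena ELO": 1232, "MMLU": 84.0, "HumanEval": 92.0,
           "MATH": 69.3, "GPQA": 45.0, "MT-Bench": 8.6}


def compare(a: dict, b: dict) -> dict:
    """Return the difference a - b per metric, rounded to one decimal (hypothetical helper)."""
    return {metric: round(a[metric] - b[metric], 1) for metric in a}


diffs = compare(LLAMA, MISTRAL)
for metric, d in diffs.items():
    if d > 0:
        leader = "Llama 4 Maverick"
    elif d < 0:
        leader = "Mistral Large 2"
    else:
        leader = "tie"
    print(f"{metric:10s} {d:+7.1f}  {leader}")
```

Of the benchmarks listed, Mistral Large 2 leads only on HumanEval (92.0% vs 89.8%); the overall difference of 5.0 points matches the summary line.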