Grok 3

by xAI

Flagship Paid API Multimodal 🏆 Ranked #4 of 85

Overall Score 90.6 out of 100

About

xAI's most capable model, trained on a 100,000-GPU cluster and setting new benchmarks in mathematics and scientific reasoning. Grok 3 integrates real-time data from the X platform and leads the Arena ELO leaderboard among commercial models.

Key Metrics

Context Window 131K tokens

Avg Response 900 milliseconds

Input Cost $3.0 per million tokens

Output Cost $15.0 per million tokens

Arena ELO 1402 (Chatbot Arena rating)

MT-Bench 9.2 out of 10

Benchmark Scores

MMLU 93.3%

HumanEval 91.8%

MATH 93.3%

GPQA 72.0%

MT-Bench 9.2/10

Model Details

Provider xAI

Released 2025-02-17

Type Paid API

Multimodal Yes

Tier Flagship

Global rank #4 / 85

Pricing (USD)

Input tokens $3.0/M

Output tokens $15.0/M

Per 1,000 tokens ≈ $0.0030 input / $0.0150 output
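The per-token arithmetic above can be sketched as a small cost estimator. This is a minimal illustration that assumes the listed rates ($3.0 and $15.0 per million tokens) are billed linearly per token; the function name `estimate_cost_usd` and the sample token counts are hypothetical and do not come from xAI's API.

```python
# Rates copied from the pricing lines above (USD per million tokens).
INPUT_USD_PER_MILLION = 3.0
OUTPUT_USD_PER_MILLION = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates.

    Assumes simple linear per-token billing (an assumption for this sketch).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_micro_usd = (input_tokens * INPUT_USD_PER_MILLION
                       + output_tokens * OUTPUT_USD_PER_MILLION)
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
    print(f"${estimate_cost_usd(2_000, 500):.4f}")  # prints $0.0135
```

At these rates, 1,000 input tokens come to $0.0030 and 1,000 output tokens to $0.0150, matching the per-1,000 figures listed above.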

All Benchmarks

MMLU 93.3%

HumanEval 91.8%

MATH 93.3%

GPQA 72.0%

MT-Bench 9.2/10

Arena ELO 1402

Similar Models

You might also consider

OpenAI's most powerful reasoning model, using extended chain-of-thought to tackle the hardest problems in mathematics, science, and coding. o3 sets new standards on GPQA and competitive maths at the cost of higher latency and price.

Anthropic's most powerful and intelligent model, built for the most demanding tasks where quality outweighs cost. Claude Opus 4 leads on complex multi-step reasoning, graduate-level science, and nuanced long-form writing.

OpenAI's flagship reasoning model trained with reinforcement learning to think through complex problems step-by-step before responding. Excels at maths, science, and multi-step logic at the cost of higher latency.