Phi-4

by Microsoft

Efficient · Free & Open Source · 🏆 Ranked #34 of 85

Overall Score 77.4 out of 100

About

Phi-4 is Microsoft's 14-billion-parameter model that challenges models three times its size. It was trained on curated, high-quality synthetic data and scores 80.4% on MATH and 56.1% on GPQA, demonstrating that data quality can outperform raw scale.

Key Metrics

Context Window 16K tokens

Avg Response 480 milliseconds

Input Cost $0.0 per million tokens

Output Cost $0.0 per million tokens

Arena ELO 1280 (Chatbot Arena rating)

MT-Bench 8.6 out of 10

Benchmark Scores

MMLU 84.8%

HumanEval 82.6%

MATH 80.4%

GPQA 56.1%

MT-Bench 8.6/10

Model Details

Provider Microsoft

Released 2024-12-12

Type Free & Open Source

Multimodal No

Tier Efficient

Global rank #34 / 85

Pricing (USD)

Input tokens $0.0/M

Output tokens $0.0/M

Per 1,000 tokens ≈ $0.0000 input / $0.0000 output
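The per-1,000-token figures above follow from dividing the per-million rate by 1,000. A minimal sketch of that arithmetic, assuming the listed $0.0/M rates; the function names `per_thousand` and `request_cost` and the token counts are hypothetical and used only for illustration:

```python
def per_thousand(price_per_million: float) -> float:
    """Convert a per-million-token price (USD) to a per-1,000-token price."""
    return price_per_million / 1000


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Estimate the USD cost of one request from per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Rates as listed on this page (assumed to apply to both directions).
PHI4_INPUT_PER_M = 0.0
PHI4_OUTPUT_PER_M = 0.0

if __name__ == "__main__":
    print(f"${per_thousand(PHI4_INPUT_PER_M):.4f} input / "
          f"${per_thousand(PHI4_OUTPUT_PER_M):.4f} output")
    # Hypothetical request: 12,000 input tokens, 800 output tokens.
    print(f"${request_cost(12_000, 800, PHI4_INPUT_PER_M, PHI4_OUTPUT_PER_M):.4f}")
```

The same two functions can be reused with another model's listed rates to compare per-request cost.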

All Benchmarks

MMLU 84.8%

HumanEval 82.6%

MATH 80.4%

GPQA 56.1%

MT-Bench 8.6/10

Arena ELO 1280

Similar Models

You might also consider

OpenAI's compact reasoning model achieving near-o3 performance at a fraction of the cost. o4-mini uses extended chain-of-thought and achieves exceptional results on mathematics, science, and coding — making advanced reasoning economically accessible.

xAI's compact reasoning model offering excellent maths and logic at a fraction of Grok 3's cost. Grok 3 Mini uses chain-of-thought reasoning and real-time X platform data to punch above its size class.

Google DeepMind's latest fast multimodal model with strong reasoning and a 1 million token context window. Bridges the gap between Flash speed and Pro capability, with thinking mode for harder tasks.