Qwen 2.5 14B

by Alibaba

Efficient · Free & Open Source · 🏆 Ranked #49 of 85

Overall Score 72.6 out of 100

About

Alibaba's mid-size 14B model from the Qwen 2.5 series. It balances capability against compute requirements, outperforming many larger models on maths and coding benchmarks (MATH 83.0%, HumanEval 86.0%).

Key Metrics

Context Window 128K tokens

Avg Response 620 milliseconds

Input Cost $0.14 per million tokens

Output Cost $0.28 per million tokens

Arena ELO 1210 (Chatbot Arena rating)

MT-Bench 8.6 out of 10

Benchmark Scores

MMLU 79.5%

HumanEval 86.0%

MATH 83.0%

GPQA 42.0%

MT-Bench 8.6/10

Model Details

Provider Alibaba

Released 2024-09-18

Type Free & Open Source

Multimodal No

Tier Efficient

Global rank #49 / 85

Pricing (USD)

Input tokens $0.14/M

Output tokens $0.28/M

Per 1,000 tokens = $0.00014 input / $0.00028 output
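
To make the per-token arithmetic concrete, here is a minimal sketch of a cost estimate built from the listed prices. The helper name `estimate_cost` and the example token counts are illustrative assumptions, not part of any official Alibaba or Qwen API; actual billing depends on the hosting provider.

```python
from decimal import Decimal

# Prices as listed on this page, in USD per million tokens.
INPUT_PRICE_PER_M = Decimal("0.14")
OUTPUT_PRICE_PER_M = Decimal("0.28")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Decimal is used instead of float so that very small amounts
    are not distorted by binary rounding.
    """
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / MILLION


if __name__ == "__main__":
    # Cost of 1,000 input tokens and of 1,000 output tokens.
    print(estimate_cost(1_000, 0))
    print(estimate_cost(0, 1_000))
    # An assumed workload: a 100,000-token prompt with a 2,000-token reply.
    print(estimate_cost(100_000, 2_000))
```

Under these prices, a prompt that nearly fills the 128K context window would still cost under two cents of input, so output length and request volume are likely to dominate the bill for most workloads.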

All Benchmarks

MMLU 79.5%

HumanEval 86.0%

MATH 83.0%

GPQA 42.0%

MT-Bench 8.6/10

Arena ELO 1210

Similar Models

You might also consider

OpenAI's compact reasoning model achieving near-o3 performance at a fraction of the cost. o4-mini uses extended chain-of-thought and achieves exceptional results on mathematics, science, and coding — making advanced reasoning economically accessible.

xAI's compact reasoning model offering excellent maths and logic at a fraction of Grok 3's cost. Grok 3 Mini uses chain-of-thought reasoning and real-time X platform data to punch above its size class.

Google DeepMind's latest fast multimodal model with strong reasoning and a 1 million token context window. Bridges the gap between Flash speed and Pro capability, with thinking mode for harder tasks.