Llama 3.2 3B

by Meta

Efficient · Free & Open Source · 🏆 Ranked #82 of 85

Overall Score 49.8 out of 100

About

Meta's compact 3-billion-parameter model, designed for edge and on-device deployment. Llama 3.2 3B can run entirely on a CPU or a low-end GPU and, for its size, handles text understanding well, scoring 63.4% on MMLU.
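To give a feel for why a model of this size fits on CPUs and low-end GPUs, the sketch below estimates the memory needed for the weights alone at a few common precisions. It assumes a nominal 3 billion parameters (taken from the model name, not an exact count) and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The names `NOMINAL_PARAMS`, `BYTES_PER_PARAM`, and `weight_memory_gb` are illustrative and not part of any library.

```python
# Rough weight-memory estimate for a ~3B-parameter model.
# Assumption: a nominal 3e9 parameters (from the model name, not an exact count).
# Weights only: KV cache, activations, and runtime overhead are not included.

NOMINAL_PARAMS = 3_000_000_000

# Storage cost per parameter at common precisions (bytes).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        size_gb = weight_memory_gb(NOMINAL_PARAMS, nbytes)
        print(f"{precision:>10}: ~{size_gb:.1f} GB of weights")
```

Under these assumptions, half-precision weights take roughly 6 GB and 4-bit quantized weights roughly 1.5 GB, which is the range where consumer hardware becomes practical.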

Key Metrics

Context Window 128K tokens

Avg Response 180 milliseconds

Input Cost $0.015 per million tokens

Output Cost $0.025 per million tokens

Arena ELO 1120 (Chatbot Arena rating)

MT-Bench 7.6 out of 10

Benchmark Scores

MMLU 63.4%

HumanEval 58.0%

MATH 40.0%

GPQA 24.0%

MT-Bench 7.6/10

Model Details

Provider Meta

Released 2024-09-25

Type Free & Open Source

Multimodal No

Tier Efficient

Global rank #82 / 85

Pricing (USD)

Input tokens $0.015/M

Output tokens $0.025/M

Per 1,000 tokens = $0.000015 input / $0.000025 output
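As a worked example of the pricing above, the sketch below computes the cost of a single request from the listed per-million-token rates. Per-request amounts at these rates are far below a cent, so it uses exact decimal arithmetic rather than rounding to a fixed number of places. The function name `request_cost` and the token counts are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Listed prices in USD per million tokens (from the pricing panel above).
INPUT_PRICE_PER_M = Decimal("0.015")
OUTPUT_PRICE_PER_M = Decimal("0.025")
TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the listed per-million rates."""
    input_cost = INPUT_PRICE_PER_M * input_tokens / TOKENS_PER_MILLION
    output_cost = OUTPUT_PRICE_PER_M * output_tokens / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
    print("One request:", request_cost(2_000, 500), "USD")

    # The same request repeated one million times.
    print("One million requests:", request_cost(2_000, 500) * 1_000_000, "USD")
```

With these hypothetical token counts, a single request costs $0.0000425, or $42.50 per million such requests.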

All Benchmarks

MMLU 63.4%

HumanEval 58.0%

MATH 40.0%

GPQA 24.0%

MT-Bench 7.6/10

Arena ELO 1120

Similar Models

You might also consider

OpenAI's compact reasoning model achieving near-o3 performance at a fraction of the cost. o4-mini uses extended chain-of-thought and achieves exceptional results on mathematics, science, and coding — making advanced reasoning economically accessible.

xAI's compact reasoning model offering excellent maths and logic at a fraction of Grok 3's cost. Grok 3 Mini uses chain-of-thought reasoning and real-time X platform data to punch above its size class.

Google DeepMind's latest fast multimodal model with strong reasoning and a 1 million token context window. Bridges the gap between Flash speed and Pro capability, with thinking mode for harder tasks.