Llama 3.2 1B

by Meta

Efficient · Free & Open Source · 🏆 Ranked #84 of 85

35.8

Overall Score

out of 100

About

Meta's smallest Llama model, designed for on-device and embedded deployments. Llama 3.2 1B can run entirely on CPU and on low-power devices, and it supports a 128K-token context window despite its small footprint.

Key Metrics

Context Window

128K

tokens

Avg Response

milliseconds

Input Cost

$0.01

per million tokens

Output Cost

$0.01

per million tokens

Arena ELO

1070

Chatbot Arena rating

MT-Bench

6.5

out of 10

Benchmark Scores

MMLU

49.3%

HumanEval

38.0%

MATH

25.0%

GPQA

15.0%

MT-Bench

6.5/10

Model Details

Provider Meta

Released 2024-09-25

Type Free & Open Source

Multimodal No

Tier Efficient

Global rank #84 / 85

Pricing (USD)

Input tokens $0.01/M

Output tokens $0.01/M

Per 1,000 tokens ≈ $0.00001 input / $0.00001 output
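The per-token arithmetic behind the listed prices can be sketched as below. This is a minimal illustration that assumes flat per-million-token billing at the rates shown on this page; the function names `token_cost` and `request_cost` are hypothetical and do not belong to any provider's API.

```python
from decimal import Decimal

# Prices as listed on this page, in USD per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.01")
OUTPUT_PRICE_PER_MILLION = Decimal("0.01")


def token_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Return the USD cost of `tokens` tokens at a per-million-token price."""
    return price_per_million * Decimal(tokens) / Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the combined USD cost of one request (hypothetical helper)."""
    return (
        token_cost(input_tokens, INPUT_PRICE_PER_MILLION)
        + token_cost(output_tokens, OUTPUT_PRICE_PER_MILLION)
    )


if __name__ == "__main__":
    # Cost of 1,000 input tokens, then of a request that fills the 128K window.
    print(token_cost(1_000, INPUT_PRICE_PER_MILLION))
    print(request_cost(128_000, 0))
```

At $0.01 per million tokens, 1,000 tokens cost $0.00001, which is why a figure rounded to four decimal places displays as zero. `Decimal` is used rather than `float` so that amounts this small stay exact.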

All Benchmarks

MMLU 49.3%

HumanEval 38.0%

MATH 25.0%

GPQA 15.0%

MT-Bench 6.5/10

Arena ELO 1070

Similar Models

You might also consider

OpenAI's compact reasoning model achieving near-o3 performance at a fraction of the cost. o4-mini uses extended chain-of-thought and achieves exceptional results on mathematics, science, and coding, making advanced reasoning economically accessible.

xAI's compact reasoning model offering excellent maths and logic at a fraction of Grok 3's cost. Grok 3 Mini uses chain-of-thought reasoning and real-time X platform data to punch above its size class.

Google DeepMind's latest fast multimodal model with strong reasoning and a 1 million token context window. Bridges the gap between Flash speed and Pro capability, with thinking mode for harder tasks.