Llama 3.2 Vision 90B

by Meta

Flagship · Free & Open Source · Multimodal · 🏆 Ranked #50 of 85

Overall Score 71.6 out of 100

About

Meta's large (90-billion-parameter) vision-language model, which accepts image and text input and responds in text. It offers strong image understanding and text reasoning, and is positioned to compete with frontier multimodal models on visual analysis tasks.

Key Metrics

Context Window 128K tokens

Avg Response 1,100 milliseconds

Input Cost $0.88 per million tokens

Output Cost $0.88 per million tokens

Arena ELO 1228 (Chatbot Arena rating)

MT-Bench 8.7 out of 10
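An Arena rating is most meaningful relative to another model's rating. The sketch below converts a rating gap into an expected head-to-head preference rate, assuming the standard Elo logistic curve (base 10, 400-point scale) on which Chatbot Arena presents its ratings, and ignoring ties. The opponent rating of 1328 is a hypothetical value chosen only for illustration, not the rating of any specific model.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B.

    Assumes the standard Elo logistic curve (base 10, 400-point scale)
    and ignores ties.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


LLAMA_32_VISION_90B = 1228  # Arena ELO listed on this page

# Hypothetical opponent rated 100 points higher (illustrative value only).
opponent = LLAMA_32_VISION_90B + 100
p = expected_win_rate(LLAMA_32_VISION_90B, opponent)
print(f"Expected preference rate vs a model rated {opponent}: {p:.1%}")
```

Under these assumptions a 100-point deficit corresponds to being preferred in roughly 36% of pairwise comparisons, and equal ratings give 50%.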

Benchmark Scores

MMLU 83.0%

HumanEval 81.0%

MATH 69.0%

GPQA 46.0%

MT-Bench 8.7/10


Model Details

Provider Meta

Released 2024-09-25

Type Free & Open Source

Multimodal Yes

Tier Flagship

Global rank #50 / 85

Pricing (USD)

Input tokens $0.88/M

Output tokens $0.88/M

Per 1,000 tokens ≈ $0.0009 input / $0.0009 output
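The per-token arithmetic behind these figures can be sketched as follows. This is a minimal estimate that assumes every billed token, including any tokens an image input is converted into, is charged at the flat listed rate; actual billing depends on the hosting provider. `request_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# Listed prices from this page, in USD per million tokens.
INPUT_PRICE_PER_M = Decimal("0.88")
OUTPUT_PRICE_PER_M = Decimal("0.88")


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one request at the listed per-million rates.

    Hypothetical helper: assumes flat per-token pricing with no
    provider-specific surcharges for image inputs.
    """
    per_token_in = INPUT_PRICE_PER_M / 1_000_000
    per_token_out = OUTPUT_PRICE_PER_M / 1_000_000
    return input_tokens * per_token_in + output_tokens * per_token_out


# Per 1,000 input tokens: $0.00088, which rounds to the ≈ $0.0009 shown above.
print(f"${request_cost(1_000, 0):.5f}")
# A hypothetical request with 2,000 input tokens and 500 output tokens.
print(f"${request_cost(2_000, 500):.4f}")
```

Because input and output are priced identically here, the estimate depends only on the total token count: 2,500 tokens × $0.88/M ≈ $0.0022.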

All Benchmarks

MMLU 83.0%

HumanEval 81.0%

MATH 69.0%

GPQA 46.0%

MT-Bench 8.7/10

Arena ELO 1228


Similar Models

You might also consider

OpenAI's most powerful reasoning model, using extended chain-of-thought to tackle the hardest problems in mathematics, science, and coding. o3 sets new standards on GPQA and competitive maths at the cost of higher latency and price.

Anthropic's most powerful and intelligent model, built for the most demanding tasks where quality outweighs cost. Claude Opus 4 leads on complex multi-step reasoning, graduate-level science, and nuanced long-form writing.

xAI's most capable model, trained on a 100,000-GPU cluster and setting new benchmarks in mathematics and scientific reasoning. Grok 3 integrates real-time data from the X platform and leads the Arena ELO leaderboard among commercial models.