Llama 3.2 Vision 90B

by Meta
Flagship · Free & Open Source · Multimodal · 🏆 Ranked #50 of 85
Overall Score 71.6 out of 100
About

Llama 3.2 Vision 90B is Meta's large vision-language model, with strong image understanding and text reasoning capabilities. It competes with frontier multimodal models on visual analysis tasks.

Key Metrics
Context Window 128K tokens
Avg Response 1,100 milliseconds
Input Cost $0.88 per million tokens
Output Cost $0.88 per million tokens
Arena ELO 1228 (Chatbot Arena rating)
MT-Bench 8.7 out of 10
Benchmark Scores
MMLU 83.0%
HumanEval 81.0%
MATH 69.0%
GPQA 46.0%
MT-Bench 8.7/10
Strengths & Limitations
Strengths
✓ Strong vision understanding
✓ Large context
✓ Open source
✓ Competitive multimodal
✓ Fine-tuneable
Limitations
⚠ Requires 60GB+ VRAM
⚠ Slower
⚠ Server hardware recommended
Ideal Use Cases
Advanced vision tasks
Medical imaging
Document analysis
Research
Complex visual Q&A
Model Details
Provider Meta
Released 2024-09-25
Type Free & Open Source
Multimodal Yes
Tier Flagship
Global rank #50 / 85
Pricing (USD)
Input tokens $0.88/M
Output tokens $0.88/M
Per 1,000 tokens ≈ $0.0009 input / $0.0009 output
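The per-1,000-token figure above follows directly from the per-million rates. As a rough sketch, the same arithmetic can estimate the cost of a single request. The function name and the example token counts below are illustrative assumptions, not part of this listing, and the estimate ignores any provider-specific fees or image-token accounting.

```python
# Rates taken from the pricing listed above (USD per million tokens).
INPUT_USD_PER_MILLION = 0.88
OUTPUT_USD_PER_MILLION = 0.88


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD at the listed per-million-token rates.

    Hypothetical helper for illustration only.
    """
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# 1,000 input tokens cost $0.00088, which rounds to the $0.0009 shown above.
print(f"{estimate_cost_usd(1_000, 0):.5f}")

# Example with made-up token counts: a 20,000-token prompt and a 500-token reply.
print(f"{estimate_cost_usd(20_000, 500):.5f}")
```

Because input and output are priced identically here, the cost depends only on the total token count; the two rates are kept separate so the sketch still works if they diverge.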
All Benchmarks
MMLU 83.0%
HumanEval 81.0%
MATH 69.0%
GPQA 46.0%
MT-Bench 8.7/10
Arena ELO 1228