Qwen3 VL 32B

by Alibaba
Flagship · Free & Open Source · Multimodal · 🏆 Ranked #44 of 85
Overall Score: 74.6 out of 100
About

Qwen3 VL 32B is Alibaba's 32B-parameter vision-language model, combining visual perception with language reasoning. It is strong at document understanding, visual maths, and complex image analysis.

Key Metrics
Context Window: 128K tokens
Avg Response: 820 milliseconds
Input Cost: $0.2 per million tokens
Output Cost: $0.6 per million tokens
Arena ELO: 1225 (Chatbot Arena rating)
MT-Bench: 8.8 out of 10
Benchmark Scores
MMLU: 81.0%
HumanEval: 84.0%
MATH: 85.0%
GPQA: 48.0%
MT-Bench: 8.8/10
Capability Profile
Strengths & Limitations
Strengths
✓ Strong visual reasoning
✓ Document understanding
✓ Open source
✓ Multimodal
✓ Large context
Limitations
⚠ Requires 24GB+ VRAM
⚠ Heavier than 8B
⚠ Newer with less community testing
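To put the VRAM limitation in context, here is a rough back-of-the-envelope sketch of weight memory at different precisions. It assumes exactly 32 billion parameters (taken from the "32B" in the model name) and counts weights only; the KV cache, activations, and runtime overhead are ignored, so real usage will be higher. The function name `weight_memory_gb` is hypothetical, and the page does not say which precision the 24GB+ figure refers to.

```python
# Rough weight-only memory estimate. Assumption: 32e9 parameters,
# inferred from the "32B" in the model name, not stated on the page.
ASSUMED_PARAMS = 32e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed for the weights alone, in decimal GB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Compare common precisions: 16-bit, 8-bit, and 4-bit quantization.
    for bits in (16, 8, 4):
        gb = weight_memory_gb(ASSUMED_PARAMS, bits)
        print(f"{bits}-bit weights: about {gb:.0f} GB")
```

Under these assumptions, only the 4-bit case leaves headroom on a 24 GB card, which suggests the listed requirement presumes a quantized build.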
Ideal Use Cases
Visual maths, Document analysis, Image Q&A, Research, Multimodal workflows
Model Details
Provider: Alibaba
Released: 2025-04-01
Type: Free & Open Source
Multimodal: Yes
Tier: Flagship
Global rank: #44 / 85
Pricing (USD)
Input tokens: $0.2/M
Output tokens: $0.6/M
Per 1,000 tokens: ≈ $0.0002 input / $0.0006 output
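The per-token arithmetic above can be sketched as a small cost estimator. This is a minimal illustration using only the listed rates ($0.2 and $0.6 per million tokens); the function name `estimate_cost_usd` and the example token counts are hypothetical, and actual billing may differ by host.

```python
# Listed rates from the pricing section, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.2
OUTPUT_USD_PER_MILLION = 0.6


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a 10,000-token document and a 500-token answer.
    cost = estimate_cost_usd(10_000, 500)
    print(f"Estimated cost: ${cost:.4f}")
```

Because output tokens cost three times as much as input tokens, long generated answers affect the bill more than long prompts of the same length.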
All Benchmarks
MMLU: 81.0%
HumanEval: 84.0%
MATH: 85.0%
GPQA: 48.0%
MT-Bench: 8.8/10
Arena ELO: 1225