Qwen3 VL 8B

by Alibaba
Efficient · Free & Open Source · Multimodal · 🏆 Ranked #63 of 85
Overall Score 64.8 out of 100
About

Alibaba's compact 8B vision-language model from the Qwen3 VL family, supporting visual question answering, image description, and multimodal reasoning on consumer-grade hardware.

Key Metrics
Context Window 32K tokens
Avg Response 500 milliseconds
Input Cost $0.06 per million tokens
Output Cost $0.06 per million tokens
Arena ELO 1175 (Chatbot Arena rating)
MT-Bench 8.2 out of 10
Benchmark Scores
MMLU 72.0%
HumanEval 78.0%
MATH 72.0%
GPQA 34.0%
MT-Bench 8.2/10
Strengths & Limitations
Strengths
✓ Vision-language
✓ Compact size
✓ Open source
✓ Fast
✓ Consumer GPU friendly
Limitations
⚠ Less capable than larger VL models
⚠ Limited complex visual reasoning
Ideal Use Cases
Visual Q&A
Image captioning
Document OCR
Personal AI
Edge vision tasks
Model Details
Provider Alibaba
Released 2025-04-01
Type Free & Open Source
Multimodal Yes
Tier Efficient
Global rank #63 / 85
Pricing (USD)
Input tokens $0.06/M
Output tokens $0.06/M
Per 1,000 tokens ≈ $0.00006 input / $0.00006 output
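
The per-token arithmetic behind these prices can be sketched as follows. At $0.06 per million tokens, 1,000 tokens cost $0.00006, which shows as $0.0001 when rounded to four decimal places. This is a minimal illustration only: the helper name `estimate_cost` and the example token counts are assumptions, not part of any provider API, and actual billing may differ.

```python
from decimal import Decimal

# Listed prices from this page, in USD per million tokens.
INPUT_PRICE_PER_M = Decimal("0.06")
OUTPUT_PRICE_PER_M = Decimal("0.06")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_MILLION
    return input_cost + output_cost


# Cost of 1,000 input tokens, before any rounding.
print(f"1,000 input tokens: ${estimate_cost(1_000, 0):.5f}")

# Assumed example request: 32,000 input tokens and 500 output tokens.
print(f"Example request:    ${estimate_cost(32_000, 500):.5f}")
```

`Decimal` is used instead of `float` so that very small per-token amounts are not distorted by binary rounding.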
All Benchmarks
MMLU 72.0%
HumanEval 78.0%
MATH 72.0%
GPQA 34.0%
MT-Bench 8.2/10
Arena ELO 1175