Rankings
Model Leaderboard
Click any column heading to re-sort the table. Green marks the best model in each metric; for cost and response time, lower values count as the advantage.
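The re-sort behaviour can be sketched in Python. This is a hypothetical illustration, not the page's own code: the `Row` fields, the `sort_rows` helper and the choice to treat "Free" as $0 are assumptions. The five sample rows are copied from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    model: str
    score: float
    elo: int
    latency_ms: int
    tps: int
    cost_per_m: Optional[float]  # None stands in for "Free" (self-hosted)


# Columns where a smaller number is the better result.
LOWER_IS_BETTER = {"latency_ms", "cost_per_m"}


def sort_rows(rows: List[Row], column: str) -> List[Row]:
    """Return a new list ordered best-first for the chosen column."""

    def key(row: Row) -> float:
        value = getattr(row, column)
        if value is None:
            value = 0.0  # assumption: a free model costs nothing per token
        # Negate "higher is better" columns so one ascending sort puts the best first.
        return value if column in LOWER_IS_BETTER else -value

    return sorted(rows, key=key)


ROWS = [
    Row("o3", 93.7, 1391, 4200, 15, 10.0),
    Row("o4-mini", 91.5, 1370, 2200, 80, 1.1),
    Row("Grok 3", 90.6, 1402, 900, 65, 3.0),
    Row("DeepSeek R1", 88.2, 1358, 2800, 20, None),
    Row("Gemini 2.0 Flash", 75.7, 1252, 520, 250, 0.1),
]

if __name__ == "__main__":
    for column in ("elo", "latency_ms", "cost_per_m"):
        print(column, [r.model for r in sort_rows(ROWS, column)])
```

Negating the value lets a single ascending `sorted` call handle both directions, and because Python's sort is stable, tied models keep their original table order.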
Chart: Model Releases by Provider & Month
| # | Model | Provider | Score | Arena Elo | MMLU | Code (HumanEval) | Maths (MATH) | Context | Latency (ms) | TPS | Cost/M | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | o3 | OpenAI | 93.7 | 1391 | 91.6% | 96.4% | 97.8% | 200K | 4200ms | 15 t/s | $10.0 | Paid, Vision |
| 2 | o4-mini | OpenAI | 91.5 | 1370 | 90.8% | 95.8% | 95.9% | 200K | 2200ms | 80 t/s | $1.1 | Paid, Vision |
| 3 | Claude Opus 4 | Anthropic | 90.7 | 1395 | 93.2% | 95.6% | 86.0% | 200K | 1100ms | 30 t/s | $15.0 | Paid, Vision |
| 4 | Grok 3 | xAI | 90.6 | 1402 | 93.3% | 91.8% | 93.3% | 131K | 900ms | 65 t/s | $3.0 | Paid, Vision |
| 5 | o1 | OpenAI | 89.8 | 1350 | 92.3% | 92.4% | 94.8% | 128K | 4200ms | 28 t/s | $15.0 | Paid |
| 6 | Claude Opus 4.6 | Anthropic | 89.3 | 1360 | 92.0% | 93.0% | 90.0% | 1.0M | 1100ms | 42 t/s | $18.0 | Paid, Vision |
| 7 | Gemini 3 Pro | Google DeepMind | 88.8 | 1355 | 92.0% | 91.0% | 90.0% | 1.0M | 820ms | 62 t/s | $7.0 | Paid, Vision |
| 8 | Claude 3.7 Sonnet | Anthropic | 88.6 | 1355 | 90.7% | 93.0% | 96.2% | 200K | 850ms | 80 t/s | $3.0 | Paid, Vision |
| 9 | Claude Sonnet 4.6 | Anthropic | 88.3 | 1374 | 92.1% | 94.8% | 83.7% | 200K | 790ms | 85 t/s | $3.0 | Paid, Vision |
| 10 | DeepSeek R1 | DeepSeek | 88.2 | 1358 | 90.8% | 92.1% | 97.3% | 64K | 2800ms | 20 t/s | Free | Free |
| 11 | Claude Opus 4.5 | Anthropic | 87.8 | 1345 | 91.5% | 91.5% | 89.0% | 200K | 950ms | 46 t/s | $15.0 | Paid, Vision |
| 12 | Claude Sonnet 4.5 | Anthropic | 86.9 | 1362 | 91.7% | 94.2% | 81.5% | 200K | 800ms | 85 t/s | $3.0 | Paid, Vision |
| 13 | Claude Opus 4.1 | Anthropic | 86.9 | 1340 | 91.0% | 91.0% | 88.0% | 200K | 900ms | 48 t/s | $15.0 | Paid, Vision |
| 14 | Gemini 2.5 Pro | Google DeepMind | 85.9 | 1380 | 90.0% | 87.9% | 91.2% | 1.0M | 1050ms | 120 t/s | $1.25 | Paid, Vision |
| 15 | Claude Sonnet 4 | Anthropic | 85.4 | 1345 | 91.0% | 93.5% | 79.2% | 200K | 820ms | 90 t/s | $3.0 | Paid, Vision |
| 16 | GPT-4.1 | OpenAI | 85.2 | 1340 | 90.2% | 97.1% | 86.5% | 1.0M | 880ms | 75 t/s | $2.0 | Paid, Vision |
| 17 | Grok 3 Mini | xAI | 85.0 | 1340 | 87.5% | 88.0% | 89.4% | 131K | 560ms | 150 t/s | $0.3 | Paid |
| 18 | Qwen3 72B | Alibaba | 84.9 | 1320 | 88.0% | 92.5% | 90.0% | 128K | 900ms | 50 t/s | Free | Free |
| 19 | DeepSeek V3.1 671B | DeepSeek | 83.6 | 1310 | 89.0% | 91.0% | 87.0% | 128K | 1100ms | 20 t/s | Free | Free |
| 20 | Qwen3.5 122B | Alibaba | 82.2 | 1280 | 87.0% | 94.0% | 89.0% | 128K | 1400ms | 22 t/s | Free | Free |
| 21 | Mistral Large 3 675B | Mistral AI | 82.1 | 1295 | 88.0% | 90.0% | 85.0% | 128K | 1200ms | 18 t/s | Free | Free |
| 22 | Gemini 2.5 Flash | Google DeepMind | 81.7 | 1300 | 86.0% | 89.0% | 84.0% | 1.0M | 540ms | 190 t/s | $0.15 | Paid, Vision |
| 23 | DeepSeek V3 | DeepSeek | 81.0 | 1302 | 88.5% | 89.1% | 87.2% | 128K | 680ms | 60 t/s | Free | Free |
| 24 | GPT-OSS 120B | OpenAI | 80.9 | 1285 | 87.0% | 90.0% | 84.0% | 128K | 1300ms | 25 t/s | Free | Free |
| 25 | Claude 3.5 Sonnet | Anthropic | 80.1 | 1289 | 88.7% | 92.0% | 71.1% | 200K | 780ms | 75 t/s | $3.0 | Paid, Vision |
| 26 | GPT-4o | OpenAI | 79.5 | 1285 | 88.7% | 90.2% | 76.6% | 128K | 850ms | 55 t/s | $5.0 | Paid, Vision |
| 27 | Llama 4 Maverick | Meta | 78.9 | 1285 | 88.7% | 89.8% | 74.9% | 1.0M | 1150ms | 25 t/s | Free | Free, Vision |
| 28 | DeepSeek R1 Distill 70B | DeepSeek | 78.5 | 1250 | 84.0% | 86.0% | 90.0% | 128K | 1800ms | 40 t/s | Free | Free |
| 29 | Qwen3.5 35B | Alibaba | 77.9 | 1245 | 83.0% | 92.0% | 86.0% | 128K | 850ms | 55 t/s | Free | Free |
| 30 | Llama 3.1 405B | Meta | 77.6 | 1266 | 88.6% | 89.0% | 73.8% | 128K | 1200ms | 15 t/s | Free | Free |
| 31 | Seed 1.6 | ByteDance | 77.5 | 1270 | 85.0% | 84.0% | 80.0% | 262K | 720ms | 95 t/s | $0.9 | Paid, Vision |
| 32 | Devstral 2 123B | Mistral AI | 77.5 | 1255 | 82.0% | 94.0% | 79.0% | 128K | 1500ms | 22 t/s | Free | Free |
| 33 | Qwen 2.5 72B | Alibaba | 77.4 | 1259 | 86.0% | 86.6% | 83.1% | 128K | 750ms | 60 t/s | Free | Free |
| 34 | Phi-4 | Microsoft | 77.4 | 1280 | 84.8% | 82.6% | 80.4% | 16K | 480ms | 90 t/s | Free | Free |
| 35 | Grok-2 | xAI | 77.3 | 1248 | 87.5% | 88.4% | 76.1% | 131K | 890ms | 80 t/s | $2.0 | Paid, Vision |
| 36 | Llama 3.3 70B | Meta | 77.0 | 1256 | 86.0% | 88.0% | 77.0% | 128K | 1100ms | 45 t/s | Free | Free |
| 37 | Grok 4.1 Fast | xAI | 76.9 | 1270 | 84.0% | 84.0% | 78.0% | 2.0M | 500ms | 140 t/s | $3.0 | Paid, Vision |
| 38 | Qwen3 14B | Alibaba | 76.3 | 1230 | 82.0% | 87.0% | 88.0% | 128K | 680ms | 80 t/s | Free | Free |
| 39 | DeepSeek V2.5 236B | DeepSeek | 76.2 | 1268 | 80.4% | 89.0% | 75.7% | 128K | 1300ms | 25 t/s | Free | Free |
| 40 | Gemini 2.0 Flash | Google DeepMind | 75.7 | 1252 | 85.0% | 87.4% | 73.0% | 1.0M | 520ms | 250 t/s | $0.1 | Paid, Vision |
| 41 | Gemma 3 27B | Google DeepMind | 75.6 | 1290 | 87.5% | 77.2% | 72.0% | 128K | 980ms | 45 t/s | Free | Free, Vision |
| 42 | Qwen3 Coder 30B | Alibaba | 75.2 | 1240 | 78.0% | 93.0% | 82.0% | 128K | 800ms | 60 t/s | Free | Free |
| 43 | GPT-4.1 mini | OpenAI | 74.9 | 1230 | 83.5% | 90.0% | 79.8% | 1.0M | 430ms | 180 t/s | $0.4 | Paid, Vision |
| 44 | Qwen3 VL 32B | Alibaba | 74.6 | 1225 | 81.0% | 84.0% | 85.0% | 128K | 820ms | 55 t/s | Free | Free, Vision |
| 45 | Gemini 1.5 Pro | Google DeepMind | 74.4 | 1266 | 85.9% | 84.1% | 67.7% | 1.0M | 920ms | 70 t/s | $3.5 | Paid, Vision |
| 46 | Llama 4 Scout | Meta | 74.1 | 1248 | 87.1% | 86.5% | 67.4% | 10.0M | 680ms | 80 t/s | Free | Free, Vision |
| 47 | Mistral Large 2 | Mistral AI | 73.9 | 1232 | 84.0% | 92.0% | 69.3% | 128K | 650ms | 65 t/s | Free | Free |
| 48 | Cogito 70B | DeepCogito | 72.9 | 1230 | 82.0% | 82.0% | 75.0% | 128K | 1100ms | 42 t/s | Free | Free |
| 49 | Qwen 2.5 14B | Alibaba | 72.6 | 1210 | 79.5% | 86.0% | 83.0% | 128K | 620ms | 85 t/s | Free | Free |
| 50 | Llama 3.2 Vision 90B | Meta | 71.6 | 1228 | 83.0% | 81.0% | 69.0% | 128K | 1100ms | 38 t/s | Free | Free, Vision |
| 51 | Llama 3.1 70B | Meta | 71.2 | 1220 | 83.6% | 80.5% | 66.4% | 128K | 1200ms | 48 t/s | Free | Free |
| 52 | GPT-4o mini | OpenAI | 70.0 | 1179 | 82.0% | 87.2% | 70.2% | 128K | 420ms | 130 t/s | $0.15 | Paid, Vision |
| 53 | Claude Haiku 4.5 | Anthropic | 69.5 | 1230 | 78.0% | 78.0% | 68.0% | 200K | 380ms | 160 t/s | $0.8 | Paid, Vision |
| 54 | Nemotron 3 Nano 30B | NVIDIA | 69.0 | 1210 | 78.0% | 80.0% | 72.0% | 128K | 680ms | 65 t/s | Free | Free |
| 55 | Qwen 2.5 7B | Alibaba | 68.7 | 1185 | 74.2% | 84.5% | 80.0% | 128K | 380ms | 140 t/s | Free | Free |
| 56 | Devstral Small 2 24B | Mistral AI | 67.9 | 1195 | 74.0% | 88.0% | 68.0% | 128K | 600ms | 75 t/s | Free | Free |
| 57 | Gemma 3 12B | Google DeepMind | 67.6 | 1200 | 78.0% | 76.0% | 72.0% | 128K | 560ms | 85 t/s | Free | Free, Vision |
| 58 | Minimax M2.1 | MiniMax | 67.3 | 1215 | 78.0% | 74.0% | 70.0% | 256K | 450ms | 130 t/s | $0.2 | Paid |
| 59 | Mistral Small 3.1 | Mistral AI | 66.4 | 1198 | 81.0% | 74.5% | 67.0% | 128K | 410ms | 140 t/s | $0.1 | Paid, Vision |
| 60 | Gemini 1.5 Flash | Google DeepMind | 66.0 | 1226 | 78.9% | 71.5% | 58.5% | 1.0M | 480ms | 210 t/s | $0.075 | Paid, Vision |
| 61 | GLM 4.7 | Zhipu AI | 66.0 | 1200 | 75.0% | 76.0% | 70.0% | 128K | 580ms | 95 t/s | Free | Free |
| 62 | GPT-4.1 Nano | OpenAI | 65.8 | 1210 | 75.0% | 76.0% | 65.0% | 1.0M | 320ms | 250 t/s | $0.1 | Paid, Vision |
| 63 | Qwen3 VL 8B | Alibaba | 64.8 | 1175 | 72.0% | 78.0% | 72.0% | 32K | 500ms | 100 t/s | Free | Free, Vision |
| 64 | Ministral 3 14B | Mistral AI | 64.0 | 1185 | 73.0% | 78.0% | 62.0% | 128K | 550ms | 95 t/s | Free | Free |
| 65 | OLMo 3 32B | AllenAI | 64.0 | 1195 | 75.0% | 74.0% | 62.0% | 4K | 750ms | 60 t/s | Free | Free |
| 66 | Claude 3 Haiku | Anthropic | 63.2 | 1168 | 75.2% | 75.9% | 60.4% | 200K | 380ms | 140 t/s | $0.25 | Paid, Vision |
| 67 | Mixtral 8x7B | Mistral AI | 63.0 | 1191 | 70.6% | 75.1% | 58.0% | 32K | 700ms | 55 t/s | Free | Free |
| 68 | Phi-3.5 Mini | Microsoft | 61.9 | 1150 | 69.0% | 78.0% | 69.0% | 128K | 200ms | 250 t/s | Free | Free |
| 69 | Gemma 2 9B | Google DeepMind | 61.7 | 1190 | 71.3% | 71.0% | 58.0% | 8K | 450ms | 110 t/s | Free | Free |
| 70 | Command R+ | Cohere | 61.6 | 1155 | 75.7% | 69.6% | 56.7% | 128K | 720ms | 55 t/s | $2.5 | Paid |
| 71 | Mathstral 7B | Mistral AI | 61.4 | 1165 | 64.0% | 60.0% | 86.0% | 32K | 400ms | 140 t/s | Free | Free |
| 72 | Llama 3.2 Vision 11B | Meta | 61.2 | 1175 | 73.0% | 72.0% | 58.0% | 128K | 580ms | 90 t/s | Free | Free, Vision |
| 73 | Mistral Nemo 12B | Mistral AI | 60.9 | 1180 | 68.0% | 75.0% | 55.0% | 128K | 580ms | 95 t/s | Free | Free |
| 74 | Granite Code 34B | IBM | 60.6 | 1180 | 60.0% | 86.0% | 56.0% | 8K | 880ms | 55 t/s | Free | Free |
| 75 | Llama 3.1 8B | Meta | 60.5 | 1170 | 73.0% | 72.6% | 51.9% | 128K | 400ms | 120 t/s | Free | Free |
| 76 | Ministral 3 8B | Mistral AI | 58.2 | 1155 | 67.0% | 74.0% | 52.0% | 128K | 380ms | 160 t/s | Free | Free |
| 77 | Gemma 3 4B | Google DeepMind | 58.1 | 1160 | 68.0% | 64.0% | 62.0% | 128K | 220ms | 220 t/s | Free | Free, Vision |
| 78 | CodeGemma 7B | Google DeepMind | 55.4 | 1145 | 54.0% | 82.0% | 50.0% | 8K | 380ms | 140 t/s | Free | Free |
| 79 | Mistral 7B | Mistral AI | 54.8 | 1141 | 64.2% | 73.0% | 40.5% | 32K | 320ms | 150 t/s | Free | Free |
| 80 | OLMo 3 7B | AllenAI | 53.4 | 1140 | 65.0% | 64.0% | 45.0% | 4K | 400ms | 130 t/s | Free | Free |
| 81 | Granite Code 8B | IBM | 50.3 | 1130 | 51.0% | 75.0% | 40.0% | 4K | 380ms | 140 t/s | Free | Free |
| 82 | Llama 3.2 3B | Meta | 49.8 | 1120 | 63.4% | 58.0% | 40.0% | 128K | 180ms | 280 t/s | Free | Free |
| 83 | Ministral 3 3B | Mistral AI | 48.5 | 1115 | 61.0% | 55.0% | 42.0% | 128K | 160ms | 320 t/s | Free | Free |
| 84 | Llama 3.2 1B | Meta | 35.8 | 1070 | 49.3% | 38.0% | 25.0% | 128K | 80ms | 550 t/s | Free | Free |
| 85 | Gemma 3 1B | Google DeepMind | 35.5 | 1050 | 44.0% | 40.0% | 32.0% | 32K | 90ms | 480 t/s | Free | Free |
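To read the ELO column, the standard Elo expected-score formula turns a rating gap into a head-to-head win rate: E_A = 1 / (1 + 10^((R_B - R_A) / 400)). This is a minimal sketch that assumes the ratings use that conventional 400-point logistic scale; the page does not say how its ratings are computed, and `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for A over B under the Elo model."""
    # Every 400 points of advantage multiplies A's odds of winning by 10.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


if __name__ == "__main__":
    # Ratings copied from the table: Grok 3 (1402), o3 (1391), Gemma 3 1B (1050).
    print(f"Grok 3 vs o3:         {expected_win_rate(1402, 1391):.3f}")
    print(f"Grok 3 vs Gemma 3 1B: {expected_win_rate(1402, 1050):.3f}")
```

Under this assumption, the 11-point gap between Grok 3 and o3 works out to only about a 51.6% expected win rate, close to a coin flip, while the 352-point gap between Grok 3 and Gemma 3 1B gives roughly 88%.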
Charts: Overall Scores · Arena Elo Ratings · MATH Benchmark · HumanEval (Coding)
Legend:
■ Green = best in that column ·
Free = free, open-weight model (self-hosted) ·
Paid = paid API access ·
Vision = accepts image input ·
Cost/M = USD per million tokens ·
Latency (ms) and price (Cost/M) columns: lower is better; for TPS (tokens per second), higher is better.
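Combining the latency, TPS and price columns gives a rough per-request estimate. This sketch assumes that `ms` is the wait before the first token, that TPS is a steady output rate, and that Cost/M is a single blended USD price per million tokens applied to output tokens only; real APIs usually bill input and output tokens at different rates. `estimate_request` is a hypothetical helper.

```python
from typing import Tuple


def estimate_request(
    output_tokens: int,
    latency_ms: float,
    tokens_per_sec: float,
    cost_per_million: float,
) -> Tuple[float, float]:
    """Return (seconds, USD) for one response of `output_tokens` tokens."""
    # Assumed model: fixed wait for the first token, then constant-rate streaming.
    seconds = latency_ms / 1000 + output_tokens / tokens_per_sec
    # Price is quoted per 1,000,000 tokens.
    cost = output_tokens / 1_000_000 * cost_per_million
    return seconds, cost


if __name__ == "__main__":
    # Table values for o3 and o4-mini, for a 1,000-token answer.
    for name, ms, tps, price in [("o3", 4200, 15, 10.0), ("o4-mini", 2200, 80, 1.1)]:
        secs, usd = estimate_request(1000, ms, tps, price)
        print(f"{name}: ~{secs:.1f} s, ${usd:.4f}")
```

Under these assumptions a 1,000-token answer from o3 takes roughly 71 s and costs about $0.01, against about 15 s and $0.0011 for o4-mini, which is why both the latency and TPS columns matter for long outputs.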