# Rankings

## Model Leaderboard
Models are ranked by overall Score; lower cost and faster response times count as advantages.
| # | Model | Score | Arena ELO | MMLU | HumanEval | MATH | Context | Speed | Input Cost | Tags |
|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | o3 (OpenAI) | 93.7 | 1391 | 91.6% | 96.4% | 97.8% | 200K | 4200ms | $10.00 | MM, Flagship |
| 🥈 | Claude Opus 4 (Anthropic) | 90.7 | 1395 | 93.2% | 95.6% | 86.0% | 200K | 1100ms | $15.00 | MM, Flagship |
| 🥉 | Claude Sonnet 4.6 (Anthropic) | 88.3 | 1374 | 92.1% | 94.8% | 83.7% | 200K | 790ms | $3.00 | MM, Flagship |
| 4 | DeepSeek R1 (DeepSeek) | 88.2 | 1358 | 90.8% | 92.1% | 97.3% | 64K | 2800ms | $0.55 | OSS, Flagship |
| 5 | Claude Sonnet 4.5 (Anthropic) | 86.9 | 1362 | 91.7% | 94.2% | 81.5% | 200K | 800ms | $3.00 | MM, Flagship |
| 6 | Gemini 2.5 Pro (Google DeepMind) | 85.9 | 1380 | 90.0% | 87.9% | 91.2% | 1.0M | 1050ms | $1.25 | MM, Flagship |
| 7 | Claude Sonnet 4 (Anthropic) | 85.4 | 1345 | 91.0% | 93.5% | 79.2% | 200K | 820ms | $3.00 | MM, Flagship |
| 8 | GPT-4.1 (OpenAI) | 85.2 | 1340 | 90.2% | 97.1% | 86.5% | 1.0M | 880ms | $2.00 | MM, Flagship |
| 9 | DeepSeek V3 (DeepSeek) | 81.0 | 1302 | 88.5% | 89.1% | 87.2% | 128K | 680ms | $0.27 | OSS, Flagship |
| 10 | Claude 3.5 Sonnet (Anthropic) | 80.1 | 1289 | 88.7% | 92.0% | 71.1% | 200K | 780ms | $3.00 | MM, Flagship |
| 11 | GPT-4o (OpenAI) | 79.5 | 1285 | 88.7% | 90.2% | 76.6% | 128K | 850ms | $5.00 | MM, Flagship |
| 12 | Llama 4 Maverick (Meta) | 78.9 | 1285 | 88.7% | 89.8% | 74.9% | 1.0M | 1150ms | $0.19 | OSS, MM, Flagship |
| 13 | Llama 3.1 405B (Meta) | 77.6 | 1266 | 88.6% | 89.0% | 73.8% | 128K | 1200ms | $3.00 | OSS, Flagship |
| 14 | Qwen 2.5 72B (Alibaba) | 77.4 | 1259 | 86.0% | 86.6% | 83.1% | 128K | 750ms | $0.40 | OSS, Flagship |
| 15 | Grok-2 (xAI) | 77.3 | 1248 | 87.5% | 88.4% | 76.1% | 131K | 890ms | $2.00 | MM, Flagship |
| 16 | Gemini 2.0 Flash (Google DeepMind) | 75.7 | 1252 | 85.0% | 87.4% | 73.0% | 1.0M | 520ms | $0.10 | MM, Efficient |
| 17 | Gemini 1.5 Pro (Google DeepMind) | 74.4 | 1266 | 85.9% | 84.1% | 67.7% | 1.0M | 920ms | $3.50 | MM, Flagship |
| 18 | Llama 4 Scout (Meta) | 74.1 | 1248 | 87.1% | 86.5% | 67.4% | 10.0M | 680ms | $0.08 | OSS, MM, Efficient |
| 19 | Mistral Large 2 (Mistral AI) | 73.9 | 1232 | 84.0% | 92.0% | 69.3% | 128K | 650ms | $3.00 | OSS, Flagship |
| 20 | GPT-4o mini (OpenAI) | 70.0 | 1179 | 82.0% | 87.2% | 70.2% | 128K | 420ms | $0.15 | MM, Efficient |
| 21 | Claude 3 Haiku (Anthropic) | 63.2 | 1168 | 75.2% | 75.9% | 60.4% | 200K | 380ms | $0.25 | MM, Efficient |
| 22 | Command R+ (Cohere) | 61.6 | 1155 | 75.7% | 69.6% | 56.7% | 128K | 720ms | $2.50 | Flagship |
Legend: OSS = open source · MM = multimodal · For the Speed and Input Cost columns, lower is better.
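A static table can't be re-sorted by clicking, but the same data is easy to re-rank offline. The sketch below uses a small sample of rows from the table above (field names are my own, chosen to mirror the column headings) and sorts them by any one metric, treating speed and input cost as ascending (lower = better):

```python
# Sample of leaderboard rows from the table above; field names are
# illustrative, chosen to mirror the column headings.
rows = [
    {"model": "o3",            "score": 93.7, "speed_ms": 4200, "input_cost": 10.00},
    {"model": "Claude Opus 4", "score": 90.7, "speed_ms": 1100, "input_cost": 15.00},
    {"model": "DeepSeek V3",   "score": 81.0, "speed_ms":  680, "input_cost":  0.27},
    {"model": "GPT-4o mini",   "score": 70.0, "speed_ms":  420, "input_cost":  0.15},
]

# Metrics where a lower value is better, per the legend.
ASCENDING = {"speed_ms", "input_cost"}

def resort(rows, metric):
    """Return rows re-ranked by one metric, best first."""
    return sorted(rows, key=lambda r: r[metric], reverse=metric not in ASCENDING)

cheapest_first = resort(rows, "input_cost")
print([r["model"] for r in cheapest_first])
# GPT-4o mini comes first: it has the lowest input cost in this sample.
```

Sorting by `"score"` instead reproduces the table's default order for these four rows.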