All Models

Browse all 85 models grouped by provider.

85 models across 17 providers

OpenAI

9 models
Free & Open Multimodal
Best score 93.7
o3
OpenAI
93.7
score
Paid API Multimodal Flagship Top ranked

OpenAI's most powerful reasoning model, using extended chain-of-thought to tackle the hardest problems in mathematics, science, and coding. o3 sets new standards on GPQA and competitive maths at the cost of higher latency and price.

Context window
200K
Avg response
4200ms
Price / 1M tokens
$10.0
Speed (TPS)
15
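
As a rough illustration of how the Price / 1M tokens and Speed (TPS) figures on each card translate into per-request numbers, the sketch below uses the o3 values shown above ($10.0 and 15 TPS). It assumes a single flat per-token price and a steady generation rate, neither of which the listing confirms; the function names and token counts are hypothetical.

```python
def estimate_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a flat price per 1M tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens * price_per_million_usd / 1_000_000


def estimate_generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady tokens-per-second rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Values from the o3 card; the request sizes are made up for illustration.
cost = estimate_cost_usd(50_000, 10.0)          # 50K tokens at $10.0 / 1M
seconds = estimate_generation_seconds(600, 15)  # 600 output tokens at 15 TPS

print(f"cost: ${cost:.2f}, generation time: {seconds:.0f}s")
```

The same two functions apply to any other card by swapping in its price and TPS; free models simply have a price of 0.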
o4-mini
OpenAI
91.5
score
Paid API Multimodal Efficient Top ranked

OpenAI's compact reasoning model achieving near-o3 performance at a fraction of the cost. o4-mini uses extended chain-of-thought and achieves exceptional results on mathematics, science, and coding — making advanced reasoning economically accessible.

Context window
200K
Avg response
2200ms
Price / 1M tokens
$1.1
Speed (TPS)
80
o1
OpenAI
89.8
score
Paid API Flagship

OpenAI's flagship reasoning model trained with reinforcement learning to think through complex problems step-by-step before responding. Excels at maths, science, and multi-step logic at the cost of higher latency.

Context window
128K
Avg response
4200ms
Price / 1M tokens
$15.0
Speed (TPS)
28
GPT-4.1
OpenAI
85.2
score
Paid API Multimodal Flagship

OpenAI's coding-focused flagship model with a 1 million token context window and top-tier performance on software engineering tasks. GPT-4.1 was specifically optimised for instruction following and agentic coding workflows.

Context window
1.0M
Avg response
880ms
Price / 1M tokens
$2.0
Speed (TPS)
75
GPT-OSS 120B
OpenAI
80.9
score
Free Flagship

OpenAI's 120B open-weight language model, making frontier-class performance available for self-hosted and on-premise deployments. The largest member of OpenAI's open-weights family.

Context window
128K
Avg response
1300ms
Price / 1M tokens
Free
Speed (TPS)
25
GPT-4o
OpenAI
79.5
score
Paid API Multimodal Flagship

OpenAI's flagship multimodal model combining text, vision, and audio capabilities. GPT-4o delivers state-of-the-art performance across reasoning, coding, and creative tasks whilst offering faster response times than its predecessors.

Context window
128K
Avg response
850ms
Price / 1M tokens
$5.0
Speed (TPS)
55
GPT-4.1 mini
OpenAI
74.9
score
Paid API Multimodal Efficient

OpenAI's cost-efficient sibling to GPT-4.1 with a 1 million token context window. GPT-4.1 mini makes strong coding and instruction-following capability available at a low price point, ideal for high-volume developer workflows.

Context window
1.0M
Avg response
430ms
Price / 1M tokens
$0.4
Speed (TPS)
180
GPT-4o mini
OpenAI
70.0
score
Paid API Multimodal Efficient

OpenAI's lightweight, cost-efficient model that punches well above its weight class. GPT-4o mini makes advanced AI capabilities accessible for high-volume, cost-sensitive applications without sacrificing too much quality.

Context window
128K
Avg response
420ms
Price / 1M tokens
$0.15
Speed (TPS)
130
GPT-4.1 nano
OpenAI
65.8
score
Paid API Multimodal Efficient

OpenAI's smallest and fastest GPT-4.1 model with vision support and a 1M token context window. Optimised for latency-critical and cost-sensitive applications.

Context window
1.0M
Avg response
320ms
Price / 1M tokens
$0.1
Speed (TPS)
250

Anthropic

11 models
Multimodal
Best score 90.7
Claude Opus 4
Anthropic
90.7
score
Paid API Multimodal Flagship Top ranked

Anthropic's most powerful and intelligent model, built for the most demanding tasks where quality outweighs cost. Claude Opus 4 leads on complex multi-step reasoning, graduate-level science, and nuanced long-form writing.

Context window
200K
Avg response
1100ms
Price / 1M tokens
$15.0
Speed (TPS)
30
Claude Opus 4.6
Anthropic
89.3
score
Paid API Multimodal Flagship

Anthropic's hybrid reasoning flagship with a 1M context window, pushing the frontier for coding and agentic tasks. Combines extended thinking with tool use for complex multi-step workflows.

Context window
1.0M
Avg response
1100ms
Price / 1M tokens
$18.0
Speed (TPS)
42
Claude 3.7 Sonnet
Anthropic
88.6
score
Paid API Multimodal Flagship

Anthropic's breakthrough model introducing extended thinking — the ability to reason step-by-step before responding. Claude 3.7 Sonnet achieves best-in-class MATH scores and strong coding, making it Anthropic's strongest release at the Sonnet price point.

Context window
200K
Avg response
850ms
Price / 1M tokens
$3.0
Speed (TPS)
80
Claude Sonnet 4.6
Anthropic
88.3
score
Paid API Multimodal Flagship Top ranked

The latest and most capable Sonnet model to date. Claude Sonnet 4.6 brings further gains in mathematical reasoning and instruction following, making it Anthropic's most well-rounded model at the Sonnet price point.

Context window
200K
Avg response
790ms
Price / 1M tokens
$3.0
Speed (TPS)
85
Claude Opus 4.5
Anthropic
87.8
score
Paid API Multimodal Flagship

Anthropic's multimodal Opus model offering seamless vision-language interactions with a 200K context window. Designed for the most demanding tasks requiring deep analysis and image comprehension.

Context window
200K
Avg response
950ms
Price / 1M tokens
$15.0
Speed (TPS)
46
Claude Sonnet 4.5
Anthropic
86.9
score
Paid API Multimodal Flagship

A refined iteration of Claude Sonnet 4 with improved performance on graduate-level reasoning and coding benchmarks. Claude Sonnet 4.5 delivers notably stronger results on GPQA and competitive maths whilst maintaining the same pricing as its predecessor.

Context window
200K
Avg response
800ms
Price / 1M tokens
$3.0
Speed (TPS)
85
Claude Opus 4.1
Anthropic
86.9
score
Paid API Multimodal Flagship

Anthropic's vision-capable Opus model combining visual perception with language reasoning across a 200K context window.

Context window
200K
Avg response
900ms
Price / 1M tokens
$15.0
Speed (TPS)
48
Claude Sonnet 4
Anthropic
85.4
score
Paid API Multimodal Flagship

Anthropic's fourth-generation Sonnet model, offering a significant leap in reasoning depth and coding accuracy over the 3.x series. Claude Sonnet 4 introduces refined tool use and improved adherence to complex multi-step instructions.

Context window
200K
Avg response
820ms
Price / 1M tokens
$3.0
Speed (TPS)
90
Claude 3.5 Sonnet
Anthropic
80.1
score
Paid API Multimodal Flagship

Anthropic's most intelligent model at the time of its release, excelling at complex reasoning and coding tasks. Claude 3.5 Sonnet set new benchmarks for intelligence whilst maintaining the safety and harmlessness Anthropic is known for.

Context window
200K
Avg response
780ms
Price / 1M tokens
$3.0
Speed (TPS)
75
Claude Haiku 4.5
Anthropic
69.5
score
Paid API Multimodal Efficient

Anthropic's fastest vision model on this list and the most affordable of the Claude 4 family models listed here, with a 200K context window. Ideal for high-volume tasks requiring speed and vision capability at low cost.

Context window
200K
Avg response
380ms
Price / 1M tokens
$0.8
Speed (TPS)
160
Claude 3 Haiku
Anthropic
63.2
score
Paid API Multimodal Efficient

Anthropic's fastest and most compact model, designed for near-instant responsiveness in demanding applications. Claude 3 Haiku delivers excellent value for tasks requiring speed at scale whilst maintaining Anthropic's commitment to safety.

Context window
200K
Avg response
380ms
Price / 1M tokens
$0.25
Speed (TPS)
140

xAI

4 models
Multimodal
Best score 90.6
Grok 3
xAI
90.6
score
Paid API Multimodal Flagship Top ranked

xAI's most capable model, trained on a 100,000-GPU cluster and setting new benchmarks in mathematics and scientific reasoning. Grok 3 integrates real-time data from the X platform and leads the Arena ELO leaderboard among commercial models.

Context window
131K
Avg response
900ms
Price / 1M tokens
$3.0
Speed (TPS)
65
Grok 3 Mini
xAI
85.0
score
Paid API Efficient

xAI's compact reasoning model offering excellent maths and logic at a fraction of Grok 3's cost. Grok 3 Mini uses chain-of-thought reasoning and real-time X platform data to punch above its size class.

Context window
131K
Avg response
560ms
Price / 1M tokens
$0.3
Speed (TPS)
150
Grok-2
xAI
77.3
score
Paid API Multimodal Flagship

xAI's earlier flagship model, built with real-time access to information from the X platform. Grok-2 takes a distinctive approach to AI with a more candid, less filtered personality and strong performance on complex reasoning tasks.

Context window
131K
Avg response
890ms
Price / 1M tokens
$2.0
Speed (TPS)
80
Grok 4.1 Fast
xAI
76.9
score
Paid API Multimodal Efficient

xAI's fast vision-language model with a 2M token context window, combining visual reasoning with near real-time response speeds. Built for high-throughput production workloads.

Context window
2.0M
Avg response
500ms
Price / 1M tokens
$3.0
Speed (TPS)
140

Google DeepMind

12 models
Free & Open Multimodal
Best score 88.8
Gemini 3 Pro
Google DeepMind
88.8
score
Paid API Multimodal Flagship

Google DeepMind's next-generation Pro model with integrated vision understanding and a 1M token context window. Delivers frontier-level reasoning and multimodal analysis.

Context window
1.0M
Avg response
820ms
Price / 1M tokens
$7.0
Speed (TPS)
62
Gemini 2.5 Pro
Google DeepMind
85.9
score
Paid API Multimodal Flagship Top ranked

Google DeepMind's most advanced model of the 2.5 generation, with standout performance in mathematics, science, and long-context reasoning. Gemini 2.5 Pro features a 1 million token context window and an experimental 2 million token mode, alongside strong multimodal capabilities.

Context window
1.0M
Avg response
1050ms
Price / 1M tokens
$1.25
Speed (TPS)
120
Gemini 2.5 Flash
Google DeepMind
81.7
score
Paid API Multimodal Efficient

Google DeepMind's latest fast multimodal model with strong reasoning and a 1 million token context window. Bridges the gap between Flash speed and Pro capability, with thinking mode for harder tasks.

Context window
1.0M
Avg response
540ms
Price / 1M tokens
$0.15
Speed (TPS)
190
Gemini 2.0 Flash
Google DeepMind
75.7
score
Paid API Multimodal Efficient

Google DeepMind's next-generation fast model offering impressive performance at a fraction of the cost. Gemini 2.0 Flash brings multimodal capabilities and a massive context window to real-time applications.

Context window
1.0M
Avg response
520ms
Price / 1M tokens
$0.1
Speed (TPS)
250
Gemma 3 27B
Google DeepMind
75.6
score
Free Multimodal Efficient

Google DeepMind's flagship open-source model in the Gemma 3 family, capable of running on a single high-end GPU. Gemma 3 27B supports text and images and delivers strong benchmarks across knowledge, reasoning, and instruction following — making it one of the best self-hostable multimodal models available.

Context window
128K
Avg response
980ms
Price / 1M tokens
Free
Speed (TPS)
45
Gemini 1.5 Pro
Google DeepMind
74.4
score
Paid API Multimodal Flagship

Google DeepMind's highly capable multimodal model with a groundbreaking 1 million token context window. Gemini 1.5 Pro excels at long-document analysis, video understanding, and complex cross-modal tasks.

Context window
1.0M
Avg response
920ms
Price / 1M tokens
$3.5
Speed (TPS)
70
Gemma 3 12B
Google DeepMind
67.6
score
Free Multimodal Efficient

Google DeepMind's capable 12B model from the latest Gemma 3 series. Features multimodal vision capabilities and a 128K context window, outperforming many larger models whilst fitting on a single consumer GPU.

Context window
128K
Avg response
560ms
Price / 1M tokens
Free
Speed (TPS)
85
Gemini 1.5 Flash
Google DeepMind
66.0
score
Paid API Multimodal Efficient

Google DeepMind's fast, cost-efficient multimodal model with a 1 million token context window. Ideal for high-volume applications that need capable reasoning at low latency and minimal cost.

Context window
1.0M
Avg response
480ms
Price / 1M tokens
$0.075
Speed (TPS)
210
Gemma 2 9B
Google DeepMind
61.7
score
Free Efficient

Google DeepMind's 9B open-source model from the Gemma 2 family, using interleaved local and global attention. Gemma 2 9B competes with models twice its size and is one of the best-performing small open-source models available.

Context window
8K
Avg response
450ms
Price / 1M tokens
Free
Speed (TPS)
110
Gemma 3 4B
Google DeepMind
58.1
score
Free Multimodal Efficient

Google DeepMind's lightweight 4B model from the Gemma 3 family. Designed to run on phones and laptops, it includes vision understanding and a 128K context window — remarkable capabilities for a sub-5B model.

Context window
128K
Avg response
220ms
Price / 1M tokens
Free
Speed (TPS)
220
CodeGemma 7B
Google DeepMind
55.4
score
Free Efficient

Google DeepMind's code-specialised 7B model built on Gemma, pre-trained on 500B+ tokens of code. Excels at code completion, generation, and natural language to code conversion.

Context window
8K
Avg response
380ms
Price / 1M tokens
Free
Speed (TPS)
140
Gemma 3 1B
Google DeepMind
35.5
score
Free Efficient

Google DeepMind's smallest Gemma 3 model at 1B parameters, designed for on-device inference with a 32K context window. Suitable for edge applications where memory is the primary constraint.

Context window
32K
Avg response
90ms
Price / 1M tokens
Free
Speed (TPS)
480

DeepSeek

5 models
Free & Open
Best score 88.2
DeepSeek R1
DeepSeek
88.2
score
Free Flagship

DeepSeek's open-source reasoning model trained with reinforcement learning to rival OpenAI's o1. DeepSeek R1 achieves exceptional scores on mathematics and scientific reasoning benchmarks, making advanced chain-of-thought reasoning accessible to everyone.

Context window
64K
Avg response
2800ms
Price / 1M tokens
Free
Speed (TPS)
20
DeepSeek V3.1 671B
DeepSeek
83.6
score
Free Flagship

DeepSeek's updated flagship MoE model with 671B total parameters and improved capability over V3. Balances frontier performance with efficient inference through sparse mixture-of-experts architecture.

Context window
128K
Avg response
1100ms
Price / 1M tokens
Free
Speed (TPS)
20
DeepSeek V3
DeepSeek
81.0
score
Free Flagship

DeepSeek's breakthrough open-source model that shocked the AI industry with frontier-level performance at a fraction of the training cost. DeepSeek V3 demonstrates that cutting-edge AI is no longer exclusive to the largest technology companies.

Context window
128K
Avg response
680ms
Price / 1M tokens
Free
Speed (TPS)
60
DeepSeek R1 Distill 70B
DeepSeek
78.5
score
Free Flagship

A Llama-3.3-70B model distilled from the full DeepSeek R1 reasoning model, inheriting chain-of-thought reasoning capabilities at a fraction of the compute cost. One of the strongest open-source reasoning models available.

Context window
128K
Avg response
1800ms
Price / 1M tokens
Free
Speed (TPS)
40
DeepSeek V2.5 236B
DeepSeek
76.2
score
Free Flagship

DeepSeek's 236B MoE model merging V2 Chat and Coder capabilities. A strong open-source model for combined reasoning and coding tasks at manageable inference cost.

Context window
128K
Avg response
1300ms
Price / 1M tokens
Free
Speed (TPS)
25

Alibaba

10 models
Free & Open Multimodal
Best score 84.9
Qwen3 72B
Alibaba
84.9
score
Free Flagship

Alibaba's latest flagship open-source model with a unique dual-mode operation: fast standard responses or an extended thinking mode for harder problems. Qwen3 72B achieves frontier-class mathematics and coding scores as a fully open-source model, rivalling paid APIs.

Context window
128K
Avg response
900ms
Price / 1M tokens
Free
Speed (TPS)
50
Qwen3.5 122B
Alibaba
82.2
score
Free Flagship

Alibaba's 122B flagship model from the Qwen3.5 series, offering frontier-level coding and technical performance in a large open-weight package.

Context window
128K
Avg response
1400ms
Price / 1M tokens
Free
Speed (TPS)
22
Qwen3.5 35B
Alibaba
77.9
score
Free Flagship

Alibaba's latest 35B code-specialised model from the Qwen3.5 series, targeting software development and technical reasoning with expertise across major programming languages.

Context window
128K
Avg response
850ms
Price / 1M tokens
Free
Speed (TPS)
55
Qwen 2.5 72B
Alibaba
77.4
score
Free Flagship

Alibaba's flagship open-source model demonstrating remarkable capability, especially in mathematics and coding. Qwen 2.5 72B offers an outstanding balance of performance and accessibility, making advanced AI widely available.

Context window
128K
Avg response
750ms
Price / 1M tokens
Free
Speed (TPS)
60
Qwen3 14B
Alibaba
76.3
score
Free Efficient

Alibaba's latest 14B model from the Qwen3 series featuring hybrid thinking mode. Supports seamless switching between deep reasoning and fast response, with state-of-the-art maths and coding for its size.

Context window
128K
Avg response
680ms
Price / 1M tokens
Free
Speed (TPS)
80
Qwen3 Coder 30B
Alibaba
75.2
score
Free Flagship

Alibaba's code-specialised 30B model from the Qwen3 series, with expertise across major programming languages and agentic coding workflows. Leads open-source coding benchmarks at its parameter class.

Context window
128K
Avg response
800ms
Price / 1M tokens
Free
Speed (TPS)
60
Qwen3 VL 32B
Alibaba
74.6
score
Free Multimodal Flagship

Alibaba's capable 32B vision-language model combining visual perception with powerful language reasoning. Strong at document understanding, visual maths, and complex image analysis.

Context window
128K
Avg response
820ms
Price / 1M tokens
Free
Speed (TPS)
55
Qwen 2.5 14B
Alibaba
72.6
score
Free Efficient

Alibaba's mid-size 14B model from the Qwen 2.5 series. Strikes an excellent balance between capability and compute requirements, outperforming many larger models on maths and coding benchmarks.

Context window
128K
Avg response
620ms
Price / 1M tokens
Free
Speed (TPS)
85
Qwen 2.5 7B
Alibaba
68.7
score
Free Efficient

Alibaba's compact 7B model from the Qwen 2.5 series, punching well above its weight class in mathematics and coding. Runs comfortably on 8GB VRAM with remarkable benchmark scores for its size.

Context window
128K
Avg response
380ms
Price / 1M tokens
Free
Speed (TPS)
140
Qwen3 VL 8B
Alibaba
64.8
score
Free Multimodal Efficient

Alibaba's compact 8B vision-language model from the Qwen3 VL family, supporting visual question answering, image description, and multimodal reasoning on consumer-grade hardware.

Context window
32K
Avg response
500ms
Price / 1M tokens
Free
Speed (TPS)
100

Mistral AI

12 models
Free & Open Multimodal
Best score 82.1
Mistral Large 3 675B
Mistral AI
82.1
score
Free Flagship

Mistral AI's largest open-weight model at 675B parameters, offering frontier performance in an open-source package. Engineered for fast, responsive interactions at scale.

Context window
128K
Avg response
1200ms
Price / 1M tokens
Free
Speed (TPS)
18
Devstral 2 123B
Mistral AI
77.5
score
Free Flagship

Mistral AI's large 123B coding model offering frontier code generation and completion across all major programming languages. Built for production software engineering at scale.

Context window
128K
Avg response
1500ms
Price / 1M tokens
Free
Speed (TPS)
22
Mistral Large 2
Mistral AI
73.9
score
Free Flagship

Mistral AI's previous flagship model, developed with a focus on efficiency and European AI sovereignty. Mistral Large 2 excels at coding tasks and multilingual applications, particularly for European languages.

Context window
128K
Avg response
650ms
Price / 1M tokens
Free
Speed (TPS)
65
Devstral Small 2 24B
Mistral AI
67.9
score
Free Efficient

Mistral AI's compact 24B coding specialist model, designed for intelligent code completion, debugging, and software engineering workflows. Fast enough for interactive development tooling.

Context window
128K
Avg response
600ms
Price / 1M tokens
Free
Speed (TPS)
75
Mistral Small 3.1
Mistral AI
66.4
score
Paid API Multimodal Efficient

Mistral AI's compact yet capable API model designed for cost-effective deployments. Mistral Small 3.1 delivers strong multilingual performance and instruction following at a fraction of the cost of flagship models.

Context window
128K
Avg response
410ms
Price / 1M tokens
$0.1
Speed (TPS)
140
Ministral 3 14B
Mistral AI
64.0
score
Free Efficient

Mistral AI's 14B model from the Ministral 3 series, offering strong performance at a size that fits comfortably on a single 16GB GPU with quantisation.

Context window
128K
Avg response
550ms
Price / 1M tokens
Free
Speed (TPS)
95
Mixtral 8x7B
Mistral AI
63.0
score
Free Flagship

Mistral AI's Mixture-of-Experts model activating 2 of 8 expert networks per token, matching GPT-3.5 quality at much lower inference cost. A landmark open-source model for quality-efficiency balance.

Context window
32K
Avg response
700ms
Price / 1M tokens
Free
Speed (TPS)
55
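
The "2 of 8 experts per token" routing mentioned in the Mixtral 8x7B description can be sketched in a few lines. This is a toy illustration under stated assumptions, not Mistral AI's implementation: the experts are simple scalar functions, the router scores are invented, and the names `top_k_route` and `moe_forward` are hypothetical.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; the rest are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Eight toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(8)]

# Hypothetical router scores for a single token.
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routes = top_k_route(logits)
output = moe_forward(3.0, experts, logits)
print(routes, output)
```

Only two expert functions run for each token, which is why a sparse model of this kind can cost much less at inference time than its total parameter count suggests.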
Mathstral 7B
Mistral AI
61.4
score
Free Efficient

Mistral AI's mathematics-specialised 7B model, fine-tuned for step-by-step mathematical reasoning and problem solving. Achieves top scores on maths benchmarks for its parameter class.

Context window
32K
Avg response
400ms
Price / 1M tokens
Free
Speed (TPS)
140
Mistral Nemo 12B
Mistral AI
60.9
score
Free Efficient

A compact yet highly capable 12B model developed jointly by Mistral AI and NVIDIA. Uses a 128K context window and a new tokenizer (Tekken) optimised for multilingual content, balancing size and performance.

Context window
128K
Avg response
580ms
Price / 1M tokens
Free
Speed (TPS)
95
Ministral 3 8B
Mistral AI
58.2
score
Free Efficient

Mistral AI's efficient 8B model from the Ministral 3 series, balancing speed and capability for production use. Strong instruction following with a 128K context window.

Context window
128K
Avg response
380ms
Price / 1M tokens
Free
Speed (TPS)
160
Mistral 7B
Mistral AI
54.8
score
Free Efficient

The model that put open-source LLMs on the map. Mistral 7B outperformed Llama 2 13B on most benchmarks at half the parameters, using grouped-query attention and sliding window attention for efficiency.

Context window
32K
Avg response
320ms
Price / 1M tokens
Free
Speed (TPS)
150
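
To make the sliding window attention named in the Mistral 7B description concrete, here is a minimal sketch of the attention mask it implies: each position attends only to itself and a fixed number of preceding positions. The tiny window of 3 and the helper name `sliding_window_mask` are assumptions chosen for readability, not values taken from the model.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Causal (j <= i) and limited to the most recent `window` positions.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


mask = sliding_window_mask(seq_len=6, window=3)

# Print the mask: 'x' marks an allowed query/key pair.
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row holds at most `window` allowed positions, so per-token attention cost stays bounded as the sequence grows, while stacked layers still let information travel further back than one window.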
Ministral 3 3B
Mistral AI
48.5
score
Free Efficient

Mistral AI's ultra-compact 3B model optimised for speed and cost-effectiveness. Ideal for edge deployment and high-volume applications where minimal latency matters most.

Context window
128K
Avg response
160ms
Price / 1M tokens
Free
Speed (TPS)
320

Meta

10 models
Free & Open Multimodal
Best score 78.9
Llama 4 Maverick
Meta
78.9
score
Free Multimodal Flagship

Meta's flagship fourth-generation model using a Mixture-of-Experts architecture for efficient high-quality inference. Llama 4 Maverick delivers frontier-class performance as a fully open-source model with a 1 million token context window.

Context window
1.0M
Avg response
1150ms
Price / 1M tokens
Free
Speed (TPS)
25
Llama 3.1 405B
Meta
77.6
score
Free Flagship

Meta's largest open-source model, competing directly with proprietary frontier models. Llama 3.1 405B can be self-hosted and fine-tuned, offering unmatched flexibility for organisations with data privacy requirements.

Context window
128K
Avg response
1200ms
Price / 1M tokens
Free
Speed (TPS)
15
Llama 3.3 70B
Meta
77.0
score
Free Flagship

Meta's latest 70B model, coming close to Llama 3.1 405B quality (77.0 versus 77.6 on this list) at a fraction of the compute cost. Llama 3.3 70B is the go-to open-source model for users whose hardware can hold 70B weights.

Context window
128K
Avg response
1100ms
Price / 1M tokens
Free
Speed (TPS)
45
Llama 4 Scout
Meta
74.1
score
Free Multimodal Efficient

Meta's efficient Llama 4 model optimised for speed and cost. Despite being the lighter of the two Llama 4 releases, Scout achieves strong benchmark results and features an extraordinary 10 million token context window, the largest of any model in this list.

Context window
10.0M
Avg response
680ms
Price / 1M tokens
Free
Speed (TPS)
80
Llama 3.2 Vision 90B
Meta
71.6
score
Free Multimodal Flagship

Meta's large vision-language model with strong image understanding and text reasoning capabilities. Competes with frontier multimodal models for visual analysis tasks.

Context window
128K
Avg response
1100ms
Price / 1M tokens
Free
Speed (TPS)
38
Llama 3.1 70B
Meta
71.2
score
Free Flagship

Meta's 70B parameter instruction-tuned model from the Llama 3.1 family. A powerful open-source alternative to paid APIs with a large 128K context window and strong multilingual capabilities.

Context window
128K
Avg response
1200ms
Price / 1M tokens
Free
Speed (TPS)
48
Llama 3.2 Vision 11B
Meta
61.2
score
Free Multimodal Efficient

Meta's compact vision-language model supporting image understanding and multimodal conversations. Runs on a single consumer GPU with 12GB VRAM and offers solid visual question answering capabilities.

Context window
128K
Avg response
580ms
Price / 1M tokens
Free
Speed (TPS)
90
Llama 3.1 8B
Meta
60.5
score
Free Efficient

Meta's lightweight 8B model from the Llama 3.1 family. The most accessible large language model for consumer hardware — runs on a laptop GPU with 8GB VRAM whilst punching well above its weight class.

Context window
128K
Avg response
400ms
Price / 1M tokens
Free
Speed (TPS)
120
Llama 3.2 3B
Meta
49.8
score
Free Efficient

Meta's ultra-compact 3B model designed for edge and on-device deployment. Llama 3.2 3B runs entirely on CPU or low-end GPUs with surprisingly capable text understanding for its size.

Context window
128K
Avg response
180ms
Price / 1M tokens
Free
Speed (TPS)
280
Llama 3.2 1B
Meta
35.8
score
Free Efficient

Meta's smallest Llama model, designed for on-device and embedded deployments. Llama 3.2 1B runs entirely on CPU and low-power devices with a 128K context window despite its tiny footprint.

Context window
128K
Avg response
80ms
Price / 1M tokens
Free
Speed (TPS)
550