Gemini 1.5 Flash

by Google DeepMind

Efficient Paid API Multimodal 🏆 Ranked #60 of 85

Overall Score 66.0 out of 100

About

Google DeepMind's fast, cost-efficient multimodal model with a 1 million token context window. Ideal for high-volume applications that need capable reasoning at low latency and minimal cost.

Key Metrics

Context Window 1.0M tokens

Avg Response 480 milliseconds

Input Cost $0.075 per million tokens

Output Cost $0.30 per million tokens

Arena ELO 1226 (Chatbot Arena rating)

MT-Bench 8.5 out of 10

Benchmark Scores

MMLU 78.9%

HumanEval 71.5%

MATH 58.5%

GPQA 39.5%

MT-Bench 8.5/10

Model Details

Provider Google DeepMind

Released 2024-05-14

Type Paid API

Multimodal Yes

Tier Efficient

Global rank #60 / 85

Pricing (USD)

Input tokens $0.075/M

Output tokens $0.30/M

Per 1,000 tokens ≈ $0.000075 input / $0.0003 output
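
The arithmetic behind these prices can be sketched in a few lines of Python. This is a rough estimate only: the two rates are the per-million-token prices listed above, while the helper name `estimate_cost_usd` and the example token counts are hypothetical and chosen purely for illustration. Actual billing may differ (for example by tier or prompt length), so treat the result as a ballpark figure.

```python
# Prices taken from the listing above (USD per million tokens).
INPUT_USD_PER_MILLION = 0.075
OUTPUT_USD_PER_MILLION = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative request: 10,000 input tokens and 1,000 output tokens.
    print(f"${estimate_cost_usd(10_000, 1_000):.6f}")  # prints $0.001050
```

Under these assumptions, 1,000 input tokens cost $0.000075 and 1,000 output tokens cost $0.0003, so output tokens are four times as expensive as input tokens at this price point.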

All Benchmarks

MMLU 78.9%

HumanEval 71.5%

MATH 58.5%

GPQA 39.5%

MT-Bench 8.5/10

Arena ELO 1226

Similar Models

You might also consider

OpenAI's compact reasoning model achieving near-o3 performance at a fraction of the cost. o4-mini uses extended chain-of-thought and achieves exceptional results on mathematics, science, and coding — making advanced reasoning economically accessible.

xAI's compact reasoning model offering excellent maths and logic at a fraction of Grok 3's cost. Grok 3 Mini uses chain-of-thought reasoning and real-time X platform data to punch above its size class.

Google DeepMind's latest fast multimodal model with strong reasoning and a 1 million token context window. Bridges the gap between Flash speed and Pro capability, with thinking mode for harder tasks.