Gemini 2.5 Flash

by Google DeepMind

Efficient Paid API Multimodal 🏆 Ranked #22 of 85

Overall Score 81.7 out of 100

About

A fast multimodal model from Google DeepMind with strong reasoning and a 1 million token context window. It bridges the gap between Flash speed and Pro capability, and offers a thinking mode for harder tasks.

Key Metrics

Context Window 1.0M tokens

Avg Response 540 milliseconds

Input Cost $0.15 per million tokens

Output Cost $0.60 per million tokens

Arena ELO 1300 (Chatbot Arena rating)

MT-Bench 9.0 out of 10

Benchmark Scores

MMLU 86.0%

HumanEval 89.0%

MATH 84.0%

GPQA 62.0%

MT-Bench 9.0/10

Model Details

Provider Google DeepMind

Released 2025-05-20

Type Paid API

Multimodal Yes

Tier Efficient

Global rank #22 / 85

Pricing (USD)

Input tokens $0.15/M

Output tokens $0.60/M

Per 1,000 tokens ≈ $0.00015 input / $0.0006 output
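
To make the per-token arithmetic concrete, here is a minimal sketch that estimates the cost of a single request from the per-million rates listed on this page. The function name `estimate_cost_usd` and the token counts in the usage line are hypothetical and chosen only for illustration; actual billing may differ (for example, if thinking-mode output is priced separately), so treat the result as a rough estimate.

```python
# Rates copied from the pricing listed on this page (USD per million tokens).
INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a request with 2,000 input tokens and 500 output tokens.
print(f"{estimate_cost_usd(2_000, 500):.6f}")  # prints 0.000600
```

Dividing by one million only once, after summing, keeps the rates readable in the same units the page quotes them in.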

All Benchmarks

MMLU 86.0%

HumanEval 89.0%

MATH 84.0%

GPQA 62.0%

MT-Bench 9.0/10

Arena ELO 1300

Similar Models

You might also consider

OpenAI's compact reasoning model achieving near-o3 performance at a fraction of the cost. o4-mini uses extended chain-of-thought and achieves exceptional results on mathematics, science, and coding — making advanced reasoning economically accessible.

xAI's compact reasoning model offering excellent maths and logic at a fraction of Grok 3's cost. Grok 3 Mini uses chain-of-thought reasoning and real-time X platform data to punch above its size class.

Efficient Open source

Microsoft's 14-billion parameter model that challenges models three times its size. Phi-4 was trained on curated high-quality synthetic data, achieving remarkable mathematics and science benchmark scores and demonstrating that data quality can outperform raw scale.