
GPT-4.1

by OpenAI
Flagship Proprietary Multimodal 🏆 Ranked #8 of 22
85.2
Overall Score
out of 100
About

OpenAI's coding-focused flagship model with a 1 million token context window and top-tier performance on software engineering tasks. GPT-4.1 was specifically optimised for instruction following and agentic coding workflows.

Key Metrics
Context Window
1.0M
tokens
Avg Response
880
milliseconds
Input Cost
$2.0
per million tokens
Output Cost
$8.0
per million tokens
Arena ELO
1340
Chatbot Arena rating
MT-Bench
9.0
out of 10
Benchmark Scores
MMLU
90.2%
HumanEval
97.1%
MATH
86.5%
GPQA
56.8%
MT-Bench
9.0/10
Strengths & Limitations
Strengths
✓ Best-in-class coding
✓ 1M context window
✓ Strong instruction following
✓ Agentic capabilities
✓ Multimodal
Limitations
⚠ Less focus on pure reasoning than o-series
⚠ Higher cost
⚠ May over-engineer solutions
Ideal Use Cases
Large codebase analysis
Software engineering agents
Long document coding tasks
API integrations
Test generation
Model Details
Provider OpenAI
Released 2025-04-14
Open source No
Multimodal Yes
Tier Flagship
Global rank #8 / 22
Pricing (USD)
Input tokens $2.0/M
Output tokens $8.0/M
Per 1,000 tokens ≈ $0.0020 input / $0.0080 output
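
The per-million rates above can be turned into a per-request estimate. The following is a minimal sketch, assuming the listed $2.0/M input and $8.0/M output prices apply uniformly to every token; the function name `estimate_cost_usd` and the example token counts are hypothetical and are not part of any OpenAI API.

```python
# Rates copied from the pricing listed above (USD per million tokens).
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 8.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed flat per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_micro_usd / 1_000_000


# Hypothetical request: 120,000 prompt tokens and 2,000 completion tokens.
print(f"${estimate_cost_usd(120_000, 2_000):.4f}")
```

At these rates a long prompt dominates the bill only when it is far larger than the completion, since each output token costs four times as much as an input token.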
All Benchmarks
MMLU 90.2%
HumanEval 97.1%
MATH 86.5%
GPQA 56.8%
MT-Bench 9.0/10
Arena ELO 1340