
Gemini 2.5 Ultra vs GPT-5 Turbo

Google DeepMind vs OpenAI: specs, benchmarks, and real per-task cost on one page.

Verdict

GPT-5 Turbo leads on LMSYS Elo (1398 vs 1385), a 13-point margin.

Gemini 2.5 Ultra · Google DeepMind

2M-token context, native video understanding, and Google’s deepest multimodal stack. The long-context king.

2M context window · Native video · Best multimodal reasoning

GPT-5 Turbo · OpenAI

OpenAI’s flagship unified model. Handles text, vision, and audio natively. The generalist benchmark champion.

Native multimodal · Strong math and science · Huge ecosystem

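Pricing and specs

Spec | Gemini 2.5 Ultra | GPT-5 Turbo
Input price / 1M tokens | $7.00 | $6.00
Output price / 1M tokens | $21.00 | $24.00
Context window | 2.0M tokens | 400K tokens
Max output | 64K tokens | 32K tokens
License | Proprietary | Proprietary
Released | 2026-02-05 | 2026-01-21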
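The two models split their pricing in opposite directions: GPT-5 Turbo is cheaper on input, Gemini 2.5 Ultra on output. Which one costs less therefore depends on a workload's input-to-output token ratio. The sketch below solves for the ratio at which the list prices give equal cost. It assumes list prices only (no caching or batch discounts) and identical token counts on both models, which real tokenizers will not give exactly.

```python
# List prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Gemini 2.5 Ultra": {"input": 7.00, "output": 21.00},
    "GPT-5 Turbo": {"input": 6.00, "output": 24.00},
}


def break_even_ratio(a: dict, b: dict) -> float:
    """Return the input:output token ratio at which models a and b cost the same.

    Equal cost means a_in * x + a_out * y == b_in * x + b_out * y,
    so x / y == (b_out - a_out) / (a_in - b_in).
    """
    if a["input"] == b["input"]:
        raise ValueError("equal input prices: no finite break-even ratio")
    return (b["output"] - a["output"]) / (a["input"] - b["input"])


gemini = PRICES["Gemini 2.5 Ultra"]
gpt = PRICES["GPT-5 Turbo"]
ratio = break_even_ratio(gemini, gpt)
print(f"Break-even at {ratio:.1f} input tokens per output token")
```

With these prices the break-even is 3 input tokens per output token. Input-heavy calls above 3:1, such as summarizing a long transcript, favor GPT-5 Turbo; output-heavy calls below 3:1 favor Gemini 2.5 Ultra.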

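Benchmarks

Benchmark | Gemini 2.5 Ultra | GPT-5 Turbo
LMSYS Elo | 1385.0 | 1398.0
MMLU-Pro | 90.4 | 91.2
HumanEval | 90.2 | 93.0
SWE-bench | 58.3 | 65.8
MATH | 92.0 | 91.5
GPQA | not reported | 68.1
MMMU | 82.1 | 78.8
IFEval | 89.7 | 88.9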
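To put the LMSYS Elo gap in perspective, the sketch below converts two ratings into an expected head-to-head preference rate. It assumes the standard Elo convention (base 10, 400-point scale) and ignores the confidence intervals that leaderboard ratings carry.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (ties counted as half) under the
    standard Elo logistic curve: base 10, 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


gpt5_turbo, gemini_ultra = 1398.0, 1385.0
p = elo_expected_score(gpt5_turbo, gemini_ultra)
print(f"GPT-5 Turbo expected to be preferred in {p:.1%} of head-to-head votes")
```

For 1398 vs 1385 this gives about 51.9%, so GPT-5 Turbo's lead amounts to a slight edge in pairwise votes rather than a decisive one.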

Per-task cost

Task | Gemini 2.5 Ultra | GPT-5 Turbo
Summarize a 1-hour meeting transcript | $0.115 | $0.102
Review a 500-line pull request | $0.098 | $0.096
Answer a customer support ticket | $0.038 | $0.037
Extract structured data from a resume | $0.056 | $0.054
Debug a stack trace with context | $0.084 | $0.084

Per-call cost, computed from the published token counts for each task at the list prices above. Real-world prompt lengths vary.
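Each per-task figure is input tokens times the input price plus output tokens times the output price, scaled per 1M tokens. The page does not list the token counts, so the 8,000 input and 2,000 output tokens in this sketch are hypothetical; they were chosen because they reproduce the pull-request row ($0.098 and $0.096). The sketch assumes both models see the same token counts.

```python
# List prices in USD per 1M tokens: (input, output), from the pricing table.
PRICES = {
    "Gemini 2.5 Ultra": (7.00, 21.00),
    "GPT-5 Turbo": (6.00, 24.00),
}


def per_call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call: tokens times the per-1M-token list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical token counts: 8,000 input and 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${per_call_cost(model, 8_000, 2_000):.3f}")
```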
