The leaderboard.
Every model we track, sortable by ELO, benchmarks, cost, and context. Live-seeded from public sources and updated as new scores land.
- 01 | ELO 1412 | 92.8 | 94.4 | 72.1 | $15.00 | $75.00 | 1M | Claude Opus 4.7 | Anthropic
Anthropic’s frontier reasoning model. Leads SWE-Bench and holds top ranks on long-context coding and agentic workflows.
- 02 | ELO 1398 | 91.2 | 93.0 | 65.8 | $6.00 | $24.00 | 400K | GPT-5 Turbo | OpenAI
OpenAI’s flagship unified model. Handles text, vision, and audio natively. The generalist benchmark champion.
- 03 | ELO 1385 | 90.4 | 90.2 | 58.3 | $7.00 | $21.00 | 2M | Gemini 2.5 Ultra | Google DeepMind
2M-token context, native video understanding, and Google’s deepest multimodal stack. The long-context king.
- 04 | ELO 1368 | 88.7 | 92.1 | 64.5 | $3.00 | $15.00 | 500K | Claude Sonnet 4.6 | Anthropic
The workhorse. Near-Opus quality at 1/5 the cost. The default choice for production code and agent workloads.
- 05 | ELO 1358 | 86.2 | 88.4 | — | $5.00 | $15.00 | 256K | Grok 4 | xAI
xAI’s latest. Real-time X search integration, strong on current events and meme-literate tasks.
- 06 | ELO 1342 | 87.1 | 89.3 | 52.4 | $2.50 | $8.00 | 256K | Llama 4 Behemoth | Meta | Open
Meta’s open-weights flagship. 405B params, fully open license, runs on every major inference provider.
- 07 | ELO 1334 | 85.4 | 91.5 | 54.1 | $0.14 | $0.28 | 128K | DeepSeek V4 | DeepSeek | Open
The shock-the-market moment of 2026. Frontier coding quality at roughly 1/100th the input price of Opus ($0.14 vs. $15.00).
- 08 | ELO 1318 | 83.5 | 88.0 | — | $2.00 | $6.00 | 256K | Mistral X-Large | Mistral | Open
European frontier model. EU-hosted inference, strong European language coverage, Apache-licensed weights.
- 09 | ELO 1312 | 82.1 | 85.8 | — | $0.30 | $1.20 | 1M | Gemini 2.5 Flash | Google DeepMind
Google’s price/performance darling. 1M context at $0.30/M input — nobody comes close on throughput-per-dollar.
- 10 | ELO 1305 | 81.4 | 87.0 | 49.2 | $0.80 | $4.00 | 200K | Claude Haiku 4.5 | Anthropic
Fast, cheap, and smart enough for most routing and extraction tasks. Great base model for subagents.
- 11 | ELO 1295 | 84.0 | 87.2 | — | $1.60 | $6.40 | 128K | Qwen 3 Max | Qwen | Open
Alibaba’s flagship. Strongest non-English coverage of any open model; dominant in Asian markets.
- 12 | ELO 1288 | 78.5 | 85.5 | — | $0.50 | $2.00 | 256K | GPT-5 Mini | OpenAI
GPT-5’s smaller, cheaper sibling. Optimized for chat and lightweight agent tasks.
- 13 | ELO 1265 | 77.2 | 82.4 | — | $0.35 | $0.80 | 128K | Llama 4 Scout | Meta | Open
The 70B open-weight workhorse. Fits on a single H100, beloved by self-hosters.
- 14 | ELO — | 93.8 | 92.3 | 68.0 | $15.00 | $60.00 | 200K | o4 | OpenAI
Deep-reasoning model. Spends more tokens thinking to crush math, science, and hard coding problems.
- 15 | ELO — | 87.8 | 90.1 | — | $0.55 | $2.19 | 128K | DeepSeek R2 | DeepSeek | Open
DeepSeek’s reasoning variant. Competes with o4 on math at a fraction of the cost.
- 16 | ELO — | 80.1 | 82.5 | — | $2.50 | $10.00 | 256K | Command A | Cohere
Cohere’s enterprise model. Built for RAG, tool use, and agentic workflows with strong citations.
- 17 | ELO — | 82.5 | 84.0 | — | $0.80 | $3.20 | 300K | Nova Pro | Amazon
Amazon’s frontier model. Available exclusively on Bedrock; strong enterprise integration story.
- 18 | ELO — | 78.5 | 79.8 | — | $3.00 | $12.00 | 128K | Inflection 3 | Inflection
Pi’s personality-tuned model. Famous for conversational warmth and emotional intelligence.
- 19 | ELO — | 79.2 | 78.5 | — | $1.00 | $5.00 | 128K | Sonar Large | Perplexity
Llama-tuned with a search-first system prompt. The canonical answer-with-citations model.
- 20 | ELO — | 74.0 | 80.5 | — | $0.10 | $0.40 | 128K | Phi-5 | Microsoft | Open
Microsoft’s small model champion. Punches well above its weight class; ideal for edge inference.
- 21 | ELO — | 76.8 | 78.0 | — | $0.40 | $1.00 | 128K | Reka Flash 3 | Reka
Pure multimodal research lab. Video and audio native at a fraction of Gemini Ultra’s price.
- 22 | ELO — | 75.5 | 77.0 | — | $0.50 | $1.50 | 256K | Jamba 2 | AI21 | Open
Hybrid Mamba + Transformer architecture. Linear cost scaling on long contexts, so the 256K window is actually fast.
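The leaderboard is described as sortable by ELO, cost, and context, and several blurbs quote price ratios. The sketch below shows one way such sorting and ratio arithmetic might be done over a few rows copied from the list above. It is not the site's code: `Row`, `parse_context`, `sort_by_elo`, and `input_price_ratio` are hypothetical names, and the first price column is treated as the input price per million tokens, which matches the "$0.30/M input" figure quoted for Gemini 2.5 Flash.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """One leaderboard entry (hypothetical structure, values copied from the list)."""
    name: str
    elo: Optional[int]   # None where the leaderboard shows a dash
    input_price: float   # first price column, assumed USD per million input tokens
    context: str         # context window as printed, e.g. "500K" or "1M"


ROWS = [
    Row("Claude Opus 4.7", 1412, 15.00, "1M"),
    Row("Claude Sonnet 4.6", 1368, 3.00, "500K"),
    Row("DeepSeek V4", 1334, 0.14, "128K"),
    Row("o4", None, 15.00, "200K"),
]


def parse_context(text: str) -> int:
    """Convert a context label such as '128K' or '2M' into a token count."""
    units = {"K": 1_000, "M": 1_000_000}
    return int(float(text[:-1]) * units[text[-1]])


def sort_by_elo(rows: List[Row]) -> List[Row]:
    """Highest ELO first; entries without an ELO go to the end."""
    return sorted(rows, key=lambda r: (r.elo is None, -(r.elo or 0)))


def sort_by_context(rows: List[Row]) -> List[Row]:
    """Largest context window first."""
    return sorted(rows, key=lambda r: parse_context(r.context), reverse=True)


def input_price_ratio(reference: Row, other: Row) -> float:
    """How many times more expensive `reference` is than `other` on input."""
    return reference.input_price / other.input_price


if __name__ == "__main__":
    for row in sort_by_elo(ROWS):
        print(row.name, row.elo)
    opus, sonnet = ROWS[0], ROWS[1]
    print(input_price_ratio(opus, sonnet))
```

With these rows, the Opus-to-Sonnet input price ratio is 15.00 / 3.00, in line with the "1/5 the cost" claim in the Sonnet blurb. Sorting on a tuple key keeps unrated models out of the ranked block without inventing a placeholder score for them.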