Inference Index
Release

Claude Opus 4.7 ships — 1M context, 72% SWE-Bench, new world record

Anthropic’s latest flagship arrives with a 1M-token context window and the highest SWE-Bench score ever recorded on a public model.

Anthropic shipped Claude Opus 4.7 on Monday, extending the context window to 1 million tokens and posting a 72.1% SWE-Bench Verified score — the highest ever for a publicly available model.

The release is notable for three reasons. First, the 1M context makes Opus the first flagship-tier model outside Google’s Gemini family with true seven-figure input support. Second, the SWE-Bench result pushes coding-agent quality past the psychologically important 70% mark; crossing that threshold matters more than the exact score because it reframes what "agent-ready" means for production teams. Third, pricing is unchanged at $15/$75 per million tokens, so Anthropic continues to hold the line on list price while improving the product.
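For a sense of what unchanged pricing means at the new context size, here is a rough back-of-the-envelope sketch. It assumes the $15/$75 figures are the usual input/output split per million tokens, which the announcement as quoted here does not spell out. The function name and the example request size are hypothetical.

```python
# Assumption: $15 per million input tokens, $75 per million output tokens.
INPUT_USD_PER_MTOK = 15.0
OUTPUT_USD_PER_MTOK = 75.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost of a single request in US dollars."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a prompt that fills the whole 1M-token window
# and a 4,000-token reply.
print(f"${estimate_cost_usd(1_000_000, 4_000):.2f}")
```

Under those assumptions, a single request that fills the full window costs about $15 in input alone, so long-context workloads are dominated by prompt size rather than reply length.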

We updated the leaderboard immediately. Early users of the Claude API report notable regressions on a handful of creative-writing prompts, but also stronger tool-use reliability and significantly better long-context recall.

Byline: Inference Index
