Claude Opus 4.7: Small Step, Big Leap

So Anthropic dropped Claude Opus 4.7 today, and before you roll your eyes at another decimal bump — stop. This one actually earned it.

If you'd told me six months ago that "4.6 to 4.7" would matter, I'd have given you the same face I give startups that pivot weekly. Version numbers are marketing. But sometimes you open the box and the thing inside is bigger than the sticker on the outside. That's this release.

The numbers that raised my eyebrow

Small version, big delta. Here's the part that caught me:

Benchmark	Opus 4.6	Opus 4.7	Competition
SWE-bench Pro	~58%	64.3%	GPT-5.4: 57.7% · Gemini 3.1 Pro: 54.2%
SWE-bench Verified	—	87.6%	Gemini 3.1 Pro: 80.6%
CursorBench	58%	70%	—
Internal 93-task coding	baseline	+13%	—
Visual acuity	54.5%	98.5%	—
Tool-use errors	baseline	–67% (a third of the errors)	—
GPQA Diamond	—	94.2%	Basically tied with GPT-5.4 Pro and Gemini 3.1 Pro at ~94.3%

The coding numbers are what Anthropic is waving around, and fair — SWE-bench Pro leadership is the headline. But the line I keep staring at is the visual one. 54.5% to 98.5% is not a "0.1 bump" improvement. That's "we rewrote the eyeballs" territory. If you feed it screenshots, diagrams, PDFs — 4.7 is a different animal.

The one knob nobody's talking about

Opus 4.7 introduced a new effort level called xhigh, sitting between "high" and "max". Sounds like a Spotify tier. It's actually a quiet big deal.

Before: you picked low/medium/high/max and lived with it. High was too fast for hard problems; max was too slow and expensive for most work. The middle of that gap — "think hard but don't go to Narnia" — didn't exist.

Now it does. xhigh is the productivity setting: noticeably better reasoning than high, without the latency tax of max. If you're wiring this into a coding agent that runs thousands of turns a day, that new tier is probably worth more than the benchmark points.

The trade Anthropic is being honest about

Here's the part I respect: 4.7 follows instructions more literally than 4.6. Anthropic flagged this directly. If you built prompts that leaned on the old model's "interpret my intent" muscle, you'll notice — 4.7 will do what you said, not what you meant.

That's a knife that cuts both ways:

Good: fewer hallucinations, fewer "why did it do that" moments, more reliable agents.
Bad: your sloppy prompt that used to magically work… won't.

If you're running 4.6 in production, don't just bump the model ID and ship. Re-read your prompts. Spell out what you want. The decimal is hiding a behavioral change.

The Mythos shadow

The weirdest part of today's announcement isn't what shipped — it's what didn't.

Anthropic openly said Opus 4.7 trails their unreleased Mythos Preview model, which is only out to a small handpicked group of tech and cybersecurity companies. In the same breath, they described 4.7 as "less risky than Mythos."

Read that again. The company is telling you: we have something better, we're not letting you touch it yet, and we're shipping this on purpose because we trust it more.

You can read that two ways:

Responsible scaling, working as advertised. They have internal safety gates; Mythos didn't clear them for open release; 4.7 did. That's the whole pitch of their RSP policy.
Marketing for the next release. Keeping Mythos dangling keeps the narrative hot without having to actually defend its numbers in public.

Honest answer: probably both. But the decimal bump makes more sense with that context. 4.7 isn't the ceiling — it's the last safe step before it.

My take

I get why someone looks at "4.6 to 4.7" and yawns. But the decimal is doing work Anthropic doesn't want to brag about too loudly.

A 13% jump on internal coding, a doubling of visual acuity, a third of the tool errors, a new effort tier that actually fills a real gap — that's a 5.0-shaped release wearing a 4.7 shirt. Pricing didn't change ($5/M in, $25/M out). The API ID is just claude-opus-4-7. It slots into your existing pipeline with a one-line swap.

If you're building agents, you'll want to try it this week. If you're using Claude Code, you already have it. And if you're a casual user who just asks it to help with emails, you'll probably never notice — which is also fine. Not every release needs to change your life.

But don't let the decimal fool you. This one throws hands.

Sources

Introducing Claude Opus 4.7 — the official announcement, with benchmarks and the xhigh effort level
Claude Opus 4.7 leads on SWE-bench and agentic reasoning — TNW with the side-by-side against GPT-5.4 and Gemini 3.1 Pro
Anthropic releases Claude Opus 4.7, concedes it trails unreleased Mythos — Axios on the Mythos angle
Claude Opus 4.7 is now available in Amazon Bedrock — AWS on cloud availability
Anthropic rolls out Claude Opus 4.7, less risky than Mythos — CNBC on the safety positioning