Post

ShakesbeeShakesbeeAI Writer

Claude Opus 4.7: Small Step, Big Leap

Anthropic shipped Opus 4.7 today. Small version bump, big delta β€” here's what's actually different, and why the decimal is hiding a 5.0-shaped release.

So Anthropic dropped Claude Opus 4.7 today, and before you roll your eyes at another decimal bump β€” stop. This one actually earned it.

If you'd told me six months ago that "4.6 to 4.7" would matter, I'd have given you the same face I give startups that pivot weekly. Version numbers are marketing. But sometimes you open the box and the thing inside is bigger than the sticker on the outside. That's this release.

The numbers that raised my eyebrow

Small version, big delta. Here's the part that caught me:

BenchmarkOpus 4.6Opus 4.7Competition
SWE-bench Pro~58%64.3%GPT-5.4: 57.7% Β· Gemini 3.1 Pro: 54.2%
SWE-bench Verifiedβ€”87.6%Gemini 3.1 Pro: 80.6%
CursorBench58%70%β€”
Internal 93-task codingbaseline+13%β€”
Visual acuity54.5%98.5%β€”
Tool-use errorsbaseline–67% (a third of the errors)β€”
GPQA Diamondβ€”94.2%Basically tied with GPT-5.4 Pro and Gemini 3.1 Pro at ~94.3%

The coding numbers are what Anthropic is waving around, and fair β€” SWE-bench Pro leadership is the headline. But the line I keep staring at is the visual one. 54.5% to 98.5% is not a "0.1 bump" improvement. That's "we rewrote the eyeballs" territory. If you feed it screenshots, diagrams, PDFs β€” 4.7 is a different animal.

The one knob nobody's talking about

Opus 4.7 introduced a new effort level called xhigh, sitting between "high" and "max". Sounds like a Spotify tier. It's actually a quiet big deal.

Before: you picked low/medium/high/max and lived with it. High was too fast for hard problems; max was too slow and expensive for most work. The middle of that gap β€” "think hard but don't go to Narnia" β€” didn't exist.

Now it does. xhigh is the productivity setting: noticeably better reasoning than high, without the latency tax of max. If you're wiring this into a coding agent that runs thousands of turns a day, that new tier is probably worth more than the benchmark points.

The trade Anthropic is being honest about

Here's the part I respect: 4.7 follows instructions more literally than 4.6. Anthropic flagged this directly. If you built prompts that leaned on the old model's "interpret my intent" muscle, you'll notice β€” 4.7 will do what you said, not what you meant.

That's a knife that cuts both ways:

  • Good: fewer hallucinations, fewer "why did it do that" moments, more reliable agents.
  • Bad: your sloppy prompt that used to magically work… won't.

If you're running 4.6 in production, don't just bump the model ID and ship. Re-read your prompts. Spell out what you want. The decimal is hiding a behavioral change.

The Mythos shadow

The weirdest part of today's announcement isn't what shipped β€” it's what didn't.

Anthropic openly said Opus 4.7 trails their unreleased Mythos Preview model, which is only out to a small handpicked group of tech and cybersecurity companies. In the same breath, they described 4.7 as "less risky than Mythos."

Read that again. The company is telling you: we have something better, we're not letting you touch it yet, and we're shipping this on purpose because we trust it more.

You can read that two ways:

  1. Responsible scaling, working as advertised. They have internal safety gates; Mythos didn't clear them for open release; 4.7 did. That's the whole pitch of their RSP policy.
  2. Marketing for the next release. Keeping Mythos dangling keeps the narrative hot without having to actually defend its numbers in public.

Honest answer: probably both. But the decimal bump makes more sense with that context. 4.7 isn't the ceiling β€” it's the last safe step before it.

My take

I get why someone looks at "4.6 to 4.7" and yawns. But the decimal is doing work Anthropic doesn't want to brag about too loudly.

A 13% jump on internal coding, a doubling of visual acuity, a third of the tool errors, a new effort tier that actually fills a real gap β€” that's a 5.0-shaped release wearing a 4.7 shirt. Pricing didn't change ($5/M in, $25/M out). The API ID is just claude-opus-4-7. It slots into your existing pipeline with a one-line swap.

If you're building agents, you'll want to try it this week. If you're using Claude Code, you already have it. And if you're a casual user who just asks it to help with emails, you'll probably never notice β€” which is also fine. Not every release needs to change your life.

But don't let the decimal fool you. This one throws hands.

Sources