What Is Claude Sonnet 5?
Sonnet 5 is Anthropic's newest mid-tier model, pitched as its most agentic Sonnet so far: it plans, drives browsers and terminals, and runs multi-step tasks on its own. The framing Anthropic leads with is a cost-performance one — performance close to Opus 4.8, at Sonnet prices.
Why that framing matters before we get to mechanics: for most of the last year, the biggest agentic gains landed in the Opus tier, not Sonnet. So teams running high-volume agent work were stuck paying Opus rates for capability they mostly needed some of the time. Sonnet 5 is Anthropic's attempt to pull a chunk of that capability down a price tier. That's the "why," and it explains most of the design choices that follow.
The mechanics, from Anthropic's docs: it's a drop-in replacement for Sonnet 4.6, uses the API model ID claude-sonnet-5, ships with a 1M-token context window as both the default and the maximum, and supports up to 128k output tokens. Text and image in, text out.
| |
Claude Sonnet 5 |
| Released |
June 30, 2026 |
| API model ID |
claude-sonnet-5 |
| Context window |
1M tokens (default and max) |
| Max output |
128k tokens |
| Input / output |
Text + image in, text out |
| Relationship to 4.6 |
Drop-in replacement |
What's New vs Sonnet 4.6
If you already run Sonnet 4.6, the upgrade isn't only a weights bump. Three behavior changes will touch your code, per Anthropic's docs:
- Adaptive thinking is on by default.
- Manual extended thinking now returns a 400 error (it was already deprecated on 4.6).
- Setting sampling parameters — temperature, top_p, top_k — to non-default values returns a 400 error.
Two more changes matter operationally. Sonnet 5 exposes effort levels (low, medium, high, and xhigh), where higher effort spends more tokens on reasoning to buy more quality — and more cost. And it's the first Sonnet-tier model with real-time cyber safeguards enabled by default. Those safeguards have a quirk worth knowing before you wire this into an agent: when the model refuses a flagged request, the refusal comes back as a successful HTTP 200 with stop_reason: "refusal", not as an error. If your error handling assumes a refusal throws, it won't.
The change with the widest blast radius, though, is one nobody puts in a headline: the tokenizer. I'm giving it its own section because it changes the pricing math.
Key Features and Performance
Anthropic positions Sonnet 5 as a strict improvement over 4.6 across reasoning, tool use, coding, and knowledge work, and as close enough to Opus 4.8 that you can pick your point on the cost-performance curve with the effort dial. The specific benchmark figures below are as reported from Anthropic's launch materials and day-one coverage; treat the exact numbers as Anthropic-reported until you've read them in the System Card yourself.
| Benchmark |
Sonnet 5 |
Sonnet 4.6 |
Opus 4.8 |
| SWE-bench Pro (agentic coding) |
63.2% |
58.1% |
69.2% |
| OSWorld-Verified (computer use) |
81.2% |
78.5% |
— |
| Terminal-Bench 2.1 |
80.4% |
67.0% |
— |
| Humanity's Last Exam (with tools) |
57.4% |
46.8% |
57.9% |
| GDPval-AA v2 (knowledge work) |
1,618 |
— |
1,615 |
Figures as reported from Anthropic's June 30, 2026 launch; dashes are cells I couldn't confirm.
Read that table honestly and two things stand out. On coding, Sonnet 5 clears 4.6 by a real margin but still sits below Opus 4.8 — it narrows the gap, it doesn't close it. On knowledge work, it edges Opus 4.8 by a hair (1,618 vs 1,615), which is the source of the "sometimes beats Opus" line you'll see repeated everywhere; it's true, but it's a hair, on one benchmark.
Here's the cost the highlight reel skips. Anthropic's own charts show Sonnet 5's best value lands at low and medium effort. Push it to xhigh, and reporting from launch day says the cost can climb past Opus 4.8 for similar quality. So "cheaper than Opus" is a claim with an asterisk: cheaper at the effort levels where you'd actually reach for a Sonnet, not necessarily when you crank it to the top. My take: if a task genuinely needs xhigh reasoning, price out Opus 4.8 before defaulting to Sonnet 5 — the intuition that the smaller model is always the cheaper call breaks down at the top of the dial.
On safety, Anthropic reports lower rates of hallucination, sycophancy, and undesirable behavior than 4.6, plus better resistance to malicious requests and prompt-injection hijacks. The caveat they publish themselves: Sonnet 5 still shows a higher rate of misaligned behavior than Opus 4.8 on their internal audit. And on cyber, it never produced a full working exploit in testing — deliberately low capability there, which is a feature if you're shipping a customer-facing agent and a limitation if sanctioned security work is your use case.
Claude Sonnet 5 Pricing: Is It Actually Cheaper?
The sticker prices — Anthropic direct, plus GPT Proto's current rate:
| |
Input / 1M |
Output / 1M |
| Sonnet 5 — introductory (through Aug 31, 2026) |
$2 |
$10 |
| Sonnet 5 — standard (from Sep 1, 2026) |
$3 |
$15 |
| Sonnet 5 — via GPT Proto (current) |
$1.6 |
$8 |
| Sonnet 4.6 |
$3 |
$15 |
| Opus 4.8 |
$5 |
$25 |
At standard pricing, Sonnet 5 costs the same per token as 4.6 and comes in well under Opus 4.8. Straightforward. Except the per-token price isn't the whole bill.
Sonnet 5 ships with a new tokenizer — the same one Anthropic introduced with Opus 4.7 — and the same text now maps to roughly 1.0 to 1.35× more tokens depending on content type. Anthropic is upfront about this in a footnote, and about the consequence: the introductory pricing is set so the move from 4.6 is roughly cost-neutral during the intro window. Read that carefully. It means the $2/$10 launch price isn't pure discount; part of it is offsetting the extra tokens. And it means that on September 1, when the price returns to $3/$15 — identical to 4.6 on paper — an equivalent request can cost you more than 4.6 did, because it's billing more tokens for the same text.
So, is Sonnet 5 cheaper than 4.6? The honest answer is: it depends on when and how hard you run it. Through August, effectively yes. After that, for the same workload, plan for a per-request cost that's flat to modestly higher, not lower — and measure your own prompts under the new tokenizer instead of trusting the per-token line.
If you call Claude through GPT Proto rather than direct, Sonnet 5 is live on the platform at $1.6 / 1M input and $8 / 1M output — 20% under Anthropic's $2/$10 introductory rate, which makes it the cheaper path while the launch pricing lasts. For context on the same platform, Sonnet 4.6 runs $2.4/$12 and Opus 4.8 $4/$20. The runnable calls in the next section point straight at Sonnet 5.
How to Use Claude Sonnet 5 via API
Sonnet 5 is now live on GPT Proto, so the runnable examples below call it directly on the Claude Sonnet 5 model page. If you're migrating from 4.6, this is the drop-in swap Anthropic advertises — change claude-sonnet-4-6 to claude-sonnet-5 and leave the rest of the call untouched.
GPT Proto exposes Claude in Anthropic's native Messages format at https://gptproto.com/v1/messages, with a Bearer token. cURL first:
curl --request POST "https://gptproto.com/v1/messages" \
--header "Authorization: Bearer $GPTPROTO_API_KEY" \
--header "Content-Type: application/json" \
--header "anthropic-version: 2023-06-01" \
--data '{
"model": "claude-sonnet-5",
"max_tokens": 1024,
"messages": [
{ "role": "user", "content": "Explain what an agentic coding model does in two sentences." }
]
}'
The same call in Python, using nothing but requests:
import os
import requests
resp = requests.post(
"https://gptproto.com/v1/messages",
headers={
"Authorization": f"Bearer {os.environ['GPTPROTO_API_KEY']}",
"Content-Type": "application/json",
"anthropic-version": "2023-06-01",
},
json={
"model": "claude-sonnet-5",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Explain what an agentic coding model does in two sentences."}
],
},
timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["content"][0]["text"])
Two things to carry over when you migrate to Sonnet 5. First, the tokenizer change means your token counts — and therefore your bill and your max_tokens headroom — shift even if the text doesn't; re-measure before you assume your existing limits still fit. Second, if any of your current code sets temperature, top_p, or top_k to custom values, or turns on manual extended thinking, strip those out: on Sonnet 5 they return a 400.
One key on GPT Proto reaches Claude plus 200+ other models on a single balance, so you can benchmark Sonnet 5 against, say, a cheaper model on the same request path without juggling providers. The full list is on the model catalog.
Sonnet 5 vs Sonnet 4.6 (and if you're still on 4.5 or 3.5)
Here's the head-to-head I'd actually use to decide, with Opus 4.8 kept in as the reference ceiling:
| |
Sonnet 5 |
Sonnet 4.6 |
Opus 4.8 |
| Positioning |
Near-Opus agentic work at Sonnet price |
Prior mid-tier standard |
Accuracy ceiling |
| Coding (SWE-bench Pro) |
63.2% |
58.1% |
69.2% |
| Best value at |
Low / medium effort |
— |
Hardest tasks |
| Standard price (in/out per 1M) |
$3 / $15 |
$3 / $15 |
$5 / $25 |
| Context |
1M |
up to 1M |
200k |
| Cost caveat |
New tokenizer bills more tokens; xhigh can exceed Opus cost |
Familiar tokenizer |
Priced for the top end |
My read: if you're on 4.6 and doing sustained agentic or coding work, Sonnet 5 is the upgrade to run — the coding and tool-use gains are real, and the intro pricing makes the trial cheap. If your workload is accuracy-critical at the very top — the kind of task where a wrong answer is expensive — keep Opus 4.8 in the loop and reserve Sonnet 5 for the high-volume middle. And if you find yourself reaching for xhigh effort constantly, that's a signal to price Opus 4.8, not a reason to assume Sonnet is cheaper.
Still on Sonnet 4.5 or 3.5? Then the upgrade case is even stronger, but I'd stop short of quoting you a clean "5 vs 3.5" benchmark, because Anthropic didn't publish one — every head-to-head figure above is against 4.6. What I can say is directional: two generations of agentic gains sit between 3.5 and 5, most of them in exactly the tool-use and long-task reliability that older Sonnets stopped short on. The practical move is the same regardless of where you're starting: point your code at Sonnet 5 on GPT Proto and let the effort dial do the rest.
Is It Worth It? An Honest Read
Day-one reception was split, and the split is informative. The praise clustered on price-to-value at the intro rate. The doubt clustered on the same question this guide keeps circling: whether it holds up once standard $3/$15 pricing returns and the tokenizer's token inflation is doing its quiet work.
The most useful signal I saw wasn't a benchmark — it was a hands-on review from a code-review tooling team that ran it on real work. Their finding cuts both ways, which is why I trust it: for writing and building code, Sonnet 5 was the strongest model they'd used at this tier. For reviewing code, it was a trade-off — cleaner, sharper comments, but it caught fewer bugs than the models they already run in production, at a slightly higher cost per review. They also noticed two habits that show up as productivity in agent loops: it tends to write tests before the feature, and it rewrites its own plan partway through a long task instead of marching a stale plan off a cliff. The catch they flagged matches Anthropic's own note — the new cyber safeguards will occasionally trip on legitimate security work.
Worth remembering: the glowing quotes in the launch post come from Anthropic's early-access partners. They're real, but they're curated, and a vendor picking its own testimonials isn't neutral evidence. The independent hands-on take — strong at generation, weaker and pricier at review — is the more honest baseline.
So who should switch? Teams doing high-volume coding, tool use, browser and terminal automation, or knowledge work where you were overpaying for Opus — switch, and do it during the intro window while the trial is cheap. Teams whose core loop is careful review or accuracy-critical decisions — test it, but don't retire your current model on launch day, and re-run your own eval suite before you commit.
Bottom line: Sonnet 5 is a real step up for agentic and coding work, and the intro pricing makes it cheap to try — just don't read "same list price as 4.6" as "same bill," and price Opus 4.8 before you crank the effort dial. You can call Claude Sonnet 5 on GPT Proto today at $1.6 / $8 — 20% under Anthropic's launch rate — on one balance and one API key, across 200+ models if you want to route cost-sensitive calls elsewhere.