TL;DR: Claude Opus 4.8 (claude-opus-4-8) shipped on 2026-05-28 as Anthropic's most capable generally available model — same price as Opus 4.7 ($5 / $25 per million tokens), a 1M-token context window, and a new Fast Mode that is roughly three times cheaper than before. This guide is the builder's version: how to actually call claude-opus-4-8 from the API, what changed for the effort parameter and the Messages format, how to control cost with caching and batch, how to migrate cleanly from Opus 4.7, and what to do when the API returns 451: unsupported region.
Table of contents
What's actually new for builders
Anthropic frames Opus 4.8 as "a modest but tangible improvement on its predecessor." For most readers that line is a relief, not a disappointment: the price did not move, the context window did not shrink, and your existing Opus 4.7 code keeps working. What changed is worth knowing before you upgrade:
Reliability over raw IQ. Anthropic reports Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked." In practice that means the model is more willing to say it is unsure and to catch its own bugs instead of declaring victory early — the single most useful change for unattended agentic work.
Fast Mode got cheap. Running Opus 4.8 in Fast Mode (≈2.5× token throughput) costs $10 / $50 per million input/output tokens, described by Anthropic as "three times cheaper than it was for previous models."
Dynamic Workflows (research preview). Claude can now dispatch hundreds of parallel subagents in a single session for codebase-scale jobs.
Computer use jumped. Anthropic reports an 84% score on Online-Mind2Web, a meaningful step over both Opus 4.7 and GPT-5.5 on browser-agent tasks.
Effort control changed defaults (see below) — this is the one behavioral change most likely to surprise you on upgrade.
For the full benchmark table and methodology, read the official announcement and the system card. We deliberately do not reprint every number here — what follows is how to use the model.
The model ID and your first call
The model ID is `claude-opus-4-8`. Like every model since the 4.6 generation, it is a dateless but pinned snapshot — calling claude-opus-4-8 always resolves to this exact release, not an evergreen pointer.
A minimal Messages API call against Anthropic directly:
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{
"model": "claude-opus-4-8",
"max_tokens": 2048,
"messages": [
{ "role": "user", "content": "Refactor this function and explain the risk you are least sure about." }
]
}'Python, using the official SDK:
from anthropic import Anthropic
client = Anthropic() # reads ANTHROPIC_API_KEY
resp = client.messages.create(
model="claude-opus-4-8",
max_tokens=2048,
messages=[
{"role": "user", "content": "Review this migration plan and push back if it is unsound."}
],
)
print(resp.content[0].text)TypeScript:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY
const resp = await client.messages.create({
model: "claude-opus-4-8",
max_tokens: 2048,
messages: [
{ role: "user", content: "Find the bug you are most uncertain about and say why." },
],
});
console.log(resp.content[0].type === "text" ? resp.content[0].text : "");Key limits to keep in mind:
Property | Value |
|---|---|
Model ID |
|
Context window | 1M tokens (~555k words) |
Max output (Messages API) | 128k tokens |
Max output (Batch API beta) | 300k tokens via |
Knowledge cutoff | January 2026 |
Pricing and cost control
Opus 4.8 is priced identically to Opus 4.7:
Mode | Input / MTok | Output / MTok |
|---|---|---|
Standard | $5 | $25 |
Fast Mode (≈2.5× speed) | $10 | $50 |
US-only inference | 1.1× standard | 1.1× standard |
Three levers bring the real bill down:
Prompt caching — up to 90% savings on repeated context (system prompts, large documents, code you reference every call). Cache your stable prefix and pay full rate only on the delta.
Batch processing — 50% savings for non-interactive workloads via the Message Batches API. Pair it with the 300k-output beta header for long-form generation jobs.
Pick the right effort level —
highis powerful but not free. Drop to a lower effort for routine tasks (see below).
If your usage is bursty or you cannot hold an Anthropic billing relationship in your region, a gateway that bills per token with no card-on-file requirement can be cheaper in practice than a minimum-commit enterprise plan. CodeGateway exposes claude-opus-4-8 at a transparent multiplier over Anthropic's list price and gives every new account $2 of free credit to test before committing.The effort parameter now defaults to high
This is the change most likely to bite you on upgrade. On Opus 4.8 the `effort` parameter defaults to `high` on every surface — the Claude API, Claude Code, and claude.ai. Higher effort means more thinking tokens and better answers on hard problems, but also more latency and cost on easy ones.
Available levels include high (default), extra (called xhigh in Claude Code), and max. Set it explicitly when you do not want the default:
resp = client.messages.create(
model="claude-opus-4-8",
max_tokens=1024,
messages=[{"role": "user", "content": "Rename this variable for clarity."}],
# Trivial task — don't pay for max-depth reasoning.
extra_body={"effort": "low"},
)Rule of thumb: leave high for architecture, debugging, and multi-file agentic work; dial it down for boilerplate, formatting, and single-shot edits. If your post-upgrade bill jumped, an unset effort defaulting to high is the first thing to check.
Dynamic Workflows — hundreds of parallel subagents
Dynamic Workflows (a research preview) is the headline platform feature. Instead of one agent working linearly, Claude plans the work, dispatches many subagents that attack the problem from independent angles, has other agents try to refute what they found, and iterates until the answers converge. Anthropic positions it for "codebase-scale migrations across hundreds of thousands of lines of code."
The proof-of-concept making the rounds is a full port of a 750,000-line Rust codebase with 99.8% of the test suite passing in 11 days — credible as a demonstration, not yet a production guarantee. Treat Dynamic Workflows as what it is labeled: a preview to pilot on a non-critical migration first, with a human reviewing the convergence, not a fire-and-forget button.
Migrating from Opus 4.7 to 4.8
For most code, migration is a one-line model-ID swap:
- "model": "claude-opus-4-7",
+ "model": "claude-opus-4-8",Three things to check beyond the string:
The `effort` default. Opus 4.7 and 4.8 both expose
effort, but if you relied on a particular implicit behavior, pin it explicitly now.highis the default on 4.8.Messages API now accepts `system` entries inside the `messages` array. You can keep using the top-level
systemparameter, but inline system turns are now valid — useful for multi-stage prompts.Re-baseline your cost dashboards. Same list price, but the
effortdefault and any move to Fast Mode will shift your per-request economics. Watch the first day of traffic.
Your prompt-caching keys, tool definitions, and streaming code all carry over unchanged. Anthropic's migration guide covers edge cases if you are coming from Opus 4.6 or older.
When you get "451: unsupported region"
If a direct call to api.anthropic.com returns HTTP 451 or a connection that simply hangs, the Anthropic API is not reachable from where your code runs. This is an availability problem, not a billing one — your key is fine; the endpoint is not serving your network.
A gateway solves this by terminating your request at a reachable edge and forwarding it upstream. With CodeGateway you change two things — the base URL and the key — and keep the rest of your Anthropic SDK code identical:
from anthropic import Anthropic
client = Anthropic(
base_url="https://api.codegateway.dev",
api_key="YOUR_CODEGATEWAY_KEY",
)
resp = client.messages.create(
model="claude-opus-4-8",
max_tokens=2048,
messages=[{"role": "user", "content": "Ship it, but tell me what you are unsure about."}],
)
print(resp.content[0].text)Because the Messages API contract is unchanged, your tool calls, streaming, prompt caching, and effort settings all behave the same — only the transport changes. New accounts get $2 of free credit, so you can confirm a working claude-opus-4-8 round-trip before adding any funds.
Opus 4.8 vs 4.7 vs GPT-5.5 — the honest version
The fair summary, based on Anthropic's own framing and the first wave of independent testing:
Agentic coding & reliability: Opus 4.8 is the strongest of the three for long-horizon, multi-step coding where not hallucinating a "done" is what matters. The 4× honesty improvement is the real story.
Computer use / browser agents: Opus 4.8 leads, at a reported 84% on Online-Mind2Web.
Raw terminal coding: GPT-5.5 still edges ahead — Anthropic itself notes GPT-5.5's Codex CLI scores 83.4% on Terminal-Bench 2.1.
Cost: Identical Anthropic list price to Opus 4.7; the Fast Mode discount changes the economics for high-throughput workloads.
Coding & agentic benchmarks: Opus 4.8 vs Opus 4.7
The numbers below are self-reported by Anthropic in the Opus 4.8 launch announcement and system card. The pattern is consistent with "modest but tangible": real gains on the harder agentic suites, a slight regression on one knowledge benchmark.
Benchmark | Opus 4.8 | Opus 4.7 | Δ |
|---|---|---|---|
SWE-bench Verified | 88.6% | 87.6% | +1.0 |
SWE-bench Pro | 69.2% | 64.3% | +4.9 |
Terminal-Bench 2.1 | 74.6% | 69.4%¹ | n/a¹ |
MCP-Atlas | 82.2% | 77.3% | +4.9 |
BrowseComp (single-agent) | 84.3% | 79.3% | +5.0 |
GPQA Diamond | 93.6% | 94.2% | −0.6 |
¹ Opus 4.7's 69.4% was measured on Terminal-Bench v2.0, so it is not directly comparable to 4.8's v2.1 score. For cross-model context, Anthropic notes GPT-5.5 scores 83.4% on Terminal-Bench 2.1 with the Codex CLI harness — still the leader on raw terminal coding.
The standout is SWE-bench Pro: the harder, real-world agentic-coding benchmark moved nearly five points. GPQA Diamond dipping slightly (−0.6) is the honest counterweight — this is a coding-and-reliability release, not an across-the-board knowledge jump.
If you are already on Claude in production, the upgrade is an easy call: same price, better judgment, no migration tax. If you are choosing between ecosystems, the decision is less about a single benchmark and more about whether your work is agentic-coding-and-reliability shaped (Opus 4.8) or raw-terminal-throughput shaped (GPT-5.5). For independent benchmark roundups, see coverage from VentureBeat and TechCrunch.
FAQ
Q: What is the Claude Opus 4.8 model ID? A: claude-opus-4-8. It is a pinned snapshot, so the ID always resolves to this specific release.
Q: How much does Claude Opus 4.8 cost? A: $5 per million input tokens and $25 per million output tokens at standard speed — the same as Opus 4.7. Fast Mode is $10 / $50. US-only inference is 1.1× standard.
Q: Is Opus 4.8 worth upgrading from 4.7? A: Yes for most teams — same price, meaningfully better reliability and judgment, and migration is usually a one-line model-ID change. Pin your effort level explicitly and re-baseline cost dashboards after the switch.
Q: What is the context window? A: 1M tokens, with up to 128k tokens of output on the Messages API (300k via the Batch API beta header).
Q: How do I call Opus 4.8 if `api.anthropic.com` is unreachable from my region? A: Route through a gateway. Point your Anthropic SDK's base_url at https://api.codegateway.dev, use a CodeGateway key, and keep the rest of your code unchanged. New accounts get $2 of free credit to test.
Q: What are Dynamic Workflows? A: A research-preview feature where Claude dispatches hundreds of parallel subagents, has them refute each other's findings, and iterates until they converge — built for very large migrations. Pilot it on non-critical work first.
