Claude Sonnet 5 keeps last year's price and narrows the gap to Opus. That changes the math on your in-house bid-leveling agent.
Anthropic's new Sonnet 5 model, released June 30, holds the same standard price as the model it replaces while narrowing the agentic-performance gap to the flagship Opus tier. For estimators pricing out an internal document-review agent, that changes the math.
An estimating lead who has priced out an internal agent for bid leveling (reading a dozen sub quotes against the base scope, flagging exclusions, catching mismatched allowances before the buyout meeting) has been running the same math for the past year: the model good enough to trust with that judgment call costs more per package than the review is worth automating. Anthropic's newest model doesn't blow that math up. It quietly moves it, and how it moves it matters more than the headline suggests.
What actually shipped
On June 30, Anthropic released Claude Sonnet 5, positioned as its mid-tier model tuned specifically for agentic work: planning multi-step tasks, using tools such as browsers and terminals, and running with less supervision than a chat-style model needs. On SWE-bench Pro, a benchmark of long, multi-step software-engineering tasks, Sonnet 5 scores 63.2%, up from Sonnet 4.6's 58.1%, against 69.2% for the flagship Opus 4.8.
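A quick check on those three scores shows how much of the gap actually closed. This is plain arithmetic on the published SWE-bench Pro numbers only; it says nothing about how either model performs on bid documents.

```python
# SWE-bench Pro scores quoted above (percent of tasks solved).
SONNET_4_6 = 58.1
SONNET_5 = 63.2
OPUS_4_8 = 69.2


def gap_closed(old: float, new: float, target: float) -> float:
    """Fraction of the old-to-target point gap that the new score covers."""
    return (new - old) / (target - old)


old_gap = OPUS_4_8 - SONNET_4_6  # points between Sonnet 4.6 and Opus 4.8
new_gap = OPUS_4_8 - SONNET_5    # points between Sonnet 5 and Opus 4.8
share = gap_closed(SONNET_4_6, SONNET_5, OPUS_4_8)

print(f"Gap before: {old_gap:.1f} pts, gap now: {new_gap:.1f} pts")
print(f"Share of gap closed: {share:.0%}")
print(f"Sonnet 5 score as a share of Opus 4.8's: {SONNET_5 / OPUS_4_8:.0%}")
```

On this one benchmark, Sonnet 5 closes a little under half of the point gap while landing at roughly nine-tenths of the Opus score. Which framing matters depends on the task, another reason to test on your own packages.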
The pricing is the part worth sitting with. Sonnet 5 launches at an introductory rate of $2 per million input tokens and $10 per million output tokens through August 31. After that, it rises to $3 and $15, exactly what Sonnet 4.6 already charged. One caveat: Sonnet 5 uses a new tokenizer, and the same text can map to up to 1.35 times as many tokens as before. Anthropic says the introductory pricing was set to keep that switch roughly cost-neutral, but it means the sticker price and your actual bill won't move in lockstep: at the standard rates, a document that tokenizes larger costs proportionally more to process. Test against your own documents rather than assuming the percentages translate directly.
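To see why the tokenizer caveat matters, here is a rough per-package cost sketch. The token counts are placeholders, not measurements of any real bid package, and 1.35 is the worst-case multiplier cited above; real documents may land anywhere between 1.0 and that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD."""
    input_per_m: float
    output_per_m: float


SONNET_4_6 = Rates(3.00, 15.00)
SONNET_5_INTRO = Rates(2.00, 10.00)     # through August 31
SONNET_5_STANDARD = Rates(3.00, 15.00)  # after August 31


def package_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 tokenizer_multiplier: float = 1.0) -> float:
    """USD cost for one package, scaling old-tokenizer counts by the multiplier."""
    tin = input_tokens * tokenizer_multiplier
    tout = output_tokens * tokenizer_multiplier
    return (tin * rates.input_per_m + tout * rates.output_per_m) / 1_000_000


# Hypothetical package, counted on the old tokenizer:
# a dozen sub quotes plus a spec section in, a leveling summary out.
IN_TOKENS, OUT_TOKENS = 200_000, 10_000

baseline = package_cost(SONNET_4_6, IN_TOKENS, OUT_TOKENS)
for label, rates in [("Sonnet 5 intro", SONNET_5_INTRO),
                     ("Sonnet 5 standard", SONNET_5_STANDARD)]:
    worst = package_cost(rates, IN_TOKENS, OUT_TOKENS, tokenizer_multiplier=1.35)
    print(f"{label}: ${worst:.2f} worst case vs ${baseline:.2f} on Sonnet 4.6")
```

Because the multiplier scales input and output alike, the ratios don't depend on the placeholder counts: at worst-case inflation, the intro rate costs about 90% of what Sonnet 4.6 did for the same text, while the standard rate costs up to 135%. Swap in token counts from your own packages to see where you actually land.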
Why flat pricing is the real story
Model launches usually come with a price increase to match a capability jump. This one didn't. The standard rate for meaningfully better agentic performance is the same rate you were already budgeting for the model it replaces. That's a different signal than "AI got cheaper" — it's "the tier you already priced into your tool now does more."
What that changes for a bid-leveling agent
Comparing sub quotes against a spec section isn't a single question-and-answer exchange. It means reading several long documents, cross-referencing line items, flagging what's missing, and producing a structured output a PE or estimator can act on. That's the multi-step, tool-using category Sonnet 5 was built for, not a one-shot summary task. Previously, getting that kind of judgment-heavy volume work to a trustworthy accuracy level often meant paying Opus-tier rates for Opus-tier reliability. A model that narrows that gap at the rate you had already budgeted changes which workflows clear the build-vs-buy bar, and it puts the build within reach of mid-size precon teams, not only large firms that can spread premium API costs across dozens of jobs.
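The cross-referencing step can be pictured as a comparison between the base scope and each sub's quote. The sketch below is a deterministic stand-in, not how Sonnet 5 or any vendor tool works internally. It assumes line items have already been extracted into simple records keyed by hypothetical scope labels, and shows the kind of structured flags (missing items, exclusions, allowance mismatches) an agent's output would need to carry for an estimator to act on.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuoteLine:
    item: str                          # hypothetical scope label
    included: bool = True              # False if the sub lists it as an exclusion
    allowance: Optional[float] = None  # dollar allowance carried, if any


@dataclass
class LevelingReport:
    sub: str
    missing: list[str] = field(default_factory=list)
    excluded: list[str] = field(default_factory=list)
    allowance_mismatches: list[tuple[str, float, Optional[float]]] = field(default_factory=list)


def level_quote(sub: str, base_scope: dict[str, Optional[float]],
                quote: list[QuoteLine]) -> LevelingReport:
    """Flag base-scope items a quote omits, excludes, or carries a different allowance for.

    base_scope maps each item to the allowance the base scope carries (None if none).
    """
    report = LevelingReport(sub)
    by_item = {line.item: line for line in quote}
    for item, base_allowance in base_scope.items():
        line = by_item.get(item)
        if line is None:
            report.missing.append(item)
        elif not line.included:
            report.excluded.append(item)
        elif base_allowance is not None and line.allowance != base_allowance:
            report.allowance_mismatches.append((item, base_allowance, line.allowance))
    return report


# Hypothetical example: one drywall sub against a made-up base scope.
base_scope = {
    "metal stud framing": None,
    "gypsum board": None,
    "patch and repair allowance": 5000.0,
    "acoustic ceilings": None,
    "firestopping": None,
}
quote = [
    QuoteLine("metal stud framing"),
    QuoteLine("gypsum board"),
    QuoteLine("patch and repair allowance", allowance=3500.0),
    QuoteLine("acoustic ceilings", included=False),
]
report = level_quote("Sub A", base_scope, quote)
print(report.missing, report.excluded, report.allowance_mismatches)
```

The hard part is the step this sketch skips: reading free-text quotes and deciding that a phrase like "ceilings by others" is an exclusion. That judgment is the model's job; a fixed schema like this is just a reasonable target for its output, and every flag still goes to a person for review.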
What it doesn't solve
Sonnet 5 is a reasoning and agent model, not a document-ingestion tool. It still needs your bid packages and spec sections converted into clean, structured text before it can compare them — the extraction layer we covered with Mistral's OCR 4 release is a separate piece of the stack. A better benchmark score also isn't a substitute for review: anything that touches a dollar commitment or a scope decision still needs a person to check the model's flags against the actual contract documents before it goes into a buyout recommendation.
What to test first
Before committing budget to a custom build, run one real comparison: take a recent buyout package, feed the sub quotes and base spec section into Sonnet 5, and ask it to list scope mismatches and missing line items. Check the output against what your estimator actually caught manually, and check your account's real token usage against the sticker price — the tokenizer change means the only honest cost estimate is the one from your own documents, not the published rate card.
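For that side-by-side test, the simplest scorecard compares the model's flags with the estimator's manual list. A minimal sketch, assuming both lists have already been normalized by hand to the same short labels (the labels below are hypothetical; matching free-text flags automatically is its own problem):

```python
from __future__ import annotations


def score_flags(model_flags: set[str], estimator_flags: set[str]) -> dict[str, object]:
    """Compare scope flags the model raised with the ones the estimator caught manually."""
    caught = model_flags & estimator_flags  # both found it
    missed = estimator_flags - model_flags  # estimator found it, model didn't
    extra = model_flags - estimator_flags   # model raised it, estimator didn't
    return {
        "caught": sorted(caught),
        "missed": sorted(missed),
        "extra_to_review": sorted(extra),
        # Share of the estimator's catches the model also found.
        "recall": len(caught) / len(estimator_flags) if estimator_flags else 1.0,
    }


estimator = {"no firestopping", "ceilings excluded", "patch allowance short", "no cleanup"}
model = {"no firestopping", "ceilings excluded", "patch allowance short", "bond not included"}
result = score_flags(model, estimator)
print(result)
```

Items under "extra_to_review" aren't automatically false positives. Some may be real gaps the manual pass missed, which is exactly what a person should check against the contract documents.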
Some of the agent tools already showing up in construction software run on Claude underneath. Copilot Cowork, for instance, currently runs on Opus 4.8 and Sonnet 4.6. Expect Sonnet 5 to work its way into that kind of vendor stack over the coming months, which means the pricing and capability shift can land on your desk before it's ever a model name you chose yourself.
Every Friday, one chart: one piece of data that should change a decision on your project. Subscribe at constructionaibrief.com.