Anthropic’s new Claude Opus 4.5 model promises major improvements in coding, automation, and multi-agent performance as the company expands its AI platform. (Image: Anthropic)
AI startup Anthropic has unveiled its latest and most capable AI model yet – Claude Opus 4.5. The company bills the new model as the world’s best for tasks such as coding, AI agents, and computer use, and claims it is also markedly better at everyday work such as deep research, presentations, and spreadsheets. According to Anthropic, Opus 4.5 represents a leap forward in the capabilities of AI systems.
Opus 4.5 is Anthropic’s third major model launch this year, following Sonnet 4.5 in September and Haiku 4.5 in October. The new model, Anthropic says, extends the company’s aim to uplevel the world’s enterprises.
As of November 25, the new model is available in Anthropic’s apps, via the Claude API, and on all three major cloud platforms. In terms of pricing, Opus 4.5 costs $5 per million input tokens and $25 per million output tokens, making it accessible to more users. Along with the model, the AI startup also released updates to its Claude Developer Platform, Claude Code, and its consumer apps, and introduced new tools for longer-running agents as well as new ways to use Claude in Excel, Chrome, and on the desktop.
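For a sense of scale, here is a quick back-of-the-envelope calculation at those rates; the token counts below are arbitrary example figures, not numbers from Anthropic.

```python
# Back-of-the-envelope cost for a single Opus 4.5 request at the quoted rates:
# $5 per million input tokens and $25 per million output tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one API call at the quoted Opus 4.5 pricing."""
    return (input_tokens / 1_000_000) * INPUT_USD_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_USD_PER_MTOK

# Example: a 100,000-token prompt that produces a 10,000-token answer
# costs roughly $0.50 + $0.25 = $0.75.
print(f"${request_cost(100_000, 10_000):.2f}")
```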
Anthropic, in its official blog, stated that its staff tested the model well before its official release and that the feedback was ‘remarkably consistent’. Testers said the model manages ambiguity and reasons about tradeoffs without needing guidance, and that when confronted with a complex, multi-system bug, it worked out the fix on its own. Most importantly, they said that tasks which were nearly impossible for Sonnet 4.5 just a few weeks ago are now handled by the new model.
On performance benchmarks, Claude Opus 4.5 leads in the tasks that matter most to modern AI agents: writing code, running tools, solving problems, and switching between modalities. In agentic coding (SWE-bench Verified), the model scored 80.9 per cent, outdoing GPT-5.1, Gemini 3 Pro and its own earlier versions. In agentic terminal coding, it registered 59.3 per cent, which places it comfortably ahead of its peers.
The standout results, however, are in agentic tool use, where the model scored 88.9 per cent in retail and a massive 98.2 per cent in telecom, well ahead of every other model. It also leads in scaled tool use at 62.3 per cent and computer use at 66.3 per cent, both key to real-world automation. Where earlier Opus models lagged, Opus 4.5 leaps ahead, for instance in novel problem solving at 37.6 per cent. In higher-order reasoning and multilingual tasks it also posts strong scores: 87.0 per cent on GPQA Diamond and 90.8 per cent on MMMLU. Overall, Opus 4.5 looks like a capable all-rounder.
It should be noted that benchmark scores are often cherry-picked by AI companies, and many of these tests are carefully curated sandboxes in which models look smarter than they actually are. Experts note that real-world performance is usually messier than benchmark conditions, so any scorecard should be treated with caution.
Meanwhile, Anthropic also introduced updates to the Claude Developer Platform. As models become more advanced, they arrive at answers in fewer steps. Opus 4.5 continues this trend, using significantly fewer tokens than earlier Claude models while still matching or even surpassing their performance.
The company said developers can now tune this behaviour using a new effort parameter in the API, choosing whether to prioritise speed and cost or depth and capability. At a medium effort level, Opus 4.5 matches Sonnet 4.5 on SWE-bench Verified while using 76 per cent fewer output tokens; at the highest effort setting, it beats Sonnet 4.5 by 4.3 percentage points while using 48 per cent fewer tokens. Paired with context compaction and improved tool use, Opus 4.5 can run longer, handle more complex workflows, and require less manual oversight.
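For illustration, here is a minimal sketch of what such a request might look like, assuming the effort level is passed as a field named effort in the Messages API request body; the field name, its accepted values, and the model identifier are assumptions, and Anthropic’s API documentation remains the authority.

```python
# Sketch only: the "effort" field and its values are assumptions based on the
# article's description of the new parameter; the endpoint, headers, and
# message format follow Anthropic's public Messages API.
import os
import requests

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-opus-4-5",   # model ID may vary by platform or date suffix
        "max_tokens": 2048,
        "effort": "medium",           # hypothetical: trade some depth for speed and cost
        "messages": [
            {"role": "user", "content": "Find the root cause of the failing test and propose a fix."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["content"][0]["text"])
```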
Opus 4.5 also handles multi-agent setups more effectively, coordinating sub-agents for complex research or analysis; in Anthropic’s internal testing, these improvements raised deep-research task performance by nearly 15 points. The Claude Developer Platform is also being redesigned around this kind of modularity, giving developers more control over efficiency, context, and tools. These upgrades show up in products like Claude Code, which now reportedly plans and executes tasks more reliably and is available in the desktop app with support for parallel sessions across projects and workflows.
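As a rough illustration of that sub-agent pattern (not Anthropic’s own implementation), an orchestrator can fan a research question out to a few narrowly scoped prompts in parallel and then merge the partial answers; the roles, prompts, and ask_claude helper below are hypothetical.

```python
# Illustrative coordinator/sub-agent sketch; the roles, prompts, and helper are
# hypothetical, but the HTTP calls follow Anthropic's public Messages API.
import os
import requests
from concurrent.futures import ThreadPoolExecutor

def ask_claude(prompt: str) -> str:
    """Send one prompt to the Messages API and return the text of the reply."""
    r = requests.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        json={
            "model": "claude-opus-4-5",  # model ID may vary by platform
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["content"][0]["text"]

question = "How did Opus 4.5 change pricing, availability, and developer tooling?"
subtasks = [
    f"Answer only the pricing aspect of: {question}",
    f"Answer only the availability aspect of: {question}",
    f"Answer only the developer-tooling aspect of: {question}",
]

# Run the sub-agents in parallel, then have a coordinator call merge the findings.
with ThreadPoolExecutor(max_workers=3) as pool:
    findings = list(pool.map(ask_claude, subtasks))

summary = ask_claude("Merge these findings into one short summary:\n\n" + "\n\n".join(findings))
print(summary)
```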