Anthropic releases Claude Sonnet 4.5 with advanced coding and agent capabilities

AI company Anthropic has released Claude Sonnet 4.5, a new flagship model that the company positions as its most capable for coding, building complex AI agents, and using computer systems, with significant gains in reasoning and mathematics.

The new model is available now and is accompanied by a new developer toolkit and major updates across the Claude product line.

Sonnet 4.5 features that stand out

According to Anthropic’s blog post, the model achieves state-of-the-art performance on SWE-bench Verified, a benchmark that measures real-world software coding abilities. It also shows improved performance on OSWorld, which tests an AI model’s ability to perform real-world tasks on a computer, such as navigating websites and filling in spreadsheets.

The company also reports that experts in finance, law, medicine, and STEM found Sonnet 4.5 to have dramatically better domain-specific knowledge and reasoning compared to previous models.

New tools for developers: The Claude Agent SDK

Alongside the new model, Anthropic has launched the Claude Agent SDK. This software development kit provides developers with the same infrastructure the company uses to power its Claude Code product, enabling them to build their own custom AI agents. The SDK is designed to solve common challenges in agent development, such as managing memory for long-running tasks, handling permission systems, and coordinating subagents working toward a shared goal.
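
To make the idea of a permission system concrete, here is a minimal Python sketch of how an agent loop might gate tool calls. Everything in it (the `PermissionPolicy` class, the `run_tool` helper, and the example tool names) is hypothetical and written for illustration; it is not the Claude Agent SDK's actual API. It assumes a simple three-way policy: some tools run automatically, some are always blocked, and anything else requires human approval.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PermissionPolicy:
    """Hypothetical policy deciding whether an agent may call a tool (not the SDK's API)."""

    allowed: set[str] = field(default_factory=set)  # tools that run without asking
    denied: set[str] = field(default_factory=set)   # tools that are always blocked

    def decide(self, tool_name: str) -> str:
        """Return 'allow', 'deny', or 'ask' for a requested tool call."""
        if tool_name in self.denied:
            return "deny"
        if tool_name in self.allowed:
            return "allow"
        return "ask"  # unlisted tools need human approval


def run_tool(policy: PermissionPolicy, tools: dict[str, Callable[[str], str]],
             tool_name: str, arg: str, approve: Callable[[str], bool]) -> str:
    """Run a tool only if the policy (or a human, via `approve`) permits it."""
    decision = policy.decide(tool_name)
    if decision == "deny" or (decision == "ask" and not approve(tool_name)):
        return f"blocked: {tool_name}"
    return tools[tool_name](arg)


# Hypothetical example tools; a real agent would read, write, or execute things here.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "write_file": lambda path: f"wrote {path}",
    "delete_file": lambda path: f"deleted {path}",
}
policy = PermissionPolicy(allowed={"read_file"}, denied={"delete_file"})

print(run_tool(policy, tools, "read_file", "notes.txt", approve=lambda t: False))   # contents of notes.txt
print(run_tool(policy, tools, "write_file", "notes.txt", approve=lambda t: True))   # wrote notes.txt
print(run_tool(policy, tools, "delete_file", "notes.txt", approve=lambda t: True))  # blocked: delete_file
```

Defaulting unlisted tools to "ask" rather than "allow" keeps the agent conservative: a new or unexpected tool cannot run until someone signs off on it.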

Product updates across the Claude ecosystem

The launch of Sonnet 4.5 includes several significant upgrades to existing Claude products.

  • Claude Code: Introduces checkpoints that allow users to save progress and roll back to a previous state, a refreshed terminal interface, and a native VS Code extension.
  • Claude API: Adds a new context editing feature and a memory tool to help agents run longer and handle more complex tasks.
  • Claude Apps: Users on paid plans can now execute code and create files, such as spreadsheets, slides, and documents, directly within their conversations.
  • Claude for Chrome Extension: Now available to Max users who previously joined the waitlist.
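
Checkpoints are easiest to picture as snapshots of working state that can be restored later. The sketch below is a hypothetical Python illustration of that idea, assuming the state is simply a mapping of file paths to contents; the `CheckpointedSession` class and its methods are invented for this example and do not describe how Claude Code implements checkpoints.

```python
class CheckpointedSession:
    """Hypothetical sketch: snapshot file state and roll back to an earlier snapshot.

    Illustrative only; this is not how Claude Code implements checkpoints.
    """

    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)                      # current working state
        self._checkpoints: list[dict[str, str]] = []  # saved snapshots, oldest first

    def checkpoint(self) -> int:
        """Save a copy of the current state and return its checkpoint id."""
        # Values are immutable strings, so a shallow copy of the dict is enough.
        self._checkpoints.append(dict(self.files))
        return len(self._checkpoints) - 1

    def edit(self, path: str, content: str) -> None:
        """Create or overwrite a file in the working state."""
        self.files[path] = content

    def rollback(self, checkpoint_id: int) -> None:
        """Restore the state saved at checkpoint_id and drop any later checkpoints."""
        if not 0 <= checkpoint_id < len(self._checkpoints):
            raise ValueError(f"unknown checkpoint: {checkpoint_id}")
        self.files = dict(self._checkpoints[checkpoint_id])
        del self._checkpoints[checkpoint_id + 1:]


session = CheckpointedSession({"app.py": "print('v1')"})
cp = session.checkpoint()
session.edit("app.py", "print('v2')")
session.rollback(cp)
print(session.files["app.py"])  # print('v1')
```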

Focus on safety and alignment

Anthropic states that Claude Sonnet 4.5 is its most aligned model to date, with improvements in reducing undesirable behaviors like deception and sycophancy. The model is released under the company’s AI Safety Level 3 (ASL-3) framework, which includes safeguards like classifiers designed to detect potentially dangerous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear (CBRN) weapons.

Imagine with Claude

For a limited time, Anthropic is offering a research preview called “Imagine with Claude” for its Max subscribers. In this demonstration, the model generates software in real time in response to user requests, with no prewritten code. The preview is designed to showcase what Sonnet 4.5 can do when combined with the right infrastructure.

Availability and pricing

Claude Sonnet 4.5 is available now through the Claude API. The pricing is the same as the previous Claude Sonnet 4 model, at $3 per million input tokens and $15 per million output tokens.
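
At these rates, the cost of a workload is a weighted sum of its input and output tokens. The following Python sketch estimates it using only the two prices quoted above; `estimate_cost` is a hypothetical helper, and it does not account for any discounts or other pricing tiers Anthropic may offer.

```python
# Prices quoted in the article for Claude Sonnet 4.5, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at the listed per-token rates (no discounts)."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Example: 2 million input tokens and 500,000 output tokens.
print(f"${estimate_cost(2_000_000, 500_000):.2f}")  # $13.50
```

In this example, $6.00 of the total comes from input tokens and $7.50 from output tokens, so output-heavy workloads such as long code generation are dominated by the higher output rate.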

Anthropic recommends upgrading to Sonnet 4.5 for all uses, as it provides improved performance for the same cost.

Claude Sonnet 4.5 vs GPT-5: Which one should you use for your next project?

The release of Claude Sonnet 4.5 has intensified competition at the frontier of artificial intelligence, putting it in direct contention with OpenAI’s GPT-5.

Both are frontier models, but they show distinct strengths, particularly in coding, agentic capabilities, and overall performance.

At a glance: Key differences

| Feature | Claude Sonnet 4.5 | GPT-5 |
|---|---|---|
| Primary strength | Agentic coding, computer use, and long-duration autonomous tasks | Unified intelligence, advanced reasoning, and multimodal capabilities |
| SWE-bench Verified | 77.2% (standard), 82% (high-compute) | 72.8% |
| OSWorld | 61.4% | Not specified; Sonnet 4.5 tops the chart |
| Developer tools | Claude Agent SDK, native VS Code extension, Claude Code with checkpoints | Accessed via API and integrated into products such as ChatGPT and Microsoft Copilot |
| Unique features | Can operate autonomously for over 30 hours; enhanced safety and alignment features | Unified system that blends multiple AI models; dynamically adjusts its reasoning approach based on task complexity |

Coding and developer focus

Anthropic positions Claude Sonnet 4.5 as the “best coding model in the world” and backs the claim with leading results on several key benchmarks. On SWE-bench Verified, which measures a model’s ability to solve real-world GitHub issues, Sonnet 4.5 scores 77.2%, ahead of GPT-5’s 72.8%. With additional compute, Sonnet 4.5’s score rises to 82%.

Furthermore, on Terminal-Bench, a test of an AI’s ability to use a command-line interface, Sonnet 4.5 achieved a 50% success rate, significantly ahead of GPT-5’s 43.8%. This suggests that for developers and technical users who need an AI to perform complex, multi-step tasks in a terminal environment, Sonnet 4.5 holds a distinct advantage.
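
Benchmark gaps can be read in two ways: as a difference in percentage points or as a relative improvement over the lower score. This short Python sketch computes both from the SWE-bench Verified and Terminal-Bench scores quoted above; the helper names are illustrative only.

```python
# Scores (%) as reported in the article.
SCORES = {
    "SWE-bench Verified": {"Claude Sonnet 4.5": 77.2, "GPT-5": 72.8},
    "Terminal-Bench": {"Claude Sonnet 4.5": 50.0, "GPT-5": 43.8},
}


def point_gap(a: float, b: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


def relative_gap(a: float, b: float) -> float:
    """How much higher a is than b, as a percentage of b, rounded to one decimal."""
    return round((a - b) / b * 100, 1)


for bench, s in SCORES.items():
    a, b = s["Claude Sonnet 4.5"], s["GPT-5"]
    print(f"{bench}: +{point_gap(a, b)} points ({relative_gap(a, b)}% relative)")
# SWE-bench Verified: +4.4 points (6.0% relative)
# Terminal-Bench: +6.2 points (14.2% relative)
```

The relative figure shows why the Terminal-Bench lead reads as the larger one: a 6.2-point gap on a lower baseline is proportionally much bigger than a 4.4-point gap near 75%.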

In contrast, GPT-5 is presented as a powerful, general-purpose coding model. While it set new state-of-the-art results at the time of its release, Sonnet 4.5’s more specialized focus appears to give it an edge in developer-centric tasks.

Agentic capabilities and computer use

A standout feature of Claude Sonnet 4.5 is its ability to function as a long-running autonomous agent. Reports indicate the model can maintain focus and performance on complex tasks for more than 30 hours, a significant increase from previous models. This endurance is crucial for tasks that require sustained effort, such as large-scale code refactoring or in-depth data analysis.

On the OSWorld benchmark, which evaluates an AI’s ability to perform real-world tasks on a computer, Sonnet 4.5 has taken the top spot with a success rate of 61.4%. This proficiency is further demonstrated in its tool use capabilities, where it scored a remarkable 98.0% in the Telecom domain of the τ-bench evaluations, nearly doubling the performance of its predecessor and surpassing GPT-5.

GPT-5, on the other hand, is designed as a unified system that can intelligently switch between different reasoning approaches based on the task’s complexity. This allows it to handle a wide variety of tasks efficiently, but it does not emphasize the same long-duration autonomy as Sonnet 4.5.

Reasoning, math, and general performance

In areas of general reasoning and mathematics, the competition is much closer. On the AIME 2025 high school math competition, Sonnet 4.5 achieved a perfect 100% score when using Python, slightly edging out GPT-5’s 99.6%. For graduate-level reasoning, as measured by the GPQA Diamond benchmark, the models are highly competitive, with GPT-5 holding a slight lead.

Early user reports and hands-on tests suggest that Sonnet 4.5 is noticeably faster…

