Anthropic announces significant updates to Claude, including agentic powers

Anthropic unveils upgraded AI models and a computer use capability: Anthropic has announced an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku, along with a computer use capability that is now available in public beta.

Upgraded Claude 3.5 Sonnet: A leap in AI-powered coding: The new version of Claude 3.5 Sonnet improves on its predecessor across a range of benchmarks, with the largest gains in coding and tool use tasks.

Performance on SWE-bench Verified increased from 33.4% to 49.0%, surpassing all publicly available models, including specialized systems for agentic coding.
TAU-bench scores improved from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more challenging airline domain.
The upgraded model is offered at the same price and speed as the previous version.
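To put the benchmark figures above in perspective, the short sketch below computes each gain both in percentage points and relative to the earlier score. The scores are the ones quoted above; the `gains` helper and the dictionary layout are illustrative choices, not part of any benchmark tooling.

```python
# Scores (in percent) as quoted in the list above: (previous, upgraded).
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before, after):
    """Return (percentage-point gain, relative gain in percent), rounded to 1 decimal."""
    points = after - before
    return round(points, 1), round(points / before * 100, 1)


for name, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points ({relative}% relative to the earlier score)")
```

Read this way, the SWE-bench Verified jump of 15.6 points is roughly a 47% relative improvement, while the retail TAU-bench gain of 6.6 points is closer to 11%.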

Industry feedback and real-world applications: Early adopters have reported significant improvements in AI-powered software development processes.

GitLab observed up to 10% stronger reasoning across use cases with no added latency.
Cognition noted substantial improvements in coding, planning, and problem-solving compared to the previous version.
The Browser Company found Claude 3.5 Sonnet outperformed all previously tested models for automating web-based workflows.

Introducing Claude 3.5 Haiku: Balancing performance and efficiency: The new Claude 3.5 Haiku model offers improved capabilities at the same cost and speed as its predecessor.

Claude 3.5 Haiku surpasses even Claude 3 Opus, the largest model in the previous generation, on many intelligence benchmarks.
It scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models.
The model is well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large datasets.

Pioneering computer use capability: Anthropic has introduced a groundbreaking feature allowing Claude to interact with computer interfaces like a human user.

The new API lets Claude perceive a computer interface by looking at the screen and act on it by translating instructions into computer commands such as moving a cursor, clicking buttons, and typing text.
On OSWorld, which evaluates AI models’ ability to use computers as people do, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, nearly double the next-best AI system’s score of 7.8%.
When given more steps to complete tasks, Claude’s score improved to 22.0%.
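The observe-and-act pattern described above can be sketched as a simple loop. This is a toy illustration only, not Anthropic's API: `FakeScreen`, `fake_model`, `Action`, and `run` are hypothetical stand-ins, the "screenshot" is a text string rather than pixels, and the button coordinates are made up. It also shows why a larger step budget can raise the success rate.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str  # "type", "click", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeScreen:
    """Hypothetical stand-in for a desktop: one text field and one submit button."""

    def __init__(self):
        self.field = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real system would return pixels; a text description keeps the sketch runnable.
        return f"field={self.field!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.field += action.text
        elif action.kind == "click" and (action.x, action.y) == (100, 200):
            self.submitted = True  # (100, 200) is the made-up button location


def fake_model(instruction: str, observation: str) -> Action:
    """Hypothetical policy standing in for the model: choose the next action."""
    if "field=''" in observation:
        return Action("type", text=instruction)
    if "submitted=False" in observation:
        return Action("click", x=100, y=200)
    return Action("done")


def run(instruction: str, screen: FakeScreen, max_steps: int) -> bool:
    """Observe, act, and repeat until the model reports done or the budget runs out."""
    for _ in range(max_steps):
        action = fake_model(instruction, screen.screenshot())
        if action.kind == "done":
            return True
        screen.apply(action)
    return False
```

In this toy setup, `run("hello", FakeScreen(), max_steps=3)` completes the task, while a budget of one step runs out before the form is submitted, which mirrors the way the reported score rises when more steps are allowed.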

Responsible development and deployment: Anthropic emphasizes a proactive approach to safety and responsible AI development.

New classifiers have been developed to identify when computer use is being employed and to detect potential harm.
Joint pre-deployment testing was conducted with the US AI Safety Institute (US AISI) and the UK AI Safety Institute (UK AISI).
The ASL-2 Standard, as outlined in Anthropic’s Responsible Scaling Policy, remains appropriate for the upgraded Claude 3.5 Sonnet model.

Looking ahead: Implications and future developments: The introduction of these new models and capabilities represents a significant step forward in AI technology, with potential for wide-ranging applications across industries.

The computer use feature, while still in its early stages, opens up new possibilities for automating complex tasks and workflows.
Anthropic encourages developers to explore these new capabilities and provide feedback to help refine and improve the technology.
The company acknowledges that the computer use capability is still imperfect and recommends starting with low-risk tasks during the exploration phase.

Balancing innovation and responsibility: As AI systems become increasingly capable, Anthropic’s approach highlights the importance of responsible development and deployment.

The introduction of computer use capabilities raises new considerations for potential misuse, such as spam, misinformation, or fraud.
Anthropic’s proactive safety measures and collaboration with external experts demonstrate a commitment to addressing potential risks associated with advanced AI systems.
The public beta release of the computer use feature allows for real-world testing and feedback, which will be crucial for understanding both the potential and the implications of this technology.
