Anthropic announces significant updates to Claude, including agentic powers

Anthropic unveils upgraded AI models and a new computer use capability: Anthropic has announced an upgraded Claude 3.5 Sonnet, a new Claude 3.5 Haiku model, and a computer use feature released in public beta.

Upgraded Claude 3.5 Sonnet: A leap in AI-powered coding: The new version of Claude 3.5 Sonnet improves across industry benchmarks, with the largest gains in agentic coding and tool use tasks.

  • Performance on SWE-bench Verified increased from 33.4% to 49.0%, surpassing all publicly available models, including specialized systems for agentic coding.
  • TAU-bench scores improved from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more challenging airline domain.
  • These improvements come at the same price and speed as the previous version.
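The gains above are easier to compare when stated both in percentage points and relative to the previous score. The short Python sketch below does that arithmetic using only the figures reported in this article; the `BenchmarkResult` class and its method names are illustrative, not anything published by Anthropic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    name: str
    before: float  # previous Claude 3.5 Sonnet score, in percent
    after: float   # upgraded Claude 3.5 Sonnet score, in percent

    def point_gain(self) -> float:
        """Absolute improvement in percentage points, rounded to one decimal."""
        return round(self.after - self.before, 1)

    def relative_gain(self) -> float:
        """Improvement relative to the previous score, in percent, rounded to one decimal."""
        return round((self.after - self.before) / self.before * 100, 1)


# Scores as reported in the article.
RESULTS = [
    BenchmarkResult("SWE-bench Verified", 33.4, 49.0),
    BenchmarkResult("TAU-bench (retail)", 62.6, 69.2),
    BenchmarkResult("TAU-bench (airline)", 36.0, 46.0),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.name}: +{r.point_gain()} points ({r.relative_gain()}% relative)")
```

By this arithmetic, the SWE-bench Verified jump is 15.6 points, or roughly 47% over the old score, while the airline domain of TAU-bench gains 10 points, roughly 28% in relative terms.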

Industry feedback and real-world applications: Early adopters have reported significant improvements in AI-powered software development processes.

  • GitLab observed up to 10% stronger reasoning across use cases with no added latency.
  • Cognition noted substantial improvements in coding, planning, and problem-solving compared to the previous version.
  • The Browser Company found Claude 3.5 Sonnet outperformed all previously tested models for automating web-based workflows.

Introducing Claude 3.5 Haiku: Balancing performance and efficiency: The new Claude 3.5 Haiku model offers improved capabilities at the same cost and speed as its predecessor.

  • Claude 3.5 Haiku surpasses even Claude 3 Opus, the largest model in the previous generation, on many intelligence benchmarks.
  • It scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models.
  • The model is well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large datasets.

Pioneering computer use capability: Anthropic has introduced a feature that lets Claude operate a computer's interface much as a human user does.

  • Through the API, developers can direct Claude to use a computer by looking at the screen, moving the cursor, clicking buttons, and typing text, with Claude translating instructions into computer commands.
  • On OSWorld, which evaluates AI models’ ability to use computers like people, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, significantly higher than the next-best AI system’s score of 7.8%.
  • When given more steps to complete tasks, Claude’s score improved to 22.0%.
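To make "translating instructions into computer commands" concrete: in this kind of setup, the model requests actions and the developer's own environment carries them out, sending screenshots back so the model can see the result. The Python sketch below shows only the translation step of such a harness, assuming a Linux desktop driven by the `xdotool` utility. The function name `to_xdotool` and the exact shape of the action dictionaries are illustrative assumptions loosely modeled on the beta tool's action names, not Anthropic's schema or reference code; consult Anthropic's documentation for the real, versioned tool definition.

```python
from __future__ import annotations

from typing import Any

# Hypothetical action dictionaries, loosely modeled on computer use action names
# such as "screenshot", "mouse_move", "left_click", "type", and "key".
# This is an illustrative assumption, not Anthropic's published schema.


def to_xdotool(action: dict[str, Any]) -> list[str] | None:
    """Translate one model-requested action into an xdotool argument list.

    Returns None for "screenshot", which a harness would handle by capturing
    the screen and sending the image back to the model instead.
    Raises ValueError for anything outside this small, low-risk action set.
    """
    kind = action["action"]
    if kind == "screenshot":
        return None
    if kind == "mouse_move":
        x, y = action["coordinate"]
        return ["xdotool", "mousemove", "--sync", str(x), str(y)]
    if kind == "left_click":
        # Clicks at the current cursor position (button 1 = left).
        return ["xdotool", "click", "1"]
    if kind == "type":
        return ["xdotool", "type", "--", action["text"]]
    if kind == "key":
        # e.g. "Return" or "ctrl+s"
        return ["xdotool", "key", "--", action["text"]]
    raise ValueError(f"unsupported action: {kind!r}")


if __name__ == "__main__":
    steps = [
        {"action": "mouse_move", "coordinate": [640, 400]},
        {"action": "left_click"},
        {"action": "type", "text": "hello"},
    ]
    for step in steps:
        print(to_xdotool(step))
```

Rejecting unknown actions instead of passing them through keeps the harness limited to a small, predictable command set, which fits Anthropic's advice to start with low-risk tasks.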

Responsible development and deployment: Anthropic emphasizes a proactive approach to safety and responsible AI development.

  • New classifiers have been developed to identify when computer use is being employed and to detect potential harm.
  • Joint pre-deployment testing was conducted with the US AI Safety Institute (US AISI) and the UK AI Safety Institute (UK AISI).
  • The ASL-2 Standard, as outlined in Anthropic’s Responsible Scaling Policy, remains appropriate for the upgraded Claude 3.5 Sonnet model.

Looking ahead: Implications and future developments: The introduction of these new models and capabilities represents a significant step forward in AI technology, with potential for wide-ranging applications across industries.

  • The computer use feature, while still in its early stages, opens up new possibilities for automating complex tasks and workflows.
  • Anthropic encourages developers to explore these new capabilities and provide feedback to help refine and improve the technology.
  • The company acknowledges that the computer use capability is still imperfect and recommends starting with low-risk tasks during the exploration phase.

Balancing innovation and responsibility: As AI systems become increasingly capable, Anthropic’s approach highlights the importance of responsible development and deployment.

  • The introduction of computer use capabilities raises new considerations for potential misuse, such as spam, misinformation, or fraud.
  • Anthropic’s proactive safety measures and collaboration with external experts demonstrate a commitment to addressing potential risks associated with advanced AI systems.
  • The public beta release of the computer use feature allows for real-world testing and feedback, which will be crucial for understanding both the potential and the implications of this technology.

Source: Anthropic, "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku"
