×
How Arm boosts AI performance with its new chip designs
Written by
Published on
Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage
Join Now

The advancement of AI and machine learning technologies has positioned Arm as a key enabler of AI acceleration across diverse computing platforms, leveraging its decade-long expertise in processor architecture and optimization.

Core technology capabilities: Arm’s processor architecture incorporates multiple technologies that enable efficient AI workload processing without dedicated neural processing units (NPUs).

  • Matrix multiplication, the fundamental operation in AI processing, is accelerated through built-in hardware features in both Arm v8 and v9 architectures
  • Technologies like Neon, Scalable Vector Extensions (SVE), and Scalable Matrix Extensions (SME) enable CPUs to perform accelerated matrix operations independently
  • These capabilities span across Arm’s processor families, including Cortex-A, Cortex-X, and Neoverse processors

Kleidi platform integration: Arm’s Kleidi technology serves as a comprehensive solution for AI acceleration on mobile and server platforms.

  • KleidiAI, a library of optimized machine learning kernels, leverages Arm’s hardware accelerators for enhanced performance
  • Integration with popular frameworks like PyTorch and ExecuTorch provides developers with up to 12x performance improvements
  • The technology is openly available on GitLab, making it accessible to the developer community

Large Language Model optimization: Arm’s collaboration with Meta has resulted in significant improvements in LLM performance on CPU-based systems.

  • The Llama 3.2 model demonstrates impressive performance on Arm CPUs, achieving 29.3 tokens per second on AWS Graviton4 processors
  • Smartphone implementations show 5x improvement in prompt processing and 3x improvement in token generation
  • Models ranging from 1 billion to 90 billion parameters can run effectively on Arm CPUs, supporting both edge and cloud deployments

Developer benefits and accessibility: Arm’s approach emphasizes performance portability and ease of implementation for developers.

  • The platform enables single-time optimization with deployment across multiple platforms without modifications
  • Comprehensive documentation and resources support developers in implementing AI and ML workloads on Arm CPUs
  • The technology allows for efficient AI processing without relying on specialized hardware like GPUs or NPUs

Future implications: The ability to run sophisticated AI models directly on CPU hardware represents a significant shift in AI deployment strategies, potentially democratizing access to AI capabilities across a broader range of devices while reducing dependency on cloud infrastructure and specialized processors.

Here's how Arm accelerates AI workloads

Recent News

How AI is personalizing travel experiences and transforming hospitality

AI helps travel companies analyze customer data to create tailored itineraries, automate customer service, and optimize behind-the-scenes operations from flight scheduling to room pricing.

Elon Musk acquires X for $45 billion, merging social media with his AI company

Musk's combination of social media and AI companies creates a $113 billion enterprise with X valued significantly below its 2022 purchase price.

The paradox of AI alignment: Why perfectly obedient AI might be dangerous

Strict obedience in AI systems may prevent them from developing the moral reasoning needed to make ethical decisions.