How Arm boosts AI performance with its new chip designs

Advances in AI and machine learning have positioned Arm as a key enabler of AI acceleration across diverse computing platforms, drawing on its decades of experience in processor architecture and optimization.

Core technology capabilities: Arm’s processor architecture incorporates multiple technologies that enable efficient AI workload processing without dedicated neural processing units (NPUs).

  • Matrix multiplication, the fundamental operation in AI processing, is accelerated through built-in hardware features in both the Armv8 and Armv9 architectures
  • Technologies like Neon, the Scalable Vector Extension (SVE), and the Scalable Matrix Extension (SME) let CPUs accelerate matrix operations on their own, without offloading to a separate accelerator
  • These capabilities span Arm’s processor families, including Cortex-A, Cortex-X, and Neoverse processors
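
To make the first two points concrete, here is a minimal Python sketch of matrix multiplication built from dot products that consume several elements per step, which is roughly what a vector multiply-accumulate instruction does in hardware. It is plain Python, not Arm intrinsics. The lane count of 4 is an assumption that corresponds to four 32-bit floats in a 128-bit Neon register, and the function names are hypothetical.

```python
# Illustrative only: plain Python standing in for what SIMD hardware does.
LANES = 4  # assumption: four float32 values fit in one 128-bit vector register


def dot_lanes(a, b, lanes=LANES):
    """Dot product that consumes `lanes` elements per step."""
    acc = [0.0] * lanes                       # one partial sum per lane
    full = len(a) - len(a) % lanes            # largest multiple of `lanes`
    for i in range(0, full, lanes):
        for lane in range(lanes):             # hardware does this loop in one instruction
            acc[lane] += a[i + lane] * b[i + lane]
    # Leftover elements that do not fill a whole vector.
    tail = sum(x * y for x, y in zip(a[full:], b[full:]))
    return sum(acc) + tail


def matmul(A, B):
    """C = A @ B, computed as one dot product per output element."""
    cols = list(zip(*B))                      # columns of B
    return [[dot_lanes(row, col) for col in cols] for row in A]


A = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0, 9.0, 10.0]]
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
C = matmul(A, B)
print(C)  # [[9.0, 11.0], [24.0, 26.0]]
```

Wider vectors (as SVE permits) or instructions that operate on whole tiles of a matrix (the idea behind SME) reduce the number of steps further, but the underlying arithmetic is the same.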

Kleidi platform integration: Arm’s Kleidi technology serves as a comprehensive solution for AI acceleration on mobile and server platforms.

  • KleidiAI, a library of optimized machine learning kernels, takes advantage of the hardware acceleration features built into Arm CPUs for enhanced performance
  • Integration with popular frameworks like PyTorch and ExecuTorch gives developers performance improvements of up to 12x
  • The technology is openly available on GitLab, making it accessible to the developer community

Large Language Model optimization: Arm’s collaboration with Meta has resulted in significant improvements in LLM performance on CPU-based systems.

  • The Llama 3.2 model reaches 29.3 tokens per second on Arm-based AWS Graviton4 processors
  • Smartphone implementations show a 5x improvement in prompt processing and a 3x improvement in token generation
  • Models ranging from 1 billion to 90 billion parameters can run effectively on Arm CPUs, supporting both edge and cloud deployments
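
For a rough sense of what these figures mean in practice, the back-of-the-envelope sketch below converts a token rate into response time and a parameter count into weight storage. The 29.3 tokens per second figure and the 1 billion to 90 billion parameter range come from the points above. The 256-token response length and the 4-bit and 16-bit weight widths are assumptions chosen for illustration, and the memory estimate covers weights only.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to generate `num_tokens` at a steady token rate."""
    return num_tokens / tokens_per_second


def weight_gigabytes(num_params, bits_per_weight):
    """Storage for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed response length of 256 tokens at the quoted 29.3 tokens/s.
secs = generation_seconds(256, 29.3)
print(f"256 tokens take about {secs:.1f} s")

# Assumed weight widths: 4-bit quantized and 16-bit.
for params in (1_000_000_000, 90_000_000_000):
    for bits in (4, 16):
        gb = weight_gigabytes(params, bits)
        print(f"{params / 1e9:.0f}B params at {bits}-bit: {gb:.1f} GB")
```

Under these assumptions a 1-billion-parameter model needs well under a gigabyte for its weights at 4 bits, which suits edge devices, while a 90-billion-parameter model needs tens of gigabytes and is a better fit for cloud servers.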

Developer benefits and accessibility: Arm’s approach emphasizes performance portability and ease of implementation for developers.

  • Developers can optimize once and deploy across multiple platforms without modification
  • Comprehensive documentation and resources support developers in implementing AI and ML workloads on Arm CPUs
  • The technology allows for efficient AI processing without relying on specialized hardware like GPUs or NPUs
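
One common way to get this kind of portability is runtime dispatch: the software checks which CPU features are present and picks the fastest code path available. The sketch below is a generic illustration of that idea, not a description of how Kleidi works internally. It assumes the Linux convention of listing CPU flags on a `Features` line in `/proc/cpuinfo` (where Neon appears as `asimd`), and the kernel names are hypothetical.

```python
def parse_features(cpuinfo_text):
    """Return the set of flags on the first 'Features' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def pick_kernel(features):
    """Choose the most capable (hypothetical) kernel the CPU supports."""
    preference = (
        ("sme", "matmul_sme"),      # Scalable Matrix Extension
        ("sve", "matmul_sve"),      # Scalable Vector Extension
        ("asimd", "matmul_neon"),   # Neon (Advanced SIMD)
    )
    for flag, kernel in preference:
        if flag in features:
            return kernel
    return "matmul_generic"         # portable fallback


# Sample text standing in for the contents of /proc/cpuinfo on an Arm Linux system.
SAMPLE = "processor\t: 0\nFeatures\t: fp asimd aes crc32 sve\n"
choice = pick_kernel(parse_features(SAMPLE))
print(choice)  # matmul_sve
```

With this pattern the application code stays the same on every device, and only the selected kernel differs.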

Future implications: The ability to run sophisticated AI models directly on CPU hardware represents a significant shift in AI deployment strategies. It could democratize access to AI capabilities across a broader range of devices while reducing dependency on cloud infrastructure and specialized processors.

