×
How Arm boosts AI performance with its new chip designs
Written by
Published on
Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage
Join Now

The advancement of AI and machine learning technologies has positioned Arm as a key enabler of AI acceleration across diverse computing platforms, leveraging its decade-long expertise in processor architecture and optimization.

Core technology capabilities: Arm’s processor architecture incorporates multiple technologies that enable efficient AI workload processing without dedicated neural processing units (NPUs).

  • Matrix multiplication, the fundamental operation in AI processing, is accelerated through built-in hardware features in both Arm v8 and v9 architectures
  • Technologies like Neon, Scalable Vector Extensions (SVE), and Scalable Matrix Extensions (SME) enable CPUs to perform accelerated matrix operations independently
  • These capabilities span across Arm’s processor families, including Cortex-A, Cortex-X, and Neoverse processors

Kleidi platform integration: Arm’s Kleidi technology serves as a comprehensive solution for AI acceleration on mobile and server platforms.

  • KleidiAI, a library of optimized machine learning kernels, leverages Arm’s hardware accelerators for enhanced performance
  • Integration with popular frameworks like PyTorch and ExecuTorch provides developers with up to 12x performance improvements
  • The technology is openly available on GitLab, making it accessible to the developer community

Large Language Model optimization: Arm’s collaboration with Meta has resulted in significant improvements in LLM performance on CPU-based systems.

  • The Llama 3.2 model demonstrates impressive performance on Arm CPUs, achieving 29.3 tokens per second on AWS Graviton4 processors
  • Smartphone implementations show 5x improvement in prompt processing and 3x improvement in token generation
  • Models ranging from 1 billion to 90 billion parameters can run effectively on Arm CPUs, supporting both edge and cloud deployments

Developer benefits and accessibility: Arm’s approach emphasizes performance portability and ease of implementation for developers.

  • The platform enables single-time optimization with deployment across multiple platforms without modifications
  • Comprehensive documentation and resources support developers in implementing AI and ML workloads on Arm CPUs
  • The technology allows for efficient AI processing without relying on specialized hardware like GPUs or NPUs

Future implications: The ability to run sophisticated AI models directly on CPU hardware represents a significant shift in AI deployment strategies, potentially democratizing access to AI capabilities across a broader range of devices while reducing dependency on cloud infrastructure and specialized processors.

Here's how Arm accelerates AI workloads

Recent News

Veo 2 vs. Sora: A closer look at Google and OpenAI’s latest AI video tools

Tech companies unveil AI tools capable of generating realistic short videos from text prompts, though length and quality limitations persist as major hurdles.

7 essential ways to use ChatGPT’s new mobile search feature

OpenAI's mobile search upgrade enables business users to access current market data and news through conversational queries, marking a departure from traditional search methods.

FastVideo is an open-source framework that accelerates video diffusion models

New optimization techniques reduce the computing power needed for AI video generation from days to hours, though widespread adoption remains limited by hardware costs.