×
How Arm boosts AI performance with its new chip designs
Written by
Published on
Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage
Join Now

The advancement of AI and machine learning technologies has positioned Arm as a key enabler of AI acceleration across diverse computing platforms, leveraging its decade-long expertise in processor architecture and optimization.

Core technology capabilities: Arm’s processor architecture incorporates multiple technologies that enable efficient AI workload processing without dedicated neural processing units (NPUs).

  • Matrix multiplication, the fundamental operation in AI processing, is accelerated through built-in hardware features in both Arm v8 and v9 architectures
  • Technologies like Neon, Scalable Vector Extensions (SVE), and Scalable Matrix Extensions (SME) enable CPUs to perform accelerated matrix operations independently
  • These capabilities span across Arm’s processor families, including Cortex-A, Cortex-X, and Neoverse processors

Kleidi platform integration: Arm’s Kleidi technology serves as a comprehensive solution for AI acceleration on mobile and server platforms.

  • KleidiAI, a library of optimized machine learning kernels, leverages Arm’s hardware accelerators for enhanced performance
  • Integration with popular frameworks like PyTorch and ExecuTorch provides developers with up to 12x performance improvements
  • The technology is openly available on GitLab, making it accessible to the developer community

Large Language Model optimization: Arm’s collaboration with Meta has resulted in significant improvements in LLM performance on CPU-based systems.

  • The Llama 3.2 model demonstrates impressive performance on Arm CPUs, achieving 29.3 tokens per second on AWS Graviton4 processors
  • Smartphone implementations show 5x improvement in prompt processing and 3x improvement in token generation
  • Models ranging from 1 billion to 90 billion parameters can run effectively on Arm CPUs, supporting both edge and cloud deployments

Developer benefits and accessibility: Arm’s approach emphasizes performance portability and ease of implementation for developers.

  • The platform enables single-time optimization with deployment across multiple platforms without modifications
  • Comprehensive documentation and resources support developers in implementing AI and ML workloads on Arm CPUs
  • The technology allows for efficient AI processing without relying on specialized hardware like GPUs or NPUs

Future implications: The ability to run sophisticated AI models directly on CPU hardware represents a significant shift in AI deployment strategies, potentially democratizing access to AI capabilities across a broader range of devices while reducing dependency on cloud infrastructure and specialized processors.

Here's how Arm accelerates AI workloads

Recent News

Apple’s cheapest iPad is bad for AI

Apple's budget tablet lacks sufficient RAM to run upcoming AI features, widening the gap with pricier models in the lineup.

Mira Murati’s AI venture recruits ex-OpenAI leader among first hires

Former OpenAI exec's new AI startup lures top talent and seeks $100 million in early funding.

Microsoft is cracking down on malicious actors who bypass Copilot’s safeguards

Tech giant targets cybercriminals who created and sold tools to bypass AI security measures and generate harmful content.