Here's how Arm accelerates AI workloads

The advancement of AI and machine learning has positioned Arm as a key enabler of AI acceleration across diverse computing platforms, leveraging its decades of expertise in processor architecture and optimization.
Core technology capabilities: Arm’s processor architecture incorporates multiple technologies that enable efficient AI workload processing without dedicated neural processing units (NPUs).
- Matrix multiplication, the fundamental operation in AI processing, is accelerated through built-in hardware features in both the Armv8 and Armv9 architectures
- Technologies such as Neon, the Scalable Vector Extension (SVE), and the Scalable Matrix Extension (SME) enable CPUs to perform accelerated matrix operations on their own; a quick way to check which features a machine exposes is sketched after this list
- These capabilities span Arm’s processor families, including Cortex-A, Cortex-X, and Neoverse processors
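On Linux, recent kernels advertise these instruction set features in /proc/cpuinfo, which gives a quick way to see what a given machine supports. The sketch below is a minimal example under that assumption; the flag names (asimd for Neon, sve, sve2, sme) and whether they are reported at all depend on the kernel version and the silicon, so a missing flag on an older kernel is inconclusive.

```python
# Minimal sketch: detect Arm SIMD/matrix features on Linux via /proc/cpuinfo.
# Flag names (asimd = Neon, sve/sve2, sme) are those exposed by recent
# Linux kernels; older kernels may not report newer features such as SME.

def arm_cpu_features(path="/proc/cpuinfo"):
    flags = set()
    with open(path) as f:
        for line in f:
            if line.lower().startswith("features"):
                flags.update(line.split(":", 1)[1].split())
    return flags

if __name__ == "__main__":
    flags = arm_cpu_features()
    for name, flag in [("Neon", "asimd"), ("SVE", "sve"),
                       ("SVE2", "sve2"), ("SME", "sme")]:
        print(f"{name:5s}: {'yes' if flag in flags else 'not reported'}")
```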
Kleidi platform integration: Arm’s Kleidi technology serves as a comprehensive solution for AI acceleration on mobile and server platforms.
- KleidiAI, a library of optimized machine learning micro-kernels, leverages these Arm CPU instruction set features (Neon, SVE, and SME) for enhanced performance
- Integration with popular frameworks like PyTorch and ExecuTorch provides developers with up to 12x performance improvements (see the sketch after this list)
- The technology is openly available on GitLab, making it accessible to the developer community
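To show what that integration looks like from the developer's side, here is a minimal PyTorch inference micro-benchmark. It assumes a recent aarch64 build of PyTorch, in which matrix-heavy operators such as nn.Linear are dispatched to Arm-optimized kernels (via oneDNN and the Arm Compute Library, with KleidiAI in newer releases) automatically; the shapes and thread count are arbitrary choices for illustration.

```python
# Minimal sketch: time a matmul-heavy layer on an Arm CPU with PyTorch.
# On a recent aarch64 PyTorch build, this dispatches to Arm-optimized
# kernels (oneDNN/ACL, KleidiAI in newer releases) with no code changes.
import time
import torch

torch.set_num_threads(8)          # arbitrary; tune to the core count
layer = torch.nn.Linear(4096, 4096)
x = torch.randn(32, 4096)

with torch.inference_mode():
    for _ in range(10):           # warm-up iterations
        layer(x)
    start = time.perf_counter()
    iters = 100
    for _ in range(iters):
        layer(x)
    elapsed = time.perf_counter() - start

print(f"{iters / elapsed:.1f} inferences/sec")
```

The notable thing is what the script does not contain: no Kleidi-specific API calls, because kernel selection happens inside the framework.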
Large Language Model optimization: Arm’s collaboration with Meta has resulted in significant improvements in LLM performance on CPU-based systems.
- The Llama 3.2 model demonstrates impressive performance on Arm CPUs, achieving 29.3 tokens per second on AWS Graviton4 processors
- Smartphone implementations show a 5x improvement in prompt processing and a 3x improvement in token generation
- Models ranging from 1 billion to 90 billion parameters can run effectively on Arm CPUs, supporting both edge and cloud deployments
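As a rough illustration of measuring token throughput on an Arm CPU, the sketch below uses the llama-cpp-python bindings for llama.cpp, one of the runtimes targeted by Arm's optimization work. The model path and thread count are placeholders, and throughput varies widely with quantization, hardware, and build options; this is not the setup behind the figures quoted above.

```python
# Minimal sketch: measure token-generation throughput for a quantized
# Llama model on an Arm CPU, using the llama-cpp-python bindings.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.2-3b-q4.gguf",  # hypothetical local model file
    n_threads=8,                        # arbitrary; tune to the core count
)

start = time.perf_counter()
out = llm("Explain what a CPU does.", max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s "
      f"({generated / elapsed:.1f} tokens/sec)")
```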
Developer benefits and accessibility: Arm’s approach emphasizes performance portability and ease of implementation for developers.
- The platform enables developers to optimize once and deploy across multiple platforms without modification (a quick runtime check is sketched after this list)
- Comprehensive documentation and resources support developers in implementing AI and ML workloads on Arm CPUs
- The technology allows for efficient AI processing without relying on specialized hardware like GPUs or NPUs
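In practice, "optimize once, deploy anywhere" means scripts like the benchmarks above run unchanged on a Graviton server or an Arm laptop; the only per-machine question is what the runtime detected. A small, hedged way to inspect that from Python:

```python
# Minimal sketch: confirm the platform and PyTorch's CPU backend support.
import platform
import torch

print(platform.machine())                    # expect "aarch64" on Arm Linux
print(torch.backends.mkldnn.is_available())  # oneDNN CPU backend present?
print(torch.__config__.show())               # full build configuration
```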
Future implications: The ability to run sophisticated AI models directly on CPU hardware represents a significant shift in AI deployment strategies, potentially democratizing access to AI capabilities across a broader range of devices while reducing dependency on cloud infrastructure and specialized processors.
Here's how Arm accelerates AI workloads