AMD releases AMD-135M, its first open-source small language model

AMD’s Foray into Small Language Models: AMD has unveiled its first small language model (SLM), AMD-135M, marking a significant step in the company’s artificial intelligence initiatives.

  • AMD-135M is part of the Llama family of models and was trained from scratch on AMD Instinct™ MI250 accelerators.
  • The model comes in two variants: AMD-Llama-135M for general use and AMD-Llama-135M-code, fine-tuned for code-related tasks (a minimal loading sketch follows this list).
  • This release aligns with AMD’s commitment to an open approach to AI, aiming to foster inclusive, ethical, and innovative technological progress.
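
For readers who want to try the two variants, here is a minimal loading sketch using Hugging Face transformers. The repository IDs "amd/AMD-Llama-135M" and "amd/AMD-Llama-135M-code" are assumptions based on the Hugging Face model card referenced later in this article.

```python
# Minimal sketch: load one of the published variants and generate text.
# The repository ID below is an assumption, not confirmed by this article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-Llama-135M"  # or "amd/AMD-Llama-135M-code" for the code variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Small language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```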

Training Process and Specifications: The development of AMD-135M involved substantial computational resources and time investment to create a capable small language model.

  • AMD-Llama-135M was trained on 670 billion tokens of general data over six days using four MI250 nodes (a back-of-envelope throughput calculation follows this list).
  • The code-specific variant, AMD-Llama-135M-code, underwent additional fine-tuning with 20 billion tokens of code data, taking four days on the same hardware.
  • AMD has open-sourced the training code, dataset, and weights, enabling developers to reproduce the model and contribute to future SLM and LLM development.
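
As a sanity check on those figures, the implied training throughput can be estimated from the numbers quoted above alone; this is a rough back-of-envelope calculation, not an AMD-published metric.

```python
# Implied throughput from the stated figures: 670B tokens, 6 days, 4 MI250 nodes.
tokens = 670e9
seconds = 6 * 24 * 3600
nodes = 4

tokens_per_second = tokens / seconds                     # ~1.29 million tokens/s overall
tokens_per_second_per_node = tokens_per_second / nodes   # ~323,000 tokens/s per node
print(f"{tokens_per_second:,.0f} tokens/s total, "
      f"{tokens_per_second_per_node:,.0f} tokens/s per node")
```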

Innovative Optimization Techniques: AMD has implemented speculative decoding to enhance the performance of its small language model, addressing key limitations in traditional language model inference.

  • Speculative decoding uses a small draft model to generate candidate tokens, which are then verified by the larger target model.
  • Because the target model can verify all of the drafted tokens in a single forward pass, the approach yields multiple tokens per pass, reducing memory-access overhead and improving inference speed (a simplified sketch follows this list).
  • This addresses a core limitation of conventional autoregressive decoding in large language models, which produces only one token per forward pass.
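
The draft-and-verify loop can be illustrated with a short sketch. This is a simplified greedy version of the idea (production implementations typically use probabilistic acceptance of draft tokens), not AMD's implementation; it assumes `draft` and `target` are two Hugging Face causal language models that share a tokenizer.

```python
# Simplified greedy speculative decoding: the draft model proposes k tokens,
# the target verifies them all in one forward pass, and the agreeing prefix is kept.
import torch

@torch.no_grad()
def speculative_step(draft, target, input_ids, k=4):
    # 1. The cheap draft model proposes k candidate tokens autoregressively.
    candidate = input_ids
    for _ in range(k):
        logits = draft(candidate).logits[:, -1, :]
        candidate = torch.cat([candidate, logits.argmax(-1, keepdim=True)], dim=-1)

    # 2. The target model scores all k candidates in a single forward pass.
    target_logits = target(candidate).logits
    prompt_len = input_ids.shape[1]
    preds = target_logits[:, prompt_len - 1:-1, :].argmax(-1)  # target's greedy choices
    drafted = candidate[:, prompt_len:]

    # 3. Keep the longest prefix where draft and target agree, then append the
    #    target's own next token, so every step yields at least one new token.
    n_accept = 0
    for i in range(k):
        if preds[0, i] != drafted[0, i]:
            break
        n_accept += 1
    if n_accept < k:
        correction = preds[:, n_accept:n_accept + 1]
    else:
        correction = target_logits[:, -1, :].argmax(-1, keepdim=True)
    return torch.cat([input_ids, drafted[:, :n_accept], correction], dim=-1)
```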

Performance Improvements: Initial tests of AMD-135M demonstrate notable performance gains when used in conjunction with larger models.

  • AMD-Llama-135M-code was used as a draft model for CodeLlama-7b to compare inference performance with and without speculative decoding (a usage sketch follows this list).
  • Tests were conducted on both the MI250 accelerator for data centers and the Ryzen™ AI processor (with NPU) for AI PCs.
  • Significant speedups were observed on the Instinct MI250 accelerator, Ryzen AI CPU, and Ryzen AI NPU compared to inference without speculative decoding.
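
The same pairing can be sketched with the off-the-shelf assisted-generation path in Hugging Face transformers (the assistant_model argument to generate), which implements draft-and-verify decoding. The checkpoint IDs are assumptions, and the draft and target must share a compatible tokenizer for this to work; this illustrates the setup rather than reproducing AMD's benchmark code.

```python
# Hypothetical draft/target pairing via transformers assisted generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "codellama/CodeLlama-7b-hf"  # assumed target checkpoint
draft_id = "amd/AMD-Llama-135M-code"     # assumed draft checkpoint

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# Baseline: ordinary autoregressive decoding, one token per forward pass.
baseline = target.generate(**inputs, max_new_tokens=128)

# Speculative decoding: the 135M model drafts, the 7B model verifies.
assisted = target.generate(**inputs, max_new_tokens=128, assistant_model=draft)
print(tokenizer.decode(assisted[0], skip_special_tokens=True))
```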

Implications for AI Development: The release of AMD-135M represents more than just a new model; it signifies AMD’s growing role in the AI ecosystem and its potential impact on future developments.

  • The model establishes an end-to-end workflow for both training and inferencing on select AMD platforms.
  • By open-sourcing the implementation, AMD is encouraging innovation and collaboration within the AI community.
  • This approach could lead to more rapid advancements in AI technology and potentially more diverse applications of small language models.

Future Outlook and Resources: AMD’s release of AMD-135M is accompanied by a suite of resources and opportunities for developers and researchers to engage with the technology.

  • A full technical blog post is available for those seeking more in-depth information about AMD-135M.
  • AMD has provided access to the code through its GitHub repository and to the model files via a Hugging Face model card.
  • Developers can apply for access to Instinct accelerator cards on the AMD Developer Cloud, facilitating further experimentation and development.

Analyzing Deeper: While AMD’s entry into the small language model space is promising, its long-term impact remains to be seen. If AMD-135M is widely adopted and performs well in real-world applications, it could challenge the dominance of larger tech companies in the AI model landscape. As AI technology continues to evolve rapidly, AMD will also need to maintain a consistent pace of innovation to stay competitive in this fast-moving field.

