AMD releases AMD-135M, its first open-source small language model

AMD’s Foray into Small Language Models: AMD has unveiled its first small language model (SLM), AMD-135M, marking a significant step in the company’s artificial intelligence initiatives.

  • AMD-135M is part of the Llama family of models and was trained from scratch on AMD Instinct™ MI250 accelerators.
  • The model comes in two variants: AMD-Llama-135M for general use and AMD-Llama-135M-code, which is fine-tuned for code-related tasks.
  • This release aligns with AMD’s commitment to an open approach to AI, aiming to foster inclusive, ethical, and innovative technological progress.

Training Process and Specifications: The development of AMD-135M involved substantial computational resources and time investment to create a capable small language model.

  • AMD-Llama-135M was trained on 670 billion tokens of general data over six days using four MI250 nodes.
  • The code-specific variant, AMD-Llama-135M-code, underwent additional fine-tuning with 20 billion tokens of code data, taking four days on the same hardware.
  • AMD has open-sourced the training code, dataset, and weights, enabling developers to reproduce the model and contribute to future SLM and LLM development.
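
The token counts and durations above imply a rough training throughput. The sketch below is back-of-the-envelope arithmetic only: it assumes the quoted durations are continuous wall-clock time and that each token is counted once, neither of which the article confirms. The helper name `tokens_per_second` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
NUM_NODES = 4  # "four MI250 nodes", as stated in the article


def tokens_per_second(total_tokens: float, days: float) -> float:
    """Average tokens processed per second over the whole run."""
    return total_tokens / (days * SECONDS_PER_DAY)


# Figures from the article: 670B tokens in six days, 20B tokens in four days.
pretrain_rate = tokens_per_second(670e9, 6)
finetune_rate = tokens_per_second(20e9, 4)

print(f"pretraining: ~{pretrain_rate:,.0f} tokens/s in total")
print(f"pretraining: ~{pretrain_rate / NUM_NODES:,.0f} tokens/s per node")
print(f"code fine-tuning: ~{finetune_rate:,.0f} tokens/s in total")
```

The two rates are not directly comparable, because the article does not say whether the fine-tuning data was processed more than once or with different settings.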

Innovative Optimization Techniques: AMD uses AMD-135M as a draft model for speculative decoding, a technique that speeds up inference on a larger model by addressing a key limitation of traditional autoregressive generation.

  • Speculative decoding uses a small draft model to generate candidate tokens, which are then verified by the larger target model.
  • Because the target model can confirm several candidate tokens in a single forward pass, this approach reduces memory access and improves inference speed.
  • The technique aims to overcome the limitations of traditional autoregressive approaches in large language models, which can only generate one token per forward pass.
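
The draft-then-verify loop described above can be sketched in a few lines of Python. This is a minimal toy illustration of the greedy variant of speculative decoding, not AMD's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the large and small models, and tokens are plain integers. In a real transformer the verification step is one batched forward pass over all drafted positions; here it is simulated with repeated calls and counted as a single pass.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(seq: List[Token]) -> Token:
    """Toy stand-in for the large target model (greedy next token)."""
    return (3 * seq[-1] + len(seq)) % 17


def draft_next(seq: List[Token]) -> Token:
    """Toy stand-in for the small draft model: agrees with the target
    except when the sequence length is a multiple of 4."""
    guess = target_next(seq)
    return guess if len(seq) % 4 else (guess + 1) % 17


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Plain autoregressive decoding: one token per model call."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_fn(seq))
    return seq


def speculative_decode(
    target: NextTokenFn, draft: NextTokenFn,
    prompt: List[Token], n_new: int, k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (sequence, target passes used)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        # 1. The draft model proposes k candidate tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target model verifies all candidates (one pass in practice).
        target_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
        else:
            # Every candidate matched, so the same pass yields one extra token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], target_passes


prompt = [1, 2]
baseline = greedy_decode(target_next, prompt, 12)
output, passes = speculative_decode(target_next, draft_next, prompt, 12)
print(output == baseline, passes)
```

With greedy verification the output is identical to what the target model would produce on its own; the saving comes from needing fewer target passes than generated tokens. Implementations that sample rather than decode greedily use a probabilistic acceptance rule instead of the exact-match check shown here.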

Performance Improvements: Initial tests of AMD-135M demonstrate notable performance gains when used in conjunction with larger models.

  • AMD-Llama-135M-code was used as a draft model for CodeLlama-7b to test inference performance with and without speculative decoding.
  • Tests were conducted on both the MI250 accelerator for data centers and the Ryzen™ AI processor (with NPU) for AI PCs.
  • Significant speedups were observed on the Instinct MI250 accelerator, Ryzen AI CPU, and Ryzen AI NPU compared to inference without speculative decoding.

Implications for AI Development: The release of AMD-135M represents more than just a new model; it signifies AMD’s growing role in the AI ecosystem and its potential impact on future developments.

  • The model establishes an end-to-end workflow for both training and inferencing on select AMD platforms.
  • By open-sourcing the implementation, AMD is encouraging innovation and collaboration within the AI community.
  • This approach could lead to more rapid advancements in AI technology and potentially more diverse applications of small language models.

Future Outlook and Resources: AMD’s release of AMD-135M is accompanied by a suite of resources and opportunities for developers and researchers to engage with the technology.

  • A full technical blog post is available for those seeking more in-depth information about AMD-135M.
  • The code is available in AMD’s GitHub repository, and the model files are available through the model card on Hugging Face.
  • Developers can apply for access to Instinct accelerator cards on the AMD Developer Cloud, facilitating further experimentation and development.

Analyzing Deeper: AMD’s entry into the small language model space is promising, but its long-term impact remains to be seen. AMD-135M could challenge the dominance of larger tech companies in the AI model landscape, though that depends on the model’s adoption rate and its performance in real-world applications. As AI technology continues to evolve rapidly, AMD will also need to sustain its pace of innovation to stay competitive.
