Apple researchers have developed a new compression technique for large language models that could significantly accelerate AI deployment on memory-constrained devices. SeedLM is a novel approach to model compression that maintains performance while reducing memory requirements, potentially enabling more efficient AI systems across a range of hardware platforms. Its data-free design and ability to retain accuracy even at high compression rates could help address one of the most significant barriers to widespread LLM adoption.
The big picture: Apple researchers have introduced SeedLM, a post-training compression method that efficiently encodes model weights using seeds from a pseudo-random generator, addressing the high runtime costs of large language models.
- SeedLM uses Linear Feedback Shift Registers (LFSRs) during inference to generate random matrices that, when combined with compressed coefficients, can reconstruct weight blocks (a toy LFSR appears in the sketch after these bullets).
- Unlike competing compression techniques, SeedLM operates without requiring calibration data, making it more versatile across different tasks and applications.
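For readers unfamiliar with LFSRs, here is a minimal Python sketch of a Fibonacci LFSR bit generator. The 16-bit width and tap positions are standard textbook values, not the register configuration from the SeedLM paper, which the article does not specify.

```python
def lfsr_bits(seed: int, taps: tuple = (16, 14, 13, 11), width: int = 16):
    """Yield pseudo-random bits from a Fibonacci LFSR.

    The taps and width give a maximal-length 16-bit register; they are
    illustrative textbook values, not the parameters used in SeedLM.
    """
    mask = (1 << width) - 1
    state = seed & mask
    assert state != 0, "an all-zero state locks the LFSR"
    while True:
        # XOR the tapped bits to form the feedback bit.
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        # Shift left and feed the new bit in on the right.
        state = ((state << 1) | fb) & mask
        yield fb
```

A seed of a few bytes thus expands deterministically into an arbitrarily long pseudo-random bit stream, which is what lets SeedLM store seeds instead of the matrices they generate.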
How it works: The technique trades compute for memory by generating weight matrices on-the-fly during inference rather than storing and retrieving them from memory.
- For each block of weights in the model, researchers find a seed that, fed into an LFSR, efficiently generates a random matrix at runtime.
- These generated matrices are linearly combined with compressed coefficients to reconstruct the original weight blocks, reducing both storage requirements and memory bandwidth needs (a minimal encode/decode sketch follows below).
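To make the compute-for-memory trade concrete, here is a sketch of both directions under simplifying assumptions: bits from the lfsr_bits generator above are mapped to a +/-1 matrix U, a block is decoded as U @ c from a stored (seed, coefficients) pair, and the encode side picks the best seed by brute-force least-squares search. The bit-to-value mapping, block size, coefficient count k, and search budget are all illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np

def lfsr_matrix(seed: int, rows: int, cols: int) -> np.ndarray:
    """Expand a seed into a rows x cols matrix of +/-1 entries.

    Reuses lfsr_bits from the sketch above; mapping raw bits to {-1, +1}
    is an illustrative choice, not necessarily SeedLM's encoding.
    """
    gen = lfsr_bits(seed)
    bits = np.array([next(gen) for _ in range(rows * cols)], dtype=np.float32)
    return (2.0 * bits - 1.0).reshape(rows, cols)

def reconstruct_block(seed: int, coeffs: np.ndarray, block_len: int) -> np.ndarray:
    """Decode: regenerate the random basis from the seed and combine it
    linearly with the stored coefficients. Only (seed, coeffs) are stored."""
    U = lfsr_matrix(seed, block_len, len(coeffs))
    return U @ coeffs

def compress_block(w: np.ndarray, num_seeds: int = 256, k: int = 4):
    """Encode: search candidate seeds, fit coefficients by least squares,
    and keep the pair with the smallest reconstruction error."""
    best_seed, best_coeffs, best_err = None, None, np.inf
    for seed in range(1, num_seeds + 1):
        U = lfsr_matrix(seed, len(w), k)
        c, *_ = np.linalg.lstsq(U, w, rcond=None)
        err = float(np.linalg.norm(U @ c - w))
        if err < best_err:
            best_seed, best_coeffs, best_err = seed, c, err
    return best_seed, best_coeffs
```

At inference time only reconstruct_block runs, so the weight block itself never has to be fetched from memory; the FLOPs spent regenerating U are the compute half of the trade described above.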
Key results: Tests with the particularly challenging Llama 3 70B model demonstrate that SeedLM maintains performance comparable to FP16 baselines while achieving significant compression.
- The method’s zero-shot accuracy retention at 4-bit and 3-bit compression matches or exceeds that of state-of-the-art compression methods.
- FPGA-based testing shows that 4-bit SeedLM approaches a 4x speed-up over FP16 baselines as model size increases.
Why this matters: SeedLM addresses one of the fundamental bottlenecks in AI deployment by focusing on the memory bandwidth limitations that often constrain inference performance.
- By reducing memory access requirements, the technique could enable more efficient AI systems on resource-constrained devices like mobile phones and edge computing platforms.
- The data-free approach eliminates the need for task-specific calibration data, potentially making LLM deployment more practical across diverse applications.
In plain English: Apple researchers have created a clever way to shrink massive AI models without sacrificing performance by using mathematical shortcuts to generate parts of the model on-demand rather than storing everything in memory.