Gradient and Crusoe collaborated to create an open-source LLM with a 1-million-token context window, a development that could reshuffle the AI landscape and unlock new applications.
Key takeaways: Gradient and Crusoe have extended the context window of Meta's Llama 3 models to 1 million tokens, a significant milestone in the race to build open-source models with long context windows:
- Most LLMs with very long context windows, such as Anthropic's Claude, OpenAI's GPT-4, and Google's Gemini, are private (closed) models.
- Open-source models with long context windows could reshuffle the LLM market and enable applications not possible with private models.
Enterprise need for open models: Gradient works with enterprise customers who want LLMs integrated into their workflows but run into context-length limits and data-privacy restrictions:
- Extending coding copilots so they can generate entire code modules requires models that can reference whole codebases, which is hard to do with limited context windows.
- Many companies have restrictions on sending data to third parties, making private models like Gemini or Claude unsuitable.
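To get a feel for whether a codebase fits in a 1-million-token window, a rough estimate is often enough. The sketch below assumes about 4 characters per token, a common rule of thumb rather than a property of Llama 3's tokenizer, and takes the window as 2^20 = 1,048,576 tokens for illustration. The function names, the output reserve, and the list of file suffixes are hypothetical and not part of any Gradient tool.

```python
import math
from pathlib import Path

# Assumption: ~4 characters per token. Real ratios depend on the tokenizer and the language.
CHARS_PER_TOKEN = 4
# Assumption: "1 million tokens" taken as 2**20 for illustration.
CONTEXT_WINDOW = 1_048_576


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def estimate_repo_tokens(root: str,
                         suffixes: tuple[str, ...] = (".py", ".js", ".ts", ".java", ".go")) -> int:
    """Sum estimated tokens over source files under `root` with the given suffixes."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(num_tokens: int, reserve_for_output: int = 8_192,
                    window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus a reserved output budget (hypothetical 8,192) fits the window."""
    return num_tokens + reserve_for_output <= window


if __name__ == "__main__":
    # Example: about 3 MB of source text is roughly 750,000 tokens under this heuristic.
    tokens = estimate_tokens("x" * 3_000_000)
    print(tokens, fits_in_context(tokens))
```

Calling `estimate_repo_tokens("path/to/repo")` on a real checkout gives a first-order answer; a real tokenizer would be needed before relying on the number.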
Leveraging open research: Gradient relied heavily on open research from universities and institutes worldwide to develop its long-context models:
- They used Meta’s open model Llama 3 as the base, along with techniques from Berkeley AI Research, code from a Singapore research institute, and mathematical formulas from a Shanghai AI lab.
- Evaluation benchmarks from Nvidia helped the team compare its models against other long-context LLMs.
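The article does not name the mathematical formulas Gradient took from the Shanghai lab. One widely used family of context-extension techniques for Llama-style models adjusts the base frequency (theta) of rotary position embeddings (RoPE) so that positions stay distinguishable over longer sequences. The sketch below shows NTK-aware base scaling purely as an illustration of that idea, not as Gradient's method; the Llama 3 values (base 500,000, head dimension 128, 8,192-token original context) are taken from Meta's published configuration, and the resulting base is not Gradient's actual setting.

```python
import math


def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """RoPE inverse frequencies base ** (-2i / d), one per dimension pair."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(base: float, head_dim: int) -> float:
    """Positions needed for the slowest-rotating pair to complete one full rotation."""
    return 2 * math.pi / min(rope_inv_freq(base, head_dim))


def ntk_scaled_base(base: float, head_dim: int, scale: float) -> float:
    """NTK-aware scaling: raise the base so the slowest pair stretches by `scale`
    while the fastest pair (frequency 1) is left unchanged."""
    return base * scale ** (head_dim / (head_dim - 2))


if __name__ == "__main__":
    BASE, HEAD_DIM = 500_000.0, 128   # assumed Llama 3 8B config values
    scale = 1_048_576 / 8_192         # 128x the original context length
    new_base = ntk_scaled_base(BASE, HEAD_DIM, scale)
    ratio = longest_wavelength(new_base, HEAD_DIM) / longest_wavelength(BASE, HEAD_DIM)
    print(f"new base ~ {new_base:.3g}, longest wavelength grows {ratio:.1f}x")
```

The design point is that this scaling stretches the low frequencies, which carry long-range position, by exactly the scale factor while leaving the highest frequency, which carries local order, untouched; plain position interpolation instead compresses every frequency uniformly.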
Addressing compute challenges: Compute resources are a major bottleneck in LLM research, but Crusoe’s purpose-built AI cloud helped Gradient build and explore models cost-efficiently:
- Crusoe provided a customized Nvidia L40S cluster tailored to Gradient's specific needs, which considerably reduced the cost of training the models.
- Close collaboration and open communication between Crusoe and Gradient enabled tailored compute offerings that would be harder to obtain from other cloud providers.
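A quick calculation shows why memory and compute become the bottleneck at this scale. Assuming Llama 3 8B's published architecture (32 layers, 8 key/value heads under grouped-query attention, head dimension 128) and 16-bit values, the sketch below computes the key/value cache a single 1,048,576-token sequence needs at inference time. Training needs additional memory for activations, gradients, and optimizer state, which this sketch ignores; the 48 GB figure is the L40S's memory capacity.

```python
import math


def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes for cached keys and values (the leading 2) across all layers and KV heads."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


def gpus_needed(total_bytes: int, gpu_memory_bytes: float) -> int:
    """Minimum GPU count just to hold `total_bytes`, ignoring weights and overhead."""
    return math.ceil(total_bytes / gpu_memory_bytes)


if __name__ == "__main__":
    # Assumed Llama 3 8B config: 32 layers, 8 KV heads, head_dim 128, bf16 (2 bytes).
    per_token = kv_cache_bytes(1, 32, 8, 128)
    total = kv_cache_bytes(1_048_576, 32, 8, 128)
    print(f"{per_token / 1024:.0f} KiB per token, {total / 2**30:.0f} GiB for 1,048,576 tokens")
    print("L40S GPUs (48 GB each) needed for the cache alone:", gpus_needed(total, 48e9))
```

Under these assumptions each token costs 128 KiB, so a full window needs 128 GiB of cache, more than two L40S cards can hold, before counting the roughly 16 GB of bf16 model weights.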
Evaluating the models: Gradient used various benchmarks to assess the performance of their long-context models:
- The “needle in a haystack” test showed near-perfect performance up to a context length of around 2 million tokens, comparable to Google’s Gemini 1.5 Pro.
- The team also considered harder variants of the test, such as multiple needles or adversarial needles.
- The models were evaluated on Nvidia’s RULER benchmark, which includes 13 tasks for evaluating long-context LLMs.
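The needle-in-a-haystack idea is simple to sketch: bury a known fact (the needle) at a chosen depth inside long filler text (the haystack), ask the model to retrieve it, and check the answer. The sketch below builds such a prompt with one or more needles, which also covers the multiple-needle variant, and scores an answer by substring match. It measures the haystack in sentences rather than tokens and leaves out the model call; it is a simplified stand-in, not the harness Gradient used or Nvidia's RULER implementation, and all names are hypothetical.

```python
def insert_needles(haystack: list[str], needles: list[str], depths: list[float]) -> list[str]:
    """Insert each needle at a fractional depth of the original haystack (0.0 = start, 1.0 = end)."""
    if len(needles) != len(depths):
        raise ValueError("need exactly one depth per needle")
    out = list(haystack)
    # Insert from deepest to shallowest so each target index still refers to the original haystack.
    for needle, depth in sorted(zip(needles, depths), key=lambda p: p[1], reverse=True):
        out.insert(round(depth * len(haystack)), needle)
    return out


def make_prompt(sentences: list[str], question: str) -> str:
    """Join the haystack into one context block followed by the retrieval question."""
    return " ".join(sentences) + f"\n\nQuestion: {question}\nAnswer:"


def score(answer: str, expected: list[str]) -> float:
    """Fraction of expected facts found (case-insensitively) in the model's answer."""
    answer = answer.lower()
    return sum(fact.lower() in answer for fact in expected) / len(expected)


if __name__ == "__main__":
    haystack = ["The sky was clear and the grass was green."] * 5_000
    needles = ["The secret code is 7421.", "The launch is scheduled for Tuesday."]
    prompt = make_prompt(insert_needles(haystack, needles, [0.25, 0.9]),
                         "What is the secret code, and when is the launch?")
    # A real run would send `prompt` to the model under test; here a canned answer is scored.
    print(score("The code is 7421 and the launch is on Tuesday.", ["7421", "Tuesday"]))
```

Sweeping the depth from 0.0 to 1.0 and the haystack length up to the window size gives the usual grid of retrieval accuracy; one way to build an adversarial variant is to add distractor sentences that closely resemble the needle.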
Potential enterprise applications: Long-context open models could make it easier for companies and developers to build LLM-based applications:
- Agentic systems can do more with fewer calls by processing more information with each request.
- Complex data processing pipelines for tasks like style transfer could be simplified.
- The need for retrieval-augmented generation (RAG) could be reduced.
- It becomes easier to prototype LLM applications and demonstrate their possibilities to enterprises.
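One way to picture the reduced need for RAG: when everything fits in the window, send it all, and fall back to retrieval only when it does not. The sketch below encodes that rule with the same rough 4-characters-per-token assumption and uses naive keyword overlap as a stand-in for a real retriever such as embedding search. The function names, the output reserve, and the ranking method are illustrative assumptions.

```python
import math
import re


def estimate_tokens(text: str) -> int:
    """Assumption: ~4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def words(text: str) -> set[str]:
    """Lowercased word set used for naive keyword matching."""
    return set(re.findall(r"\w+", text.lower()))


def build_context(documents: list[str], query: str, window: int,
                  reserve: int = 8_192, top_k: int = 5) -> tuple[str, list[str]]:
    """Return ("full", docs) if every document fits the budget, else ("retrieval", subset)."""
    budget = window - reserve
    if sum(estimate_tokens(d) for d in documents) <= budget:
        return "full", list(documents)
    # Fallback: rank by keyword overlap with the query (a stand-in for embedding search).
    query_words = words(query)
    ranked = sorted(documents, key=lambda d: len(query_words & words(d)), reverse=True)
    selected, used = [], 0
    for doc in ranked[:top_k]:
        cost = estimate_tokens(doc)
        if used + cost <= budget:
            selected.append(doc)
            used += cost
    return "retrieval", selected


if __name__ == "__main__":
    docs = ["apples are red", "bananas are yellow", "cherries are red"]
    print(build_context(docs, "red fruit", window=1_048_576))
    print(build_context(docs, "red fruit", window=12, reserve=2))
```

With a 1-million-token window the first branch covers far more workloads, which is the sense in which long context can reduce, though not eliminate, the need for retrieval; cost and latency still grow with prompt length.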
Analyzing deeper: While the creation of open-source, long-context LLMs is a significant milestone, it remains to be seen how they will compare to private models in terms of performance, safety, and scalability. Additionally, the compute resources required to train and deploy these models at scale may still be a barrier for many organizations. Nonetheless, the collaboration between Gradient and Crusoe showcases the potential for open research and purpose-built AI infrastructure to drive innovation in the rapidly evolving field of large language models.
How Gradient created an open LLM with a million-token context window