AI21 CEO says transformers not right for AI agents due to error perpetuation

The rise of alternative AI architectures: AI21 CEO Ori Goshen argues that Transformer models, while popular, may not be the best choice for building efficient AI agents because of their high running costs and reliability limitations.
- Goshen believes that alternative architectures, such as Mamba and AI21’s Jamba, offer better performance and efficiency for AI agents.
- These architectures can deliver faster inference, longer context windows, and better memory efficiency than Transformer models.
- AI21 builds its foundation models on its Jamba architecture (short for Joint Attention and Mamba), which interleaves Transformer attention layers with Mamba state-space layers; a toy sketch follows this list.
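Neither Goshen nor the article gives implementation details, but the hybrid idea is easy to illustrate. The sketch below is a hypothetical, heavily simplified hybrid block (names like `SimpleSSMLayer` and `HybridBlock` are illustrative, not AI21’s code): a standard attention sublayer followed by a toy linear state-space sublayer. Production Jamba interleaves full Mamba layers, attention layers, and mixture-of-experts layers at far larger scale.

```python
# Hypothetical sketch of a hybrid attention + state-space block.
# Not AI21's implementation; layer sizes and the 1:1 attention/SSM
# ratio are illustrative only.
import torch
import torch.nn as nn

class SimpleSSMLayer(nn.Module):
    """Toy linear state-space layer: h_t = h_{t-1} A^T + B x_t, y_t = C h_t.
    Runs in O(seq_len) time with a fixed-size state per step."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_state, d_state) * 0.01)
        self.B = nn.Linear(d_model, d_state, bias=False)
        self.C = nn.Linear(d_state, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape
        h = torch.zeros(batch, self.A.shape[0], device=x.device)
        ys = []
        for t in range(seq_len):                # sequential scan over tokens
            h = h @ self.A.T + self.B(x[:, t])  # fixed-size state update
            ys.append(self.C(h))
        return torch.stack(ys, dim=1)

class HybridBlock(nn.Module):
    """One attention sublayer followed by one SSM sublayer, with residuals."""
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ssm = SimpleSSMLayer(d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)               # residual around attention
        return self.norm2(x + self.ssm(x))  # residual around SSM

x = torch.randn(2, 128, 64)                # (batch, seq_len, d_model)
print(HybridBlock()(x).shape)              # torch.Size([2, 128, 64])
```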
Challenges with Transformer models: Goshen argues that the industry’s reliance on Large Language Models (LLMs) built on the Transformer architecture has hindered the widespread adoption and production deployment of AI agents.
- Transformer models can be expensive to run because of their token-by-token generation, and those costs compound in multi-agent ecosystems where agents exchange large volumes of tokens (see the back-of-envelope comparison after this list).
- Because Transformer outputs are stochastic, early mistakes can propagate through an agent’s multi-step workflow, eroding reliability.
- Goshen suggests that this lack of reliability is the main reason many AI agents have not yet entered production environments.
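To make the cost argument concrete, here is a back-of-envelope comparison (my illustrative numbers, not figures from Goshen or AI21). A Transformer materializes an attention score matrix whose size grows quadratically with context length, while a state-space layer touches only a fixed-size state per token, so its work grows linearly.

```python
# Back-of-envelope element counts per layer. All dimensions
# (32 heads, d_state=16, d_model=4096) are assumptions chosen
# only to show quadratic-vs-linear growth.
def attention_score_elements(seq_len: int, n_heads: int = 32) -> int:
    """Entries in the seq_len x seq_len score matrix, across all heads."""
    return n_heads * seq_len * seq_len

def ssm_state_elements(seq_len: int, d_state: int = 16, d_model: int = 4096) -> int:
    """Fixed-size state touched once per token: linear in seq_len."""
    return seq_len * d_state * d_model

for n in (1_000, 10_000, 100_000):
    print(f"context {n:>7}: attention {attention_score_elements(n):.1e} "
          f"vs ssm {ssm_state_elements(n):.1e}")
# At 1k tokens the two are comparable; at 100k tokens attention's
# score matrix is ~50x larger, and the gap keeps widening.
```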
Growing popularity of enterprise AI agents: Despite challenges, AI agents are emerging as a significant trend in enterprise AI, with several major companies launching agent platforms and integrations.
- ServiceNow has updated its Now Assist AI platform to include a library of AI agents for customers.
- Salesforce introduced Agentforce, a collection of AI agents for various tasks.
- Slack now allows users to integrate agents from multiple providers, including Salesforce, Cohere, Workday, Asana, and Adobe.
The potential of AI agents: Goshen envisions a future where AI agents can offer more sophisticated capabilities beyond simple chatbot-like interactions.
- He believes that “real intelligence” lies in connecting and retrieving information from various sources.
- AI21 is currently developing its own offerings in the AI agent space, aiming to leverage alternative architectures for improved performance.
Alternative architectures gaining traction: While Transformer models remain dominant, other architectures like Mamba are attracting attention from AI developers and researchers.
- Mamba-based models use an input-dependent (“selective”) mechanism to prioritize and weight inputs, keep memory usage flat via a fixed-size state, and make efficient use of GPU processing power (a simplified sketch follows this list).
- Open-source AI developers have begun releasing Mamba-based models, including Mistral’s Codestral Mamba 7B and TII’s Falcon Mamba 7B.
- However, Transformer architecture remains the default choice for most popular foundation models, including OpenAI’s GPT series.
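Loosely, Mamba’s “selective” mechanism means the state update depends on the current input, so the model can decide what to remember and what to forget. The gated recurrence below is a deliberate simplification to show that idea (real Mamba discretizes continuous-time dynamics and uses a hardware-aware parallel scan, not a Python loop; `ToySelectiveScan` is a made-up name).

```python
# Toy input-dependent ("selective") recurrence, not real Mamba.
import torch
import torch.nn as nn

class ToySelectiveScan(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)  # input-dependent forget gate
        self.inp = nn.Linear(d_model, d_model)   # input-dependent write path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); state h stays the same size
        # no matter how long the context grows.
        h = torch.zeros_like(x[:, 0])
        ys = []
        for t in range(x.shape[1]):
            f = torch.sigmoid(self.gate(x[:, t]))  # how much old state to keep
            i = torch.tanh(self.inp(x[:, t]))      # what this token contributes
            h = f * h + (1 - f) * i                # weighted, selective update
            ys.append(h)
        return torch.stack(ys, dim=1)

x = torch.randn(2, 64, 32)
print(ToySelectiveScan(32)(x).shape)  # torch.Size([2, 64, 32])
```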
Caution for enterprises: Goshen advises enterprises to approach AI adoption with care, emphasizing the importance of reliability over flashy demonstrations.
- While AI can be useful for research purposes, Goshen believes it’s not yet ready to inform critical business decisions.
- He warns against being swayed by charismatic demos that promise to solve numerous problems.
Looking ahead: The future of AI agents and enterprise AI solutions may depend on finding the right balance between model architectures and practical applications.
- As alternative architectures like Mamba and Jamba continue to evolve, they may offer new possibilities for developing more efficient and reliable AI agents.
- The industry will likely see ongoing competition between different architectural approaches, driving innovation in AI model development.
- Enterprises will need to carefully evaluate the trade-offs between performance, cost, and reliability when choosing AI solutions for their specific needs.