IBM wants to teach AI the language of your business

IBM’s David Cox champions open innovation in enterprise generative AI, emphasizing transparency, collaboration, and the integration of proprietary business data into AI models.
Nuanced view of openness in AI: Cox challenges the notion that openness is a simple binary, pointing to a growing ecosystem of open models from varied sources, including tech giants, universities, and nation-states:
- He raises concerns about the quality of openness in many large language models (LLMs), noting that some provide only a “bag of numbers” without clear information on how they were produced, making reproducibility difficult or impossible.
- Cox outlines key characteristics of successful open-source projects, such as frequent updates, structured release cycles, regular security fixes, and active community contributions, and argues that many current open LLMs lack these properties.
Integrating enterprise data into LLMs: Cox offers a novel perspective on LLMs, treating them primarily as data representations rather than just conversational tools, and frames the mission as representing enterprise data within foundation models:
- He points out a significant gap in current LLMs: the proprietary “secret sauce” of enterprises remains largely unrepresented, limiting the potential value of these models for businesses.
- To address this, Cox outlines a three-step approach for enterprises: finding an open, trusted base model; creating a new representation of business data; and deploying, scaling, and creating value.
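A minimal sketch, assuming hypothetical model IDs, paths, and endpoints, of how those three steps could be structured in code. Nothing here is an IBM or InstructLab API; every name is an illustrative placeholder.

```python
# Illustrative sketch of the three-step adoption path described above.
# Model IDs, paths, and URLs are placeholders, not real services.

from dataclasses import dataclass
from typing import Optional


@dataclass
class EnterpriseModel:
    base_model_id: str                    # step 1: the open, trusted base model
    adapter_path: Optional[str] = None    # step 2: where the tuned representation lives
    endpoint: Optional[str] = None        # step 3: where the deployed model is served


def select_base_model() -> EnterpriseModel:
    # Step 1: pick an open base model with clear licensing and provenance.
    return EnterpriseModel(base_model_id="example-org/open-base-7b")


def represent_business_data(model: EnterpriseModel, corpus_dir: str) -> EnterpriseModel:
    # Step 2: encode proprietary knowledge into the model, for example by
    # tuning an adapter on curated internal documents and Q&A pairs in corpus_dir.
    model.adapter_path = f"./adapters/{model.base_model_id.split('/')[-1]}-enterprise"
    return model


def deploy(model: EnterpriseModel) -> EnterpriseModel:
    # Step 3: serve the tuned model behind an internal endpoint and measure
    # its value against concrete business tasks.
    model.endpoint = "https://ai.internal.example.com/v1/chat"
    return model


if __name__ == "__main__":
    tuned = represent_business_data(select_base_model(), corpus_dir="./internal_docs")
    print(deploy(tuned))
```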
InstructLab as a practical implementation of enterprise AI adoption: Cox introduces InstructLab, a collaborative project between IBM and Red Hat that brings his vision of integrating enterprise data with open-source LLMs to life:
- InstructLab addresses the challenge of incorporating proprietary enterprise knowledge into AI models by offering a “genuinely open-source contribution model for LLMs.”
- The project’s methodology revolves around a taxonomy of world knowledge and skills, enabling users to precisely target areas for model enhancement and facilitating the integration of enterprise-specific expertise (a sketch of such a taxonomy entry follows this list).
- InstructLab’s use of a “teacher” model to generate synthetic training data allows for the integration of proprietary data while maintaining model performance and adding enterprise-specific capabilities.
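To make the taxonomy idea concrete, here is a hedged sketch of writing a seed question-and-answer file into a knowledge branch of a taxonomy tree. The directory layout and field names echo the general shape of InstructLab’s public taxonomy (qna.yaml files of seed examples organized by topic), but the exact schema, paths, and example content below are assumptions for illustration, not the project’s authoritative format.

```python
# Hedged illustration of a taxonomy contribution: a small qna.yaml of
# seed Q&A pairs placed under a topic-specific knowledge branch.
# Schema, paths, and values are illustrative assumptions.

from pathlib import Path

QNA_YAML = """\
version: 3
created_by: example-contributor          # illustrative author handle
seed_examples:
  - question: What does our premium support tier include?
    answer: 24/7 coverage, a named engineer, and a 1-hour response SLA.
  - question: Which regions does the premium tier cover?
    answer: North America, EMEA, and APAC.
"""


def write_taxonomy_entry(taxonomy_root: str) -> Path:
    # Place the entry under a domain-specific branch of the taxonomy tree
    # so the targeted area of the model's knowledge is explicit.
    entry_dir = Path(taxonomy_root) / "knowledge" / "acme_support" / "premium_tier"
    entry_dir.mkdir(parents=True, exist_ok=True)
    qna_path = entry_dir / "qna.yaml"
    qna_path.write_text(QNA_YAML)
    return qna_path


if __name__ == "__main__":
    print(write_taxonomy_entry("./taxonomy"))
```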
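The teacher-model step can be sketched in a similarly hedged way. The snippet below assumes the teacher model is served behind an OpenAI-compatible endpoint (a common way to host open models locally); the endpoint URL, model name, and prompt are illustrative assumptions, not InstructLab’s actual generation pipeline.

```python
# Hedged sketch of teacher-driven synthetic data generation: expand one
# curated seed Q&A pair into several synthetic variants for tuning.
# Assumes an OpenAI-compatible server; URL, model name, and prompt are
# illustrative placeholders.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

SEED_EXAMPLE = {
    "question": "What does our premium support tier include?",
    "answer": "24/7 coverage, a named engineer, and a 1-hour response SLA.",
}

PROMPT_TEMPLATE = (
    "You are generating training data. Given this example Q&A pair:\n"
    "Q: {question}\nA: {answer}\n"
    "Write 3 new, factually consistent question/answer pairs on the same topic, "
    "one pair per line, formatted as 'Q: ... | A: ...'."
)


def generate_synthetic_pairs(seed: dict, teacher_model: str = "teacher-model") -> list[str]:
    # Ask the teacher model to expand a single seed example into variants.
    response = client.chat.completions.create(
        model=teacher_model,
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(**seed)}],
    )
    text = response.choices[0].message.content or ""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for pair in generate_synthetic_pairs(SEED_EXAMPLE):
        print(pair)
```

In a real pipeline, the generated pairs would be filtered and validated before being used to tune the base model, which is how performance can be maintained while enterprise-specific capabilities are added.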
Broader implications for the future of enterprise AI: Cox’s insights and IBM’s InstructLab point to a shift in enterprise AI adoption, moving from generic, off-the-shelf models to tailored solutions that reflect each company’s unique expertise. As this technology matures, the competitive edge may well belong to those who can most effectively turn their institutional knowledge into AI-powered insights, suggesting that the next chapter of AI is not just about smarter machines but about machines that understand businesses as well as their creators do.