Mistral AI expands into multimodal AI: Mistral AI, the French AI startup, has released Pixtral 12B, its first multimodal AI model combining language and vision processing capabilities.
- The model is not yet available through a public web interface, but it can be downloaded from Hugging Face or GitHub and tested on a self-hosted instance.
- Mistral initially released the model through a torrent link, continuing its unconventional approach to AI model releases.
- Sophia Yang, head of developer relations at Mistral, announced that the model will soon be available through the company’s web chatbot and La Plateforme API.
Key features of Pixtral 12B: The new model aims to enable users to analyze images in combination with text prompts, though official details about its training data remain undisclosed.
- Users can upload images or provide links to them and ask questions about the subjects in the files.
- According to Yang, Pixtral 12B will natively support an arbitrary number of images of arbitrary sizes, setting it apart from competitors.
- Initial testers report that the 24GB model has 40 layers, a hidden dimension size of 14,336, and 32 attention heads.
- The vision encoder supports 1024×1024 image resolution and has 24 hidden layers for advanced image processing.
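The reported figures allow a rough sanity check. The sketch below is a back-of-the-envelope calculation, not anything published by Mistral: it takes "12B" to mean roughly 12 billion parameters and "24GB" to mean decimal gigabytes of stored weights, and it assumes a 16-pixel patch size for the vision encoder, a value that is not stated in the reports above and is used here only for illustration.

```python
# Back-of-the-envelope figures derived from the numbers reported above.
# All constants are approximations; ASSUMED_PATCH_SIZE is a hypothetical value.

PARAMS = 12e9             # "12B" in the model name, taken as ~12 billion parameters
CHECKPOINT_BYTES = 24e9   # reported 24GB size, taken as decimal gigabytes
ASSUMED_PATCH_SIZE = 16   # assumption for illustration; not stated in the article


def bytes_per_parameter(checkpoint_bytes: float, params: float) -> float:
    """Average storage cost of one parameter in the checkpoint."""
    return checkpoint_bytes / params


def patches_per_image(width: int, height: int, patch_size: int) -> int:
    """Count the square patches needed to cover an image.

    Uses ceiling division on each side so partially covered edges
    still count as a full patch.
    """
    cols = -(-width // patch_size)
    rows = -(-height // patch_size)
    return cols * rows


if __name__ == "__main__":
    print(bytes_per_parameter(CHECKPOINT_BYTES, PARAMS))
    print(patches_per_image(1024, 1024, ASSUMED_PATCH_SIZE))
```

Under these assumptions the checkpoint works out to about 2 bytes per parameter, which is consistent with weights stored in a 16-bit format, and a 1024×1024 image would be split into 4,096 patches. Both numbers shift if the real parameter count, file size, or patch size differ from the assumed values.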
Competitive landscape: Mistral’s entry into multimodal AI puts it in direct competition with established players in the field.
- OpenAI and Anthropic already offer models with image-processing capabilities.
- Mistral’s move broadens access to visual applications such as content and data analysis.
- The company’s aggressive approach in the AI domain has included partnerships with industry giants like Microsoft, AWS, and Snowflake.
Mistral’s rapid growth and innovation: The release of Pixtral 12B is part of Mistral’s broader strategy to compete with leading AI labs.
- Mistral recently raised $640 million at a $6 billion valuation, demonstrating strong investor confidence.
- The company has launched several advanced models, including Mistral Large 2, a GPT-4 class model with multilingual capabilities.
- Other notable releases include Mixtral 8x22B (a mixture-of-experts model), Codestral (a 22B parameter open-weight coding model), and a dedicated model for math-related reasoning and scientific discovery.
Future implications: Mistral’s entry into multimodal AI could significantly impact the competitive landscape and accelerate innovation in the field.
- The open-source nature of Pixtral 12B may encourage wider adoption and experimentation among developers and researchers.
- As Mistral continues to challenge established players, it could drive further advancements in AI capabilities and applications across various industries.
- The true potential and performance of Pixtral 12B remain to be seen, but its release marks a significant milestone in Mistral’s journey to become a major player in the AI space.