Salesforce’s TACO is a new family of multimodal AI models

Salesforce has unveiled TACO, a new family of multimodal AI models that can process multiple types of data and perform complex reasoning tasks using a step-by-step approach.

Key Innovation: TACO advances multimodal AI by generating chains-of-thought-and-action (CoTA), in which reasoning steps are paired with tool calls, so that a single model can work across images, text, and numerical calculations.

  • The system utilizes external tools like optical character recognition (OCR), depth estimation, and calculators to process different types of information
  • TACO can break down complex questions into smaller, manageable steps and execute them sequentially
  • The model demonstrates particular strength in tasks requiring both visual understanding and mathematical reasoning
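To make the step-by-step idea concrete, here is a minimal Python sketch of how a chain-of-thought-and-action trace might be executed. It is not Salesforce's implementation: the `Step` structure, the `run_trace` loop, the `fake_ocr` stub, the file name `sign.jpg`, and the numbers are all hypothetical, and the trace is written by hand, whereas in a model like TACO the steps would be generated by the model itself.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Arithmetic operators the hypothetical calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str) -> str:
    """Evaluate a plain arithmetic expression without using eval()."""
    def ev(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


def fake_ocr(image_id: str) -> str:
    """Stand-in for a real OCR tool: returns canned text for a known image."""
    return {"sign.jpg": "3.50"}[image_id]  # hypothetical price read off a sign


# Registry of available actions (assumed names, not TACO's actual action set).
TOOLS: Dict[str, Callable[[str], str]] = {"ocr": fake_ocr, "calculate": calculate}


@dataclass
class Step:
    thought: str   # the reasoning behind this step
    action: str    # which tool to call
    argument: str  # tool input; "{prev}" is filled with the last observation


def run_trace(steps: List[Step]) -> List[str]:
    """Execute steps in order, feeding each observation into the next step."""
    observations: List[str] = []
    prev = ""
    for step in steps:
        prev = TOOLS[step.action](step.argument.format(prev=prev))
        observations.append(prev)
    return observations


# Hand-written trace: how many gallons does a 42-dollar budget buy?
TRACE = [
    Step("Read the price per gallon from the sign.", "ocr", "sign.jpg"),
    Step("Divide the budget by the price.", "calculate", "42 / {prev}"),
]

if __name__ == "__main__":
    for step, obs in zip(TRACE, run_trace(TRACE)):
        print(f"{step.thought} -> {step.action}: {obs}")
```

The point of the sketch is the control flow: each step names a tool, and its observation becomes input to the next step, rather than the model answering in one shot.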

Technical Implementation: Salesforce developed TACO through an extensive training process designed to enhance its problem-solving capabilities.

  • The model was trained using over 1 million synthetic CoTA traces
  • Training incorporated both model-based and programmatic generation methods
  • TACO performed 30-50% better than traditional direct-answer models
  • The system achieved up to 20% improvement over baseline models on the MMVet benchmark

Practical Applications: TACO’s architecture enables it to tackle real-world problems that require multiple steps and different types of reasoning.

  • The model can handle practical questions like calculating gas purchases from photographed price signs
  • Future applications could include medical question answering and web navigation tasks
  • The framework is designed to be adaptable for training new models with different actions across various domains

Looking Ahead: While TACO represents a significant step forward in multimodal AI capabilities, its true impact will likely depend on how effectively it can be integrated into practical applications and whether it can maintain consistent performance across diverse real-world scenarios.

Salesforce Introduces New Family of Multimodal Action Models Named TACO
