Salesforce’s TACO is a new family of multimodal AI models

Salesforce has unveiled TACO, a new family of multimodal AI models that can process multiple types of data and perform complex reasoning tasks using a step-by-step approach.

Key Innovation: TACO combines chains-of-thought-and-action (CoTA) reasoning with the ability to work across several kinds of input, including images, text, and numerical calculations.

  • The system utilizes external tools like optical character recognition (OCR), depth estimation, and calculators to process different types of information
  • TACO can break down complex questions into smaller, manageable steps and execute them sequentially
  • The model demonstrates particular strength in tasks requiring both visual understanding and mathematical reasoning
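The step-by-step pattern described above can be sketched as a small loop that runs one tool call per step and feeds each result into the next. This is an illustrative sketch only, not Salesforce's implementation: the `Step` structure, the `{prev}` placeholder, the stubbed `fake_ocr` tool, and the fixed plan are all assumptions made for the example. In a real CoTA model, the model itself would produce each thought and action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    thought: str   # the reasoning behind this step
    action: str    # name of the tool to call
    argument: str  # tool input; "{prev}" is replaced by the previous result


def fake_ocr(image_id: str) -> str:
    # Hypothetical stand-in for an OCR tool: returns canned text per image.
    return {"receipt.png": "12.50 + 7.25"}.get(image_id, "")


def calculate(expression: str) -> str:
    # Deliberately tiny calculator: only handles sums such as "a + b + c".
    total = sum(float(part) for part in expression.split("+"))
    return f"{total:.2f}"


TOOLS: Dict[str, Callable[[str], str]] = {"ocr": fake_ocr, "calculate": calculate}


def run_trace(steps: List[Step], tools: Dict[str, Callable[[str], str]]) -> Tuple[str, List[str]]:
    """Execute steps in order, passing each observation to the next step."""
    observation = ""
    log: List[str] = []
    for step in steps:
        argument = step.argument.replace("{prev}", observation)
        observation = tools[step.action](argument)
        log.append(f"{step.action}({argument}) -> {observation}")
    return observation, log


plan = [
    Step("Read the amounts printed in the image.", "ocr", "receipt.png"),
    Step("Add the amounts together.", "calculate", "{prev}"),
]
answer, log = run_trace(plan, TOOLS)
print(answer)
```

Keeping the tools in a plain dictionary means a new action can be added by registering one more function, without changing the loop.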

Technical Implementation: Salesforce developed TACO through an extensive training process designed to enhance its problem-solving capabilities.

  • The model was trained using over 1 million synthetic CoTA traces
  • Training incorporated both model-based and programmatic generation methods
  • TACO showed 30-50% better performance compared to traditional direct-answer models
  • The system achieved up to 20% improvement over baseline models on the MMVet benchmark

Practical Applications: TACO’s architecture enables it to tackle real-world problems that require multiple steps and different types of reasoning.

  • The model can handle practical questions like calculating gas purchases from photographed price signs
  • Future applications could include medical question answering and web navigation tasks
  • The framework is designed to be adaptable for training new models with different actions across various domains
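The gas price example comes down to two steps: read a price from the sign, then do arithmetic with it. A minimal sketch follows, assuming the OCR step has already returned the sign text as a string. The sign contents, the function names, and the budget are hypothetical values chosen for illustration.

```python
import re


def extract_price(sign_text: str, grade: str) -> float:
    """Find the per-gallon price listed next to a fuel grade in OCR text."""
    pattern = rf"{re.escape(grade)}\s+\$?(\d+\.\d+)"
    match = re.search(pattern, sign_text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no price found for grade {grade!r}")
    return float(match.group(1))


def gallons_for_budget(budget: float, price_per_gallon: float) -> float:
    """Return how many gallons a given budget buys at the given price."""
    if price_per_gallon <= 0:
        raise ValueError("price per gallon must be positive")
    return budget / price_per_gallon


# Hypothetical text that an OCR tool might return for a photographed sign.
sign_text = "REGULAR 4.00\nPREMIUM 4.60"

price = extract_price(sign_text, "regular")
print(gallons_for_budget(50.0, price))
```

Splitting the reading step from the calculation mirrors the way a CoTA trace hands the output of one tool to the next.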

Looking Ahead: TACO's practical impact will likely depend on how well it can be integrated into real applications and whether its performance holds up across diverse real-world scenarios.

Salesforce Introduces New Family of Multimodal Action Models Named TACO
