Salesforce’s TACO is a new family of multimodal AI models

Salesforce has unveiled TACO, a new family of multimodal AI models that can process multiple types of data and perform complex reasoning tasks using a step-by-step approach.

Key Innovation: TACO advances multimodal AI by generating chains-of-thought-and-action (CoTA), which pair reasoning steps with calls to external tools, so that it can answer questions that involve images, text, and numerical calculation.

  • The system utilizes external tools like optical character recognition (OCR), depth estimation, and calculators to process different types of information
  • TACO can break down complex questions into smaller, manageable steps and execute them sequentially
  • The model demonstrates particular strength in tasks requiring both visual understanding and mathematical reasoning
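As a rough illustration of the step-by-step pattern described above, the sketch below runs a hand-written chain of thought-and-action steps against stubbed tools. The tool names (`ocr`, `calculate`), the `Step` structure, the `{prev}` placeholder, and the `run_trace` function are hypothetical stand-ins chosen for this example, not TACO's actual interface. In a real system the model would generate each step itself and the tools would be real OCR and calculator services.

```python
import ast
import operator
from dataclasses import dataclass
from typing import List

# Hypothetical OCR stub: returns canned text for a known image id.
def ocr(image_id: str) -> str:
    canned = {"sign.png": "3.50"}
    return canned.get(image_id, "")

# Arithmetic operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Hypothetical calculator tool: evaluates simple arithmetic without eval().
def calculate(expression: str) -> str:
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")
    return str(_eval(ast.parse(expression, mode="eval").body))

TOOLS = {"ocr": ocr, "calculate": calculate}

@dataclass
class Step:
    thought: str   # the reasoning behind this step
    action: str    # name of the tool to call
    argument: str  # tool input; "{prev}" is replaced by the previous observation

def run_trace(steps: List[Step]) -> List[str]:
    """Execute steps in order, feeding each observation into the next step."""
    observations: List[str] = []
    prev = ""
    for step in steps:
        result = TOOLS[step.action](step.argument.format(prev=prev))
        observations.append(result)
        prev = result
    return observations

trace = [
    Step("Read the price printed on the sign.", "ocr", "sign.png"),
    Step("Multiply the price by 4 units.", "calculate", "{prev} * 4"),
]
print(run_trace(trace))
```

Each step records a thought alongside the action it triggers, and the observation from one step becomes input to the next, which is the sequential behavior the bullets above describe.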

Technical Implementation: Salesforce developed TACO through an extensive training process designed to enhance its problem-solving capabilities.

  • The model was trained using over 1 million synthetic CoTA traces
  • Training incorporated both model-based and programmatic generation methods
  • TACO showed 30-50% better performance compared to traditional direct-answer models
  • The system achieved up to 20% improvement over baseline models on the MMVet benchmark

Practical Applications: TACO’s architecture enables it to tackle real-world problems that require multiple steps and different types of reasoning.

  • The model can handle practical questions like calculating gas purchases from photographed price signs
  • Future applications could include medical question answering and web navigation tasks
  • The framework is designed to be adaptable for training new models with different actions across various domains
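The gas-price question mentioned above reduces to two steps once the sign has been read: extract the prices from the recognized text, then divide a budget by the chosen price. The sketch below works through that arithmetic, assuming the sign text has already been produced by an OCR tool. The sign contents, the budget, and the function names `parse_prices` and `gallons_for_budget` are made up for illustration.

```python
import re

def parse_prices(sign_text: str) -> dict:
    """Map each fuel grade on the sign to its price per gallon."""
    prices = {}
    # Match a word followed by a decimal number, e.g. "Regular 4.00" or "Regular $4.00".
    for grade, price in re.findall(r"([A-Za-z]+)\s+\$?(\d+\.\d+)", sign_text):
        prices[grade.lower()] = float(price)
    return prices

def gallons_for_budget(sign_text: str, grade: str, budget: float) -> float:
    """Return how many gallons of the given grade the budget buys."""
    price = parse_prices(sign_text)[grade.lower()]
    return budget / price

# Hypothetical OCR output from a photographed price sign.
sign_text = "Regular 4.00\nPremium 5.00"
print(gallons_for_budget(sign_text, "Regular", 20.0))
```

With these made-up numbers, a 20-dollar budget at 4.00 dollars per gallon buys 5 gallons of regular.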

Looking Ahead: While TACO represents a significant step forward in multimodal AI capabilities, its true impact will likely depend on how effectively it can be integrated into practical applications and whether it can maintain consistent performance across diverse real-world scenarios.

Salesforce Introduces New Family of Multimodal Action Models Named TACO
