Salesforce’s TACO is a new family of multimodal AI models

The AI system processes images, text, and numbers while explaining its decision-making process, with a reported 30-50% performance improvement over traditional direct-answer models.

Written by CO/AI Bot

Published on January 17th, 2025 12:20 PM


Salesforce has unveiled TACO, a new family of multimodal AI models that can process multiple types of data and perform complex reasoning tasks using a step-by-step approach.

Key Innovation: TACO advances multimodal AI by combining chains-of-thought-and-action (CoTA) reasoning with the ability to work across images, text, and numerical calculations.

The system uses external tools such as optical character recognition (OCR), depth estimation, and calculators to process different types of information
TACO can break down complex questions into smaller, manageable steps and execute them sequentially
The model demonstrates particular strength in tasks requiring both visual understanding and mathematical reasoning
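
To make the step-by-step idea concrete, here is a minimal Python sketch of a chain of thought-and-action steps in which each step calls a tool and later steps reuse earlier observations. It is an illustration only, not Salesforce's implementation: the `Step` structure, the `run_chain` function, the tool names, and the canned OCR result are all hypothetical, and the OCR tool is a stub rather than a real model.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


def fake_ocr(image_id: str) -> str:
    """Hypothetical OCR stub: returns canned text for a known image id."""
    canned = {"price_sign.jpg": "3.50"}  # assumed price per gallon on the sign
    return canned[image_id]


_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculate(expression: str) -> str:
    """Evaluate basic arithmetic (+, -, *, /) without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return str(walk(ast.parse(expression, mode="eval").body))


# Hypothetical tool registry: action name -> function taking and returning text.
TOOLS: Dict[str, Callable[[str], str]] = {"ocr": fake_ocr, "calculate": calculate}


@dataclass
class Step:
    thought: str   # the reasoning behind this step
    action: str    # which tool to call
    argument: str  # may reference earlier observations as {0}, {1}, ...


def run_chain(steps: List[Step]) -> List[str]:
    """Execute steps in order, feeding earlier observations into later ones."""
    observations: List[str] = []
    for step in steps:
        arg = step.argument.format(*observations)
        observations.append(TOOLS[step.action](arg))
    return observations


# Example question: how many gallons does $35 buy at the price on the sign?
chain = [
    Step("Read the price from the photo.", "ocr", "price_sign.jpg"),
    Step("Divide the budget by the price.", "calculate", "35 / {0}"),
]
observations = run_chain(chain)
print(observations[-1])  # gallons that $35 buys at the stubbed price
```

In this sketch the plan is written by hand; in a CoTA-style model, the thoughts and actions would be generated by the model itself, one step at a time.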

Technical Implementation: Salesforce developed TACO through an extensive training process designed to enhance its problem-solving capabilities.

The model was trained using over 1 million synthetic CoTA traces
Training incorporated both model-based and programmatic generation methods
TACO showed 30-50% better performance compared to traditional direct-answer models
The system achieved up to 20% improvement over baseline models on the MMVet benchmark
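
The article does not describe how the synthetic traces were built. As one possible reading of "programmatic generation", the Python sketch below fills a fixed template with random values whose correct answer is known in advance. The trace format, field names, and the `make_trace` and `make_dataset` helpers are assumptions made for illustration, not the actual TACO data pipeline.

```python
import random
from typing import Dict, List


def make_trace(rng: random.Random) -> Dict:
    """Build one hypothetical templated trace with a known-correct answer."""
    price_cents = rng.randrange(250, 500, 5)  # price per gallon, in cents
    gallons = rng.randint(1, 20)
    price = f"{price_cents / 100:.2f}"
    total = f"{price_cents * gallons / 100:.2f}"
    return {
        "question": f"How much do {gallons} gallons cost at the price on the sign?",
        "steps": [
            {"thought": "Read the price from the sign.",
             "action": "ocr",
             "observation": price},
            {"thought": "Multiply the price by the number of gallons.",
             "action": "calculate",
             "argument": f"{price} * {gallons}",
             "observation": total},
        ],
        "answer": total,
    }


def make_dataset(n: int, seed: int = 0) -> List[Dict]:
    """Generate n traces; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [make_trace(rng) for _ in range(n)]


traces = make_dataset(3)
print(traces[0]["question"])
```

Because the values come from the template, every generated trace carries a verifiable answer, which is the main appeal of programmatic generation over model-written traces that have to be checked after the fact.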

Practical Applications: TACO’s architecture enables it to tackle real-world problems that require multiple steps and different types of reasoning.

The model can handle practical questions like calculating gas purchases from photographed price signs
Future applications could include medical question answering and web navigation tasks
The framework is designed to be adaptable for training new models with different actions across various domains

Looking Ahead: While TACO represents a significant step forward in multimodal AI capabilities, its true impact will likely depend on how effectively it can be integrated into practical applications and whether it can maintain consistent performance across diverse real-world scenarios.

Salesforce Introduces New Family of Multimodal Action Models Named TACO

TelecomTalk


Outsider Labs, Inc. Venice, CA 90291
