How multimodal AI models are unlocking opportunities for vertical applications

Multimodal AI is expanding vertical AI’s impact. Models that can process voice, audio, images, and video are creating new opportunities for vertical AI applications to transform a wider range of industries and workflows.

Key advancements in multimodal architecture:

Recent models have demonstrated improved context understanding, reduced hallucinations, and enhanced reasoning capabilities.
Performance in speech recognition, image processing, and voice generation is approaching, and in some cases surpassing, human capabilities.
New speech-native models, such as those behind OpenAI’s Realtime API and Kyutai’s Moshi, are replacing cascading architectures (speech-to-text, then a text model, then text-to-speech) and offer lower latency and better context capture.
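To make the contrast with cascading architectures concrete, here is a minimal toy sketch in Python. It is not how the Realtime API or Moshi work internally. Every name in it (Utterance, speech_to_text, text_llm, text_to_speech, cascaded_agent, speech_native_agent) is hypothetical, and the stages are plain string functions standing in for real models. It only illustrates one reason a cascade can lose context: the transcription step passes along the words and drops everything else, such as tone.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Toy stand-in for a spoken input (hypothetical type)."""
    words: str
    tone: str  # non-verbal signal, e.g. "frustrated"


def speech_to_text(utterance: Utterance) -> str:
    # Stage 1 of the cascade: only the words survive transcription.
    return utterance.words


def text_llm(transcript: str) -> str:
    # Stage 2: a text-only model that never sees the original audio.
    return f"reply to: {transcript}"


def text_to_speech(reply: str) -> str:
    # Stage 3: synthesize the text reply (represented here as a string).
    return f"<audio>{reply}</audio>"


def cascaded_agent(utterance: Utterance) -> str:
    # Three models run one after another, so their delays add up
    # and the tone is lost after the first stage.
    return text_to_speech(text_llm(speech_to_text(utterance)))


def speech_native_agent(utterance: Utterance) -> str:
    # A single model consumes the utterance directly, so it can
    # condition its reply on tone as well as words.
    return f"<audio>reply to: {utterance.words} (tone: {utterance.tone})</audio>"


caller = Utterance(words="my furnace is broken", tone="frustrated")
cascaded_reply = cascaded_agent(caller)
native_reply = speech_native_agent(caller)
```

In this sketch the cascaded reply contains no trace of the caller's tone, while the single-model reply does. Real systems differ in many other ways, so treat this only as an illustration of the information bottleneck.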

Voice capabilities and use cases:

Transcription applications are freeing up time for professionals in various fields:
- Abridge’s medical transcription tool generates notes and identifies follow-ups from clinical conversations.
- Rillavoice records and transcribes conversations for sales training in the home services industry.
End-to-end voice agents are showing promise in multiple areas:
- Inbound sales: Fielding customer calls after hours and booking appointments.
- Customer support: Providing more effective responses than traditional IVR systems.
- Outbound calls: Automating initial contact for sales and recruiting teams.

Vision capabilities and applications:

Models like GPT-4V and Gemini 1.5 Pro can interpret raw images and video and answer questions about them.
Key use cases include:
- Data extraction from unstructured documents (e.g., Raft’s platform for freight forwarding).
- Visual inspection augmentation (e.g., xBuild’s AI construction platform).
- 2D and 3D design generation (e.g., Snaptrude’s 3D building design tool).
- Video analytics for safety monitoring and object tracking.

The rise of AI agents:

Progress in constraining tasks for AI agents has led to reduced errors in multi-step reasoning.
Reasoning-focused foundation models like OpenAI’s o1 are showing promise in complex problem-solving.
Current applications include:
- Sales and marketing: Researching prospects and crafting personalized outreach.
- Negotiations: Automating legal and commercial term negotiations.
- Investigations: Assisting with initial phases of cybersecurity alert investigations.
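The point above about constraining agent tasks can be illustrated with a small back-of-the-envelope calculation. This is a simplifying sketch, not a result from the article: it assumes every step of a task succeeds independently with the same probability, and the function name chain_success_rate and the 95% figure are illustrative assumptions only.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps fail independently and share one success rate,
    which is a simplification of how real agents behave.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Hypothetical agent that gets each individual step right 95% of the time.
open_ended_task = chain_success_rate(0.95, 10)   # about 0.60
constrained_task = chain_success_rate(0.95, 3)   # about 0.86
```

Under these assumptions, narrowing a task from ten steps to three raises the chance of a fully correct run from roughly 60% to roughly 86%, which is one way to see why tightly scoped agents tend to make fewer end-to-end errors.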

Broader implications for vertical AI:

Multimodal capabilities are expanding the potential impact of vertical AI across industries.
As underlying models become commoditized, building applications on top of powerful foundation models will become a more sustainable path for companies.
The integration of these new capabilities is expected to fundamentally change how we work and interact with the world.

Looking ahead: The next wave of vertical AI applications will likely focus on addressing complex workflows autonomously, leveraging advancements in reasoning-based models. As the technology continues to evolve, we can expect to see novel business models emerge, including copilots, agents, and AI-enabled services, opening up new opportunities in previously untapped industries.

Part II: Multimodal capabilities unlock new opportunities in Vertical AI

Bessemer Venture Partners
