How multimodal AI models are unlocking opportunities for vertical applications

Multimodal AI is expanding vertical AI’s impact: Models that can process audio, video, voice, and vision data are creating new opportunities for vertical AI applications to transform a wider range of industries and workflows.

Key advancements in multimodal architecture:

  • Recent models have demonstrated improved context understanding, reduced hallucinations, and enhanced reasoning capabilities.
  • Performance in speech recognition, image processing, and voice generation is approaching or surpassing human capabilities in some cases.
  • New speech-native offerings, such as OpenAI’s Realtime API and Kyutai’s Moshi, are replacing cascading architectures (separate speech-to-text, language model, and text-to-speech stages), delivering lower latency and better context capture.

Voice capabilities and use cases:

  • Transcription applications are freeing up time for professionals in various fields:
    • Abridge’s medical transcription tool generates notes and identifies follow-ups from clinical conversations.
    • Rillavoice records and transcribes conversations for sales training in the home services industry.
  • End-to-end voice agents are showing promise in multiple areas:
    • Inbound sales: Fielding customer calls after hours and booking appointments.
    • Customer support: Providing more effective responses than traditional IVR systems.
    • Outbound calls: Automating initial contact for sales and recruiting teams.

Vision capabilities and applications:

  • Models like GPT-4V and Gemini 1.5 Pro can process raw images and video, interpret their content, and answer questions about them.
  • Key use cases include:
    • Data extraction from unstructured documents (e.g., Raft’s platform for freight forwarding).
    • Visual inspection augmentation (e.g., xBuild’s AI construction platform).
    • 2D and 3D design generation (e.g., Snaptrude’s 3D building design tool).
    • Video analytics for safety monitoring and object tracking.

The rise of AI agents:

  • Constraining the tasks assigned to AI agents has reduced errors in multi-step reasoning.
  • Reasoning-focused foundation models like OpenAI’s o1 are showing promise in complex problem-solving.
  • Current applications include:
    • Sales and marketing: Researching prospects and crafting personalized outreach.
    • Negotiations: Automating legal and commercial term negotiations.
    • Investigations: Assisting with initial phases of cybersecurity alert investigations.

Broader implications for vertical AI:

  • Multimodal capabilities are expanding the potential impact of vertical AI across industries.
  • As underlying models become commoditized, it will be more sustainable for companies to build applications on top of powerful foundation models.
  • The integration of these new capabilities is expected to fundamentally change how we work and interact with the world.

Looking ahead: The next wave of vertical AI applications will likely focus on addressing complex workflows autonomously, leveraging advancements in reasoning-based models. As the technology continues to evolve, we can expect to see novel business models emerge, including copilots, agents, and AI-enabled services, opening up new opportunities in previously untapped industries.

Part II: Multimodal capabilities unlock new opportunities in Vertical AI
