How multimodal AI models are unlocking opportunities for vertical applications

Multimodal AI is expanding vertical AI’s impact: The emergence of multimodal models capable of processing audio, video, voice, and vision data is creating new opportunities for vertical AI applications to transform a wider range of industries and workflows.

Key advancements in multimodal architecture:

  • Recent models have demonstrated improved context understanding, reduced hallucinations, and enhanced reasoning capabilities.
  • Performance in speech recognition, image processing, and voice generation is approaching or surpassing human capabilities in some cases.
  • New speech-native offerings, like OpenAI’s Realtime API and Kyutai’s Moshi, are replacing cascading architectures (separate speech-to-text, language model, and text-to-speech stages) and deliver lower latency and better context capture.
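
To make the contrast between cascading and speech-native designs concrete, here is a minimal sketch in Python. Every name in it (`Utterance`, `transcribe`, `text_model`, `synthesize`, `cascaded_agent`, `speech_native_agent`) is a hypothetical stub invented for illustration, and the `tone` field is only a stand-in for the non-verbal cues carried by audio. None of this reflects the actual API of OpenAI’s Realtime API or Kyutai’s Moshi.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """A spoken turn: the words plus a non-verbal cue carried by the audio."""
    words: str
    tone: str  # hypothetical stand-in for prosody, e.g. "frustrated"


def transcribe(audio: Utterance) -> str:
    """Stub speech-to-text stage: keeps the words, drops the tone."""
    return audio.words


def text_model(prompt: str) -> str:
    """Stub text-only model: it sees only what survived transcription."""
    return f"reply to '{prompt}'"


def synthesize(text: str) -> Utterance:
    """Stub text-to-speech stage: always speaks in a neutral tone."""
    return Utterance(words=text, tone="neutral")


def cascaded_agent(audio: Utterance) -> Utterance:
    """Three sequential hops; each must finish before the next starts."""
    return synthesize(text_model(transcribe(audio)))


def speech_native_agent(audio: Utterance) -> Utterance:
    """Stub single model that takes audio in and produces audio out."""
    reply_tone = "soothing" if audio.tone == "frustrated" else "neutral"
    return Utterance(words=f"reply to '{audio.words}'", tone=reply_tone)


caller = Utterance(words="my order is late", tone="frustrated")
print(cascaded_agent(caller).tone)       # prints "neutral"
print(speech_native_agent(caller).tone)  # prints "soothing"
```

In the cascaded version the caller's tone is discarded at the first hop, so no later stage can react to it, and the three stages run strictly one after another. The single-model stub keeps the whole utterance in view, which is the intuition behind the lower latency and better context capture described above.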

Voice capabilities and use cases:

  • Transcription applications are freeing up time for professionals in various fields:

    • Abridge’s medical transcription tool generates notes and identifies follow-ups from clinical conversations.
    • Rillavoice records and transcribes conversations for sales training in the home services industry.
  • End-to-end voice agents are showing promise in multiple areas:

    • Inbound sales: Fielding customer calls after hours and booking appointments.
    • Customer support: Providing more effective responses than traditional IVR systems.
    • Outbound calls: Automating initial contact for sales and recruiting teams.

Vision capabilities and applications:

  • Models like GPT-4V and Gemini 1.5 Pro can interpret raw images and video and answer questions about what they contain.
  • Key use cases include:
    • Data extraction from unstructured documents (e.g., Raft’s platform for freight forwarding).
    • Visual inspection augmentation (e.g., xBuild’s AI construction platform).
    • 2D and 3D design generation (e.g., Snaptrude’s 3D building design tool).
    • Video analytics for safety monitoring and object tracking.

The rise of AI agents:

  • Progress in constraining tasks for AI agents has led to reduced errors in multi-step reasoning.
  • Reasoning-focused foundation models like OpenAI’s o1 are showing promise in complex problem-solving.
  • Current applications include:
    • Sales and marketing: Researching prospects and crafting personalized outreach.
    • Negotiations: Automating legal and commercial term negotiations.
    • Investigations: Assisting with initial phases of cybersecurity alert investigations.

Broader implications for vertical AI:

  • Multimodal capabilities are expanding the potential impact of vertical AI across industries.
  • As underlying models become commoditized, building applications on top of powerful foundation models becomes a more sustainable strategy for companies.
  • The integration of these new capabilities is expected to fundamentally change how we work and interact with the world.

Looking ahead: The next wave of vertical AI applications will likely focus on addressing complex workflows autonomously, leveraging advancements in reasoning-based models. As the technology continues to evolve, we can expect to see novel business models emerge, including copilots, agents, and AI-enabled services, opening up new opportunities in previously untapped industries.

Part II: Multimodal capabilities unlock new opportunities in Vertical AI
