How multimodal AI models are unlocking opportunities for vertical applications

Multimodal AI expands vertical AI’s impact: The emergence of multimodal models capable of processing audio, video, voice, and vision data is creating new opportunities for vertical AI applications to transform a wider range of industries and workflows.

Key advancements in multimodal architecture:

  • Recent models have demonstrated improved context understanding, reduced hallucinations, and enhanced reasoning capabilities.
  • Performance in speech recognition, image processing, and voice generation is approaching or surpassing human capabilities in some cases.
  • New speech-native offerings, such as OpenAI’s Realtime API and Kyutai’s Moshi, are replacing cascading architectures (separate speech-to-text, language model, and text-to-speech stages), which lowers latency and improves context capture.
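
The latency argument for speech-native models can be illustrated with a minimal sketch. It assumes a cascaded voice pipeline runs its stages one after another, so their latencies add up, while a speech-native model handles audio in a single stage. The stage names and millisecond figures below are hypothetical placeholders chosen for illustration, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One processing step in a voice pipeline (hypothetical)."""
    name: str
    latency_ms: float


def total_latency_ms(stages: list[Stage]) -> float:
    """Sum stage latencies, assuming stages run strictly in sequence."""
    return sum(stage.latency_ms for stage in stages)


# Illustrative numbers only: not benchmarks of any real product.
cascaded = [
    Stage("speech_to_text", 300.0),
    Stage("language_model", 500.0),
    Stage("text_to_speech", 200.0),
]
speech_native = [
    Stage("speech_to_speech_model", 400.0),
]

if __name__ == "__main__":
    print("cascaded:", total_latency_ms(cascaded), "ms")
    print("speech-native:", total_latency_ms(speech_native), "ms")
```

The sketch covers only the latency side. A cascade also reduces audio to text between stages, which is one plausible reason tone and other non-verbal context can be lost along the way.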

Voice capabilities and use cases:

  • Transcription applications are freeing up time for professionals in various fields:
    • Abridge’s medical transcription tool generates notes and identifies follow-ups from clinical conversations.
    • Rillavoice records and transcribes conversations for sales training in the home services industry.
  • End-to-end voice agents are showing promise in multiple areas:
    • Inbound sales: Fielding customer calls after hours and booking appointments.
    • Customer support: Providing more effective responses than traditional IVR systems.
    • Outbound calls: Automating initial contact for sales and recruiting teams.

Vision capabilities and applications:

  • Models like GPT-4V and Gemini 1.5 Pro can process raw images and video, interpret their content, and answer questions about them.
  • Key use cases include:
    • Data extraction from unstructured documents (e.g., Raft’s platform for freight forwarding).
    • Visual inspection augmentation (e.g., xBuild’s AI construction platform).
    • 2D and 3D design generation (e.g., Snaptrude’s 3D building design tool).
    • Video analytics for safety monitoring and object tracking.

The rise of AI agents:

  • Progress in constraining tasks for AI agents has led to reduced errors in multi-step reasoning.
  • Reasoning-focused foundation models like OpenAI’s o1 are showing promise in complex problem-solving.
  • Current applications include:
    • Sales and marketing: Researching prospects and crafting personalized outreach.
    • Negotiations: Automating legal and commercial term negotiations.
    • Investigations: Assisting with initial phases of cybersecurity alert investigations.

Broader implications for vertical AI:

  • Multimodal capabilities are expanding the potential impact of vertical AI across industries.
  • As underlying models become commoditized, it will be more sustainable for companies to build applications on top of powerful foundation models.
  • The integration of these new capabilities is expected to fundamentally change how we work and interact with the world.

Looking ahead: The next wave of vertical AI applications will likely focus on addressing complex workflows autonomously, leveraging advancements in reasoning-based models. As the technology continues to evolve, we can expect to see novel business models emerge, including copilots, agents, and AI-enabled services, opening up new opportunities in previously untapped industries.

Part II: Multimodal capabilities unlock new opportunities in Vertical AI
