Hume’s EVI 2: A leap forward in AI voice technology: Hume, an AI startup, has unveiled Empathic Voice Interface 2 (EVI 2), a significant upgrade to its AI voice model and API, offering enhanced naturalness, emotional responsiveness, and customizability.
Key improvements and features: EVI 2 improves on its predecessor in latency, emotional responsiveness, and voice customization.
- The new version cuts latency by 40%, with response times of 500 to 800 milliseconds, which makes real-time conversation noticeably smoother.
- EVI 2 is better at understanding and responding to the emotional context of user input.
- The system now allows for customizable voices, enabling adjustments to voice parameters without the need for voice cloning.
- Users can modify the AI’s speaking style dynamically during conversations through in-conversation prompts.
- EVI 2 currently supports English; Hume plans to add more languages in the future.
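As a rough check on the latency figures in the list above, the sketch below works backward from the quoted numbers. It assumes that the 500 to 800 millisecond range describes EVI 2 itself and that the 40% reduction applies uniformly across that range; neither assumption is confirmed by the announcement, and the function name is illustrative only.

```python
def value_before_reduction(value_after: float, reduction: float) -> float:
    """Return the value that, after a fractional reduction, yields value_after."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return value_after / (1.0 - reduction)


# Figures quoted in the article.
REDUCTION = 0.40                 # 40% lower latency
NEW_RANGE_MS = (500.0, 800.0)    # EVI 2 response times in milliseconds

# Assumption: the reduction applies equally to both ends of the range.
implied_old_range_ms = tuple(
    value_before_reduction(ms, REDUCTION) for ms in NEW_RANGE_MS
)

if __name__ == "__main__":
    low, high = implied_old_range_ms
    print(f"Implied earlier latency: {low:.0f} to {high:.0f} ms")
```

Under those assumptions the earlier version would have responded in roughly 833 to 1,333 milliseconds, so the new range moves typical replies to under one second.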
Technical advancements: EVI 2 changes both how audio is processed and how the voice interface is deployed.
- EVI 2 converts audio signals directly into tokens, a method similar to OpenAI’s GPT-4o, potentially improving efficiency and accuracy in voice recognition and processing.
- The system allows for direct integration within apps, eliminating the need for separate voice assistants and potentially streamlining user experiences.
Cost-effectiveness and pricing: Hume has made EVI 2 more accessible to developers and businesses through competitive pricing.
- The new version is priced at $0.072 per minute, marking a 30% reduction from its predecessor.
- Volume discounts are available to enterprise users, which makes EVI 2 more attractive for large-scale deployments.
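The pricing above lends itself to a quick estimate. The sketch below computes the cost of a given number of conversation minutes at the quoted $0.072 rate, and the predecessor's price implied by the 30% cut. The function names are illustrative, and enterprise volume discounts are not modeled because the article gives no figures for them.

```python
from decimal import Decimal, ROUND_HALF_UP

EVI2_RATE_USD_PER_MIN = Decimal("0.072")  # quoted in the article
PRICE_CUT = Decimal("0.30")               # quoted 30% reduction


def implied_previous_rate(rate: Decimal, cut: Decimal) -> Decimal:
    """Per-minute rate before the cut, assuming the cut applies to the list price."""
    return rate / (Decimal(1) - cut)


def usage_cost(minutes: int, rate: Decimal = EVI2_RATE_USD_PER_MIN) -> Decimal:
    """Cost in USD for the given minutes, rounded to cents. No volume discount."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return (rate * minutes).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    previous = implied_previous_rate(EVI2_RATE_USD_PER_MIN, PRICE_CUT)
    print(f"Implied previous rate: ${previous:.4f} per minute")
    print(f"10,000 minutes on EVI 2: ${usage_cost(10_000)}")
```

At list price, 10,000 minutes of conversation come to $720.00, and the implied earlier rate is roughly $0.103 per minute.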
Additional offerings and APIs: Alongside EVI 2, Hume provides complementary tools to enhance voice and expression analysis.
- The Expression Measurement API, starting at $0.0276 per minute for video with audio, measures various aspects of communication including speech prosody and facial expressions.
- Hume offers a Custom Models API with free model training, with inference costs matching those of the Expression Measurement API.
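To see how these rates add up, the sketch below totals the cost of a mixed analysis workload. It assumes the $0.0276 starting rate for video with audio applies to every minute processed, and that Custom Models inference is billed at the same rate, as the article states. The service keys and the function name are hypothetical labels, not identifiers from Hume's API.

```python
from decimal import Decimal
from typing import Mapping

# Per-minute rates in USD, taken from the article (starting price, video with audio).
RATES_USD_PER_MIN = {
    "expression_measurement": Decimal("0.0276"),
    "custom_models_inference": Decimal("0.0276"),  # stated to match the rate above
}
CUSTOM_MODEL_TRAINING_USD = Decimal("0")  # training is described as free


def estimate_analysis_cost(minutes_by_service: Mapping[str, int]) -> Decimal:
    """Sum per-minute charges across services; raises KeyError for unknown services."""
    total = CUSTOM_MODEL_TRAINING_USD
    for service, minutes in minutes_by_service.items():
        if minutes < 0:
            raise ValueError("minutes must be non-negative")
        total += RATES_USD_PER_MIN[service] * minutes
    return total


if __name__ == "__main__":
    workload = {"expression_measurement": 1000, "custom_models_inference": 500}
    print(f"Estimated cost: ${estimate_analysis_cost(workload):.2f}")
```

For 1,000 minutes of expression measurement plus 500 minutes of custom-model inference, the estimate is $41.40. Actual bills may differ, since the quoted figure is a starting price.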
Future roadmap and industry positioning: Hume has outlined ambitious plans for EVI 2’s future development and market positioning.
- The company aims to expand language support, further improve the naturalness of voice outputs, and enhance overall reliability.
- Hume is positioning EVI 2 to work seamlessly with other large language models and integrate with tools like web search, potentially broadening its applicability and market reach.
Implications for AI voice technology: EVI 2’s launch signals a potential shift in the AI voice interaction landscape, with implications for various industries.
- The improved emotional responsiveness and customizability could lead to more natural and engaging AI-human interactions in customer service, healthcare, and entertainment sectors.
- The reduced latency and costs may accelerate the adoption of AI voice technologies in a wider range of applications and businesses.
- As EVI 2 integrates with other language models and tools, it could contribute to the development of more sophisticated and versatile AI assistants.
Who needs GPT-4o Voice Mode? Hume’s EVI 2 is here with emotionally inflected voice AI and API