Voice - CO/AI

Oct 30, 2024

OpenAI enhances Realtime API with new voices and lower prices

OpenAI expands Realtime API capabilities: OpenAI has introduced significant updates to its Realtime API, improving its speech-to-speech functionality and reducing costs for developers. The update adds five new voices for speech-to-speech applications, including Ash, Verse, and Ballad, the last of which has a British accent. OpenAI describes the new voices as more expressive and steerable than previous ones. The native speech-to-speech feature offers low latency and nuanced output, which could significantly change how voice assistants are built. Cost reduction through prompt caching: OpenAI has introduced pricing changes that significantly lower the cost of using the Realtime API. Cached text inputs...

Oct 30, 2024

How DeepMind is pushing the frontiers of AI audio generation

Advancing speech generation technology: Google researchers have made significant strides in developing more natural and dynamic audio generation models, paving the way for enhanced digital experiences and AI-powered tools. The team has created models capable of generating high-quality, natural speech from various inputs, including text, tempo controls, and specific voices. This technology is already being implemented in several Google products and experiments, such as Gemini Live, Project Astra, Journey Voices, and YouTube's auto dubbing feature. Recent advancements have enabled the generation of long-form, multi-speaker dialogue, making complex content more accessible. Key innovations in audio generation: SoundStorm: This research demonstrated the...

read Oct 30, 2024

Actors union strikes AI voice deal with startup Ethovox

Groundbreaking AI voice model agreement: SAG-AFTRA, the union representing actors and performers, has reached a significant deal with Ethovox, a company developing an advanced AI voice model. The agreement focuses on fair compensation for voice performers whose work contributes to training Ethovox's AI model, which could be used to create "digital replicas" of voices. This deal marks a notable step in addressing concerns about AI's impact on the entertainment industry and performers' rights. Key provisions of the agreement: The contract between SAG-AFTRA and Ethovox sets important precedents for performer compensation and protection in the age of AI voice technology. Voice...

read Oct 29, 2024

Hedra launches new voice cloning feature — here’s how to use it

AI-powered character creation evolves: Hedra, an AI character maker, has introduced a new voice cloning feature, allowing users to create digital avatars with personalized voices, marking a significant advancement in AI-driven content creation. Hedra's voice cloning technology enables users to replicate their voice using a short audio clip, which can then be applied to any character created or uploaded on the platform. The feature is exclusively available to paid subscribers, with plans starting at $10 per month. Hedra has gained popularity for its user-friendly interface and accurate mouth movement synchronization in character animations. How the voice cloning process works: Users...

Oct 28, 2024

AI voice cloning scam targets police chief, alarming authorities

AI-Powered Police Impersonation Scams on the Rise: Law enforcement agencies across the globe are warning citizens about a new wave of sophisticated scams using artificial intelligence to clone the voices of police officers and government officials. The Salt Lake City incident: A recent scam in Salt Lake City highlights the growing sophistication of these AI-powered deceptions. The Salt Lake City Police Department (SLCPD) alerted the public to an email scam that used AI to clone the voice of Police Chief Mike Brown. Scammers created a video combining real footage from a TV interview with AI-generated audio, claiming the recipient owed...

read Oct 28, 2024

Google develops AI-powered chat feature to enhance Search

Google's AI-powered voice search, a new frontier in mobile search technology: Google is developing an AI-powered voice search feature for its Android app, aiming to improve the user experience through conversational query refinement. Key features and functionality: The new AI search tool is designed to use follow-up questions to refine user queries, potentially changing how people interact with mobile search. Discovered through an APK teardown of the Google app beta version 15.43.36.28, the feature is still in development and not fully implemented. Code strings suggest an AI mode with close and enter buttons, a prompt for follow-up questions, a listening...

Oct 23, 2024

ElevenLabs now allows users to create custom voices from text prompts

AI-generated custom voices, a new frontier in audio creation: ElevenLabs has introduced a new feature that allows users to create custom voices from text prompts, potentially changing how voices are produced across industries. The technology behind the innovation: ElevenLabs' Voice Design engine uses AI to generate synthetic voices from detailed text descriptions, offering a new level of customization and flexibility in voice creation. The system can produce a unique voice in seconds in response to prompts that describe the desired vocal characteristics, accent, and tone. This technology builds upon ElevenLabs' existing capabilities in synthetic voice modeling...

read Oct 23, 2024

Otter.ai expands transcription to 2 new languages, with more on the way

Otter.ai expands transcription services: Otter.ai, a leading AI-powered transcription service, has announced the addition of Spanish and French language support to its platform, broadening its appeal to a global audience. The expansion allows users to transcribe audio in real time for both in-person and virtual meetings on popular platforms such as Zoom, Microsoft Teams, and Google Meet. Users can now interact with Otter's AI Chat feature in their preferred language, asking follow-up questions about the transcriptions. Switching between languages is straightforward and requires only a simple change in the settings menu. Enhanced accuracy and industry leadership: Otter.ai claims its transcription quality surpasses...

read Oct 19, 2024

Meta just released Spirit LM, an open-source multimodal AI model

Introducing Meta Spirit LM: Meta has unveiled a groundbreaking open-source multimodal language model that seamlessly integrates text and speech inputs and outputs, challenging competitors like OpenAI's GPT-4o and Hume's EVI 2. Developed by Meta's Fundamental AI Research (FAIR) team, Spirit LM aims to address limitations in existing AI voice experiences by offering more expressive and natural-sounding speech generation. The model is capable of learning tasks across modalities, including automatic speech recognition (ASR), text-to-speech (TTS), and speech classification. Currently, Spirit LM is only available for non-commercial usage under Meta's FAIR Noncommercial Research License. Advanced approach to text and speech processing: Spirit...

read Oct 15, 2024

Are AI and talking cars the future of driving? New Purdue study suggests yes

AI-powered autonomous vehicles, a transformative shift in transportation: Recent research demonstrates the potential of conversational AI to guide autonomous vehicles, marking a significant advancement in the field of intelligent transportation systems. Researchers from Purdue University presented a study at the 27th IEEE International Conference on Intelligent Transportation Systems, showcasing a conversational AI system called Talk2Drive that can interpret human commands to guide autonomous vehicles. This field experiment is the first of its kind to deploy large language models (LLMs) on a real-world autonomous vehicle, bridging the gap between AI technology and practical transportation applications. Historical context and industry growth:...

read Oct 14, 2024

AI voice cloning is raising hairy legal and ethical questions for businesses

AI-generated influencer endorsements emerge: A gadget manufacturer has utilized artificial intelligence to create a voice resembling that of popular tech reviewer Marques Brownlee in an Instagram promotion, raising concerns about the ethical implications and potential misuse of AI-generated content in advertising. The AI-generated voice, while not perfect, was convincing enough to potentially mislead viewers into believing it was a genuine endorsement from Brownlee. This incident highlights the growing capability of AI to mimic human voices and its potential use in creating fake celebrity endorsements. The company behind the advertisement has not yet responded to inquiries about the use of the...

read Oct 11, 2024

Meta’s new AI Voice is pretty, pretty good

Meta's AI Voice Assistant Debuts: Meta has launched a new voice mode for its Meta AI chatbot, integrating it into popular platforms like WhatsApp, Instagram, Facebook, and Ray-Ban smart glasses, with availability varying by region. The voice assistant uses a multi-step pipeline: it first converts speech to text, then generates a text response, and finally reads that response aloud. Users can choose from a variety of voice options, including celebrity voices like Judi Dench and Awkwafina, as well as non-celebrity options. To access the voice mode, users initiate a chat with the AI and activate it using a waveform icon in the...

Oct 7, 2024

New Caribou album to feature extensive use of AI-generated vocals

Caribou's new AI-infused album is a mixed bag of innovation and controversy: Caribou's latest release "Honey" showcases the potential and pitfalls of incorporating artificial intelligence into music production, sparking discussions about creative boundaries and ethical considerations in the industry. The album extensively utilizes AI-generated vocal effects, marking a significant shift in Caribou's artistic approach and reflecting the growing influence of AI technology in music production. Many of the AI-enhanced vocals on "Honey" are praised for their effectiveness, demonstrating how this technology can be used to expand an artist's creative palette and breathe new life into their sound. The implementation of...

read Oct 7, 2024

How ChatGPT Voice is making the world more accessible

AI-powered voice interaction revolutionizes accessibility: ChatGPT's new Advanced Voice feature is changing communication and education by breaking down language barriers and offering personalized, interactive learning experiences. Key features of Advanced Voice mode:
• Enables natural, flowing audio conversations with the AI chatbot using GPT-4o's audio capabilities
• Available to ChatGPT Plus and Team subscribers via the mobile app
• Understands context, responds to emotions, and maintains coherent dialogue even when interrupted
• Supports 50 languages, making it a powerful tool for language learning and global communication
• Remembers past conversations for more personalized and contextual interactions over time
Development and launch challenges: The launch was...

Oct 3, 2024

OpenAI’s new API lets you build real-time voice apps — at a substantial premium

OpenAI expands developer offerings with real-time voice API: The company's annual developer day introduced several new features, with the centerpiece being a real-time application programming interface (API) for voice interactions, albeit at a premium price point. Real-time voice capabilities and pricing structure: OpenAI's new API enables developers to create applications with fluid, real-time conversations between users and language models. The real-time API is based on the GPT-4o large language model, which costs $2.50 per million input tokens and $10 per million output tokens for text-only interactions. For real-time voice applications, the pricing is at least double, with input and output...

Oct 3, 2024

Priceline launches AI voice assistant powered by OpenAI

AI-powered travel planning with Penny Voice: Priceline has launched an AI voice assistant called Penny Voice, built on OpenAI's technology, to reshape the travel booking experience. A next-generation travel agent: Penny Voice is designed to function as a virtual travel agent, offering natural, conversational interactions to help users book trips efficiently. The assistant uses OpenAI's GPT-4 and Advanced Voice Mode technology to understand complex requests and anticipate user needs. Penny Voice can remember previous interactions and preferences, allowing it to personalize responses in future conversations. Initially focused on hotel reservations, Priceline plans to expand Penny's capabilities to...

read Oct 3, 2024

ChatGPT Advanced Voice debuts for free with a twist

OpenAI's Advanced Voice, a game-changing feature for ChatGPT: OpenAI is offering free ChatGPT users a limited preview of its Advanced Voice mode, showcasing a significant leap in AI-powered conversation technology. What sets Advanced Voice apart: The key differentiator of Advanced Voice is its native speech-to-speech capability, which offers a more natural and nuanced interaction than traditional text-based AI systems. Advanced Voice processes spoken language directly, picking up subtle cues like tone and emotion, rather than converting speech to text and back again. This approach allows for a more fluid and human-like conversation experience. The technology represents a significant advancement...

Sep 30, 2024

Meta’s new AI voice feature will be available ‘in the coming days’

Meta AI Introduces Celebrity Voices: Meta AI is set to launch a new feature that allows users to interact with their AI assistant using the voices of popular celebrities, bringing a personalized touch to AI interactions across Meta's platforms. Meta's Generative AI lead, Ahmad Al-Dahle, shared a behind-the-scenes video on X (formerly Twitter) showcasing the recording process with celebrities. The 48-second clip features Keegan-Michael Key, Kristen Bell, John Cena, Awkwafina, and Dame Judi Dench recording lines for the AI assistant. Dame Judi Dench appears to be taking the recording process most seriously among the featured celebrities. Rollout and Availability: The...

Sep 30, 2024

Wispr secures $12M to make voice dictation to text even better

Voice-to-text revolution: Wispr, a technology company focused on natural voice interaction, has secured $12 million in funding to launch Wispr Flow, an advanced voice-first productivity tool for Apple Macs. Wispr Flow allows users to speak naturally into a microphone and see their words appear on screen almost instantly, potentially tripling text input speed compared to traditional typing. The software can be activated by pressing the Fn key twice, making it easily accessible across various applications like Microsoft Word, Apple Mail, and even ChatGPT. Wispr Flow's beta version has shown impressive capabilities in filtering out verbal tics such as "um" and...

read Sep 30, 2024

ChatGPT’s 5 new voices outshine rival AI in hands-on test

ChatGPT's voice upgrade, a leap in AI communication: OpenAI has significantly enhanced ChatGPT's Advanced Voice feature, introducing five new voices that bring the total to nine distinct options. The new voices are described as more natural and realistic than previous AI voices, incorporating human-like inflections and breathing sounds to create a more authentic conversational experience. OpenAI's latest voice offerings are positioned as superior to competitors like Meta AI Voice and Google's Gemini Live, potentially setting a new standard for AI-generated speech. Meet the new voices: Each of the five...

Sep 26, 2024

New Meta AI tools will translate audio reels in real time

AI-powered translation revolutionizes social media content consumption: Meta has unveiled a groundbreaking AI translation tool that promises to transform how users interact with Facebook and Instagram Reels across language barriers. Key features of the new AI translation tool:
• Automatically translates audio in Reels to the viewer's preferred language
• Uses advanced dubbing and lip-syncing technology to simulate the speaker's voice in the target language
• Synchronizes lip movements to match the translated audio, creating a more natural viewing experience
Current testing and future expansion:
• Meta is conducting small-scale tests on its platforms, focusing on English and Spanish translations
• Initial tests involve creators...

Sep 26, 2024

How MetaAI Voice compares to other AI assistants

The dawn of conversational AI assistants: Meta has entered the race to create the ultimate voice assistant with MetaAI Voice, joining competitors like OpenAI's ChatGPT Voice and Google's Gemini Live in the rapidly evolving landscape of conversational AI technology. MetaAI Voice is designed to provide natural language interaction for Meta's products, including Ray-Ban smart glasses and Quest VR headsets, where traditional input methods like keyboards or touch screens are impractical. The new voice assistant can handle complex and vague queries, demonstrating its ability to understand context and provide relevant responses. Unlike its competitors, MetaAI Voice offers celebrity voices, including Dame...

read Sep 26, 2024

Kristen Bell told Meta to ‘get rid of AI’ — now she’s its voice

Meta's AI voice initiative faces an unexpected twist: Kristen Bell, previously critical of AI data use, has become one of the official voices for Meta's AI chatbot, highlighting the complex relationship between celebrities and AI technology.
• Meta has secured deals with high-profile actors, including Kristen Bell, to provide voices for its Meta AI chatbot.
• Bell's participation comes as a surprise, given her previous public opposition to Meta's AI using her data.
• In June, Bell reposted a popular Instagram message refusing consent for Meta to use her content and likeness for AI training.
Celebrity concerns and misconceptions: The reposting of AI...

Sep 25, 2024

Meta AI Voice challenges ChatGPT with celebrity-voiced assistant

Meta's AI voice assistant breakthrough: Meta has announced a new voice-enabled AI assistant featuring celebrity voices, positioning itself as a strong competitor to ChatGPT and Google's Gemini Live in the generative AI space. The new Meta AI voice feature will be available across Facebook, Instagram, and WhatsApp, rolling out this week in the U.S. and other English-speaking markets. Users can choose from a range of celebrity voices, including Dame Judi Dench, John Cena, Kristen Bell, Awkwafina, and Keegan-Michael Key, or opt for more generic voices. Meta has signed formal agreements with the actors to avoid legal issues and ensure an...
