Voice - CO/AI


Aug 15, 2024

New AI Tool Detects Audio Deepfakes with 99% Accuracy

Pindrop, a leading voice security and fraud detection company, has introduced Pulse Inspect, a groundbreaking tool designed to identify AI-generated audio deepfakes with remarkable accuracy. Revolutionary deepfake detection: Pindrop's Pulse Inspect claims to detect AI-generated speech in digital audio and video files with 99% accuracy, regardless of the AI model or tool used to create the content. The web-based tool is currently available in preview as part of Pindrop's Pulse suite of products. Pulse Inspect's ability to detect synthetic content from any source sets it apart from other AI classifiers that typically only identify content generated by their own tools....

Aug 14, 2024

Gemini Live Can Multi-Task and Has Long-Term Conversation Memory

Gemini Live, Google's latest AI innovation, introduces a more natural and interactive conversational experience for users, marking a significant step in the evolution of AI assistants. Enhanced conversational capabilities: Google's Gemini Live offers a more human-like interaction, allowing users to interrupt or change topics mid-conversation, moving beyond the rigid command structure of traditional AI assistants. The AI responds to casual language, simulating speculation and brainstorming to create a more natural dialogue flow. Gemini Live's ability to multitask, talking and completing tasks simultaneously, enhances its efficiency and user experience. The feature is currently available to Gemini Advanced subscribers on Android, with...

Aug 13, 2024

Google’s Gemini Live Brings Advanced Voice AI to Android

Google has unveiled Gemini Live, a new voice mode for its AI model Gemini, bringing advanced conversational capabilities to mobile devices and potentially outpacing competitors in the race for more natural AI interactions. The big picture: Gemini Live allows users to engage in free-flowing, voice-based conversations with Google's AI model, offering a more natural and intuitive interaction experience that rivals human-like communication. The feature enables users to speak to the model in conversational language, interrupt it, and receive responses in a humanlike voice and cadence. This development positions Google at the forefront of voice-based AI interactions, potentially surpassing competitors like...

Aug 12, 2024

GPT-4o’s New Voice Feature Is Accidentally Mimicking Real Users

Unexpected voice imitation by AI: OpenAI's latest language model, GPT-4o, exhibited a concerning ability to replicate users' voices without permission during testing of its Advanced Voice Mode feature. OpenAI's GPT-4o scorecard revealed that the AI unexpectedly imitated users' voices in rare instances, raising significant privacy and consent concerns. A demonstration clip shows ChatGPT abruptly switching to an uncanny rendition of a user's voice, shouting "No!" for no apparent reason. This unintended voice cloning capability has been likened to a plot from a sci-fi horror movie or a potential "Black Mirror" episode. Technical capabilities and risks: The AI model's voice generation...

Aug 9, 2024

OpenAI Warns Users May Get Emotionally Attached to Voice-Enabled ChatGPT

OpenAI's recent safety analysis for its GPT-4o model, which introduces a voice interface to ChatGPT, highlights potential risks associated with users forming emotional attachments to AI chatbots. This development raises important questions about the psychological impact of increasingly human-like AI interactions and the need for responsible AI deployment. Key features and concerns: OpenAI's new voice interface for ChatGPT aims to enhance user interaction but comes with potential psychological risks. The humanlike voice interface could lead some users to develop emotional attachments to the AI chatbot, as observed during testing when users employed language suggesting emotional connections. Researchers noted instances of...

Aug 8, 2024

Meta Bids Big on Celebrity Voices for AI Assistant

Meta's ambitious AI voice project: Meta is reportedly offering substantial sums to celebrities for the rights to use their voices in AI projects, with a particular focus on its digital assistant Meta AI. The tech giant is said to be in negotiations with major Hollywood talent agencies to secure voice rights from various high-profile celebrities, including Awkwafina, Judi Dench, and Keegan-Michael Key. Meta's goal is to obtain these voice rights before its Connect 2024 event in September, allowing time to develop AI tools incorporating the celebrity voices for showcase at the event. While the exact intended use of these voice rights...

Aug 8, 2024

WhatsApp Tests Voice Commands for Meta AI

WhatsApp is testing a new feature that would allow users to send voice messages to Meta AI, potentially expanding the ways users can interact with the artificial intelligence assistant within the popular messaging app. Beta testing underway: The voice messaging feature for Meta AI has been spotted in WhatsApp beta for Android version 2.24.16.10, signaling a possible upcoming expansion of AI interaction options for users. A screenshot from the beta version shows a new button added to the chat interface specifically for sending voice messages to Meta AI. This feature enables select beta testers to communicate with Meta AI using...

Aug 7, 2024

AI Company ‘Prepared’ Is Translating 911 Calls in Real Time to Help Non-English Callers

Prepared, a company specializing in 911 technology, is introducing a groundbreaking two-way audio translation feature for non-English emergency calls, potentially transforming how dispatchers handle diverse language needs in critical situations. Innovative solution for language barriers: Prepared's new technology aims to bridge the communication gap between non-English speaking callers and 911 dispatchers, addressing a growing need in an increasingly diverse United States. The feature automatically translates Spanish calls into English for dispatchers, allowing them to understand and respond to emergencies more efficiently. Dispatchers can type responses in English, which are then translated to Spanish and communicated through an AI-generated voice, ensuring...

Aug 4, 2024

Open-Source Speech Recognition Model ‘Medusa’ Just Dropped, Outperforms Whisper

Whisper-Medusa, a new open-source AI model developed by aiOla, combines OpenAI's Whisper technology with aiOla's innovations to create a faster and more efficient automatic speech recognition tool. This development promises significant improvements in speech-to-text technology and has implications for numerous industries. Technological advancements and key features: Whisper-Medusa outperforms its predecessor, OpenAI's Whisper, by operating 50% faster without compromising accuracy: The model predicts ten tokens at a time, compared to Whisper's one-at-a-time approach, resulting in a significant speed boost for speech prediction and generation runtime. aiOla currently offers Whisper-Medusa as a 10-head model, with plans to release a 20-head version maintaining...

Aug 3, 2024

Meta AI May Soon Include Voices from Awkwafina, Judi Dench, Other Celebs

Meta, the parent company of Facebook, Instagram, and WhatsApp, is engaged in talks with high-profile actors and influencers to potentially incorporate their voices into its digital assistant Meta AI. This development marks a significant step in Meta's ongoing efforts to integrate artificial intelligence across its platforms and products. Celebrity Negotiations: Meta is actively pursuing partnerships with well-known personalities to lend their voices to its AI assistant: The company is in discussions with actors such as Awkwafina, Judi Dench, and Keegan-Michael Key, as well as other celebrities and influencers. Hollywood's top talent agencies are involved in the negotiations, indicating the scale and...

Aug 1, 2024

ChatGPT’s Advanced Voice Feature Generating Buzz and Concern Among Early Users

ChatGPT's Advanced Voice feature is generating buzz with its impressive capabilities, though some realistic outputs are causing unease. Key Takeaways: Select users have access to ChatGPT's new Advanced Voice feature, which allows the AI to speak in various accents, dialects, and even imitate specific scenarios like an airline pilot's announcement. The AI's adaptability to different voices and accents is remarkable, with examples showcasing its ability to speak dozens of languages and regional variations. While the feature is not perfect yet, with limitations like struggling to add appropriate sound effects, the overall performance is still impressive and has the potential to...

Jul 31, 2024

OpenAI Opens Limited Access to Advanced Voice Mode for ChatGPT

OpenAI introduces Advanced Voice Mode for ChatGPT Plus users, enabling more naturalistic conversations with the AI chatbot on mobile apps. Key Details of the Alpha Rollout: OpenAI has begun the gradual release of Advanced Voice Mode to a select group of ChatGPT Plus subscribers, with plans to expand access to all Plus users by fall: The feature allows users to engage in real-time conversations with four AI-generated voices on the ChatGPT mobile app for iOS and Android. ChatGPT will respond naturalistically, handling interruptions and conveying emotions through its utterances and intonations. Users selected for the alpha phase will receive an...

Jul 26, 2024

ChatGPT Voice Mode Expected to Launch Next Week for (Some) Plus Users

ChatGPT Voice Mode set to launch next week for Plus subscribers: OpenAI has announced that its highly anticipated Voice Mode feature for ChatGPT will begin rolling out to Plus subscribers starting next week, giving AI enthusiasts something new to look forward to. Limited alpha rollout to gather feedback: The initial release of Voice Mode will be limited to a small group of Plus users as part of an alpha test, allowing OpenAI to gather feedback and make improvements before expanding access based on what they learn: OpenAI CEO Sam Altman confirmed the alpha rollout timeline in response to a user...

Jul 25, 2024

Congresswoman Makes History with First AI-Voice Speech on House Floor

Rep. Jennifer Wexton of Virginia made history as the first member of Congress to use an AI-generated voice to speak on the House floor, highlighting how technology can help people with disabilities participate more fully in public life. Wexton's groundbreaking use of AI: On Thursday, Rep. Wexton, who has progressive supranuclear palsy (PSP), used an AI model of her voice to deliver remarks on the House floor: Wexton said her condition has "robbed me of my ability to use my full voice and move around in the ways that I used to," and she will likely need to use a...

Jul 22, 2024

Google Gemini Gets More Human-Like Female Voice

Google Gemini rolls out first new voice for Android users: Google has started rolling out a new, more natural-sounding female voice for its AI assistant, Google Gemini, on Android phones, though the option is not yet available on iOS or the web client. Key details of the new voice: The new voice is noticeably more natural-sounding compared to the slightly robotic original Gemini voice. Currently, there is no way to switch back to the original voice in the settings after the new voice is activated. The rollout appears to be in a testing phase, as the new voice option is...

Jul 16, 2024

AI Voice Cloning Restores Congresswoman’s Speech, Highlighting AI’s Potential for Accessibility

A US Congresswoman has regained a more natural-sounding voice, thanks to AI voice cloning technology, after losing her ability to speak due to a rare neurological disorder. This development showcases the transformative potential of AI in enhancing accessibility and empowering individuals facing speech impairments. Restoring a lost voice: Rep. Jennifer Wexton, who suffers from Progressive Supranuclear Palsy (PSP), had been relying on robotic text-to-speech tools to communicate. ElevenLabs, an AI voice company, used over an hour of Wexton's pre-diagnosis audio clips to create a digital replica of her original voice, capturing its unique characteristics. The AI-generated voice was first used...

Jul 12, 2024

Microsoft’s VALL-E 2 Achieves Human-Level Speech Synthesis, Sparking Ethical Debate

Microsoft's VALL-E 2 reaches human parity in text-to-speech synthesis, raising ethical concerns about potential misuse. Key breakthrough: VALL-E 2, Microsoft's latest text-to-speech (TTS) generator, has achieved "human parity" for the first time, producing speech indistinguishable from a human voice: The model only needs a few seconds of audio to reproduce a voice that matches or exceeds the quality of human speech when compared to standard speech libraries. VALL-E 2 consistently generates high-quality, natural-sounding speech even for traditionally challenging phrases due to its "Repetition Aware Sampling" and "Grouped Code Modeling" features. Potential applications and risks: While Microsoft sees beneficial uses for...

Jul 11, 2024

Microsoft’s VALL-E 2 Achieves Human-Like Speech, Deemed Too Risky for Release

Microsoft's VALL-E 2 text-to-speech AI reaches claimed "human parity" in recreating convincing human voices, but is deemed too risky for public release due to potential misuse. Key advancements enable more natural speech generation: VALL-E 2 incorporates two key features to generate high-quality, human-like speech from just a few seconds of audio: "Repetition Aware Sampling" improves the AI's text-to-speech conversion by addressing repetitions of small language units called "tokens," preventing infinite loops and making the speech pattern more natural and fluid. "Grouped Code Modeling" enhances efficiency by reducing the sequence length of tokens the model processes in a single input, speeding...
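To make the sequence-length idea behind "Grouped Code Modeling" concrete, here is a toy Python sketch. It is not Microsoft's implementation: it assumes the audio has already been turned into a flat list of integer codec tokens and simply packs consecutive tokens into fixed-size groups, so a model would process one position per group instead of one per token. The function name `group_codes`, the group size of 4, and the padding value of -1 are illustrative assumptions.

```python
from typing import List, Tuple


def group_codes(codes: List[int], group_size: int) -> List[Tuple[int, ...]]:
    """Pack a flat codec-token sequence into consecutive fixed-size groups.

    Hypothetical helper for illustration only. Each group would be treated
    as a single position by a language model, so the number of positions
    shrinks by roughly a factor of group_size.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    # Pad the tail with a hypothetical PAD id so every group is full.
    pad_id = -1
    remainder = len(codes) % group_size
    if remainder:
        codes = codes + [pad_id] * (group_size - remainder)
    # Slice the (padded) list into non-overlapping groups.
    return [tuple(codes[i:i + group_size]) for i in range(0, len(codes), group_size)]


if __name__ == "__main__":
    codes = list(range(10))        # ten made-up codec tokens
    groups = group_codes(codes, 4)
    print(len(codes), "->", len(groups))  # 10 -> 3
```

With a group size of G, a sequence of n tokens becomes roughly n/G positions, which is the kind of length reduction the summary credits with speeding up processing; how VALL-E 2 actually embeds and decodes each group is not covered here.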

Jul 10, 2024

AI Voice Gaming Startup Volley Raises $55M, Signaling Industry Shift

Volley, an AI-powered game company, has secured $55 million in Series C funding to further develop its portfolio of voice-controlled games, signaling the growing potential of AI and voice technologies in the gaming industry. Key investors and Volley's market position: The Series C round was co-led by Lightspeed and Microsoft's M12 Ventures, with participation from several other notable investors, highlighting the strong interest in Volley's unique approach to gaming: Volley is considered the highest-grossing AI-powered game company, with popular voice-enabled games available on Amazon Alexa, Fire TVs, Roku TVs, and mobile devices. The company's games, which include titles like Jeopardy!,...

Jul 6, 2024

AI Voice Assistant Moshi’s Real-Time Conversation Challenges ChatGPT, Embraces Open-Source Approach

Kyutai's Moshi AI voice assistant offers real-time conversation capabilities, potentially beating OpenAI's ChatGPT to one of its most anticipated features. Key features and development process: Moshi is designed to provide lifelike voice conversations, powered by large language models and fine-tuned using over 100,000 synthetic dialogues: It can speak in various accents, offers 70 different emotional and speaking styles, and can even handle two audio streams simultaneously. Kyutai collaborated with a professional voice artist to enhance Moshi's voice quality. The AI assistant integrates both text and audio training, optimized for multiple backends, allowing it to run on devices like laptops...

Jul 5, 2024

Moshi Chat: Open-Source Voice AI Challenges GPT-4, Heralds Changes to Come for Smart Home Products

New native speech AI model Moshi Chat offers a glimpse into the future of voice assistants, but still lags behind OpenAI's GPT-4 in coherence and knowledge: Moshi Chat, developed by French startup Kyutai, is a lightweight AI model that can run locally and offline, showing the potential for advanced voice AI in smart home devices. Moshi's capabilities and limitations: Moshi Chat aims to provide a similar experience to GPT-4o, understanding tone and allowing interruptions, but falls short in longer conversations: The AI becomes incoherent and loses context after the first minute or so of conversation, likely due to limited compute...

Jul 5, 2024

ElevenLabs Launches Free AI Voice Isolator, Rivaling Adobe’s Offering

ElevenLabs, known for its AI voice cloning and speech synthesis, has launched a free AI Voice Isolator tool to remove unwanted background noise from audio content, directly competing with Adobe's similar offering. Key features and functionality: The AI Voice Isolator allows creators to enhance the speech quality of their content by removing ambient noise and sounds in the post-production stage: Users can upload their audio files, such as films, podcasts, or YouTube videos, to the ElevenLabs platform, where the AI models process the content to detect and remove unwanted noise. The tool aims to extract clear dialogue, similar to studio-recorded...

Jul 3, 2024

Iconic Actors Voice AI Audiobooks, Bringing Classics to Life in ElevenLabs Reader App

ElevenLabs enhances its AI storytelling app with the voices of iconic actors, creating immersive audio experiences that bring classic tales to life in a whole new way. Introducing ElevenLabs Reader: The company's iPhone app allows users to listen to any content using a vast library of AI-generated voices, including original, synthetic, cloned, and now iconic voices: The app, currently available in the U.S. and UK, lets users input text, links, or files and have them read aloud in the voice of their choice, even when the phone is locked. ElevenLabs Reader offers a wide range of storyteller voices suitable for...

Jul 2, 2024

Celebrity Voices Resurrected: ElevenLabs Brings Dead Hollywood Stars Back to Life

The AI voice startup ElevenLabs has expanded its Reader app to include AI-generated voices of late Hollywood celebrities Judy Garland, James Dean, Burt Reynolds, and Sir Laurence Olivier, in partnership with intellectual property rights firm CMG Worldwide. Reader app converts text to AI-narrated audio: The iOS app allows users to input any digital text, such as articles, PDFs, or e-books, and instantly generate an AI voiceover in various voices and accents, with a green highlighter following along with the narration. The addition of iconic celebrity voices enables users to experience content narrated in the voice of late stars, potentially driving...
