Audio – Page 3

May 23, 2025

Google’s Veo 3 AI creates stunning videos: 5 wildest examples

Google's new Veo 3 AI video generator has rapidly captivated the internet with its ability to create hyper-realistic videos complete with native audio generation. Released just days ago, the technology represents a significant advancement in AI-generated media, enabling users to create videos with voice-overs, soundscapes, music, and dialogue simply through text prompts. As this technology becomes more accessible, it raises important questions about the future of media creation while simultaneously showcasing the remarkable capabilities of modern AI systems. The big picture: Google's Veo 3 has gone viral for its unprecedented ability to generate realistic videos with accompanying audio that are...

May 23, 2025

ScotRail riders push back on new AI-generated voice announcer Iona

ScotRail's introduction of an AI-generated voice announcer named Iona has sparked mixed reactions among Scottish train passengers. This deployment represents a significant shift in how public transportation systems are incorporating synthetic voice technology, balancing cost efficiency with the cultural importance of regional accents. The controversy highlights broader tensions around AI replacing traditionally human roles, particularly in contexts where local dialect and pronunciation carry cultural significance. The big picture: ScotRail has implemented a synthetic Scottish-accented voice called Iona to deliver station announcements, replacing pre-recorded human voice artists. The AI-generated voice, developed by ReadSpeaker, reads out typed messages entered by train drivers....

May 22, 2025

Melania Trump partners with ElevenLabs to launch AI audiobook

Melania Trump has embraced AI technology for her memoir's audiobook release, partnering with ElevenLabs to create an AI-generated version of her voice. This collaboration represents a significant intersection of celebrity publishing and artificial intelligence, potentially setting a precedent for how public figures might leverage voice synthesis technology to expand their reach across multiple languages and platforms. The big picture: First Lady Melania Trump has released an AI-narrated audiobook version of her memoir "Melania," using a synthetic replica of her own voice created by AI startup ElevenLabs. The $25 audiobook launched May 22 in English, with versions in Spanish, Portuguese,...

May 22, 2025

Gaming YouTuber claims AI voice cloning has taken the words right out of his mouth

YouTube gaming commentator Mark Brown is facing an increasingly common problem in the AI era: someone has stolen his voice. A channel called Game Offline Lore published videos using an AI-generated version of Brown's distinctive British voice, creating content he never narrated or authorized. This incident highlights how AI voice cloning is enabling a new form of identity theft that goes beyond traditional content plagiarism to appropriate someone's actual vocal identity. The big picture: A YouTube channel is using an AI clone of gaming commentator Mark Brown's voice without permission, representing a disturbing evolution of digital identity theft. The unauthorized...

May 22, 2025

AI brings Orson Welles back as virtual tour guide

Orson Welles returns as an AI-generated voice nearly four decades after his death, bringing his iconic vocal presence to location-based storytelling app Storyrabbit. This partnership between podcast company Treefort Media and the Orson Welles Estate represents a growing trend of reviving cultural icons through AI voice technology, blending nostalgia with cutting-edge tech to create new forms of interactive entertainment experiences. How it works: Storyrabbit generates personalized 60-90 second audio stories about specific locations, delivered by the host of your choice, including the newly added Welles voice option. The app uses your current location or a selected point of interest to...

May 22, 2025

Amazon AI summarizes products into audio snippets for shoppers

Amazon is bringing AI-generated audio product overviews to its shopping experience, allowing customers to listen to key features instead of just reading them. This feature represents Amazon's continued push to enhance shopping with AI technologies, similar to its existing AI assistants and review summaries. The audio highlights aim to recreate the in-store experience of having a salesperson explain products, potentially addressing customer pain points around product research. The big picture: Amazon has begun testing AI-powered audio summaries for select products through a new "Hear the highlights" button that lets customers listen to product features instead of reading them. How it...

May 22, 2025

AI masters audio-visual links without human guidance

MIT researchers have developed a new approach that helps artificial intelligence learn connections between audio and visual data in the same way humans naturally connect sight and sound. This advancement could enhance applications in journalism and film production through automatic video and audio retrieval, while eventually improving robots' ability to understand real-world environments where visual and auditory information are closely linked. The technique builds on previous work but achieves finer-grained alignment between video frames and corresponding audio without requiring human labels. The big picture: MIT's improved AI system learns to match audio and visual elements in videos with greater precision,...

May 21, 2025

Google’s new Veo 3 AI video generation is scary good and shockingly impressive

Google's Veo 3 video generation engine represents a significant leap in AI-generated content, blurring the line between synthetic and authentic media. By adding synchronized audio capabilities and enhanced visual fidelity to AI-generated videos, Google has created a tool that produces content nearly indistinguishable from human-created footage. This technological advancement signals a concerning new phase in the evolution of digital misinformation, as these hyperrealistic AI videos could potentially deceive viewers and further complicate an internet landscape already struggling with truth and authenticity. The big picture: Google's newly announced Veo 3 engine combines synchronized audio with enhanced video generation to create remarkably...

May 20, 2025

AI expands creative horizons with 4 new directions including audio and XR

Artificial intelligence continues to evolve beyond text-based applications into new creative domains, opening fresh possibilities for businesses and creators. A recent panel at an Imagination in Action event revealed how AI researchers and companies are expanding neural network technologies into audio processing, gaming analysis, extended reality, and biomimicry. These emerging directions highlight how AI is becoming increasingly multisensory and adaptive, requiring enterprises to understand both technological capabilities and the changing expectations of AI-native generations. 1. Mining the audio world: Audio represents a largely untapped frontier for AI development compared to text-based systems. While text has been the primary focus of...

May 19, 2025

AI assistant Gemini coming to Samsung and Sony earbuds

Google plans to expand Gemini AI assistant access to various audio devices across its ecosystem, marking a strategic shift away from Google Assistant as the company pushes its newer AI technology to more hardware. This expansion brings enhanced AI capabilities to earbuds and wearables, allowing for more sophisticated interactions than previously possible with Google Assistant. The big picture: Samsung will bring Gemini to its Galaxy Buds 3 and Galaxy Watches in the "coming months," while Sony earbuds will also gain Gemini integration as part of Google's broader expansion to cars, TVs, and Pixel Watches. Key capabilities: Gemini on these audio...

May 19, 2025

Whisper AI transcribes up to 8x faster with new Inference Endpoints

Hugging Face has launched a dramatically improved Whisper model deployment option on Inference Endpoints, delivering up to 8x faster performance for audio transcription services. This advancement makes powerful transcription capabilities more accessible and cost-effective, bringing enterprise-grade speech recognition within reach of more organizations through optimized open-source technology. The big picture: Hugging Face's new Whisper deployment leverages the open-source vLLM project to achieve substantial performance gains without sacrificing transcription quality. The solution specifically targets audio transcription efficiency using Whisper Large V3, which demonstrates nearly 8x improvement in real-time factor (RTFx) compared to previous versions. Word Error Rate (WER) evaluations across eight...
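The two metrics named in this summary can be illustrated with a minimal Python sketch. It assumes the common definitions: RTFx as seconds of audio transcribed per second of processing, and WER as word-level edit distance divided by the number of reference words. The function names `rtfx` and `word_error_rate` are hypothetical, and nothing here reflects Hugging Face's actual benchmarking code.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumed definitions, a higher RTFx means faster transcription and a lower WER means more accurate output, so a deployment can raise the first without changing the second.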

May 19, 2025

NBC adopts AI-generated voice for NBA broadcasts

NBC's decision to resurrect the voice of deceased sportscaster Jim Fagan using AI for its upcoming NBA coverage marks a significant advancement in synthetic media use in mainstream broadcasting. By digitally recreating the voice of a beloved narrator who died in 2017, NBC is blending nostalgia with cutting-edge technology to enhance its basketball programming. This development highlights the growing trend of posthumous digital recreations in entertainment and raises important questions about the future relationship between AI-generated content and traditional media production. The big picture: NBC will use an AI-generated version of Jim Fagan's voice across selected NBA content beginning October...

May 17, 2025

AI-powered Darth Vader shocks fans with unexpected profanity

Artificial intelligence voice technology faces a new embarrassing mishap as Darth Vader's character in Fortnite temporarily spewed profanity and inappropriate language. The incident highlights ongoing challenges with AI voice implementations in mainstream entertainment, exposing the risks of deploying synthetic voice technology based on iconic performers in interactive environments where user input can trigger unexpected responses. The incident: Epic Games' Fortnite featured an AI-powered Darth Vader that briefly responded to players with profanity and inappropriate language. A viral exchange occurred when Twitch streamer Loserfruit triggered the character to repeat expletives, with Vader responding, "Such vulgarity does not become you, Padme." According...

May 16, 2025

Evernote adds AI-powered transcription for audio, video, and images

Evernote's new AI Transcribe feature represents a significant value addition to the note-taking platform following its difficult transition under Bending Spoons ownership. The multimodal transcription tool, which quickly and accurately converts audio, video, and image files to text, marks a notable technological advancement for the service that has recently faced criticism for staff reductions and price increases. The big picture: Evernote has launched a powerful AI-based transcription tool that works across multiple media formats with impressive speed and accuracy. The feature is available both within the main Evernote application and as a standalone web tool for easy access. Users can...

May 15, 2025

AI narration impresses on Audible, but human voices still reign

Audible's introduction of AI-powered narration technology represents a significant shift in audiobook production, creating tension between accessibility and artistic quality in digital storytelling. While this technology promises to transform millions of unrecorded books into accessible audio formats, it raises important questions about the value of human performance, the future of voice acting careers, and where automation should complement rather than replace human artistry in creative industries. The big picture: Audible is rolling out a fully integrated AI production pipeline that can auto-generate entire audiobooks with synthetic voices, targeting publishers looking for faster and cheaper alternatives to human narration. The technology...

May 13, 2025

What’s your pleasure? Spotify AI DJ now responds to user song requests

Spotify is expanding the functionality of its AI DJ feature by introducing voice command capability, giving Premium subscribers greater control over their music experience. This update transforms the DJ feature from a passive, algorithm-driven playlist generator into an interactive assistant that can respond to specific requests for artists, genres, moods, and even conceptual music experiences. The enhancement addresses a key limitation of the original feature by allowing users to actively shape their listening sessions rather than simply accepting the AI's selections. How it works: Spotify Premium users can now use English voice commands to personalize their AI DJ experience by...

May 12, 2025

SoundCloud faces backlash over announcement-less AI terms in user agreement

SoundCloud's quiet addition of an AI training clause to its terms of service has sparked user concern, becoming the latest in a series of controversies where tech companies claim broad rights to use creative content for AI development. This incident highlights the growing tension between digital platforms' AI ambitions and creators' rights to control how their work is used, especially as companies increasingly look to leverage user-generated content for advancing their AI capabilities. The big picture: SoundCloud has joined a growing list of tech companies facing backlash after adding ambiguous AI training provisions to their terms of service without clear...

May 9, 2025

AI headphones clone multiple voices for real-time translation

Yes, please do listen to the voices in your head. Researchers have developed a groundbreaking AI headphone system that can simultaneously translate multiple speakers in real time, potentially eliminating language barriers in multilingual group conversations. The Spatial Speech Translation system not only converts foreign languages into English text but also preserves each speaker's unique vocal characteristics and emotional tone, creating a more natural translation experience than existing technologies. This innovation could transform international communication by enabling people to confidently express themselves across language divides. How it works: The University of Washington's Spatial Speech Translation system uses AI to track and translate...

May 8, 2025

Is voice-driven ambient AI our future, as Zuckerberg predicts?

The convergence of voice computing and AI represents a potential paradigm shift in personal computing, with both Meta CEO Mark Zuckerberg and Apple SVP Eddy Cue pointing toward a future where typing may no longer dominate our digital interactions. This alignment between two tech giants that often clash suggests a shared vision of how AI could fundamentally transform how humans interact with technology in the coming decade. The big picture: Mark Zuckerberg highlighted at LlamaCon that voice interaction is currently "under-indexed," with 95% of digital interaction being text-based despite voice likely playing a much larger role in the future. Why...

May 7, 2025

AI-powered signing avatars bring communication breakthrough for deaf users

Silence Speaks is pioneering AI-powered sign language translation to bridge critical communication gaps for deaf and hard-of-hearing communities. This British startup addresses a global challenge affecting over 70 million sign language users who face isolation in noisy environments like train stations, hospitals, and classrooms where background noise and crosstalk make understanding speech nearly impossible. By developing an avatar that accurately translates text to British Sign Language (BSL) with emotional nuance, the company aims to transform accessibility for the 150,000+ BSL users in the UK. The big picture: Silence Speaks has created an AI-powered avatar that translates text into British Sign...

May 7, 2025

AI revives Jim Fagan’s voice for NBC’s NBA comeback

NBC Sports is reviving a voice from its basketball broadcasting past through AI technology, setting the stage for its NBA comeback after more than two decades. The network announced it will use an AI-generated voice of the late Jim Fagan, who died in 2017, for its NBA coverage beginning in October 2025 when NBC's new 11-year media rights deal takes effect. This nostalgic move aims to recreate the iconic sound that defined NBC's NBA coverage during the Jordan-era 1990s, while raising questions about how networks will blend tradition with technology in sports broadcasting. The big picture: NBC Sports is employing...

May 6, 2025

Suno 4.5 AI music creator launches with major upgrades

Suno's latest AI music generation tool introduces significant vocal enhancements, expanded genre options, and a new prompt helper that elevates its creative capabilities. The release of version 4.5 marks a substantial upgrade for subscribers, though some limitations remain regarding song structure control. This upgrade represents the continuing evolution of AI music creation tools as they become increasingly sophisticated at translating human descriptions into complex musical compositions. The big picture: Suno has released version 4.5 of its AI music creation platform exclusively for Pro and Premium subscribers, offering substantial improvements to music generation capabilities. The update features enhanced vocals, expanded genre...

May 5, 2025

RealtimeVoiceChat enables natural AI conversations on GitHub

Real-time voice chat technology is advancing rapidly, enabling natural-sounding AI conversations with minimal latency. This open-source project demonstrates how sophisticated speech recognition, large language models, and text-to-speech systems can be integrated to create fluid, interruptible voice interactions that mimic human conversation patterns, showcasing the potential for more intuitive human-AI interfaces. Key features of this real-time AI voice chat system: 1. End-to-end voice conversation architecture: The system creates a complete voice interaction loop by capturing user speech through the browser, processing it server-side, and returning AI-generated speech. This architecture prioritizes low latency and natural conversational flow above all else. 2. Real-time processing...
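The loop described here (speech recognition, then a language model, then text-to-speech, with the option to interrupt) can be sketched roughly in Python. This is an illustration under stated assumptions, not the RealtimeVoiceChat project's code: `voice_turn` and the three `fake_*` stages are hypothetical stand-ins that pass text around as bytes so the example runs without any models.

```python
from typing import Callable, Iterable, Iterator


def voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], Iterable[str]],
    synthesize: Callable[[str], bytes],
    interrupted: Callable[[], bool] = lambda: False,
) -> Iterator[bytes]:
    """Run one conversational turn: speech in, streamed speech out."""
    text = transcribe(audio)
    # Synthesize sentence by sentence so playback can begin before the
    # full reply has been generated, which keeps perceived latency low.
    for sentence in generate(text):
        if interrupted():
            return  # the user started talking again, so drop the rest
        yield synthesize(sentence)


# Hypothetical placeholder stages; a real system would call models here.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> Iterator[str]:
    yield f"You said: {prompt}."
    yield "Anything else?"


def fake_synthesize(sentence: str) -> bytes:
    return sentence.encode("utf-8")
```

Calling `list(voice_turn(b"hello", fake_transcribe, fake_generate, fake_synthesize))` yields one audio chunk per sentence. Passing an `interrupted` callback that starts returning `True` cuts the reply short, which is one simple way to model an interruptible conversation.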

May 5, 2025

Motorola app disrupts Spotify’s personalized music recommendations

Motorola's new Playlist Studio introduces AI music curation that provides a refreshing alternative to the algorithmic limitations of major streaming services like Spotify. This innovative feature generates customized playlists based on creative prompts, potentially addressing a common frustration among streaming users: algorithmic recommendations that become too predictable and repetitive over time. However, the feature's exclusivity to Amazon Music presents a significant limitation for users invested in other streaming platforms. The big picture: Motorola has entered the AI race with its Razr 2025 series, offering features beyond the standard AI toolkit that most manufacturers implement. While the company includes expected AI...
