Audio – Page 2

Aug 1, 2025

Fal builds generative media cloud to solve AI’s speed problem

Fal has built a generative media cloud platform optimized for speed and performance in AI inference, particularly for image, video, and audio models. The company's approach emerged from necessity during the AI infrastructure crunch, when the founding team had to engineer creative solutions around limited GPU capacity to deliver fast, reliable AI model inference. What you should know: Fal positions itself as more than just an inference platform, branding its service as a "generative media cloud" that prioritizes user experience alongside technical performance. The platform specializes in image, video, and audio model inference, addressing the common frustration users face with...

Jul 29, 2025

Google’s NotebookLM adds video overviews with AI-generated slideshows

Google's NotebookLM has launched Video Overviews, a new AI feature that creates narrated slideshows from users' saved documents. The addition gives users a visual alternative to the platform's existing Audio Overviews, helping make complex information more accessible through AI-generated visuals combined with content from uploaded materials. What you should know: Video Overviews transforms documents into visual presentations with AI-generated narration and graphics. The AI host creates new visuals to illustrate key points while incorporating images, diagrams, quotes, and numbers directly from user documents. Google describes the feature as "uniquely effective for explaining data, demonstrating processes and making abstract concepts more...

Jul 24, 2025

Sora vs. Veo 3, Fight! OpenAI readies Sora 2 with audio to challenge Google

OpenAI is reportedly preparing to release Sora 2, the next iteration of its text-to-video AI model, setting up a direct confrontation with Google's Veo 3 model. This battle represents more than just visual quality—it's about creating complete audiovisual experiences that could define the future of AI-generated content creation. The big picture: While OpenAI's original Sora impressed with high-quality visuals, it produced silent videos, whereas Google's Veo 3 already integrates synchronized audio, speech, and environmental sounds. Veo 3 can generate videos where viewers hear "the gentle splash of liquid, the clink of ceramic, and even the hum of a diner around...

Jul 22, 2025

Amazon acquires Bee’s $49.99 AI wearable for conversation tracking

Amazon is acquiring Bee, a startup that makes a $49.99 AI-powered wearable device that continuously listens to and transcribes conversations to generate daily summaries and insights. The acquisition signals Amazon's push into personal AI wearables, though the device has faced accuracy challenges in distinguishing real conversations from background media. What you should know: Bee's wearable functions like a Fitbit but focuses on audio processing rather than fitness tracking. The device transcribes conversations between users and people around them, creating personalized daily summaries, reminders, and suggestions through the Bee app. Users can grant the device access to emails, contacts, location data,...

Jul 21, 2025

AI sonar “AquaEye” helps Texas dive teams find flood victims 80% faster

Dive teams in Central Texas are using AI-powered sonar devices called AquaEye to search for victims in flood recovery operations following deadly floods that killed at least 135 people and left over 100 missing. The handheld devices use artificial intelligence to distinguish between debris and human bodies underwater, with search teams reporting an 80-90 percent reduction in search times compared to traditional methods. Why this matters: Recovery efforts face severe challenges from poor visibility, debris-filled rivers, and ongoing flash flooding along the Guadalupe River, making traditional search methods nearly impossible in conditions with only 6-inch visibility. How it works: The...

Jul 17, 2025

Adobe launches AI filmmaking tools with voice-powered sound generation

Adobe has launched new generative AI filmmaking tools that allow users to create sound effects using voice recordings of silly noises and control video generation with reference footage. The tools, available in beta on the Firefly app, provide filmmakers with more intuitive ways to generate custom audio and video content compared to traditional text-only prompts. How the sound effects tool works: The Generate Sound Effects feature lets users record onomatopoeia-like sounds that are synchronized with video footage to create realistic audio effects. Users can play a video of a horse walking and simultaneously record "clip clop" noises in time with...

Jul 16, 2025

Watch out, secretaries: Google’s AI assistant now makes business calls for users nationwide

Google has launched a new AI-powered feature that allows users across the United States to have artificial intelligence make phone calls to local businesses on their behalf. The service, now available through Google Search, enables users to request pricing and availability information from businesses like pet groomers, dry cleaners, and auto shops without having to make the calls themselves. How it works: Google's Gemini model uses Duplex technology to conduct business calls, announcing itself as an AI assistant representing a customer. Users can select "have AI check pricing" from business listings in Search results for supported service types. The system...

Jul 16, 2025

Mistral releases Voxtral, an open-source voice AI that challenges paid alternatives

Mistral has released Voxtral, an open-source voice AI model that goes beyond basic transcription to offer summarization and speech-triggered functions, challenging paid alternatives from companies like ElevenLabs and Hume AI. The Apache 2.0-licensed model comes in 24B and 3B parameter versions, with Mistral claiming it bridges the gap between proprietary speech recognition systems and existing open-source alternatives that often lack semantic understanding. What you should know: Voxtral offers comprehensive voice processing capabilities that extend far beyond traditional transcription services. The model can process up to 30 minutes of audio for transcription or 40 minutes for audio understanding with a 32K...

Jul 10, 2025

Google’s Gemini AI now turns photos into 8-second videos with audio

Google has launched a new Gemini AI feature that transforms photos into video clips using its Veo 3 video model. The feature generates eight-second videos complete with AI-generated audio, including background noises, environmental sounds, and speech that sync to the visuals, expanding Google's AI capabilities beyond text and static images. What you should know: The photo-to-video capability is now available to Google AI Ultra and Pro subscribers in select regions, rolling out on web today and mobile devices throughout the week. Users access the feature by clicking "tools" in the prompt bar, selecting "video," and uploading their photo alongside a...

Jun 23, 2025

Google’s Flow AI creates cinematic videos with sound and dialogue. Here’s how to enjoy it.

Google's Flow AI transforms text descriptions into professional-quality videos complete with sound effects, dialogue, and sophisticated camera work. Unlike basic AI video generators that produce silent clips from simple prompts, Flow offers filmmaking-grade controls that let users specify camera angles, add orchestral soundtracks, and script character dialogue—essentially putting a full video production studio at your fingertips. This comprehensive capability comes with a learning curve, but the results can be remarkably cinematic. After extensively testing Flow's features across multiple video projects, here's a complete guide to creating compelling AI-generated videos that go far beyond basic text-to-video conversion. Understanding Flow's pricing and...

Jun 20, 2025

Deezer battles AI music fraud as streaming scams reach 20K daily tracks

Music streaming service Deezer will begin flagging AI-generated songs on its platform as part of an escalating battle against streaming fraud. The Paris-based company reports that 18% of daily uploads—roughly 20,000 tracks—are now completely AI-generated, nearly doubling from 10% just three months earlier, with fraudsters using these songs to manipulate streams and collect royalties illegally. The big picture: AI-generated music is becoming a vehicle for large-scale streaming fraud, with Deezer estimating that seven in 10 listens of AI songs come from bots rather than humans. Fraudsters "create tons of songs" and use automated systems to inflate play counts, earning substantial...

Jun 16, 2025

ByteDance’s free AI tool Pippit transforms photos into talking videos

Social media's latest viral sensation isn't a dance or a meme—it's artificial intelligence creating talking baby podcasts. This unexpected trend showcases the growing influence of AI-powered video creation tools that are transforming how businesses and creators produce content at scale. At the center of this phenomenon is Pippit, a free AI video generator developed by CapCut, ByteDance's video editing platform. While ByteDance is best known for owning TikTok, CapCut has emerged as a major player in the content creation space, and Pippit represents the company's push into AI-generated video content. The tool allows users to transform static images into dynamic...

Jun 13, 2025

Google tests AI audio overviews directly in mobile search results

Google has rolled out a test feature that integrates AI-generated Audio Overviews directly into mobile search results, allowing users to generate podcast-style discussions about their queries. The experiment, available through Google Labs, expands the company's audio AI capabilities beyond existing tools like NotebookLM and Gemini, bringing conversational AI content directly to the search experience. How it works: Users can generate AI-powered audio discussions for certain search queries through a new button that appears on the first page of mobile search results. When searching for topics like "How do noise cancellation headphones work?", a "Generate Audio Overview" button appears beneath the...

Jun 13, 2025

San Francisco police test AI software that writes reports from body camera footage

The San Francisco Police Department is testing artificial intelligence software called Draft One that generates first drafts of police reports from body-worn camera recordings. The pilot program, which began last month, involves 54 officers at two stations using the AI tool for citations and misdemeanor cases, excluding domestic violence, sexual assault, and DUI incidents. What you should know: The AI software extracts information from body camera footage to create initial report drafts, though officers must still review, edit, and sign off on accuracy before submission. As of Tuesday, "no current report narrative drafted by Draft One involves an arrest," according...

Jun 13, 2025

No Suno for you! Sound editors ban AI from Golden Reel Awards over ethical concerns

The Motion Picture Sound Editors (MPSE) has banned generative AI-created sound work from eligibility for its Golden Reel Awards, citing unresolved legal and ethical standards around AI use. The decision positions the prestigious sound editing awards as a key battleground in the entertainment industry's ongoing struggle to define boundaries for artificial intelligence in creative work. What you should know: The MPSE board made the decision based on concerns about the current lack of established standards for AI use in creative fields. "Standards for the legal and ethical use of Generative AI have yet to be established and are far from...

Jun 12, 2025

AI decodes sperm whale language revealing complex communication patterns

Researchers at the CETI project are using artificial intelligence to decode sperm whale communication, revealing that these marine mammals possess a far more sophisticated language system than previously understood. The breakthrough suggests that sperm whales may be the first non-human species to demonstrate "duality of patterning"—a linguistic complexity once thought to be uniquely human—potentially revolutionizing our understanding of animal intelligence and communication. What they discovered: CETI researchers have identified 21 distinct types of "codas" or call systems in sperm whale communication that show remarkable complexity. The team found that sperm whales can produce 10 times more meanings than previously believed,...

Jun 9, 2025

YouTube creators fight AI music flood with “No AI” playlist labels

YouTube creators are increasingly labeling their background music playlists as "No AI" as artificial intelligence-generated content floods the platform's ambient music space. The trend highlights how AI-generated music has become so pervasive that curators must actively distinguish their human-created content, while some channels exploit AI tools to rapidly produce monetizable playlists without crediting artists or disclosing their methods. What you should know: AI-generated music has infiltrated YouTube's popular background music ecosystem, particularly affecting lo-fi and instrumental playlists that millions use for studying and relaxation. Music influencer Derrick Gee discovered that one popular lo-fi playlist channel used almost entirely AI-generated tracks,...

Jun 2, 2025

ElevenLabs launches advanced AI voice agents for conversations

ElevenLabs' Conversational AI 2.0 marks a significant evolution in voice agent technology, introducing advanced capabilities that make AI interactions more natural and effective. Released just five months after the original platform, this update delivers enterprise-ready features designed to create more sophisticated and trustworthy AI communications. The new system combines human-like conversation capabilities with enhanced knowledge integration and operational efficiency, positioning it as a comprehensive solution for businesses seeking to deploy advanced voice AI. The big picture: ElevenLabs has engineered more natural human-AI interactions through custom models that make conversations flow more intuitively. The new turn-taking model analyzes conversational cues like...

May 28, 2025

Veo 3 brings audio to AI video and tackles the Will Smith Test

Google's Veo 3 represents a significant leap in AI video generation by introducing synchronized audio capabilities, enabling users to create realistic videos with voices, dialog, and sound effects. This advancement marks a notable evolution from the silent AI videos of 2022-2024, though it still exhibits quirks like the infamous "crunchy spaghetti" effect when generating eating sounds. As AI video technology rapidly improves, it raises important questions about the potential for creating increasingly convincing synthetic content of real people. The big picture: Google has launched Veo 3, a groundbreaking AI video synthesis model that generates synchronized audio tracks with eight-second high-definition...

May 27, 2025

Google AI creates lifelike Will Smith double eating virtual spaghetti

Google's Veo 3 marks a significant leap in AI video generation by introducing synchronized audio capabilities, a feature previously absent from major AI video tools. This development enables AI-generated videos to include voices, dialog, and sound effects within eight-second HD clips, significantly advancing the realism of synthetic media. The evolution from the primitive, silent AI videos of 2022-2024 to today's more sophisticated audiovisual creations demonstrates the accelerating development of generative AI technology. The big picture: Google's new Veo 3 AI model represents the first major AI video generator capable of creating synchronized audio tracks alongside video content. The model can...

May 26, 2025

NotebookLM Android app falls short of its AI desktop counterpart

Google's NotebookLM stands out in the AI landscape by offering users control over their information sources, making it uniquely valuable for research and reference tasks. While the tool has gained a devoted following for its ability to help users parse complex documents like insurance quotes and technical manuals, its newly released Android app falls significantly short of the web version's capabilities. This disconnect between platforms highlights the challenges tech companies face when translating feature-rich web applications to mobile environments. The big picture: Google's NotebookLM distinguishes itself from general-purpose AI chatbots by allowing users to specify which sources the AI draws...

May 26, 2025

Deciphering Flipper: AI system DolphinGemma aims to decode dolphin language

Google's quest to decipher dolphin communication marks a significant leap in interspecies understanding, blending cutting-edge AI with decades of marine research. The DolphinGemma project represents the first serious attempt to apply large language model technology to non-human vocalizations, potentially opening new windows into how these highly intelligent marine mammals communicate and paving the way for more meaningful interactions between humans and other species. The big picture: Google has partnered with Georgia Tech and the Wild Dolphin Project to develop DolphinGemma, a large language model specifically designed to analyze and generate dolphin vocalizations. The project leverages 40 years of acoustic data...

May 24, 2025

Google’s Veo 3 offers realism and speed but faces technical hurdles

Google's Veo 3 represents a significant advancement in AI video generation technology, offering unprecedented realism and audio capabilities for casual creators. While the tool can produce remarkably lifelike 8-second clips complete with dialogue and sound effects in under two minutes, it still faces challenges with prompt interpretation, audio consistency, and complex scene handling. As AI-generated video becomes increasingly indistinguishable from human-made content, Veo 3 raises important questions about the blurring lines between fact and fiction in digital media. The big picture: Google's latest AI video generator, Veo 3, delivers impressive realism but comes with limitations and a substantial price tag...

May 23, 2025

AI Will Smith Eating Spaghetti 2: Impresario of Disgust

Google DeepMind's latest Veo 3 video generation model represents a significant leap in AI-generated video realism, now capable of creating photorealistic footage complete with synchronized audio. The evolution is starkly illustrated by comparing a viral "Will Smith eating spaghetti" video from two years ago with a new version that features dramatically improved visuals plus AI-generated eating sounds—highlighting both the impressive technical progress and potential discomfort these advances might trigger as the technology becomes increasingly indistinguishable from reality. The big picture: Google's Veo 3 model marks a transition from silent AI-generated videos to fully audio-enabled synthetic content that approaches photorealism. The...
