News/Audio

Oct 17, 2024

This Chrome extension detects AI deepfakes in seconds

A new tool to combat deepfakes: Hiya's Deepfake Voice Detector, a Chrome extension, aims to identify AI-generated audio content across various online platforms, addressing growing concerns about misinformation and fraud. The extension can detect deepfaked audio on popular sites like YouTube, X/Twitter, and Facebook, and it requires a verified email address for access. It analyzes a few seconds of audio to determine authenticity, providing an "Authenticity Score" for each piece of content examined. The tool's release is timed to help prevent political deepfakes from influencing viewers in the lead-up to the US federal election. How it works: The Deepfake Voice Detector focuses...

Oct 15, 2024

AI startup Gladia secures $16M for transcription tech

AI-powered transcription breakthrough: Gladia, a Paris-based startup, has secured $16 million in funding to develop an advanced real-time audio transcription and analytics engine, challenging established players in the industry. Gladia's innovative technology aims to provide multilingual, accent-adaptive transcription services with enhanced accuracy and speed. The funding round was led by XAnge, with participation from several other venture capital firms, bringing Gladia's total funding to $20.3 million since its founding in 2022. The company plans to use the capital to advance its R&D efforts and expand its product offerings, including the development of additional AI models. Technological edge and market positioning:...

Oct 10, 2024

OpenAI’s new Realtime API may supercharge all your smart speakers

OpenAI's Realtime API, a game-changer for voice interaction: OpenAI has introduced a new technology that could significantly enhance the capabilities of smart speakers and other voice-controlled devices, potentially transforming the way we interact with AI-powered assistants. Key features of the Realtime API: Enables developers to build fast speech-to-speech experiences into their applications; offers functionality similar to ChatGPT's Advanced Voice Mode; streams audio and input directly, allowing for natural interruptions in conversations; and improves upon previous methods that relied on transcribing scripts using speech recognition applications. Potential improvements for smart speakers: Better interruption detection, allowing users to correct misinterpreted commands more easily...

Oct 8, 2024

The ChromeOS Recorder app is coming to all Chromebooks

New ChromeOS Recorder app brings advanced audio capabilities to all Chromebooks: Google is set to introduce a new Recorder app with ChromeOS 130, offering enhanced recording and transcription features to users across all Chromebook devices. Key features and functionality: The Recorder app brings a range of powerful audio tools to Chromebooks, with some AI-powered capabilities reserved for Chromebook Plus devices. All Chromebooks will have access to the core recording and transcription features, which work offline and in real-time after downloading a relatively small 100MB model. The app's interface is inspired by the Pixel Recorder app, featuring a dual-column layout and...

Oct 7, 2024

Meta’s new AI tool generates and edits realistic videos

Meta unveils Movie Gen: A new frontier in AI-generated video and audio: Meta has announced Movie Gen, a cutting-edge AI tool capable of creating and editing realistic video and audio content based on text prompts, marking a significant advancement in the field of AI-generated media. Capabilities and features: Movie Gen boasts an impressive array of functionalities that position it as a powerful contender in the rapidly evolving AI video generation landscape. The system can generate 1080p videos up to 16 seconds long in various aspect ratios, offering flexibility for different platforms and use cases. Movie Gen extends its capabilities to...

Oct 2, 2024

Google to enhance controls for AI-generated podcasts, says NotebookLM lead

Unveiling AI-powered podcasts: Google's NotebookLM has introduced a groundbreaking feature called "Audio Overviews," enabling users to create custom, AI-generated podcasts with artificial host voices discussing user-uploaded content. The new feature transforms text-based content into engaging audio formats, opening up new possibilities for information consumption and sharing. Users can leverage AI technology to generate podcast-like content from their own uploaded materials, potentially revolutionizing how information is disseminated and consumed. Upcoming enhancements: Raiza Martin, NotebookLM's product leader, has announced forthcoming updates that will provide users with greater control over Audio Overviews. Users will soon be able to select different "personas" for AI...

Sep 30, 2024

AI-generated podcasts are coming. Would you listen?

AI-generated podcasts: A new frontier in content consumption: Google's recent launch of a feature that can generate podcast conversations from articles represents a significant advancement in AI-driven content creation, potentially transforming how we consume information. The technology in action: Google's AI-powered podcast generation tool demonstrates impressive capabilities, producing human-like conversations complete with natural interactions and content awareness. The AI-generated podcasts include elements such as chuckling, question-asking, and interplay between "hosts," mimicking the style of traditional radio shows or podcasts. The system can reference images mentioned in the original article, adding depth to the audio content. However, the current output lacks...

Sep 24, 2024

Zoom’s AI Companion Will Summarize Meetings and Create Action Items for You

Zoom's AI Companion: A digital assistant for enhanced meeting productivity: Zoom has introduced an AI-powered feature called AI Companion, available to all paid subscribers, designed to streamline meeting processes and boost productivity. Key features and capabilities: AI Companion offers a range of functionalities to support users during and after Zoom meetings, leveraging advanced language models from tech giants. The AI can generate meeting summaries, create action items, and analyze participant engagement by identifying who spoke the most. It utilizes language models from OpenAI, Anthropic, and Meta, trained on Zoom employee meetings to ensure relevance and effectiveness. Users can access AI...

Sep 22, 2024

Google’s NotebookLM Turns Text to Podcasts — Here’s How to Try It

Google's NotebookLM: A new AI tool for content creators: Google has introduced NotebookLM, an innovative AI research assistant that transforms written content into realistic audio conversations, mimicking the format of podcast discussions. Key features and functionality: NotebookLM analyzes uploaded articles and generates an audio conversation that sounds like two podcast hosts discussing the content, offering a novel way to engage with written material. The tool accepts various input formats, including articles, PDFs, and plain text. Users can upload multiple sources to create a more comprehensive discussion. The generated audio currently uses two American-accented voices, though customization options are limited. The...

Sep 20, 2024

Suno Releases ‘Covers,’ an AI Feature to Reimagine Songs in New Genres

AI-powered musical transformation: Suno has introduced a new feature called Covers that allows users to reimagine existing songs in different genres, preserving the original melody while altering the musical style. The Covers feature, currently in beta, can transform various audio inputs, including produced tracks and casual recordings, into multiple musical forms. Users can select a song from their library and choose the "Cover Song" option to process the track into a new style. The AI-powered tool can also add genre-appropriate lyrics to instrumental tracks, though Suno recommends users refine the generated lyrics for coherence. Expanding creative possibilities: Suno's new feature...

Sep 18, 2024

Hume Launches ‘EVI 2’ AI Voice Model with Emotional Responsiveness

Hume's EVI 2: A leap forward in AI voice technology: Hume, an AI startup, has unveiled Empathic Voice Interface 2 (EVI 2), a significant upgrade to its AI voice model and API, offering enhanced naturalness, emotional responsiveness, and customizability. Key improvements and features: EVI 2 brings substantial enhancements over its predecessor, addressing critical aspects of AI voice interaction. The new version boasts a 40% reduction in latency, with response times ranging from 500 to 800 milliseconds, greatly improving real-time conversation capabilities. EVI 2 demonstrates improved emotional intelligence, better understanding and responding to the emotional context of user inputs. The system...

Sep 18, 2024

Tencent, Johns Hopkins Unveil New Text-to-Audio AI Model

Breakthrough in AI-generated audio: Tencent AI Lab and Johns Hopkins University researchers have unveiled EzAudio, a revolutionary text-to-audio (T2A) generation model that produces high-quality sound effects from text prompts with remarkable efficiency. Key innovations driving EzAudio: The model's architecture, called EzAudio-DiT (Diffusion Transformer), introduces several technical advancements to enhance performance and efficiency. EzAudio operates in the latent space of audio waveforms, departing from traditional spectrogram-based methods and eliminating the need for a neural vocoder. The model incorporates a new adaptive layer normalization technique called AdaLN-SOLA, long-skip connections, and advanced positioning techniques like Rotary Position Embedding (RoPE). These innovations allow for...

Sep 18, 2024

Bose Challenges Apple and Sonos with New AI-Powered Audio Gear

Bose Unveils New Audio Products: Bose has announced two new products aimed at competing with Apple and Sonos in the earbud and soundbar markets, respectively. QuietComfort Earbuds: A More Affordable ANC Option: Bose's new QuietComfort Earbuds offer active noise cancellation (ANC) at a competitive price point of $179, directly challenging Apple's AirPods 4. The in-ear buds feature Bose-tuned audio, user-adjustable controls, and voice command capabilities. With six microphones in total, the earbuds prioritize noise cancellation and call quality. Users can choose between three listening modes: ANC, transparency, and off. The earbuds boast an IPX4 water resistance rating, making them suitable...

Sep 15, 2024

New AI Tool Lets You Upload Silent Videos and Read Speakers’ Lips

AI-powered lip reading technology debuts: Symphonic Labs, an audio tech startup, has launched an online tool showcasing its AI's lip reading capabilities, potentially revolutionizing speech understanding in various contexts. The San Francisco and Canada-based company specializes in "multimodal speech understanding" tools, with applications ranging from voice calls in noisy environments to whispering commands to voice assistants in public. The startup's new website, readtheirlips.com, allows users to upload short video clips of speakers and receive text transcriptions of what the AI calculates is being said, even when the audio is inaudible. The tool requires clear visibility of the speaker's face and...

Sep 14, 2024

Audible’s AI Voice Cloning Initiative is Part of a Larger Trend of AI Integration

Audible's new AI voice cloning program for audiobook narrators marks a significant shift in the audiobook industry, potentially transforming the way audio content is produced and consumed. Revolutionizing audiobook production: Amazon's Audible is launching a trial program that allows narrators to create and monetize AI-generated replicas of their own voices for audiobook narration. The beta program, currently limited to a select group of narrators in the United States, represents a pioneering step in integrating AI technology into the audiobook creation process. Participating narrators will have control over which projects their AI-cloned voices are used for, ensuring a level of autonomy...

Sep 14, 2024

Review: Plaud’s New ‘NotePin’ is a Great Voice Recorder that May Stand No Chance

AI-powered voice recording evolves: The Plaud NotePin represents a significant advancement in dedicated AI voice recording devices, offering accurate transcription and summarization capabilities in a convenient form factor. The $169 NotePin is a pill-shaped voice recorder that can transcribe, summarize, and extract important information from audio recordings. Plaud has successfully delivered on its promises, creating a functional AI gadget in a year marked by high-profile failures and AI vaporware. The device leverages mature technologies, including miniature microphones, speech-to-text transcription, and natural language processing for AI summarization. Competitive landscape and market trends: Despite the NotePin's effectiveness, the rapidly evolving AI voice...

Sep 11, 2024

Google’s NotebookLM AI Turns Notes into Engaging Podcasts

AI-powered podcast generation from notes: Google has introduced an experimental feature in NotebookLM, its AI note-taking app, that transforms user research into AI-generated podcasts with two virtual hosts discussing the content. The new feature, called Audio Overview, utilizes Google's Gemini AI model to create podcast-like discussions based on users' notes, transcripts, and research documents. AI hosts summarize the material, make connections between topics, and engage in banter, providing an audio version of NotebookLM's existing summarization capabilities. The feature aims to make learning from research more engaging by presenting information in a conversational format. User experience and AI capabilities: The AI-generated...

Sep 11, 2024

NVIDIA’s Holoscan Allows Developers to Integrate AI Into Live Media

NVIDIA recently introduced Holoscan for Media, a new platform designed to streamline AI integration into live media applications. This software-defined system, which has just begun its limited release, enables media companies to operate their live media pipelines and AI processes on a single infrastructure. The platform is engineered to simplify the process for developers incorporating AI capabilities into their real-time media projects. AI revolutionizing broadcast industry: NVIDIA's Holoscan for Media platform is transforming content creation, distribution, and consumption by simplifying AI integration into live media applications. Key features of Holoscan for Media: Launched in limited availability, it allows developers to...

Sep 4, 2024

BBC Sounds Tests AI-Generated Subtitles to Boost Accessibility

BBC Sounds introduces AI-generated subtitles: The BBC is trialing artificial intelligence-powered subtitles and transcripts for select shows on its popular audio platform, BBC Sounds, marking a significant step towards improving accessibility. The three-month trial currently includes five shows: In Touch, Access All, Profile, Sporting Witness, and Economics with Subtitles. Users can access the AI-generated subtitles via a new button in the playbar on the BBC Sounds website. The trial is also expanding to the BBC Sounds app, starting with Android devices and later iOS. How it works: The BBC is utilizing Whisper AI, a speech-to-text artificial intelligence system, to generate...

Sep 4, 2024

ElevenLabs Just Released an AI-Powered SFX Library

AI-powered sound effects library launched by ElevenLabs: The leading artificial intelligence sound generation company, ElevenLabs, has released a comprehensive library of AI-generated sound effects, marking a significant development in the audio production industry. The new SFX Explore library encompasses a wide range of categories, including instruments, nature sounds, emotions, and animal noises. This collection combines sound effects created by ElevenLabs itself and those contributed by individual users of the AI audio platform. The library is designed to enhance various media projects, from AI-generated films and podcasts to video games, by providing easy access to diverse and customizable sound effects. Key...

Aug 30, 2024

AI Music Generator LoudMe Offers Free Songs from Text Prompts

AI-powered music creation: LoudMe emerges as a free AI music generator, offering users the ability to create full-length songs from text prompts, challenging traditional music production methods. LoudMe generates 2-3 minute tracks based on user-provided text descriptions, catering to those who need quick musical solutions without concern for high audio quality or extensive editing capabilities. The platform allows users to incorporate their own lyrics into the AI-generated compositions, providing a degree of personalization in the music creation process. In addition to full song generation, LoudMe offers a free sound effects generator and a search function for royalty-free sounds and music,...

Aug 15, 2024

New AI Tech Allows Smartphone Microphones to Detect Early Heart Problems

A new technology can analyze cardiac rhythms using only a smartphone microphone. This breakthrough has the potential to make continuous health tracking more accessible and detect issues earlier. A personal mission drives innovation: Roeland Decorte's journey into AI-powered health monitoring began with his father's misdiagnosed heart condition, inspiring him to develop technology that could prevent similar occurrences. Decorte's unique background, growing up in a Belgian nursing home and attending Cambridge University at 17, provided him with diverse experiences that shaped his approach to problem-solving. His specialization in ancient codebreaking at Cambridge laid the foundation for his current work in pattern...

Jun 29, 2024

AI vs AI: The High-Stakes Battle to Detect Deepfakes and Defend Reality

Deepfakes are becoming more sophisticated and accessible, posing risks for businesses and democracy. A new company founded by image manipulation expert Hany Farid aims to combat the problem with AI and traditional forensic techniques. Key takeaways: Get Real Labs has developed software to detect AI-generated and manipulated images, audio, and video, and Fortune 500 companies are testing it to spot deepfake job seekers. Some companies have lost money to scammers who use deepfakes to impersonate real people in video interviews, take signing bonuses, and disappear. The FBI and others have warned about the growing threat of deepfakes being used in...

Jun 29, 2024

Resemble AI’s Detect-2B: A Leap Forward in Combating Deepfake Audio

Resemble AI has released Detect-2B, a next-generation AI audio detection model that can identify deepfake audio with 94% accuracy, marking a significant advancement in the fight against misinformation and the erosion of trust in an era of increasingly sophisticated generative AI. Key features of Detect-2B: The new model uses a series of pre-trained sub-models and fine-tuning techniques to analyze audio clips and determine whether they were generated using AI. The sub-models consist of a frozen audio representation model with an adaptation module inserted into its key layers, allowing the model to focus on artifacts that often distinguish real audio from fake...
