Audio


Mar 25, 2025

How Otter.ai grew to 1 billion meetings with AI transcription and zero salespeople

Otter.ai has used AI-powered transcription to disrupt a traditional market through a strategic freemium business model. The company grew rapidly, processing over one billion meetings between 2017 and 2023, while developing its own AI infrastructure and pursuing a product-led growth strategy without employing a single sales representative. Founded by Sam Liang, a Stanford PhD with experience at Google and a successful exit selling Location Labs for $220 million, Otter.ai demonstrates how AI companies can achieve massive scale through innovative pricing, technology ownership, and viral product features. 1. Disruptive freemium strategy: Otter.ai offers 600 free transcription minutes monthly, worth approximately $600...

Mar 24, 2025

Google’s Gemini now turns AI research into interactive podcast-style conversations

Google's latest integration of Audio Overviews with Gemini's Deep Research feature bridges the gap between AI-generated research and audio content consumption. By transforming comprehensive reports into conversational podcast-style formats, Google continues expanding its AI capabilities across different mediums and use cases. This development represents another step in Google's strategy to make AI-generated content more accessible and engaging through different sensory experiences. The big picture: Google has expanded its Audio Overviews feature to work with Gemini's Deep Research function, allowing users to convert AI-generated research reports into conversational podcasts between two AI hosts. How it works: Users can now select a...

Mar 20, 2025

AI voice recorder Plaud Note transforms speech into summaries and mind maps

The Plaud Note AI voice recorder represents a significant advancement in voice recording technology, merging physical hardware with AI capabilities to transform how professionals capture and process spoken information. By combining instant transcription with AI-generated summaries and visualizations, the device addresses growing demand for tools that not only record information but also help organize it and extract meaning from it. That is particularly valuable now that efficient information processing has become a competitive advantage in many industries. The device's capabilities: Plaud Note is a credit-card-sized hardware recorder that syncs with a companion mobile app to provide AI-powered features beyond...

Mar 10, 2025

Little Country, Big Interest: Uruguay leads in audio AI adoption as smaller nations shape global AI landscape

Who knew? Global data reveals distinct national preferences in AI usage, with Uruguay showing remarkable interest in audio AI applications despite being a smaller country. This emerging pattern of varied AI adoption across different countries provides valuable insight into how artificial intelligence technologies may develop along regional lines, challenging assumptions about which nations will lead in specific AI domains. The big picture: New data highlights significant variation in global AI adoption patterns, with smaller countries often showing surprisingly high engagement with specific AI technologies despite the perception of larger nations driving innovation. A September 2023 PCMag analysis reveals distinctive AI...

Mar 7, 2025

TooFar Media to debut AI-powered immersive story experience at SXSW 2025

There's New Media, and then there's New Media. AI-driven immersive storytelling continues to break new ground as TooFar Media prepares to debut "The Hornet's Spell" at SXSW 2025. This innovative project represents a significant evolution in narrative experiences, blending literature with AI-generated visuals, animation, and spatial audio to create a multi-dimensional storyworld that surrounds and responds to audiences rather than simply being consumed passively. The big picture: TooFar Media is pioneering a new approach to storytelling that merges traditional literary fiction with cutting-edge AI technology to create fully immersive narrative experiences. This experimental format challenges conventional storytelling boundaries by pulling...

Mar 6, 2025

Insta-pop: New open-source AI DiffRhythm creates complete songs in just 10 seconds

Northwestern Polytechnical University researchers have developed DiffRhythm, an open-source AI music generator that creates complete songs with synchronized vocals and instruments in just 10 seconds. The result shows how latent diffusion models can speed up creative production: the system needs only lyrics and a style prompt to generate high-quality musical compositions up to 4 minutes and 45 seconds long. The big picture: DiffRhythm is presented as the first latent diffusion-based song generation model to produce complete musical compositions, with synchronized vocals and instrumentals, in a single process. Key technical innovations: The system employs a two-stage...

Mar 5, 2025

Amazon Prime Video launches AI dubbing in English and Spanish for select content

In infrequent caption-based news, Amazon's Prime Video is introducing AI-assisted dubbing in English and Spanish, using artificial intelligence to expand the audience reach of its content catalog. This development is another step in media companies' adoption of AI to enhance viewer experiences and potentially capture larger international audiences who prefer watching content in their native languages. The big picture: Prime Video is launching AI-assisted dubbing for 12 licensed movies and series, effective immediately, focusing exclusively on titles that don't already have dubbing options. Key details: The feature aims to serve Prime Video's global customer base, which exceeds 200 million...

Mar 5, 2025

Perfectly imperfect: AI voice companions evolve beyond ChatGPT with unsettling realism

A new conversational AI called Sesame is raising eyebrows with its uncannily human-like speech patterns, complete with hesitations, self-corrections, and natural interruptions. Unlike traditional AI assistants that simply convert text to speech, Sesame's breakthrough Conversational Speech Model (CSM) generates speech in a way that mirrors authentic human conversation, potentially marking a significant shift in how we interact with AI systems. The big picture: Sesame represents a departure from conventional AI voice assistants by deliberately incorporating human imperfections rather than striving for polished perfection. How it works: Sesame's Conversational Speech Model combines text and audio processing into a single unified system,...

Mar 4, 2025

Microsoft’s Dragon Copilot AI targets clinician burnout with ambient listening tech

Whew! Microsoft's latest healthcare AI innovation combines natural language processing with ambient listening technology to tackle the persistent challenge of clinician burnout. By integrating Dragon Medical One's dictation capabilities with Dragon Ambient eXperience's conversational AI, this new tool represents a significant advancement in automating clinical documentation and administrative tasks, potentially reshaping how healthcare professionals manage their daily workflows. The big picture: Microsoft has unveiled Dragon Copilot, an AI-powered clinical workflow assistant that integrates with Microsoft Cloud for Healthcare to streamline medical documentation and reduce administrative burden. The system combines Dragon Medical One's dictation technology, which has processed billions of patient...

Mar 4, 2025

Spotify introduces AI-powered voice translation for podcasts in 100+ languages

Spotify's strategic push into AI-powered music features signals a significant evolution in how streaming platforms are approaching personalized listening experiences. As negotiations for "Streaming 2.0" unfold between artists, record companies, and streaming providers, Spotify's rumored Music Pro tier with AI capabilities could reshape how users interact with their favorite songs, while raising important questions about intellectual property rights in the age of AI-enabled music manipulation. The big picture: Spotify's history of successful AI implementation has helped transform it from a startup into a major competitor against tech giants like Amazon, Apple, and Google. The company's rumored new Music Pro tier...

Feb 27, 2025

Multimodal Mosaic: Hugging Face and IISc team up to boost AI for diverse Indian languages

The Indian Institute of Science (IISc) and Hugging Face have formed a strategic partnership to make Vaani, India's largest open-source multilingual dataset, more accessible to developers worldwide. The collaboration builds on Project Vaani, launched with Google in 2022, which aims to create a comprehensive dataset representing India's vast linguistic landscape. Project overview: The Vaani dataset is a groundbreaking effort to capture India's linguistic diversity through a geo-centric approach that includes remote dialects and languages often overlooked in mainstream datasets. The project aims to collect over 150,000 hours of speech and 15,000 hours of transcribed text from 1 million people...

Feb 27, 2025

Say what? Google Translate unveils powerful new AI translation feature

The past year has seen a significant transformation in language translation through advanced AI integration. Google Translate, used by over 1 billion people worldwide, is poised to introduce a major enhancement that will let users interact with translations in more nuanced ways. The core innovation: Google Translate's forthcoming "Ask a Follow-up" feature offers interactive, context-aware modifications to translations. The feature adds a new button at the bottom of translation results that opens additional options for refining and understanding translations. Users can modify translations with options including Formal, Casual,...

Feb 26, 2025

Luma Labs now lets you add audio to your AI-generated videos

AI video pioneer Luma Labs has enhanced its Dream Machine platform with a new audio generation feature, marking a significant advancement in AI-generated content creation. The new capability allows users to add AI-generated soundtracks to their videos, either through custom prompts or automated suggestions, addressing a crucial gap in the AI video production landscape. The innovation explained: Dream Machine's audio feature introduces free AI-generated sound effects and ambient audio that can be seamlessly integrated with AI-generated videos. Users can either allow the AI to automatically generate appropriate audio or provide specific text prompts to guide the sound creation. The feature...

Feb 23, 2025

Google’s NotebookLM Plus: Is the AI tool worth the cost?

Google's NotebookLM, now bundled with the $20-per-month Google One AI Premium subscription, aims to transform how users interact with their documents by acting as an AI research assistant that can analyze and explain complex content. The tool, which processes everything from PDFs to Google Docs, joins Gemini Advanced and 2TB of cloud storage in Google's expanding AI services portfolio. But is it really worth the cost? Core functionality: NotebookLM analyzes uploaded documents and answers user questions about their content. Users can upload PDFs, Google Docs, and webpages to create topic-specific notebooks...

Feb 23, 2025

5 practical tips to boost your productivity with NotebookLM

AI audio learning has evolved significantly, with Google's NotebookLM now offering the ability to transform written content into AI-generated podcast discussions. This tool allows users to convert documents and research materials into conversational audio formats, making complex information more digestible through simulated expert discussions. Here are five productivity-boosting tips to get the most out of this new tool. Comprehensive source collection: Compile all relevant materials into a single notebook before starting. Upload documents simultaneously to help NotebookLM identify patterns and connections. Create a centralized repository of information for better context and insights. Interactive engagement: Take advantage of the tool's interactive...

Feb 20, 2025

Pump up the Jam: AI adoption surges among broadcasters, nearly tripling since last year

Recent data from Haivision's 2025 Broadcast Transformation Report reveals a significant surge in AI adoption among broadcasters worldwide, with 25% now incorporating AI technologies compared to just 9% in 2024. The annual survey, based on responses from approximately 900 broadcast and media professionals, provides insights into how the broadcasting industry is evolving with new technologies. Key findings: The report highlights a dramatic increase in AI implementation across the broadcasting sector, reflecting a growing recognition of AI's potential to transform traditional broadcasting operations. One quarter of broadcasters have now integrated AI into their operations, marking a nearly threefold increase from the...

Feb 18, 2025

Open-source AI converts books to audio for free, though with narration challenges

Take a book. Leave a book. Or make a book! AI text-to-speech technology has enabled new tools for creating audiobooks, and open-source solutions are now available to the public. Autiobooks is a significant development in this space, offering free audiobook creation to users with technical knowledge. Key features and setup: Autiobooks uses the Kokoro AI text-to-speech model to turn EPUB files into audiobooks on a home computer. The installation process requires command-line experience, which is a barrier for non-technical users. Once configured, the tool provides a straightforward graphical interface for converting ebooks. The...

Feb 10, 2025

NotebookLM Plus now available to all Google One AI Premium subscribers

Google's AI-powered note-taking tool NotebookLM has evolved into a more robust platform with the introduction of NotebookLM Plus, Google's latest advancement in AI-powered productivity tools. Integrating NotebookLM Plus into the Google One AI Premium subscription significantly expands Google's AI offerings, particularly for education and professional productivity. New features and capabilities: NotebookLM Plus introduces enhanced functionality and expanded usage limits to help users better manage and interpret information. Users now receive five times the standard allowances for Audio Overviews, notebooks, and sources per notebook. The platform converts written content into customizable podcast-style...

Feb 7, 2025

Meta AI research focuses on human-centered approaches for improved communication, robotics

Meta FAIR (Fundamental AI Research) has released new advancements in robotics, language technology, and audio processing aimed at developing more sophisticated and socially intelligent AI systems. Core developments: Meta's latest research focuses on three major areas: human-robot collaboration, language technology democratization, and audio processing enhancement. The PARTNR framework introduces a new benchmark and dataset for training robots to collaborate with humans on everyday tasks. A language technology partnership program aims to expand support for underserved and indigenous languages. Meta Audiobox Aesthetics provides a new standard for evaluating audio quality across different modalities. PARTNR breakthrough: Meta's new robotics framework represents a...

Feb 5, 2025

India builds first open-source audio language model using Llama

Sarvam AI has developed India's first open-source audio language model, Shuka v1, by integrating Meta's Llama model to process voice queries across multiple Indian languages. Project overview: Shuka v1 represents a significant breakthrough in multilingual audio comprehension, combining Llama's language processing capabilities with a custom audio encoder to handle voice interactions in ten Indian languages. The system uses Llama as a decoder to process audio tokens generated by Sarvam's proprietary audio encoder. Shuka v1 can accurately interpret and respond to voice queries in languages including Gujarati, Hindi, Kannada, and Marathi. The open-source nature of the model allows government departments and...

Jan 31, 2025

ElevenLabs raises funding at $3.3B valuation for voice AI tech

Artificial intelligence voice cloning startup ElevenLabs has secured $180 million in Series C funding, reaching a $3.3 billion valuation. Funding details: The latest investment round was jointly led by Andreessen Horowitz and Iconiq Growth, with participation from several new and existing investors. NEA, World Innovation Lab, Valor, Endeavor Catalyst Fund, and Lunate joined as new investors. Existing investors, including Sequoia Capital, Salesforce Ventures, and Smash Capital, increased their investments. The company has now raised a total of $281 million since its founding in 2022. Company background: ElevenLabs, headquartered in London, specializes in developing advanced voice AI technology that can generate...

Jan 10, 2025

AnyVoice’s AI generates realistic voices from 3-second audio samples

AnyVoice is a new AI voice cloning platform that creates synthetic voices from just 3 seconds of audio input, a significant reduction in the sample length traditionally required for voice cloning. Core capabilities: The service enables rapid voice cloning across four major Asian and Western languages while requiring minimal input audio. The platform currently supports English, Chinese, Japanese, and Korean. Users can generate synthetic voices from extremely short audio samples of 3 to 10 seconds. The company claims the technology produces highly natural-sounding voice replications. Technical requirements and guidelines: AnyVoice emphasizes specific recording conditions to ensure optimal voice cloning results. Recordings must...

Jan 10, 2025

Google’s Daily Listen creates personalized podcasts from your search history

Google has begun testing Daily Listen, a new AI feature that creates personalized short podcasts based on users' search history and preferred news topics in the Discover feed. Key details: The experimental feature appears below the search bar in the Google app on both Android and iOS, and is currently limited to US users enrolled in Search Labs experiments. Users receive a daily 5-minute podcast overview of their favorite topics. Each podcast is labeled "Made for you" and includes the date. A full-screen player provides audio playback and a transcript. Standard playback controls and speed adjustment options are available. Users can provide...

Jan 9, 2025

MIT unveils AI that can mimic sounds with human-like precision

MIT researchers have developed an artificial intelligence system capable of vocally imitating sounds without requiring specific training, marking a significant advancement in AI-generated audio. Key innovation: The system draws inspiration from human vocal communication patterns and uses a model of the human vocal tract to generate authentic sound imitations. The technology can replicate diverse sounds, ranging from natural phenomena like rustling leaves to mechanical noises such as ambulance sirens. The system is bidirectional: it both produces vocal imitations and identifies real-world sounds from human vocal recreations. Technical approach: MIT CSAIL's research team developed three distinct model variations to achieve...
