AI headphones clone multiple voices for real-time translation

Yes, please do listen to the voices in your head.

Researchers have developed an AI headphone system that can translate multiple speakers simultaneously in real time, potentially eliminating language barriers in multilingual group conversations. The Spatial Speech Translation system not only converts speech in other languages into English but also preserves each speaker’s unique vocal characteristics and emotional tone, creating a more natural translation experience than existing tools. The innovation could transform international communication by enabling people to express themselves confidently across language divides.

How it works: The University of Washington’s Spatial Speech Translation system uses AI to track and translate multiple speakers simultaneously in group settings.

  • The technology works with standard noise-canceling headphones connected to a laptop with Apple’s M2 chip, which can run the necessary neural networks.
  • The system employs two AI models: the first identifies speakers and their locations, while the second translates their speech from French, German, or Spanish into English text.
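The two-stage design described above can be sketched as a simple pipeline. This is a hypothetical illustration, not the researchers' code: the stub functions stand in for the two neural networks (speaker separation/localization, then per-speaker translation), and all names (`SpeakerStream`, `localize_speakers`, `translate_stream`) are invented for the example.

```python
# Hypothetical sketch of the two-model pipeline: stage 1 separates the
# audio mixture into per-speaker streams with an estimated direction;
# stage 2 translates each stream, keeping the speaker identity so a
# cloned voice could be applied downstream. Stubs replace real models.

from dataclasses import dataclass


@dataclass
class SpeakerStream:
    speaker_id: int
    direction_deg: float  # estimated angle of arrival
    audio: list           # stand-in for an audio buffer


def localize_speakers(mixed_audio):
    """Stage 1 (stub): a real system would run source separation and
    direction-of-arrival estimation; we fake two fixed speakers."""
    return [
        SpeakerStream(0, -30.0, mixed_audio[::2]),
        SpeakerStream(1, 45.0, mixed_audio[1::2]),
    ]


def translate_stream(stream, target_lang="en"):
    """Stage 2 (stub): translate one speaker's speech into the target
    language, carrying identity and location through the pipeline."""
    return {
        "speaker_id": stream.speaker_id,
        "direction_deg": stream.direction_deg,
        "text": f"<{target_lang} translation of speaker {stream.speaker_id}>",
    }


def spatial_speech_translation(mixed_audio):
    # Run both stages: each detected speaker gets its own translation.
    return [translate_stream(s) for s in localize_speakers(mixed_audio)]


results = spatial_speech_translation(list(range(16)))
for r in results:
    print(r["speaker_id"], r["direction_deg"], r["text"])
```

The key design point this sketch captures is that translation happens per speaker rather than on the mixed signal, which is what lets the system follow several people talking at once.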

The big picture: Unlike existing translation tools that focus on single speakers, this system addresses the challenge of following conversations where multiple people speak different languages simultaneously.

  • The technology preserves speakers’ unique vocal characteristics, essentially creating a “cloned” voice that maintains the emotional tone of the original speaker.
  • Researchers presented their work at the ACM CHI Conference on Human Factors in Computing Systems in Japan this month.

Why this matters: The technology could break down significant communication barriers for non-native speakers in various professional and social contexts.

  • “There are so many smart people across the world, and the language barrier prevents them from having the confidence to communicate,” explains Shyam Gollakota, a professor who worked on the project.
  • Gollakota shares that his mother has “incredible ideas when she’s speaking in Telugu,” but struggles to communicate with people during visits to the US from India.

What’s next: Researchers are now working to reduce the system’s latency to under one second to enable more natural conversational flow.

  • The team aims to maintain the “conversational vibe” by minimizing delays between when someone speaks and when the translation is delivered to the listener.
  • Current technology requires the headphones to be connected to a laptop, though the same M2 chip that powers the system is also present in Apple’s Vision Pro headset, suggesting potential for more portable implementations.
