Nvidia claims its new Fugatto model is the world’s most versatile sound machine

NVIDIA’s Fugatto is a generative AI sound model that accepts both text and audio inputs, giving users a single tool for creating and manipulating many types of sound.

The breakthrough innovation: NVIDIA has introduced Fugatto, a versatile generative AI model that can create and transform any combination of music, voices, and sounds using text prompts and audio file inputs.

  • The model’s name stands for Foundational Generative Audio Transformer Opus 1
  • Multi-platinum producer Ido Zmishlany describes the technology as “wild,” highlighting its potential to create entirely new sounds in real time
  • The system demonstrates emergent properties, allowing it to perform tasks beyond its initial training

Technical capabilities: Fugatto combines several audio generation and transformation features in a single model.

  • The full model has 2.5 billion parameters and was trained on NVIDIA DGX systems with 32 NVIDIA H100 Tensor Core GPUs
  • At inference time, it uses a technique called ComposableART to combine instructions that were seen only separately during training
  • Users can interpolate between instructions to control how strongly an effect applies, such as the heaviness of an accent or the degree of an emotion, and can use temporal interpolation to make sounds change over time
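The general idea behind temporal interpolation can be illustrated with a toy example. The sketch below is not Fugatto's implementation, which the article does not describe; it assumes two hypothetical "instruction" vectors and simply blends them with a weight that moves linearly from 0 to 1 over a number of time steps. The function names and the vector representation are invented for illustration.

```python
def interpolation_weights(num_steps):
    """Return blend weights spaced linearly from 0.0 to 1.0 (assumed schedule)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [i / (num_steps - 1) for i in range(num_steps)]


def blend(start_vec, end_vec, weight):
    """Linearly mix two equal-length vectors; weight 0 gives start, 1 gives end."""
    if len(start_vec) != len(end_vec):
        raise ValueError("vectors must have the same length")
    return [(1 - weight) * a + weight * b for a, b in zip(start_vec, end_vec)]


def temporal_blend(start_vec, end_vec, num_steps):
    """Produce one blended vector per time step, moving from start to end."""
    return [blend(start_vec, end_vec, w) for w in interpolation_weights(num_steps)]


# Hypothetical stand-ins for two instructions, e.g. "thunderstorm" and "birdsong".
thunderstorm = [1.0, 0.0]
birdsong = [0.0, 1.0]
schedule = temporal_blend(thunderstorm, birdsong, 3)
```

Each entry in `schedule` represents the mix at one point in time, so the first step is entirely the first instruction and the last step is entirely the second. A real system would apply such weights to whatever conditioning signals the model uses, which may look nothing like these short lists.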

Practical applications: Suggested use cases span music, advertising, education, and game development.

  • Music producers can quickly prototype songs and experiment with different styles and instruments
  • Advertising agencies can adapt campaigns with various accents and emotional tones
  • Language learning tools can be personalized with familiar voices
  • Video game developers can modify audio assets in real time based on gameplay

Innovative features: The model can generate and manipulate sounds in ways that go beyond typical audio tools.

  • Creates novel sound combinations, such as making a trumpet bark or a saxophone meow
  • Generates high-quality singing voices from text prompts with minimal training data
  • Produces dynamic soundscapes that evolve over time, such as transitioning from a thunderstorm to birdsong at dawn
  • Enables fine-grained control over sound attributes and transitions

Development process: Fugatto was built by an international team and required extensive preparation of training data.

  • The project spanned over a year and included team members from India, Brazil, China, Jordan, and South Korea
  • Researchers assembled a blended training dataset containing millions of audio samples
  • The team’s diverse background contributed to enhanced multi-accent and multilingual capabilities

Looking ahead: Fugatto could change how people create and interact with sound, opening up new forms of artistic expression and practical applications across industries. It also raises questions about the future relationship between AI and human creativity in audio production.

Now Hear This: World’s Most Flexible Sound Machine Debuts
