Nvidia claims its new Fugatto model is the world’s most versatile sound machine

Fugatto, NVIDIA’s new generative AI sound model, gives users flexible control over creating and manipulating many types of sound from text and audio inputs.

The breakthrough innovation: NVIDIA has introduced Fugatto, a versatile generative AI model that can create and transform any combination of music, voices, and sounds using text prompts and audio file inputs.

  • The model’s name stands for Foundational Generative Audio Transformer Opus 1
  • Multi-platinum producer Ido Zmishlany describes the technology as “wild,” highlighting its potential to create entirely new sounds in real time
  • The system demonstrates emergent properties, allowing it to perform tasks beyond its initial training

Technical capabilities: Fugatto combines several audio generation and transformation capabilities in a single model.

  • The model utilizes 2.5 billion parameters and was trained on NVIDIA DGX systems with 32 NVIDIA H100 Tensor Core GPUs
  • At inference time, it uses a technique called ComposableART to combine instructions that were only seen separately during training
  • Users can interpolate between instructions to control the degree of an effect, such as the heaviness of an accent or the intensity of an emotion, and temporal interpolation lets generated sounds change over time
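
To make the idea of temporal interpolation concrete, here is a minimal Python sketch. It is not Fugatto's implementation or API, which this article does not describe. It assumes, purely for illustration, that each instruction can be represented as a small numeric vector, and it blends two such vectors with a linear ramp so that the mix shifts gradually from the first instruction to the second. The function names and the three-number vectors are hypothetical.

```python
from typing import List


def interpolation_weights(num_frames: int) -> List[float]:
    """Linear ramp from 0.0 (only the first instruction) to 1.0 (only the second)."""
    if num_frames < 2:
        return [0.0] * num_frames
    return [i / (num_frames - 1) for i in range(num_frames)]


def blend(a: List[float], b: List[float], w: float) -> List[float]:
    """Weighted mix of two equal-length vectors; w=0 gives a, w=1 gives b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def temporal_schedule(a: List[float], b: List[float], num_frames: int) -> List[List[float]]:
    """One blended vector per time frame, moving from a to b."""
    return [blend(a, b, w) for w in interpolation_weights(num_frames)]


# Hypothetical stand-ins for two instructions (assumed values, not real model inputs).
thunderstorm = [1.0, 0.0, 0.5]
birdsong = [0.0, 1.0, 0.5]

schedule = temporal_schedule(thunderstorm, birdsong, 5)
for frame, vector in enumerate(schedule):
    print(frame, vector)
```

Holding the weight fixed instead of ramping it corresponds to the other kind of control described above: a constant blend that sets how strongly one instruction, such as an accent, applies throughout the clip.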

Practical applications: The technology offers diverse use cases across multiple industries and creative fields.

  • Music producers can quickly prototype songs and experiment with different styles and instruments
  • Advertising agencies can adapt campaigns with various accents and emotional tones
  • Language learning tools can be personalized with familiar voices
  • Video game developers can modify audio assets in real time based on gameplay

Innovative features: The model breaks new ground in audio manipulation and generation capabilities.

  • Creates novel sound combinations, such as making a trumpet bark or a saxophone meow
  • Generates high-quality singing voices from text prompts with minimal training data
  • Produces dynamic soundscapes that evolve over time, such as transitioning from a thunderstorm to birdsong at dawn
  • Enables fine-grained control over sound attributes and transitions

Development process: The creation of Fugatto involved a diverse international team and sophisticated data preparation methods.

  • The project spanned over a year and included team members from India, Brazil, China, Jordan, and South Korea
  • Researchers developed a complex strategy for generating training data, including millions of audio samples
  • The team’s diverse background contributed to enhanced multi-accent and multilingual capabilities

Looking ahead: Fugatto points to a change in how people create and interact with sound. It could enable new forms of artistic expression and practical applications across industries, while raising questions about the future relationship between AI and human creativity in audio production.

Now Hear This: World’s Most Flexible Sound Machine Debuts
