Nvidia’s new AI generates music from text and audio inputs

The artificial intelligence industry continues to push boundaries in audio generation, with Nvidia announcing Fugatto, a new AI tool capable of creating unique sounds and music from text and audio inputs.

Key Innovation: Nvidia’s Fugatto represents a significant advancement in AI-generated audio, producing novel sound combinations and transformations that the company says are entirely original.

The system can create unconventional audio combinations, such as a trumpet that meows or a saxophone that transitions from howling to barking before merging into electronic music
Fugatto can generate specific sound effects based on detailed descriptions, including complex audio scenarios like simulating the sound of a sentient machine awakening
The tool offers voice modification capabilities, allowing users to alter accents and emotional tones

Technical Capabilities: Fugatto demonstrates versatility in audio manipulation and creation through a comprehensive set of features.

Users can isolate vocals within existing songs and add new instruments
The system enables melody transformation by switching between different instruments and vocal styles
Trained on millions of audio samples, including BBC sound effects, the model uses specialized instructions to broaden the range of tasks it can perform and improve its accuracy

Industry Context: While several major tech companies offer AI audio tools, Fugatto’s claimed ability to create entirely new sounds sets it apart in a competitive landscape.

Existing players in the AI audio space include Stability AI, OpenAI, Google DeepMind, ElevenLabs, and Adobe
The industry faces ongoing legal challenges, with some AI startups dealing with copyright lawsuits over music creation tools
Recent reports have highlighted that Nvidia, among other companies, trained AI models using YouTube video subtitles

Development Status: The path to public availability remains unclear, though technical details have been disclosed.

Nvidia has released a research paper detailing the datasets used in training
The company has not announced any timeline for public release or commercial availability
The technology required extensive training on millions of audio samples to achieve its current capabilities

Looking Ahead: While Fugatto’s ability to create novel sounds could open new creative possibilities for music and audio production, questions about commercial viability, copyright implications, and potential misuse will likely influence its eventual deployment and adoption in the broader market.
