What does it do?

  • Voice Conversion
  • Singing Voice Conversion
  • Music Production
  • Audio Processing
  • Speech Processing

How is it used?

  • Install via pip or pipx
  • Preprocess audio
  • Train models
  • Use the GUI or CLI

Who is it good for?

  • Developers
  • Researchers
  • Musicians
  • Audio Engineers
  • Voice Actors

Details & Features

  • Made By

    Voicepaw
  • Released On

SoftVC VITS Singing Voice Conversion Fork is a software tool designed for singing voice conversion. It allows users to transform vocal recordings from one singer to another, offering real-time conversion capabilities and a user-friendly interface for both novice and experienced users.

Key features:
- Real-time Voice Conversion: Enables live transformation of vocal input.
- QuickVC Integration: Incorporates QuickVC technology for enhanced voice conversion quality.
- User Interface Options: Provides both a graphical user interface and a command-line interface for flexible usage.
- Improved Pitch Estimation: Utilizes CREPE for more accurate pitch detection and analysis.
- Accelerated Training: Offers training speeds approximately twice as fast as the original repository.
- Simplified Installation: Supports installation via pip or pipx with automatic model and dependency downloads.
- Code Quality: Implements code formatting using black, isort, and autoflake for improved readability.
- Audio Preprocessing Tools: Includes utilities for resampling and speech diarization of audio files.
- Versatile Model Handling: Supports both training new models and inference with existing ones.
- Cloud Compatibility: Can be utilized on cloud platforms like Google Colab and Paperspace for users without high-performance GPUs.
- Local Processing: Operates on local systems with GPUs having at least 4 GB of VRAM.
- Model Exportation: Supports exporting models to ONNX format (functionality currently limited).
- Comprehensive Documentation: Provides detailed guides and help commands for user assistance.

How it works:
1. Install the package using pip or pipx, which automatically sets up required models and dependencies.
2. Preprocess audio files using commands such as 'pre-resample', 'pre-config', and 'pre-hubert'.
3. Train models using the 'train' command.
4. Perform voice conversion using the 'infer' command or the graphical interface.
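
The steps above can be sketched as a small Python wrapper that assembles the commands in order. This is a minimal sketch, not the project's own tooling: the subcommand names come from the steps listed here, while the `svc` executable name, the positional input file passed to `infer`, and the helper names `build_pipeline` and `run_pipeline` are assumptions made for illustration. Check the tool's help output for the actual arguments.

```python
import shutil
import subprocess

# Assumption: the CLI entry point is called "svc". The subcommand names
# below are the ones listed in the steps above.
SVC = "svc"


def build_pipeline(source_wav):
    """Return the preprocessing, training, and inference commands in order."""
    return [
        [SVC, "pre-resample"],
        [SVC, "pre-config"],
        [SVC, "pre-hubert"],
        [SVC, "train"],
        # Assumption: infer takes the input audio file as a positional argument.
        [SVC, "infer", source_wav],
    ]


def run_pipeline(source_wav, dry_run=True):
    """Print each command; execute them only when dry_run is False."""
    if not dry_run and shutil.which(SVC) is None:
        raise RuntimeError(f"{SVC!r} not found on PATH; install the package first")
    for cmd in build_pipeline(source_wav):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: shows the commands without executing anything.
    run_pipeline("source.wav")
```

Keeping the default as a dry run lets you review the command sequence before starting a long training job; pass `dry_run=False` once the tool is installed and the dataset is in place.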

Integrations:
PyTorch, Hugging Face, CIVITAI, Pyannote.audio

Use of AI:
The software employs generative AI techniques for real-time voice conversion. It uses CREPE for pitch estimation and leverages PyTorch for model training, incorporating features like QuickVC.

AI foundation model:
The project is built on PyTorch, a widely used deep learning framework.

Target users:
- Developers working on singing voice conversion projects
- Researchers studying voice conversion and analysis
- Musicians experimenting with vocal transformations

How to access:
The software is available as an open-source project on GitHub. It can be installed with pip or pipx and used through the command-line or graphical interface on local systems or compatible cloud platforms.

  • Supported ecosystems
    GitHub, Adobe, Hugging Face, Google

Alternatives

CapCut transforms voices and enhances videos with effects for content creators and social media users.
Auphonic automates audio post-production for podcasts, videos, and broadcasts with AI processing.
Create synthetic vocals for music, offering voice customization and lyric generation
Separate vocals from instrumentals in songs to create karaoke and acapella versions online
Remove vocals from audio files to create instrumentals or karaoke tracks on Windows, Mac, and Linux
Create and manage radio ads with AI-generated voices and real-time customization.
AudioShake splits audio into stems for mixing, mastering, and creative applications
Bland.ai enables developers to create and deploy intelligent voice applications for phone calls.