Introducing Vocal Separator V2

2024-05-15 · 1 min read

Source separation sits at the start of many creative and content management workflows in music production and distribution. Vocal isolation comes first among these tasks, whether for sampling, karaoke versions, or lyrics transcription and alignment (who said voice cloning?). We launched our dedicated Vocal Separator module last year as one of the first modules in the line-up for our platform's grand opening. We were really pleased with the vocal extraction performance it achieved, but we knew we wanted to push further: not only to deliver results at a high sample rate, but also to improve isolation quality to a point we're really proud of.

What route have we taken?

Because we foster a secure and transparent music ecosystem in the audio AI era, generative or otherwise, we strongly believe companies should always disclose the sources of their models' training data. For our part, we acquired from a dedicated operator a complete musical catalog of 2,000 unmixed tracks, carefully meta-tagged, with separated stems for each. We then verified the labels to fix potential errors (luckily very few) and make the base as clean as possible. Finally, we took the existing model and trained it again from scratch on this comprehensive, high-quality dataset, using a convolutional neural network combined with signal processing analysis. This hybrid approach led to pristine vocal isolation, free of artifacts!

Improvements overview

Trained on a huge proprietary dataset: This catalog, spanning many different genres and backed by accurate metadata, allowed us to set an exceptional ground truth for supervising the model.
Increased versatility: Capable of isolating sung vocals of all types, even whispers in the background, across all genres of music, old and new, from classical pieces to industrial noise tracks.
Improved SDR: The Signal-to-Distortion Ratio (the most common indicator in academic research for measuring separation quality) of the extracted vocal stem went up to 9.3, and we keep improving!
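
For readers less familiar with SDR, here is a minimal sketch of how such a score can be computed for one stem. It assumes the simple energy-ratio definition, 10·log10(‖reference‖² / ‖reference − estimate‖²), expressed in dB. This post does not say which SDR variant or evaluation toolkit produced the figure above, so the function name `sdr`, the `eps` parameter, and the toy signals are illustrative assumptions rather than our actual evaluation code.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-10):
    """Signal-to-Distortion Ratio in dB (simple energy-ratio definition).

    Assumption: this is the plain ratio of reference energy to error energy,
    not necessarily the variant used to produce the score quoted in the post.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps keeps the ratio finite when the estimate is perfect or the reference is silent
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


# Toy data (hypothetical): a 220 Hz tone stands in for the ground-truth vocal stem.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
vocal = np.sin(2 * np.pi * 220 * t)

# Residual error scaled so that its energy is one tenth of the vocal energy.
rng = np.random.default_rng(0)
residual = rng.standard_normal(sample_rate)
residual *= np.sqrt(0.1 * np.sum(vocal ** 2) / np.sum(residual ** 2))
estimate = vocal + residual

print(f"SDR: {sdr(vocal, estimate):.2f} dB")
```

A higher value means less residual error relative to the true vocal: in this toy case the error carries one tenth of the vocal's energy, which corresponds to 10 dB.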

Take the opportunity to test it right away

👉 Sign up and give it a try on your own files through the “Tasks” feature on your user dashboard


