As a brief reminder, "Spatial Audio" encompasses a range of audio playback technologies designed to replicate real-world sound experiences in three dimensions. In contrast to stereo mixes, where the two left and right channels confine audio sources to a small scene in front of the listener, spatial audio configurations aim to envelop us from all directions, providing a heightened sense of immersion.
Ever since we went from mono output to two-channel stereo sound (modern stereophonic technology was invented in the 1930s), coming closer to the natural listening experience has always been the next ambition.
But it was not until the early eighties, with the development of the home video industry, and more decisively the mid-nineties, with the LaserDisc and the DVD, that surround sound arrived. Starting with the 5.1 setup, it added the front/back dimension, thus covering the whole horizontal plane, plus a low-frequency channel routed to a subwoofer (the ".1"). It was soon followed by 7.1, which reinforced the left/right dimension.
Remember, it is commonly accepted (at least among industry experts) that we can speak of "spatial" audio when sound is delivered in three dimensions (left/right, front/back, and up/down, or, put differently, angle, distance, and elevation). Well, here it came: this third dimension was introduced by the 7.1.4 setup, whose four added top speakers bring different height levels for the sound sources.
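The progression of layouts described above can be summarized in a few lines of Python. The "main.LFE.height" naming convention is standard; the exact grouping below is our own illustration:

```python
# Nominal channel counts for the layouts mentioned above.
# The "main.LFE.height" naming convention is standard; this grouping
# is an illustration, not a formal specification.

LAYOUTS = {
    "stereo": {"main": 2, "lfe": 0, "height": 0},   # left/right only
    "5.1":    {"main": 5, "lfe": 1, "height": 0},   # adds front/back
    "7.1":    {"main": 7, "lfe": 1, "height": 0},   # reinforces left/right
    "7.1.4":  {"main": 7, "lfe": 1, "height": 4},   # adds up/down
}

def total_speakers(name: str) -> int:
    """Total number of speakers for a layout, LFE included."""
    layout = LAYOUTS[name]
    return layout["main"] + layout["lfe"] + layout["height"]

def is_spatial(name: str) -> bool:
    """A layout qualifies as 'spatial' once it carries a height dimension."""
    return LAYOUTS[name]["height"] > 0
```

Under this definition, 7.1 remains a surround layout, while 7.1.4 is the first truly spatial one.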
Three representations are possible when it comes to a spatial audio mix: channel-based, object-based, and scene-based (e.g. Ambisonics).
Not your usual few-step process.
Whether a track will be compliant with a spatial audio standard must be decided at the very early stages of production. It requires recording and mixing the composition with the spatial dimension in mind. In this context, each track or instrument has the potential to occupy its own distinct space within a three-dimensional sound field, adhering to the specifications outlined by the original composer or mixer.
And there is more where the Dolby Atmos format is concerned:
On the production side, the sound engineers monitor their mix by listening over speakers and headphones using the Atmos renderer*. With a DAW (digital audio workstation) they can modify audio settings to control the final mix. Then they export the mixing session as an Atmos BWF Master file to be sent to music streaming services and download stores.
(*) The Atmos renderer converts the channel- and object-based mix into a channel-based stream suited to the user's audio device.
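To make the renderer's job more concrete, here is a deliberately simplified sketch: folding a single audio object into a pair of channels with a constant-power pan law. This is a toy stand-in for illustration, not Dolby's actual renderer, and the ±30° azimuth range simply reflects the common stereo speaker placement:

```python
import math

def render_object_to_pair(signal, azimuth_deg):
    """Toy stand-in for an object renderer: fold one object into a channel
    pair with a constant-power pan law. azimuth_deg ranges over [-30, +30],
    where -30 is hard left and +30 is hard right."""
    # Map azimuth to a pan position in [0, 1], then to gains on a quarter
    # circle so that gain_l**2 + gain_r**2 == 1 (constant perceived power).
    pos = (azimuth_deg + 30.0) / 60.0
    theta = pos * math.pi / 2.0
    gain_l, gain_r = math.cos(theta), math.sin(theta)
    left = [gain_l * x for x in signal]
    right = [gain_r * x for x in signal]
    return left, right
```

A real renderer does this for dozens of objects against many speakers (or a binaural model for headphones), but the core idea is the same: object position in, per-channel gains out.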
As one can easily figure, these additional production steps involve two scarce resources: time and money, all the more since spatial audio encompasses several industry formats (Dolby Atmos, Sony 360 Reality Audio, Ambisonics, the upcoming IAMF), each requiring its own process.
You need to book a studio equipped with the appropriate software and gear, along with a skilled sound engineer. Depending on the engineer's availability and the studio's schedule, it can take as long as three weeks to spatialize an album.
As time is money, the budget follows, all the more if you hire a certified Dolby Atmos sound engineer, a highly advisable move since Dolby Atmos is the spatial standard chosen by the biggest streaming platform offering spatial audio to its customers (Apple Music). Taking the album as a unit, it will cost around $6,000, and up to $600 for a one-shot single track. Can independent artists or record labels afford spatial audio under these conditions? Clearly not.
Besides, crafting spatial tracks that truly exude pristine sound quality can be exceedingly challenging, particularly when working from stems, which is the conventional approach these days. An engineer producing a spatial version in the studio will usually manipulate exported stems that were not necessarily designed with a spatial dimension in mind. This often leads to a disappointing experience for the listener, sounding artificial, especially when played back on headphones.
This is why we're convinced that an alternative, seamless, affordable and device-agnostic spatialization process for record labels and artists will remove these barriers and foster spatial audio as the go-to format for listeners, on whatever device.
And what about an old track, or all the back catalogs worldwide, obviously not recorded in spatial audio and often without stems available in the first place? Should we resign ourselves to keeping them locked in the stereo age?
Spatializing from the stereo master file has the advantage of sticking to the artist's original vision, mixed and mastered all the way through in a stereo context.
Basically, a stereo file contains two channels intended for two speakers. With that said, the lazy way would consist of simply sending the same signal to whatever number of speakers. While you would be surrounded by N copies of the same signal, it would definitely not result in an immersive experience.
Actually, it's all about routing to each channel, and thus to each speaker, its own dedicated information, to create the spatial landscape. That is the principle we had to build on, with different paths to explore, from basic to more innovative ones.
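The contrast between the lazy duplication and the "dedicated information per speaker" principle can be sketched in a few lines, assuming a toy quad layout (front L/R, rear L/R). Here the rear speakers receive the side (difference) content of the stereo pair instead of a plain copy; this is purely illustrative, not Ircam Amplify's algorithm:

```python
def naive_upmix(left, right):
    """The lazy way: duplicate the stereo pair onto the rear speakers."""
    return {"FL": left, "FR": right, "RL": left, "RR": right}

def mid_side_upmix(left, right):
    """Dedicated content per speaker via a mid/side decomposition."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return {
        "FL": [m + s for m, s in zip(mid, side)],  # reconstructs left
        "FR": [m - s for m, s in zip(mid, side)],  # reconstructs right
        "RL": side,                                # ambience-like content
        "RR": [-s for s in side],                  # phase-inverted side
    }
```

In the naive version the rear is a clone of the front; in the mid/side version the front still reconstructs the exact stereo image while each rear speaker carries its own distinct signal.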
So far, we have come up with two ways of processing the stereo file up to a spatial format (keep the word "up" in mind for later):
What triggered our vision for enabling spatial audio from a stereo file is IRCAM's famous software suite for spatialization, SPAT, and, through it, the decisive step of setting up a commercial offshoot of the groundbreaking research lab: Ircam Amplify. The SPAT technology relies on a psychoacoustic description of how our ears listen in a spatial environment, instead of the traditional physical-geometrical approach. It made us take a different route from the usual spatialization process, which aims to distribute the signal over a restricted number of speakers. We could take down physical barriers and remove walls, creating a continuous, non-reverberating space, as if we were positioned in the middle of a desert.
From the stereo, we create a sound bubble around the listener. It is composed of a multitude of particles made from the original stereo, therefore all originating from the same source. However, each one plays a different role by performing its own part: some define the stereo scene more precisely, while others enhance the immersive effect by applying psychoacoustic phenomena derived from SPAT.
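A toy sketch can convey the "bubble of particles" idea: scatter N virtual sources on a sphere around the listener, each deriving its feed from the same stereo source with its own mid/side balance. Both the placement and the weighting below are invented for illustration; the actual SPAT-based processing is psychoacoustic and far more involved:

```python
import math

def particle_positions(n):
    """Spread n particles over a sphere using a golden-angle spiral."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n            # vertical component in [-1, 1]
        azimuth = (i * golden) % (2.0 * math.pi)
        elevation = math.asin(z)
        points.append((azimuth, elevation))
    return points

def particle_feed(mid, side, azimuth):
    """Each particle mixes mid and side content according to its azimuth:
    frontal particles lean on mid (the scene), lateral ones on side (width)."""
    w = abs(math.sin(azimuth))                   # 0 at front/back, 1 on the sides
    return [(1.0 - w) * m + w * s for m, s in zip(mid, side)]
```

Every particle plays material derived from the same stereo source, yet no two particles play the same thing, which is exactly the property the bubble needs.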
It’s downmixing time
As a result, the listener is surrounded by a continuity of immersive elements. Finally, to make the content compliant with industry formats like Atmos, we downmix it and distribute it between the physical speakers and the objects.
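The distribution step can be sketched, under simplifying assumptions, as pairwise constant-power panning on a horizontal speaker ring: each virtual source is split between the two physical speakers nearest to it. This is a simplified, 2-D VBAP-style scheme; a real Atmos downmix also handles height channels and objects. The 5.0 speaker angles below follow a common convention and are not taken from our pipeline:

```python
import math

# Common 5.0 speaker placement (degrees of azimuth, 0 = front center).
SPEAKER_ANGLES = {"L": -30.0, "R": 30.0, "C": 0.0, "Ls": -110.0, "Rs": 110.0}

def downmix_gains(source_azimuth_deg):
    """Per-speaker gains for one virtual source on the horizontal plane,
    panned with constant power between its two nearest speakers."""
    ring = sorted(SPEAKER_ANGLES.items(), key=lambda kv: kv[1])
    gains = {name: 0.0 for name in SPEAKER_ANGLES}
    # Walk adjacent speaker pairs around the ring (with wraparound).
    for (n1, a1), (n2, a2) in zip(ring, ring[1:] + ring[:1]):
        span = (a2 - a1) % 360.0
        offset = (source_azimuth_deg - a1) % 360.0
        if offset <= span:
            frac = offset / span             # 0 at speaker 1, 1 at speaker 2
            theta = frac * math.pi / 2.0
            gains[n1] = math.cos(theta)      # constant power: cos² + sin² = 1
            gains[n2] = math.sin(theta)
            return gains
    return gains
```

A source at 0° lands entirely on the center speaker; a source at 70° is shared between R and Rs, with total power preserved in every case.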
An immersive effect from stereo you won’t find anywhere else.
We even added personalization settings per genre, because we're music fans first and foremost, and not all genres are equal where spatialization is concerned.
All the channels and objects (or most, depending on the personalization we apply) receive data for rendering, like this:
👉 Sign up and give it a try through the “Tasks” feature on your user dashboard
Think we're on the same wavelength?