Localization is all about a sound source’s placement: is the sound coming from the left, the right, the front, from behind, and so on?
Humans are exceptionally and innately talented at sound localization. We rely on 4 primary factors to determine the placement of a sound source.
These are amplitude, timing (the relative delay between the ears), frequency, and lastly, the sound’s interaction with the environment.
Amplitude-based localization is your basic panning. Whenever you use a panpot, you’re taking advantage of this psychoacoustic effect.
If the amplitude is higher in the left speaker than the right speaker, we perceive the sound as being on the left, and vice versa.
Panning doesn’t amplify anything; it turns down the opposite channel. So when I pan to the right, the left channel’s amplitude is reduced. It’s a simple trick, but it obviously works well.
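To make that concrete, here’s a minimal sketch of this kind of pan law in Python with NumPy. The function name and settings are my own for illustration; real DAW panners often use constant-power laws, but this version simply attenuates the channel opposite the pan direction, as described above.

```python
import numpy as np

def amplitude_pan(mono: np.ndarray, pan: float) -> np.ndarray:
    """pan: -1.0 = hard left, 0.0 = center, +1.0 = hard right."""
    # Turn down the channel opposite the pan direction; leave the near side at unity.
    left_gain = 1.0 if pan <= 0 else 1.0 - pan
    right_gain = 1.0 if pan >= 0 else 1.0 + pan
    return np.stack([mono * left_gain, mono * right_gain], axis=-1)

# Example: a 440 Hz tone panned halfway to the right (left channel attenuated).
sr = 44100
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
stereo = amplitude_pan(tone, pan=0.5)
```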
The next major cue is the timing difference between the left and right ears. Whereas a DAW typically has amplitude panning built in, you’ll need a stereo delay to achieve this.
If the sound source is coming from the left, it’ll arrive at the right ear a fraction of a millisecond later, roughly 0.6 to 0.7 ms at most. Like amplitude, this gives us the impression that the sound is coming from the side it reaches first.
And as you can imagine, both amplitude and delay panning can be combined to create a more realistic impression of placement.
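As a rough sketch, assuming a mono NumPy signal, delay panning could look like the following. The 0.6 ms figure approximates the maximum interaural time difference; in practice, mix engineers often use somewhat longer Haas-style delays between channels, and the function itself is purely illustrative.

```python
import numpy as np

def delay_pan(mono: np.ndarray, sr: int, delay_ms: float, source_left: bool) -> np.ndarray:
    """Delay the far channel so the near side appears to arrive first."""
    d = int(round(sr * delay_ms / 1000.0))
    near = np.concatenate([mono, np.zeros(d)])   # undelayed copy
    far = np.concatenate([np.zeros(d), mono])    # delayed copy
    left, right = (near, far) if source_left else (far, near)
    return np.stack([left, right], axis=-1)

# Example: place broadband noise on the left by delaying the right channel ~0.6 ms.
sr = 44100
noise = 0.1 * np.random.randn(sr)
stereo = delay_pan(noise, sr, delay_ms=0.6, source_left=True)
```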
Next comes frequency. Say the sound source is on the left again. To reach the right ear, the sound needs to bend around the head, and in the process the high frequencies, which don’t bend around the head as easily, are absorbed and attenuated.
So in this instance, the right ear receives a signal with slightly attenuated highs. Since we’re so good at locating sounds, even this subtle change is enough to affect placement. It can be combined with amplitude and delay panning for a realistic effect; however, even on its own, reducing the highs on either the left or right channel of a stereo signal will shift the perceived placement.
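Here’s a hedged sketch of that idea: gently roll off the highs on the far channel. A one-pole low-pass stands in for a proper high-shelf EQ, and the cutoff and mix amount are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.signal import butter, lfilter

def head_shadow(stereo: np.ndarray, sr: int, source_left: bool,
                cutoff_hz: float = 4000.0, mix: float = 0.5) -> np.ndarray:
    """Partially low-pass the far channel to mimic high-frequency head shadowing."""
    b, a = butter(1, cutoff_hz / (sr / 2), btype="low")
    out = stereo.astype(float).copy()
    far = 1 if source_left else 0                  # far channel index (0 = L, 1 = R)
    filtered = lfilter(b, a, out[:, far])
    out[:, far] = (1.0 - mix) * out[:, far] + mix * filtered
    return out
```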
The combination of these 3 will make your panning more realistic than in most mixes, but reverb ties everything together.
If reverb occurs before amplitude, delay, and frequency-based panning, it’ll create the most realistic panning possible, since the sound interacts with the environment before it reaches our ears, not the other way around.
The minute variations and reflections give the sound a complex space to interact with, and just like with all of the other cues, humans are talented at estimating a space’s size and shape, and a sound source’s placement within that space, from hearing the sound alone.
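Putting the ordering together, a rough, self-contained sketch might look like the following: a crude convolution “reverb” first, then the amplitude, delay, and frequency cues. The impulse response, gains, delay time, and filter settings here are all illustrative assumptions, not a recipe from the video.

```python
import numpy as np
from scipy.signal import butter, lfilter, fftconvolve

def localize_left(mono: np.ndarray, sr: int) -> np.ndarray:
    # 1) Environment first: convolve with a short decaying-noise "room" impulse response.
    n_ir = int(0.3 * sr)
    ir = np.random.randn(n_ir) * np.exp(-np.linspace(0.0, 8.0, n_ir))
    src = mono + 0.05 * fftconvolve(mono, ir)[: len(mono)]
    # 2) Amplitude cue: attenuate the far (right) channel.
    left, right = src, 0.6 * src
    # 3) Delay cue: the far ear hears the sound ~0.6 ms later.
    d = int(round(0.0006 * sr))
    right = np.concatenate([np.zeros(d), right])[: len(left)]
    # 4) Frequency cue: roll off highs on the far channel (head shadow).
    b, a = butter(1, 4000.0 / (sr / 2), btype="low")
    right = 0.5 * right + 0.5 * lfilter(b, a, right)
    return np.stack([left, right], axis=-1)
```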
So let’s listen to the first 3 effects enabled one by one. Notice that with each change, the placement becomes increasingly realistic. Then, I’ll enable a reverb plugin to give the sound source an environment.
Feel free to use all of these psychoacoustic methods for panning, or combine the ones that you enjoy the most or find most convenient.
Watch the video to learn more >
You may have heard of the cocktail party effect. When multiple people are in a room having separate conversations, most of us can tune into one conversation yet have difficulty understanding multiple conversations at once.
The study of this effect is called auditory scene analysis, and it has a big impact on mixing with intention.
There are 5 main ways in which we group sound sources, and conversely, determine if sound sources are distinct.
These include harmonic content; onset, or when the sounds begin; frequency co-modulation, or whether the sources change frequency in a similar manner; amplitude co-modulation, or whether the sources change amplitude in a similar manner; and source location, meaning everything we covered in the first section of this video.
If multiple sound sources share these cues, we’ll perceive them as belonging to the same source.
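If you want to sanity-check a couple of these cues numerically, here’s a hedged sketch: envelope correlation as a rough stand-in for amplitude co-modulation, and a simple first-arrival estimate for onset. The functions, window lengths, and thresholds are assumptions made purely for illustration.

```python
import numpy as np
from scipy.signal import hilbert

def envelope(x: np.ndarray, sr: int, smooth_ms: float = 20.0) -> np.ndarray:
    """Smoothed amplitude envelope of a mono signal."""
    win = max(1, int(sr * smooth_ms / 1000.0))
    return np.convolve(np.abs(hilbert(x)), np.ones(win) / win, mode="same")

def amplitude_comodulation(a: np.ndarray, b: np.ndarray, sr: int) -> float:
    """Envelope correlation of two equal-length takes; closer to 1.0 suggests stronger grouping."""
    return float(np.corrcoef(envelope(a, sr), envelope(b, sr))[0, 1])

def onset_time(x: np.ndarray, sr: int, threshold: float = 0.05) -> float:
    """Seconds until the signal first exceeds the threshold (returns 0.0 if it never does)."""
    return int(np.argmax(np.abs(x) > threshold)) / sr
```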
This is the concept behind doubling. For example, if I double a guitar or vocal, the 2 signals are different enough not to be identical or perfectly in phase with one another.
However, they’re similar enough in all of the aforementioned cues to be perceived as coming from the same source.
Now that’s the obvious example, but I believe you could take any one of these cues and find ways to create cohesion or separation between instruments.
Let’s say I have 2 guitar parts playing the same melody, but one octave apart.
Already they’ll be grouped due to harmonic content, onset, frequency co-modulation, and likely amplitude co-modulation.
So, how could I help differentiate the 2 performances?
At this point, the only way I can make them sound separate is by giving each one distinct localization. If I keep these panned center, the listener, especially a casual music listener, will hear them as one instrument or group.
If I use the methods discussed in the previous chapter, that is, amplitude, delay, frequency, and environment, to create distinct localization cues for each part, it’s a lot more likely a listener will perceive them as distinct sound sources.
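As a rough sketch of that idea, assuming two equal-length mono stems, you could give each part mirrored localization settings, for example opposite pan and opposite inter-channel delay. The function and settings below are hypothetical and only meant to show the mirroring.

```python
import numpy as np

def place(mono: np.ndarray, sr: int, side: str) -> np.ndarray:
    """Give a part a simple left or right localization via mirrored amplitude and delay cues."""
    d = int(round(0.0006 * sr))                      # ~0.6 ms inter-channel delay
    near = np.concatenate([mono, np.zeros(d)])
    far = 0.6 * np.concatenate([np.zeros(d), mono])  # quieter and later than the near side
    channels = [near, far] if side == "left" else [far, near]
    return np.stack(channels, axis=-1)

# Hypothetical usage with two equal-length mono stems:
# mix = place(guitar_low, sr, "left") + place(guitar_high, sr, "right")
```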
The same could be done, but with innately distinct instruments.
Say I have a vocal melody and a vocal counter-melody. Let’s say their harmonic content is more different than similar, and the same could be said about the frequency and amplitude modulation.
Also, say the parts are recorded by 2 very different singers, each tracking in very different sounding environments.
How could we make the listener group these 2 sources together when they differ so greatly?
Compressing both vocal takes in a similar way would cause amplitude co-modulation.
De-essing would attenuate high frequency transients, which would help alleviate onset differences.
Additionally, if we apply EQ, we can make their frequency responses sound more similar, although not identical.
If we saturate both using identical saturation settings, this should help slightly; since they’re distinct melodies, they’ll generate different harmonics, but if the saturation is aggressive enough, there will still be a good deal of overlap.
Lastly, we could send both to the same delay or reverb, as well as pan both to the same location.
In the end, they won’t sound as similar as a double of the same vocal, but we’ve greatly increased the likelihood that they’d be grouped.
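Here’s a hedged sketch of two of those grouping moves: a single gain envelope shared by both takes (a crude stand-in for bus compression) to force amplitude co-modulation, and identical soft-clip saturation on both. The gain curve is not a real compressor law, and every name and setting here is an assumption for illustration.

```python
import numpy as np

def shared_gain_envelope(a: np.ndarray, b: np.ndarray, sr: int,
                         threshold: float = 0.3, ratio: float = 4.0) -> np.ndarray:
    """One gain envelope derived from both takes, so they duck together (crude approximation)."""
    win = int(sr * 0.03)                                # ~30 ms smoothing window
    level = np.convolve(np.abs(a + b), np.ones(win) / win, mode="same")
    over = np.maximum(level - threshold, 0.0)
    return 1.0 / (1.0 + (ratio - 1.0) * over)

def saturate(x: np.ndarray, drive: float = 3.0) -> np.ndarray:
    """Identical soft-clip saturation for both takes, creating overlapping harmonics."""
    return np.tanh(drive * x) / np.tanh(drive)

# Hypothetical usage with two equal-length mono takes (vocal_a, vocal_b):
# g = shared_gain_envelope(vocal_a, vocal_b, sr)
# grouped = saturate(vocal_a * g) + saturate(vocal_b * g)   # then pan/send both together
```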
As you can imagine, you can use any combination of these strategies to either group or separate performances. They can be used practically, like what I showed here, or creatively.
The important thing is to remember these 5 cues so that you can use them however you see fit.
So let’s listen to the guitar example. I’ll show 2 very similar guitar parts, separated by an octave.
First, I’ll use the 5 cues to make them sound as identical as possible. Then, I’ll use the 5 cues to make them sound as distinct as possible.