If you’ve ever heard a choir sing, you’ve encountered this effect. In short, the multiple, slightly varied signals blend together, making the vocal sound more in tune, more complex, and more enjoyable overall.
The chorus effect relies on a few other psychoacoustic concepts. The first is the precedence effect.
If 2 identical melodic signals are played together, one can be delayed by up to about 40 milliseconds before the listener hears it as a separate signal.
The second effect at play is our perception of pitch - we hear adjacent notes as being roughly 6% apart in frequency. In other words, if one performer sings A#, and another sings roughly 6% lower, then the second performer is singing the note A.
If the pitch variation between singers is within roughly 0-3%, we’ll perceive them as singing the same note - even if one of the singers is slightly sharp or flat. And the more singers perform together, the closer their average pitch is to the intended frequency.
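To illustrate the averaging idea, here’s a quick Python sketch with five hypothetical singers, each detuned within the roughly 0-3% band (the detune amounts and the 440Hz target are made-up example values):

```python
from statistics import mean

TARGET_HZ = 440.0  # concert A, used here only as an example target pitch

# Hypothetical detune amounts for five singers, in percent sharp (+) or flat (-),
# each within the roughly 0-3% band described above.
detunes_pct = [1.5, -0.8, 0.4, -1.2, 0.1]

freqs = [TARGET_HZ * (1 + d / 100) for d in detunes_pct]
avg = mean(freqs)

# The individual singers are up to 1.5% off, but their average lands
# almost exactly on the intended pitch.
print(f"average pitch: {avg:.2f} Hz")
```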
Recording vocal doubles is a popular way to achieve this effect.
But if you don’t have doubles, you can use a doubling plugin to emulate them. Again, keep the pitch variation for each emulated double within 0-3% of the note’s frequency, and ensure that the delay isn’t greater than 40ms.
If the plugin uses cents as its unit, know that there are 100 cents in a semitone. We can detune by up to 50 cents in either direction (about 3%) before the double begins to sound out of tune.
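If you want to sanity-check the conversion between cents and percent, the math is simple enough to compute directly (the 50- and 100-cent values below are just the limits discussed above):

```python
import math

def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a detune in cents (100 cents = 1 semitone)."""
    return 2 ** (cents / 1200)

# 50 cents in either direction is roughly a 2.9% pitch change,
# which lines up with the ~0-3% band for believable doubles.
ratio = cents_to_ratio(50)
print(f"+50 cents = {(ratio - 1) * 100:.2f}% sharp")   # → +50 cents = 2.93% sharp

# A full semitone (100 cents) is about 5.9% - the point at which
# we start to hear a separate note.
semitone = cents_to_ratio(100)
print(f"+100 cents = {(semitone - 1) * 100:.2f}% sharp")  # → +100 cents = 5.95% sharp
```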
Watch the video to learn more >
I notice this one gives people the most trouble. The most prominent masking range sits between 250 and 350Hz.
In other words, energy in this range covers up, or masks, many frequencies below it and many above it.
If you attenuate this range on your vocals, you’ll notice a huge improvement in clarity.
If you center a band around 275Hz and dip it by a couple of dB, the high mids and highs will come through much more easily.
Additionally, if you add this same filter to an instrument or instruments that are competing with the vocal, it’ll help significantly.
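As a rough sketch of what that dip looks like in code, here are biquad coefficients for a peaking cut at 275Hz using the well-known Audio EQ Cookbook (RBJ) formulas - the 48kHz sample rate, the Q of 1.4, and the -2dB depth are example values, not recommendations:

```python
import cmath
import math

def peaking_eq(fs: float, f0: float, gain_db: float, q: float):
    """Biquad coefficients (b, a) for a peaking EQ, per the RBJ Audio EQ Cookbook."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    # Normalize so a[0] == 1
    return [x / a[0] for x in b], [x / a[0] for x in a]

def gain_at(b, a, fs: float, f: float) -> float:
    """Magnitude response of the biquad at frequency f, in dB."""
    z = cmath.exp(1j * 2 * math.pi * f / fs)
    h = (b[0] + b[1] / z + b[2] / z**2) / (a[0] + a[1] / z + a[2] / z**2)
    return 20 * math.log10(abs(h))

b, a = peaking_eq(fs=48000, f0=275, gain_db=-2.0, q=1.4)
print(f"dip at 275 Hz: {gain_at(b, a, 48000, 275):.2f} dB")  # → dip at 275 Hz: -2.00 dB
```

The cut is exactly -2dB at the center frequency and falls back toward 0dB away from it, which is why the high mids and highs are left untouched.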
Our perception of loudness is closely tied to the concept of a masking frequency. The equal-loudness contour gives us a good idea of which frequencies we’re most sensitive to.
The dip between 2-4kHz indicates that this is the range we’re most sensitive to: frequencies here require the least sound pressure to be heard at a given loudness. The graph is somewhat counterintuitive, so I’ll invert it to show which frequencies are easiest for us to hear and which need to be amplified significantly to be perceived at the same loudness as others.
I bring this up because 250-350Hz heavily masks 2-4kHz. Because we naturally anticipate 2-4kHz to be the most prominent range, especially when hearing vocals or speech, any attenuation to this range is incredibly noticeable and somewhat off-putting.
That’s why I recommend both attenuating 250-350Hz and amplifying 2-4kHz to create a prominent and clear vocal.
Watch the video to learn more >
3 main variables help us determine the location of a sound source - they are:
- Amplitude
- Frequency - primarily the high frequencies
- Timing - or small variations in the sound’s arrival time between our left and right ears
Most panning only uses amplitude - for example, our panpots lower the amplitude of one channel to make the sound seem like it’s coming from the other direction.
So, say I want a BGV to come from the left; by moving the panpot to the left, I attenuate the right channel. This makes sense: if a sound comes from our left, the left ear hears it as louder than the right ear does.
Following this example, that is, the sound source being on the left, our left ear would hear the full frequency spectrum; however, our head would sit between the sound source and our right ear.
As the sound hits the head, the highest frequencies, whose short wavelengths can’t easily bend around it, are blocked, absorbed, and diffused. As a result, our right ear hears the same signal but with attenuated high frequencies. We can emulate this with an EQ that allows for independent left/right bands.
Attenuating highs with a high-shelf filter placed on the right channel mimics the effect that the listener’s head has on the high frequencies.
Lastly, sound takes time to travel. It has to travel roughly 1ft further as it wraps around the head to reach the opposite ear. If the speed of sound is 1140 feet per second, then that extra foot takes about 0.88ms (1 ÷ 1140 seconds).
With a sample delay plugin, we could enter this value, or a value as close as possible to 0.88ms, for the right channel. This emulates the signal taking about 0.88ms longer to hit the right ear.
Quick side note: air temperature affects the speed of sound; sound travels faster through warmer air. So, if we wanted to get very precise, we could emulate a cold or hot listening environment by altering the delay time.
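The timing math above, including the temperature side note, can be sketched directly. This is a back-of-the-envelope calculation: the 1ft extra path, the 48kHz sample rate, and the standard dry-air approximation for the speed of sound (about 331.3 + 0.606 × T m/s) are all assumptions for illustration:

```python
def speed_of_sound_ms(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at a given temperature (Celsius)."""
    return 331.3 + 0.606 * temp_c

FT_PER_M = 3.28084
EXTRA_PATH_FT = 1.0    # rough extra distance around the head to the far ear
SAMPLE_RATE = 48000

for temp in (0, 20, 35):
    v_fts = speed_of_sound_ms(temp) * FT_PER_M
    delay_ms = EXTRA_PATH_FT / v_fts * 1000
    samples = delay_ms / 1000 * SAMPLE_RATE
    # Colder air -> slower sound -> a slightly longer interaural delay.
    print(f"{temp:>2} C: {v_fts:6.0f} ft/s -> {delay_ms:.2f} ms (~{samples:.0f} samples)")
```

At room temperature this lands around 0.89ms, or roughly 42-43 samples at 48kHz, with only small shifts as the air gets colder or warmer.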
So let’s listen to these 3 effects being introduced to our BGVs to create much more realistic panning.
Watch the video to learn more >
Lastly, let’s cover what I think is the most important concept when mixing vocals, or mixing in general: our threshold for hearing.
It’s no surprise that if something is quiet enough, we can’t hear it.
But masking makes this idea more interesting. Since sounds cover up other sounds, we have to mix our vocals in the context of everything else to get a good understanding of what can and cannot be heard.
One of the first things to get masked and fall below our threshold for hearing is a vocal’s quieter details.
This is why compression is such a popular effect for vocals - after reducing peaks, we can amplify the vocal without it clipping. When we amplify the compressed vocal, all of the quieter details that were being masked and falling below our threshold for hearing become loud enough to actually hear.
If you rely solely on peak-down compressors to accomplish this, their effect on the peaks may be too aggressive. So, combine peak-down compression with post-compression makeup gain and a subsequent maximizer plugin to bring up these details without unwanted alterations to the vocal’s timbre.
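As a sketch of the idea rather than any particular plugin, here’s the static gain curve of a hard-knee, peak-down compressor with makeup gain - the threshold, ratio, and makeup values are arbitrary examples:

```python
def compress_db(level_db: float, threshold_db: float = -12.0,
                ratio: float = 4.0, makeup_db: float = 6.0) -> float:
    """Hard-knee, peak-down compression of a level (dBFS), plus makeup gain."""
    if level_db > threshold_db:
        # Only the portion above the threshold is reduced, by the ratio.
        over = level_db - threshold_db
        level_db = threshold_db + over / ratio
    return level_db + makeup_db

# A loud peak is turned down before the makeup gain is applied...
print(compress_db(-2.0))   # → -3.5  (peak reduced, then raised)
# ...while a quiet detail passes through untouched and is simply raised,
# pulling it back above our threshold for hearing.
print(compress_db(-30.0))  # → -24.0
```

The quiet detail ends up 6dB louder while the peak barely moves, which is exactly the trade the section describes: peaks tamed, low-level detail brought up.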
Let’s listen to the vocal in the context of the mix to first hear how masking plays a part. Then, we’ll introduce the compressor with make-up gain. Lastly, we’ll enable a maximizer plugin to further amplify these quiet details. Notice that even when the gain changes are compensated for, the vocal sounds fuller and more detailed since these quieter parts no longer fall below our threshold for hearing.