3 Practical Ways to Improve Your Vocal Production Right Now

Achieving Super Clean HP Filtering

In the last video, we talked about how EQ can introduce significant distortion to a signal. In short, you can use an editor like iZotope RX to apply aggressive filters without introducing this distortion.

Whereas an EQ uses filters that are light enough on your computer to run in real time, RX and other editors use FFT-based filters when processing, which split the audio into thousands of narrow frequency divisions.

Processing these divisions results in much more accurate filtering, which lowers distortion.

Say I want to cut out the low end of the vocal up to the fundamental frequency. I could use a high pass filter via an EQ, or I could use RX, highlight the range below the fundamental, attenuate it, and then import this cleaned-up version into the session for subsequent processing.

It’s a little tedious for sure, but once you get the hang of it, this is an awesome way to clean up a vocal while introducing less distortion than a traditional EQ.
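To make the idea concrete, here is a minimal sketch of FFT-based high-pass filtering on a whole clip, using only the Python standard library. It is not how RX works internally (editors typically process overlapping windows); the function names, the 200Hz cutoff, and the -60dB attenuation are assumptions chosen for illustration. In practice you would use an FFT library rather than the slow transform written out here.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def fft_highpass(samples, sample_rate, cutoff_hz, attenuation_db=-60.0):
    """Attenuate every frequency bin below cutoff_hz by attenuation_db."""
    n = len(samples)
    gain = 10 ** (attenuation_db / 20)
    spectrum = dft(samples)
    for k in range(n):
        # Bins k and n - k hold the positive and negative halves
        # of the same frequency, so both must be scaled together.
        freq = min(k, n - k) * sample_rate / n
        if freq < cutoff_hz:
            spectrum[k] *= gain
    return idft(spectrum)

# Hypothetical test signal: 40Hz rumble under a 500Hz "vocal" tone.
sample_rate, n = 8000, 400
rumble = [math.sin(2 * math.pi * 40 * t / sample_rate) for t in range(n)]
vocal = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(n)]
cleaned = fft_highpass([r + v for r, v in zip(rumble, vocal)], sample_rate, 200.0)
```

Here `cleaned` should be very close to the 500Hz tone alone, because only the bins below the cutoff were touched.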

Let’s listen to a vocal processed with identical gain changes - one with EQ and one with RX editing, and notice the significant difference in clarity.

Watch the video to learn more >

Amplifying Formants for Intelligibility

If you want to improve a vocal’s intelligibility, vocal formants are a good thing to understand.

They’re clusters of frequencies that determine the perception of a particular vowel.

Each vowel is usually described by its first three formants: one in the low mid-range, one in the mid-range, and one in the high mid-range.

For example, the average formant frequencies for the /a/ sound, like in the word father, are 730Hz, 1090Hz, and 2440Hz for males.

For females, they’re 850Hz, 1220Hz, and 2810Hz, so roughly 12 to 16% higher in frequency.

Additionally, the range around 3kHz is perceived as louder than other ranges, largely due to the shape of our ears and the resonance of the ear canal.

If we amplify the range around the 3rd formant, we can both improve intelligibility and make the vocal sound louder, increasing the perception of overall clarity.

I’ve suggested amplifying this range on vocals a few times, but I hope this solidifies that these things aren’t completely subjective - there’s a solid argument behind why amplifying 3kHz improves clarity.

So, let’s listen to a vocal, then the same vocal with 3kHz amplified by 3dB but gain compensated so their peaks match, and notice how the latter sounds clearer.
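The boost-then-match comparison can be sketched in code. This is only an illustration: it assumes a standard peaking biquad built from the widely used Audio EQ Cookbook formulas with a Q of 1.0, which may not match the EQ used in the video, and all function names are hypothetical.

```python
import cmath
import math

def peaking_coeffs(sample_rate, center_hz, gain_db, q=1.0):
    """Peaking-EQ biquad coefficients (Audio EQ Cookbook form), normalized."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]

def biquad(samples, b, a):
    """Run the filter over the samples (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def magnitude_db(b, a, freq_hz, sample_rate):
    """Gain of the filter at one frequency, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / sample_rate)  # z^-1
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))

def match_peak(processed, reference):
    """Scale processed so its peak equals the reference peak."""
    scale = max(abs(s) for s in reference) / max(abs(s) for s in processed)
    return [s * scale for s in processed]

# Hypothetical "vocal": a 200Hz body tone plus quieter 3kHz content.
sample_rate = 48000
dry = [math.sin(2 * math.pi * 200 * t / sample_rate)
       + 0.3 * math.sin(2 * math.pi * 3000 * t / sample_rate)
       for t in range(480)]
b, a = peaking_coeffs(sample_rate, 3000.0, 3.0)
boosted = match_peak(biquad(dry, b, a), dry)
```

After `match_peak`, both versions peak at the same level, so any difference you hear comes from the tonal balance rather than from one version simply being louder on the meter.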


Compression, Transients, and Intelligibility pt.1

A few videos back, I showed some graphics illustrating how compressors affect dynamics and a song's rhythm. Let’s apply these ideas to vocals.

The attack time of a compressor isn’t instant; it gradually attenuates the signal over the full period of its set time.

Meanwhile, the release only initiates after the signal falls below the threshold - meaning if we don’t set it properly, other transients can become entangled and compressed, even if they wouldn’t have caused compression if processed independently.

In short, if we want the initial transient of a vowel or consonant to cut through, we need to set the attack slightly longer, around 30ms. Meanwhile, we should set the release as short as possible without causing distortion, so around 50ms.

We can then balance the level of the transient and the level of the compressed signal by adjusting the wet gain. I could increase the wet gain to bring the vocal forward while leaving the initial transient intact. Or, I could decrease the wet gain and compensate with the output gain to emphasize the transient.
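The attack and release behavior described above can be sketched with a very simple feed-forward compressor. This is an assumption-laden model, not the design of any particular plugin: the gain reduction is smoothed exponentially, so the attack and release values act as time constants, and the threshold and ratio defaults are placeholders.

```python
import math

def compress(samples, sample_rate, threshold_db=-20.0, ratio=4.0,
             attack_ms=30.0, release_ms=50.0):
    """Feed-forward compressor with smoothed gain reduction."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000))
    reduction_db = 0.0  # current gain reduction, always >= 0
    out = []
    for x in samples:
        level_db = 20 * math.log10(max(abs(x), 1e-9))
        over_db = max(level_db - threshold_db, 0.0)
        target_db = over_db * (1 - 1 / ratio)
        # Reduction ramps in at the attack rate and relaxes at the
        # release rate once the signal drops back under the threshold.
        coeff = attack if target_db > reduction_db else release
        reduction_db = coeff * reduction_db + (1 - coeff) * target_db
        out.append(x * 10 ** (-reduction_db / 20))
    return out

# Hypothetical input: a loud sustained section followed by a quiet one.
sample_rate = 1000
signal = [1.0] * 500 + [0.05] * 500
squeezed = compress(signal, sample_rate)
```

With a 30ms attack, the very first loud sample passes almost untouched, which is the transient cutting through, while the sustained part settles at the full gain reduction. The short release lets the gain recover before the next syllable arrives.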

Let’s take a listen to these settings before we move into the 2nd half of this idea.


Compression, Transients, and Intelligibility pt.2

If we combine this compression concept with the ideas about formants, we have a practical way to increase vocal intelligibility by using parallel processing.

I’ll set up a send from the vocal and insert a linear phase EQ and a compressor on the corresponding aux track.

With the EQ, I’ll isolate the third formant, roughly 2-5kHz, and then compress it with the settings I just described. That is a roughly 30ms attack, 50ms release, and the wet signal blended in until it maximizes intelligibility.

With these settings, the transient comes through, helping with pronunciation. Meanwhile, the 3rd formant’s quieter details are amplified, reducing the effect of masking, and helping the listener better understand the vowel.

Then, I can blend the aux track with the vocal until it sounds balanced, with a little extra clarity and intelligibility.
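As a rough sketch of the aux-track routing, the code below isolates 2-5kHz with a linear phase band-pass and blends the result back under the dry vocal. It assumes a windowed-sinc FIR with 255 taps, which is one common way to get linear phase and is not necessarily what a given linear phase EQ does. The compressor stage is left out for brevity, and the -6dB aux level and all names are placeholders.

```python
import math

def bandpass_taps(sample_rate, low_hz, high_hz, num_taps=255):
    """Windowed-sinc band-pass; symmetric taps give linear phase."""
    mid = (num_taps - 1) / 2
    f1, f2 = low_hz / sample_rate, high_hz / sample_rate
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * (f2 - f1)
        else:
            ideal = (math.sin(2 * math.pi * f2 * t)
                     - math.sin(2 * math.pi * f1 * t)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(ideal * window)
    return taps

def gain_at(taps, freq_hz, sample_rate):
    """Linear gain of the FIR filter at one frequency."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)

def filter_aligned(samples, taps):
    """Convolve, then drop the filter's fixed delay so the aux lines up with the dry track."""
    delay = (len(taps) - 1) // 2
    full = []
    for i in range(len(samples) + len(taps) - 1):
        lo, hi = max(0, i - len(samples) + 1), min(i, len(taps) - 1)
        full.append(sum(taps[k] * samples[i - k] for k in range(lo, hi + 1)))
    return full[delay:delay + len(samples)]

def blend(dry, aux, aux_gain_db):
    """Sum the aux track under the dry vocal at the given level."""
    g = 10 ** (aux_gain_db / 20)
    return [d + g * a for d, a in zip(dry, aux)]

sample_rate = 48000
taps = bandpass_taps(sample_rate, 2000.0, 5000.0)
dry = [math.sin(2 * math.pi * 3500 * t / sample_rate) for t in range(960)]
aux = filter_aligned(dry, taps)  # in a full chain, compress aux here
mixed = blend(dry, aux, -6.0)
```

The delay compensation matters because a linear phase filter shifts everything by a fixed number of samples; without it, the aux and the dry vocal would comb-filter when summed. Most DAWs handle this compensation automatically.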

Let’s listen to the effect being blended in.
