Perfecting Vocals on Modern Playback Systems

The Importance of Mono Compatibility

Earbuds, laptops, and car speakers are typically stereo - they can play the left and right channels independently of one another.

Most phone speakers and portable speakers sum the left and right channels to mono.

As a result, a lot of detail is lost - mainly the side image, which is the difference between the left and right channels.
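To see why, here’s a minimal numpy sketch of the mid/side math - the signals and sample rate are arbitrary demo values, chosen only to show that content panned entirely to the side image cancels out of a mono sum while centered content survives.

```python
import numpy as np

sr = 48000                      # arbitrary sample rate for the demo
t = np.arange(sr) / sr

# Center (mid) content: identical in both channels, like a mono lead vocal.
lead = np.sin(2 * np.pi * 220 * t)

# A fully "wide" effect: opposite polarity in each channel.
wide_fx = 0.5 * np.sin(2 * np.pi * 440 * t)

left = lead + wide_fx
right = lead - wide_fx

mid = (left + right) / 2        # what a mono phone speaker plays
side = (left - right) / 2       # what it throws away

# The wide effect lives entirely in the side image, so it vanishes
# from the mono sum while the centered vocal is untouched.
print(np.allclose(mid, lead))      # True
print(np.allclose(side, wide_fx))  # True
```

Real stereo effects are rarely this perfectly out of polarity, but the closer they get, the more level they lose in the mono sum.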

Most main vocals are mono, so you don’t need to be too concerned; however, some of the details of your vocal effects, like reverb, delay, micro-pitch shifting, or chorusing, will be lost when played through these speakers.

With that in mind, make sure some of the vocal’s effects stay centered - if too much of the vocal occupies the side image, its loudness will be significantly reduced when converted to mono.

To hear how your vocal’s stereo effects will translate to portable speakers, introduce them with sends. Then insert the free plugin MSED to mute the side image.

This will let you monitor how that effect will translate to mono devices.

This doesn’t just apply to vocals - try it on any stereo instrument to get a better idea of its mono compatibility.

Let’s listen to a vocal I’ve designed to have a lot of info in the side image. It sounds good with both the mid and side images, but notice how much is lost when the side image is muted.

Watch the video to learn more >

Increased Importance of Sibilance Control

There are two main ways earbuds contribute to sibilance.

First, the frequency response of earbuds emphasizes the sibilance range. For example, here are the left and right channels of Apple’s 3rd Generation AirPods.

Notice how much the high mids and highs are amplified.

This prioritizes the vocal’s 3rd formant, which is great for increasing vocal intelligibility; however, the range also extends to sibilance, making it harsh if it isn’t controlled.

The second way earbuds emphasize sibilance is by bypassing the pinna of the outer ear.

The natural diffusion and shaping of the outer ear is lost, and of course, any reflections and diffusion of the room are completely left out of the listening experience.

This causes the sound to seem like it’s internal or coming from inside the listener, not from some external source.

So, whereas a room, the interior of a car, or any environment could diffuse and absorb some of these highs, any sibilance perceived through earbuds is delivered directly to the listener.

With that in mind, be sure to use a de-esser to control sibilance. Alternatively, you can edit the sibilance with an FFT editor, which offers a lot more control than a dynamics processor.
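As a rough sketch of what a de-esser does under the hood - the band limits, threshold, ratio, and window size below are arbitrary illustration values, not recommendations - it detects energy in the sibilance band and turns the level down when that energy gets too hot:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def de_ess(vocal, sr, lo=5000.0, hi=9000.0, threshold=0.1, ratio=4.0, win=256):
    """Very simplified broadband de-esser: detect energy in the sibilance
    band, then duck the whole signal when it exceeds the threshold."""
    # Band-pass "detector" tuned to the sibilance range (5-9 kHz here).
    sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
    detector = sosfilt(sos, vocal)

    # Short-window RMS envelope of the detector signal.
    env = np.sqrt(np.convolve(detector ** 2, np.ones(win) / win, mode="same"))

    # Above the threshold, reduce the overshoot by the ratio.
    gain = np.ones_like(env)
    over = env > threshold
    gain[over] = (threshold + (env[over] - threshold) / ratio) / env[over]
    return vocal * gain
```

A real de-esser adds proper attack/release ballistics, and split-band designs attenuate only the detected band rather than the whole signal - but the detect-then-duck logic is the same.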

To demonstrate how sibilance will be perceived through earbuds, let’s listen to the final frequency response of the track shaped with a curve that emulates earbuds.

Notice how aggressive sibilance becomes without any intervention. Then, notice how a de-esser brings it to a tolerable level.

Watch the video to learn more >

The Lack of Harmonic Distortion

When you think of a clean-sounding vocal, you don’t typically consider harmonic distortion as a way to achieve it.

However, harmonic distortion plays an incredibly important role in our perception of a musical sound.

A vocal will always have overtones - some will be harmonics, related to the fundamental frequency; others won’t relate to the fundamental and will more often sound unmusical because of that missing relationship.

I bring this up since most modern playback systems have low levels of harmonic distortion.

For example, Apple Earbuds have very low distortion levels, which is often portrayed as a good thing.

But there’s a reason some amplifiers are more popular than others - each imparts a unique harmonic profile. The ones with the most musical effect stay popular regardless of their age or whether they’re seen as antiquated.

Even modern studio monitors have higher levels of distortion than most earbuds.

A classic tube amplifier will introduce a second-order harmonic, which will always be in-key with the performance due to its exact doubling of the fundamental.
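Here’s a quick numpy sketch of that behavior - the x² term stands in for a tube’s asymmetric transfer curve, and the 220 Hz fundamental and drive amount are arbitrary demo values:

```python
import numpy as np

sr = 48000
t = np.arange(sr) / sr
f0 = 220.0                                # fundamental (A3)
x = 0.5 * np.sin(2 * np.pi * f0 * t)      # clean input tone

# Asymmetric, tube-style transfer curve: the x**2 term folds the
# fundamental back on itself, creating a second-order harmonic at
# 2*f0 - an exact octave, so it's always in key with the performance.
y = x + 0.3 * x ** 2

# Inspect the spectrum: energy now appears at 2*f0 that wasn't in x.
window = np.hanning(len(y))
spectrum = np.abs(np.fft.rfft(y * window))
freqs = np.fft.rfftfreq(len(y), 1 / sr)
second = spectrum[np.argmin(np.abs(freqs - 2 * f0))]
```

Because the new harmonic sits an octave above the fundamental, it reads as added fullness rather than as distortion.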

As it relates to vocals, these in-key harmonics improve the relationship between musical and unmusical overtones, resulting in a fuller, more complex, and more impressive sound.

If the playback system isn’t causing this, then it’s on us to include the needed harmonics to create a musical sound.

A saturator generates harmonic distortion based on the highest-amplitude signal, or the peak level. Typically this is the fundamental, but to be sure, you can use a multiband saturator and isolate the saturation to the fundamental.

This will avoid intermodulation distortion and improve the relationship between in-key and out-of-key overtones.
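Here’s a simplified sketch of that multiband approach, assuming a 400 Hz crossover as an arbitrary point above a typical vocal fundamental - a real multiband saturator would use phase-compensated (e.g. Linkwitz-Riley) crossovers rather than the plain Butterworth pair below:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def saturate_fundamental(x, sr, crossover=400.0, drive=0.3):
    """Saturate only the band containing the fundamental, then recombine
    with the untouched upper band to limit intermodulation distortion."""
    sos_lo = butter(4, crossover, btype="lowpass", fs=sr, output="sos")
    sos_hi = butter(4, crossover, btype="highpass", fs=sr, output="sos")
    low = sosfilt(sos_lo, x)
    high = sosfilt(sos_hi, x)
    # Even-order saturation applied to the low band only, so the upper
    # partials never intermodulate with each other through the curve.
    low_sat = low + drive * low ** 2
    return low_sat + high
```

Since only the fundamental passes through the nonlinearity, the new harmonics land at exact multiples of it - in key by construction.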

Let’s take a listen to harmonic distortion formed from the fundamental and notice how it doesn’t sound distorted, just fuller and more musical.

Watch the video to learn more >

AirPods don’t have any Air

Most modern playback systems lack air frequencies - or the range above 12kHz.

Whereas the easy-to-hear sibilance range right below the air range is accentuated, this more delicate, difficult-to-distinguish range is relatively quiet.

For the AirPods, this range starts to dip significantly around 11kHz and above.

When the adaptive EQ function is enabled - and there doesn’t seem to be a way to turn it off - this range is dipped by roughly 10dBSPL.

For something like JBL’s portable Flip speaker, the range is a little better, but still drops by a couple dB before cutting out above 17kHz.

Laptop speakers are generally the only modern playback system with an amplified high range.

So, what does this mean for vocals?

Unless you anticipate your track being played primarily on laptop speakers, there’s good reason to amplify the air frequencies.

If the AirPods attenuate the highs by 5dBSPL compared to a linear frequency response - and by 10dBSPL when the adaptive EQ function is enabled - then boosting the vocal’s range above 10kHz is a reasonable decision if you want it to translate well.

However, there’s a problem.

If the vocal - or the overall mix, if that’s what you’re affecting - has significant noise in this region, then boosting these frequencies will inevitably raise the perceived noise floor.

If you plan on amplifying the vocal’s air range to compensate for AirPods and portable speakers, you first need to ensure you have a low-noise recording.

Then, make sure no noise effects are being introduced. Waves Audio is notorious for this, with their odd choice to enable noise by default in many of their plugins.

Additionally, remember that any noise will be significantly amplified during the mastering process. When quieter details of the recording are pushed forward, noise will inevitably be raised as well.

If all else fails, and you’re still noticing a lot of noise and re-recording isn’t an option, RX is an excellent platform for surgically removing any noise.

Once the vocal is ready to have its highs amplified, an exciter is a better option than just an EQ. Fresh Air by Slate Digital is free and combines excitation - newly generated high-frequency harmonics - with shelf filters. The harmonics will lower the perceived level of the noise floor through masking.
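For intuition, here’s a toy exciter in numpy/scipy - this is not Fresh Air’s actual algorithm, and the 10kHz cutoff, drive, and mix values are arbitrary assumptions - that isolates the air band, generates fresh harmonics with a soft clipper, and blends them back in quietly:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def air_exciter(x, sr, cutoff=10000.0, drive=2.0, mix=0.15):
    """Toy exciter: isolate the air band, create new harmonics with a
    soft clipper, and blend them back in at low level."""
    sos = butter(4, cutoff, btype="highpass", fs=sr, output="sos")
    air = sosfilt(sos, x)
    harmonics = np.tanh(drive * air)   # nonlinearity generates new highs
    return x + mix * harmonics
```

Because the added highs are synthesized from the signal rather than boosted out of it, the result extends the air band without raising whatever noise already sits there - which connects to the masking point above.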

It’s a simple suggestion, but it has big implications for how your music is perceived by the majority of your listeners.

So, I’ll emulate how the air frequencies will be affected using an EQ. I’ll assume you’re using a relatively flat monitoring system, but if you’re already listening over earbuds, this demo won’t accurately depict the change.
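If you want to run a similar emulation yourself, a single high-shelf cut is a reasonable first approximation. Below is a standard RBJ Audio EQ Cookbook high-shelf biquad, with an assumed 10dB cut around 11kHz to mirror the AirPods measurement mentioned earlier - the corner frequency and depth are rough stand-ins, not a measured AirPods curve:

```python
import numpy as np
from scipy.signal import lfilter

def high_shelf(x, sr, f0, gain_db, s=1.0):
    """RBJ cookbook high-shelf biquad; negative gain_db cuts the highs."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / sr
    alpha = np.sin(w0) / 2 * np.sqrt((a_lin + 1 / a_lin) * (1 / s - 1) + 2)
    cosw = np.cos(w0)
    root = 2 * np.sqrt(a_lin) * alpha
    b = np.array([
        a_lin * ((a_lin + 1) + (a_lin - 1) * cosw + root),
        -2 * a_lin * ((a_lin - 1) + (a_lin + 1) * cosw),
        a_lin * ((a_lin + 1) + (a_lin - 1) * cosw - root),
    ])
    a = np.array([
        (a_lin + 1) - (a_lin - 1) * cosw + root,
        2 * ((a_lin - 1) - (a_lin + 1) * cosw),
        (a_lin + 1) - (a_lin - 1) * cosw - root,
    ])
    return lfilter(b / a[0], a / a[0], x)

# Hypothetical usage on a mix buffer `mix` at sample rate `sr`:
# emulated = high_shelf(mix, sr, 11000.0, -10.0)
```

Listening through this curve on flat monitors gives a rough preview of how the air band will (or won’t) survive on earbuds.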

Then, I’ll use an exciter to amplify the vocal's air. Notice how the amplification brings the overall frequency response to a more uniform level.

Watch the video to learn more >