The ear doesn’t process sound in a linear way - in other words, what goes in isn’t what reaches the auditory nerve, or what’s ultimately processed and interpreted.
Everything about the ear’s structure alters the input - from the pinna of the outer ear that directs the sound, to the ear canal that resonates at about 3.5kHz, to the tiny bones called ossicles that amplify the sound by roughly 30dB.
To conceptualize this, think of a compressor with a 2:1 ratio - once the input is processed by the compressor, the output no longer matches the input; the transfer is no longer linear, or 1:1. The ear doesn’t behave exactly like a 2:1 compressor, but you get the idea.
One of the most impactful ways it alters the input is through the shape and structure of the Cochlea.
If we examine the cochlea, we’ll notice a spiral-like structure. Inside this spiral is the basilar membrane; embedded in the membrane are roughly 10,000 hair cells, each topped with tiny hairs called stereocilia.
Each of these hair cells can fire up to about 500 times per second, creating a tiny electrical signal with each movement - meaning a single cell can encode frequencies only up to about 500Hz. When a sound is higher than 500Hz, multiple hair cells fire in staggered sequence to encode the information.
The innermost point of the spiral - the cochlea’s apex - encodes low-frequency information. As the spiral unwinds toward its base, progressively higher frequencies are encoded. This is due to the relative stiffness of the basilar membrane: near the cochlea’s center the membrane is loose, and it gradually stiffens as it moves away from the center.
So, that’s a lot of info, but what does this mean for music production? In short, we’re much better at processing low-frequency information than high-frequency information - a concept that’s going to have big implications throughout the rest of the video.
Because a single hair cell can encode frequencies up to about 500Hz, low frequencies take fewer cells to encode accurately. This doesn’t mean they’re louder; it means we can identify pitch differences better at low frequencies than at high ones.
Due to this limitation, we differentiate pitch best below 500Hz and still do it relatively well up to about 4kHz. Above 4kHz, pitch becomes trickier to determine, and the frequency span between what we perceive as octaves grows very large.
For example, in the lows, the difference between the notes A2 and A3 is only 110Hz. But the difference between A6 and A7 is 1760Hz - and the gap between perceived octaves keeps growing the higher the frequencies get.
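If you want to check the octave math yourself, here’s a quick sketch in Python, assuming standard A4 = 440Hz tuning:

```python
# Frequencies double with each octave, so the Hz gap between
# adjacent octaves doubles too. A4 = 440 Hz tuning assumed.
a4 = 440.0
notes = {f"A{n}": a4 * 2 ** (n - 4) for n in range(2, 8)}

print(notes["A3"] - notes["A2"])  # 110.0 Hz between A2 and A3
print(notes["A7"] - notes["A6"])  # 1760.0 Hz between A6 and A7
```

Same one-octave step, sixteen times the distance in Hz.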
This explains why we equalize logarithmically. If you look at an EQ, the Q value of a filter stays the same regardless of the frequency - at least in the plugins I’ve come across - but the bandwidth in Hz changes greatly. In the lows, a 1-octave Q may cover a range of a hundred to a couple hundred hertz. In the highs, that same Q covers a range of a thousand to several thousand hertz.
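Here’s a rough sketch of that relationship, assuming the common definition of an octave-wide band with edges at fc × 2^(±1/2):

```python
def octave_band_hz(fc, octaves=1.0):
    """Hz span of a constant-width (in octaves) filter centered at fc."""
    lo = fc * 2 ** (-octaves / 2)
    hi = fc * 2 ** (octaves / 2)
    return hi - lo

# Same 1-octave width, wildly different spans in Hz:
print(round(octave_band_hz(100)))     # ≈ 71 Hz in the lows
print(round(octave_band_hz(10_000)))  # ≈ 7071 Hz in the highs
```

The ratio between the edges is identical in both cases; only the Hz count changes.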
This also helps us understand the accuracy of an EQ’s filter. For example, say I set a high Q value for a bell filter and attempt to attenuate a resonance.
In the lows, I could center this band on a frequency and attenuate it along with a few Hz in either direction.
In the highs, this same filter would attenuate a larger range, because again, the Q stays the same, but the bandwidth increases the higher the frequency.
This may seem like an issue, but it really isn’t - since we’re less sensitive to the pitch of high frequencies, this inaccuracy in the highs doesn’t present a problem. It makes sense that an EQ would mirror our hearing’s logarithmic structure with its processing.
The implications of logarithmic hearing will become a lot more practical and less conceptual in the next chapter, but for now, let’s observe what we’ve been discussing.
I’ll keep the Q value set to 1 octave and solo the bandwidth. In the lows, the bandwidth will include a smaller range. In the highs, it’ll include a larger range; however, notice how we only perceive one octave in each example.
Again, this is due to lessened pitch sensitivity in the high frequency range due to the cilia’s vibrational limitations, the shape of the cochlea, and the relative stiffness of the basilar membrane.
An FFT processor separates frequencies into evenly spaced bins, each as wide as the sample rate divided by the FFT size. For example, a 65,536-point FFT at a 44.1kHz sample rate gives bins under 1Hz wide - meaning the analyzer can be accurate to within about 1Hz.
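The bin-width math is simple enough to sketch - the 44.1kHz sample rate here is just an assumption for illustration:

```python
def fft_bin_width(sample_rate, fft_size):
    """Frequency resolution (Hz per bin) of an FFT analyzer."""
    return sample_rate / fft_size

print(fft_bin_width(44_100, 1024))    # ≈ 43 Hz per bin
print(fft_bin_width(44_100, 65_536))  # ≈ 0.67 Hz per bin
```

More points means finer resolution, at the cost of a longer analysis window.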
Our ears do the same thing; however, they’re a lot less accurate.
Instead of thousands of bins like an FFT analyzer, the cochlea resolves only about 24 bands.
This means that frequencies which are close together often mask one another - covering each other up - or combine to create audible modulation.
Although this is definitely a limitation, if we understand these bands, we can adjust our mixes, productions, masters, etc, to greatly improve the clarity.
This will be a lot of info, but let’s look at this in depth.
Here’s the list of all measured critical bands - also called barks, after the Bark scale - in our hearing.
For example, bark 4 has a center frequency of 350Hz, with a band extending from 300Hz to 400Hz - a bandwidth of 100Hz. So, if I were to play one tone at, say, 310Hz and another at 330Hz, it would be difficult for us to distinguish between the two.
Now, as I covered in the first section of the video, we’re better at differentiating pitch between low frequencies than high.
As the frequencies become higher, so too does the bandwidth of the critical band.
So, whereas a 20Hz difference is difficult to notice in the lows, a 200 or 300Hz difference is difficult to notice in the highs.
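To put rough numbers on this, here’s Zwicker and Terhardt’s published approximation of critical bandwidth - it won’t match the bark table exactly, but it lands close:

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker & Terhardt's approximation of critical bandwidth (Hz)."""
    f_khz = f_hz / 1000.0
    return 25 + 75 * (1 + 1.4 * f_khz ** 2) ** 0.69

print(round(critical_bandwidth_hz(350)))   # ≈ 109 Hz, near bark 4's 100 Hz
print(round(critical_bandwidth_hz(5000)))  # ≈ 914 Hz in the highs
```

Roughly 100Hz of blur in the lows becomes nearly 1kHz of blur in the highs.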
Just to illustrate this, notice how modulation occurs between two tones that are close in frequency - it should sound like beating or pulsing.
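If you’d like to generate the effect yourself, here’s a minimal sketch using the same 310Hz and 330Hz tones from earlier - the trig identity in the comment is why the pulsing happens at the difference frequency:

```python
import numpy as np

sr = 44_100
t = np.arange(sr) / sr  # one second of samples
mix = np.sin(2 * np.pi * 310 * t) + np.sin(2 * np.pi * 330 * t)

# sin(a) + sin(b) = 2 * cos((a - b)/2) * sin((a + b)/2), so the sum is
# a 320 Hz tone whose amplitude pulses at the 20 Hz difference.
beat_rate_hz = abs(330 - 310)
print(beat_rate_hz)  # 20 pulses per second
```

Write `mix` to a WAV file and you’ll hear the 20Hz beating directly.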
First, we can ensure that the kick and bass are separated by at least an octave. If the kick’s fundamental and the bass’s fundamental overlap too heavily, differentiating them becomes a lot more difficult: the fundamentals will be too close, causing masking and unwanted modulation, and their overtones or harmonics will be too close as well, resulting in the same problem.
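A quick way to sanity-check the spacing - the fundamentals here (55Hz and 65Hz) are just hypothetical values for illustration:

```python
import math

def octaves_apart(f1, f2):
    """Distance between two fundamentals, measured in octaves."""
    return abs(math.log2(f2 / f1))

# Hypothetical fundamentals: kick at 55 Hz, bass an octave up at 110 Hz.
print(octaves_apart(55, 110))  # 1.0 -> a full octave of separation
print(octaves_apart(55, 65))   # ≈ 0.24 -> crowded, likely to mask
```

Anything well under an octave apart in the lows is a candidate for retuning or filtering.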
The same goes for instruments with fundamentals that are higher in frequency, like a vocal and a guitar or a synth.
Next, we can be more intentional with our EQ filters.
Fortunately, EQs spread out the frequencies logarithmically, as we covered earlier. However, if we keep the center frequency of the band as close as possible to the center frequency of the critical band, while controlling the bandwidth, we can reduce masking.
This is a complex idea to explain, so let’s look at an example - and if at any point you say, “forget it, I’ll just use my ears,” I totally understand.
Say I have the note A5 in my mix, or 880Hz, and I want to amplify it. Now, I know there’s a critical band that’s centered at 840Hz, and it has a bandwidth of 150Hz.
So, if I use a moderately narrow filter on 880Hz, ideally a 1/3rd octave band, I’ll keep the amplification primarily within the critical band. More importantly, I’ll amplify the note A5 more than any other frequency within the critical band.
As a result, the desired note is amplified, and the frequencies that could mask it or cause unwanted modulation are lower in amplitude relative to it.
If I were to use a larger bandwidth, the frequencies that could mask A5 would be amplified by the same amount as A5 and would subsequently mask it. Again, two tones within a single critical band will interfere with one another due to our hearing’s limitations.
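As a rough sanity check, here are the edges of a 1/3rd-octave band on A5, assuming the standard fc × 2^(±1/6) definition:

```python
# A 1/3-octave filter centered on A5 (880 Hz): edges at fc * 2^(±1/6).
fc = 880.0
lo = fc * 2 ** (-1 / 6)
hi = fc * 2 ** (1 / 6)
print(round(lo), round(hi))  # ≈ 784 to 988 Hz
```

That’s a bit wider than the 150Hz critical band centered at 840Hz, but it’s in the right neighborhood - which is about as close as standard filter widths get.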
You might be wondering, why not just use a super-narrow filter set on A5?
If I were to use a very narrow filter, the phase rotation needed to create it would amplify the areas around the center frequency. By amplifying frequencies close to A5, we could create a beating effect - the caveat being that linear-phase processing avoids this phase rotation.
Just to show I’m not being overly particular or making things up: researchers have graphed the relationship between critical bands and 1/3rd-octave filters, and although the match isn’t perfect, 1/3rd-octave filters align with critical bands about as closely as standard filter widths can.
All in all, this isn’t something you need to be overly concerned about on a regular basis.
Odds are your productions will sound great if you make the best sounding choices; but if you’re attempting to minimize masking and you have a complex production, try equalizing with critical bands in mind while using 1/3rd octave filters.
Let’s take a quick listen to a complex mix. I’ll use these filters within critical bands and amplify in-key aspects of the mix.
I saved this section for last since I think it’s the best-known concept - I’m sure most of you have heard of the Fletcher-Munson curves, also known as equal-loudness contours.
By this point, you’re familiar with roughly 2-5kHz being the range to which we’re most sensitive.
It’s also the area that contains the 3rd vocal formant, which carries a strong combination of vowel and consonant information - making this a great range for speech and vocal intelligibility.
If you combine the ideas from the previous chapter with this one, then critical bands 14-19 are the ones to emphasize if you need additional clarity.
I’d recommend finding a couple of notes that are in-key with the song that fall within this range, and then amplifying a band centered on those frequencies.
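As a hypothetical example, say the song is in A: here’s how you might find which in-key pitches land in that 2-5kHz range (A4 = 440Hz tuning assumed):

```python
# Hypothetical example: the song is in A, so A and E (a fifth up)
# are safe in-key pitches. A4 = 440 Hz tuning assumed.
a4 = 440.0
pitches = {
    "A6": a4 * 2 ** 2,             # 1760 Hz - just below the range
    "E7": a4 * 2 ** (2 + 7 / 12),  # ≈ 2637 Hz
    "A7": a4 * 2 ** 3,             # 3520 Hz
    "E8": a4 * 2 ** (3 + 7 / 12),  # ≈ 5274 Hz - just above the range
}
# Keep only the pitches inside the sensitive 2-5 kHz range.
in_range = {name: round(f) for name, f in pitches.items() if 2000 <= f <= 5000}
print(in_range)  # {'E7': 2637, 'A7': 3520}
```

Those surviving frequencies are the candidates for centering a boost.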
You’ll add clarity, reduce masking, and emphasize in-key elements.
The ear canal acts like a resonator for this frequency range - mainly around 3.5kHz - due to the canal’s length, which makes it behave like a quarter-wavelength resonator.
Since 3.5kHz is right on the boundary of our ability to determine pitch accurately, given the limitations of the hair cells, it’s a great area to emphasize in-key elements.
It’s right where pitch becomes hard to determine, so giving the listener some extra help by boosting in-key aspects of an instrument, bus, or mix will help them identify the pitch more easily - in turn making the recording sound more musical.
If you stuck around this long, thank you for watching - before we go, let’s listen to in-key elements amplified within critical bands 14-19.
Notice that we increase clarity and make the mix sound more musical.