Audio Mixing to Improve the Clarity of Speech

This article was first published in the June, 2010, issue of
Larry’s Monthly Final Cut Studio Newsletter.


One of the signs of getting older is that our hearing is not as sharp as it once was. (Yes, that includes me, too, and I’m still frustrated about it.) So one of the things I do in my mixes is to be sure that I make things as clear and easy to understand as possible.

But first, a bit of audio theory.

VOLUME AND FREQUENCY

We often assume that loudness is what makes a sound understandable. But that is only part of the solution.

All sound is composed of two elements: volume and frequency. Volume determines how loud a sound is; frequency determines its pitch.

NOTE: Volume is also described as “levels” and “gain.” Your choice.

When someone speaks, their voice contains lots of different volumes and frequencies. You can easily prove this to yourself by trying to make words using just a tone generator, like the one in Bars and Tone. You will get a series of beeps, but no words.
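To make the volume and frequency idea concrete, here is a minimal sketch of a tone generator in Python. The function name, the sample rate, and the two example tones are assumptions made up for illustration; this is not how Bars and Tone is implemented.

```python
import math

SAMPLE_RATE = 48_000  # samples per second (assumed for this sketch)

def tone(frequency_hz, amplitude, seconds, sample_rate=SAMPLE_RATE):
    """Return the samples of a steady sine tone.

    amplitude sets the volume (0.0 to 1.0); frequency_hz sets the pitch.
    """
    count = int(seconds * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(count)
    ]

# Two beeps: a quiet, low one and a loud, high one.
quiet_low = tone(frequency_hz=170, amplitude=0.25, seconds=0.5)
loud_high = tone(frequency_hz=3500, amplitude=0.9, seconds=0.5)
```

Each call produces exactly one volume and one pitch, which is why a tone generator on its own can only beep. A voice mixes many such components, and they change from one syllable to the next.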

Human speech is “bursty.” This means that each syllable is a short burst of sound. These are the puffs we see when looking at waveforms in Final Cut’s Viewer or Timeline. Singing is more “steady-state.”


Here, for example, is a waveform of me saying: “to learn more about our weekly podcasts, visit DigitalProductionBuzz dot com.”

Each syllable is its own puff. The two puffs at the end are “dot com”.

But looking at a waveform only shows the volume of the sound. We need to look at a frequency chart to see the distribution of frequencies.
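As a rough illustration of what a frequency chart computes, the sketch below runs a naive discrete Fourier transform over a made-up test signal. The sample rate, the signal, and the function name are all assumptions for illustration; a real analyzer uses a much faster transform and repeats it over short slices of time.

```python
import cmath
import math

SAMPLE_RATE = 8_000   # assumed; kept low so the naive transform stays fast
N = 800               # 0.1 s of audio, so each frequency bin is 10 Hz wide

# A made-up signal: a loud 200-cycle component plus a quieter 3,000-cycle one.
signal = [
    0.8 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.2 * math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE)
    for n in range(N)
]

def spectrum(samples, sample_rate):
    """Naive DFT: return (frequency_hz, amplitude) pairs up to half the sample rate."""
    n_samples = len(samples)
    result = []
    for k in range(n_samples // 2):
        total = sum(
            s * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, s in enumerate(samples)
        )
        result.append((k * sample_rate / n_samples, 2 * abs(total) / n_samples))
    return result

bins = spectrum(signal, SAMPLE_RATE)
strongest = sorted(bins, key=lambda pair: pair[1], reverse=True)[:2]
for freq, amp in strongest:
    print(f"{freq:.0f} Hz  amplitude {amp:.2f}")
```

The waveform of this signal would show only one combined level. The transform separates it into its two components, at 200 and 3,000 cycles, each with its own level.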


Happily, Soundtrack Pro provides this as part of an Audio File project. Here is the spectrum analysis for the words “dot com”. “Dot” is on the left and “com” is on the right.

This chart represents the frequency range and characteristics of human hearing.

  • Blue represents the frequencies at which there is no sound at that point in time
  • Pale blue represents frequencies with some level of sound
  • Green represents frequencies with a moderate level of sound
  • Dark green represents frequencies with a significant level of sound
  • Yellow represents frequencies with a high level of sound

Notice how the bulk of the sound is in the lower frequencies — less than 500 cycles? Well, this is partly due to the fact that I’m a guy. Girls have somewhat higher voices, but a surprising amount of audio is in low frequencies for both guys and girls.

Notice, also, that I have almost no frequencies above about 5 kHz. Girls would max out around 8 kHz. Remember that point; we’ll come back to it.

FREQUENCY MEANS DIFFERENT THINGS


Human hearing ranges from 20 cycles (so deep as to feel more like a vibration than a pitch) to 20,000 cycles (so high as to feel more like the wind than a tone), assuming we are all about 18 years old with average hearing for that age. As we get older, our hearing declines, which I have already grumped about.

NOTE: Two other thoughts about getting older. First, high-frequency hearing is the first to go. Second, guys start to lose higher frequencies before girls do.

Human speech, however, is more restricted in frequencies. Speech ranges from roughly 150 cycles to 6,000 cycles for men, and 350 cycles to 8,000 cycles for women. Kids are slightly higher yet. And, there is plenty of individual variation.

The lower frequencies are what add richness, tone, and “sexiness” to a voice. (This is why many radio DJs try to pitch their voice as deep as they can.)

Vowels live in the low frequencies.

Consonants, however, live in the high frequencies. And it is consonants that provide diction to speech.

NOTE: Someone who mumbles has almost no high frequencies in their voice, which is why they are so hard to understand. It isn’t just volume; it’s pitch.

I read somewhere that the difference between the letter “F” and “S” is 6,100 cycles for a guy and 8,000 cycles for a girl. In other words, clarity is in the high frequencies.

This last statement gives us guidance on how to approach mixing projects for, ah, older folks. We need to boost the higher frequencies to improve intelligibility.

ADDING EQ TO THE MIX

One of the reasons I like mixing in Soundtrack Pro is the high-quality and precise filters that it contains, coupled with a sophisticated interface. Final Cut’s filters can’t begin to compete.

While it is true that every voice is different and that you should never use the same preset for everyone, I’m going to give you a couple of settings you can use as starting points to improve your own mixes.


In Soundtrack, unlike Final Cut, filters are applied to the track, not the clip. So, select a track by clicking on it. Here, I clicked the track Larry Intro.


Then, click the Effects tab to select it; the default location is in the Left Pane.

The left side of the window shows filter categories. Click EQ.

The right side of the window shows filters. Either double-click the filter named Fat EQ or highlight it and click the Plus button.


The filter interface that appears has been known to scare small children. However, it isn’t as bad as all that.

As I said, human hearing extends from 20 cycles, on the left, to 20,000 cycles, on the right. The Fat EQ filter separates this into five bands, going from left to right:

  • Frequencies below human speech
  • Low-frequency human speech
  • Mid-range human speech
  • High-frequency human speech
  • Frequencies above human speech

While this filter has a LOT of flexibility, I want to concentrate on two things:

  • Making a voice sound warmer
  • Making a voice easier to understand

The warmth of a voice is in the lower frequencies — Band 2. As we increase this setting, we make a voice more inviting. As we decrease this setting, we make a voice more sterile.


The Fat EQ filter allows us to increase or decrease specific ranges of frequencies. This allows us to change portions of the sound without changing all of it. This is very similar to color correction, where we can change the color of the shadows while not changing the color of highlights.

To vary the amount of the change, drag the circular wheel up or down.

NOTE: When we change frequencies, we want to make small changes. We are not creating mountains here, we are creating molehills!


To vary the frequencies we are changing, grab the frequency number and drag it up or down. (You can also double-click it to enter a specific number.)

Notice that we are not changing a specific frequency. We are changing a range of frequencies around a central point. When we work with audio, we are always working with ranges, not specific frequencies.
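To show what “a range of frequencies around a central point” looks like in numbers, here is a sketch of a standard peaking filter, using the widely published Audio EQ Cookbook formulas. This is an assumption for illustration only: I don’t know which algorithm Fat EQ actually uses, and the function name, the Q (width) value, and the sample rate are made up.

```python
import cmath
import math

def peaking_eq_gain_db(freq_hz, center_hz, boost_db, q=1.0, sample_rate=48_000):
    """Gain in dB that a peaking filter centered on center_hz applies at freq_hz."""
    amp = 10 ** (boost_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)   # q controls how wide the affected range is

    # Filter coefficients (numerator b, denominator a).
    b = (1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp)

    # Evaluate the filter's response at the frequency of interest.
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z1 + b[2] * z1 * z1
    denominator = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(numerator / denominator))

for f in (100, 1000, 2500, 3500, 5000, 10000):
    gain = peaking_eq_gain_db(f, center_hz=3500, boost_db=4.5)
    print(f"{f:>6} Hz: {gain:+.2f} dB")
```

The full boost lands only at the center frequency. Neighboring frequencies get a smaller share of it, and frequencies far away are left almost untouched.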

Now that you know how to make changes to the filter, let’s look at some specific settings that we can use to improve our sound.

GENERAL SETTINGS FOR A MALE VOICE

NOTE: Before the emails start flying, allow me to state, again, that every voice is different. Use these as guides, then adjust until the voice sounds good to your ears.


I recently changed my opinions on where to set my presets and I’m continuing to refine them. However, to warm up the low end of a voice, I’ll add +3 dB of gain around 170 cycles. Then, to improve clarity, I’ll add +4.5 dB of gain around 3,500 cycles.

Finally, I’ll drop the Master Gain by 1 dB to help prevent distortion caused by boosting frequencies.
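To see why a small Master Gain cut helps, it can be useful to convert decibel values into plain multipliers. The sketch below uses the standard amplitude relation, gain = 10^(dB/20). The function name and the 0.65 peak level are made up for illustration.

```python
def db_to_gain(db):
    """Convert a decibel change to an amplitude multiplier (10 ** (dB / 20))."""
    return 10 ** (db / 20)

warmth = db_to_gain(3.0)    # roughly 1.41x around 170 cycles
clarity = db_to_gain(4.5)   # roughly 1.68x around 3,500 cycles
master = db_to_gain(-1.0)   # roughly 0.89x across everything

# Hypothetical worst case: a peak at 65% of full scale that sits entirely
# inside the boosted high band.
peak = 0.65
boosted = peak * clarity            # exceeds 1.0, so it would clip
with_master_cut = boosted * master  # falls back under 1.0

print(f"boosted peak: {boosted:.2f}, after master cut: {with_master_cut:.2f}")
```

A 1 dB cut won’t rescue every peak, which is another reason to keep the boosts small.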

GENERAL SETTINGS FOR A FEMALE VOICE

I do much less work with female voices, but when I do, here is where I start.


To warm up the low end of a voice, I’ll add +3 dB of gain around 390 cycles. Then, to improve clarity, I’ll add +4.5 dB of gain around 5,500 cycles.

Finally, I’ll drop the Master Gain by 1 dB to help prevent distortion caused by boosting frequencies.

SUMMARY

By boosting the high frequencies a bit, I add sparkle and clarity to the voice to help make sure that what my actors are saying is intelligible to the audience. Even us older folks…!


BIG NOTE: If you, like me, use the Limiter filter to help even out levels, be sure to apply the Limiter filter LAST, so it is at the bottom of the filter stack. Otherwise, the EQ filter is likely to distort your audio.
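The effect of filter order can be sketched with a deliberately crude model. Here the EQ boost is reduced to a flat gain and the limiter to a hard clamp, so neither function resembles the real Soundtrack Pro filters, and the sample values and the 0.9 ceiling are made up.

```python
def boost(samples, gain):
    """Stand-in for an EQ boost: multiply every sample by a flat gain."""
    return [s * gain for s in samples]

def limit(samples, ceiling=0.9):
    """Stand-in for a limiter: clamp every sample to +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

samples = [0.1, -0.5, 0.85, -0.95, 0.3]
eq_gain = 10 ** (4.5 / 20)   # a +4.5 dB boost as a multiplier

limiter_last = limit(boost(samples, eq_gain))    # EQ first, limiter at the bottom
limiter_first = boost(limit(samples), eq_gain)   # limiter first, EQ after it

print(f"limiter last:  peak {peak(limiter_last):.2f}")
print(f"limiter first: peak {peak(limiter_first):.2f}")
```

With the limiter last, nothing leaves the stack above the ceiling. With the limiter first, the boost is applied to audio that was already pushed up against the ceiling, so the peaks end up over full scale.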