How to Get More Apparent Voice Audio Power with Less Amplifier Power!

UPDATED slightly 1/3/2011.

Much of the audio spectral content of the human voice is in frequencies below 600 Hz, but most of the intelligibility is above 800 Hz.

This reminded me of something I read in a ham radio book 25-30 years ago. According to this book, much of the spectral content of the human voice is usually in two distinct ranges known as "formants". The first formant is generally around 500-800 Hz, and the second formant generally starts at 1200-1500 Hz and extends up to 2500 Hz. A bandwidth compression trick in some ham equipment involved eliminating the frequencies between the formants and shifting the second formant downward (via frequency conversion). In the receiver, this is reversed.

UPDATE 7/6/2008: The Wikipedia article on "formants" says something that I consider "fairly similar". However, it notes that some vowel sounds have second formant content as low as 800 Hz and that the first formant can be as high as 1000 Hz, and it gives importance to the first formant.

UPDATE 1/3/2011: My experience is that the 2nd formant and higher frequencies are sufficient to carry intelligibility of a human voice.

In my own experience, frequencies below about 600 Hz have had low importance for the intelligibility of the human voice. I thought for many years that one could filter out the lower frequencies and not lose much except maybe some pleasing tonal effects.

But recently (as of maybe 2002 or so) I did a spectral analysis of a brief digital test recording of a male voice, and I was amazed by what I saw. Those formants were distinctly visible. There was a lot below 700 Hz, a lesser amount between 700-1400 Hz, a bit more between 1400-3000 Hz than between 700-1400 Hz, and much less above 3000 Hz.

With the range below 700 Hz not being very important for intelligibility, it looks like everything below 700 Hz can be filtered out with only small damage to recognition and intelligibility while removing a lot of audio power.

Of course, this spectral analysis was of a single sample of a single male voice. Other voices will probably differ in the frequency range of the second formant and in how fully the second formant carries the essential intelligibility of the voice.
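For anyone who wants to repeat this kind of band-by-band comparison on other voices, here is a minimal sketch in Python. It is not the analysis I ran, just one way to do it. The function name, the band edges, and the four test tones are all assumptions chosen for illustration; the tones are a made-up stand-in for a recording, picked only to roughly mimic the pattern described above. It uses a plain, slow DFT so that it needs nothing beyond the standard library.

```python
import cmath
import math

def band_power_fractions(samples, sample_rate, band_edges_hz):
    """Return the fraction of total (non-DC) signal power in each band.

    Plain O(N^2) DFT: fine for a few hundred samples, far too slow for
    long recordings, where an FFT library would be the usual choice.
    """
    n = len(samples)
    bin_hz = sample_rate / n
    band_power = [0.0] * (len(band_edges_hz) - 1)
    total = 0.0
    for k in range(1, n // 2):  # skip DC, stop just below Nyquist
        acc = 0j
        for i, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        power = abs(acc) ** 2
        total += power
        freq = k * bin_hz
        for b in range(len(band_power)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                band_power[b] += power
                break
    return [p / total for p in band_power]

# Made-up test signal (NOT measured voice data): four tones whose
# relative levels loosely imitate the shape described in the text.
SAMPLE_RATE = 8000
N = 400  # gives 20 Hz per DFT bin, so every tone falls exactly on a bin
TONES = [(200, 1.0), (1000, 0.3), (2000, 0.4), (3500, 0.1)]  # (Hz, amplitude)
signal = [
    sum(amp * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f, amp in TONES)
    for i in range(N)
]

EDGES = [0, 700, 1400, 3000, 4000]  # band edges in Hz, taken from the text
fractions = band_power_fractions(signal, SAMPLE_RATE, EDGES)
for lo, hi, frac in zip(EDGES, EDGES[1:], fractions):
    print(f"{lo:4d}-{hi:4d} Hz: {100 * frac:5.1f}% of power")
```

To try it on a real voice, replace the synthetic signal with samples read from a short recording. A real recording will smear energy across neighboring bins, so the band totals are only approximate.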

UPDATE - I tried some more of this. I have seen the second formant go as low as 750 Hz with significant content. I did analyze a few different male singers, including ones singing in baritone range.

Note that there are audio spectral components of consonant sounds that are significant at frequencies higher than the second formant.

UPDATE 7/6/2008 - I have noticed sometimes a "third formant" that appears to me to have some importance for sensation of accent. However, when it exists, it is at frequencies generally 2000-3600 Hz, still low enough to be included when frequency range allows good consonant intelligibility.

So what does this mean?

You may have noticed how many voice public address loudspeakers and bullhorns sound bad and muddy or muddy-squawky. The really bad ones are usually reentrant horns, which are a kind of folded horn. A folded horn is normally bad at reproducing high frequencies. However, a non-folded horn that will reproduce the low frequencies of the first formant while using "usual horn drivers" is normally impractically long and large. Reentrant horns also often have skimpy mouth area for their horn cutoff frequencies, while non-folded 1-inch and 2-inch horns normally have adequate mouth area for their often-higher horn cutoff frequencies. Inadequate mouth area results in resonant effects in the air column in the horn.

But if you only use the frequencies above 800 Hz (which will include enough of the second formant), you don't need a folded horn. Non-folded "1-inch" and "2-inch" (throat diameter) horns and their associated mid-high frequency drivers work just fine at these frequencies. They are as efficient as practical horn loudspeakers get, and they have a much clearer sound than reentrant horns such as bullhorns. And if frequencies below 800 Hz are filtered out, the amplifier power requirement is a lot less - maybe up to 70 percent less.
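To put a rough number on that power saving, here is a small arithmetic sketch. It assumes an ideal filter that removes everything below the cutoff and nothing above it, and it assumes the amplifier power requirement scales with average signal power, which ignores peaks. The 70 percent figure is the "maybe up to" estimate above, not a measurement, and the 100 watt starting point is purely hypothetical.

```python
import math

def highpass_power_savings(fraction_below_cutoff):
    """Return (remaining power fraction, reduction in dB) for an ideal high-pass."""
    if not 0.0 <= fraction_below_cutoff < 1.0:
        raise ValueError("fraction_below_cutoff must be in [0, 1)")
    remaining = 1.0 - fraction_below_cutoff
    reduction_db = -10.0 * math.log10(remaining)
    return remaining, reduction_db

# 0.70 is the article's rough upper estimate; 100 W is a hypothetical example.
remaining, reduction_db = highpass_power_savings(0.70)
unfiltered_watts = 100.0
print(f"Power needed after filtering: {unfiltered_watts * remaining:.0f} W "
      f"({reduction_db:.1f} dB less)")
```

In other words, if 70 percent of the voice power really is below the cutoff, the amplifier only has to deliver the remaining 30 percent for the same level in the frequencies that are kept.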

Although you will lose bits of the second formant by eliminating frequencies below 800 Hz, you will still improve things by using non-reentrant horns.

Note also that the usual 2-inch and many 1-inch non-reentrant horns have better defined directional characteristics than is usual for reentrant horns. This is mostly because the reentrant horns work at lower frequencies than the non-reentrant horns.

I still have more testing to do, but it really looks as if voice-only PA systems will be more effective and less expensive if they are designed with elimination of frequencies below 600-800 Hz in mind.
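For experimenting with this on recorded or digitized audio, one possible approach is a digital high-pass filter. The sketch below is a second-order (12 dB/octave) Butterworth-style biquad using the widely published "Audio EQ Cookbook" coefficient formulas. It is not how any particular PA system does it, and the 8000 Hz sample rate, the 800 Hz cutoff, the test tone frequencies, and all function names are assumptions for illustration.

```python
import math

def highpass_biquad_coeffs(cutoff_hz, sample_rate, q=1 / math.sqrt(2)):
    """Second-order high-pass coefficients (Audio EQ Cookbook form), a0-normalized."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / (2.0 * a0), -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def apply_biquad(samples, b, a):
    """Run the samples through the filter (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone_gain(freq_hz, cutoff_hz=800.0, sample_rate=8000, n=1600, settle=400):
    """Output/input RMS ratio for a steady sine tone, after the filter settles."""
    tone = [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    b, a = highpass_biquad_coeffs(cutoff_hz, sample_rate)
    filtered = apply_biquad(tone, b, a)
    return rms(filtered[settle:]) / rms(tone[settle:])

# Compare a tone well below the cutoff with one well above it.
for freq in (200.0, 2000.0):
    gain = tone_gain(freq)
    print(f"{freq:6.0f} Hz: gain {gain:.3f} ({20.0 * math.log10(gain):+.1f} dB)")
```

A single section like this gives a 12 dB/octave slope. Steeper slopes need a higher-order design, for example by cascading sections with suitably chosen Q values.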

NOTE - many 1-inch and 2-inch drivers made for mid-tweeter use are very intolerant of frequencies below about 500-800 Hz. Ones with metallic surrounds normally require crossovers or highpass filters of at least 12 dB/octave. You may need 18 dB/octave filtering at 800 Hz to get full power handling. Most non-folded 2-inch horns require you to exclude frequencies below 500 Hz, and many 1-inch non-folded horns require you to exclude frequencies below 800 Hz, with at least a 12 dB/octave rolloff.

The drivers for these horns can have excessive diaphragm movement at frequencies below the horn cutoff frequency even if they have nonmetallic surrounds.
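To get a feel for what those slopes mean below the cutoff, here is a short sketch that computes the attenuation of an ideal high-pass filter at a few frequencies. It assumes a Butterworth response, which is only one common filter alignment (the slopes above do not say which type is used), with order 2 standing in for 12 dB/octave and order 3 for 18 dB/octave. The function name and the frequencies printed are illustrative choices.

```python
import math

def butterworth_highpass_attenuation_db(freq_hz, cutoff_hz, order):
    """Attenuation in dB of an ideal analog Butterworth high-pass at freq_hz."""
    ratio = cutoff_hz / freq_hz
    return 10.0 * math.log10(1.0 + ratio ** (2 * order))

CUTOFF_HZ = 800.0
for order, slope in ((2, "12 dB/octave"), (3, "18 dB/octave")):
    for freq in (800.0, 400.0, 200.0):  # cutoff, one octave below, two octaves below
        att = butterworth_highpass_attenuation_db(freq, CUTOFF_HZ, order)
        print(f"{slope}: {freq:5.0f} Hz is down {att:5.1f} dB")
```

Note that either filter is only about 3 dB down at the cutoff frequency itself. The difference between the two slopes shows up an octave or more below the cutoff.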

NOTE: Amplifier clipping sometimes results in DC surges from the amplifier and sometimes in significant low frequency spectral content, so significant amplifier clipping should be avoided.

Other possible applications:

Single sideband suppressed carrier radio transmissions don't require as much bandwidth if frequencies below 800 Hz are eliminated. Eliminating 800 Hz of bandwidth usage on the carrier side of the band will let you include more of the higher voice frequencies, where the intelligibility and consonant sounds are. Or, one may try such filtering just to reduce the bandwidth used by SSB transmissions.
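The two SSB options above come down to simple arithmetic, sketched below. The 300-2700 Hz starting passband is only an assumed example of a voice filter, not a figure from any specific radio, and the function name is made up for illustration.

```python
def shifted_passband(low_hz, high_hz, new_low_hz):
    """Keep the same total bandwidth but start the passband at new_low_hz."""
    width = high_hz - low_hz
    return new_low_hz, new_low_hz + width

# Assumed example passband (not from any particular transmitter).
OLD_LOW_HZ, OLD_HIGH_HZ = 300, 2700
NEW_LOW_HZ = 800

# Option 1: same bandwidth, moved up to cover more of the high frequencies.
low, high = shifted_passband(OLD_LOW_HZ, OLD_HIGH_HZ, NEW_LOW_HZ)
print(f"Same {OLD_HIGH_HZ - OLD_LOW_HZ} Hz bandwidth, now covering {low}-{high} Hz")

# Option 2: keep the old upper limit and simply use less bandwidth.
trimmed_width = OLD_HIGH_HZ - NEW_LOW_HZ
print(f"Or keep the {OLD_HIGH_HZ} Hz upper limit and use only {trimmed_width} Hz")
```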

Parabolic microphones and other highly directional microphones may be filtered to cut out frequencies below 800 Hz or so, since they are less directional at these lower frequencies.


Written by Don Klipstein.

Please read my Copyright and authorship info.
Please read my Disclaimer.