Beyond Sound: Elevating Speech Intelligibility In Commercial AV

September 27, 2023 | Sam Scott


Equipment manufacturers aren't telling you the whole story. They make it seem so easy.

“Put our sound bar on the wall for an all-in-one 20-person conference room!”

But if it looks too simple to be true, it probably is.

When manufacturers of conference room AV equipment share product information, they typically say that their speaker is optimized for listening from a certain distance. They will state, for example, that listeners sitting 15 feet away will be able to hear clearly.

These claims misrepresent the user experience because they oversimplify a complex problem – the need for people not just to hear sounds, but to cognitively perceive and recognize the words being spoken.

When collaborating, it's not enough to simply hear. Both parties must understand.

Hearing a sound and understanding what's being communicated are two fundamentally different things. In this article, I will explore that difference in depth.

Speech Intelligibility: Beyond Hearing, Understanding Clearly

Speech intelligibility is the capacity for spoken language to be interpreted accurately by listeners. It extends beyond the mere audibility of words and considers the recipients’ ability to grasp the meaning conveyed within a verbal message.

As a cornerstone of clear communication, speech intelligibility fosters the effective exchange of information, ideas and emotions, all of which can be lost in translation when sounds are heard but intelligibility is lacking.

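Measuring Speech Intelligibility With The Speech Transmission Index (STI)

Speech intelligibility can be measured using the “STI,” which stands for Speech Transmission Index. (It is sometimes loosely called a speech intelligibility index, but the Speech Intelligibility Index, or SII, is a separate measure.) STI estimates how well a sound system and its room preserve the features of speech that listeners rely on to understand words, using a synthesized, speech-like test signal rather than a live talker.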

To measure STI, an audio test signal is played through the sound system under evaluation. This signal is re-recorded with a room microphone and compared against the original playback signal. The result is an STI value between 0 and 1, indicating how much speech was considered to be intelligible (more on this later). A value of 0 means nothing was understood, and a value of 1 means the speech was perfectly clear and understandable.

Here's a breakdown of the STI values and their quality ratings according to IEC 60268-16, the international standard for rating speech intelligibility via STI:

STI Value   | Quality Rating
0 - 0.3     | Bad
0.3 - 0.45  | Poor
0.45 - 0.6  | Fair
0.6 - 0.75  | Good
0.75 - 1    | Excellent
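
If you log STI readings during commissioning, the bands above are easy to apply in a short script. The sketch below is one way to do it in Python. The function name `sti_rating` is my own, and because the published bands share their edge values (0.3 appears in two rows), I have assumed that a reading exactly on a boundary falls into the higher band.

```python
def sti_rating(sti: float) -> str:
    """Map an STI value (0 to 1) to its quality rating band.

    Assumption: a value exactly on a boundary (e.g. 0.3) is placed
    in the higher band, since the table's ranges share their edges.
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    if sti < 0.3:
        return "Bad"
    if sti < 0.45:
        return "Poor"
    if sti < 0.6:
        return "Fair"
    if sti < 0.75:
        return "Good"
    return "Excellent"


# Hypothetical readings taken at three seats in a meeting room.
for reading in (0.42, 0.58, 0.71):
    print(f"STI {reading:.2f} -> {sti_rating(reading)}")
```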


Understanding The STI Measurement

Now, how might a piece of audio testing equipment determine whether a sound recording should be considered intelligible to human ears?

Acoustical speech analysis has revealed two critical features of human speech:

  1. It falls within the frequency range of approximately 100 Hz to 10 kHz.

  2. It modulates slowly, typically between 0.63 and 12.5 Hz.

These modulations play a pivotal role in transmitting intelligible language. When the modulations are lost — even partially — speech intelligibility suffers.

As mentioned previously, measuring STI involves comparing versions of an audio test signal from before and after it is played through a sound system under evaluation. An STI-CIS Speech Intelligibility Meter will assess how much loss of these specific modulations the re-recorded signal has incurred and assign an STI value accordingly.
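
To make the idea concrete, here is a heavily simplified Python sketch of the core calculation. For each measurement, the meter works out a modulation transfer ratio `m` (1 means the modulation survived intact, 0 means it was lost), converts it to an apparent signal-to-noise ratio limited to the range -15 dB to +15 dB, and scales that to a transmission index between 0 and 1. The function names are my own, and the plain average at the end is an assumption made for brevity: the full method in IEC 60268-16 also applies octave-band weighting and corrections for masking and hearing threshold, which are left out here.

```python
import math


def transmission_index(m: float) -> float:
    """Convert one modulation transfer ratio m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1 - 1e-6)          # keep the logarithm finite
    snr_db = 10 * math.log10(m / (1 - m))    # apparent signal-to-noise ratio
    snr_db = min(max(snr_db, -15.0), 15.0)   # limit to the -15..+15 dB range
    return (snr_db + 15.0) / 30.0


def simplified_sti(m_values) -> float:
    """Unweighted mean of the transmission indices (a simplification, see above)."""
    if not m_values:
        raise ValueError("at least one modulation ratio is required")
    indices = [transmission_index(m) for m in m_values]
    return sum(indices) / len(indices)


# Hypothetical modulation ratios for a well-treated room and a reverberant one.
print(f"treated room:     {simplified_sti([0.95, 0.90, 0.85]):.2f}")
print(f"reverberant room: {simplified_sti([0.60, 0.45, 0.30]):.2f}")
```

Note how the result is driven entirely by how much modulation survives, which is why noise and reverb (both of which flatten those modulations) pull the score down.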


For more detailed technical information on how audiovisual consultants and technicians fine-tune AV systems to deliver exceptional speech clarity, see this article from NTI Audio on STI and STIPA measurement processes.


STIPA: Ensuring Clear Communication In Public Spaces

In environments like train stations, airports, stadiums, and universities, the intelligibility of emergency announcements is paramount. Individuals must be able to comprehend the critical information broadcast over public address (PA) systems to ensure their safety.

Conducting a Full STI analysis for such large environments can be incredibly time-intensive, requiring various test signals and hundreds of 15-minute measurements to be taken throughout the facility. So, a simplified method of STI measurement was developed specifically for PA systems: the Speech Transmission Index for Public Address (STIPA) method.

STIPA requires just one test signal of modulated pink noise. Each measurement can be taken in 15 seconds, and STIPA’s results are comparable to Full STI measurement in public environments.
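
The time saving is easy to put in numbers. The short Python calculation below compares the two methods using the per-measurement times quoted above. The figure of 200 measurement positions is a hypothetical example, and the estimate counts measurement time only, ignoring setup and walking between positions.

```python
def survey_hours(positions: int, seconds_per_measurement: float) -> float:
    """Total measurement time in hours for a survey (measurement time only)."""
    return positions * seconds_per_measurement / 3600


POSITIONS = 200  # hypothetical number of measurement positions in a large venue

full_sti_hours = survey_hours(POSITIONS, 15 * 60)  # 15 minutes each
stipa_hours = survey_hours(POSITIONS, 15)          # 15 seconds each

print(f"Full STI: {full_sti_hours:.0f} hours")
print(f"STIPA:    {stipa_hours * 60:.0f} minutes")
```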


What Factors Affect Speech Intelligibility In Commercial AV Systems?

Whether you’re working with a stadium-sized PA or a small Microsoft Teams Room, several common factors influence speech intelligibility:

Background Noise: Background noise is a major issue. Any other sounds present in the listening environment will compete with the speech signal and make it harder for listeners to discern the intended message.

Common examples include nearby street and construction noise, air conditioning and HVAC systems, office equipment such as photocopiers and computer fans, and nearby conversations or gatherings.
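
In the modulation-based model that underlies STI, steady background noise is commonly treated as "filling in" the gaps in speech, which reduces the depth of its modulations. A widely used approximation for that reduction is m = 1 / (1 + 10^(-SNR/10)), where SNR is the speech-to-noise ratio in decibels. The Python sketch below applies it. The function name is my own, and the model assumes steady noise with no reverberation.

```python
def modulation_ratio_from_snr(snr_db: float) -> float:
    """Fraction of speech modulation depth that survives steady background noise."""
    return 1.0 / (1.0 + 10 ** (-snr_db / 10))


# Hypothetical speech-to-noise ratios, from a noisy space to a quiet one.
for snr_db in (-5, 0, 5, 10, 15):
    m = modulation_ratio_from_snr(snr_db)
    print(f"SNR {snr_db:>3} dB -> {m:.0%} of the modulation depth remains")
```

When speech and noise are equally loud (0 dB), half of the modulation depth is gone.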

Reverberation: Reverberation (or “reverb”) is another significant concern. Reverb is a type of echo that occurs when sound waves bounce off surfaces within the listening environment, causing sound to persist after the initial source has stopped.

While reverb can have a pleasant, spacious and ambient effect in other contexts, it is detrimental to speech intelligibility. As various elongated sounds bounce around and blend together, listeners will experience listening fatigue as they attempt to disentangle reverberant reflections from the primary speech signal in their minds.
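
The smearing effect of reverb can also be estimated with a standard approximation from the modulation-based model behind STI: m(F) = 1 / sqrt(1 + (2πF·T/13.8)²), where F is the modulation frequency in Hz and T is the room's reverberation time (RT60) in seconds. The Python sketch below evaluates it at the 14 modulation frequencies used for Full STI. The function name and the example reverberation times are my own, and the formula assumes an ideal, evenly decaying reverberant field with no background noise.

```python
import math

# The 14 modulation frequencies (Hz) used for Full STI, from 0.63 to 12.5 Hz.
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def modulation_ratio_from_reverb(mod_freq_hz: float, rt60_s: float) -> float:
    """Fraction of modulation depth that survives reverberation alone."""
    return 1.0 / math.sqrt(1.0 + (2 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)


# Hypothetical rooms: a treated meeting room, a lecture hall, a large atrium.
for rt60_s in (0.4, 1.0, 2.0):
    ratios = [modulation_ratio_from_reverb(f, rt60_s) for f in MOD_FREQS_HZ]
    mean_ratio = sum(ratios) / len(ratios)
    print(f"RT60 {rt60_s:.1f} s -> average modulation ratio {mean_ratio:.2f}")
```

Faster modulations are hit hardest, so longer reverberation times erode exactly the detail that separates one syllable from the next.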

Speaker Placement: Speaker placement plays a crucial role in speech intelligibility. Optimal placement can ensure that direct sound reaches the listeners’ ears before any surface reflections, mitigating the effects of reverb. When direct sound dominates, listeners don’t need to expend so much cognitive effort separating speech from noise.

Further, strategic speaker positioning will provide even coverage throughout the listening area, increasing the room’s capacity to facilitate effective communication.
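
One rough way to judge whether direct sound will dominate at a given seat is the "critical distance": the distance from the speaker at which direct and reverberant sound are equally loud. A common rule-of-thumb estimate is 0.057 × sqrt(Q × V / RT60), with the room volume V in cubic meters, the reverberation time RT60 in seconds, and Q the speaker's directivity factor. The Python sketch below applies it. The function name and the room figures are hypothetical, and the estimate assumes an evenly diffuse reverberant field, so treat the output as a starting point rather than a design value.

```python
import math


def critical_distance_m(directivity_q: float, room_volume_m3: float, rt60_s: float) -> float:
    """Estimate the distance (meters) where direct and reverberant levels are equal."""
    if min(directivity_q, room_volume_m3, rt60_s) <= 0:
        raise ValueError("all inputs must be positive")
    return 0.057 * math.sqrt(directivity_q * room_volume_m3 / rt60_s)


# Hypothetical meeting room: 150 cubic meters with an RT60 of 0.6 seconds.
for q in (2.0, 5.0, 10.0):
    distance = critical_distance_m(q, 150.0, 0.6)
    print(f"Q = {q:>4}: direct sound dominates within about {distance:.1f} m")
```

Listeners seated beyond that distance hear more room than speaker, which is one reason a single unit at the front of a long room can struggle.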

Frequency Response: Every sound we hear is made up of a range of frequencies. While human speech spans frequencies between ~100 Hz and 10 kHz, the critical frequency range for speech intelligibility is ~500 Hz to 4 kHz. This is where the sounds of vowels and most consonants reside.

To amplify speech clearly, an audio system must prioritize speech reproduction within this frequency range.

Speaker Sensitivity and Amplification Quality: Having a well-placed speaker with optimal frequency response won’t do you much good if that speaker isn’t loud enough to deliver direct sound to the listener. It is vital to ensure that selected AV equipment matches the use case and environment.

Speaker sensitivity is measured in decibels (dB) and indicates how loudly a speaker will play for a given input power. It is typically quoted as the sound pressure level measured at 1 meter with 1 watt of input. An amplifier is what delivers that power to the speaker. It can be located either inside the speaker enclosure as part of one unit (a “powered speaker”) or externally in an equipment rack that feeds an amplified signal to a “passive speaker” via speaker cable.
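
As a rough check on whether a speaker can deliver enough direct sound to a given seat, you can estimate the level from its sensitivity rating. The Python sketch below assumes sensitivity is quoted at 1 watt and 1 meter, adds 10·log10 of the amplifier power, and subtracts 20·log10 of the distance (the level drops about 6 dB for every doubling of distance). The function name and the example speaker are hypothetical, and the estimate covers direct sound only, ignoring room reflections and the speaker's coverage pattern.

```python
import math

FEET_TO_M = 0.3048


def spl_at_distance(sensitivity_db: float, power_w: float, distance_m: float) -> float:
    """Estimate direct-sound level (dB SPL) from a 1 W / 1 m sensitivity rating."""
    if power_w <= 0 or distance_m <= 0:
        raise ValueError("power and distance must be positive")
    return sensitivity_db + 10 * math.log10(power_w) - 20 * math.log10(distance_m)


# Hypothetical speaker: 88 dB sensitivity, driven with 10 W, listener 15 feet away.
level = spl_at_distance(88.0, 10.0, 15 * FEET_TO_M)
print(f"Estimated direct-sound level at 15 ft: {level:.1f} dB SPL")
```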

Human Listener: AV systems are often designed around listeners with perfect hearing, yet it’s estimated that 19% of adults have hearing loss in the speech-frequency range. For these listeners, the effects of the factors above are magnified: direct sound from speech is harder to perceive and to disentangle from reverb and noise.

These one-in-five listeners could use your support. Consider the following best practices for optimizing speech intelligibility in your facility. Additionally, you can enhance your AV system with assistive listening capabilities.

For the 1-in-5 Canadian adults with hearing loss, speech intelligibility can be a major barrier to participation.

Best Practices For Crafting An Intelligible AV System

As should be clear by now, an audiovisual system cannot ensure clarity of speech through technology alone. Here is how you can address the various technological and environmental factors that influence speech intelligibility:

Reduce Background Noise: Specify HVAC systems with low noise and isolate the listening environment from disruptive sounds. Steps can be taken to minimize external building or hallway noise by using appropriate architectural materials and window coverings, reducing structural-borne vibration, and soundproofing walls and windows.

Control Reverberation: Employ acoustic treatments like wall panels, ceiling panels, curtains, and furniture to manage reverberation.

Optimal Speaker Placement: Aim for even audio distribution to cover the listening area. To avoid excessive reverb, speakers should be directed at listeners and not reflective surfaces.

Microphone Selection and Directivity: Select microphones that are designed to prioritize the human voice and won’t pick up every keyboard, fan and squeaky chair. Proper podium, tabletop, lavalier and handheld microphones have polar patterns suited to their intended application, capturing sound from some directions and rejecting it from others.

Speech-Specific Frequency Response: Emphasize the critical frequency range of human speech, which is 500 Hz to 4 kHz, and minimize the reproduction of deep bass. Select high-quality speakers to achieve this. Further adjustments can be made inside the DSP.

Digital Signal Processing (DSP) Setup: Select appropriate audio processing equipment and configure (or “tune”) it for the room's acoustics. Echo cancellation must be introduced to eliminate spurious echoes — but not so much that it mutes useful speech. Frequency equalization may be used to optimize speech frequencies further.

Consultation and Commissioning: Work with AV consultants and contractors to refine the system. This process is a combination of art and science. An AV consultant will typically work with the installation contractor during commissioning to fine-tune the DSP settings for the environment in which it’s installed.



Speech intelligibility is fundamental to effective communication. An AV system that merely conveys sound isn't sufficient; it must facilitate understanding.

By considering factors such as background noise, reverberation, speaker placement, microphone selection, and frequency response, you can design and implement an AV solution that prioritizes speech clarity. Remember, it's not just about hearing – it's about comprehending, collaborating and connecting through crystal-clear communication.

If you’d like help specifying the right equipment, coordinating with your construction stakeholders or fine-tuning your new AV system, please don’t hesitate to contact us today.