What’s the importance of voice in communication? It seems like a simple question with a simple answer: We communicate through speech, and the voice makes speech.
That’s true, but it doesn’t tell the whole story. In fact, the voice is (at least) two things: an audible medium and a representation of the speaker’s identity. To communicate both ideas and identity, the human voice carries three types of information:
- Linguistic information. This is the content of the speech: words and sentences.
- Paralinguistic information. These are elements of speech beyond language. The speaker introduces these factors more or less intentionally. Paralinguistic traits include facial expressions, body language, pitch, speaking speed, and volume.
- Nonlinguistic information. These are signals the speaker cannot control, including indications of age, gender, health status, and emotional states.
These last two types of voice information reveal identity. For example, in crime investigations, specialists can extract this information from a suspect’s voice, determining features like gender, age, place of origin, body shape, and even occupation. When you hear a voice, your brain paints a picture—a portrait, even. And for brands, the quality of that portrait has real-world business effects.
Looking to scale your brand’s outreach with the power of voice? We can help. Learn more about custom branded synthetic voices from ReadSpeaker AI.
The Importance of Pitch and Tone in Communication for Brands
In 1995, researchers studied the effect of voice characteristics on sales volume for 21 direct salespeople. Their conclusion? “How a sales message is communicated may be as important as what is communicated with respect to output sales performance.” This study found that people who spoke a bit faster, used shorter pauses, and included more falling pitches in their speech sold more products.
While that’s just one study in one specific context, it does suggest a link between pitch variability and direct business results. But pitch variability isn’t the only vocal trait that affects customer experiences. A complex combination of paralinguistic and nonlinguistic traits adds up to a speaker’s tone of voice, with clear indications of mood and intent. As we’ve pointed out before, you shouldn’t underestimate the importance of tone of voice for communication with customers.
Another strong factor in the customer experience of voice is recognizability: the listener’s association of a specific voice with a familiar individual persona. We know speakers, and brands, through their voices. That’s why an original, unique TTS voice is essential for brand interactions in digital voice channels. SoundHound, Inc.’s Andrew Richards warns brands against reliance on generic TTS voices. “Don’t let the default TTS voice define who you are as a company,” Richards says. “You’ll risk sounding like everyone else, and creating a mismatch between how you want to be perceived and how users actually perceive your brand.”
We began this section with a discussion of voice’s role in sales, but careful use of voice in all your customer conversations affects far more than the sales team. It can also significantly broaden your audience, particularly when you provide a text-to-speech (TTS) option for all written communication.
Voice as an Accessibility Tool for Communication
To get a clearer picture of the importance of speech in communication, consider the alternative: writing. To read text, you have to see it. For the 253 million people with blindness or moderate-to-severe vision impairments around the globe, text may be inaccessible unless translated into speech. Braille devices can also help, but not all people with blindness or vision impairments read braille. The commonly cited statistic is that the braille literacy rate in the U.S. is just 10%. This number has been challenged, but no one suggests a braille literacy rate of anywhere near 100%, leaving many users in need of a TTS tool to consume online and printed content.
Screen readers—software that translates digital text into machine speech—remove barriers for people without vision impairments, too. In a 2021 survey, 3.2% of screen reader users said they had cognitive or learning disabilities, 2.4% said they had motor disabilities, and 7.7% said they didn’t have a disability of any kind. That last number isn’t surprising if you think about it; when you read an article, you’re stuck staring at a page or a screen. When TTS reads it to you, you’re free to cook dinner, drive a car, or take a walk. In every case, TTS takes written communication to a broader audience. But for the greatest success in this crucial task, make sure you provide TTS voices users will enjoy—like top-quality neural TTS voices from ReadSpeaker AI.
But what does it mean to call a TTS voice “high-quality”? Does the “high-quality” TTS voice generate language that’s easier to understand? Does it sound more “natural” and less “robotic”? Is the highest-quality TTS voice the one that sounds most human? The answer depends, of course, upon context.
Measuring Quality in TTS Voices
Researchers have proposed a variety of quality measures for synthetic speech. Some are subjective and some objective, and they cover different areas of focus, such as intelligibility or naturalness. One of the most common measures of synthesized speech quality is the mean opinion score (MOS), an average of listener quality ratings on a scale of 1 to 5. Speech scientists derive these scores by polling large groups of listeners on voice quality and averaging the results. The MOS range between 3.6 and 4.0 designates a voice that, in testing, left “some users satisfied.” A score between 4.0 and 4.3 indicates general test user satisfaction, while scores from 4.3 to 5.0 reflect testers who said they were “very satisfied.” Incidentally, voice-over-IP (VoIP) calls typically fall into the 3.5 to 4.2 MOS range. ReadSpeaker AI’s TTS voices regularly score 4.0 MOS or higher.
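To make the arithmetic concrete, here is a minimal Python sketch of how a MOS could be computed from listener ratings and mapped to the satisfaction ranges described above. The function names and the sample ratings are hypothetical, made up for illustration; they are not taken from any real listening test. Because the ranges above share their endpoints, the sketch also assumes that a boundary score (such as exactly 4.0) belongs to the higher range.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings, each on the 1-to-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)


def satisfaction_band(mos):
    """Map a MOS to the ranges described in the text.

    Assumption: a score on a shared boundary counts toward the higher range.
    Scores under 3.6 are not characterized in the text, so they get a
    neutral placeholder label.
    """
    if mos >= 4.3:
        return "very satisfied"
    if mos >= 4.0:
        return "satisfied"
    if mos >= 3.6:
        return "some users satisfied"
    return "below the ranges described here"


# Hypothetical ratings from eight listeners (illustrative only).
ratings = [4, 5, 4, 4, 5, 3, 4, 5]
score = mean_opinion_score(ratings)
print(f"MOS = {score:.2f} ({satisfaction_band(score)})")
# prints: MOS = 4.25 (satisfied)
```

In practice, a real listening test would involve many more listeners and samples than this toy example, but the averaging step works the same way.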
Another way to determine TTS quality is to compare the synthetic speech to high-definition recordings of human speakers. That’s just what ReadSpeaker AI did in a recent public listening test. Many listeners couldn’t distinguish between ReadSpeaker AI TTS and a human speaker. Overall, ReadSpeaker’s neural TTS voices scored just 0.2 points lower in MOS compared to human speech recordings.
Given the importance of voice in communication—and the growth of commercial voice channels, including voice commerce—it pays to invest in the best TTS voices available. Start by reaching out to ReadSpeaker AI today.