Banks have used interactive voice response (IVR) systems for decades. Although the goal has always been to improve customer experiences and reduce operational costs, IVRs have gained a reputation as sources of frustration and customer dissatisfaction. Now, advances in conversational AI are giving organizations the opportunity to realize the promise of automated customer contact centers. Brands can personalize conversations and expand use cases to support a greater variety of interactions between the voice assistant and the customer.
The implementation of voice AI in the contact center is making conversations easier by eliminating the need to respond to a lengthy menu of numbered options. The result is a frictionless experience that improves customer satisfaction instead of detracting from it. Like many other industries, the banking and financial industry is creatively using conversational AI to enhance, innovate, and improve its customer service strategy.
As consumer demand for fast and efficient resolution of their requests increases, brands are laser-focused on the customer journey and ensuring satisfaction at every touchpoint. Research from Dimension Data reveals that 84% of businesses that focused on enhancing the customer experience reported an increase in revenue, and 92% reported an increase in customer loyalty.
Banking and finance institutions that still rely on legacy IVR systems, type-and-swipe mobile apps, and brick-and-mortar locations to provide all their customer service functions should start considering the benefits of text-to-speech (TTS), natural-sounding voice AI, and custom voice assistants to meet customers' rising demands for 24/7 access and rapid resolutions.
Creating a natural-sounding TTS voice
Text-to-speech is an important element of the conversational AI experience. That experience begins with Automatic Speech Recognition (ASR), which transcribes what the user is saying, and continues with Natural Language Understanding (NLU), which interprets the request so the system can return a result. How the result is delivered, including the sound, tone, and intonation, is the function of the TTS engine. In conversational language, often it’s not what we say, but how we say it that gives meaning to our speech. The quality of the TTS voice is key to transforming a voice assistant from a robotic disembodied voice to a human-like agent that demonstrates empathy and kindness.
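The flow described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names, the keyword-based intent matching, the canned replies, and the "brand_voice_v1" voice name are all hypothetical stand-ins, not any vendor's API. A production system would call real ASR, NLU, and TTS engines at each step.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    confidence: float


def recognize_speech(audio: bytes) -> str:
    """Hypothetical ASR stand-in. Here the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8").lower().strip()


def understand(transcript: str) -> Intent:
    """Hypothetical NLU stand-in that uses simple keyword matching."""
    if "balance" in transcript:
        return Intent("check_balance", 0.9)
    if "lost" in transcript and "card" in transcript:
        return Intent("report_lost_card", 0.9)
    return Intent("unknown", 0.0)


def respond(intent: Intent) -> str:
    """Pick the reply text for an intent (canned replies for illustration)."""
    replies = {
        "check_balance": "I can help with your balance.",
        "report_lost_card": "I'm sorry to hear that. Let's block your card.",
    }
    return replies.get(intent.name, "Sorry, could you rephrase that?")


def synthesize(text: str) -> dict:
    """Hypothetical TTS stand-in: builds the request a TTS engine would get."""
    return {"text": text, "voice": "brand_voice_v1"}


def handle_turn(audio: bytes) -> dict:
    """One conversational turn: ASR -> NLU -> response -> TTS request."""
    transcript = recognize_speech(audio)
    intent = understand(transcript)
    return synthesize(respond(intent))
```

Calling handle_turn(b"I lost my card") walks one utterance through all four stages. Keeping each stage behind its own function makes it easier to replace a stand-in with a real engine later.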
The process for creating a TTS voice requires a combination of good voice talent and well-constructed neural networks. When making a TTS voice, audio is collected from a professional speaker, and then a neural network framework is used to train acoustic models. These models are trained with both audio and linguistic data that tells what sounds are being pronounced and which words or syllables are being emphasized for each sentence. All that information is then taken in by the neural network which performs a series of computations to predict what the audio should sound like.
Once the model has been trained, it can predict the acoustic features that correspond to new text and then convert those features to audio. With such a complex process, the quality of the data is crucial. A good voice talent has a friendly-sounding voice, is expressive, and can take direction on how to pronounce certain words. When selecting a voice actor, it’s essential that they can read a whole sentence without stuttering or having to reset mid-sentence. Neural networks learn poorly from fragmented or sloppy material, so the recordings have to be clean and clear.
It’s important to note that implementing a TTS voice is an iterative process, and companies should have a system in place for continuous improvement. Customer service is never static. Businesses change as new products are introduced, and the outside world changes with political, geophysical, and societal events. Through a process of constant iteration, companies can update their systems to include the vocabulary and responses required to remain relevant and responsive to customer requests.
In the banking and finance industry, customer security and the need for data privacy have been the drivers behind the development of custom voice assistants with unique sounding voices that fit with the sonic branding guidelines and communicate brand identity. Partnering with the right TTS voice provider is helping these companies create voices for their IVRs and mobile apps that are becoming as familiar to their customers as the sight of their logos.
Setting the tone of your voice assistant
Another important element of a TTS voice is the ability to reproduce intonation. Dr. Albert Mehrabian's studies of verbal and nonverbal messages found that, when people communicate feelings and attitudes, only 7% of the message is carried by the words themselves, 38% by the tone of voice, and the remaining 55% by facial expression and body language. There’s a huge difference between saying “hello,” whispering “hello?” and screaming “HELLO!” Even though it’s the same word, each tone creates a different experience and feeling.
Having a TTS voice that can replicate tones is key to enhancing the human-to-machine experience. People expect to interact with a voice assistant in the same way they would with a human agent, and they can be put off when the response doesn’t match the context of their interaction. For instance, we might expect different responses from the TTS depending on whether we just paid off a debt and are elated or were charged for something by accident and are frustrated. The right inflection and tone make a difference in human communication, and a responsive TTS voice must be able to classify different renditions of the same word, synthesize them, and codify the different emotions or intent of the messages.
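One common way to request a different delivery from a TTS engine is the prosody element of SSML, the W3C Speech Synthesis Markup Language. The sketch below assumes the customer's mood has already been detected elsewhere. The mood labels and the mapping to rate and pitch values are hypothetical choices for illustration, and engines differ in which SSML features they support.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from a detected mood to SSML prosody labels.
# "slow"/"medium" and "low"/"medium"/"high" are standard SSML label values.
PROSODY_BY_MOOD = {
    "elated": {"rate": "medium", "pitch": "high"},
    "frustrated": {"rate": "slow", "pitch": "low"},
    "neutral": {"rate": "medium", "pitch": "medium"},
}


def to_ssml(text: str, mood: str = "neutral") -> str:
    """Wrap reply text in an SSML prosody element chosen for the mood."""
    # Unknown moods fall back to the neutral delivery.
    prosody = PROSODY_BY_MOOD.get(mood, PROSODY_BY_MOOD["neutral"])
    return (
        "<speak>"
        f'<prosody rate="{prosody["rate"]}" pitch="{prosody["pitch"]}">'
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</prosody>"
        "</speak>"
    )
```

The same sentence can then be rendered calmly and slowly for a customer disputing a charge, or more brightly for one who has just paid off a loan, without changing the wording of the reply.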
Usually, with text-to-speech, one sentence is synthesized at a time, but with conversational AI systems, the ASR and dialog can remember context and eliminate the need to repeat information. Context-awareness allows a conversational exchange between the user and the voice assistant and makes it easier for the voice AI to respond with relevant information based on the context of things discussed earlier in the conversation. When customers experience empathetic and accurate responses that result from natural conversations, they’re more inclined to believe that they’re being cared for by the bank.
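Context-awareness of this kind is often implemented by carrying a small amount of state between turns. The following is a minimal sketch, assuming a hypothetical DialogContext class and two hypothetical banking intents. It shows only how an account mentioned earlier can be reused so the customer does not have to repeat it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogContext:
    """Hypothetical per-conversation memory of details already mentioned."""
    slots: dict = field(default_factory=dict)

    def remember(self, **new_slots) -> None:
        # Store only the values the customer actually supplied this turn.
        self.slots.update({k: v for k, v in new_slots.items() if v is not None})


def answer_balance(ctx: DialogContext, account: Optional[str] = None) -> str:
    """Answer a balance request, asking for the account only if unknown."""
    ctx.remember(account=account)
    known = ctx.slots.get("account")
    if known is None:
        return "Which account would you like to check?"
    return f"Here is the balance for your {known} account."


def answer_transactions(ctx: DialogContext, account: Optional[str] = None) -> str:
    """Answer a follow-up request, reusing the account from earlier turns."""
    ctx.remember(account=account)
    known = ctx.slots.get("account")
    if known is None:
        return "Which account should I look up?"
    return f"Here are the recent transactions for your {known} account."
```

After a customer asks about their savings balance, a follow-up such as "and my recent transactions?" can be answered for the same account because the context object still holds it.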
Custom voice assistants in banking
It’s no longer enough to have a conversational AI that sounds like everyone else’s voice assistant. As more brands enter the voice AI arena, there is a growing need for customized voices that match sonic branding guidelines. Custom voices, also called branded voices, vocal identities, or synthetic voices, are becoming important elements of sonic branding efforts.
When considering a custom voice TTS for a conversational AI, banking and financial institutions aren’t limiting themselves to having only a chatbot, voice skill, website, or mobile app. Instead, leading institutions are building omnichannel voice experiences. These voices are becoming part of the corporate identity and their sound, gender, and ethnic identity are becoming part of the conversation around diversity and inclusion.
Currently, most voice assistants tend to sound young and female, perpetuating gender roles and reinforcing gender bias. With a custom TTS voice, companies can choose voices that represent different ethnic backgrounds, ages, and genders. Recently, the concept of implementing nonbinary voices has become a popular topic of conversation in the voice AI community.
For brands seeking a sonic identity through their custom TTS, it’s about choosing the voice that best represents their values and mission and one that delivers the most pleasing customer experience. Partnering with an established TTS provider with expertise and an extensive library of voices is a good first step for companies looking to match their brand with a custom TTS voice.
Customer service is an essential element of any bank and financial institution’s strategy, and conversational AI can help improve the customer experience and increase revenue. Organizations that currently don’t have a strategy in place for a customized voice assistant should consider the benefits before their customers open accounts with voice-enabled competitors.
Recently, I had the honor to speak with Keri Roberts, Brand Evangelist, ReadSpeaker, as part of a series, “AI in Banking Finance—Now and Into the Future.” During that videocast, we discussed how the brands and people in the banking and finance industries are creatively using conversational AI—one of the fastest-growing areas of technology. If you want to take a deeper dive into this topic, you can listen and watch the show in its entirety here.
Esther Klabbers is a speech scientist at Readspeaker.ai and has been in the text-to-speech field for 25 years with a Ph.D. from Eindhoven University in the Netherlands. Connect with her on LinkedIn and Twitter or read about Readspeaker.ai’s TTS work on their website.