More than half of U.S. adults own a smart speaker, a smart home device, or both. Digital voice assistants like Amazon Alexa, Apple Siri, and even in-car personas like Hey Mercedes are expected to be active in more than 8 billion units globally by 2024. In other words, conversational artificial intelligence has gone mainstream. But anyone trying to understand this technology immediately wades into an alphabet soup of specialties, with near-identical terms like NLP and NLU adding to the confusion. This article explains the difference (and, more importantly, the relationship) between NLU and NLP.
Want to know more about conversational AI and the growing Internet of Voice? Contact the text-to-speech experts at ReadSpeaker.
Natural language processing (NLP) and natural language understanding (NLU) may sound similar, but they’re two distinct technologies—despite being closely related. It’s not a question of “NLU vs. NLP,” as if they’re opposites or even competitors. In fact, NLU is a subset of NLP. All NLU is NLP, but not all NLP is NLU. But we’re getting ahead of ourselves. Let’s back up and define our terms.
What is NLP? What is NLU? How about “natural language” itself?
Both NLP and NLU share the phrase “natural language,” so we’ll start there. Here’s what natural language means to computational linguists, the scientists who study the intersection of computing and human language:
- Natural language describes the way humans use words to communicate. It’s distinct from formal language, such as a computer programming language, in which communication follows strict rules and structured data. Natural language, by contrast, is held together only by loose (and often broken) grammar rules, with no structural limits on vocabulary. As a result, two speakers may communicate the same idea in completely different words; you can say one thing in many ways, as this very sentence illustrates.
Starting with the concept of natural language, we can then ask the questions:
What is NLP?
- Natural language processing, or NLP, is the use of artificial intelligence to analyze natural language and do something useful with it. That useful task may be recognizing, categorizing, generating, or understanding natural language.
What is NLU?
- Natural language understanding, or NLU, is just one subfield of NLP. It is the computational task of extracting relevant data from raw natural language. Essentially, NLU takes the unstructured data of natural language and organizes it into something a computer can use for further action.
The difference between NLU and NLP is one of scope: NLP includes the extraction of data from language, the manipulation of that data, and even the composition of original text that reads as though a human wrote it (a task called natural language generation, or NLG). NLU, by contrast, is concerned only with translating natural language into terms a computer can actually work with. Real-world examples of these technologies may further clarify the relationship.
What are some examples of NLU and NLP in action?
Take the example of a text-based AI chatbot like IBM’s Watson Assistant webchat integration. Each conversation begins with the user writing a request using natural language. (This may be prompted by the chatbot, but for our purposes, it’s best to start at the moment the user enters text.) At that point, the conversational AI system proceeds through the following steps:
- An NLU module extracts useful data from the user’s text. For instance, if you write, “It’s my wife’s birthday next week, and I’d really like to take her on a cruise,” the NLU system can recognize that the intent is to book some time aboard a cruise ship, and it may also pick out supporting details, such as the time frame (“next week”).
- A dialogue management system processes the intent and suggests a potential outcome. It also keeps track of each step in the conversation, ensuring that the system doesn’t repeat itself or offer suggestions out of order.
- The NLG module takes the relevant outcome and composes a response that sounds like a person could have written it: “What date would you like your cruise to begin?” to continue the above example.
These steps repeat until the dialogue manager recognizes that the conversation has concluded. In this example, natural language understanding and generation work together, blending NLU with broader NLP processes.
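To make that division of labor concrete, here’s a minimal sketch of the loop in Python. Everything in it is illustrative: the keyword-matching “NLU,” the intent names, and the response templates are stand-ins for the far more sophisticated models a product like Watson Assistant actually uses.

```python
# Toy sketch of the chatbot loop: NLU -> dialogue management -> NLG.
# All function names, intents, and rules are illustrative, not any vendor's real API.

def nlu_extract(text):
    """Map raw text to a structured intent (very naive keyword matching)."""
    lowered = text.lower()
    if "cruise" in lowered:
        return {"intent": "book_cruise"}
    return {"intent": "unknown"}

def dialogue_manage(state, parsed):
    """Track conversation state and decide the next action."""
    state.setdefault("history", []).append(parsed["intent"])
    if parsed["intent"] == "book_cruise" and "start_date" not in state:
        return "ask_start_date"
    return "fallback"

def nlg_compose(action):
    """Turn the chosen action into natural-sounding text."""
    templates = {
        "ask_start_date": "What date would you like your cruise to begin?",
        "fallback": "Sorry, could you rephrase that?",
    }
    return templates[action]

state = {}
user_text = "It's my wife's birthday next week, and I'd really like to take her on a cruise."
print(nlg_compose(dialogue_manage(state, nlu_extract(user_text))))
# -> What date would you like your cruise to begin?
```

In a production system, each of these three functions would be backed by trained models and a persistent session store, but the flow of data between them is the same.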
So what about NLP tasks that don’t involve NLU?
A clear example of NLP without NLU is machine translation, as performed by a system like Google Translate. The system processes natural language, but it doesn’t have to extract intents or other structured meaning to produce a coherent translation. In that sense, it’s a natural language processing task that does not rely on natural language understanding.
That’s just one of the NLP operations that don’t incorporate NLU. “Part-of-speech tagging” is the process of labeling words as nouns, verbs, and all the other parts of speech you may remember from language classes. This is another example of NLP without NLU; the tagged speech may later play into an NLU process, but the system does not yet extract features from the language. It’s just organizing and tagging words.
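A toy tagger shows how simple this task can look in isolation. The lookup table below is purely illustrative; production taggers rely on statistical or neural models rather than a hand-written dictionary.

```python
# Toy part-of-speech tagger: labels each word via a small lookup table.
# Real taggers use trained models; this dictionary is illustrative only.

TAGS = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "ball": "NOUN",
    "chased": "VERB", "threw": "VERB",
    "red": "ADJ",
}

def pos_tag(sentence):
    """Return (word, tag) pairs; unknown words get the tag 'X'."""
    return [(w, TAGS.get(w.lower(), "X")) for w in sentence.split()]

print(pos_tag("The dog chased a red ball"))
# -> [('The', 'DET'), ('dog', 'NOUN'), ('chased', 'VERB'),
#     ('a', 'DET'), ('red', 'ADJ'), ('ball', 'NOUN')]
```

Note that the tagger organizes the words without ever asking what the sentence means, which is exactly why this counts as NLP without NLU.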
The important point here is that the conversational AI systems behind marketing and customer service chatbots use many types of NLP, including NLU. And things really get exciting when you add voice capabilities to these systems, turning chatbots into voicebots.
How do voicebots use NLP and NLU?
In our earlier example, we discussed a text-based conversational user interface—but consumers often prefer voice-based communication channels. If you’ve interacted with a smart speaker or had a conversation with an AI assistant, you’ve experienced natural language processing with an added dimension we call speech processing. Where NLP deals with text, speech processing handles spoken language—both transcribing human speech and generating synthetic voices.
Voice conversational AI systems begin with a speech-to-text module, a technology often called automatic speech recognition or ASR. The ASR system transcribes the user’s natural speech into text—at which point it can continue through the steps outlined in our above description of the chatbot.
After the AI composes an appropriate response using NLG, a voicebot proceeds to the other side of speech processing: text to speech (TTS). ReadSpeaker’s TTS engines integrate into broader conversational systems to turn NLG-composed text into lifelike spoken utterances. Voice conversational AI is a collection of technologies: NLP, including NLU, dialogue management, and NLG; plus ASR and TTS.
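Put together, the whole voicebot pipeline can be sketched as a chain of stages. Every function below is a hypothetical stand-in; in a real system, each stage would call a dedicated engine (an ASR service, an NLU model, a TTS engine like ReadSpeaker’s).

```python
# End-to-end voicebot sketch: ASR -> NLU -> dialogue management -> NLG -> TTS.
# Every function is a placeholder for a real engine; names and returns are illustrative.

def asr(audio):
    """Speech to text (here, a pretend transcription of the input audio)."""
    return "I'd like to take her on a cruise"

def nlu(text):
    """Text to structured intent."""
    return {"intent": "book_cruise"} if "cruise" in text.lower() else {"intent": "unknown"}

def dialogue(parsed):
    """Choose the next action from the extracted intent."""
    return "ask_start_date" if parsed["intent"] == "book_cruise" else "fallback"

def nlg(action):
    """Action to response text."""
    return {"ask_start_date": "What date would you like your cruise to begin?",
            "fallback": "Sorry, could you rephrase that?"}[action]

def tts(text):
    """Text to audio (here, just a labeled placeholder payload)."""
    return ("audio", text)

reply = tts(nlg(dialogue(nlu(asr(b"raw-waveform-bytes")))))
print(reply[1])  # -> What date would you like your cruise to begin?
```

The middle three stages are the same NLP core as the text chatbot; ASR and TTS are the speech-processing layers wrapped around it.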
How else is artificial intelligence involved in voicebots?
All of these capabilities descend from artificial intelligence, including the development of all-original TTS voices. At ReadSpeaker, we create synthetic voices using deep neural networks (DNNs), computational models that work a bit like a human brain, with decentralized networks of processors that send signals in orchestrated patterns and learn the shortest path from input to desired output through training.
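That training idea can be illustrated at its smallest possible scale: a single artificial neuron nudging one weight toward a desired output. This shows only the core principle; a production TTS network stacks millions of such units and far richer update rules.

```python
# Minimal illustration of learning from input to desired output:
# one artificial neuron fits y = 2x by gradient descent on squared error.

def train(pairs, steps=500, lr=0.05):
    w = 0.0                            # one trainable weight
    for _ in range(steps):
        for x, y in pairs:
            pred = w * x               # forward pass
            grad = 2 * (pred - y) * x  # gradient of squared error w.r.t. w
            w -= lr * grad             # nudge toward the desired output
    return w

w = train([(1, 2), (2, 4), (3, 6)])
print(round(w, 3))  # -> 2.0
```

After a few hundred passes over the examples, the weight settles at 2, the rule that maps every input to its desired output; a deep network does the same thing across many layers at once.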
That capability is producing increasingly realistic TTS voices, just in time for the proliferation of conversational AI systems. Deep neural networks are also leading to advanced TTS capabilities like cross-lingual voices, dynamic speaking styles, and even emotional augmentation—features that reduce the audible gap between human speech and synthetic speech to virtually nil. (Listen to examples of these features on the ReadSpeaker TTS voice demo.)
At last, we can make sense of the alphabet soup. These initialisms are the components of the conversational user interfaces that are transforming the way we interact with our machines: ASR; NLU and NLP (not, remember, “NLU vs. NLP”); dialogue management; NLG; and, finally, neural TTS from ReadSpeaker.