Distinguishing between the relevant and irrelevant bits of a conversation is a good life skill in general, but for voice assistants like Amazon’s Alexa, it’s indispensable. In order to respond appropriately to what’s being said — about anything from the weather to a nearby restaurant or a package in transit — they need to know whether the subject at hand is beyond their knowledge scope.
Researchers at Amazon tackled the problem with a natural language understanding (NLU) system that simultaneously recognizes in-domain (known) and out-of-domain (unknown) topics. The results will be presented at this year’s Interspeech conference in Hyderabad, India, in early September.
“Sometimes … an Alexa customer might say something that doesn’t fit into any domain,” Young-Bum Kim, a scientist within Amazon’s Alexa team and a lead author on the paper, wrote in a blog post. “It may be an honest request for a service that doesn’t exist yet, or it might be a case of the customer’s thinking out loud: ‘Oh wait, that’s not what I wanted.’ If a natural-language-understanding (NLU) system tries to assign a domain to an out-of-domain utterance, the result is likely to be a nonsensical response.”
The team began by assembling two datasets comprising utterances (i.e., voice commands): one covering 21 different domains and the other sampled from 1,500 frequently used Alexa skills.
When it came to choosing a model, they settled on a bidirectional long short-term memory (Bi-LSTM) architecture that (1) factored in the order of its inputs and (2) read each data sequence both forward and backward. They fed it both “word-level” and “character-level” information: embeddings, which are points in a 100-dimensional space that represent words, along with the words’ constituent characters.
The character-level network produced a vector summary of each word’s useful character features, which the team combined with the aforementioned embeddings before passing them to a second Bi-LSTM. This second network learned to produce a summary of the entire input.
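To make that data flow concrete, here is a minimal Python sketch of the two-stage encoder. It is an illustration under stated assumptions, not the researchers' implementation: only the 100-dimensional word embeddings come from the description above, while the character and hidden sizes, the `embed` stand-in for a learned embedding table, and the random, untrained weights are all hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """One LSTM cell with random, untrained weights (illustration only)."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, g, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def step(self, x, h, c):
        xh = x + h  # concatenate current input and previous hidden state
        i = self._gate("i", xh, sigmoid)
        f = self._gate("f", xh, sigmoid)
        o = self._gate("o", xh, sigmoid)
        g = self._gate("c", xh, math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def run(self, seq):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in seq:
            h, c = self.step(x, h, c)
        return h  # final hidden state serves as the sequence summary

class BiLSTM:
    """Reads a sequence forward and backward, then joins both summaries."""
    def __init__(self, input_size, hidden_size, rng):
        self.fwd = LSTMCell(input_size, hidden_size, rng)
        self.bwd = LSTMCell(input_size, hidden_size, rng)

    def summarize(self, seq):
        return self.fwd.run(seq) + self.bwd.run(list(reversed(seq)))

# 100 comes from the article; the other sizes are assumptions.
WORD_DIM, CHAR_DIM, CHAR_HID, UTT_HID = 100, 8, 4, 16

def embed(token, dim):
    """Hypothetical stand-in for a learned embedding lookup."""
    r = random.Random(token)  # deterministic vector per token
    return [r.uniform(-1.0, 1.0) for _ in range(dim)]

def encode_utterance(utterance, char_bilstm, word_bilstm):
    inputs = []
    for word in utterance.split():
        # Stage 1: summarize the word's characters.
        char_summary = char_bilstm.summarize([embed(ch, CHAR_DIM) for ch in word])
        # Combine the word embedding with its character summary.
        inputs.append(embed(word, WORD_DIM) + char_summary)
    # Stage 2: summarize the whole utterance.
    return word_bilstm.summarize(inputs)

rng = random.Random(0)
char_bilstm = BiLSTM(CHAR_DIM, CHAR_HID, rng)
word_bilstm = BiLSTM(WORD_DIM + 2 * CHAR_HID, UTT_HID, rng)
summary = encode_utterance("play some jazz", char_bilstm, word_bilstm)
print(len(summary))  # 32: forward and backward states of size 16 each
```

In a trained system, a summary vector like this would presumably feed a classifier that either picks a domain or flags the utterance as out-of-domain; that final step is omitted here because the description above does not spell it out.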
On average, the researchers’ system improved classification accuracy by 6 percent for a given target. And they achieved dramatically better results when they trained the system on the 21-domain dataset: 90.4 percent accuracy compared to the existing system’s 83.7 percent.
“By using a training mechanism that iteratively attempts to optimize the trade-off between those two goals, we significantly improve on the performance of a system that features a separately trained domain classifier and out-of-domain classifier,” Kim wrote. “[The] domain classification makes … determinations [such as the actions that a customer wants executed] much more efficient … by narrowing the range of possible interpretations.”