“Reference resolution” is a considerable challenge in natural language processing. In the context of AI assistants like Alexa, it entails correctly associating a word like “their” in an utterance like “play their latest album” with a given musician. Scientists at Amazon have previously addressed it with AI that maps correspondences between the variables used by different services, but these mappings tend to be application-specific and don’t scale well.
That’s why researchers at the Seattle company are now exploring a technique that rewrites commands in natural language by substituting names and other data for references (for instance, rewriting “Play their latest album” as “Play Imagine Dragons’ latest album”). Given a word of an input sequence, their contextual query rewrite engine adds a word to an output sequence according to probabilities computed by a machine learning model.
They describe it in a paper (“Scaling Multi-Domain Dialogue State Tracking via Query Reformulation”) that’s scheduled to be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
“Because our rewrite engine learns general principles of reference, it doesn’t depend on any application-specific information, so it doesn’t require retraining when we expand Alexa’s capabilities,” explained Arit Gupta, a speech scientist in the Alexa AI group. He added that the approach also frees backend code from having to handle referring expressions, and that it lets training data be annotated by native speakers who lack knowledge of Alexa’s internal nomenclature.
The team’s model identifies the intent, or the action that a user wants performed, and assigns individual words to slots (variables such as “ArtistName”) that are used to identify the items to be retrieved. As input, it takes the words of the current utterance and of several prior dialogue turns, along with the intent classification of each turn and the slot tags for all words. It then replaces individual words with generic class labels that complement the slot tags, such as “ENTITYU1” for the first entity named by the user.
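The substitution step can be illustrated with a minimal sketch. It assumes each turn arrives as a speaker label, a token list, and one slot tag per token, with "O" marking tokens outside any slot. The function name, the input layout, and the "ENTITYS" marker for system turns are hypothetical choices made for illustration; only “ENTITYU1” comes from the researchers’ description.

```python
def delexicalize(turns):
    """Replace slot-tagged entity mentions with generic markers.

    `turns` is a list of (speaker, tokens, slot_tags) triples, where
    speaker is "U" (user) or "S" (system). This layout is an assumption.
    Returns the rewritten turns and a marker -> entity lookup table.
    """
    counters = {"U": 0, "S": 0}
    entity_to_marker = {}
    rewritten = []
    for speaker, tokens, tags in turns:
        new_tokens = []
        i = 0
        while i < len(tokens):
            if tags[i] == "O":
                # Ordinary word: keep it unchanged.
                new_tokens.append(tokens[i])
                i += 1
                continue
            # Group consecutive tokens sharing a slot tag into one entity.
            j = i
            while j < len(tokens) and tags[j] == tags[i]:
                j += 1
            entity = " ".join(tokens[i:j])
            if entity not in entity_to_marker:
                # First mention: number it by the speaker who introduced it.
                counters[speaker] += 1
                entity_to_marker[entity] = f"ENTITY{speaker}{counters[speaker]}"
            new_tokens.append(entity_to_marker[entity])
            i = j
        rewritten.append((speaker, new_tokens))
    marker_to_entity = {m: e for e, m in entity_to_marker.items()}
    return rewritten, marker_to_entity


dialogue = [
    ("U", ["play", "imagine", "dragons"], ["O", "ArtistName", "ArtistName"]),
    ("S", ["playing", "imagine", "dragons"], ["O", "ArtistName", "ArtistName"]),
    ("U", ["play", "their", "latest", "album"], ["O", "O", "O", "O"]),
]
rewritten, lookup = delexicalize(dialogue)
```

In this toy dialogue, every mention of the artist collapses to the same marker, and the lookup table lets a rewritten command be converted back into ordinary words afterward.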
With each new utterance, the dialogue is encoded and the system begins to rewrite the latest command one word at a time. For each output word, it decides whether to generate a new word from a list of commonly occurring words or to copy a word from the dialogue history.
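One common way to combine generating and copying is a soft mixture in the style of pointer-generator networks, sketched below with hand-picked numbers. This is an illustrative assumption, not necessarily the formulation used in the paper: the function names, the `p_generate` gate, and all of the probabilities are hypothetical, and in a real system they would be produced by the trained model.

```python
def next_word_distribution(p_generate, vocab_probs, history_tokens, copy_attention):
    """Mix a generation distribution with a copy distribution.

    p_generate      probability of generating from the fixed vocabulary
    vocab_probs     dict: vocabulary word -> generation probability
    history_tokens  tokens of the (delexicalized) dialogue history
    copy_attention  one attention weight per history token, summing to 1
    """
    # Generation path: scale vocabulary probabilities by the gate.
    dist = {word: p_generate * p for word, p in vocab_probs.items()}
    # Copy path: each history position contributes its attention weight.
    for token, weight in zip(history_tokens, copy_attention):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_generate) * weight
    return dist


def pick_next_word(p_generate, vocab_probs, history_tokens, copy_attention):
    """Greedy choice: return the most probable next word."""
    dist = next_word_distribution(
        p_generate, vocab_probs, history_tokens, copy_attention
    )
    return max(dist, key=dist.get)


# Toy step: the model strongly prefers copying the entity marker.
vocab = {"play": 0.5, "the": 0.3, "album": 0.2}
history = ["play", "ENTITYU1", "their"]
attention = [0.1, 0.8, 0.1]
dist = next_word_distribution(0.2, vocab, history, attention)
word = pick_next_word(0.2, vocab, history, attention)
```

With a low generation gate and attention concentrated on the entity marker, the copied marker wins over every vocabulary word, which is how a reference like “their” would end up replaced in the rewritten command.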
As Gupta and colleagues point out, replacing specific words with generic labels such as “ENTITYU1” lets the system generalize much more effectively during training, and it keeps the model’s attention on the words’ syntactic and semantic roles. In experiments on an in-house data set, the researchers’ approach improved the F1 score, a metric that combines precision and recall and so accounts for both false positives and false negatives, by 22% when a term in the current utterance referred to a term in the most recent response, and by 25% when a term in the current utterance referred to a term in the previous utterance.
The dialogue corpus, which was assembled by asking Mechanical Turk annotators to replace referring terms in a Stanford University data set with their referents, is available in open source.
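For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below uses the standard definition; the counts in the example are made up for illustration and are not taken from the researchers’ experiments.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # penalizes false positives
    recall = true_positives / actual        # penalizes false negatives
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct resolutions, 2 spurious, 2 missed.
example = f1_score(8, 2, 2)
```

Because the harmonic mean is pulled toward the lower of its two inputs, a system cannot score well by trading many false positives for few false negatives, or the reverse.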