Friday, October 30, 2020

Amazon researchers boost Alexa’s ability to understand complex commands

Amazon’s Alexa is becoming more proficient at understanding multistep requests in one shot. In a paper (“Parsing Coordination for Spoken Language Understanding”) and accompanying blog post published this morning, Sanchit Agarwal, an applied scientist in the Alexa AI organization, detailed a spoken-language understanding (SLU) system that maps voice commands to actions (intents) and entities (slots) 26 percent more accurately than off-the-shelf alternatives.
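To make the intent-and-slot framing concrete, here is a minimal sketch of the kind of structured output an SLU system targets. The frame layout, intent name, and slot name below are illustrative assumptions, not Alexa's internal representation; the example utterance comes from the researchers' own example later in the article.

```python
# Illustrative SLU output frame for a compound utterance.
# "TurnOnApplianceIntent" and the "Appliance" slot name are
# hypothetical labels chosen for this sketch.
def parse_sketch(utterance):
    # A real system would run a trained model; this stub only
    # shows the target structure: one intent plus a compound slot
    # holding multiple entity values.
    if utterance == "turn on the living room light and kitchen light":
        return {
            "intent": "TurnOnApplianceIntent",
            "slots": {"Appliance": ["living room light", "kitchen light"]},
        }
    raise ValueError("unknown utterance")

frame = parse_sketch("turn on the living room light and kitchen light")
print(frame["intent"])               # TurnOnApplianceIntent
print(frame["slots"]["Appliance"])   # ['living room light', 'kitchen light']
```

The point of the compound-slot list is exactly the constraint Agarwal describes: a rigid system would allow only one value per slot type, whereas here one slot carries both appliances.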

Agarwal and colleagues’ work will be presented at the upcoming IEEE Spoken Language Technology (SLT) workshop in Athens, Greece, later this month. News of their research comes a day after Amazon scientists described an AI-driven method that can cut Alexa’s skill selection error rate by 40 percent.

“Narrow [SLU systems] usually have rigid constraints, such as allowing only one intent to be associated with an utterance and only one value to be associated with a slot type,” he wrote. “We [propose] a way to enable SLU systems to understand compound entities and intents.”

As Agarwal explained, he and colleagues used a deep neural network, layers of mathematical functions called neurons that are loosely modeled on their biological equivalents, which was “taught” from structures in spoken-language data. First, a corpus was labeled according to a scheme indicating groups of words, or “chunks,” that should be treated as ensembles: “B” to mark the beginning of a chunk, “I” to mark the inside of a chunk, and “O” to mark a word that lies outside any chunk. Then, prior to training, the words underwent embedding, a process that replaces each word with a vector representation.
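The B/I/O scheme can be made concrete with a short, self-contained sketch that groups tagged tokens back into chunks. The tag sequence below is a hand-assigned illustration for one of the article's example utterances, not output from the researchers' model:

```python
# Group BIO-labeled tokens into chunks ("ensembles" of words).
# "B" begins a chunk, "I" continues the open chunk, "O" is outside any chunk.
def extract_chunks(tokens, tags):
    chunks, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                  # start a new chunk
            if current:
                chunks.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:    # extend the open chunk
            current.append(token)
        else:                           # "O": close any open chunk
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

tokens = "add apples peanut butter and jelly to my list".split()
tags   = ["O", "B", "B", "I", "O", "B", "O", "O", "O"]
print(extract_chunks(tokens, tags))  # ['apples', 'peanut butter', 'jelly']
```

Note how the tagging separates “peanut butter” (a two-word chunk) from “apples” and “jelly” even without an explicit delimiter between them.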


The embeddings were next passed to a bidirectional long short-term memory (bi-LSTM) network, a type of recurrent neural network capable of learning long-range dependencies, which output a contextual embedding of each word in the input sentence. Those outputs were then fed to a neural network layer that mapped each embedding to a probability distribution over the “B,” “I,” and “O” labels, classifying each word of the input according to its most probable label.
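The per-token classification step described above can be sketched in a few lines: a softmax turns raw per-label scores (logits) into a probability distribution, and each token takes the highest-probability label. The logit values below are made-up numbers standing in for what a trained bi-LSTM layer might emit:

```python
import math

LABELS = ["B", "I", "O"]

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(token_scores):
    # Pick the most probable BIO label for each token independently.
    labels = []
    for scores in token_scores:
        probs = softmax(scores)
        labels.append(LABELS[probs.index(max(probs))])
    return labels

# Hypothetical per-token logits for a three-token input.
scores = [[2.0, 0.1, -1.0],   # favors "B"
          [0.3, 1.8, -0.5],   # favors "I"
          [-0.9, 0.0, 2.2]]   # favors "O"
print(classify(scores))  # ['B', 'I', 'O']
```

Because each token is classified independently here, nothing prevents invalid sequences such as an “I” directly after an “O”; that is the gap the CRF layer described next is meant to close.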


An additional layer, known as a conditional random field, or CRF, learned dependencies between output labels and chose the most likely sequence from all possible label sequences. Thanks to a technique called adversarial training, in which the network is also exposed to inputs deliberately perturbed to provoke prediction errors, the model learned to generalize.
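Choosing the most likely label sequence under a CRF is typically done with Viterbi decoding, which combines per-token emission scores with learned label-to-label transition scores. The sketch below uses invented log-scores (the transition matrix strongly penalizes “I” after “O”, a constraint a per-token classifier cannot express); it illustrates the decoding idea, not the researchers' actual model:

```python
def viterbi(emission, transition, labels):
    # emission[t][j]: log-score of label j at position t (e.g. from a bi-LSTM)
    # transition[i][j]: log-score of moving from label i to label j (CRF weights)
    n = len(emission)
    best = [dict() for _ in range(n)]   # best score ending in each label
    back = [dict() for _ in range(n)]   # backpointers for path recovery
    for j, lab in enumerate(labels):
        best[0][lab] = emission[0][j]
    for t in range(1, n):
        for j, lab in enumerate(labels):
            scores = {prev: best[t - 1][prev]
                            + transition[labels.index(prev)][j]
                            + emission[t][j]
                      for prev in labels}
            back[t][lab] = max(scores, key=scores.get)
            best[t][lab] = scores[back[t][lab]]
    # Trace the highest-scoring label sequence backwards.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(n - 1, 1 - 1, -1):
        if t == 0:
            break
        path.append(back[t][path[-1]])
    return path[::-1]

labels = ["B", "I", "O"]
# Hypothetical log-scores for a three-token input.
emission = [[1.0, 0.2, 0.1], [0.2, 0.9, 0.8], [0.1, 0.2, 1.0]]
transition = [[0.1, 1.0, 0.1],    # from B
              [0.1, 0.5, 0.5],    # from I
              [0.5, -5.0, 0.5]]   # from O (O -> I strongly penalized)
print(viterbi(emission, transition, labels))  # ['B', 'I', 'O']
```

The key design point is that the transition scores let the model score whole sequences, so a label that looks locally plausible can still be rejected if it would create an invalid transition.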

“Instead of building separate parsers for different slot types (such as ListItem, FoodItem, Appliance, etc.), we built one parser that can handle multiple slot types,” Agarwal said. “For example, our parser can successfully identify [list items] in the utterance ‘add apples peanut butter and jelly to my list’ and [appliances] in the utterance ‘turn on the living room light and kitchen light’.”

Source: VentureBeat


