Amazon scientist explains how Alexa resolves ambiguous requests

Nitin Naresh September 28, 2018


During a blockbuster press event last week, Amazon took the wraps off a redesigned Echo Show, Echo Plus, and Echo Spot, along with nine other new voice-activated accessories, peripherals, and smart speakers powered by Alexa. Also in tow: the Alexa Presentation Language, which lets developers build “multimodal” Alexa apps (skills) that combine voice, touch, text, images, graphics, audio, and video in a single interface.
Developing the frameworks that underlie it was easier said than done, according to Amazon senior speech scientist Vishal Naik. In a blog post today, he explained how Alexa leverages multiple neural networks (layered math functions that loosely mimic the human brain’s physiology) to resolve ambiguous requests. The work is also detailed in a paper (“Context Aware Conversational Understanding for Intelligent Agents with a Screen”) that was presented earlier this year at the Association for the Advancement of Artificial Intelligence (AAAI) conference.
“If a customer says, ‘Alexa, play Harry Potter,’ the Echo Show screen could display separate graphics representing a Harry Potter audiobook, a movie, and a soundtrack,” he explained. “If the customer follows up by saying ‘the last one,’ the system must determine whether that means the last item in the on-screen list, the last Harry Potter movie, or something else.”
Naik and colleagues evaluated three bidirectional long short-term memory (BiLSTM) neural networks, a category of recurrent neural network that’s capable of learning long-term dependencies, each with a slightly different architecture. (Basically, the memory cells in LSTMs allow the neural networks to combine their memory and inputs to improve their prediction accuracy, and because they’re bidirectional, they can draw on context from both earlier and later words in a request.)
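To make the "bidirectional" idea concrete, here is a minimal toy sketch in Python. It is not Amazon's model: the cell uses scalar states and hand-picked constant weights (both assumptions for illustration), and the names `lstm_pass` and `bilstm` are hypothetical. It only shows how an LSTM-style gated cell mixes memory with new input, and how running it in both directions gives every position context from both sides.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_pass(inputs: List[float]) -> List[float]:
    """Run a scalar LSTM-style cell left to right; return the hidden state per step.

    The weights below are hand-picked constants (an assumption for
    illustration). A real model learns them and works on vectors.
    """
    h, c = 0.0, 0.0
    states = []
    for x in inputs:
        i = sigmoid(0.5 * x + 0.4 * h)        # input gate
        f = sigmoid(0.3 * x + 0.6 * h + 1.0)  # forget gate
        o = sigmoid(0.7 * x + 0.2 * h)        # output gate
        g = math.tanh(0.9 * x + 0.1 * h)      # candidate memory
        c = f * c + i * g                     # blend old memory with new input
        h = o * math.tanh(c)                  # expose part of the memory
        states.append(h)
    return states


def bilstm(inputs: List[float]) -> List[Tuple[float, float]]:
    """Pair a left-to-right pass with a right-to-left pass at every position.

    Simplification: both directions reuse the same toy cell, whereas real
    BiLSTMs learn separate weights for each direction.
    """
    forward = lstm_pass(inputs)
    backward = lstm_pass(inputs[::-1])[::-1]
    return list(zip(forward, backward))


a = bilstm([1.0, 0.5, -1.0])
b = bilstm([1.0, 0.5, 2.0])  # same sequence except for the last element
```

Comparing `a[0]` and `b[0]`, the forward component at the first position is identical, since it has only seen the first input, while the backward component differs because it has already read the changed final input.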
Sourcing data from the Alexa Meaning Representation Language, an annotated semantic-representation language released in June of this year, the team jointly trained the AI models to classify commands by intent, which designates the action a customer wants Alexa to take, and by slot, which designates the entities (e.g., an audiobook, movie, or smart home device trigger) the intent acts on. As input, the models received embeddings, or mathematical representations of words.
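As a rough illustration of the intent/slot distinction, the Python sketch below annotates one utterance with a single intent label and a per-token slot tag. The BIO tagging scheme, the label names `PlayMediaIntent` and `ItemName`, and the helper `extract_slots` are assumptions chosen for illustration; the article does not say which labels or encoding Amazon uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    tokens: List[str]
    intent: str           # one label for the whole utterance
    slot_tags: List[str]  # one tag per token (BIO scheme, assumed here)


def extract_slots(ann: Annotation) -> Dict[str, str]:
    """Collect BIO-tagged tokens into a slot-name -> text mapping.

    Simplification: if a slot name occurs twice, the later span wins.
    """
    slots: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for token, tag in zip(ann.tokens, ann.slot_tags):
        if tag.startswith("B-"):            # a new slot span begins
            current = tag[2:]
            slots[current] = [token]
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current].append(token)    # continue the open span
        else:                               # "O" or an inconsistent tag
            current = None
    return {name: " ".join(words) for name, words in slots.items()}


example = Annotation(
    tokens=["play", "harry", "potter"],
    intent="PlayMediaIntent",                     # hypothetical intent label
    slot_tags=["O", "B-ItemName", "I-ItemName"],  # hypothetical slot label
)
slots = extract_slots(example)
```

Here the intent answers "what should Alexa do?" and the slot answers "to what?"; a jointly trained model predicts both from the same token sequence.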
The first of the three neural networks considered both the aforementioned embeddings and the type of content that would be displayed on Alexa devices with screens (in the form of a vector) in its classifications. The second went a step further, taking into account not just the type of on-screen data, but the specific name of the data type (e.g., “Harry Potter” or “The Black Panther” in addition to “Onscreen_Movie”). The third, meanwhile, used convolutional filters to identify each name’s contribution toward the final classification’s accuracy, and based its predictions on the most relevant of the bunch.
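One way to picture the first and third variants is the Python sketch below. It is an interpretation under stated assumptions, not the paper's implementation: the type inventory beyond "Onscreen_Movie", the function names, and the embedding values are all invented for illustration. The first function pair appends a multi-hot vector of on-screen content types to each word embedding; `conv_max_pool` slides a single filter over the embedded tokens of an on-screen name and keeps the strongest response, a common way to let the most relevant part dominate.

```python
from typing import List, Sequence

# Hypothetical inventory of on-screen content types. Only "Onscreen_Movie"
# appears in the article; the other two are invented for illustration.
SCREEN_TYPES = ["Onscreen_Movie", "Onscreen_Book", "Onscreen_Soundtrack"]


def screen_type_vector(on_screen: Sequence[str]) -> List[float]:
    """Multi-hot vector: 1.0 for every content type currently displayed."""
    shown = set(on_screen)
    return [1.0 if t in shown else 0.0 for t in SCREEN_TYPES]


def with_screen_context(
    token_embeddings: Sequence[Sequence[float]], on_screen: Sequence[str]
) -> List[List[float]]:
    """Variant 1 (sketch): append the screen-type vector to each word embedding."""
    ctx = screen_type_vector(on_screen)
    return [list(e) + ctx for e in token_embeddings]


def conv_max_pool(
    name_embeddings: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]
) -> float:
    """Variant 3 (loose sketch): apply one convolutional filter across the
    embedded tokens of an on-screen name and keep the largest response."""
    width = len(kernel)
    responses = []
    for start in range(len(name_embeddings) - width + 1):
        window = name_embeddings[start:start + width]
        responses.append(
            sum(w * x for k_row, vec in zip(kernel, window) for w, x in zip(k_row, vec))
        )
    return max(responses) if responses else 0.0  # name shorter than the filter


augmented = with_screen_context(
    [[0.1, 0.2], [0.3, 0.4]], ["Onscreen_Movie", "Onscreen_Book"]
)
pooled = conv_max_pool(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], kernel=[[1.0, 0.0], [0.0, 1.0]]
)
```

The second variant described above would additionally embed the on-screen names themselves (for example "Harry Potter") and feed those embeddings in alongside the type vector.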
To evaluate the three networks’ performance, the researchers established a benchmark that used hard-coded rules to factor in on-screen data. Given a command like “Play Harry Potter,” it might estimate a 50 percent probability that the request refers to the audiobook and a 10 percent probability that it refers to the soundtrack.
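A hard-coded baseline of this kind might look something like the Python sketch below. The article does not describe the actual rules, so everything here is assumed for illustration: the 50 percent and 10 percent figures echo the example above, the 40 percent for the movie is filled in so the priors sum to one, and the boost rule and its weight are hypothetical.

```python
from typing import Dict, Iterable

# Illustrative priors for "Play Harry Potter". The 0.5 and 0.1 echo the
# article's example; the 0.4 for the movie is an assumption.
PRIORS: Dict[str, float] = {"audiobook": 0.5, "movie": 0.4, "soundtrack": 0.1}
ON_SCREEN_BOOST = 2.0  # hypothetical hand-tuned rule weight


def rule_based_scores(on_screen: Iterable[str]) -> Dict[str, float]:
    """Boost every candidate that is visible on screen, then renormalize."""
    shown = set(on_screen)
    raw = {
        kind: prior * (ON_SCREEN_BOOST if kind in shown else 1.0)
        for kind, prior in PRIORS.items()
    }
    total = sum(raw.values())
    return {kind: score / total for kind, score in raw.items()}


voice_only = rule_based_scores([])
movie_shown = rule_based_scores(["movie"])
best = max(movie_shown, key=movie_shown.get)
```

With nothing on screen the priors pass through unchanged; with a movie displayed, the movie overtakes the audiobook. Every such rule has to be written and tuned by hand, which is the maintenance cost the learned models are meant to avoid.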
In the end, when evaluated with four different data sets (slots with and without screen information and intents with and without screen information), all three of the AI models that considered on-screen data “consistently outperform[ed]” both the benchmark and a voice-only test set. More importantly, they didn’t exhibit degraded accuracy when trained exclusively on speech inputs.
“[We] verified that the contextual awareness of our models does not cause a degradation of non-contextual functionality,” Naik and team wrote. “Our approach is naturally extensible to new visual use cases, without requiring manual rule writing.”
In future research, they hope to explore additional context cues and extend visual features to encode screen object locations for multiple object types displayed on-screen (for example, books and movies).
Source: VentureBeat