Amazon scientist explains how Alexa resolves ambiguous requests

Nitin Naresh September 28, 2018

During a blockbuster press event last week, Amazon took the wraps off a redesigned Echo Show, Echo Plus, and Echo Spot, along with nine other new voice-activated accessories, peripherals, and smart speakers powered by Alexa. Also in tow: the Alexa Presentation Language, which lets developers build “multimodal” Alexa apps, called skills, that combine voice, touch, text, images, graphics, audio, and video in a single interface.
Developing the frameworks that underlie it was easier said than done, according to Amazon senior speech scientist Vishal Naik. In a blog post today, he explained how Alexa leverages multiple neural networks (layered math functions that loosely mimic the human brain’s physiology) to resolve ambiguous requests. The work is also detailed in a paper (“Context Aware Conversational Understanding for Intelligent Agents with a Screen”) that was presented earlier this year at the Association for the Advancement of Artificial Intelligence.
“If a customer says, ‘Alexa, play Harry Potter,’ the Echo Show screen could display separate graphics representing a Harry Potter audiobook, a movie, and a soundtrack,” he explained. “If the customer follows up by saying ‘the last one,’ the system must determine whether that means the last item in the on-screen list, the last Harry Potter movie, or something else.”
Naik and colleagues evaluated three bidirectional long short-term memory (BiLSTM) neural networks, a category of recurrent neural network capable of learning long-term dependencies, each with a slightly different architecture. (Basically, the memory cells in LSTMs allow the neural networks to combine their memory and inputs to improve their prediction accuracy, and because the networks are bidirectional, they can access context from both past and future directions.)
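To make the “bidirectional” part concrete, here is a minimal sketch in Python. It is not Amazon’s model: it swaps the LSTM cell for a plain one-unit tanh recurrent cell, and the weights and input values are made up for illustration. What it does show is how each position ends up with one state that has seen everything to its left and another that has seen everything to its right.

```python
import math

def run_direction(inputs, w_in, w_rec):
    """Run a one-unit tanh recurrent cell over inputs; return the state after each step."""
    state = 0.0
    states = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional_states(inputs, w_in=0.5, w_rec=0.8):
    """Pair each position's left-to-right state with its right-to-left state."""
    forward = run_direction(inputs, w_in, w_rec)
    # Run over the reversed sequence, then flip back so indices line up.
    backward = run_direction(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))

# Toy scalar "embeddings" for a three-token utterance (assumed values).
utterance = [1.0, -0.5, 2.0]
for position, (fwd, bwd) in enumerate(bidirectional_states(utterance)):
    print(position, round(fwd, 3), round(bwd, 3))
```

Changing only the last input leaves the forward state at position 0 untouched but alters its backward state, which is the sense in which a bidirectional network can draw on “future” context.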
Sourcing data from the Alexa Meaning Representation Language, an annotated semantic-representation language released in June of this year, the team jointly trained the AI models to classify commands by both intent, which designates the action a customer wants Alexa to take, and slot, which designates the entities (e.g., an audiobook, movie, or smart home device trigger) the intent acts on. They fed the models embeddings, or mathematical representations of words.
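The intent/slot distinction is easier to see on a labeled example. The sketch below is an assumption-laden illustration rather than the Alexa annotation format: the intent name `PlayContent`, the slot type `Title`, and the B-/I-/O tagging scheme are all hypothetical choices, the last being a common convention for slot labeling.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    tokens: list      # the words of the utterance
    intent: str       # one label for the whole utterance
    slot_tags: list   # one B-/I-/O tag per token

def extract_slots(annotation):
    """Group B-/I- tagged tokens into (slot_type, text) pairs."""
    slots = []
    current_type, current_tokens = None, []
    for token, tag in zip(annotation.tokens, annotation.slot_tags):
        # An I- tag of the same type continues the open slot.
        if tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
            continue
        # Anything else closes the open slot, if there is one.
        if current_type is not None:
            slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        # A B- tag opens a new slot.
        if tag.startswith("B-"):
            current_type, current_tokens = tag[2:], [token]
    if current_type is not None:
        slots.append((current_type, " ".join(current_tokens)))
    return slots

example = Annotation(
    tokens=["play", "harry", "potter"],
    intent="PlayContent",                   # hypothetical intent name
    slot_tags=["O", "B-Title", "I-Title"],  # hypothetical slot type
)
print(example.intent, extract_slots(example))
```

A jointly trained model predicts the single intent label and the per-token slot tags from the same underlying representation, rather than using two separate classifiers.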
The first of the three neural networks considered both the aforementioned embeddings and the type of content that would be displayed on Alexa devices with screens (in the form of a vector) in its classifications. The second went a step further, taking into account not just the type of on-screen data but also the specific names of the on-screen items (e.g., “Harry Potter” or “The Black Panther” in addition to “Onscreen_Movie”). The third, meanwhile, used convolutional filters to identify each name’s contribution toward the final classification’s accuracy, and based its predictions on the most relevant of the bunch.
To evaluate the three networks’ performance, the researchers established a benchmark that used hard-coded rules to factor in on-screen data. Given a command like “Play Harry Potter,” it might estimate a 50 percent probability that the command refers to the audiobook and a 10 percent probability that it refers to the soundtrack.
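As a rough sketch of the first network’s input, the screen context can be encoded as a multi-hot vector over content types and appended to each word embedding. Only `Onscreen_Movie` comes from the article; the other type names, the embedding values, and the choice of plain concatenation are assumptions made for illustration.

```python
# Only "Onscreen_Movie" appears in the article; the other two names are hypothetical.
SCREEN_TYPES = ["Onscreen_Movie", "Onscreen_Book", "Onscreen_Soundtrack"]

def screen_type_vector(types_on_screen):
    """Multi-hot vector: 1.0 for each content type currently shown, else 0.0."""
    return [1.0 if t in types_on_screen else 0.0 for t in SCREEN_TYPES]

def token_features(word_embedding, types_on_screen):
    """Append the screen-type vector to a word embedding (assumed combination)."""
    return list(word_embedding) + screen_type_vector(types_on_screen)

# A made-up two-dimensional embedding for one word, with a movie on screen.
features = token_features([0.2, -0.1], {"Onscreen_Movie"})
print(features)
```

The second and third networks would additionally need a representation of the on-screen names themselves, which this sketch does not attempt.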
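The article does not describe the benchmark’s rules, so the following is only a guess at what a hard-coded approach might look like: start from voice-only scores and boost any interpretation whose content type is on screen. The 0.5 and 0.1 scores echo the article’s example; the 0.4 for the movie, the type names, and the boost factor are assumptions.

```python
def rerank_with_screen_rules(candidates, types_on_screen, boost=2.0):
    """Boost candidates whose type is on screen, then renormalize to sum to 1.

    candidates maps a name to (content_type, score); scores must be positive.
    """
    adjusted = {
        name: score * (boost if content_type in types_on_screen else 1.0)
        for name, (content_type, score) in candidates.items()
    }
    total = sum(adjusted.values())
    return {name: score / total for name, score in adjusted.items()}

# Voice-only scores for "Play Harry Potter" (0.4 for the movie is assumed).
candidates = {
    "audiobook": ("Onscreen_Book", 0.5),
    "movie": ("Onscreen_Movie", 0.4),
    "soundtrack": ("Onscreen_Soundtrack", 0.1),
}
reranked = rerank_with_screen_rules(candidates, {"Onscreen_Movie"})
print(max(reranked, key=reranked.get))
```

Every new screen layout or content type would need another rule of this kind, which is the manual effort the learned models are meant to avoid.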
In the end, when evaluated with four different data sets (slots with and without screen information and intents with and without screen information), all three of the AI models that considered on-screen data “consistently outperform[ed]” both the benchmark and a voice-only test set. More importantly, they didn’t exhibit degraded accuracy when trained exclusively on speech inputs.
“[We] verified that the contextual awareness of our models does not cause a degradation of non-contextual functionality,” Naik and team wrote. “Our approach is naturally extensible to new visual use cases, without requiring manual rule writing.”
In future research, they hope to explore additional context cues and extend visual features to encode screen object locations for multiple object types displayed on-screen (for example, books and movies).
Source: VentureBeat