Amazon scientist explains how Alexa’s whisper mode works

Nitin Naresh September 26, 2018

Last week during an event in Seattle, Amazon unveiled a host of features heading to new and existing smart speakers powered by its Alexa voice platform. One of them was “whisper mode,” which enables Alexa to respond to whispered speech by whispering back. In a blog post published today, Zeynab Raeesy, a speech scientist in Amazon’s Alexa Speech group, revealed the feature’s artificial intelligence (AI) underpinnings.
Much of the work is detailed in a paper (“LSTM-based Whisper Detection”) that will be presented at the IEEE Workshop on Spoken Language Technology in December.
“If you’re in a room where a child has just fallen asleep, and someone else walks in, you might start speaking in a whisper, to indicate that you’re trying to keep the room quiet. The other person will probably start whispering, too,” Raeesy wrote. “We would like Alexa to react to conversational cues in just such a natural, intuitive way.”
What makes whispered speech difficult to interpret, Raeesy explained, is the fact that it’s predominantly unvoiced — that is to say, it doesn’t involve the vibration of the vocal cords. It also tends to have less energy in lower frequency bands than ordinary speech.
She and colleagues investigated the use of two different neural networks — layers of mathematical functions loosely modeled after the human brain’s neurons — to distinguish between normal and whispered words.
The two neural networks differed architecturally: one was a multilayer perceptron (MLP) and the other was a long short-term memory (LSTM) network, which processes inputs in sequential order. Both were trained on the same data, which consisted of (1) log filter-bank energies, or representations of speech signals that record the signal energies in different frequency ranges, and (2) a set of features that “exploit[ed] the signal differences between whispered and normal speech.”
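To make "log filter-bank energies" concrete, here is a minimal sketch of the idea: take one short frame of audio, compute its power spectrum, sum the power inside a few frequency bands, and take the logarithm. It is an illustration only, not the Alexa front end. The function names, the four equal-width rectangular bands, the 64-sample frame, and the energy floor are all assumptions made for the demo; production speech front ends typically use overlapping, mel-spaced triangular filters instead.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at DFT bins 0..N/2, via a naive O(N^2) DFT (fine for a demo)."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        powers.append(abs(coeff) ** 2 / n)
    return powers


def log_band_energies(frame, n_bands=4, floor=1e-10):
    """Log of the summed spectral power in n_bands equal-width bands.

    Assumption: rectangular, equal-width bands keep the sketch short.
    The floor avoids log(0) for bands that contain no energy.
    """
    powers = power_spectrum(frame)
    # Integer band edges that cover every bin exactly once.
    edges = [i * len(powers) // n_bands for i in range(n_bands + 1)]
    return [math.log(max(sum(powers[lo:hi]), floor))
            for lo, hi in zip(edges, edges[1:])]


def tone(cycles, n=64):
    """Hypothetical test signal: a sine with `cycles` periods per frame."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


low = log_band_energies(tone(4))    # energy concentrated in the lowest band
high = log_band_energies(tone(28))  # energy concentrated in the highest band
```

A low-pitched tone puts its largest value in the first band and a high-pitched tone in the last, which is the kind of per-band contrast a whisper detector can learn from, given that whispered speech carries less energy in the lower bands.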
In testing, they found the LSTM generally performed better than the MLP, conferring a number of advantages. As Raeesy explained, other components of Alexa’s speech recognition engine rely entirely on log filter-bank energies, and sourcing the same input data for different components makes the entire system more compact.
It wasn’t all smooth sailing, though — at least initially. Because Alexa recognizes the end of a command or reply by a short period of silence (a technique known as “end-pointing”), the LSTM’s confidence tended to fall off toward the tail end of utterances. To solve the problem, the researchers averaged the LSTM’s outputs for the entire utterance; in the end, dropping the last 1.25 seconds of speech data was “crucial” to maintaining performance.
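The fix described above can be sketched as follows: average the per-frame scores over the utterance, but leave out the frames that fall in the final 1.25 seconds. This is a sketch under stated assumptions rather than Amazon's implementation. The function name, the 10 ms frame shift, and the fallback for utterances shorter than the trimmed tail are hypothetical; only the averaging and the 1.25-second figure come from the article.

```python
def utterance_whisper_score(frame_scores, frame_shift_s=0.010, trim_tail_s=1.25):
    """Average per-frame whisper probabilities, ignoring the utterance tail.

    frame_scores  -- one whisper probability per frame, in time order
    frame_shift_s -- time between frames (assumed 10 ms here)
    trim_tail_s   -- how much trailing audio to ignore (1.25 s in the article)
    """
    if not frame_scores:
        raise ValueError("need at least one frame score")
    n_trim = int(round(trim_tail_s / frame_shift_s))
    if n_trim < len(frame_scores):
        kept = frame_scores[:len(frame_scores) - n_trim]
    else:
        # Assumption: if the utterance is shorter than the tail, keep it all.
        kept = frame_scores
    return sum(kept) / len(kept)


# 2 s of confident whisper frames followed by 1.25 s of low-confidence tail.
scores = [0.9] * 200 + [0.1] * 125
trimmed = utterance_whisper_score(scores)
untrimmed = utterance_whisper_score(scores, trim_tail_s=0.0)
```

In this toy input the trimmed average stays at the level of the confident frames, while the untrimmed average is pulled down by the trailing low-confidence frames.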
Whisper mode will be available in U.S. English in October.
Source: VentureBeat