Google researchers use AI to pick out voices in a crowd

Nitin Naresh October 12, 2018

Separating a single person’s voice from a noisy crowd is something most people do subconsciously — it’s called the cocktail party effect. Smart speakers like Google Home and Amazon’s Echo typically have a tougher time, but thanks to artificial intelligence (AI), they might one day be able to filter out voices as well as any human.
Researchers at Google and the Idiap Research Institute in Switzerland describe a novel solution in a new paper (“VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking”) published on the preprint server arXiv.org. They trained two separate neural networks, a speaker recognition network and a spectrogram masking network, that together “significantly” reduced the speech recognition word error rate (WER) on multispeaker signals.
Their work builds on a paper out of MIT’s Computer Science and Artificial Intelligence Lab earlier this year, which described a system — PixelPlayer — that learned to isolate the sounds of individual instruments from YouTube videos. And it calls to mind an AI system created by researchers at the University of Surrey in 2015, which output vocal spectrograms when fed songs as input.
“[We address] the task of isolating the voices of a subset of speakers of interest from the commonality of all the other speakers and noises,” the researchers wrote. “For example, such subset can be formed by a single target speaker issuing a spoken query to a personal mobile device, or the members of a house talking to a shared home device.”
The researchers’ two-part system, dubbed VoiceFilter, consisted of a long short-term memory (LSTM) model, a type of recurrent neural network that carries a memory of earlier inputs forward to inform later predictions, and a convolutional neural network (with one LSTM layer). The first took preprocessed voice samples as input and output speaker embeddings (i.e., vector representations of a speaker’s voice), while the second predicted a soft mask, or filter, from the embeddings and a magnitude spectrogram computed from noisy audio. The mask was applied to the noisy magnitude spectrogram to produce an enhanced magnitude spectrogram, which, when combined with the phase of the noisy audio and transformed back to the time domain, yielded an enhanced waveform.
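The masking step described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the toy transform below uses non-overlapping frames rather than a real short-time Fourier transform, the mask is supplied by hand instead of being predicted by a network, and the function names and frame length are assumptions made for the example.

```python
import numpy as np

def frame_stft(wave, frame_len=256):
    """Toy transform: cut the waveform into non-overlapping frames, FFT each one."""
    n_frames = len(wave) // frame_len
    frames = wave[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)  # complex, shape (n_frames, frame_len // 2 + 1)

def frame_istft(spec, frame_len=256):
    """Inverse of frame_stft: inverse FFT per frame, then concatenate the frames."""
    return np.fft.irfft(spec, n=frame_len, axis=1).reshape(-1)

def apply_soft_mask(noisy_wave, mask, frame_len=256):
    """Scale the noisy magnitude by the mask, reuse the noisy phase, and invert."""
    spec = frame_stft(noisy_wave, frame_len)
    magnitude = np.abs(spec)      # magnitude spectrogram of the noisy audio
    phase = np.angle(spec)        # phase of the noisy audio, kept unchanged
    enhanced_magnitude = mask * magnitude
    enhanced_spec = enhanced_magnitude * np.exp(1j * phase)
    return frame_istft(enhanced_spec, frame_len)

# Hypothetical stand-ins: random noise as "audio" and two hand-made masks.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(1024)
all_pass = np.ones((4, 129))    # keeps every time-frequency bin
all_block = np.zeros((4, 129))  # removes every time-frequency bin
restored = apply_soft_mask(noisy, all_pass)
silenced = apply_soft_mask(noisy, all_block)
```

A mask of all ones returns the noisy audio unchanged and a mask of all zeros returns silence; a learned soft mask would fall between those extremes, bin by bin.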
The AI system was taught to minimize the difference between the masked magnitude spectrogram and the target magnitude spectrogram computed from clean audio.
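As a rough illustration of that training objective, the snippet below scores a mask by comparing the masked noisy magnitudes with the clean magnitudes. The article only says the system minimized "the difference", so the mean squared error used here is an assumption, as are the tiny hand-written spectrograms and the simplification that magnitudes of overlapping sounds add.

```python
import numpy as np

def masking_loss(mask, noisy_mag, clean_mag):
    """Mean squared difference between masked and clean magnitudes (assumed metric)."""
    masked = mask * noisy_mag
    return float(np.mean((masked - clean_mag) ** 2))

# Hypothetical 2x2 magnitude spectrograms (time frames x frequency bins).
clean_mag = np.array([[1.0, 0.0], [0.5, 2.0]])
interference_mag = np.array([[0.0, 1.0], [0.5, 0.0]])
noisy_mag = clean_mag + interference_mag  # simplification: magnitudes add

# The mask that would recover the clean magnitudes exactly in this toy case.
ideal_mask = np.divide(
    clean_mag, noisy_mag, out=np.zeros_like(clean_mag), where=noisy_mag > 0
)
all_pass = np.ones_like(noisy_mag)

ideal_loss = masking_loss(ideal_mask, noisy_mag, clean_mag)
all_pass_loss = masking_loss(all_pass, noisy_mag, clean_mag)
```

In this toy case the ideal mask drives the loss to zero, while a mask that lets everything through is penalized for the interference it leaves in.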
To train the speaker recognition network, the team sourced two datasets: (1) roughly 34 million anonymized voice query logs in English from 138,000 speakers, and (2) a compilation of the open source speech libraries LibriSpeech, VoxCeleb, and VoxCeleb2. The VoiceFilter network was trained on speech samples from 2,338 contributors to the CSTR VCTK dataset (a corpus of speech data maintained by the University of Edinburgh) and LibriSpeech, and was evaluated with utterances from 73 speakers. (Each training example consisted of three inputs: clean audio as ground truth, noisy audio containing multiple speakers, and reference audio from the target speaker.)
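One plausible way to assemble such a three-part training example is sketched below. The article does not say how the noisy audio was produced, so summing the clean utterance with an utterance from a different speaker is an assumption, and the function name and random stand-in waveforms are hypothetical.

```python
import numpy as np

def make_training_example(target_utts, interferer_utt, rng):
    """Build (clean, noisy, reference) from one target speaker and one interferer.

    target_utts: list of at least two waveforms from the target speaker.
    interferer_utt: one waveform from a different speaker.
    """
    # Use two different utterances so the reference is not the clean audio itself.
    clean_idx, ref_idx = rng.choice(len(target_utts), size=2, replace=False)
    reference = target_utts[ref_idx]
    n = min(len(target_utts[clean_idx]), len(interferer_utt))
    clean = target_utts[clean_idx][:n]
    noisy = clean + interferer_utt[:n]  # assumption: mix by simple addition
    return clean, noisy, reference

# Hypothetical stand-ins for real recordings.
rng = np.random.default_rng(0)
target_utts = [rng.standard_normal(800), rng.standard_normal(1000)]
interferer = rng.standard_normal(900)
clean, noisy, reference = make_training_example(target_utts, interferer, rng)
```

Keeping the reference separate from the clean utterance matters in a setup like this: the network should learn who the target speaker is, not what that speaker happens to say in the mixture.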
In tests, VoiceFilter achieved a reduction in word error rate from 55.9 percent to 23.4 percent in two-speaker scenarios.
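For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table; the example sentences are made up, and the helper for relative reduction is included only to put the reported figures in context.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(before, after):
    """Fraction by which an error rate dropped."""
    return (before - after) / before

# Made-up transcripts: one substituted word and one inserted word.
example_wer = word_error_rate(
    "turn on the kitchen lights",
    "turn on the chicken lights please",
)
reported_drop = relative_reduction(55.9, 23.4)
```

Here `example_wer` is 0.4 (two errors over five reference words), and the reported drop from 55.9 percent to 23.4 percent works out to a relative reduction of roughly 58 percent.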
“We have demonstrated the effectiveness of using a discriminatively-trained speaker encoder to condition the speech separation task,” the researchers wrote. “Such a system is more applicable to real scenarios because it does not require prior knowledge about the number of speakers … Our system purely relies on the audio signal and can easily generalize to unknown speakers by using a highly representative embedding vector for the speaker.”
Source: VentureBeat