Google researchers use AI to pick out voices in a crowd

Nitin Naresh October 12, 2018


Separating a single person’s voice from a noisy crowd is something most people do subconsciously — it’s called the cocktail party effect. Smart speakers like Google Home and Amazon’s Echo typically have a tougher time, but thanks to artificial intelligence (AI), they might one day be able to filter out voices as well as any human.
Researchers at Google and the Idiap Research Institute in Switzerland describe a novel solution in a new paper (“VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking”) published on the preprint server Arxiv.org. They trained two separate neural networks, a speaker recognition network and a spectrogram masking network, that together “significantly” reduced the speech recognition word error rate (WER) on multispeaker signals.
Their work builds on a paper out of MIT’s Computer Science and Artificial Intelligence Lab earlier this year, which described a system — PixelPlayer — that learned to isolate the sounds of individual instruments from YouTube videos. And it calls to mind an AI system created by researchers at the University of Surrey in 2015, which output vocal spectrograms when fed songs as input.
“[We address] the task of isolating the voices of a subset of speakers of interest from the commonality of all the other speakers and noises,” the researchers wrote. “For example, such subset can be formed by a single target speaker issuing a spoken query to a personal mobile device, or the members of a house talking to a shared home device.”
The researchers’ two-part system, dubbed VoiceFilter, consisted of a long short-term memory (LSTM) model (a type of recurrent neural network that carries a memory of earlier inputs forward to inform its predictions) and a convolutional neural network (with one LSTM layer). The first took preprocessed voice samples as input and output speaker embeddings (i.e., representations of a speaker’s voice in vector form), while the second predicted a soft mask, or filter, from the embeddings and a magnitude spectrogram computed from noisy audio. The mask was used to generate an enhanced magnitude spectrogram, which, when combined with the phase of the noisy audio and transformed back to the time domain, produced an enhanced waveform.
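The masking step described above can be sketched in a few lines. This is a minimal illustration, not the researchers' code: the function name `apply_soft_mask`, the toy values, and the use of nested lists of complex numbers to stand in for a spectrogram are all assumptions made for the example, and the final transform back to a waveform is left out.

```python
import cmath

def apply_soft_mask(noisy_stft, mask):
    """Scale each bin's magnitude by the mask value and keep the noisy phase.

    noisy_stft: rows of complex time-frequency values (hypothetical toy input).
    mask: rows of floats between 0 and 1, same shape as noisy_stft.
    """
    enhanced = []
    for stft_row, mask_row in zip(noisy_stft, mask):
        out_row = []
        for value, weight in zip(stft_row, mask_row):
            magnitude = abs(value)        # entry of the magnitude spectrogram
            phase = cmath.phase(value)    # phase taken from the noisy audio
            # Enhanced magnitude recombined with the unchanged noisy phase.
            out_row.append(cmath.rect(weight * magnitude, phase))
        enhanced.append(out_row)
    return enhanced

# Toy 2x2 "spectrogram": a mask of 1.0 keeps a bin, 0.0 silences it.
noisy_stft = [[3 + 4j, 1 - 1j], [0 + 2j, -2 + 0j]]
mask = [[1.0, 0.0], [0.5, 0.25]]
enhanced = apply_soft_mask(noisy_stft, mask)
```

Because the mask is "soft", it can take any value between 0 and 1, so a bin shared by two voices can be attenuated rather than kept or discarded outright.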
The AI system was taught to minimize the difference between the masked magnitude spectrogram and the target magnitude spectrogram computed from clean audio.
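As a rough illustration of that training objective, the sketch below scores a masked magnitude spectrogram against a clean target. The article does not say how the difference is measured, so the mean squared error used here is an assumption, as are the function name `spectrogram_loss` and the toy values.

```python
def spectrogram_loss(masked, target):
    """Mean squared difference between two equally shaped magnitude spectrograms.

    Mean squared error is an assumed choice for illustration only.
    """
    total = 0.0
    count = 0
    for masked_row, target_row in zip(masked, target):
        for m, t in zip(masked_row, target_row):
            total += (m - t) ** 2
            count += 1
    return total / count

# Toy values: only the last bin differs (4.0 vs 2.0).
masked = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 2.0], [3.0, 2.0]]
loss = spectrogram_loss(masked, target)
```

Training would adjust the masking network so that this number shrinks across many examples.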
The team sourced two datasets for training samples: (1) roughly 34 million anonymized voice query logs in English from 138,000 speakers, and (2) a compilation of the open source speech libraries LibriSpeech, VoxCeleb, and VoxCeleb2. The VoiceFilter network was trained on speech samples from 2,338 contributors to LibriSpeech and to the CSTR VCTK dataset, a corpus of speech data maintained by the University of Edinburgh, and was evaluated with utterances from 73 speakers. (Each training example consisted of three inputs: clean audio as ground truth, noisy audio containing multiple speakers, and reference audio from the target speaker.)
In tests, VoiceFilter achieved a reduction in word error rate from 55.9 percent to 23.4 percent in two-speaker scenarios.
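For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance table; the function name `word_error_rate` and the sample sentences are made up for the example and do not come from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)

# One deleted word ("the") and one substituted word ("lights" -> "light").
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")

# Relative improvement implied by the reported figures (55.9% down to 23.4%).
relative_reduction = (55.9 - 23.4) / 55.9
```

On the reported figures, the drop from 55.9 percent to 23.4 percent works out to a relative reduction of roughly 58 percent.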
“We have demonstrated the effectiveness of using a discriminatively-trained speaker encoder to condition the speech separation task,” the researchers wrote. “Such a system is more applicable to real scenarios because it does not require prior knowledge about the number of speakers … Our system purely relies on the audio signal and can easily generalize to unknown speakers by using a highly representative embedding vector for the speaker.”
Source: VentureBeat
