Google researchers use AI to pick out voices in a crowd

Nitin Naresh October 12, 2018

Separating a single person’s voice from a noisy crowd is something most people do subconsciously — it’s called the cocktail party effect. Smart speakers like Google Home and Amazon’s Echo typically have a tougher time, but thanks to artificial intelligence (AI), they might one day be able to filter out voices as well as any human.
Researchers at Google and the Idiap Research Institute in Switzerland describe a novel solution in a new paper (“VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking”) published on the preprint server Arxiv.org. They trained two separate neural networks, a speaker recognition network and a spectrogram masking network, that together “significantly” reduced the speech recognition word error rate (WER) on multispeaker signals.
Their work builds on a paper out of MIT’s Computer Science and Artificial Intelligence Lab earlier this year, which described a system — PixelPlayer — that learned to isolate the sounds of individual instruments from YouTube videos. And it calls to mind an AI system created by researchers at the University of Surrey in 2015, which output vocal spectrograms when fed songs as input.
“[We address] the task of isolating the voices of a subset of speakers of interest from the commonality of all the other speakers and noises,” the researchers wrote. “For example, such subset can be formed by a single target speaker issuing a spoken query to a personal mobile device, or the members of a house talking to a shared home device.”
The researchers’ two-part system, dubbed VoiceFilter, consisted of a long short-term memory (LSTM) model, a type of recurrent neural network that retains information about earlier inputs while it processes a sequence, and a convolutional neural network (with one LSTM layer). The first took preprocessed voice samples as input and output speaker embeddings (i.e., representations of a speaker’s voice in vector form), while the latter predicted a soft mask, or filter, from the embeddings and a magnitude spectrogram computed from noisy audio. The mask was used to generate an enhanced magnitude spectrogram, which, when combined with the phase of the noisy audio and transformed back to the time domain, produced an enhanced waveform.
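The masking step can be illustrated with a few lines of NumPy. This is a minimal sketch, not the researchers’ code: the function name `apply_soft_mask` is hypothetical, the spectrogram is random data rather than real audio, and the mask is a random stand-in for what the masking network would predict. The final inverse transform back to a waveform is left out.

```python
import numpy as np


def apply_soft_mask(noisy_stft, soft_mask):
    """Scale the noisy magnitude by a mask in [0, 1] and reuse the noisy phase."""
    magnitude = np.abs(noisy_stft)    # magnitude spectrogram of the noisy audio
    phase = np.angle(noisy_stft)      # phase of the noisy audio, kept unchanged
    enhanced_magnitude = soft_mask * magnitude
    # Recombine the enhanced magnitude with the original phase.
    return enhanced_magnitude * np.exp(1j * phase)


rng = np.random.default_rng(0)
# Toy complex spectrogram: 4 frequency bins by 6 time frames (assumed sizes).
noisy_stft = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
# Stand-in for the network's prediction: one value in [0, 1] per bin and frame.
soft_mask = rng.uniform(0.0, 1.0, size=(4, 6))

enhanced_stft = apply_soft_mask(noisy_stft, soft_mask)
```

A mask value near 1 keeps a time-frequency bin largely intact, while a value near 0 suppresses it; an inverse short-time Fourier transform would then turn `enhanced_stft` into audio.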
The AI system was taught to minimize the difference between the masked magnitude spectrogram and the target magnitude spectrogram computed from clean audio.
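As a rough illustration of that training objective, the sketch below scores a mask by how closely the masked spectrogram matches the clean one. Mean squared error is an assumption made here for simplicity, since the article does not say which distance measure was used; the function name and the toy numbers are likewise hypothetical.

```python
import numpy as np


def masked_spectrogram_loss(soft_mask, noisy_magnitude, clean_magnitude):
    """Mean squared difference between the masked and the clean magnitudes."""
    masked = soft_mask * noisy_magnitude
    return float(np.mean((masked - clean_magnitude) ** 2))


# Toy magnitudes: 2 frequency bins by 2 time frames.
clean = np.array([[1.0, 0.5], [0.2, 0.0]])
interference = np.array([[0.0, 0.5], [0.6, 1.0]])
# Simplification for illustration: treat the magnitudes as additive.
noisy = clean + interference

# The mask that would recover the clean magnitudes exactly in this toy case.
ideal_mask = np.divide(clean, noisy, out=np.zeros_like(clean), where=noisy > 0)

loss_ideal = masked_spectrogram_loss(ideal_mask, noisy, clean)
loss_no_filter = masked_spectrogram_loss(np.ones_like(clean), noisy, clean)
```

Training would adjust the network so that its predicted mask drives this value down, moving from something like `loss_no_filter` toward `loss_ideal`.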
The team sourced two datasets for training samples: (1) roughly 34 million anonymized voice query logs in English from 138,000 speakers, and (2) a compilation of the open source speech libraries LibriSpeech, VoxCeleb, and VoxCeleb2. The VoiceFilter network was trained on speech samples from 2,338 contributors to the CSTR VCTK dataset (a corpus of speech data maintained by the University of Edinburgh) and LibriSpeech, and was evaluated with utterances from 73 speakers. (The training data consisted of three inputs: clean audio as ground truth, noisy audio containing multiple speakers, and reference audio from the target speaker.)
In tests, VoiceFilter achieved a reduction in word error rate from 55.9 percent to 23.4 percent in two-speaker scenarios.
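For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. The sketch below is a generic implementation of that definition, not the evaluation code used in the paper, and the example sentences are made up. The last line works out the relative improvement implied by the two figures reported above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one dropped word, one wrong word, one extra word.
wer = word_error_rate("turn on the kitchen lights",
                      "turn on kitchen light please")

# Relative reduction implied by the reported figures (55.9 down to 23.4).
relative_reduction = (55.9 - 23.4) / 55.9
```

By that arithmetic, the reported drop amounts to a relative reduction of roughly 58 percent in word errors.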
“We have demonstrated the effectiveness of using a discriminatively-trained speaker encoder to condition the speech separation task,” the researchers wrote. “Such a system is more applicable to real scenarios because it does not require prior knowledge about the number of speakers … Our system purely relies on the audio signal and can easily generalize to unknown speakers by using a highly representative embedding vector for the speaker.”
Source: VentureBeat