MIT CSAIL is using unsupervised learning for language translations

Nitin Naresh October 30, 2018


Machine learning has paved the way for faster and more accurate language translation than ever before, but it’s no Babel fish. Cutting-edge systems from Google, Amazon, Microsoft, and others require artificially intelligent (AI) models to ingest millions of documents that have been translated by hand, which they use to find matching words and phrases in the target language. But that’s not a viable approach for the thousands of dialects that lack large corpora.
That’s why researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) took a different tack. In a paper that’ll be presented this week at the Conference on Empirical Methods in Natural Language Processing, they describe an unsupervised model (that is, one that learns from data that hasn’t been explicitly labeled or categorized) that can translate between texts in two languages without direct translational data between the two.
It follows Facebook’s forays into unsupervised machine learning translation. In August, Facebook AI Research (FAIR) — collaborating with the firm’s Applied Machine Learning division — devised a model that uses a combination of word-for-word translations, language models, and back translations to outperform systems for language pairings.
“[Our] model sees the words in the two languages as sets of vectors, and maps [those vectors] from one set to the other by essentially preserving relationships,” Tommi Jaakkola, a CSAIL researcher and the paper’s coauthor, told MIT News. “The approach could help translate low-resource languages or dialects, so long as they come with enough monolingual content.”
Core to the approach is what’s called the Gromov-Wasserstein distance, a statistical metric that measures the distances between points within one computational space and matches them to similarly spaced points in another. Here, it’s applied to word embeddings (mathematical representations of words as vectors) in which words of similar meaning cluster together. In the end, the model aligns the vectors in the two embeddings whose relative distances are most closely correlated, a sign they’re likely to be direct translations.
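For reference, the Gromov-Wasserstein discrepancy is commonly written as shown below. The notation is chosen here for illustration and is not taken from the article: C and C' hold the pairwise distances within the source and target embeddings, p and q are weights over the words in each language, and Γ is a soft matching between the two vocabularies. The squared difference is one common choice of loss, assumed here.

```latex
\mathrm{GW}(C, C', p, q)
  = \min_{\Gamma \in \Pi(p, q)}
    \sum_{i,k} \sum_{j,l}
    \bigl( C_{ik} - C'_{jl} \bigr)^{2} \, \Gamma_{ij} \, \Gamma_{kl},
\qquad
\Pi(p, q) = \bigl\{ \Gamma \ge 0 \;:\; \Gamma \mathbf{1} = p,\ \Gamma^{\top} \mathbf{1} = q \bigr\}
```

Each term penalizes matching word i to word j and word k to word l when the distance between i and k in one language differs from the distance between j and l in the other.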
The researchers’ system — which was trained and tested on FASTTEXT, a dataset of publicly available word embeddings with 110 language pairs — assigns a probability that similarly distanced vectors in one language’s word embeddings will correspond with similar clusters in the second language. And it quantifies the similarity between languages with a numerical value, calculating the distance of vectors from one another in two embeddings.
The closer the vectors, the closer the score is to zero. Romance languages like French and Spanish tend toward 1, while Chinese falls between 6 and 9 when paired with other major languages.
Aligning word embeddings isn’t an entirely novel method, the researchers concede, but the system’s use of relational distances makes it more efficient than prior implementations, requiring a fraction of the computing power and little or no tuning.
For example, “the model doesn’t know [there are months in a year],” said David Alvarez-Melis, a CSAIL doctoral student and first author of the paper. “It just knows there is a cluster of 12 points that align with a cluster of 12 points in the other language, but they’re different to the rest of the words, so they probably go together well. By finding these correspondences for each word, it then aligns the whole space simultaneously.”
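The idea of matching points purely by their relative distances can be illustrated with a toy sketch. This is not the researchers’ method: it swaps their optimal-transport solver for a brute-force search over one-to-one matchings, and the five 2-D points, the rotation, and the function names are all made up for illustration.

```python
import itertools
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def gw_cost(D1, D2, perm):
    """Squared mismatch between distances in space 1 and the distances
    between the matched points in space 2 (perm[i] is the match of i)."""
    perm = np.asarray(perm)
    return float(((D1 - D2[np.ix_(perm, perm)]) ** 2).sum())

def best_matching(X, Y):
    """Try every one-to-one matching and keep the cheapest one.
    Only feasible for a handful of points (n! candidates)."""
    D1, D2 = pairwise_distances(X), pairwise_distances(Y)
    n = len(X)
    best = min(itertools.permutations(range(n)),
               key=lambda p: gw_cost(D1, D2, p))
    return list(best), gw_cost(D1, D2, best)

# Hypothetical "source language" points (stand-ins for word vectors).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.5, 3.5]])

# Hypothetical "target language": the same shape, rotated and shuffled,
# so coordinates differ but relative distances are preserved.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
shuffle = [3, 0, 4, 1, 2]          # Y[j] is the rotated copy of X[shuffle[j]]
Y = (X @ R.T)[shuffle]

matching, cost = best_matching(X, Y)
```

Here `matching[i]` gives the index in `Y` paired with point `i` of `X`, and `cost` plays the role of the distance score: it is near zero when the two point clouds have the same internal geometry and grows as they diverge. Real vocabularies are far too large for exhaustive search, which is why a soft, probabilistic matching is used instead.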
It’s not the only recent innovation in the machine translation space. In October, Baidu developed an AI system capable of simultaneous translation between two languages. And in June, Google brought offline neural machine translation in 59 languages to Google Translate on iOS and Android.
Source: VentureBeat