OpenAI and DeepMind AI system achieves ‘superhuman’ performance in Pong and Enduro

Nitin Naresh November 16, 2018

Machines learning to play games by watching humans might sound like the plot of a science fiction novel, but that’s exactly what researchers at OpenAI — a nonprofit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel, among other tech luminaries — and Google subsidiary DeepMind claim to have accomplished.
In a paper published on the preprint server arXiv.org (“Reward learning from human preferences and demonstrations in Atari”), they describe an AI system that combines two approaches to learning from human feedback: expert demonstrations and trajectory preferences. Their deep neural network (which, like other neural networks, consists of mathematical functions loosely modeled on neurons in the brain) achieved superhuman performance on two of the nine Atari games tested, Pong and Enduro, and beat baseline models in seven.
The research was submitted to the Conference on Neural Information Processing Systems (NIPS 2018), which is scheduled to take place in Montreal, Canada, during the first week of December.
“To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions,” the team wrote. “Instead, we can have humans communicate an objective to the agent directly.”
It’s a technique that’s been referred to in prior research as “inverse reinforcement learning,” and it holds promise for tasks involving poorly defined objectives that tend to trip up artificially intelligent (AI) systems. As the paper’s authors noted, reinforcement learning — which uses a system of rewards (or punishments) to drive AI agents to achieve specific goals — isn’t of much use if the goals in question lack feedback mechanisms.
Game-playing agents created by the researchers’ AI model didn’t merely mimic human behavior. If they had, they wouldn’t have been particularly scalable, because they would have required a human expert to teach them how to perform specific tasks and would never have been able to achieve “significantly” better performance than said experts.
The researchers’ system combined several forms of feedback, including imitation learning from expert demonstrations and a reward model trained on trajectory preferences. It didn’t assume a directly available reward, such as an increase in score or an in-game bonus. Instead, relying on feedback from a human in the loop, it tried to approximate the intended behavior as closely as possible by (1) imitating it from demonstrations and (2) maximizing the inferred reward function.
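To make the idea of learning a reward from trajectory preferences more concrete, here is a minimal sketch in Python. It assumes a logistic (Bradley-Terry style) preference model, a common choice in preference-based reward learning; the article does not spell out the exact formula the researchers used, and the function names here are hypothetical. Each trajectory segment is represented simply as a list of per-step rewards predicted by the reward model.

```python
import math

def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Assumption: a logistic model on the difference of summed predicted rewards.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))

def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss for one annotator comparison.

    label is 1.0 if the annotator preferred segment A, 0.0 if segment B.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))

# The loss is small when the predicted rewards agree with the annotator,
# and large when they disagree.
agree = preference_loss([2.0], [0.0], 1.0)
disagree = preference_loss([0.0], [2.0], 1.0)
```

Minimizing this loss over many labeled pairs pushes the reward model to assign higher total reward to the segments the annotator preferred.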
The model consisted of two parts: a deep Q-Learning network, which DeepMind tapped in prior research to achieve superhuman performance in Atari 2600 games, and a reward model, a convolutional neural network trained on labels supplied by an annotator — either a human or a synthetic system — during task training.
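As a rough illustration of how the two parts fit together, the sketch below shows a single Q-learning update in which the reward comes from a learned reward function rather than from the game score. This is a tabular simplification, not the deep network described in the paper, and `learned_reward`, the state names, and the action names are hypothetical stand-ins.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

def learned_reward(state, action):
    """Hypothetical stand-in for the reward model's prediction."""
    return 1.0 if action == "right" else 0.0

q = defaultdict(float)           # all Q-values start at 0.0
actions = ["left", "right"]
new_value = q_update(q, "s0", "right", learned_reward("s0", "right"), "s1", actions)
```

The point of the design is that the agent never sees the game's own score; it optimizes whatever the reward model currently predicts.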
Agents learned over time both from the demonstrations and from experience. All the while, human experts prevented them from exploiting unexpected sources of reward that could harm performance, a phenomenon known as reward hacking.
In testing, the researchers set agents from the AI model on the Arcade Learning Environment, an open source framework for designing AI agents that can play Atari 2600 games. Atari games, the researchers wrote, have the advantage of being “among the most diverse environments” for reinforcement learning and provide “well-specified” reward functions.
After 50 million steps and a full schedule of 6,800 labels, the agents trained with the researchers’ system outperformed imitation learning baselines in every game tested except Private Eye, including Beamrider, Breakout, Enduro, Pong, Q*bert, and Seaquest. Human demonstrations benefited Hero, Montezuma’s Revenge, and Private Eye greatly, the researchers found, and typically halved the amount of human time required to achieve the same level of performance.
The research follows on the heels of an AI system — also the work of OpenAI scientists — that can best humans at Montezuma’s Revenge. (Most of that model’s performance improvements came from random network distillation, which introduced a bonus reward that’s based on predicting the output of a fixed and randomly initialized neural network on the next state.) When set loose on Super Mario, agents trained by the system discovered 11 levels, found secret rooms, and defeated bosses. And when tasked with volleying a ball in Pong with a human player, they tried to prolong the game rather than win.
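The random network distillation bonus mentioned above can be sketched in a few lines of Python. This is an illustrative toy, assuming small linear maps in place of neural networks and a made-up three-number state; all names are hypothetical. A fixed, randomly initialized target is never trained, a predictor is trained to match it, and the prediction error serves as the bonus.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def linear_forward(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def novelty_bonus(target, predictor, state):
    """Squared error between the predictor and the fixed target on this state."""
    t = linear_forward(target, state)
    p = linear_forward(predictor, state)
    return sum((pi - ti) ** 2 for pi, ti in zip(p, t))

def train_predictor(target, predictor, state, lr=0.05):
    """One gradient step on the squared error; only the predictor changes."""
    t = linear_forward(target, state)
    p = linear_forward(predictor, state)
    for i, row in enumerate(predictor):
        err = p[i] - t[i]
        for j in range(len(row)):
            row[j] -= lr * 2.0 * err * state[j]

target = make_linear(3, 4, random.Random(0))     # fixed, never trained
predictor = make_linear(3, 4, random.Random(1))  # trained to imitate the target
state = [1.0, 0.5, -0.5]

bonus_before = novelty_bonus(target, predictor, state)
for _ in range(50):
    train_predictor(target, predictor, state)
bonus_after = novelty_bonus(target, predictor, state)
```

States the agent has visited often yield a small bonus because the predictor has already learned them, while unfamiliar states yield a large one, which encourages exploration.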
It also comes after news in June of an OpenAI-developed bot that can defeat skilled teams in Valve’s Dota 2.
Source: VentureBeat