OpenAI made a system that’s better at Montezuma’s Revenge than humans

Nitin Naresh November 1, 2018

Artificial intelligence (AI) can generate synthetic scans of brain cancer, simultaneously translate between languages, and teach robots to manipulate objects with humanlike dexterity. And as new research from OpenAI reveals, it’s pretty darn good at playing video games, too.
On Tuesday, OpenAI (a nonprofit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel, among other tech luminaries) described in a research paper an AI system that can best humans at the retro platformer Montezuma’s Revenge. The top-performing iteration found 22 of the 24 rooms in the first level, and occasionally discovered all 24.
It follows news in June of an OpenAI-developed bot that can defeat skilled teams in Valve’s Dota 2.
As OpenAI noted in an accompanying blog post, Montezuma’s Revenge is notoriously difficult for machine learning algorithms to master. It was the only Atari 2600 title to foil Google subsidiary DeepMind’s headline-grabbing Deep Q-Learning network in 2015, which scored 0 percent of the average human score (4.7K).
“Simple exploration strategies are highly unlikely to gather any rewards, or see more than a few of the 24 rooms in the level,” OpenAI wrote. “Since then, advances in Montezuma’s Revenge have been seen by many as synonymous with advances in exploration.”
OpenAI calls its method Random Network Distillation (RND), and said it’s designed to be applied to any reinforcement learning algorithm — i.e., models that use systems of rewards and punishments to drive AI agents in the direction of specific goals.
Traditionally, agents learn a next-state predictor model from their experiences and use the prediction error as an intrinsic reward. Unlike prior methods, RND introduces a bonus reward that’s based on predicting the output of a fixed and randomly initialized neural network on the next state.
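The idea of a bonus based on predicting a fixed random network can be sketched in a few lines. This is a minimal toy illustration, not OpenAI’s implementation: the class name `RNDBonus`, the one-hot observations, the learning rate, and the use of linear maps in place of neural networks are all assumptions made for brevity.

```python
import numpy as np


class RNDBonus:
    """Toy novelty bonus: error in predicting a fixed random projection.

    Hypothetical sketch. Linear maps stand in for the neural networks
    described in the article.
    """

    def __init__(self, obs_dim, feat_dim=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Target: randomly initialized and never trained.
        self.target_w = rng.normal(size=(feat_dim, obs_dim))
        # Predictor: trained to imitate the target on observed states.
        self.pred_w = np.zeros((feat_dim, obs_dim))
        self.lr = lr

    def _error(self, obs):
        return self.pred_w @ obs - self.target_w @ obs

    def bonus(self, obs):
        # Intrinsic reward: mean squared prediction error on this observation.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # One gradient step on the mean squared error with respect to pred_w.
        err = self._error(obs)
        grad = (2.0 / err.size) * np.outer(err, obs)
        self.pred_w -= self.lr * grad


familiar = np.array([1.0, 0.0, 0.0, 0.0])  # a state the agent keeps visiting
novel = np.array([0.0, 0.0, 0.0, 1.0])     # a state it has never seen

rnd = RNDBonus(obs_dim=4)
bonus_before = rnd.bonus(familiar)
novel_before = rnd.bonus(novel)
for _ in range(50):
    rnd.update(familiar)
bonus_after = rnd.bonus(familiar)
novel_after = rnd.bonus(novel)

print(f"familiar state bonus: {bonus_before:.4f} -> {bonus_after:.8f}")
print(f"novel state bonus:    {novel_before:.4f} -> {novel_after:.4f}")
```

In this sketch the bonus for the repeatedly visited state shrinks toward zero while the bonus for the unseen state stays where it started, which is the property that pushes an agent toward unexplored parts of a game.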
In the course of a run, the agents began by playing Montezuma’s Revenge completely randomly and improved their strategy through trial and error. Thanks to the RND component, they were incentivized to explore areas of the game map they might not have otherwise, managing to achieve the game’s objective even when it wasn’t explicitly communicated.
“Curiosity drives the agent to discover new rooms and find ways of increasing the in-game score, and this extrinsic reward drives it to revisit those rooms later in the training,” OpenAI explained. “Curiosity gives us an easier way to teach agents to interact with any environment, rather than via an extensively engineered task-specific reward function that we hope corresponds to solving a task. An agent using a generic reward function not specific to the particulars of an environment can acquire a basic level of competency in a wide range of environments, resulting in the agent’s ability to determine what useful behaviors are even in the absence of carefully engineered rewards.”

Above: The AI agents are driven by curiosity.

Image Credit: OpenAI

RND addressed another common issue in reinforcement learning schemes: the so-called noisy TV problem, in which an AI agent can become stuck looking for patterns in random data (like static on a TV).
“Like a gambler at a slot machine attracted to chance outcomes, the agent sometimes gets trapped by its curiosity,” OpenAI wrote. “The agent finds a source of randomness in the environment and keeps observing it, always experiencing a high intrinsic reward for such transitions.”
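A small simulation can make the noisy TV problem concrete. The sketch below is a toy under stated assumptions, not OpenAI’s experiment: it invents a three-state environment in which a “TV room” leads to one of two channels at random, and it uses linear models in place of neural networks. All variable names are hypothetical. It compares the error of a next-state predictor with the error of an RND-style predictor that imitates a fixed random projection of the next observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: from the TV room, the next observation is one of
# two channels chosen by a coin flip, so the transition is inherently random.
tv_room = np.array([1.0, 0.0, 0.0])
channels = [np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]

forward_w = np.zeros((3, 3))          # next-state predictor (current -> next)
target_w = rng.normal(size=(4, 3))    # RND target: fixed random projection
pred_w = np.zeros((4, 3))             # RND predictor, trained to match target
lr = 0.1

forward_errors, rnd_errors = [], []
for step in range(2000):
    next_obs = channels[rng.integers(2)]

    # Curiosity as next-state prediction error: the coin flip cannot be
    # predicted from the current observation, so this error never vanishes.
    f_err = forward_w @ tv_room - next_obs
    # RND-style error: the target is a deterministic function of next_obs,
    # so the predictor can learn it exactly once each channel has been seen.
    r_err = pred_w @ next_obs - target_w @ next_obs

    forward_errors.append(float(np.mean(f_err ** 2)))
    rnd_errors.append(float(np.mean(r_err ** 2)))

    # Plain SGD steps on each squared error.
    forward_w -= lr * np.outer(f_err, tv_room)
    pred_w -= lr * np.outer(r_err, next_obs)

late_forward = float(np.mean(forward_errors[-200:]))
late_rnd = float(np.mean(rnd_errors[-200:]))
print(f"late next-state prediction error: {late_forward:.4f}")
print(f"late RND-style error:             {late_rnd:.2e}")
```

In this toy setting the next-state predictor keeps paying out a high intrinsic reward for watching the random channel, while the RND-style bonus decays once both channels are familiar.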
So how’d it perform? On average, OpenAI’s agents scored 10K over nine runs with a best mean return of 14.5K. A longer-running test yielded a run that achieved 17.5K, corresponding to passing the first level and finding all 24 rooms.
It wasn’t just Montezuma’s Revenge they mastered. When set loose on Super Mario, the agents discovered 11 levels, found secret rooms, and defeated bosses. They learned how to beat Breakout after a few hours of training. And when tasked with volleying a ball in Pong with a human player, they tried to prolong the game rather than win.
OpenAI has its fingers in a number of AI pies besides gaming.
Last year, it developed software that produces high-quality datasets for neural networks by randomizing the colors, lighting conditions, textures, and camera settings in simulated scenes. (Researchers used it to teach a mechanized arm to remove a can of Spam from a table of groceries.) More recently, in February, it released Hindsight Experience Replay (HER), an open source algorithm that effectively helps robots to learn from failure. And in July, it unveiled a system that directs robot hands in grasping and manipulating objects with state-of-the-art precision.
Source: VentureBeat