
OpenAI made a system that’s better at Montezuma’s Revenge than humans

Artificial intelligence (AI) can generate synthetic scans of brain cancer, simultaneously translate between languages, and teach robots to manipulate objects with humanlike dexterity. And as new research from OpenAI reveals, it’s pretty darn good at playing video games, too.
On Tuesday, OpenAI (a nonprofit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel, among other tech luminaries) detailed in a research paper an AI system that can best humans at the retro platformer Montezuma’s Revenge. The top-performing iteration found 22 of the 24 rooms in the first level, and occasionally discovered all 24.
It follows news in June of an OpenAI-developed bot that can defeat skilled teams in Valve’s Dota 2.
As OpenAI noted in an accompanying blog post, Montezuma’s Revenge is notoriously difficult for machine learning algorithms to master. It was the only Atari 2600 title to foil Google subsidiary DeepMind’s headline-grabbing Deep Q-Learning network in 2015, which scored 0 percent of the average human score (4.7K).
“Simple exploration strategies are highly unlikely to gather any rewards, or see more than a few of the 24 rooms in the level,” OpenAI wrote. “Since then, advances in Montezuma’s Revenge have been seen by many as synonymous with advances in exploration.”
OpenAI calls its method Random Network Distillation (RND), and said it’s designed to be applied to any reinforcement learning algorithm — i.e., models that use systems of rewards and punishments to drive AI agents in the direction of specific goals.
Traditionally, curiosity-driven agents learn a model that predicts the next state from their experience and use the error of that prediction as an intrinsic reward. Unlike those prior methods, RND bases its bonus reward on predicting the output of a fixed, randomly initialized neural network given the next state.
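To make that concrete, here is a rough Python sketch of such a bonus. It is illustrative only, not OpenAI’s code: the PyTorch framing, network sizes, and names are assumptions. A fixed, randomly initialized target network embeds each observation; a second predictor network is trained to match that embedding; and the prediction error serves as the exploration bonus.

```python
# Minimal sketch of an RND-style intrinsic reward (illustrative, not OpenAI's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, EMB_DIM = 128, 64  # hypothetical sizes for a flattened observation

def make_net():
    return nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, EMB_DIM))

target = make_net()                      # random weights, never trained
for p in target.parameters():
    p.requires_grad_(False)
predictor = make_net()                   # trained to imitate the target network
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(next_obs):
    """Exploration bonus: squared error between predictor and fixed target embeddings."""
    with torch.no_grad():
        y = target(next_obs)
    y_hat = predictor(next_obs)
    per_obs_error = F.mse_loss(y_hat, y, reduction="none").mean(dim=-1)
    # Train the predictor on the same batch, so familiar states stop paying a bonus.
    opt.zero_grad()
    per_obs_error.mean().backward()
    opt.step()
    return per_obs_error.detach()        # one bonus value per observation in the batch
```

Because the predictor is trained on every observation it processes, the bonus shrinks for familiar states and stays high for novel ones. The published method also normalizes observations and intrinsic rewards, which this sketch omits.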
The agents began each run playing Montezuma’s Revenge essentially at random, improving their strategy through trial and error. Thanks to the RND component, they were incentivized to explore areas of the game map they might not have otherwise, managing to achieve the game’s objective even when it wasn’t explicitly communicated.
“Curiosity drives the agent to discover new rooms and find ways of increasing the in-game score, and this extrinsic reward drives it to revisit those rooms later in the training,” OpenAI explained. “Curiosity gives us an easier way to teach agents to interact with any environment, rather than via an extensively engineered task-specific reward function that we hope corresponds to solving a task. An agent using a generic reward function not specific to the particulars of an environment can acquire a basic level of competency in a wide range of environments, resulting in the agent’s ability to determine what useful behaviors are even in the absence of carefully engineered rewards.”
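As a toy illustration of that interplay, the snippet below simply adds a weighted curiosity bonus to the game’s own score before handing the result to the learning algorithm. Everything here is an assumption for illustration: the coefficients, the Gym-style env and agent interfaces, and the reuse of the intrinsic_reward helper from the sketch above. OpenAI’s actual setup handles the two reward streams with separate value heads rather than a single weighted sum.

```python
# Illustrative only: mix the curiosity bonus with the in-game score.
# `env` and `agent` follow a hypothetical Gym-style interface, and `intrinsic_reward`
# is the helper from the previous sketch. The coefficients are made-up placeholders.
import torch

INTRINSIC_COEF = 1.0   # weight on the exploration (curiosity) bonus
EXTRINSIC_COEF = 2.0   # weight on the in-game score

def rollout_step(env, agent, obs):
    action = agent.act(obs)
    next_obs, score_reward, done, info = env.step(action)
    obs_tensor = torch.as_tensor(next_obs, dtype=torch.float32).unsqueeze(0)
    bonus = intrinsic_reward(obs_tensor).item()     # high for unfamiliar states
    combined = EXTRINSIC_COEF * score_reward + INTRINSIC_COEF * bonus
    return next_obs, combined, done
```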

RND addressed another common issue in reinforcement learning schemes: the so-called noisy TV problem, in which an AI agent can become stuck looking for patterns in random data (like static on a TV).
“Like a gambler at a slot machine attracted to chance outcomes, the agent sometimes gets trapped by its curiosity,” OpenAI wrote. “The agent finds a source of randomness in the environment and keeps observing it, always experiencing a high intrinsic reward for such transitions.”
RND is less susceptible to this trap because its prediction target is a fixed, deterministic function of each observation rather than the environment’s inherently unpredictable next frame, so the bonus reflects genuine novelty rather than noise.
So how’d it perform? OpenAI’s agents posted a mean return of 10K over nine runs, with a best mean return of 14.5K. A longer-running test yielded a run that achieved 17.5K, corresponding to passing the first level and finding all 24 rooms.
It wasn’t just Montezuma’s Revenge they mastered. When set loose on Super Mario, the agents discovered 11 levels, found secret rooms, and defeated bosses. They learned how to beat Breakout after a few hours of training. And when tasked with volleying a ball in Pong with a human player, they tried to prolong the game rather than win.
OpenAI has its fingers in a number of AI pies besides gaming.
Last year, it developed software that produces high-quality datasets for neural networks by randomizing the colors, lighting conditions, textures, and camera settings in simulated scenes. (Researchers used it to teach a mechanized arm to remove a can of Spam from a table of groceries.) More recently, in February, it released Hindsight Experience Replay (HER), an open source algorithm that effectively helps robots to learn from failure. And in July, it unveiled a system that directs robot hands in grasping and manipulating objects with state-of-the-art precision.
Source: VentureBeat