An AI Learned To Play Atari 6,000 Times Faster By Reading The Instructions

Despite impressive progress, today’s AI models are very inefficient learners, taking huge amounts of time and data to solve problems humans pick up almost instantaneously.

One of the most promising approaches to creating AI that can solve a diverse range of problems is reinforcement learning, which involves setting a goal and rewarding the AI for taking actions that work towards that goal.

Because this kind of learning relies largely on trial and error, these algorithms can spend the equivalent of several years blundering through video and board games until they hit on a winning formula.

Now a team from Carnegie Mellon University has found a way to help reinforcement learning algorithms learn much faster by combining them with a language model that can read instruction manuals.

Their approach, outlined in a pre-print published on arXiv, taught an AI to play a challenging Atari video game thousands of times faster than a state-of-the-art model developed by DeepMind.

“Our work is the first to demonstrate the possibility of a fully-automated reinforcement learning framework to benefit from an instruction manual for a widely studied game,” said Yue Wu, who led the research.

“We have been conducting experiments on other more complicated games like Minecraft, and have seen promising results. We believe our approach should apply to more complex problems.”

Atari video games have been a popular benchmark for studying reinforcement learning thanks to the controlled environment and the fact that the games have a scoring system, which can act as a reward for the algorithms.

The new approach works in several steps. First, the researchers trained a language model to extract and summarize key information from the game’s official instruction manual.

This information was then used to pose questions about the game to a pre-trained language model similar in size and capability to GPT-3.

In the game Pac-Man, for instance, one question might be, “Should you hit a ghost if you want to win the game?”, to which the answer is no.

The answers are used to create additional rewards for the reinforcement learning algorithm, beyond the game’s built-in scoring system.

The extra rewards are fed into a well-established reinforcement learning algorithm to help it learn the game faster.
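
To make the idea concrete, here is a minimal sketch of how yes/no answers could be turned into an extra reward and added to the game's own score. It is an illustration only, not the researchers' code: the `Advice` structure, the function names, the event strings, the "eat pellet" entry and the bonus of 1.0 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Advice:
    """One yes/no answer about an in-game interaction (hypothetical structure)."""
    interaction: str   # e.g. "hit ghost"
    recommended: bool  # True if the answer was "yes", False if it was "no"


def auxiliary_reward(event, advice_list, bonus=1.0):
    """Return +bonus for an encouraged interaction, -bonus for a discouraged one, else 0."""
    for advice in advice_list:
        if advice.interaction == event:
            return bonus if advice.recommended else -bonus
    return 0.0


def shaped_reward(game_reward, event, advice_list, bonus=1.0):
    """Add the manual-derived bonus or penalty to the game's built-in reward."""
    return game_reward + auxiliary_reward(event, advice_list, bonus)


# Illustrative advice: the ghost answer comes from the article's example,
# the pellet entry is invented for this sketch.
ADVICE = [Advice("hit ghost", False), Advice("eat pellet", True)]

print(shaped_reward(0.0, "hit ghost", ADVICE))
print(shaped_reward(10.0, "eat pellet", ADVICE))
```

In this sketch, hitting a ghost is penalized even when the game itself gives no immediate score change, which is the kind of early signal a trial-and-error learner would otherwise have to discover on its own.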

The researchers tested their approach on Skiing 6000, which is one of the hardest Atari games for AI to master.

The 2D game requires players to slalom down a hill, navigating in between poles and avoiding obstacles.

That might sound easy enough, but the leading AI had to run through 80 billion frames of the game to achieve performance comparable to a human’s.

In contrast, the new approach required just 13 million frames to get the hang of the game, although it was only able to achieve a score about half as good as the leading technique.
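
The headline figure follows from those two frame counts. The short calculation below takes the article's rounded numbers at face value; the variable names are only for illustration, and the ratio measures how many game frames were needed, not wall-clock training time.

```python
# Frame counts as reported in the article (rounded figures).
baseline_frames = 80_000_000_000  # leading AI: 80 billion frames
manual_frames = 13_000_000        # manual-reading approach: 13 million frames

# How many times fewer frames the new approach needed.
speedup = baseline_frames / manual_frames

print(f"About {speedup:,.0f} times fewer frames")
```

The result is a little over 6,000, which matches the figure in the headline.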

That means it’s not as good as even the average human, but it did considerably better than several other leading reinforcement learning approaches that couldn’t get the hang of the game at all.

The researchers say they have already begun testing their approach on more complex 3D games like Minecraft, with promising early results.

Reinforcement learning has long struggled to make the leap from video games, where the computer has access to a complete model of the world, to the messy uncertainty of physical reality.