Overview

The player controls a green stick man. Using a joystick and a firing button that activates a laser-like weapon, the player navigates a simple maze filled with robots, which fire lasers back at the player character. A player can be killed by being shot, by running into a robot or an exploding robot, by coming into contact with the electrified walls of the maze itself, or by being touched by the player’s nemesis, Evil Otto.

The function of Evil Otto, represented by a bouncing smiley face, is to quicken the pace of the game. Otto is unusual, with regard to games of the period, in that there is no way to kill him. Otto can pass through walls with impunity and is attracted to the player character. If robots remain in the maze, Otto moves slowly, about half as fast as the humanoid, but he speeds up to match the humanoid’s speed once all the robots are killed. Evil Otto moves at exactly the same speed as the player going left and right, but he can move faster than the player going up and down; thus, no matter how close Otto is, the player can escape as long as they avoid moving straight up or down.

The player advances by escaping from the maze through an opening in the far wall. Each robot destroyed is worth 50 points, and destroying every robot in the current maze before escaping earns a per-maze bonus of ten points per robot. The game has 65,536 rooms (a 256×256 grid), but due to limitations of the random number generation there are fewer than 1,024 maze layouts (876 unique). The game has only one controller, but two-player games can be played by alternating at the joystick. The game is most difficult when the player enters a new maze, as there is only a short interval between entering the maze and all the robots in range firing at the player. For a beginner, this often means several deaths in rapid succession, since each death means starting in a new maze layout.

As the player’s score increases, the colors of the enemy robots change, and the robots can have more bullets on the screen at the same time. Once they reach the limit of simultaneous onscreen bullets, they cannot fire again until one or more of their bullets detonates; the limit applies to the robots as a group, not to each robot individually.

Description from Wikipedia

Performances of RL Agents

We list various reinforcement learning algorithms that were tested in this environment. These results are from RL Database. If this page was helpful, please consider giving a star!

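The description above corresponds to the Atari game Berzerk. As a point of reference for the tables below, here is a minimal sketch of interacting with the environment through Gymnasium's ALE interface; the environment id `ALE/Berzerk-v5` and the Gymnasium API are assumptions about the reader's setup, not settings taken from the benchmarked papers.

```python
import gymnasium as gym

# Minimal sketch of a single random-policy episode. The environment id
# "ALE/Berzerk-v5" is an assumption (the current Gymnasium/ALE name for
# this game); the papers listed below used their own ALE wrappers and settings.
env = gym.make("ALE/Berzerk-v5")

obs, info = env.reset(seed=0)
episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()  # stand-in for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

print(f"Random-policy episode return: {episode_return}")
env.close()
```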

Human Starts

| Result | Algorithm | Source |
|--------|-----------|--------|
| 2237.5 | Human | Deep Reinforcement Learning with Double Q-learning |
| 2237.5 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 2178.6 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1793.4 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1433.4 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 1165.6 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 1011.1 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 1000.0 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 910.6 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 865.9 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 862.2 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 817.9 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 644.0 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 635.8 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 493.4 | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 196.1 | Random | Deep Reinforcement Learning with Double Q-learning |

No-op Starts

| Result | Algorithm | Source |
|--------|-----------|--------|
| 34798 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 3409.0 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 3117 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 2630.4 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 2545.6 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2303.1 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 1896 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 1852.7 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 1645 | C51 | A Distributional Perspective on Reinforcement Learning |
| 1641.4 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 1515.7 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 1472.6 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1421.8 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1235 | NoisyNet A3C | Noisy Networks for Exploration |
| 1225.4 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 1122 | DuDQN | Noisy Networks for Exploration |
| 1053 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 1022 | A3C | Noisy Networks for Exploration |
| 927.2 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 905 | NoisyNet DQN | Noisy Networks for Exploration |
| 888.3 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 647.8 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 634 | DQN | Noisy Networks for Exploration |
| 585.6 | DQN | A Distributional Perspective on Reinforcement Learning |
| 585.6 | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 123.7 | Random | Dueling Network Architectures for Deep Reinforcement Learning |
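
The no-op starts scores above follow the common ALE evaluation protocol in which each evaluation episode begins with a random number of no-op actions (conventionally up to 30) before the agent acts, so that episodes do not all start from an identical state. The sketch below illustrates that protocol under the same Gymnasium assumptions as the earlier example; the 30-no-op cap and the use of action 0 as the no-op are conventions, not values specified on this page.

```python
import random
import gymnasium as gym

def evaluate_noop_starts(env_id="ALE/Berzerk-v5", episodes=10, max_noops=30, policy=None):
    """Average episode return under the standard no-op-starts protocol:
    each episode begins with a random number of no-op (action 0) steps."""
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, info = env.reset()
        # Random no-op prefix so evaluation episodes start from varied states.
        for _ in range(random.randint(1, max_noops)):
            obs, reward, terminated, truncated, info = env.step(0)
            if terminated or truncated:
                obs, info = env.reset()
        episode_return, done = 0.0, False
        while not done:
            action = policy(obs) if policy else env.action_space.sample()
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```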

Normal Starts

| Result | Algorithm | Source |
|--------|-----------|--------|