Overview

Taking on the role of an explorer robbing Tutankhamun’s tomb, the player explores dozens of rooms while being chased by creatures such as asps, vultures, parrots, bats, dragons, and even curses, all of which kill the player on contact. The explorer can fight back by firing lasers at the creatures, but can only shoot to the left and right. The player is also given a single screen-clearing “flash bomb” per room or life. Finally, each room has warp zones that teleport the player around the room; enemies cannot use them.

To progress, the player collects keys to open locked doors throughout the rooms while searching for the large exit door. Optional treasures can be picked up for bonus points. Each room has a timer; when it reaches zero the explorer can no longer fire lasers, and once a room is cleared the remaining time is converted into bonus points.

Description from Wikipedia
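
For readers who want to try the game themselves, below is a minimal sketch of loading Tutankham through the Arcade Learning Environment in Gymnasium and running a random policy. The `ALE/Tutankham-v5` id and the `gym.register_envs` call assume recent `gymnasium` and `ale-py` releases; this snippet is illustrative and is not part of the benchmark setups below.

```python
# Minimal sketch: run a random policy in Tutankham via Gymnasium + ALE.
# Assumes `pip install gymnasium ale-py`; ids/registration may vary by version.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # required on Gymnasium >= 1.0; older versions auto-register

env = gym.make("ALE/Tutankham-v5")
obs, info = env.reset(seed=0)

total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # uniformly random action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(f"Random-policy episode return: {total_reward}")
env.close()
```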

Performances of RL Agents

We list the performance of various reinforcement learning algorithms tested in this environment. These results are from the RL Database. If this page was helpful, please consider giving it a star!


Human Starts
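
In the human starts protocol, introduced in Massively Parallel Methods for Deep Reinforcement Learning, each evaluation episode begins from a state sampled from human play, so agents cannot score well simply by memorizing a single deterministic trajectory.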

| Result | Algorithm | Source |
| --- | --- | --- |
| 156.3 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 144.2 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 138.3 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 126.9 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 124.3 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 118.45 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 108.6 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 96.5 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 92.2 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 63.6 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 56.9 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 48.0 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 33.6 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 32.4 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 26.1 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 12.7 | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts
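
In the no-op starts protocol, from Human-level control through deep reinforcement learning, each evaluation episode begins with a random number of up to 30 no-op actions before the agent takes control, injecting mild stochasticity into the otherwise deterministic emulator; a sketch of this protocol follows the table below.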

| Result | Algorithm | Source |
| --- | --- | --- |
| 314.3 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 312 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 297 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 293 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 292.11 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 280 | C51 | A Distributional Perspective on Reinforcement Learning |
| 280 | DuDQN | Noisy Networks for Exploration |
| 275.4 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 272.6 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 269 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 267.82 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 263.2 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 249.4 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 245.9 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 244.97 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 241.0 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 232 | NoisyNet DQN | Noisy Networks for Exploration |
| 218 | DQN | Noisy Networks for Exploration |
| 213 | A3C | Noisy Networks for Exploration |
| 211.4 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 211.4 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 190.6 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 186.7 | DQN | Human-level control through deep reinforcement learning |
| 167.6 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 167.6 | Human | Human-level control through deep reinforcement learning |
| 164 | NoisyNet A3C | Noisy Networks for Exploration |
| 114.3 | Linear | Human-level control through deep reinforcement learning |
| 105.22 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 98.2 | Contingency | Human-level control through deep reinforcement learning |
| 68.1 | DQN | A Distributional Perspective on Reinforcement Learning |
| 11.4 | Random | Human-level control through deep reinforcement learning |
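
As referenced above, here is a minimal sketch of the no-op starts evaluation loop, assuming the same Gymnasium/ale-py setup as the earlier snippet. The `agent_act` callable is a hypothetical stand-in for a trained policy, and `NOOP_ACTION = 0` follows the ALE convention that action 0 is NOOP.

```python
# Minimal sketch of no-op starts evaluation (up to 30 random no-ops per episode).
# `agent_act` is a hypothetical policy function mapping an observation to an action.
import random

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # required on Gymnasium >= 1.0; older versions auto-register

NOOP_ACTION = 0  # ALE convention: action 0 is NOOP
MAX_NOOPS = 30

def evaluate_noop_starts(agent_act, episodes=10):
    env = gym.make("ALE/Tutankham-v5")
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=ep)
        # Apply a random number of no-ops before the agent takes control.
        for _ in range(random.randint(1, MAX_NOOPS)):
            obs, reward, terminated, truncated, info = env.step(NOOP_ACTION)
            if terminated or truncated:
                obs, info = env.reset()
        total, terminated, truncated = 0.0, False, False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(agent_act(obs))
            total += reward
        returns.append(total)
    env.close()
    return returns
```

Reported scores are averages over many such episodes; a trivial baseline like `lambda obs: NOOP_ACTION` can be used to smoke-test the loop.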

Normal Starts
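
In the normal starts protocol, each evaluation episode begins from the emulator’s standard initial state, with no injected no-ops or human-play restarts.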

| Result | Algorithm | Source |
| --- | --- | --- |
| 280.8 | ACER | Proximal Policy Optimization Algorithms |
| 254.4 | PPO | Proximal Policy Optimization Algorithms |
| 206.8 | A2C | Proximal Policy Optimization Algorithms |