Overview

In the game, the player controls a scuba diver who must protect a treasure from an octopus at the top of the screen. The octopus tries to capture the treasure with its tentacles. Meanwhile, a great white shark tries to distract the diver by swimming back and forth toward the bottom of the screen.

The diver loses a life if he is captured by the shark or the octopus’s tentacles, or if the air meter runs out. The diver can refill his air meter by touching a long pole which extends from a boat that appears from time to time.

Description from Wikipedia

Performances of RL Agents

We list various reinforcement learning algorithms that were tested in this environment. These results are from RL Database.

Human Starts

Result | Algorithm | Source
13637.9 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning
12093.7 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning
11836.1 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay
11686.5 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning
11382.3 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
11185.1 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning
10497.6 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay
10476.1 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning
9238.5 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning
8960.3 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning
8738.5 | Prioritized DQN (rank) | Prioritized Experience Replay
7871.5 | DDQN | Deep Reinforcement Learning with Double Q-learning
6796.0 | Human | Massively Parallel Methods for Deep Reinforcement Learning
5614.0 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning
5439.9 | DQN | Massively Parallel Methods for Deep Reinforcement Learning
1747.8 | Random | Massively Parallel Methods for Deep Reinforcement Learning
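
Raw scores like these are often easier to compare once rescaled against the Human and Random rows. The following is a minimal sketch of that rescaling, assuming the common human-normalized convention (score - random) / (human - random); the function name and the choice of agents are illustrative and are not part of RL Database.

```python
# Baseline scores copied from the Human Starts table above.
HUMAN = 6796.0
RANDOM = 1747.8


def human_normalized(score, human=HUMAN, random=RANDOM):
    """Rescale a raw score so that 0.0 is the random agent and 1.0 is the human."""
    return (score - random) / (human - random)


# A couple of agents from the same table, chosen only for illustration.
agents = {"PDD DQN": 13637.9, "DQN": 5439.9}

for name, score in agents.items():
    print(f"{name}: {human_normalized(score):.1%} of human-level")
```

A value above 100% means the agent outscored the human baseline under this start condition. The baselines differ between the Human Starts and No-op Starts tables, so normalized values should only be compared within one table.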

No-op Starts

Result | Algorithm | Source
22682 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning
21890 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression
21537.2 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
17557 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression
15572.5 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning
13136.0 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning
12983.6 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
12636.5 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
12542 | C51 | A Distributional Perspective on Reinforcement Learning
12211 | NoisyNet DuDQN | Noisy Networks for Exploration
11971.1 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning
10616.0 | DDQN | A Distributional Perspective on Reinforcement Learning
9919 | DuDQN | Noisy Networks for Exploration
9907.2 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
9543.8 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
8798 | NoisyNet A3C | Noisy Networks for Exploration
8207.8 | DQN | A Distributional Perspective on Reinforcement Learning
8181 | NoisyNet DQN | Noisy Networks for Exploration
8179 | DQN | Noisy Networks for Exploration
8049.0 | Human | Dueling Network Architectures for Deep Reinforcement Learning
7257 | DQN | Human-level control through deep reinforcement learning
7168 | A3C | Noisy Networks for Exploration
6997.1 | DDQN | Deep Reinforcement Learning with Double Q-learning
6182.16 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning
6049.55 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
5719.3 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
4076.2 | Human | Human-level control through deep reinforcement learning
2500 | Linear | Human-level control through deep reinforcement learning
2292.3 | Random | Human-level control through deep reinforcement learning
2247 | Contingency | Human-level control through deep reinforcement learning

Normal Starts

Result | Algorithm | Source
8488.0 | ACER | Proximal Policy Optimization Algorithms
6254.9 | PPO | Proximal Policy Optimization Algorithms
5961.2 | A2C | Proximal Policy Optimization Algorithms