## Overview

Pong is a two-dimensional sports game that simulates table tennis. The player controls an in-game paddle by moving it vertically along the left or right side of the screen, and can compete against another player controlling a second paddle on the opposing side. Players use the paddles to hit a ball back and forth. The goal is for each player to reach eleven points before the opponent; a player earns a point when the opponent fails to return the ball.

*Description from Wikipedia*
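
Note that the Atari 2600 version used for these benchmarks is played to 21 points, so episode returns range from -21 to 21. As a minimal sketch of interacting with the environment (assuming the Gymnasium Atari packages, e.g. `pip install "gymnasium[atari]"` plus `ale-py`, and the `ALE/Pong-v5` environment ID), a random-agent rollout looks like this:

```python
import ale_py
import gymnasium as gym

gym.register_envs(ale_py)  # needed with recent ale-py/gymnasium versions

# Minimal random-agent rollout on Pong.
env = gym.make("ALE/Pong-v5")
obs, info = env.reset(seed=0)
episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()  # uniform random policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated
env.close()
# A random policy lands near the "Random" rows in the tables below.
print(f"Episode return: {episode_return}")
```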

## Performances of RL Agents

We list the results of various reinforcement learning algorithms tested in this environment. These results are from RL Database. If this page was helpful, please consider giving it a star!


### Human Starts

In the human starts protocol, each evaluation episode begins from a state sampled from human play, so agents cannot exploit a single deterministic starting point. A sketch of this protocol follows the table.

| Result | Algorithm | Source |
|--------|-----------|--------|
| 19.1 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 19.0 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 18.9 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 18.9 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 18.9 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 18.8 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 18.7 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 18.4 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 17.7 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 16.71 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 16.2 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 15.5 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 11.4 | A3C FF (1 day) | Asynchronous Methods for Deep Reinforcement Learning |
| 10.7 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 5.6 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| -18.0 | Random | Massively Parallel Methods for Deep Reinforcement Learning |
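
A rough sketch of the human starts idea, assuming a pre-recorded list `human_start_states` of ALE emulator states sampled from a human player's trajectory (collecting those states is out of scope here) and a `policy` callable standing in for a trained agent:

```python
import random

import ale_py
import gymnasium as gym

gym.register_envs(ale_py)

def evaluate_human_starts(policy, human_start_states, episodes=100):
    """Average return when episodes resume from sampled human start states."""
    env = gym.make("ALE/Pong-v5")
    returns = []
    for ep in range(episodes):
        env.reset(seed=ep)
        # Restore a randomly chosen start point in the underlying emulator.
        env.unwrapped.ale.restoreSystemState(random.choice(human_start_states))
        obs = env.unwrapped.ale.getScreenRGB()  # refresh obs after the restore
        done, episode_return = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```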

### No-op Starts

In the no-op starts protocol, each evaluation episode begins with a random number of no-op actions (up to 30) before the agent takes control, which adds variety to the start states. A sketch of this protocol follows the table.

| Result | Algorithm | Source |
|--------|-----------|--------|
| 21.0 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 21.0 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 21 | NoisyNet DQN | Noisy Networks for Exploration |
| 21 | DuDQN | Noisy Networks for Exploration |
| 21 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 21.0 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 21.0 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 21.0 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 20.98 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 20.9 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 20.9 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 20.9 | C51 | A Distributional Perspective on Reinforcement Learning |
| 20.9 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 20.9 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 20.8 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 20.7 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 20.7 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 20.6 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 20.4 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 20 | DQN | Noisy Networks for Exploration |
| 19.9 | A2C | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 19.5 | DQN | A Distributional Perspective on Reinforcement Learning |
| 18.9 | DQN | Human-level control through deep reinforcement learning |
| 18.3 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 14.6 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 12 | NoisyNet A3C | Noisy Networks for Exploration |
| 9.3 | Human | Human-level control through deep reinforcement learning |
| 8.58 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 7 | A3C | Noisy Networks for Exploration |
| -1.2 | TRPO | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| -17.4 | Contingency | Human-level control through deep reinforcement learning |
| -19 | Linear | Human-level control through deep reinforcement learning |
| -20.7 | Random | Human-level control through deep reinforcement learning |
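
The no-op protocol is straightforward to sketch: before handing control to the agent, step the emulator with a random number of no-op actions (action 0 in ALE). A minimal version, with `policy` again standing in for a trained agent:

```python
import random

import ale_py
import gymnasium as gym

gym.register_envs(ale_py)

def evaluate_noop_starts(policy, episodes=100, max_noops=30):
    """Average return under the standard up-to-30 no-op starts protocol."""
    env = gym.make("ALE/Pong-v5")
    returns = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=ep)
        for _ in range(random.randint(1, max_noops)):
            obs, reward, terminated, truncated, _ = env.step(0)  # NOOP
        done, episode_return = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```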

### Normal Starts

In the normal starts protocol, each evaluation episode begins from the environment's default initial state, as returned by a plain reset.

| Result | Algorithm | Source |
|--------|-----------|--------|
| 21 | DQN2013 Best | Playing Atari with Deep Reinforcement Learning |
| 21 | UCC-I | Trust Region Policy Optimization |
| 21.0 | DQN | RL Baselines Zoo b76641e |
| 20.9 | TRPO - single path | Trust Region Policy Optimization |
| 20.9 | TRPO - vine | Trust Region Policy Optimization |
| 20.7 | ACER | Proximal Policy Optimization Algorithms |
| 20.7 | PPO | Proximal Policy Optimization Algorithms |
| 20.667 | ACER | RL Baselines Zoo b76641e |
| 20.507 | PPO | RL Baselines Zoo b76641e |
| 20 | DQN2013 | Playing Atari with Deep Reinforcement Learning |
| 19.7 | A2C | Proximal Policy Optimization Algorithms |
| 19.224 | ACKTR | RL Baselines Zoo b76641e |
| 19 | HNeat Best | Playing Atari with Deep Reinforcement Learning |
| 18.973 | A2C | RL Baselines Zoo b76641e |
| 16.49 | DQN | OpenAI Baselines cbd21ef |
| 13.9 | PPO (MPI) | OpenAI Baselines cbd21ef |
| 13.68 | PPO | OpenAI Baselines cbd21ef |
| 12.1 | DRQN | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 9.56 | ACKTR | OpenAI Baselines cbd21ef |
| 3.11 | ACER | OpenAI Baselines cbd21ef |
| 2.82 | TRPO (MPI) | OpenAI Baselines cbd21ef |
| 1.0 | A2C | OpenAI Baselines cbd21ef |
| -3 | Human | Playing Atari with Deep Reinforcement Learning |
| -9.9 | DQN Ours | Deep Recurrent Q-Learning for Partially Observable MDPs |
| -16 | HNeat Pixel | Playing Atari with Deep Reinforcement Learning |
| -17 | Contingency | Playing Atari with Deep Reinforcement Learning |
| -19 | Sarsa | Playing Atari with Deep Reinforcement Learning |
| -20.4 | Random | Playing Atari with Deep Reinforcement Learning |
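
The baselines-style rows above were produced with the cited repositories' own training and evaluation scripts at the listed commits. As a rough modern equivalent (assuming stable-baselines3 and a hypothetical trained checkpoint `ppo_pong.zip`; the evaluation environment's preprocessing must match whatever was used in training), an evaluation could look like:

```python
import ale_py
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

gym.register_envs(ale_py)

# "ppo_pong.zip" is a placeholder for a checkpoint trained with matching
# observation preprocessing (frame stacking, grayscale, etc.).
model = PPO.load("ppo_pong.zip")
env = gym.make("ALE/Pong-v5")
mean_return, std_return = evaluate_policy(model, env, n_eval_episodes=10)
print(f"PPO on Pong: {mean_return:.2f} +/- {std_return:.2f}")
```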