Overview

Tennis offers singles matches for one or two players; one player is colored pink, the other blue. The game has two user-selectable speed levels. When serving and returning shots, the tennis players automatically swing forehand or backhand as the situation demands, and all shots automatically clear the net and land in bounds.

The first player to win one six-game set is declared the winner of the match (if the set ends in a 6-6 tie, the set restarts from 0-0). This differs from professional tennis, in which a player must win at least two out of three six-game sets.

Description from Wikipedia
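
For reference, the game can be loaded through the Gym Atari interface. The snippet below is a minimal sketch: the environment id `Tennis-v4` and the four-value `step()` return assume the classic Gym API, while newer ALE-based installs register the game as `ALE/Tennis-v5` with a five-value return.

```python
import gym

# Minimal sketch: run one episode with a random placeholder policy.
# Assumes the classic Gym Atari id "Tennis-v4"; newer ALE installs
# use "ALE/Tennis-v5" and a 5-tuple step() return instead.
env = gym.make("Tennis-v4")

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()  # random placeholder policy
    obs, reward, done, info = env.step(action)
    episode_return += reward

print("Episode return:", episode_return)
env.close()
```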

Performances of RL Agents

We list the scores reported for various reinforcement learning algorithms tested in this environment, grouped by evaluation protocol: human starts (episodes begin from states sampled from human play), no-op starts (episodes begin after up to 30 random no-op actions from reset), and normal starts (episodes begin from a standard environment reset). These results are from RL Database. If this page was helpful, please consider giving a star!


Human Starts

| Result | Algorithm | Source |
|--------|-----------|--------|
| 22.6 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 11.0 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 4.4 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -0.69 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -2.0 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| -2.2 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -2.3 | Prioritized DQN (rank) | Prioritized Experience Replay |
| -2.3 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -5.3 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| -6.3 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| -6.4 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| -6.7 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| -7.8 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| -10.2 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| -13.2 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -21.4 | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Algorithm | Source |
|--------|-----------|--------|
| 23.7 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 23.6 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 23.6 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 23.6 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 23.6 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 23.4 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 23.3 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 23.1 | C51 | A Distributional Perspective on Reinforcement Learning |
| 12.2 | DQN | A Distributional Perspective on Reinforcement Learning |
| 10.87 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 8 | DQN | Noisy Networks for Exploration |
| 5.1 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 5.1 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1.7 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 0.55 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 0.0 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0 | Contingency | Human-level control through deep reinforcement learning |
| 0 | NoisyNet DQN | Noisy Networks for Exploration |
| 0 | NoisyNet A3C | Noisy Networks for Exploration |
| 0 | DuDQN | Noisy Networks for Exploration |
| 0 | NoisyNet DuDQN | Noisy Networks for Exploration |
| -0.0 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.1 | Linear | Human-level control through deep reinforcement learning |
| -1.89 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| -2.5 | DQN | Human-level control through deep reinforcement learning |
| -6 | A3C | Noisy Networks for Exploration |
| -8.12 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| -8.3 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| -8.9 | Human | Human-level control through deep reinforcement learning |
| -23.8 | Random | Human-level control through deep reinforcement learning |
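
As a rough illustration of the no-op evaluation convention behind the table above (up to 30 no-op actions after reset before the agent takes control), here is a sketch assuming the classic Gym API, with a random placeholder policy standing in for a trained agent.

```python
import random
import gym

def evaluate_no_op_starts(env_id="Tennis-v4", episodes=10, max_no_ops=30):
    """Sketch of the 30 no-op evaluation protocol: each episode begins with a
    random number of no-op (action 0) steps before the agent acts."""
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        episode_return = 0.0
        # Take 1..max_no_ops no-op actions; rewards during this phase are
        # discarded, mirroring the standard no-op reset wrapper.
        for _ in range(random.randint(1, max_no_ops)):
            obs, _, done, _ = env.step(0)
            if done:
                obs = env.reset()
                done = False
        while not done:
            action = env.action_space.sample()  # replace with a trained policy
            obs, reward, done, _ = env.step(action)
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```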

Normal Starts

| Result | Algorithm | Source |
|--------|-----------|--------|
| -14.8 | PPO | Proximal Policy Optimization Algorithms |
| -17.6 | ACER | Proximal Policy Optimization Algorithms |
| -22.2 | A2C | Proximal Policy Optimization Algorithms |
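
For readers who want to reproduce a PPO-style baseline on this game today, the sketch below uses stable-baselines3 with standard Atari preprocessing. This is an assumption on my part, not the implementation behind the scores above, and the environment id `TennisNoFrameskip-v4` and the hyperparameters are likewise illustrative.

```python
# Illustrative only: stable-baselines3 PPO with standard Atari wrappers,
# not the implementation used to produce the scores listed above.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

env = make_atari_env("TennisNoFrameskip-v4", n_envs=8, seed=0)
env = VecFrameStack(env, n_stack=4)  # stack 4 frames, as in the Atari baselines

model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=10_000_000)
model.save("ppo_tennis")
```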