Overview

The gameplay of Ms. Pac-Man is very similar to that of the original Pac-Man. The player earns points by eating pellets and avoiding ghosts (contact with one causes Ms. Pac-Man to lose a life). Eating an energizer (or “power pellet”) causes the ghosts to turn blue, allowing them to be eaten for extra points. Bonus fruits can be eaten for increasing point values, twice per round. As the rounds increase, the speed increases, and energizers generally lessen the duration of the ghosts’ vulnerability, eventually stopping altogether.

Description from Wikipedia
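
For readers who want to try the environment themselves, Ms. Pac-Man is exposed through the Arcade Learning Environment. Below is a minimal random-agent sketch, assuming the `gymnasium` and `ale-py` packages are installed (the env id `ALE/MsPacman-v5` follows Gymnasium's ALE naming and may differ across versions):

```python
# Minimal random-agent rollout in Ms. Pac-Man.
# Assumes `pip install gymnasium ale-py`.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # registers the ALE/* env ids with Gymnasium

env = gym.make("ALE/MsPacman-v5")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder agent
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()
```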

Performance of RL Agents

We list the scores of various reinforcement learning algorithms tested in this environment. The results are taken from the RL Database. If this page was helpful, please consider giving it a star!


Human Starts

Under the human-starts condition, each evaluation episode begins from a state sampled from human play, so scores also reflect how well the agent generalizes beyond its own trajectories.

| Result | Algorithm | Source |
| --- | --- | --- |
| 15375.0 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 2570.2 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2250.6 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 2064.1 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1865.9 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 1824.6 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 1401.8 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 1263.05 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 1241.3 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 1007.8 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 964.7 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 850.7 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 763.5 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 653.7 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 594.4 | A3C FF (1 day) | Asynchronous Methods for Deep Reinforcement Learning |
| 197.8 | Random | Massively Parallel Methods for Deep Reinforcement Learning |
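
Reproducing human starts requires emulator states saved from human play. The sketch below is a hedged illustration using ALE's state cloning API; `human_states` is hypothetical data, a list of `ALEState` objects assumed to have been captured earlier with `env.unwrapped.ale.cloneState()` during human games:

```python
# Hedged sketch of the human-starts condition via ALE state restoring.
# `human_states` is hypothetical: saved ALEState objects from human play.
import gymnasium as gym
import ale_py
import numpy as np

gym.register_envs(ale_py)
env = gym.make("ALE/MsPacman-v5")

def reset_from_human_start(env, human_states, rng):
    env.reset()
    state = human_states[rng.integers(len(human_states))]
    env.unwrapped.ale.restoreState(state)  # jump to a sampled human-play state
    # One NOOP step (ALE action 0) to obtain an observation from the
    # restored state through the public step API.
    obs, _, terminated, truncated, info = env.step(0)
    return obs, info
```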

No-op Starts

Under the no-op-starts condition, each evaluation episode begins with a random number (up to 30) of no-op actions before the agent takes control.

| Result | Algorithm | Source |
| --- | --- | --- |
| 15693.4 | Human | Human-level control through deep reinforcement learning |
| 7342.32 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 6951.6 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 6501.71 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 6349 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 6283.5 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 5822 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 5821 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 5546 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 5380.4 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4416.9 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 3769.2 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3749.2 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 3650 | DuDQN | Noisy Networks for Exploration |
| 3415.05 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 3415 | C51 | A Distributional Perspective on Reinforcement Learning |
| 3401 | NoisyNet A3C | Noisy Networks for Exploration |
| 3327.3 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 3233.5 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 3210.0 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 3085.6 | DQN | A Distributional Perspective on Reinforcement Learning |
| 2724.3 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 2722 | NoisyNet DQN | Noisy Networks for Exploration |
| 2711.4 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 2674 | DQN | Noisy Networks for Exploration |
| 2436 | A3C | Noisy Networks for Exploration |
| 2311 | DQN | Human-level control through deep reinforcement learning |
| 1692 | Linear | Human-level control through deep reinforcement learning |
| 1227 | Contingency | Human-level control through deep reinforcement learning |
| 307.3 | Random | Human-level control through deep reinforcement learning |
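
No-op starts are straightforward to reproduce: each episode is prefixed with a random number of up to 30 no-op actions before the agent acts. A minimal sketch follows; note that Gymnasium's `AtariPreprocessing` wrapper implements the same idea via its `noop_max` argument.

```python
# Sketch of the no-op-starts condition: prefix each episode with a
# random number (1..30) of NOOP actions (ALE action 0).
import gymnasium as gym
import ale_py
import numpy as np

gym.register_envs(ale_py)

def reset_with_noops(env, rng, max_noops=30):
    obs, info = env.reset()
    for _ in range(rng.integers(1, max_noops + 1)):
        obs, _, terminated, truncated, info = env.step(0)  # NOOP
        if terminated or truncated:  # rare: episode ended during no-ops
            obs, info = env.reset()
    return obs, info

env = gym.make("ALE/MsPacman-v5")
obs, info = reset_with_noops(env, np.random.default_rng(0))
```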

Normal Starts

Under the normal-starts condition, episodes simply begin from the environment's initial state, as during training.

| Result | Algorithm | Source |
| --- | --- | --- |
| 3908.105 | ACER | RL Baselines Zoo b76641e |
| 2718.5 | ACER | Proximal Policy Optimization Algorithms |
| 2363 | DQN (ours) | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 2255.09 | PPO | RL Baselines Zoo b76641e |
| 2096.5 | PPO | Proximal Policy Optimization Algorithms |
| 2048 | DRQN | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 1824 | DQN (ours) | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 1781.818 | DQN | RL Baselines Zoo b76641e |
| 1739 | DRQN | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 1626.9 | A2C | Proximal Policy Optimization Algorithms |
| 1598.776 | ACKTR | RL Baselines Zoo b76641e |
| 1581.111 | A2C | RL Baselines Zoo b76641e |