Atari Amidar Environment

Overview

As in Pac-Man, the player is opposed by enemies who kill on contact. The enemies gradually increase in number as the player advances from one level to the next, and their speed also increases. On odd-numbered levels, the player controls an ape (in some versions labeled “Copier”), and must collect coconuts while avoiding headhunters (labeled “Police” and “Thief”). On even-numbered levels, the player controls a paint roller (labeled “Rustler”), and must paint over each spot of the board while avoiding pigs (labeled “Cattle” and “Thief”). Each level is followed by a short bonus stage.

Whenever a rectangular portion of the board is cleared (either by collecting all surrounding coconuts, or painting all surrounding edges), the rectangle is colored in. On even-numbered levels, bonus points are awarded for the cleared rectangle; on odd-numbered levels, the player instead collects points for each coconut eaten. When the player clears all four corners of the board, they are briefly empowered to kill the enemies by touching them (just as when Pac-Man uses a “power pill”). Enemies killed in this way fall to the bottom of the screen and revitalise themselves after a few moments.
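The clearing rule can be illustrated with a minimal sketch. It assumes a hypothetical board in which each rectangle is described by integer grid corners and each painted (or coconut-cleared) edge is stored as a unit segment; this representation is an illustration only, not the game's actual layout or the emulator's internal state.

```python
def perimeter_segments(left, top, right, bottom):
    """Return the unit-length segments surrounding a rectangle.

    Coordinates are integer grid points (a hypothetical layout chosen
    for illustration, not the real Amidar board).
    """
    segments = set()
    for x in range(left, right):
        segments.add(((x, top), (x + 1, top)))        # top edge
        segments.add(((x, bottom), (x + 1, bottom)))  # bottom edge
    for y in range(top, bottom):
        segments.add(((left, y), (left, y + 1)))      # left edge
        segments.add(((right, y), (right, y + 1)))    # right edge
    return segments


def is_rectangle_cleared(rect, painted):
    """A rectangle is cleared once every surrounding segment is painted."""
    return perimeter_segments(*rect) <= painted


rect = (0, 0, 2, 1)                              # a box 2 units wide, 1 tall
fully_painted = perimeter_segments(*rect)        # every surrounding edge painted
one_missing = set(sorted(fully_painted)[:-1])    # leave one segment unpainted
```

With these definitions, `is_rectangle_cleared(rect, fully_painted)` holds, while the same check against `one_missing` fails because one surrounding segment is still unpainted.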

The game controls consist of a joystick and a single button labeled “Jump,” which can be used up to three times, resetting after a level is cleared or the player loses a life. Pressing the jump button does not cause the player to jump, but causes all the enemies to jump, enabling the player to walk under them.
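The jump rule amounts to a small counter. The sketch below models only what the description states (three uses, reset on level clear or life loss); the class name `JumpCounter` and its methods are hypothetical and do not correspond to any emulator or library API.

```python
class JumpCounter:
    """Tracks remaining uses of the "Jump" button (hypothetical helper)."""

    MAX_JUMPS = 3  # the description allows up to three uses

    def __init__(self):
        self.remaining = self.MAX_JUMPS

    def press(self):
        """Return True if the enemies jump, False if no uses are left."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True

    def reset(self):
        """Call when a level is cleared or the player loses a life."""
        self.remaining = self.MAX_JUMPS
```

Pressing the button a fourth time before a reset has no effect in this model, and calling `reset()` restores all three uses.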

Extra lives are awarded at 50,000 points and for every further 80,000 points scored, up to 930,000; after that, no more extra lives are given.
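As a worked example of this schedule, the sketch below lists the score thresholds under one reading of the rule: the first extra life at 50,000, then one every 80,000 points, with 930,000 as the last threshold. That reading is an assumption; the function and constant names are hypothetical.

```python
FIRST_EXTRA_LIFE = 50_000   # first threshold, from the description
EXTRA_LIFE_STEP = 80_000    # spacing between later thresholds
LAST_EXTRA_LIFE = 930_000   # assumed to be the final threshold itself


def extra_life_thresholds():
    """Scores at which an extra life is awarded (assumed interpretation)."""
    return list(range(FIRST_EXTRA_LIFE, LAST_EXTRA_LIFE + 1, EXTRA_LIFE_STEP))


def extra_lives_earned(score):
    """Number of extra lives awarded by the time `score` is reached."""
    return sum(1 for threshold in extra_life_thresholds() if score >= threshold)
```

Since 930,000 = 50,000 + 11 × 80,000, this reading gives twelve thresholds in total, so a player can earn at most twelve extra lives.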

Description from Wikipedia

Performances of RL Agents

We list various reinforcement learning algorithms that were tested in this environment. These results are from RL Database.

Human Starts

Result	Algorithm	Source
1540.4	Human	Massively Parallel Methods for Deep Reinforcement Learning
283.9	A3C FF 1 day	Asynchronous Methods for Deep Reinforcement Learning
263.9	A3C FF	Asynchronous Methods for Deep Reinforcement Learning
238.4	PDD DQN	Dueling Network Architectures for Deep Reinforcement Learning
237.7	Distributional DQN	Rainbow: Combining Improvements in Deep Reinforcement Learning
218.4	Prioritized DDQN (prop, tuned)	Prioritized Experience Replay
202.8	Rainbow	Rainbow: Combining Improvements in Deep Reinforcement Learning
189.15	Gorila DQN	Massively Parallel Methods for Deep Reinforcement Learning
188.2	DDQN	Deep Reinforcement Learning with Double Q-learning
173.0	A3C LSTM	Asynchronous Methods for Deep Reinforcement Learning
172.7	DuDQN	Dueling Network Architectures for Deep Reinforcement Learning
169.1	DDQN (tuned)	Deep Reinforcement Learning with Double Q-learning
133.4	DQN	Massively Parallel Methods for Deep Reinforcement Learning
129.1	Prioritized DDQN (rank, tuned)	Prioritized Experience Replay
98.9	Prioritized DQN (rank)	Prioritized Experience Replay
11.8	Random	Massively Parallel Methods for Deep Reinforcement Learning

No-op Starts

Result	Algorithm	Source
5131.2	Rainbow	Rainbow: Combining Improvements in Deep Reinforcement Learning
3537	NoisyNet DuDQN	Noisy Networks for Exploration
2946	IQN	Implicit Quantile Networks for Distributional Reinforcement Learning
2726	QR-DQN-0	Distributional Reinforcement Learning with Quantile Regression
2354.5	DuDQN	Dueling Network Architectures for Deep Reinforcement Learning
2296.8	PDD DQN	Dueling Network Architectures for Deep Reinforcement Learning
2296	DuDQN	Noisy Networks for Exploration
1793.3	DDQN	A Distributional Perspective on Reinforcement Learning
1735	C51	A Distributional Perspective on Reinforcement Learning
1719.5	Human	Dueling Network Architectures for Deep Reinforcement Learning
1675.8	Human	Human-level control through deep reinforcement learning
1641	QR-DQN-1	Distributional Reinforcement Learning with Quantile Regression
1610	NoisyNet DQN	Noisy Networks for Exploration
1554.79	IMPALA (deep)	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
1546.8	Reactor ND	The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
1267.9	Distributional DQN	Rainbow: Combining Improvements in Deep Reinforcement Learning
1189.7	Gorila DQN	Massively Parallel Methods for Deep Reinforcement Learning
1059.4	ACKTR	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
1015.8	Reactor	The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
978	DQN	A Distributional Perspective on Reinforcement Learning
924	DQN	Noisy Networks for Exploration
904	A3C	Noisy Networks for Exploration
833	Reactor	The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
739.5	DQN	Human-level control through deep reinforcement learning
702.1	DDQN	Deep Reinforcement Learning with Double Q-learning
497.62	IMPALA (shallow)	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
491	NoisyNet A3C	Noisy Networks for Exploration
183.6	Contingency	Human-level control through deep reinforcement learning
136.82	IMPALA (deep, multitask)	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
103.4	Linear	Human-level control through deep reinforcement learning
5.8	Random	Human-level control through deep reinforcement learning
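Raw scores like these are often compared on a human-normalized scale, where random play maps to 0% and the human baseline to 100%. The sketch below applies that normalization to the Random, Human, and DQN rows attributed to "Human-level control through deep reinforcement learning" in the table above. The function name is hypothetical, and taking all three values from the same source is a deliberate choice, since baselines differ between papers.

```python
def human_normalized_score(agent_score, random_score, human_score):
    """Percentage of the random-to-human gap covered by the agent."""
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


# No-op start rows from "Human-level control through deep reinforcement learning"
NOOP_RANDOM = 5.8
NOOP_HUMAN = 1675.8
NOOP_DQN = 739.5

dqn_normalized = human_normalized_score(NOOP_DQN, NOOP_RANDOM, NOOP_HUMAN)
print(f"DQN human-normalized score: {dqn_normalized:.1f}%")
```

On these numbers DQN covers a little under half of the gap between random and human play, while the Rainbow entry at the top of the table would land well above 100% against the same baselines.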

Normal Starts

Result	Algorithm	Source
827.6	ACER	Proximal Policy Optimization Algorithms
674.6	PPO	Proximal Policy Optimization Algorithms
380.8	A2C	Proximal Policy Optimization Algorithms