Atari Q*Bert Environment

Overview

The game is played using a single, diagonally mounted four-way joystick. The player controls Q*bert, who starts each game at the top of a pyramid made of 28 cubes, and moves by hopping diagonally from cube to cube. Landing on a cube causes it to change color, and changing every cube to the target color allows the player to progress to the next stage.

At the beginning, jumping on every cube once is enough to advance. In later stages, each cube must be hit twice to reach the target color. Other times, cubes change color every time Q*bert lands on them, instead of remaining on the target color once they reach it. Both elements are then combined in subsequent stages. Jumping off the pyramid results in the character’s death.
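
The board geometry described above can be sketched in a few lines. This is only an illustrative model, not the emulator's internal representation: it assumes the 28 cubes form 7 rows (1 + 2 + ... + 7 = 28), and the names ROWS, CUBES, HOPS and hop are hypothetical. Floating discs are ignored, so any hop that leaves the pyramid is treated as a fall.

```python
# Assumption: 7 rows, since 1 + 2 + ... + 7 = 28 cubes.
ROWS = 7

# Each cube is (row, col) with 0 <= col <= row; row 0 is the top cube.
CUBES = {(r, c) for r in range(ROWS) for c in range(r + 1)}

# The four diagonal hops, expressed as (row delta, col delta).
HOPS = {
    "down_left": (1, 0),
    "down_right": (1, 1),
    "up_left": (-1, -1),
    "up_right": (-1, 0),
}


def hop(pos, direction):
    """Return the cube reached by a hop, or None if the hop leaves the pyramid."""
    dr, dc = HOPS[direction]
    new_pos = (pos[0] + dr, pos[1] + dc)
    return new_pos if new_pos in CUBES else None


start = (0, 0)                     # Q*bert starts on the top cube
below = hop(start, "down_right")   # a legal hop to the second row
fell = hop(start, "up_left")       # hopping up from the top leaves the pyramid
```

Under this model, a None result corresponds to the character jumping off the pyramid.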

[Figure: The eponymous Q*bert hops diagonally down the pyramid to avoid Coily, who is pursuing him. The game tracks the player’s progress above the pyramid.]

The player is impeded by several enemies, introduced gradually to the game:

Coily – Coily first appears as a purple egg that bounces to the bottom of the pyramid and then transforms into a snake that chases after Q*bert.
Ugg and Wrongway – Two purple creatures that hop along the sides of the cubes in an Escheresque manner. Starting at either the bottom left or bottom right corner, they keep moving toward the top right or top left side of the pyramid respectively, and fall off the pyramid when they reach the end.
Slick and Sam – Two green creatures that descend down the pyramid and revert cubes whose color has already been changed.

A collision with purple enemies is fatal to the character, whereas the green enemies are removed from the board upon contact. Colored balls occasionally appear at the second row of cubes and bounce downward; contact with a red ball is lethal to Q*bert, while contact with a green one immobilizes the on-screen enemies for a limited time. Multicolored floating discs on either side of the pyramid serve as an escape from danger, particularly Coily. When Q*bert jumps on a disc, it transports him to the top of the pyramid. If Coily is in close pursuit of the character, he will jump after Q*bert and fall to his death, awarding bonus points. This causes all enemies and balls on the screen to disappear, though they start to return after a few seconds.

Points are awarded for each color change (25), defeating Coily with a flying disc (500), remaining discs at the end of a stage (at higher stages, 50 or 100) and catching green balls (100) or Slick and Sam (300 each). Bonus points are also awarded for completing a screen, starting at 1,000 for the first screen of Level 1 and increasing by 250 for each subsequent completion. Extra lives are granted for reaching certain scores, which are set by the machine operator.
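
The point values above can be written down as a small worked example. The sketch below takes the numbers from the description; the constant and function names are hypothetical, and it assumes the completion bonus keeps growing by 250 with no upper limit, because the description does not mention one.

```python
# Point values as stated in the description.
COLOR_CHANGE = 25
COILY_VIA_DISC = 500
GREEN_BALL = 100
SLICK_OR_SAM = 300


def completion_bonus(screens_completed_before):
    """Bonus for finishing a screen, given how many screens were finished earlier.

    Assumption: the bonus grows by 250 per completed screen without a cap.
    """
    return 1000 + 250 * screens_completed_before


# Smallest possible score for clearing the very first screen:
# 28 cubes changed once each, plus the first completion bonus.
minimal_first_screen = 28 * COLOR_CHANGE + completion_bonus(0)
print(minimal_first_screen)  # 1700
```

Any enemy or disc events during the screen add to this total.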

Description from Wikipedia

Performances of RL Agents

We list various reinforcement learning algorithms that were tested in this environment. These results are from RL Database.

Human Starts

Result	Algorithm	Source
21307.5	A3C LSTM	Asynchronous Methods for Deep Reinforcement Learning
18397.6	Rainbow	Rainbow: Combining Improvements in Deep Reinforcement Learning
15148.8	A3C FF	Asynchronous Methods for Deep Reinforcement Learning
15035.9	Distributional DQN	Rainbow: Combining Improvements in Deep Reinforcement Learning
14175.8	DuDQN	Dueling Network Architectures for Deep Reinforcement Learning
14063.0	PDD DQN	Dueling Network Architectures for Deep Reinforcement Learning
13752.3	A3C FF 1 day	Asynchronous Methods for Deep Reinforcement Learning
12740.5	Prioritized DQN (rank)	Prioritized Experience Replay
12085.0	Human	Massively Parallel Methods for Deep Reinforcement Learning
11277.0	Prioritized DDQN (prop, tuned)	Prioritized Experience Replay
11020.8	DDQN (tuned)	Deep Reinforcement Learning with Double Q-learning
10713.3	DDQN	Deep Reinforcement Learning with Double Q-learning
9944.0	Prioritized DDQN (rank, tuned)	Prioritized Experience Replay
7089.83	Gorila DQN	Massively Parallel Methods for Deep Reinforcement Learning
4589.8	DQN	Massively Parallel Methods for Deep Reinforcement Learning
183.0	Random	Massively Parallel Methods for Deep Reinforcement Learning
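
Raw scores like these are often compared through a human-normalized score, which maps the random agent to 0 and the human reference to 1. The sketch below is one common way to compute it and is not something the table itself reports; the function name human_normalized is hypothetical, and the inputs are the Random, Human, and agent rows from the Human Starts table.

```python
def human_normalized(agent, human, random):
    """Return the agent's score rescaled so that random = 0.0 and human = 1.0."""
    return (agent - random) / (human - random)


# Reference rows from the Human Starts table.
RANDOM_SCORE = 183.0
HUMAN_SCORE = 12085.0

# Agent rows from the same table.
a3c_lstm = human_normalized(21307.5, HUMAN_SCORE, RANDOM_SCORE)
dqn = human_normalized(4589.8, HUMAN_SCORE, RANDOM_SCORE)

# Values above 1.0 exceed the human reference; values below 1.0 fall short of it.
print(f"A3C LSTM: {a3c_lstm:.2f}, DQN: {dqn:.2f}")
```

Scores are only comparable within one evaluation regime, so the Random and Human rows should come from the same table as the agent row.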

No-op Starts

Result	Algorithm	Source
572510	QR-DQN-1	Distributional Reinforcement Learning with Quantile Regression
351200.12	IMPALA (deep)	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
33817.5	Rainbow	Rainbow: Combining Improvements in Deep Reinforcement Learning
27121	NoisyNet DuDQN	Noisy Networks for Exploration
26646	QR-DQN-0	Distributional Reinforcement Learning with Quantile Regression
25750	IQN	Implicit Quantile Networks for Distributional Reinforcement Learning
23784	C51	A Distributional Perspective on Reinforcement Learning
23151.5	ACKTR	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
22956.5	Reactor	The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
21509.2	Reactor	The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
21222.5	Reactor ND	The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
19819	DuDQN	Noisy Networks for Exploration
19220.3	DuDQN	Dueling Network Architectures for Deep Reinforcement Learning
18901.25	IMPALA (shallow)	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
18760.3	PDD DQN	Dueling Network Architectures for Deep Reinforcement Learning
18586	A3C	Noisy Networks for Exploration
17896	NoisyNet A3C	Noisy Networks for Exploration
16956.0	Distributional DQN	Rainbow: Combining Improvements in Deep Reinforcement Learning
15967.4	A2C	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
15545	NoisyNet DQN	Noisy Networks for Exploration
15088.5	DDQN	A Distributional Perspective on Reinforcement Learning
14875.0	DDQN	Deep Reinforcement Learning with Double Q-learning
13455.0	Human	Dueling Network Architectures for Deep Reinforcement Learning
13455.0	Human	Human-level control through deep reinforcement learning
13117.3	DQN	A Distributional Perspective on Reinforcement Learning
11241	DQN	Noisy Networks for Exploration
10815.55	Gorila DQN	Massively Parallel Methods for Deep Reinforcement Learning
10717.38	IMPALA (deep, multitask)	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
10596	DQN	Human-level control through deep reinforcement learning
971.8	TRPO	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
960.3	Contingency	Human-level control through deep reinforcement learning
613.5	Linear	Human-level control through deep reinforcement learning
163.9	Random	Human-level control through deep reinforcement learning

Normal Starts

Result	Algorithm	Source
20025	UCC-I	Trust Region Policy Optimization
18900	Human	Playing Atari with Deep Reinforcement Learning
18880.469	ACER	RL Baselines Zoo b76641e
15316.6	ACER	Proximal Policy Optimization Algorithms
14510.0	PPO	RL Baselines Zoo b76641e
14293.3	PPO	Proximal Policy Optimization Algorithms
10065.7	A2C	Proximal Policy Optimization Algorithms
9569.575	ACKTR	RL Baselines Zoo b76641e
7732.5	TRPO - vine	Trust Region Policy Optimization
7184.73	PPO (MPI)	OpenAI Baselines cbd21ef
7012.06	PPO	OpenAI Baselines cbd21ef
6433.38	ACER	OpenAI Baselines cbd21ef
5742.333	A2C	RL Baselines Zoo b76641e
4500	DQN2013 Best	Playing Atari with Deep Reinforcement Learning
4429.3	ACKTR	OpenAI Baselines cbd21ef
3254.83	DQN	OpenAI Baselines cbd21ef
2486.18	TRPO (MPI)	OpenAI Baselines cbd21ef
2047.07	A2C	OpenAI Baselines cbd21ef
1973.5	TRPO - single path	Trust Region Policy Optimization
1952	DQN2013	Playing Atari with Deep Reinforcement Learning
1800	HNeat Best	Playing Atari with Deep Reinforcement Learning
1325	HNeat Pixel	Playing Atari with Deep Reinforcement Learning
960	Contingency	Playing Atari with Deep Reinforcement Learning
644.345	DQN	RL Baselines Zoo b76641e
614	Sarsa	Playing Atari with Deep Reinforcement Learning
157	Random	Playing Atari with Deep Reinforcement Learning