In the last post, we installed the Obstacle Tower environment locally and verified that it was installed correctly. In this post, we will take a look at the game itself. It is crucial to understand the environment well to debug and improve the agents’ performance.

How to Run the Game

Download the environment if you have not already:

Platform Download Link
Linux (x86_64)
Mac OS X

Navigate to the folder where you unzipped the downloaded environment. If you followed the guide, it should be under examples/ObstacleTower. Run obstacletower.x86_64 by double-clicking it. After the Unity logo, the game should start within a few seconds.

Here are the keyboard controls:

Keyboard Key    Action
W, S            Move character forward / backward
A, D            Move character left / right
K, L            Rotate camera left / right
Space           Jump

Procedural Generation

One of the exciting features of this environment is its variety. Every floor and room is procedurally generated, along with variations in textures, lighting conditions, and object geometry. Thus, to perform well across levels, an agent must learn to generalize over different visuals and different floor structures.
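
Procedural generation like this is typically seed-driven: the same seed reproduces the same floor, while different seeds vary the layout and theme. Here is a minimal sketch of the idea; the theme names and floor fields are illustrative assumptions, not the environment's actual internals.

```python
import random

THEMES = ["Ancient", "Moorish", "Industrial"]  # illustrative theme names

def generate_floor(seed: int, floor_number: int) -> dict:
    """Deterministically derive a floor's look and layout from a seed."""
    rng = random.Random(seed * 1000 + floor_number)  # same inputs -> same floor
    return {
        "theme": rng.choice(THEMES),
        "num_rooms": rng.randint(3, 8),
        "lighting": round(rng.uniform(0.5, 1.0), 2),
    }
```

With this scheme, `generate_floor(42, 0)` always returns the same floor, which is exactly what makes seeded evaluation runs reproducible.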

I sampled a picture from each floor of a single playthrough. The floors look similar to the human eye, but the color and theme variations could be quite misleading to a convolutional neural network (CNN).


There are four types of doors in the game:

  • a normal door that leads to the next room,
  • a locked door,
  • a puzzle door, and
  • the door that leads to the next floor.

Opening a door gives a reward of 0.1 if the dense reward function is used, and 0 if the sparse reward function is used.

Normal Door

Normal doors have green symbols, and they can be opened simply by touching them. Note that a door’s design changes with the theme of the floor.

Locked Door

Locked doors have red symbols and can only be opened if the player has a “key” (see description below). Opening one consumes the key and permanently unlocks the door.

Puzzle Door

Puzzle doors have purple symbols, and they stay locked until the player completes the “puzzle room” (see description below).

Next Floor Door

Next floor doors have yellow arrow symbols and can be opened simply by touching them. Upon entering this door, the episode terminates, and a new episode begins with a newly generated floor.
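
The door mechanics above can be summarized in a short sketch. The function names and door labels below are my own, not the environment's API:

```python
def can_open(door: str, has_key: bool, puzzle_solved: bool) -> bool:
    """Whether the player can open a given door type right now."""
    if door in ("normal", "next_floor"):
        return True            # opened simply by touching
    if door == "locked":
        return has_key         # requires (and consumes) a key
    if door == "puzzle":
        return puzzle_solved   # unlocked once the puzzle room is completed
    raise ValueError(f"unknown door type: {door}")

def door_reward(dense: bool) -> float:
    """Opening a door: +0.1 with dense rewards, 0 with sparse rewards."""
    return 0.1 if dense else 0.0
```

A sparse-reward agent therefore gets no signal at all from intermediate doors, which already hints at why the sparse setting is so much harder to learn.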


Time Orb

In some rooms, there are blue glowing orbs. Upon running into one of these orbs, the player’s countdown timer increases by 500. There are often rooms with multiple time orbs.


Key

Keys are yellow key-shaped objects that the player can pick up. They are used to open “locked doors” (see description above); a key vanishes after opening a locked door.

When you pick up a key, a key symbol appears at the top of your screen.
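
The key mechanic can be sketched as a simple inventory model (my own illustration, not the environment's internals):

```python
class Player:
    """Minimal model of the player's key inventory."""

    def __init__(self):
        self.keys = 0

    def pick_up_key(self) -> None:
        self.keys += 1          # a key symbol appears on the HUD

    def open_locked_door(self) -> bool:
        if self.keys == 0:
            return False        # cannot open a locked door without a key
        self.keys -= 1          # the key is consumed; the door stays unlocked
        return True
```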

Puzzle Room


The Sokoban room requires the player to push a purple box onto a white plate. Stepping on the red plate resets the puzzle. Upon completing the puzzle, the Sokoban door opens.


Jumping

Jumping is not necessary on most floors. However, as the floors get harder, jumping becomes necessary.

On the left, we see a key on a cylindrical block. To complete the episode, the key must be collected, which requires jumping twice.

On the right, we see a map with pits. Falling into a pit terminates the episode, so the player must move around the pits or jump across them.


After playing the game for just 5-10 minutes, I came up with many ideas that could dramatically reduce the agent’s training time. I highly recommend that anyone interested in this environment play the game themselves.

From my playthrough, a few things became clear to me:

  • Not all 54 actions are needed.
  • Extracting features from the raw visual input would be harder for a CNN than in Montezuma’s Revenge.
  • Distilling game knowledge into the agent is essential (e.g., hierarchical reinforcement learning).
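
On the first point: the 54 actions come from a factored (multi-discrete) action space with four branches, movement (3 options), camera rotation (3), jump (2), and sideways movement (3), giving 3 × 3 × 2 × 3 = 54 combinations. Below is a sketch of enumerating them and picking a reduced subset; the branch semantics follow the keyboard controls above, and the particular subset chosen is my own, not a recommendation from the environment's authors.

```python
from itertools import product

# Branches of the multi-discrete action space; each action is a 4-tuple.
MOVE   = [0, 1, 2]   # no-op / forward / backward
CAMERA = [0, 1, 2]   # no-op / rotate left / rotate right
JUMP   = [0, 1]      # no jump / jump
STRAFE = [0, 1, 2]   # no-op / left / right

all_actions = list(product(MOVE, CAMERA, JUMP, STRAFE))

# An illustrative reduced subset: always move forward, never strafe,
# optionally rotate the camera or jump -- 6 actions instead of 54.
reduced = [(1, cam, jump, 0) for cam in CAMERA for jump in JUMP]
```

Pruning the action space this way shrinks the exploration problem considerably, at the cost of ruling out behaviors (like backing up or strafing) that some rooms may require.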

What’s Next?

Now that I have played the game myself, I should check how the agent sees and plays it. In the next post, I will examine the observation space and the action space of the agent. Afterwards, I will run a simple baseline agent and discuss possible improvements to this baseline, listing some noteworthy papers.