NOTE. This stub post is for my lab teammates. It will be populated after posts 3 and 4 are published.

The Unity team used Rainbow and PPO agents to test their environments. Although they did not perform any hyperparameter tuning, the team made it clear that neither vanilla Rainbow nor vanilla PPO can solve the 25-floor environment.

Fortunately, they also listed several methods with high potential to improve the score. In this post, you will learn the central idea behind each of these methods.

Hierarchy

FuN

FeUdal Networks for Hierarchical Reinforcement Learning

arXiv

HIRO

Data-Efficient Hierarchical Reinforcement Learning

arXiv
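Both FuN and HIRO split the agent into a high-level manager that emits a goal every few steps and a low-level worker that is rewarded for reaching that goal. Below is a minimal sketch of that shared structure; the manager_policy and worker_policy callables and the horizon are hypothetical placeholders, and the intrinsic reward shown is HIRO-style (the goal is a desired change in the state features), whereas FuN instead rewards the worker by cosine similarity to a direction in a learned latent space.

```python
import numpy as np

def worker_reward(state, goal, next_state):
    """HIRO-style intrinsic reward: the worker is paid for moving the state
    by the amount the manager asked for (goal = desired change in features)."""
    return -np.linalg.norm(state + goal - next_state)

class TwoLevelAgent:
    """Minimal manager/worker skeleton shared by FuN and HIRO."""
    def __init__(self, manager_policy, worker_policy, horizon=10):
        self.manager_policy = manager_policy   # state -> goal
        self.worker_policy = worker_policy     # (state, goal) -> action
        self.horizon = horizon                 # the manager decides every `horizon` steps
        self.goal = None
        self.t = 0

    def act(self, state):
        if self.t % self.horizon == 0:
            self.goal = self.manager_policy(state)    # high-level decision
        self.t += 1
        return self.worker_policy(state, self.goal)   # low-level action
```

The manager itself is trained on the external reward, so the hierarchy separates deciding where to go from learning how to get there.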

Intrinsic Motivation

We did not include the Brute by Machado et al. or Go-Explore by Ecoffet et al., as both rely on exploiting the determinism of the training environment.

ICM

Curiosity-driven Exploration by Self-supervised Prediction

arXiv
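ICM pays the agent for being surprised: a forward model predicts the features of the next observation, and its prediction error becomes an intrinsic reward. The features themselves are shaped by an inverse model (predict the action from two consecutive observations), so noise the agent cannot control does not generate curiosity. A sketch with flat observations, assuming hypothetical network sizes and eta scale (the paper works on pixels with convolutional encoders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    """Sketch of the Intrinsic Curiosity Module."""
    def __init__(self, obs_dim, n_actions, feat_dim=64, eta=0.01):
        super().__init__()
        self.eta = eta
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        self.forward_model = nn.Sequential(nn.Linear(feat_dim + n_actions, 128), nn.ReLU(),
                                           nn.Linear(128, feat_dim))
        self.inverse_model = nn.Sequential(nn.Linear(2 * feat_dim, 128), nn.ReLU(),
                                           nn.Linear(128, n_actions))

    def forward(self, obs, next_obs, action_onehot):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        phi_pred = self.forward_model(torch.cat([phi, action_onehot], dim=-1))
        action_logits = self.inverse_model(torch.cat([phi, phi_next], dim=-1))
        # Curiosity bonus: large where the forward model is surprised.
        intrinsic_reward = self.eta * 0.5 * (phi_pred - phi_next.detach()).pow(2).sum(dim=-1)
        # Losses trained alongside the policy; only the inverse model is
        # allowed to shape the features here (a common simplification).
        forward_loss = intrinsic_reward.mean() / self.eta
        inverse_loss = F.cross_entropy(action_logits, action_onehot.argmax(dim=-1))
        return intrinsic_reward.detach(), forward_loss, inverse_loss
```

The detached intrinsic reward is simply added to the environment reward before the usual policy-gradient update.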

RND

Exploration by Random Network Distillation

arXiv Slow Paper
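RND turns novelty detection into a regression problem: a fixed, randomly initialized target network maps each observation to an embedding, and a second predictor network is trained to match it. On frequently visited observations the predictor has converged, so the error (the intrinsic reward) is small; on novel ones it is large. A sketch with assumed sizes (the paper additionally normalizes observations and intrinsic rewards with running statistics):

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    """Sketch of Random Network Distillation."""
    def __init__(self, obs_dim, out_dim=64):
        super().__init__()
        def make_net():
            return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))
        self.target = make_net()
        self.predictor = make_net()
        for p in self.target.parameters():      # the target is never trained
            p.requires_grad_(False)

    def forward(self, obs):
        # Per-observation prediction error: used (detached) as the intrinsic
        # reward, and (as a mean over the batch) as the predictor's loss.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```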

CTS

Unifying Count-Based Exploration and Intrinsic Motivation

arXiv

PixelCNN

Count-Based Exploration with Neural Density Models

arXiv
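CTS and PixelCNN share the same recipe and differ only in the density model. The model assigns a probability to the current observation before and after being updated on it, those two probabilities are converted into a pseudo-count, and the exploration bonus shrinks as that count grows. A sketch of the conversion (the beta coefficient and the 0.01 constant follow the paper's Atari setup, but treat them as assumptions here):

```python
import math

def pseudo_count(prob_before, prob_after):
    """Pseudo-count derived from the density model's probability of an
    observation before and after training on it once (the recoding probability)."""
    return prob_before * (1.0 - prob_after) / (prob_after - prob_before)

def exploration_bonus(prob_before, prob_after, beta=0.05):
    """Count-based bonus added to the environment reward; it decays as the
    observation (or anything the density model finds similar) recurs."""
    n = max(pseudo_count(prob_before, prob_after), 0.0)
    return beta / math.sqrt(n + 0.01)
```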

Empowerment

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning

arXiv

Meta-Learning

MAML

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

arXiv
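MAML does not prescribe a new architecture; it searches for an initialization from which a handful of gradient steps on a new task already work well, which means the outer loop differentiates through the inner adaptation step. A self-contained sketch on the paper's sine-regression toy problem, with assumed network size, learning rates, and task distribution:

```python
import torch

def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.tanh(x @ w1 + b1) @ w2 + b2

def sample_task():
    """A random sine wave; returns a function that draws (x, y) batches from it."""
    amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.14
    def draw(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return draw

params = [(torch.randn(1, 40) * 0.1).requires_grad_(),
          torch.zeros(40, requires_grad=True),
          (torch.randn(40, 1) * 0.1).requires_grad_(),
          torch.zeros(1, requires_grad=True)]
meta_opt = torch.optim.Adam(params, lr=1e-3)

for step in range(10000):
    meta_opt.zero_grad()
    for _ in range(4):                          # tasks per meta-batch
        draw = sample_task()
        x_s, y_s = draw(); x_q, y_q = draw()
        # Inner loop: one SGD step on the support set, keeping the graph so
        # the meta-update can differentiate through the adaptation.
        loss_s = ((forward(params, x_s) - y_s) ** 2).mean()
        grads = torch.autograd.grad(loss_s, params, create_graph=True)
        adapted = [p - 0.01 * g for p, g in zip(params, grads)]
        # Outer loss: how well the *adapted* parameters do on held-out points.
        ((forward(adapted, x_q) - y_q) ** 2).mean().backward()
    meta_opt.step()
```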

RL^2

Fast Reinforcement Learning via Slow Reinforcement Learning

arXiv
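RL^2 keeps the training algorithm standard and moves the "fast learning" into a recurrent policy: the network receives the previous action, previous reward, and a termination flag along with the observation, and its hidden state is carried across episodes within a trial, so adaptation to the current task happens purely inside the RNN dynamics. A sketch of such a policy (sizes are assumptions; the paper trains it with TRPO over whole trials):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RL2Policy(nn.Module):
    """Recurrent policy whose input includes (prev_action, prev_reward, done)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.n_actions = n_actions
        self.gru = nn.GRUCell(obs_dim + n_actions + 2, hidden)
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs, prev_action, prev_reward, done, h):
        x = torch.cat([obs,
                       F.one_hot(prev_action, self.n_actions).float(),
                       prev_reward.unsqueeze(-1),
                       done.unsqueeze(-1)], dim=-1)
        h = self.gru(x, h)   # reset h only at trial boundaries, not at episode ends
        return self.policy_head(h), self.value_head(h), h
```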

Model Learning

I2A

Imagination-Augmented Agents for Deep Reinforcement Learning

arXiv
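I2A gives a model-free agent an imagination channel: for each candidate action it rolls a learned environment model forward a few steps, encodes each imagined trajectory with an RNN, and feeds those encodings to the policy alongside the usual model-free features, so an imperfect model can inform decisions without being trusted blindly. A simplified sketch with assumed sizes; here the rollout just repeats the first action (the paper uses a distilled rollout policy), and the imagination core would be pretrained on real transitions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImaginationCore(nn.Module):
    """Learned model: predicts next-state features and reward from (features, action)."""
    def __init__(self, feat_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim + n_actions, 128), nn.ReLU())
        self.next_feat = nn.Linear(128, feat_dim)
        self.reward = nn.Linear(128, 1)

    def forward(self, feat, action_onehot):
        h = self.net(torch.cat([feat, action_onehot], dim=-1))
        return self.next_feat(h), self.reward(h)

class I2ASketch(nn.Module):
    def __init__(self, feat_dim, n_actions, rollout_len=3):
        super().__init__()
        self.n_actions, self.rollout_len = n_actions, rollout_len
        self.model = ImaginationCore(feat_dim, n_actions)
        self.rollout_encoder = nn.LSTM(feat_dim + 1, 64, batch_first=True)
        self.model_free = nn.Linear(feat_dim, 64)
        self.policy = nn.Linear(64 + n_actions * 64, n_actions)
        self.value = nn.Linear(64 + n_actions * 64, 1)

    def imagine(self, feat, first_action):
        """Roll the model forward from one candidate first action and encode the rollout."""
        frames = []
        a = F.one_hot(first_action, self.n_actions).float()
        for _ in range(self.rollout_len):
            feat, r = self.model(feat, a)
            frames.append(torch.cat([feat, r], dim=-1))
        _, (h, _) = self.rollout_encoder(torch.stack(frames, dim=1))
        return h[-1]

    def forward(self, feat):
        batch = feat.shape[0]
        rollouts = [self.imagine(feat, torch.full((batch,), a, dtype=torch.long))
                    for a in range(self.n_actions)]
        x = torch.cat([torch.relu(self.model_free(feat))] + rollouts, dim=-1)
        return self.policy(x), self.value(x)
```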

World Model

World Models

arXiv
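World Models splits the agent into three parts trained in stages: a VAE (V) that compresses each frame into a small latent z, a recurrent model (M, an MDN-RNN in the paper) that learns the dynamics of those latents, and a tiny controller (C) reading [z, h] that is small enough to be trained with CMA-ES, or even entirely inside the learned model. A structural sketch assuming 64x64 RGB frames; the VAE decoder, the MDN head, and all training code are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldModelAgent(nn.Module):
    def __init__(self, z_dim=32, hidden=256, n_actions=4):
        super().__init__()
        self.n_actions = n_actions
        self.encoder = nn.Sequential(                  # V: frame -> latent z
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 14 * 14, z_dim))
        self.memory = nn.LSTMCell(z_dim + n_actions, hidden)    # M: latent dynamics
        self.controller = nn.Linear(z_dim + hidden, n_actions)  # C: tiny policy

    def initial_state(self, batch_size):
        h = torch.zeros(batch_size, self.memory.hidden_size)
        return h, h.clone()

    def act(self, frame, state):
        z = self.encoder(frame)                        # compress the observation
        h, c = state
        action = self.controller(torch.cat([z, h], dim=-1)).argmax(dim=-1)
        onehot = F.one_hot(action, self.n_actions).float()
        state = self.memory(torch.cat([z, onehot], dim=-1), (h, c))  # update the memory
        return action, state
```

Because the controller is only a single linear layer, it has a few thousand parameters, which is what makes evolution-strategy training of C practical.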

What’s Next?