NOTE. This stub post is for my lab teammates. It will be populated after posts 3 and 4 are published.
The Unity team used Rainbow and PPO agents to test their environments. Even though they did not perform any hyperparameter tuning, their results make it clear that neither vanilla Rainbow nor vanilla PPO can solve the 25-floor environment.
Fortunately, they also listed several methods with high potential to improve the score. This post walks through the central idea of each of them.
Hierarchy
FuN
FeUdal Networks for Hierarchical Reinforcement Learning
HIRO
Data-Efficient Hierarchical Reinforcement Learning
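Posts 3 and 4 will cover these two in depth, but as a teaser, here is a minimal sketch of the hierarchical idea in HIRO's style: a manager policy emits a goal (a desired change in state) every few steps, and the worker policy is rewarded densely for moving the state toward that goal. The toy state vectors, the interval c, and the stand-in for the environment step are my own illustrative assumptions, not code from the paper.

```python
import numpy as np

def low_level_reward(state, goal, next_state):
    """HIRO-style intrinsic reward for the worker: negative distance between
    where the manager asked it to go (state + goal) and where it ended up."""
    return -np.linalg.norm(state + goal - next_state)

def goal_transition(state, goal, next_state):
    """Re-express the remaining goal after one low-level step, so the same
    directional target is pursued across the whole c-step interval."""
    return state + goal - next_state

# Toy rollout: the manager picks a goal every c steps, the worker is
# rewarded densely for reducing its distance to state + goal.
c = 4                              # manager acts every c steps (assumed)
state = np.zeros(3)
goal = np.array([1.0, 0.0, 0.0])   # "move +1 along the first state dimension"
for t in range(c):
    next_state = state + np.random.uniform(-0.1, 0.4, size=3)  # stand-in for env.step
    r_int = low_level_reward(state, goal, next_state)
    goal = goal_transition(state, goal, next_state)
    state = next_state
    print(f"t={t} intrinsic reward={r_int:.3f}")
```

FuN follows the same manager/worker split, but its manager sets directional goals in a learned latent space rather than raw state offsets.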
Intrinsic Motivation
We did not include Brute (Machado et al.) or Go-Explore (Ecoffet et al.), as both rely on exploiting the determinism of the training environment.
ICM
Curiosity-driven Exploration by Self-supervised Prediction
RND
Exploration by Random Network Distillation
CTS
Unifying Count-Based Exploration and Intrinsic Motivation
PixelCNN
Count-Based Exploration with Neural Density Models
Empowerment
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
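All of the intrinsic-motivation methods above share one mechanic: add a bonus to the environment reward that is large in unfamiliar states and shrinks as the agent revisits them. Here is a minimal sketch of the RND flavour, where the bonus is the error of a trained predictor trying to imitate a fixed, randomly initialised target network. The observation size, layer widths, and PyTorch framing are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

obs_dim = 16  # assumed toy observation size

def make_net():
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 32))

target = make_net()       # fixed "random network" that is never trained
predictor = make_net()    # trained to imitate the target on visited states
for p in target.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

def intrinsic_reward(obs):
    """Novelty bonus = how badly the predictor matches the random target.
    High on rarely seen observations, shrinks as they are revisited."""
    with torch.no_grad():
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

def update_predictor(obs_batch):
    loss = (predictor(obs_batch) - target(obs_batch)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy check: the bonus on the same batch drops after the predictor trains on it.
obs = torch.randn(8, obs_dim)
print("bonus before:", intrinsic_reward(obs).mean().item())
for _ in range(200):
    update_predictor(obs)
print("bonus after: ", intrinsic_reward(obs).mean().item())
```

ICM replaces the random target with a forward dynamics model and uses its prediction error in a learned feature space, while the count-based methods (CTS, PixelCNN) derive the bonus from pseudo-counts given by a density model. Empowerment is the odd one out: its bonus rewards states from which the agent's actions have high influence over future states.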
Meta-Learning
MAML
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
RL^2
Fast Reinforcement Learning via Slow Reinforcement Learning
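Both meta-learning methods aim at an agent that adapts quickly to a new task drawn from a family of related tasks. The MAML recipe is the easiest to show in a few lines: take a gradient step on a task's support data, then update the initial parameters so that the adapted parameters perform well on that task's query data. The sine-wave task family and the tiny network below are standard toy assumptions, not the paper's code.

```python
import torch

def init_params():
    # Small MLP parameters kept as plain tensors so we can build task-adapted
    # "fast weights" and differentiate through the adaptation step.
    return [(torch.randn(1, 40) * 0.1).requires_grad_(),
            torch.zeros(40, requires_grad=True),
            (torch.randn(40, 1) * 0.1).requires_grad_(),
            torch.zeros(1, requires_grad=True)]

def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.tanh(x @ w1 + b1) @ w2 + b2

def sample_task():
    # A "task" is a sine wave with random amplitude and phase (assumed toy family).
    amp, phase = torch.rand(1) * 4 + 0.1, torch.rand(1) * 3.1416
    def draw(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return draw

meta_params = init_params()
meta_opt = torch.optim.Adam(meta_params, lr=1e-3)
inner_lr = 0.01

for step in range(1000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                              # meta-batch of tasks
        draw = sample_task()
        x_s, y_s = draw()                           # support set
        x_q, y_q = draw()                           # query set (same task)
        # Inner step: adapt to this task with one gradient step.
        loss_s = ((forward(meta_params, x_s) - y_s) ** 2).mean()
        grads = torch.autograd.grad(loss_s, meta_params, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(meta_params, grads)]
        # Outer objective: how well the adapted parameters do on fresh data.
        meta_loss = meta_loss + ((forward(fast, x_q) - y_q) ** 2).mean()
    (meta_loss / 4).backward()
    meta_opt.step()
    if step % 200 == 0:
        print(step, (meta_loss / 4).item())
```

RL^2 takes a different route: it trains a recurrent policy across many tasks so that adaptation happens inside the RNN's hidden state, with no explicit gradient step at test time.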
Model Learning
I2A
Imagination-Augmented Agents for Deep Reinforcement Learning
World Model
World Models
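The model-learning methods both revolve around learning a dynamics model of the environment and then using it for imagination: I2A feeds imagined rollouts into the policy as extra context, and World Models trains its controller almost entirely inside the learned model. The sketch below is a drastically simplified stand-in for that idea (fit a one-step dynamics model, then score candidate action sequences by rolling them out in the model); the toy dynamics, network sizes, and random-shooting planner are all my own assumptions.

```python
import torch
import torch.nn as nn

# Toy "true" dynamics we pretend is the environment: a point pushed by actions.
def env_step(s, a):
    return s + 0.1 * a

state_dim, action_dim = 2, 2

# Learned dynamics model: predicts the next state from (state, action).
model = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, state_dim))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# 1) Collect real transitions and fit the model by regression.
s = torch.randn(512, state_dim)
a = torch.randn(512, action_dim)
s_next = env_step(s, a)
for _ in range(500):
    pred = model(torch.cat([s, a], dim=-1))
    loss = ((pred - s_next) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) Plan "in imagination": score random action sequences inside the model
#    and pick the one whose imagined trajectory ends closest to a goal.
goal = torch.tensor([1.0, 1.0])

def imagined_return(s0, actions):
    state = s0
    with torch.no_grad():
        for act in actions:
            state = model(torch.cat([state, act], dim=-1))
    return (-torch.norm(state - goal)).item()

s0 = torch.zeros(state_dim)
candidates = [torch.randn(5, action_dim) for _ in range(64)]  # 64 random 5-step plans
best = max(candidates, key=lambda acts: imagined_return(s0, acts))
print("first action of the best imagined plan:", best[0])
```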