World Discovery Models

Global Architecture of the NDIGO Agent

What it is

DeepMind introduced a new agent that incorporates Neural Differential Information Gain Optimisation (NDIGO) to build an accurate world model. NDIGO measures novelty through how helpful a new observation was in predicting future observations. The authors show that NDIGO outperforms state of the art methods in terms of the quality of the learned representation.

Why it matters

Although reinforcement learning has achieved remarkable success, it is largely restricted to environments where a good external reward is provided. There have been various efforts to encourage the agent to explore and discover. However, these methods tend to have the “noisy-TV problem,” where the agent mistakes random patterns in the world as novel observations (For more details, read the papers in External Resources). Thus, most existing algorithms do not generalize well to stochastic and partially observable environments. The authors show that NDIGO can handle such environments.

Read more

External Resources

MuJoCo Soccer Environment

MuJoCo Soccer Environment

What it is

Google DeepMind open-sourced a new multi-agent environment (2-vs-2) where agents play soccer on a simulated physics environment. The authors also provide a baseline that uses population-based training, reward shaping, policy recurrence, and decomposed action-value function.

Why it matters

The traditional reinforcement learning benchmark has been the Atari 2600 games with the Arcade Learning Environment. However, as of 2019, it is safe to say that the benchmark has completed its main purpose, as reinforcement learning agents have achieved superhuman scores on almost all games. Thus, a more challenging testbed is now needed to accelerate progress. DeepMind seems to be interested on multiagent reinforcement learning environments, open sourcing Capture the Flag, Hanabi, and Soccer environment.

Read more

External Resources

PlaNet: A Deep Planning Network for RL

Learned Latent Dynamics Model

What it is

The Deep Planning Network (PlaNet) agent is a model-based agent that learns a latent dynamics model. PlaNet trained on 2K episodes outperforms model-free A3C trained on 100K episodes on all tasks and have similar performance to D4PG trained on 100K episodes.

Read more


Here are some other exciting news in RL: