Subscribe to RL Weekly

Get the highlights of reinforcement learning in both research and industry every week.

MineRL: Learn Minecraft from Human Priors

William H. Guss*1, Cayden Codel, Katja Hofmann2, Brandon Houghton1, Noboru Kuno2, Stephanie Milani3, Sharada Mohanty4, Diego Perez Liebana5, Ruslan Salakhutdinov1, Nicholay Topin1, Manuela Veloso1, Phillip Wang1

1Carnegie Mellon University 2Microsoft Research 3University of Maryland 4AICrowd 5Queen Mary University of London

MineRL Competition Task

What it is

MineRL is a competition at the upcoming NeurIPS 2019 conference. The competition uses the Minecraft environment, and participants must train an agent to obtain a diamond. Because this task is very difficult, the organizers also provide the MineRL dataset, a large-scale dataset of human demonstrations.

The competition started a few days ago and will end on October 25th. According to the organizers, Preferred Networks will be releasing a set of baselines for the competition soon.

Why it matters

Reinforcement learning competitions are amazing opportunities for new RL researchers to gain first-hand experience. MineRL offers a unique opportunity by providing human demonstration data. It is difficult for individual researchers to collect the large amounts of demonstration data needed to test their ideas. The competition alleviates this problem, allowing researchers to implement their own algorithms without worrying about data collection.

Read more

External Resources

Off-Policy Evaluation via Off-Policy Classification

Alex Irpan1, Kanishka Rao1, Konstantinos Bousmalis2, Chris Harris1, Julian Ibarz1, Sergey Levine1,3

1Google Brain 2DeepMind 3UC Berkeley


What it is & Why it matters

Traditionally, a trained agent is evaluated by interacting with the target environment. Although this is feasible when the target environment is simulated, it can be problematic in real-world applications such as robotics. In these cases, off-policy evaluation (OPE) methods should be used. Unlike existing OPE methods, which require a good model of the environment or rely on importance sampling, this paper frames OPE as a “positive-unlabeled” classification problem. A state-action pair is labeled “effective” if an optimal policy can achieve success from that situation, and “catastrophic” otherwise. The intuition is that a well-learned Q-function should return high values for effective state-action pairs and low values for catastrophic ones.
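The core intuition can be illustrated with a toy ranking score: if a Q-function separates effective from catastrophic state-action pairs, it should rank effective pairs above catastrophic ones. The sketch below computes an AUC-style pairwise score in that spirit; it is a simplified illustration, not the paper's exact estimators, and all names (`ranking_score`, the example Q-values) are invented here.

```python
import numpy as np

def ranking_score(q_effective, q_catastrophic):
    """Toy OPE-as-classification score: the fraction of
    (effective, catastrophic) pairs in which the Q-function assigns
    the effective pair the higher value (an AUC-style statistic)."""
    q_effective = np.asarray(q_effective, dtype=float)
    q_catastrophic = np.asarray(q_catastrophic, dtype=float)
    # Pairwise comparison across the two classes: a good Q-function's
    # effective values should exceed its catastrophic values.
    return (q_effective[:, None] > q_catastrophic[None, :]).mean()

# A Q-function that cleanly separates the classes scores 1.0 ...
good_q = ranking_score([0.9, 0.8, 0.95], [0.1, 0.2, 0.15])
# ... while one that confuses them scores much lower.
poor_q = ranking_score([0.5, 0.4, 0.6], [0.55, 0.45, 0.5])
```

A score like this needs no environment interaction: it only requires Q-values on logged state-action pairs, which is what makes the classification framing attractive for real-robot evaluation.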

Read more

Towards Interpretable Reinforcement Learning Using Attention Augmented Agents

Alex Mott*1, Daniel Zoran*1, Mike Chrzanowski1, Daan Wierstra1, Danilo J. Rezende1

1DeepMind


What it is & Why it matters

The authors propose an LSTM architecture with a soft, top-down, spatial attention mechanism. The paper is not the first to use attention in RL agents, but its extensive experiments show how attention can be used to qualitatively evaluate and interpret an agent's abilities. The project website linked below shows how attention reveals how the agent reacts to novel states, how it plans, and what its overall strategy is.
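The basic operation behind soft spatial attention can be sketched in a few lines: score every spatial location of a feature map against a top-down query (e.g. derived from the LSTM state), softmax over locations, and read out the attention-weighted feature vector. This is a minimal numpy illustration of that core operation only; the paper's agent uses a more elaborate multi-head mechanism with learned keys and values, and all names here (`soft_spatial_attention`, the shapes) are assumptions for the example.

```python
import numpy as np

def soft_spatial_attention(features, query):
    """Soft spatial attention over a feature map.

    features: (H, W, C) spatial feature map (e.g. ConvNet output)
    query:    (C,) top-down query vector (e.g. from the LSTM state)
    Returns the attention-weighted readout (C,) and the (H, W)
    attention map, which is what makes the agent interpretable.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)        # one key per spatial location
    scores = flat @ query / np.sqrt(c)       # scaled dot-product logits
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()                 # normalize over H*W locations
    read = weights @ flat                    # attention-weighted readout
    return read, weights.reshape(h, w)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 5, 8))
query = rng.normal(size=8)
read, attn = soft_spatial_attention(features, query)
```

Because `attn` is a normalized map over image locations, it can be overlaid on the input frame to visualize where the agent is "looking," which is the basis of the qualitative analyses described above.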

Read more

Some more exciting news in RL: