endtoendAI

Obstacle Tower 6: Submitting a Random Agent

 reinforcement-learning  obstacle-tower  competition

We submit a random agent to the newly launched Obstacle Tower Challenge.
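A random agent needs nothing beyond the environment's Gym-style interface. Below is a minimal sketch, assuming the obstacle_tower_env package; the binary path is a placeholder.

```python
# A minimal random agent, assuming the Gym-style interface of the
# obstacle_tower_env package; the binary path below is a placeholder.
from obstacle_tower_env import ObstacleTowerEnv

env = ObstacleTowerEnv('./ObstacleTower/obstacletower', retro=True)
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = env.action_space.sample()          # pick a uniformly random action
    obs, reward, done, info = env.step(action)  # advance the environment one step
    episode_reward += reward
print('Episode reward:', episode_reward)
env.close()
```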

Obstacle Tower 5: Possible Improvements to the Baselines

 reinforcement-learning  obstacle-tower  competition

We explore possible improvements to the Rainbow and PPO baselines provided for Obstacle Tower.

Obstacle Tower 4: Understanding the Baselines

 reinforcement-learning  obstacle-tower  competition

We briefly introduce Rainbow and PPO, the two baselines that were tested on Obstacle Tower.

Obstacle Tower 3: Observation Space and Action Space

 reinforcement-learning  obstacle-tower  competition

We analyze the observation space and the action space provided by the Obstacle Tower environment.
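For a first look at both spaces, a sketch like the following could print them, again assuming the Gym-style interface of obstacle_tower_env:

```python
# Inspect the observation and action spaces; the binary path is a placeholder.
from obstacle_tower_env import ObstacleTowerEnv

env = ObstacleTowerEnv('./ObstacleTower/obstacletower')
print(env.observation_space)  # visual (pixel) observation
print(env.action_space)       # MultiDiscrete: several independent action branches
env.close()
```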

Obstacle Tower 2: Playing the Game

 reinforcement-learning  obstacle-tower  competition

We play the Obstacle Tower game to understand the qualities of a successful agent.

Obstacle Tower 1: Installing the Environment

 reinforcement-learning  obstacle-tower  competition

Unity introduced the Obstacle Tower Challenge, a new reinforcement learning contest with a difficult environment. In this post, we guide readers through installing the environment on Linux using conda.

AI for Prosthetics Week 9 - 10: Unorthodox Approaches

 reinforcement-learning  ai-for-prosthetics  competition

We end the series by exploring unorthodox approaches to the competition: approaches that deviate from popular policy gradient methods such as DDPG and PPO.

Pommerman 1: Understanding the Competition

 reinforcement-learning  competition

Pommerman is one of the NIPS 2018 Competition tracks, where participants seek to build agents that compete against other agents in a game of Bomberman. In this post, we simply explain the basics of Pommerman, leaving reinforcement learning to later posts.

AI for Prosthetics Week 6: General Techniques of RL

 reinforcement-learning  ai-for-prosthetics  competition

This week, we take a step back from the competition and study common techniques used in reinforcement learning.

AI for Prosthetics Week 5: Understanding the Reward

 reinforcement-learning  ai-for-prosthetics  competition

The goal of reinforcement learning is defined by the reward signal: to maximize the cumulative reward throughout an episode. In some ways, the reward is the most important aspect of the environment for the agent: even an agent that knows nothing about the values of states or actions (as in Evolution Strategies) is a great agent as long as it consistently achieves a high return (cumulative reward).
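To make "return" precise, the standard definition (following Sutton and Barto's notation, with discount factor γ ∈ [0, 1] and an episode ending at step T) is:

```latex
% Return from time step t: the discounted sum of all future rewards
G_t = R_{t+1} + \gamma R_{t+2} + \cdots = \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}
```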

AI for Prosthetics Week 3-4: Understanding the Observation Space

 reinforcement-learning  ai-for-prosthetics  competition

The observation can be roughly divided into five components: the body parts, the joints, the muscles, the forces, and the center of mass. For each body part, the agent observes its position, velocity, acceleration, rotation, rotational velocity, and rotational acceleration.
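To get a feel for that structure, here is a hedged sketch of inspecting the dictionary observation; the ProstheticsEnv interface and the key names (body_pos, misc, mass_center_pos) are our assumptions about osim-rl, not guaranteed names.

```python
# A hedged sketch of traversing the dictionary observation; the key names
# (body_pos, misc, mass_center_pos) are assumptions based on osim-rl.
from osim.env import ProstheticsEnv

env = ProstheticsEnv(visualize=False)
obs = env.reset(project=False)           # request the raw dictionary observation

for body_part, position in obs['body_pos'].items():
    print(body_part, position)           # (x, y, z) position of each body part
print(obs['misc']['mass_center_pos'])    # position of the center of mass
```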

AI for Prosthetics Week 2: Understanding the Action Space

 reinforcement-learning  ai-for-prosthetics  competition

Last week, we saw that a valid action consists of 19 numbers, each between 0 and 1, where each number represents the amount of force applied to one muscle. I know barely anything about muscles, so I decided to go through all the muscles manually to understand the effect of each one...
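Trying a single random action takes only a few lines. This sketch assumes the osim-rl ProstheticsEnv interface, where step takes a list of 19 activations in [0, 1]:

```python
# Apply one random action, assuming the osim-rl ProstheticsEnv interface:
# 19 muscle activations, each between 0 and 1.
import numpy as np
from osim.env import ProstheticsEnv

env = ProstheticsEnv(visualize=False)
obs = env.reset()
action = np.random.uniform(0.0, 1.0, size=19)  # one activation per muscle
obs, reward, done, info = env.step(action.tolist())
print('Reward after one random step:', reward)
```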

AI for Prosthetics Week 1: Understanding the Challenge

 reinforcement-learning  ai-for-prosthetics  competition

The AI for Prosthetics challenge is one of the NIPS 2018 Competition tracks. In this challenge, participants seek to build an agent that can make a 3D model of a human with a prosthetic leg run. This challenge is a continuation of the Learning to Run challenge that was part of the NIPS 2017 Competition Track. The challenge was enhanced in three ways...

Learning DQNs with the OpenAI Retro Contest

 reinforcement-learning  competition

In April, OpenAI held a two-month-long competition called the Retro Contest, where participants had to develop an agent that could perform well on unseen custom-made stages of Sonic the Hedgehog. The agents were limited to 100 million steps per stage and 12 hours of time on a VM with 6 E5-2690v3 cores, 56GB of RAM, and a single K80 GPU.